
DM-13293: Make BF kernel measurement code fully stack compliant #4

Merged: 1 commit merged on Oct 11, 2018

Conversation

mfisherlevine (Collaborator, Author)

No description provided.

@mfisherlevine mfisherlevine force-pushed the tickets/DM-13293 branch 4 times, most recently from a4cdb9f to 6fa42ab on July 25, 2018 17:09
@r-owen r-owen left a comment

This is a start on my review. I made it about halfway through the code.


See http://ls.st/ldm-151 Chapter 4, Calibration Products Production for further details
regarding the inputs and outputs.
"""

Please add proper task documentation.

I also suggest removing this reference as it does not appear to contain any useful information (unless I missed something).

mfisherlevine (Collaborator, Author)

Rewritten. But given that it's quite long and from scratch I imagine there will be comments on it.

self.refList += refList


class BfTask(pipeBase.CmdLineTask):

I feel this name is too vague in that it could easily be a task that applies b-f correction (which it is not). Consider something along the lines of {{MeasureBrighterFatterTask}} or {{MakeBrighterFatterKernelTask}}.


RunnerClass = BfTaskRunner
ConfigClass = BfTaskConfig
_DefaultName = "bf"

Please rename all of these as per my suggestions for the task name above.

parser = pipeBase.ArgumentParser(name=cls._DefaultName)
parser.add_argument("--visitPairs", help="The list of visit pairs to use, as a list of tuples "
"enclosed in quotes e.g. \"(123,456),(789,987),(654,321)\""
"NB: must be comma-separated-tuples with no spaces, enclosed in quotes!")
@r-owen r-owen Jul 26, 2018

I know you spent some time on this, so I hate to mention it, but please consider changing this to the usual format for multiple arguments, e.g.:

add_argument("--visitPairs", nargs="*",
    help="Visit pairs to use. Each pair must be of the form INT,INT e.g. 123,456")

This has several advantages:

  • It is the standard convention for arguments that take multiple values.
  • There is no need for the user to quote the data; just leave one or more spaces between each pair, e.g.
    --visitPairs 123,456 789,987 654,321
  • It is easier to provide many pairs, as one can break the list between any pair using \
  • It is easier to parse the result from argparse, which will be a list of strings, e.g. ["123,456", "789,987", "654,321"]

Here is the resulting help, which shows why I suggested spelling INT in all caps:

  -h, --help            show this help message and exit
  --visitPairs [VISITPAIRS [VISITPAIRS ...]]
                        Visit pairs to use. Each pair must be of the form
                        INT,INT, e.g. 123,456
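
For reference, here is a minimal stand-alone illustration (plain argparse rather than pipeBase.ArgumentParser, purely to keep it self-contained) showing that the parsed value is simply a list of strings:

    import argparse

    # Stand-alone sketch of the nargs="*" suggestion above.
    parser = argparse.ArgumentParser()
    parser.add_argument("--visitPairs", nargs="*",
                        help="Visit pairs to use. Each pair must be of the form INT,INT e.g. 123,456")

    args = parser.parse_args(["--visitPairs", "123,456", "789,987", "654,321"])
    print(args.visitPairs)  # ['123,456', '789,987', '654,321']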

mfisherlevine (Collaborator, Author)

Oh. My. God. THIS! 🤯 This is the answer I was looking for all along! How did nobody else mention that I was reinventing the wheel rather than shouting at me about my lack of regex knowledge! 😄

Thank you, this is indeed the elegant (and pre-existing solution) I needed. nargs="*" for the win! 🎉

It looks to me like I still need a (now very slimmed down) taskRunner to pass through the argument though, is that right? It now just looks like

    @staticmethod
    def getTargetList(parsedCmd, **kwargs):
        """Parse the visit list and pass through explicitly."""
        visitPairs = [(int(visit.split(",")[0]), int(visit.split(",")[1])) for visit in parsedCmd.visitPairs]
        return pipeBase.TaskRunner.getTargetList(parsedCmd, visitPairs=visitPairs, **kwargs)

mfisherlevine (Collaborator, Author)

I tried passing them through un-parsed with **kwargs and doing the parsing later so that the whole TaskRunner could be done away with but couldn't get it working. Could you confirm that you think this is the best that can be done please?

@r-owen r-owen Aug 2, 2018

I think having a custom task runner is very reasonable because of the need to pass extra arguments to runDataRef; that is a classic use case for a custom task runner. Fortunately it can be simple, as you say.

As to where to parse the visit pairs (converting them from a string "123,456" to a pair of integers (123, 456)): clearly it should be done before calling the task, which I think means it has to be done as part of parsing the arguments or in the task runner. Doing it while parsing the command is best, as that catches any problems as soon as possible and allows you to print the usual "I could not parse this command, here is the command help" message. The argument parser does this for other arguments using a callback specified by the action argument to add_argument. See class IdValueAction in argumentParser.py for an example of a callback, and how it is used. There are a number of similar classes defined nearby.
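
For concreteness, a hedged sketch of what such a callback might look like; the class name and error wording below are purely illustrative, and the real IdValueAction in argumentParser.py does rather more:

    import argparse

    class VisitPairAction(argparse.Action):  # hypothetical name, for illustration only
        """Convert each "INT,INT" value to an (int, int) tuple at argument-parsing time."""

        def __call__(self, parser, namespace, values, option_string=None):
            pairs = []
            for value in values:
                try:
                    v1, v2 = (int(v) for v in value.split(","))
                except ValueError:
                    parser.error("Cannot parse visit pair %r; expected INT,INT" % (value,))
                pairs.append((v1, v2))
            setattr(namespace, self.dest, pairs)

    # usage: parser.add_argument("--visitPairs", nargs="*", action=VisitPairAction, ...)

A bad value then produces the standard "usage: ... error: ..." output from argparse rather than a traceback from the task runner.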

mfisherlevine (Collaborator, Author)

OK, glad I'm not doing it too wrong then. It seems like this one liner is very simple, and much more understandable (to me at least) than using callbacks etc, so as long as you don't think this needs moving elsewhere I'll just leave it as-is.

return parser

@pipeBase.timeMethod
def run(self, dataRef, visitPairs):

I suggest refactoring run as per RFC-352: split run method into two pieces: runDataRef which does all I/O including unpacking the data and saving the result, and another method which does most of the work but does not do any I/O.

Typically that other method would be named run but in this case I suspect that's not reasonable, since you surely don't want to load all the visits into memory at once! I suggest you should have runDataRef call a method that processes one pair of visits. That method should be called something more descriptive and less misleading than run because it is not a method that "does all the work except I/O".

This will require one trivial tweak to the task runner to call runDataRef instead of run, but since this task has its own task runner that should be no problem.

Note that @czwa is doing this on DM-2639 for most of our existing tasks, in preparation for the move to SuperTask. So now is a good time to do it.
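
For illustration, a rough skeleton of the split being described; every helper name and the dataset type below are hypothetical, not the actual code on the ticket:

    class MakeBrighterFatterKernelTask(pipeBase.CmdLineTask):  # name as suggested earlier

        def runDataRef(self, dataRef, visitPairs):
            """Do all the I/O: fetch each pair of exposures, delegate the work, persist the result."""
            results = []
            for visit1, visit2 in visitPairs:
                exp1 = self.loadExposure(dataRef, visit1)  # hypothetical I/O helper
                exp2 = self.loadExposure(dataRef, visit2)  # hypothetical I/O helper
                results.append(self.measurePairCrossCorrelation(exp1, exp2))  # no I/O in here
            dataRef.put(self.makeKernelFromResults(results), datasetType='brighterFatterKernel')
            return pipeBase.Struct(exitStatus=0)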


Can confirm. This method should be renamed to runDataRef if you want to this to work as a CmdlineTask after DM-2639 gets merged. (Remember that all "dataRef" means in runDataRef is Gen2 butler objects vs Gen3 butler objects.)

mfisherlevine (Collaborator, Author)

For now, as discussed OOB, the method has just been renamed runDataRef() and all is well.

gains = dataRef.get('bfGain')
if not gains:
self.log.fatal('Failed to retrieved gains for CCD %s'%ccdNum)
raise RuntimeError("Must either calculate or supply gains for %s"%ccdNum)

The runtime error message is misleading because in this case the gains were not calculated. Also I think this log message and the exception message should match. I suggest you set a variable to the message and then use it for both the log message and the exception. That violates my suggestion above to defer formatting of log messages, but in this case we have to format this message anyway and fatal log messages are always shown.
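
For concreteness, a sketch of the shared-message pattern (the wording here is illustrative only):

    if not gains:
        # One message, used for both the log and the exception (wording illustrative).
        msg = "Failed to retrieve gains for CCD %s and gains were not calculated" % ccdNum
        self.log.fatal(msg)
        raise RuntimeError(msg)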

mfisherlevine (Collaborator, Author)

I actually don't know why I was logging a FATAL as well as raising the RuntimeError tbh. I moved the message from the log into the RuntimeError so that it's accurate, and removed the logger message. I think that should cover this.


I agree. No point logging and then raising.

self.log.fatal('Failed to retrieved gains for CCD %s'%ccdNum)
raise RuntimeError("Must either calculate or supply gains for %s"%ccdNum)
self.log.info('Retrieved stored gain for CCD %s'%ccdNum)
self.log.debug('Gains for CCD %s: %s'%(ccdNum, gains))

Please delete this debug message. It is a duplicate of the message on the next line and the one below is better because it provides the information regardless of whether the gains were computed or retrieved.

exp2 = self.isr.runDataRef(dataRef).exposure
del dataRef.dataId['visit']

self.log.info('Preparring images for cross correlation calculation for CCD %s'%ccdNum)

"Preparring" -> "Preparing"


I suggest hyphenating "cross-correlation" here and everywhere in the code.

dtype=str, default="CCD",
allowed={
"AMP": "Every amplifier treated separately",
"CCD": "One kernel per CCD",
@r-owen r-owen Jul 26, 2018

I suggest "DETECTOR" unless this code is specific to CCDs. Detector is the standard LSST term. At least I hope it is. @TallJimbo may have other opinions since he (or somebody) is using "sensor" in the butler gen3 schema. But we have lsst.afw.cameraGeom.Detector so I'm crossing my fingers.

mfisherlevine (Collaborator, Author)

Haha, I hope this too! It was me who commented on that in RFC-484, and who made those changes to obs_lsstCam, so it's quite amusing that I went wrong here 😛

So yes, totally agree, and only 68 instances to change in this file 🤪

mfisherlevine (Collaborator, Author)

This might, however, lead to some weirdness and slight inconsistency, as the dataIds will still contain CCD because most obs_packages haven't been changed yet. I'll change it where I can, and seeing where we end up is probably the best approach. I'll let you know when it's time to re-review.

mfisherlevine (Collaborator, Author)

OK, this has now been done. I think all the places where ccd remains are more or less appropriate. It's slightly ironic that the convention we've chosen to allow for dynamic keying of this is ccdKey rather than detectorKey, but changing that here and not elsewhere is both asking for trouble and well beyond the scope, I think. If there are any instances you still object to, do shout though.

except:
frameId = 'Im1 diff Im2'

# depending on level this is either one pass or n_amps

This is cryptic and duplicates the comment on the next line and, to some extent, the comment at the top of the loop.
I suggest just one comment that is clearer, or one here and one at the top of the loop. I think you are trying to say something like this (though I may have it wrong and you certainly should reword it as you see fit):

# Compute cross-correlation and means at the appropriate config.level:
# - "DETECTOR": compare the two visits
# - "AMP": compare each amplifier of one visit to ... (what? each amplifier of the other visit? the same amplifier of the other visit?)

mfisherlevine (Collaborator, Author)

Yes, this was just a garbage comment, thanks. It now reads:

            # Compute the cross-correlation and means at the appropriate config.level:
            # - "DETECTOR": one key, so compare the two visits to each other
            # - "AMP": n_amp keys, comparing each amplifier of one visit to the same amplifier in the other visit

@r-owen r-owen left a comment

A few more comments. More to come.

bin/processBf.py Outdated
@@ -0,0 +1,26 @@
#!/usr/bin/env python

This task should be in bin.src not bin

Also I feel the name is too vague, as per my comments about the task name. I hope a more descriptive name for the task will lend itself to a similar name for the bin.src script.

@@ -0,0 +1,1272 @@
#

Please rename this file to (at least approximately) match the task name.

from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
# following line is not actually unused, it is required for 3d projection

Consider "the following import is required for 3d projection"

- assemble raw amplifier images into an exposure with image, variance and mask planes
- perform bias subtraction, flat fielding, etc.
- mask known bad pixels
- provide a preliminary WCS

The "or load a post-ISR image" is a bit confusing. Consider expanding this to "or load a post-ISR image (depending on the camera and configuration)".

I'm worried that the rest of this string duplicates too much information available elsewhere, and thus will be difficult to keep correct over time. Also the level of detail may fool readers into thinking that is all the ISR task does. Consider dropping it or dramatically shortening it (perhaps with a reference elsewhere).

mfisherlevine (Collaborator, Author)

Oh, you're quite right, I think this was ~copied from the main task and I forgot to change it, and indeed the bit about "loading an image" is just a lie. Totally rewritten.

)
maxIterSOR = pexConfig.Field(
dtype=int,
doc="The maximum number of iterations allowed for the successive over-relaxation method",

Please say more about the "successive over-relaxation method" and what it is used for in this code somewhere. Probably not here, but how about in the doc string for the task? I used a global search to find the _SOR method but it does not provide this information, and since it's a private method, its doc string is probably not the best place for it.


Please expand "SOR" here and in other config field names and the method name _SOR. Our naming conventions require spelling things out unless the acronym is really common. I don't think SOR qualifies.

Failing that, our naming conventions require this to be renamed maxIterSor, which I hope is unpleasant enough to encourage a longer name.

mfisherlevine (Collaborator, Author)

Yes, agreed on all fronts. Have changed all instances of SOR to SuccessiveOverRelaxation, changed the private method from _SOR() to a public successiveOverRelax(), and put the description in its docstring. If you feel this should be moved elsewhere in the codebase let me know where and I can put it there, though I suspect there would be pushback if it went in one of the more appropriate places, as it's a numerical method with potentially quite a few iterations, implemented in Python (and probably not even very efficiently at that), so I would prefer to keep it hidden away in here for now if possible.

)
nSigmaClipKernelGen = pexConfig.Field(
dtype=float,
doc="Number of sigma to clip to during pixel-wise clipping when generating the kernel",

Please say more about what measurement this is the sigma of.

mfisherlevine (Collaborator, Author)

I've changed the _generateKernel() to be public (for a few reasons) and have said to see its docstring for more info. I hope that's sufficient.

)
maxLag = pexConfig.Field(
dtype=int,
doc="The maximum lag to use when calculating the cross correlation/kernel",

Please document the units.

My apologies if this is obvious. I don't know what is being cross-correlated to what (other than presumably it's some kind of correlation between the two visits in each pair). That's the sort of thing I'd like to see documented in the task doc string.

mfisherlevine (Collaborator, Author)

Yes, sorry, it was obvious to me, but perhaps not to those unfamiliar with the method. It's correlating two images in detector-space, so the answer is just pixels. Have added, and will tweak the docstring too.

)
nPixBorderXCorr = pexConfig.Field(
dtype=int,
doc="The number of border pixels to exclude when calculating the cross correlation/kernel",

Could "/" sensibly be replaced with "and" or a space? Again, my ignorance showing.

mfisherlevine (Collaborator, Author)

No no, quite right, "and" makes more sense.

biasCorr = pexConfig.Field(
dtype=float,
doc="A scary, empirically determined correction-factor correcting for sigma-clipping" +
" a non-Gaussian distribution",

Please remove the hyphen between "correction" and "factor". Also I suggest you remove "scary" (sorry to be a spoilsport).

If possible please explain more about this parameter. For instance:

  • How is it used?
  • How would a user determine if the value is too large or too small?
  • What sort of values are "reasonable". For example if I want to make it a bit bigger or a bit smaller, I'd like some idea of what "a bit larger" or "a bit smaller" would look like. Also if there is an intrinsic limit on sane values, that could be helpful.

If any of this is too complicated then it should, perhaps, be put elsewhere, e.g. in the main task documentation, with a pointer here.

mfisherlevine (Collaborator, Author)

I've done the first bits (the phrasing etc), but the explanation is a bit tricky. Essentially this is something that probably shouldn't be touched except by experts, but I made it a config option because having a hard-coded number in the code filled me with horror, especially as I can imagine this being different in some cases. How it's calculated should indeed be added, and probably methods for doing so added to this module, and therefore those referenced here, but given that hasn't been done yet, I'm going to have to punt on this for now.

mfisherlevine (Collaborator, Author)

I think what I'll do is ticket adding the code to do so to this module, and add the ticket number to this docstring, saying that it should be updated once DM-XXXXX is done. Does that sound like a reasonable solution?


Yes

}
)

def validateIsrConfig(self, logger):
@r-owen r-owen Jul 27, 2018

It worries me that a method whose name starts with validate is not run by the validate method. I can see two solutions. The simplest is to rename this method and/or move it to the task (and, presumably, call it from the task constructor, which is where I assume it is being called now). Another solution is to split it into two parts: a non-logging part and a logging part:

  • The non-logging part (which acts like validate) should either keep the current name and be called from validate or else be renamed to validate, as you see fit.
  • The logging part should get a name that does not start with validate (since it logs but does not validate). I assume the task constructor will call that. I have no strong opinion about whether the non-logging part lives in the config or the task; I can see arguments both ways.

I wish there was a better way to figure out if the post-ISR data has been appropriately processed -- one that did not rely on internal knowledge of the possible ISR tasks. But I don't know of anything yet. Gen3 butler should support this better and meanwhile this seems like a sensible and pragmatic approach.

mfisherlevine (Collaborator, Author)

OK, glad this wasn't too offensive. I think when I discussed this with someone previously they thought it belonged in the config class rather than the task, but I can see it either way. I've renamed the method to checkIsrConfig() to avoid the v-word, and moved it to the main task. My reading of what you wrote is that this is sufficient, but let me know if you want any further changes there.

@r-owen r-owen Aug 2, 2018

If there are parts that can usefully validate the config without logging then they should probably be moved to validate (or a method called by validate) in the config. But otherwise your solution seems best.

mfisherlevine (Collaborator, Author)

Yeah, I'm torn. The thing is, it's not really validation; the settings aren't intrinsically invalid, they're just ensuring that people aren't Doing It Wrong™️, and they're not testing for a non-compliant pexConfig, so I think it should probably stay as-is really.


I'd say if it raises an exception then it's validation. We do this sort of thing in other places -- check that something was processed with required config settings before being allowed to proceed -- and we call it validation there.

However, I am not saying you have to split this into xConfig.validate (with no logging) and the rest of it. If it feels like a coherent bit of code to you then it's probably best to leave as a single method.

mfisherlevine (Collaborator, Author)

OK, I've just renamed the method back to validateIsrConfig() then, and left it as it was (after the changes I made before, of course, so it still lives in the main class).

@mfisherlevine mfisherlevine force-pushed the tickets/DM-13293 branch 4 times, most recently from 9c03ef8 to 757ac41 on August 2, 2018 19:13
@r-owen r-owen left a comment

Partial re-review. The new documentation is a big improvement. I'm not sure what the issue is with the conditional debug code.


- The configurable isr task is called, which unpersists and assembles the
raw images, and performs the selected instrument signature removal tasks.
The appropriacy of the selected isr components is checked.
"appropriacy" is a great word (one new to me), but the definitions I found don't work here. Perhaps it would be more helpful to say that certain ISR steps must be performed. If they can be briefly explained then do it here. Then point to the config fields that specify the checks.

mfisherlevine (Collaborator, Author)

OK, will fix. But for reference, the Google definition I get is:

appropriacy
əˈprəʊprɪəsi/
noun
noun: appropriacy; plural noun: appropriacies

    the extent to which something is suitable or proper in the circumstances.
    "the appropriacy of the methodology employed"

which is the way in which I intended it, and seems to fit well, to my ears at least 🙂

mfisherlevine (Collaborator, Author)

I'm not sure what you mean by "the config fields that specify the checks" though, as there are none - the isr config is always checked. I guess you mean "the config fields that are specified by the checks", i.e. which fields are checked, right? If so, I'll mention the most important and most-likely-misconfigured ones, as the list is quite long, and the code can define the rest (they live in three very self-explanatory lists after all).


By "the config fields that specify the checks" I meant the three task config fields that specify which ISR config fields are to be checked. I would avoid listing any specific ISR config fields here as that information could easily go out of date. (That said, one or two really obvious ones are fine if you think an example will help -- but even then I'd be more inclined to provide that information in the documentation of the 3 config fields that actually specify which fields to check).

The first definition I found of appropriacy was specific to linguistics, but I looked again and agree it can also be used as a synonym for "appropriateness". Nonetheless, I think a better word here would be "compatibility", "validity", "correctness" or something like that. To me "appropriateness" implies style checking, not validity checking.

mfisherlevine (Collaborator, Author)

Ah, I see, the confusion (as you point out below) was you thought that these hard-coded lists were actually config options, as they should be. I've changed the phrasing already, and will make these config options now.

I'm a little confused about what you mean by "I would avoid listing any specific ISR config fields here" though - don't I have to list them in order to use them? Are you suggesting that the defaults should be empty, bar the really obvious and essential ones, and then allow users to configure in each obs_package? This is somewhat against what I was trying to do, in protecting people from accidentally applying "corrections" which are on by default unless overridden in the obs_package, which would invalidate the results. Maybe we can have a quick chat on Slack about what you'd like to see here. If so, let me know when's good (typing is fine).


All I meant was I would avoid listing them in documentation. Being a bit vague in the documentation reduces the danger that the docs will get out of synch with the code. That said, some examples (preferably in the main task's doc string and/or the help for the fields in question) is helpful. For instance if you know that the ISR task must assemble images you can say that, rather than list a specific flag that makes it happen.

The appropriacy of the selected isr components is checked.

- The gain of the each amplifier in the detector is calculated using
the PTC method and used to correct the images so that all calculations

Please spell out "PTC"

- Each image is then cross-correlated with the one it's paired with
(with the pairing defined by the --visitPairs command line argument),
which is done either the whole-image to whole-image,
or amplifier-by-amplifier, depending on the config.level.

I suggest removing "the" before "config.level", but overall this is excellent -- very clear and helpful

the visitPair-length stack or cross-correlations.
The number of sigma clipped to is set by config.nSigmaClipKernelGen.
The maximum lag used, in pixels, and hence the size of the half-size
of the kernel generated, is given by config.maxLag,

I suggest removing "in pixels" here, since it's documented in the config

mfisherlevine (Collaborator, Author)

I dunno... I think that it reads as it should, and might save people searching around if they're confused. I think it adds clarity, sounds right, and as it will never change, having it defined in multiple places is no problem. Unless you feel strongly that it should go I'd rather leave that there. However, as this is defining the shape of the output of the whole task I've moved it up in this section, and I think the whole thing reads a lot better now. For convenience, this section now reads:

    - Once the cross-correlations have been calculated for each visit pair,
      these are used to generate the correction kernel.
      The maximum lag used, in pixels, and hence the size of the half-size
      of the kernel generated, is given by config.maxLag,
      i.e. a value of 10 will result in a kernel of size 2n-1 = 19x19 pixels.
      Outlier values in these cross-correlations are rejected by using a
      pixel-wise sigma-clipped thresholding to each cross-correlation in
      the visitPair-length stack of cross-correlations.
      The number of sigma clipped to is set by config.nSigmaClipKernelGen.

Let me know if that's OK, or if you want changes (or the mention of "in pixels" removed).

@r-owen r-owen Aug 7, 2018

I think it's best to use your judgement here.

- Once DM-15277 has been completed, a method will exist to calculate the
empirical correction factor, config.biasCorr.
TODO: DM-15277 update this part of the docstring once the ticket is done.
"""

This new task documentation is great!

mfisherlevine (Collaborator, Author)

😊

means[det_object].append([_means1[det_object], _means2[det_object]])

# this is awkward now that code is refactored
# not sure how best to make this work without passing everything around

I'm not seeing the issue; can you please explain more (here in github is fine, no need to modify the code)

mfisherlevine (Collaborator, Author)

The solution here is going to be to pull out all the commented code and ticket enhancing the debug plot functionality, I think. But to explain:

the issue is that to generate the correct titles for the debug plots you need the objects, but you don't want to be passing everything around just to be able to query things that aren't really used. However, a massive block of code to generate the titles doesn't really belong in the main routine, and even less so as it mostly won't even fire.

I don't think there's a nice solution to this, and we'll probably see this code reappear in this place, though hopefully a little more cleanly/concisely, if/when that ticket gets done, but for now I think just removing it is fine.

Each amp has borders applied, is rescaled by the gain, and has the sigma-clipped mean subtracted.
If the level is 'DETECTOR' then this is done for the whole image in-place so that it can be
cross-correlated.
If the level is 'AMP' then this is done per-amplifier, and each amp-image returned.

Please consider rewriting in active voice (the usual Python standard for doc strings) e.g. "For each amp apply borders (what does that mean?), rescale by gain, and subtract the sigma-clipped mean"...

forced to go through the origin or not. This defaults to True, fitting var=1/gain * mean,
if set to False then var=1/g * mean + const is fitted.

This is really a PTC gain measurement task. See DM-14063 for results from of a comparison between

Please spell out PTC

diff.writeFits(os.path.join(self.debug.debugDataPath, filename))

# Subtract background. It should be a constant, but it isn't always
# xxx do we want some logic here for whether to subtract or not? Or a config option?

Any chance of figuring this out before merging? If not, please file a ticket and reference it here.

mfisherlevine (Collaborator, Author)

I have made it log a warning if the background is found to be above a configurable level, so people in the future will know if it's subtracting something large. I think this should do it.

@mfisherlevine mfisherlevine force-pushed the tickets/DM-13293 branch 2 times, most recently from d95079f to abee516 on August 7, 2018 15:05
@r-owen r-owen changed the title from "First cut at the task" to "DM-13293: Make BF kernel measurement code fully stack compliant" on Aug 7, 2018
@r-owen r-owen left a comment

I have to run, but here are some comments. I found a few new things, but nothing big. It's looking very nice.

#

"""Calculation of brighter-fatter effect correlations and kernels."""
from __future__ import print_function

Please remove this and all other from __future__ import... and from builtins import... statements from all python files (unless any are needed by Python 3, which seems unlikely).

# You should have received a copy of the LSST License Statement and
# the GNU General Public License along with this program. If not,
# see <https://www.lsstcorp.org/LegalNotices/>.
#

Please add __all__ = [...] to this and all other python library files (before the first import statement)

mfisherlevine (Collaborator, Author)

👍 to the __all__, but other modules put this after the imports, and indeed I get an E402 error for every module level import that follows this, as they have to be first according to that rule. Is there a reason you ask for this to be first?

@r-owen r-owen Aug 10, 2018

PEP8 changed at some point. It used to require that some imports went before __all__ and then changed (I think a few years ago). Now __all__ should go before imports but after the module level doc string (if there is one).

Here is the current text:
https://www.python.org/dev/peps/pep-0008/#id24 or look up PEP8 "Module Level Dunder Names". I am shocked that your linter is complaining. Mine accepts __all__ in the right location and has for years. Perhaps you put it above the module level doc string?

)
doCalcGains = pexConfig.Field(
dtype=bool,
doc="Measure the per-amplifier gains using the PTC method",

Please add a question mark.

Also please consider removing "using the PTC method" or putting it in parentheses. The important point is "should this code measure amplifier gains?" How they are measured is arguably an internal detail (though one that may be important enough to document).

Please spell out PTC if you retain it in the help string.

)
ccdKey = pexConfig.Field(
dtype=str,
doc="The key by which to pull a detector from a dataId, e.g. \'ccd\'' or \'detector\'",

Nice! However, the backslashes are unnecessary, since you are using single quotes inside a double-quoted string. Also there is an extra single quote after ccd.

)
nSigmaClipGainCalc = pexConfig.Field(
dtype=int,
doc="Number of sigma to clip to during gain calculation",

Sigma of what? Please say.

afwDisp.setDefaultBackend(self.debug.displayBackend)
afwDisp.Display.delAllDisplays()
self.disp1 = afwDisp.Display(0, open=True)
self.disp2 = afwDisp.Display(1, open=True)

Thanks for including the keyword

"""Augment argument parser for the MakeBrighterFatterKernelTask."""
parser = pipeBase.ArgumentParser(name=cls._DefaultName)
parser.add_argument("--visitPairs", nargs="*",
help="Visit pairs to use. Each pair must be of the form INT,INT e.g. 123,456")
@r-owen r-owen Aug 8, 2018

Sorry I didn't catch this earlier, but all our existing command line options are lowercase with hyphens between words so please do that for the visit pairs argument. Use the dest keyword to specify the name for the parsed data, e.g.: add_argument("--visit-pairs", dest="visitPairs", nargs="*", ...). This change will require one or two tweaks to existing task documentation.

'doBrighterFatter', 'doUseOpticsTransmission', 'doUseFilterTransmission',
'doUseSensorTransmission', 'doUseAtmosphereTransmission', 'doGuider', 'doStrayLight',
'doTweakFlat'] # raise if True
desirableTrue = ['doBias', 'doDark', 'doCrosstalk', 'doDefect', 'doLinearize'] # WARN if False
@r-owen r-owen Aug 8, 2018

I suggest making these three fields config parameters, so an obs package can plug in the appropriate variant ISR task and tweak the tests accordingly. (I thought they already were config fields, which explains an earlier confusion -- I had suggested that a help string point to the config fields which specify these tests).

Unfortunately the field name should probably include "isr", e.g. "isrMandatorySteps", "isrForbiddenSteps" and "isrDesirableSteps" -- with a possible future "isrUndesirableSteps".

If you make this change then I also strongly suggest failing if a specified ISR config field doesn't exist (e.g. by testing and raising, or omitting the test if the resulting error message is clear enough). Otherwise you risk failing to test something important due to a spelling error. In fact that is another argument for making the tests more visible by making them part of the config.
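
A sketch of what those fields might look like; the class name is hypothetical and the default lists are just the values visible in this diff hunk, not a complete set:

    import lsst.pex.config as pexConfig

    class MakeBrighterFatterKernelTaskConfig(pexConfig.Config):  # illustrative name
        isrForbiddenSteps = pexConfig.ListField(
            dtype=str,
            doc="isr operations that must NOT be performed for valid results. Raises if any are True",
            default=['doBrighterFatter', 'doUseOpticsTransmission', 'doUseFilterTransmission',
                     'doUseSensorTransmission', 'doUseAtmosphereTransmission', 'doGuider',
                     'doStrayLight', 'doTweakFlat'],
        )
        isrDesirableSteps = pexConfig.ListField(
            dtype=str,
            doc="isr operations that it is advisable to perform, but are not mission-critical. "
                "WARNs if any are False",
            default=['doBias', 'doDark', 'doCrosstalk', 'doDefect', 'doLinearize'],
        )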

mfisherlevine (Collaborator, Author)

OK, I've moved all this to configs as you said. For the checking, what I've done is to remove the protecting blocks that were in there for the forbidden/mandatory categories so that you get an error if they're not found; I think that should suffice. This made sense too as they're now overridable per-obs_package, so there's no reason to be forgiving if they're not there.

If you could just take a look and check that the way I've gone about this is OK with you I think that would be good, thanks.


I just looked at the code. The new config parameters look fine (though I'm disappointed that one still needs to be hard-coded -- assembleCcd.doTrim; if you used dotted names for subtasks you could avoid that). The code I found using these is the validateIsrConfig method and it still logs a warning if a config parameter in one of these lists is missing -- rather than failing. So I'm not sure why you said "you get an error if they are not found". I'd prefer an error, but I think what you have will do.

mfisherlevine (Collaborator, Author)

I think this is the last point remaining, but I've been struggling to understand here. What do you mean by "if you used dotted names for subtasks"? I tried to work it out but failed I'm afraid. Do you think you could give me a tiny example? I think regardless of whether it really needs changing I would learn something here.

Regarding the behaviour: mandatory/forbidden now raise, but desirable WARNs if the polarity is wrong, and logs an INFO if the key is not found. I like this behaviour because it suggests to the user that they should configure the equivalent keys in the obs_package config, but this could easily be changed to a raise if not found if you think that's better (though I think it has to WARN if wrong, otherwise it's not "desirable", it's mandatory.) Does that sound good?

I'm totally happy to change this until you're happy though, so if you can confirm what you'd like to see instead I'll go ahead and make the changes (especially if I can move the hard-coded isr.assembleCcd.doTrim into the mandatory config).


I just mean that the name in the config could be "assembleCcd.doTrim". Parsing that is a bit more work, as getattr and setattr cannot handle multiple levels in one call. However, a simple loop like the following would suffice (this is only lightly tested):

def getDottedAttr(obj, name):
    """Get an attribute that may be several levels deep, e.g. foo.bar.baz
    """
    if not name:
        raise ValueError("Name is blank")
    try:
        nameList = name.split(".")
        value = getattr(obj, nameList[0])
        for subName in nameList[1:]:
            value = getattr(value, subName)
        return value
    except Exception as e:
        raise AttributeError("Could not find {} in {}: {}".format(name, obj, e))
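
For example, a validation loop could then treat every configured name uniformly (sketch only; isrForbiddenSteps is one of the hypothetical config fields discussed above):

    # Possible use inside validateIsrConfig; handles nested names like "assembleCcd.doTrim" too.
    for configName in self.config.isrForbiddenSteps:
        if getDottedAttr(self.config.isr, configName):
            raise RuntimeError("Forbidden isr option %s is set to True" % configName)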


And just to make it clear: I suggest using this function for all values you are trying to get, not just ones you know are dotted. The loop is not run at all if nameList has only one element, so it'll be quite efficient.

self.log.info('Beginning gain estimation for detector %s' % detNum)
gains, nomGains = self.estimateGains(dataRef, visitPairs)
dataRef.put(gains, datasetType='brighterFatterGain')
self.log.info('Finished gain estimation for detector %s' % detNum)

Consider making this debug level. I worry about users complaining about excessive verbosity. In which case the "beginning" log message may want to be "Compute gain estimation for detector %s"

objId = 'detector %s AMP %s' % (detNum, det_object)
kernels[det_object] = self.generateKernel(xcorrs[det_object], means[det_object], objId)
# writing the level might be useful, but might also mess things up xxx
# kernels['level'] = self.config.level
@r-owen r-owen Aug 8, 2018

Please don't leave in commented-out code (at least not without a ticket #). It is an interesting issue. It seems like a great idea to include the info, but it's weird to mix data with metadata.

I think either of these would be a cleaner solution but I don't know if it's worth the work:

  • Make kernels an object that has an explicit field for level.
  • If you might want to generate kernels at amp level and also kernels at detector level for a given camera, then make a separate dataset type for each.

mfisherlevine (Collaborator, Author)

I think both of those are good suggestions, and I especially like the latter. However, I will just remove for now, and if/when we decide we want that I can make a ticket and new datasets etc, as we're not really losing any code here.

@r-owen r-owen left a comment

A few minor corrections. The main ones I'm concerned about:

  • Please delete all commented-out and otherwise unused code or write tickets for it and reference them in the comments.
  • I worry that there are too many "info" level log messages. I encourage you to take another look to see if any should be lowered to "debug" level.
  • There are pep8 violations (including at least one bare "except", one line that is too long, and lots of missing spaces around "+" and "-").
  • When you are ready to merge this I suggest you squash all the commits together.

If the level is 'DETECTOR' then this is done
for the whole image in-place so that it can be cross-correlated.
If the level is 'AMP' then this is done per-amplifier,
and each amp-image returned.
@r-owen r-owen Aug 8, 2018

I think the description of the DETECTOR case is wrong: the code never modifies exp and always returns either an exposure or a collection of amplifier exposures.

I am relieved, as I think this code should either always operate "in place" or treat the input exposure as read only, rather than do one for DETECTOR and the other for AMP.

mfisherlevine (Collaborator, Author)

100% agree. I don't know why it said this. I think it's because these docs were adapted from the original code a little too blindly; this was never the case, and certainly code should never work like that!


Notes:
------
This function is controlled by the following pexConfig parameters:

A helpful note, but I suggest pexConfig -> config (and look for this elsewhere)

nSigmaClipXCorr : `float`
The number of sigma to be clipped to
"""
# TODO: see if you can do away with temp and rescaleTemp
@r-owen r-owen Aug 8, 2018

I suggest you delete this comment. An alternative is to file a ticket, include the ticket number in the comment and consider moving the comment closer to where one of these variables shows up. Still...making a copy does seem reasonable. I'll be impressed if you find you can do without it.

return pipeBase.Struct(exitStatus=0)

def _prepareImage(self, exp, gains, level):
"""Prepare images for cross-correlation calculation.
@r-owen r-owen Aug 8, 2018

I suggest you use the term "exposure", "image" or "maskedImage" as appropriate -- both in method names and in documentation. In this case it would be "exposure". Thus _prepareExposure. Also, the method name sounds like it operates in place, but I'm not sure what else to suggest. Perhaps _makeCroppedExposures (even though it should be singular if level is DETECTOR)?

mfisherlevine (Collaborator, Author)

Yeah, I agree, and I'm not too worried about the singular return length with that method name. I've gone for _makeCroppedExposures().


Notes:
------
This function is controlled by the following pexConfig parameters:

Another pexConfig -> config


@staticmethod
def _getNameOfSet(vals):
"""Convert a list of numbers to a string, merging consecutive values."""

Please provide more documentation so the user has a clear idea what output to expect for a given input.

mfisherlevine (Collaborator, Author)

Have removed all this code as it was actually unused (it only features in making titles for debug plots), and have just dumped it in DM-15305 for now.

This is essentially is a sanity check parameter.
If this condition is violated there is something unexpected
going on in the image, and it is discarded from the stack before
the clipped-mean is calculated.

Please document the default value (which is config.xcorrCheckRejectLevel)


Still not documented?

mfisherlevine (Collaborator, Author)

Not sure, but I think you just need to scroll down. I see:
    rejectLevel : `float`, optional
        This is essentially is a sanity check parameter. If this condition is violated
        there is something unexpected going on in the image, and it is discarded from the
        stack before the clipped-mean is calculated. If not provided, a default value of
        0.2 is picked up from the task config.


I suggest: If not provided then config.xcorrCheckRejectLevel is used. It is risky to document the default value of a config parameter anywhere (except in the config itself, where it is set) because it may change, and of course obs packages may override it.
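
i.e. something along these lines (the method name and other arguments here are illustrative):

    def _crossCorrelate(self, maskedIm0, maskedIm1, rejectLevel=None):  # sketch only
        if rejectLevel is None:
            # Fall back to the task config rather than documenting a literal default value.
            rejectLevel = self.config.xcorrCheckRejectLevel
        ...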

if corr[0, 0] > 0:
self.log.warn('Skipped item %s due to unexpected value of (variance-mean)' % corrNum)
continue
corr /= -float(1.0*(mean1**2+mean2**2))

You don't need the float in Python 3. Also please add spaces around the +

What should happen if mean1 == mean2 == 0?

mfisherlevine (Collaborator, Author)

I think that if the means are identically zero then things are sufficiently bad that a division by zero is OK actually.

resid = np.zeros([source.shape[0]+2, source.shape[1]+2])
rhoSpe = np.cos(np.pi/source.shape[0]) # Here a square grid is assummed

inError = 0

I suggest deleting this assignment as it is not being used

return xcorr, np.sum(means1)


def simulateXcorr(fluxLevels, imageShape, addCorrelations=False, correlationStrength=0.1, nSigma=5, border=3,

I don't see any use of this in the package. Can you get rid of it, along with _crossCorrelateSimulate above? If you have unresolved issues to work out then at least file a ticket and add a TODO message to the doc string for both of these functions and consider moving it to a separate module.

mfisherlevine (Collaborator, Author)

OK, this has been tidied up significantly. The code must live on, and I think this is the right place for it, but it certainly needs a little more work, as there are some known problems. I've written a new function that unifies the generating images part and allows switching which code is called to do the calculations, and filed DM-15756 to finish up that bit of work, which will then remove the vestigial function here I think. If you have any comments on how you think that should go (including if you disagree that it should be in this module) do please leave them on the ticket, but I hope this is OK for now.


That sounds fine to me.

except:
pass

xcorr /= float(mean)

You don't need the float in Python 3

@r-owen r-owen left a comment

I found a few pep8 violations; please run flake8 before merging and also look for badly formatted lines.

frameId1 = exp1.getMetadata().get("FRAMEID")
frameId2 = exp2.getMetadata().get("FRAMEID")
frameId = '_diff_'.join(frameId1, frameId2)
except:

Please don't use bare "except" and please run flake8

# calculate the cross-correlation
for xlag in range(maxLag + 1):
for ylag in range(maxLag + 1):
dim_xy = diffAmpIm[border+xlag:border+xlag + w, border+ylag: border+ylag + h, afwImage.LOCAL].clone()

This line is too long and the formatting is confusing and not compliant with our standards (e.g. spaces around "+"). I suggest:

                    dim_xy = diffAmpIm[border + xlag: border + xlag + w,
                                       border + ylag: border + ylag + h,
                                       afwImage.LOCAL].clone()

ampName = amp.getName()

diffAmpIm = diff[amp.getBBox()].clone()
diffAmpImCrop = diffAmpIm[border:-border-maxLag, border:-border-maxLag, afwImage.LOCAL]

Please add spaces after ":" and around "-" (or before, in the case of negating border). I suggest a global regex search for + and - without surrounding spaces

mfisherlevine (Collaborator, Author)

Done! (I think)

@mfisherlevine mfisherlevine force-pushed the tickets/DM-13293 branch 3 times, most recently from a4c4cfc to 306fc26 on August 10, 2018 16:16
@r-owen r-owen left a comment

I don't see the changes I requested. I also found one more worth considering: using scipy.signal.convolve2d instead of python loops.

@staticmethod
def getTargetList(parsedCmd, **kwargs):
"""Parse the visit list and pass through explicitly."""
visitPairs = [(int(visit.split(",")[0]), int(visit.split(",")[1])) for visit in parsedCmd.visitPairs]

Please add better error handling, as per the comment above.
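
One reading of that, keeping the parsing in the task runner but failing with a clear message (wording illustrative), might be:

    @staticmethod
    def getTargetList(parsedCmd, **kwargs):
        """Parse the visit list and pass it through explicitly, failing with a clear message."""
        visitPairs = []
        for visitPair in parsedCmd.visitPairs:
            try:
                v1, v2 = (int(v) for v in visitPair.split(","))
            except Exception:
                raise RuntimeError("Could not parse visit pair %r; expected INT,INT" % (visitPair,))
            visitPairs.append((v1, v2))
        return pipeBase.TaskRunner.getTargetList(parsedCmd, visitPairs=visitPairs, **kwargs)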

outError = 0
if nIter%2 == 0:
for i in range(1, func.shape[0] - 1, 2):
for j in range(1, func.shape[0] - 1, 2):

Is func.shape guaranteed to be square? If not, I would expect the index to be different for i and j (e.g. [0] in one case and [1] in the other (e.g. as above for "Calculate the initial error"). Even if it is guaranteed to be square now, I suggest changing this to be less surprising to users and in case rectangular support is added in the future.

mfisherlevine (Collaborator, Author)

Fair point. I think I will add an assert though as well, because a non-square kernel sounds like madness to me, in this case at the very least.
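
e.g. a one-line guard at the top of the method (wording illustrative):

    # Guard against non-square input, as discussed above.
    assert source.shape[0] == source.shape[1], "source must be square"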

mfisherlevine (Collaborator, Author)

Now done.

for i in range(2, func.shape[0] - 1, 2):
for j in range(2, func.shape[0] - 1, 2):
resid[i, j] = float(func[i, j - 1] + func[i, j + 1] + func[i - 1, j] +
func[i + 1, j] - 4.0*func[i, j] - dx*dx*source[i - 1, j - 1])

Both here and elsewhere in this method you would probably get better speed and possibly code clarity by using scipy.signal.convolve to apply one kernel to func and one to source: resid = func convolved with [[0, 1, 0], [1, 4, 1], [0, 1, 0]] - source convolved with [[0, 0, 0], [0, 0, 0], [dx**2, 0, 0]]. It's a 1-liner (modulo defining the two kernels, which are always the same and can be created once and re-used).
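
A rough sketch of that vectorisation; note the centre coefficient is written as -4 here to match the "- 4.0*func[i, j]" term in the loop above, source is assumed (hypothetically) to be padded to the same shape as func so the index offset disappears, and the in-place red-black update of the SOR iteration itself is not addressed:

    import numpy as np
    from scipy.signal import convolve2d

    # Five-point Laplacian stencil matching the explicit loop above.
    LAPLACIAN = np.array([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]])

    def poissonResidual(func, paddedSource, dx):
        # paddedSource is assumed to already have the same shape as func, so the
        # source[i - 1, j - 1] offset from the loop is absorbed into the padding.
        return convolve2d(func, LAPLACIAN, mode="same") - dx*dx*paddedSource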

mfisherlevine (Collaborator, Author)

As discussed OOB I'm going to leave this as-is for now because despite being both ugly and slow, it will only save milliseconds of compute to change it, and testing that the behaviour is identical will take a lot longer.
