New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-9147: global sky subtraction #23
Conversation
@@ -699,6 +713,43 @@ def write(self, butler, exposure, dataId): | |||
self.log.info("Writing %s on %s" % (dataId, NODE)) | |||
butler.put(exposure, self.calibName, dataId) | |||
|
|||
def makeCameraImage(self, butler, dataId, calibs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to leave butler
in the run
method? For example, take camera
as input and the calib_camera as output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... after reading a bit more, I realized bulter is being passed around in multiple places in constructCalibs
... I thought we are discouraged from doing so inside Tasks
?
|
||
|
||
def robustMean(array, rej=3.0): | ||
"""Measure a robust mean of an array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does afw
provide something like this?
|
||
# def putSkyData(self, butler, calibId, bgExp, pistons=None): | ||
# self.addPistonHeaders(bgExp, pistons) | ||
# butler.put(bgExp, "sky", calibId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented-out codes?
|
||
@staticmethod | ||
def exposureToBackground(bgExp): | ||
"""Convert an exposure to background model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conversion between an Exposure
and a background model (exposureToBackground
and backgroundToExposure
) and some background operations (averageBackgrounds
and so on) look rather generic; would they better live somewhere higher up in the package chain? For example afw
or pipe_tasks
or some meas packages?
WARNING: We clobber the calexp in the data repository! This may not | ||
be desirable, but nor do we want to introduce multiple datasets that | ||
the user has to select down the road. The user should write to a | ||
different rerun or output data repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true the user can write to a different output repo, but I would rather introducing another butler dataset type, and let the user select which one to use later. Overwriting data files make it very difficult for provenance, workflow, to name a few. Maybe butler+supertask will come up with some brilliant solutions later, but before that happens I do not like the idea of sharing dataset type names.
@@ -515,7 +552,7 @@ def process(self, cache, ccdId, outputName="postISRCCD"): | |||
else: | |||
self.log.info( | |||
"Using previously persisted processed exposure for %s" % (sensorRef.dataId,)) | |||
exposure = sensorRef.get(outputName, immediate=True) | |||
exposure = sensorRef.get(outputName, immediate=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC-286 gave me the impression that there were no real use cases of immediate=False
; seems we may have one now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, and it may not actually work. We don't have this enabled with pybind11 now. We deferred the decision on which types to proxy to DM-9563.
15d38e0
to
1a9c0ab
Compare
e035a77
to
a7da064
Compare
"MEDIAN": "median",}) | ||
clip = Field(doc="Clipping threshold for background", dtype=float, default=3.0) | ||
nIter = Field(doc="Clipping iterations for background", dtype=int, default=3) | ||
mask = ListField(doc="Mask planes to reject", dtype=str, default=["SAT", "DETECTED", "BAD", "NO_DATA",]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume "EDGE" is not relevant here because it is for the full focal plane. But what about "DETECTED_NEGATIVE"?
assert all(len(bg) == 1 for bg in bgList), "Mixed bgList: %s" % ([len(bg) for bg in bgList],) | ||
images = [bg[0][0].getStatsImage() for bg in bgList] | ||
boxes = [bg[0][0].getImageBBox() for bg in bgList] | ||
assert len(set((box.getMinX(), box.getMinY(), box.getMaxX(), box.getMaxY()) for box in boxes)) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should an overlap also work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add a message (e.g. "bounding boxes not all equal") instead of having a user parse this on failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having some overlap would work so long as it's built in consistently.
maskVal = afwImage.Mask.getPlaneBitMask("BAD") | ||
for img in images: | ||
bad = numpy.isnan(img.getImage().getArray()) | ||
img.getMask().getArray()[bad] = maskVal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add an option to skip this for efficiency? Maybe this has already been done on input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The background images are small, so it's not a big hit on efficiency, and I don't think it hurts to be thorough.
|
||
stats = afwMath.StatisticsControl() | ||
stats.setAndMask(maskVal) | ||
stats.setNanSafe(True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed if you masked? Or equivalently, do you need to mask?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not needed (maybe cargo-culted from somewhere), but doesn't hurt.
array = combined.getImage().getArray() | ||
bad = numpy.isnan(array) | ||
mean = robustMean(array[~bad], self.config.skyRej) | ||
array[bad] = mean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe in case of a gradient? Wouldn't an interpolation be better?
for x in range(width): | ||
if numpy.any(isBad[:, x]) and numpy.any(isGood[:, x]): | ||
array[:, x][isBad[:, x]] = interpolate1D(method, yIndices[isGood[:, x]], | ||
array[:, x][isGood[:, x]], yIndices[isBad[:, x]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if all were bad in one of these cases?
@@ -228,9 +231,41 @@ def getCcdIdListFromExposures(expRefList, level="sensor", ccdKeys=["ccd"]): | |||
ccdLists[name] = [] | |||
ccdLists[name].append(ccdId) | |||
|
|||
for ccd in ccdLists: | |||
ccdLists[ccd] = sorted(ccdLists[ccd], key=lambda dd: dictToTuple(dd, sorted(dd.keys()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each item is a dict and you are first sorting each item by key, then lexicographically the list by value? Comment might be useful here.
@@ -515,7 +552,7 @@ def process(self, cache, ccdId, outputName="postISRCCD"): | |||
else: | |||
self.log.info( | |||
"Using previously persisted processed exposure for %s" % (sensorRef.dataId,)) | |||
exposure = sensorRef.get(outputName, immediate=True) | |||
exposure = sensorRef.get(outputName, immediate=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, and it may not actually work. We don't have this enabled with pybind11 now. We deferred the decision on which types to proxy to DM-9563.
fullOutputId = {k: ccdName[i] for i, k in enumerate(self.config.ccdKeys)} | ||
fullOutputId.update(outputId) | ||
self.addMissingKeys(fullOutputId, butler) | ||
fullOutputId.update(outputId) # must be after the call to queryMetadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it?
# Set detected/bad pixels to background to ensure they don't corrupt the background | ||
maskVal = image.getMask().getPlaneBitMask(self.config.mask) | ||
isBad = image.getMask().getArray() & maskVal > 0 | ||
bgLevel = np.median(image.getImage().getArray()[~isBad]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming there are no gradients. And why median here and (clipped) mean in other cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a regular image (as opposed to a background model) that we know has a lot of junk (like stars and galaxies) in it, so the median is a bit more robust. It shouldn't matter, because those pixels are masked, but I'm being careful.
num = result.getValue(afwMath.NPOINT) | ||
if not numpy.isfinite(mean) or not numpy.isfinite(num): | ||
continue | ||
warped.set(xx, yy, mean*num) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to this in the previous comment. Let's say num << npixels
then you get a significantly smaller value here. I thought that instead you wanted to pretend that all pixels in the box had the mean value. But apparently that is not the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretending that num
pixels have the same value.
Pixels from a different CCD may overlap this superpixel, so those are going to get included as well, weighted by the number of pixels.
Since we are iterating over exposures and CCDs, it's important that the order be consistent. Otherwise, we might accidentally mix CCDs from different exposures.
Why, oh why would you put a line break there???
This will make it easier to subclass later. * Split process pool into two parts, so subclass can access the one they want. * Have CalibTask.run return result, so the subclass can use it. * Move the bulk of scatterProcess into a free function that can be used by anything that wants a matrix of results. * Allow passing of additional arguments to processSingle. * Allow the butler to use the read proxy when processing, since subclasses may not use the result. * Move code to get fully-qualified outputId into its own method. This makes it easier to override the 'combine' method.
We stitch the resultant calib together to form a single (binned) image with all the CCDs in it. This allows the user to quickly view the result.
There's no good data there, so only trouble can result from including them.
A sky frame is the dominant response of the camera to the sky. It is constructed by first subtracting a large-scale focal plane background model (e.g., 1024x1024) from all frames to remove large-scale gradients, binning pixels in each CCD (e.g., 64x64) and then averaging those.
We combine a large-scale background model and a scaled sky frame and write a background model consisting of these (and un-doing the original) that can be used downstream to correct the calexp. This does a decent job over most of the field of view of HSC.
a309d19
to
be221da
Compare
No description provided.