Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-9147: global sky subtraction #23

Merged
merged 7 commits into from Nov 15, 2017
Merged

DM-9147: global sky subtraction #23

merged 7 commits into from Nov 15, 2017

Conversation

PaulPrice
Copy link
Contributor

No description provided.

@@ -699,6 +713,43 @@ def write(self, butler, exposure, dataId):
self.log.info("Writing %s on %s" % (dataId, NODE))
butler.put(exposure, self.calibName, dataId)

def makeCameraImage(self, butler, dataId, calibs):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to leave butler in the run method? For example, take camera as input and the calib_camera as output?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... after reading a bit more, I realized bulter is being passed around in multiple places in constructCalibs... I thought we are discouraged from doing so inside Tasks?



def robustMean(array, rej=3.0):
"""Measure a robust mean of an array
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does afw provide something like this?


# def putSkyData(self, butler, calibId, bgExp, pistons=None):
# self.addPistonHeaders(bgExp, pistons)
# butler.put(bgExp, "sky", calibId)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented-out codes?


@staticmethod
def exposureToBackground(bgExp):
"""Convert an exposure to background model
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion between an Exposure and a background model (exposureToBackground and backgroundToExposure) and some background operations (averageBackgrounds and so on) look rather generic; would they better live somewhere higher up in the package chain? For example afw or pipe_tasks or some meas packages?

WARNING: We clobber the calexp in the data repository! This may not
be desirable, but nor do we want to introduce multiple datasets that
the user has to select down the road. The user should write to a
different rerun or output data repository.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true the user can write to a different output repo, but I would rather introducing another butler dataset type, and let the user select which one to use later. Overwriting data files make it very difficult for provenance, workflow, to name a few. Maybe butler+supertask will come up with some brilliant solutions later, but before that happens I do not like the idea of sharing dataset type names.

@@ -515,7 +552,7 @@ def process(self, cache, ccdId, outputName="postISRCCD"):
else:
self.log.info(
"Using previously persisted processed exposure for %s" % (sensorRef.dataId,))
exposure = sensorRef.get(outputName, immediate=True)
exposure = sensorRef.get(outputName, immediate=False)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC-286 gave me the impression that there were no real use cases of immediate=False; seems we may have one now?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, and it may not actually work. We don't have this enabled with pybind11 now. We deferred the decision on which types to proxy to DM-9563.

"MEDIAN": "median",})
clip = Field(doc="Clipping threshold for background", dtype=float, default=3.0)
nIter = Field(doc="Clipping iterations for background", dtype=int, default=3)
mask = ListField(doc="Mask planes to reject", dtype=str, default=["SAT", "DETECTED", "BAD", "NO_DATA",])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume "EDGE" is not relevant here because it is for the full focal plane. But what about "DETECTED_NEGATIVE"?

assert all(len(bg) == 1 for bg in bgList), "Mixed bgList: %s" % ([len(bg) for bg in bgList],)
images = [bg[0][0].getStatsImage() for bg in bgList]
boxes = [bg[0][0].getImageBBox() for bg in bgList]
assert len(set((box.getMinX(), box.getMinY(), box.getMaxX(), box.getMaxY()) for box in boxes)) == 1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should an overlap also work?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a message (e.g. "bounding boxes not all equal") instead of having a user parse this on failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having some overlap would work so long as it's built in consistently.

maskVal = afwImage.Mask.getPlaneBitMask("BAD")
for img in images:
bad = numpy.isnan(img.getImage().getArray())
img.getMask().getArray()[bad] = maskVal
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add an option to skip this for efficiency? Maybe this has already been done on input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The background images are small, so it's not a big hit on efficiency, and I don't think it hurts to be thorough.


stats = afwMath.StatisticsControl()
stats.setAndMask(maskVal)
stats.setNanSafe(True)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed if you masked? Or equivalently, do you need to mask?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not needed (maybe cargo-culted from somewhere), but doesn't hurt.

array = combined.getImage().getArray()
bad = numpy.isnan(array)
mean = robustMean(array[~bad], self.config.skyRej)
array[bad] = mean
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe in case of a gradient? Wouldn't an interpolation be better?

for x in range(width):
if numpy.any(isBad[:, x]) and numpy.any(isGood[:, x]):
array[:, x][isBad[:, x]] = interpolate1D(method, yIndices[isGood[:, x]],
array[:, x][isGood[:, x]], yIndices[isBad[:, x]])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if all were bad in one of these cases?

@@ -228,9 +231,41 @@ def getCcdIdListFromExposures(expRefList, level="sensor", ccdKeys=["ccd"]):
ccdLists[name] = []
ccdLists[name].append(ccdId)

for ccd in ccdLists:
ccdLists[ccd] = sorted(ccdLists[ccd], key=lambda dd: dictToTuple(dd, sorted(dd.keys())))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each item is a dict and you are first sorting each item by key, then lexicographically the list by value? Comment might be useful here.

@@ -515,7 +552,7 @@ def process(self, cache, ccdId, outputName="postISRCCD"):
else:
self.log.info(
"Using previously persisted processed exposure for %s" % (sensorRef.dataId,))
exposure = sensorRef.get(outputName, immediate=True)
exposure = sensorRef.get(outputName, immediate=False)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, and it may not actually work. We don't have this enabled with pybind11 now. We deferred the decision on which types to proxy to DM-9563.

fullOutputId = {k: ccdName[i] for i, k in enumerate(self.config.ccdKeys)}
fullOutputId.update(outputId)
self.addMissingKeys(fullOutputId, butler)
fullOutputId.update(outputId) # must be after the call to queryMetadata
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it?

# Set detected/bad pixels to background to ensure they don't corrupt the background
maskVal = image.getMask().getPlaneBitMask(self.config.mask)
isBad = image.getMask().getArray() & maskVal > 0
bgLevel = np.median(image.getImage().getArray()[~isBad])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming there are no gradients. And why median here and (clipped) mean in other cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a regular image (as opposed to a background model) that we know has a lot of junk (like stars and galaxies) in it, so the median is a bit more robust. It shouldn't matter, because those pixels are masked, but I'm being careful.

num = result.getValue(afwMath.NPOINT)
if not numpy.isfinite(mean) or not numpy.isfinite(num):
continue
warped.set(xx, yy, mean*num)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to this in the previous comment. Let's say num << npixels then you get a significantly smaller value here. I thought that instead you wanted to pretend that all pixels in the box had the mean value. But apparently that is not the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretending that num pixels have the same value.
Pixels from a different CCD may overlap this superpixel, so those are going to get included as well, weighted by the number of pixels.

Since we are iterating over exposures and CCDs, it's important that the
order be consistent. Otherwise, we might accidentally mix CCDs from
different exposures.
Why, oh why would you put a line break there???
This will make it easier to subclass later.
* Split process pool into two parts, so subclass can access the one
  they want.
* Have CalibTask.run return result, so the subclass can use it.
* Move the bulk of scatterProcess into a free function that can be used
  by anything that wants a matrix of results.
* Allow passing of additional arguments to processSingle.
* Allow the butler to use the read proxy when processing, since
  subclasses may not use the result.
* Move code to get fully-qualified outputId into its own method. This
  makes it easier to override the 'combine' method.
We stitch the resultant calib together to form a single (binned) image
with all the CCDs in it. This allows the user to quickly view the result.
There's no good data there, so only trouble can result from including them.
A sky frame is the dominant response of the camera to the sky. It is
constructed by first subtracting a large-scale focal plane background model
(e.g., 1024x1024) from all frames to remove large-scale gradients, binning
pixels in each CCD (e.g., 64x64) and then averaging those.
We combine a large-scale background model and a scaled sky frame and
write a background model consisting of these (and un-doing the original)
that can be used downstream to correct the calexp.  This does a decent job
over most of the field of view of HSC.
@PaulPrice PaulPrice merged commit be221da into master Nov 15, 2017
@ktlim ktlim deleted the tickets/DM-9147 branch August 25, 2018 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants