
DM-37357: Update masking in parallel overscan #248

Merged
merged 10 commits into from Jan 31, 2023
Conversation

@czwa czwa commented Jan 19, 2023

This update switches from fractional-based masking, which turned the parallel overscan correction on and off, to a fixed-threshold mask that attempts to patch fully masked columns from unmasked columns. The mask generated from the threshold is applied across all amplifiers, to ensure that crosstalk ghosts of the original bright bleed are also excluded from the calculations.

The existing code was too aggressive, and ignored gradients in the
data.  I've removed that simple threshold, and updated the
`collapseArray` function (which was handling masking previously) to
construct a reasonable estimate to fill fully masked columns.
 * Added config options to control parallel overscan masking.
 * Moved the masking function to OverscanTask.
 * Pulled the method that fills masked pixels out of collapseArray.
 * Added documentation.
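The scheme described above (fixed threshold, mask shared across all amplifiers, configurable grow radius) can be sketched roughly as follows. This is an illustrative numpy sketch, not the actual OverscanTask code; the `overscans` array, the `GROW` value, and the loop structure are all invented for the example.

```python
import numpy as np

THRESHOLD = 100000  # parallelOverscanMaskThreshold (ADU), from the config below
GROW = 7            # parallelOverscanMaskGrowSize (pixels); assumed value

rng = np.random.default_rng(42)
# One parallel-overscan strip per amplifier: (nRows, nColumns).
overscans = [rng.normal(0.0, 5.0, size=(3, 16)) for _ in range(4)]
overscans[0][:, 8] = 2.0e5  # a saturated bleed in amp 0, column 8

# Columns above threshold in ANY amp are masked in ALL amps, so
# crosstalk ghosts of the bleed are excluded everywhere.
bleed = np.zeros(16, dtype=bool)
for strip in overscans:
    bleed |= (strip > THRESHOLD).any(axis=0)

# Grow the mask along the serial direction by GROW pixels.
grown = bleed.copy()
for shift in range(1, GROW + 1):
    grown[:-shift] |= bleed[shift:]
    grown[shift:] |= bleed[:-shift]
```

With these toy inputs, columns 1 through 15 end up masked in every amplifier while column 0 stays usable.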
@czwa czwa requested a review from erykoff January 19, 2023 23:17
Contributor

@erykoff erykoff left a comment

This is a first pass through the code. I need some clarifications before I can fully understand what's going on. Meanwhile, I'm concerned about the lack of tests here. Can we make a little synthetic dataset that exercises the code? I'd even be happy with taking a couple of problematic example amps from the test data on the ticket, running on those, and checking some basic statistics of the output.

python/lsst/ip/isr/overscan.py (resolved)
```python
parallelOverscanMaskThreshold = pexConfig.Field(
    dtype=int,
    doc="Threshold above which pixels in the parallel overscan are masked as bleeds.",
    default=100000,
)
```
Contributor

Is this a config that will be per camera, per detector, or per amp? I know we don't currently support running with different isr configs, but I'm curious about this one.

Contributor Author

It probably needs to be included in the per-amp overscan config, but I think the value is the same per device (with possible exceptions: https://lsstc.slack.com/archives/CBV7K0DK6/p1674082048054619 suggests some amplifiers may allow bleeds at lower flux levels).

```python
parallelOverscanMaskGrowSize = pexConfig.Field(
    dtype=int,
    doc="Number of pixels masks created from saturated bleeds should be grown"
        " while masking the parallel overscan.",
```
Contributor

This phrasing took me a while to parse. Maybe something like "Masks created from saturated bleeds should be grown by `parallelOverscanMaskGrowSize` pixels during construction of the parallel overscan mask."

python/lsst/ip/isr/overscan.py (resolved)
```python
residualSigma = (residualSigma, parallelResults.overscanSigmaResidual)
# The serial overscan correction has removed some signal
# from the parallel overscan region, but that is largely a
# constant offset.  The collapseArray method now attempts
```
Contributor

You say "constant offset" but that's constant per what unit of area?

Contributor

I'm very confused about the reference to collapseArray here because I don't see how it's called in the parallel overscan code.

Contributor Author

It's calculated from the double-overscan corner, subtracting one value per row. All rows should be the same within the read noise, as there's no real signal. For the images I've been testing, the serial overscan median value is ~13000-15000, and subtracting it pulls the parallel overscan region down so that the parallel overscan median value is ~10-20.

The overscan code is far more confusing than I'd like (definitely my fault). The parallel overscan uses the same code path as the serial, just with some extra transposes. correctOverscan calls fitOverscan, which calls either measureConstantOverscan or measureVectorOverscan. Continuing with the vector case, that method calls collapseArray for all except MEDIAN_PER_ROW, which instead calls fitOverscanImage (the C++ code), then the fillMaskedPixels method that is also called by collapseArray. I'm not happy with the code-spaghetti I've made.
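The per-row offset described above can be illustrated with a toy model (this is not the actual correctOverscan code; the shapes, the ~14000 ADU bias level, and the noise values are invented to mimic the numbers quoted in this thread):

```python
import numpy as np

nRows, nData, nSerial = 5, 8, 3
rng = np.random.default_rng(0)

# Each row carries one shared bias level (~13000-15000 in the real data).
offset = 14000.0 + rng.normal(0.0, 50.0, size=(nRows, 1))
raw = np.hstack([
    offset + rng.normal(0.0, 5.0, size=(nRows, nData)),    # data / parallel region
    offset + rng.normal(0.0, 5.0, size=(nRows, nSerial)),  # serial overscan
])

# Serial overscan correction: subtract one value per row.
rowMedian = np.median(raw[:, nData:], axis=1, keepdims=True)
corrected = raw - rowMedian
```

Because all rows share the same offset to within the read noise, the data region is pulled from ~14000 down to a median near zero, matching the ~10-20 ADU residual quoted above.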

```python
# overscan do not create new outlier points.  The
# MEDIAN_PER_ROW method does this filling as a separate
# operation, using the same method.
parallelResults = self.correctOverscan(exposure, amp,
```
Contributor

I can't comment there, but correctOverscan doesn't have a docstring.

```python
# Replace pixels that have excessively large stdev values
# with the median of stdev values.  A large stdev likely
# indicates a bleed is spilling into the overscan.
axisStdev = np.where(axisStdev > 2.0 * np.median(axisStdev),
```
Contributor

Where does this 2.0 come from and should it be configurable?

Contributor Author

It can take almost any value and do a similar thing. It was added for the case below:

```
overscan = [[0.0 0.0 0.0 0.0 0.0]  # Furthest row from imaging section.
            [0.0 0.0 100000.0 0.0 0.0]
            [0.0 0.0 100000.0 0.0 0.0]]  # Closest row to the imaging section.
axisMedian = [0.0 0.0 100000.0 0.0 0.0]
axisStdev = [0.0 0.0 37000.0 0.0 0.0]
```

With these values, the "bleed" column rejects nothing, as all pixels are within 3 sigma, so some sigma clipping of the STDEV values is required. I chose 2.0 because in all the plots I looked at, the real STDEV values were tightly clustered about the median, so the penalty for over-clipping makes little difference to the masking step.
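The clipping can be reproduced directly on these toy arrays (illustrative only; variable names follow the excerpt above). Note that with these values the median stdev is 0.0, so the bleed column's stdev is clipped down to zero and its pixels can no longer hide behind their own inflated sigma:

```python
import numpy as np

overscan = np.array([[0.0, 0.0,      0.0, 0.0, 0.0],   # furthest from imaging section
                     [0.0, 0.0, 100000.0, 0.0, 0.0],
                     [0.0, 0.0, 100000.0, 0.0, 0.0]])  # closest to imaging section

axisMedian = np.median(overscan, axis=0)
axisStdev = np.std(overscan, axis=0)

# Clip anomalously large column stdevs to the median stdev, so a
# bleed column cannot inflate its own rejection threshold.
clipped = np.where(axisStdev > 2.0 * np.median(axisStdev),
                   np.median(axisStdev), axisStdev)
```

After clipping, rejection against `axisMedian` with the clipped sigmas flags the 100000 ADU pixels as outliers.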

```
mask slice, and takes the median of those values to fill the
slice. If this isn't possible, the median of all non-masked
values is used. This updates the masked_array to clear the
mask for the pixels filled.
```
Contributor

I don't understand "This updates the masked_array to clear the mask for the pixels filled."

Contributor Author

"The mask is removed for the pixels filled."
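A minimal sketch of the fill-and-unmask behaviour being described, assuming a 1-D collapsed masked array (this is illustrative, not the actual fillMaskedPixels implementation):

```python
import numpy as np

# Collapsed overscan values; index 3 is a fully masked column.
collapsed = np.ma.masked_array(
    data=[10.0, 11.0, 12.0, 0.0, 14.0, 15.0],
    mask=[False, False, False, True, False, False])

if collapsed.mask.any():
    bad = np.flatnonzero(collapsed.mask)
    good = np.flatnonzero(~collapsed.mask)
    for idx in bad:
        # Nearest unmasked neighbours on each side, if any.
        left = good[good < idx]
        right = good[good > idx]
        neighbours = [collapsed.data[left[-1]]] if left.size else []
        neighbours += [collapsed.data[right[0]]] if right.size else []
        # Fill from neighbours; fall back to the global unmasked median.
        fill = np.median(neighbours) if neighbours else np.ma.median(collapsed)
        collapsed.data[idx] = fill
        collapsed.mask[idx] = False  # "the mask is removed for the pixels filled"
```

Here the masked column is filled with the median of its neighbours (12.0 and 14.0, giving 13.0) and its mask bit is cleared.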

* Fix a mistake that set the doubly-overscan region to the data value.
* Create a more realistic bleed in the overscan region.
* Ensure that the constant, generic vector, and MEDIAN_PER_ROW methods are all tested.
```python
def test_badParallelOverscanCorrection(self):
    # Test the output value for the serial and parallel overscans
    self.assertEqual(oscanResults.overscanMean[0], 2.0)
    self.assertEqual(oscanResults.overscanMean[1], 4.5)
```
Contributor

These tests are selling the correction short. I looked at the ramp image with the MEDIAN and MEDIAN_PER_ROW corrections: MEDIAN_PER_ROW removes the ramp (and drives statAfter[1] to 0.0), while MEDIAN does not. So I think there should be an additional test in the MEDIAN_PER_ROW case checking that the stddev from statAfter either goes to zero or is at least reduced by the parallel overscan correction.
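The suggested check is easy to motivate with a toy ramp (a simplified analogue, not the OverscanTask code paths): subtracting each row's median, as MEDIAN_PER_ROW effectively does, removes a row-wise ramp entirely, while a single global median leaves it in place.

```python
import numpy as np

nRows, nCols = 6, 10
# A pure ramp along rows: every pixel in row i has value i.
ramp = np.arange(nRows, dtype=float)[:, None] * np.ones((1, nCols))

perRowMedian = np.median(ramp, axis=1, keepdims=True)
medianPerRow = ramp - perRowMedian   # MEDIAN_PER_ROW analogue: ramp removed
medianOnly = ramp - np.median(ramp)  # single MEDIAN analogue: ramp survives
```

So asserting that the post-correction stddev goes to zero (MEDIAN_PER_ROW) or is at least reduced would directly exercise what the correction is supposed to do.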

```python
self.assertIsInstance(oscanResults.overscanImage, afwImage.ExposureF)

statAfter = computeImageMedianAndStd(exposureCopy.image[amp.getRawDataBBox()])
self.assertLess(statAfter[0], statBefore[0])
```
Contributor

For this case, with the bleed, the std after correction does not go to zero for the MEDIAN or POLY case, but it is significantly reduced: from 2.87 (raw/MEDIAN) to 0.7 (MEDIAN_PER_ROW) or 0.17 (POLY). That makes sense given the bleed interruption and the shape of the ramp. I think the reduction of the std should be checked here, which would show that the correction is working.


```python
statAfter = computeImageMedianAndStd(exposureCopy.image[amp.getRawDataBBox()])
self.assertLess(statAfter[0], statBefore[0])
```

Contributor

Finally, in this case the std goes crazy with the MEDIAN_PER_ROW case, which should be confirmed.

@czwa czwa merged commit 1283521 into main Jan 31, 2023
@czwa czwa deleted the tickets/DM-37357 branch January 31, 2023 20:14