
DM-38163: Update PTC to avoid potential failures #174

Merged 6 commits into main from tickets/DM-38163 on Mar 7, 2023

Conversation

czwa (Contributor) commented Mar 1, 2023

PTC improvements:

  • Ensure all outputRefs have a matching outputCovariance (this is a problem blocking DM-37383).
  • Try to get read noise from the exposure header.
  • Get ampNames for the solve task from a non-dummy input, and anticipate associated bug fix on the PTC dataset in ip_isr.
  • Update tests to pass full set of taskMetadata to mimic real butler behavior.

czwa requested a review from plazas on March 1, 2023 00:10
# and a pair of references at that index.
for expRef, expId in expRefs:
    # This yields an exposure ref and an exposureId.
    exposure = expRef.get()
Contributor:
Can't we just get the metadata component rather than the full exposure? Or is there something else we need?

czwa (Contributor, Author):
I simply didn't think of this at the time. Switched to get just the metadata.
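
For reference, a minimal sketch of the change, assuming `expRef` is a `DeferredDatasetHandle` (whose `get` accepts a `component` keyword); variable names are illustrative:

```python
# Before: materialize the full exposure just to read its header.
exposure = expRef.get()
metadata = exposure.getMetadata()

# After: fetch only the metadata component, avoiding the pixel read.
metadata = expRef.get(component="metadata")
```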

plazas (Contributor) left a comment:

Thanks for these changes. It looks good in general, but I would just like to understand their motivation a bit better (and probably add a bit more explanation in the code) before accepting.

        butlerQC.put(outputs, outputRefs)

    def _guaranteeOutputs(self, inputDims, outputs, outputRefs):
Contributor:
Two questions:

  1. Why do we need this? I thought that cpPtcExtract was already introducing dummy exposures in order to match the inputs: https://github.com/lsst/cp_pipe/blob/main/python/lsst/cp/pipe/ptc/cpExtractPtcTask.py#L311
    or are you building a pipeline that will only use the classes in cpPtcSolve without running cpPtcExtract first?
  2. It's not that clear to me how the function is implementing the matching.

czwa (Contributor, Author):

The issue arises when running in the cp_testing environment. There is a check during the ISR stage to restrict the inputs to flat exposures only, returning pipeBase.NoWorkFound for the other exposures. This is fine in other stages, such as flat combine, as whatever exposures are passed as input are used to construct a single output. However, cpPtcExtract is expecting to write an output for all inputs to the pipeline, so it ends up in a situation where (for my test case) len(inputRefs.inputExp) = 124 but len(outputRefs.outputCovariances) = 229. The existing code can supply a dummy dataset for each input, but that leaves 105 expected outputs that are not generated. This new function iterates over the full set of outputRefs.outputCovariances, constructing the missing outputs and inserting them into the expected output location.
As for the matching: the function iterates over all of the outputs, and if that output's exposureId is found in the inputDims list, then it needs to extract that entry from the existing outputs (which are currently indexed by the ordering of exposureId in inputDims) and insert it into the location expected in the list of outputRefs. If the expected output is not in the existing output list, then the input exposure was ignored during ISR, and so a dummy dataset needs to be inserted into that location.
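
A condensed sketch of that matching logic (names are illustrative, not the merged implementation; this assumes `inputDims` is the list of input exposure IDs and `outputs` is the `Struct` returned by `run`):

```python
from lsst.ip.isr import PhotonTransferCurveDataset


def _guaranteeOutputs(self, inputDims, outputs, outputRefs):
    """Ensure there is one output covariance per expected outputRef."""
    newCovariances = []
    for ref in outputRefs.outputCovariances:
        expId = ref.dataId["exposure"]
        if expId in inputDims:
            # A real measurement exists; it sits at the position implied
            # by the ordering of exposure IDs in inputDims.
            index = inputDims.index(expId)
            newCovariances.append(outputs.outputCovariances[index])
        else:
            # This exposure was skipped during ISR (NoWorkFound), so pad
            # with a dummy dataset; per the discussion below, its content
            # is unimportant.
            newCovariances.append(PhotonTransferCurveDataset([], "DUMMY"))
    outputs.outputCovariances = newCovariances
    return outputs
```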

Contributor:

How do you end up with len(outputRefs.outputCovariances) = 229 after running cpPtcExtract, if the input number was only 124?

@@ -792,40 +843,35 @@ def getGainFromFlatPair(self, im1Area, im2Area, imStatsCtrl, mu1, mu2,

return gain

-    def getReadNoiseFromMetadata(self, taskMetadata, ampName):
+    def getReadNoise(self, exposure, taskMetadata, ampName):
         """Gets readout noise for an amp from ISR metadata.
Contributor:

Update the docstring with an explanation, since the function is now doing more.

-        # overscan-subtracted overscan during ISR.
-        expectedKey = f"RESIDUAL STDEV {ampName}"
+        # Try from the exposure first.
+        expectedKey = f"LSST ISR OVERSCAN RESIDUAL SERIAL STDEV {ampName}"
Contributor:

How does this noise from the exposure metadata potentially differ from the noise from the task metadata? And why did you need to add this extra check?

czwa (Contributor, Author):

They should be identical. I added this because I was finding that all my read noise values were being set to NaN. The cp_testing pipeline renames the IsrTask stage to cptPtcIsr to make it unique within the pipeline, which changes the key within the taskMetadata object that stores this information. I originally tried making that key configurable, but decided that that was a workaround for a workaround, so I added the lookup in the exposure metadata instead. As this header key should be present in all ISR-processed exposures from now on, this change is also the first step to deprecating and removing the taskMetadata lookup and connection entirely.
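
A sketch of the resulting lookup order (the exposure-header key is taken from the diff above; the task-metadata access pattern and the fallback key are illustrative):

```python
import numpy as np


def getReadNoise(self, exposure, taskMetadata, ampName):
    """Get read noise, preferring the exposure header over task metadata."""
    # Written into the exposure header by IsrTask itself, so it survives
    # any renaming of the task label within a pipeline.
    expectedKey = f"LSST ISR OVERSCAN RESIDUAL SERIAL STDEV {ampName}"
    metadata = exposure.getMetadata()
    if expectedKey in metadata:
        return metadata[expectedKey]

    # Fall back to the taskMetadata connection; this is the lookup that
    # breaks when cp_testing renames the ISR stage.
    expectedKey = f"RESIDUAL STDEV {ampName}"
    if "isr" in taskMetadata and expectedKey in taskMetadata["isr"]:
        return taskMetadata["isr"][expectedKey]

    return np.nan
```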

Contributor:

"is also the first step to deprecating and removing the taskMetadata lookup and connection entirely"--> :) :)

@@ -219,8 +219,14 @@ def run(self, inputCovariances, camera=None, detId=0):
         means, variances, and exposure times
         (`lsst.ip.isr.PhotonTransferCurveDataset`).
         """
+        # Find the ampNames from a non-dummy ptc.
Contributor:

In cpPtcExtract, the dummy datasets had the ampNames: https://github.com/lsst/cp_pipe/blob/main/python/lsst/cp/pipe/ptc/cpExtractPtcTask.py#L314
So are you actually building a pipeline that does not have cpPtcExtract as a previous step of cpPtcSolve?

czwa (Contributor, Author):

The original code relied on all input PTC datasets having the same unique set of ampNames. The bug in ip_isr allowed each instance of PhotonTransferCurveDataset to update the global list of ampNames for all instances, as it updated the class definition instead of just the instance definition. This meant that if someone was lazy about constructing the additional padding datasets (me), the Solve task would attempt to iterate over the union of all ampNames found in all datasets. Fixing the bug ensured that each input only knows about its own ampNames, so the np.unique call is no longer needed to prune duplicates from the first input, but it did require that the first input have the same set of ampNames as all other inputs. Requiring that the ampNames come from a non-DUMMY dataset allows the additional padding datasets to be lazy, and makes the content of the DUMMY datasets unimportant; they only serve to make the middleware accounting correct, and have no science value.
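
A sketch of the selection, assuming the padding datasets are tagged with `ptcFitType == "DUMMY"` as in cpPtcExtract:

```python
# Take ampNames from the first non-dummy partial PTC dataset; the dummy
# datasets exist only to satisfy the middleware accounting, so their
# contents (including their ampNames) are ignored.
ampNames = []
for partialPtcDataset in inputCovariances:
    if partialPtcDataset.ptcFitType != "DUMMY":
        ampNames = partialPtcDataset.ampNames
        break
```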

czwa merged commit 3a5729a into main on Mar 7, 2023
czwa deleted the tickets/DM-38163 branch on March 7, 2023 22:04