DM-38163: Update PTC to avoid potential failures #174
Conversation
```python
# and a pair of references at that index.
for expRef, expId in expRefs:
    # This yields an exposure ref and an exposureId.
    exposure = expRef.get()
```
Can't we just get the metadata component rather than the full exposure? Or is there something else we need?
I simply didn't think of this at the time. Switched to get just the metadata.
Thanks for these changes. It looks good in general, but I would just like to understand their motivation a bit better (and probably add a bit more explanation in the code) before accepting.
```python
        butlerQC.put(outputs, outputRefs)

    def _guaranteeOutputs(self, inputDims, outputs, outputRefs):
```
Two questions:

- Why do we need this? I thought that `cpPtcExtract` was already introducing dummy exposures in order to match the inputs: https://github.com/lsst/cp_pipe/blob/main/python/lsst/cp/pipe/ptc/cpExtractPtcTask.py#L311 Or are you building a pipeline that will only use the classes in `cpPtcSolve` without running `cpPtcExtract` first?
- It's not that clear to me how the function is implementing the matching.
The issue arises when running in the `cp_testing` environment. There is a check during the ISR stage to restrict the inputs to flat exposures only, returning `pipeBase.NoWorkFound` for the other exposures. This is fine in other stages, such as flat combine, as whatever exposures are passed as input are used to construct a single output. However, `cpPtcExtract` is expecting to write an output for all inputs to the pipeline, so it ends up in a situation where (for my test case) `len(inputRefs.inputExp) = 124` but `len(outputRefs.outputCovariances) = 229`. The existing code can supply a dummy dataset for each input, but that leaves 105 expected outputs that are not generated. This new function iterates over the full set of `outputRefs.outputCovariances`, constructing the missing outputs and inserting them into the expected output location.

As for the matching: the function iterates over all of the outputs, and if that output's `exposureId` is found in the `inputDims` list, then it needs to extract that entry from the existing outputs (which are currently indexed by the ordering of `exposureId` in `inputDims`) and insert it into the location expected in the list of `outputRefs`. If the expected output is not in the existing output list, then the input exposure was ignored during ISR, and so a dummy dataset needs to be inserted into that location.
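The matching described above can be sketched in plain Python. This is only an illustration of the logic, not the real task code: `guarantee_outputs` and the dict-based dummy dataset are hypothetical stand-ins, and the real `_guaranteeOutputs` works with butler dataset refs rather than bare exposureIds.

```python
def guarantee_outputs(input_dims, outputs, output_refs):
    """Pad ``outputs`` so there is one entry per expected output ref.

    ``input_dims`` lists the exposureIds that actually produced an
    output (in order), and ``output_refs`` lists the exposureIds of
    every expected output.  Plain-Python sketch of the matching
    described above; the dummy-dataset shape here is made up.
    """
    padded = []
    for exp_id in output_refs:
        if exp_id in input_dims:
            # Existing outputs are indexed by the ordering of
            # exposureId in input_dims; pull the matching entry.
            padded.append(outputs[input_dims.index(exp_id)])
        else:
            # Input was skipped during ISR (NoWorkFound): insert a
            # dummy dataset so the middleware accounting stays correct.
            padded.append({"exposureId": exp_id, "dummy": True})
    return padded
```

With 124 real outputs and 229 expected refs, the returned list has 229 entries, dummies filling every slot whose exposure never made it through ISR.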
How do you end up with `len(outputRefs.outputCovariances) = 229` after running `cpPtcExtract`, if the input number was only 124?
```diff
@@ -792,40 +843,35 @@ def getGainFromFlatPair(self, im1Area, im2Area, imStatsCtrl, mu1, mu2,
         return gain

-    def getReadNoiseFromMetadata(self, taskMetadata, ampName):
+    def getReadNoise(self, exposure, taskMetadata, ampName):
         """Gets readout noise for an amp from ISR metadata.
```
Update the docstring with an explanation if the function is now doing more.
```diff
-        # overscan-subtracted overscan during ISR.
-        expectedKey = f"RESIDUAL STDEV {ampName}"
+        # Try from the exposure first.
+        expectedKey = f"LSST ISR OVERSCAN RESIDUAL SERIAL STDEV {ampName}"
```
How does this noise from the exposure metadata potentially differ from the noise from the task metadata? And why did you need to add this extra check?
They should be identical. I added this because I was finding that all my read noise values were being set to NaN. The `cp_testing` pipeline renames the `IsrTask` stage of the pipeline to `cptPtcIsr` to make it unique within the pipeline, and that changes the key within the `taskMetadata` object that stores that information. I originally tried making that key configurable, but decided that that was a work-around to a work-around, so I added the lookup in the exposure metadata instead. As this header key should be present in all ISR processed exposures from now on, this change is also the first step to deprecating and removing the `taskMetadata` lookup and connection entirely.
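The two-step lookup described here can be sketched as follows. This is a hedged illustration only: metadata objects are modelled as plain dicts, and `get_read_noise` is a hypothetical stand-in for the task's `getReadNoise`; only the key formats come from the discussion above.

```python
import math


def get_read_noise(exposure_metadata, task_metadata, amp_name):
    """Sketch of the exposure-first read-noise lookup.

    ``exposure_metadata`` stands in for the exposure's header;
    ``task_metadata`` maps task label -> per-task metadata dict.
    """
    # Try the exposure metadata first: this key is written by ISR
    # itself, so it survives any renaming of the ISR task label.
    exp_key = f"LSST ISR OVERSCAN RESIDUAL SERIAL STDEV {amp_name}"
    if exp_key in exposure_metadata:
        return exposure_metadata[exp_key]

    # Fall back to the task metadata, whose top-level key depends on
    # the pipeline's label for the ISR task (e.g. "isr" vs "cptPtcIsr").
    task_key = f"RESIDUAL STDEV {amp_name}"
    for label_metadata in task_metadata.values():
        if task_key in label_metadata:
            return label_metadata[task_key]

    # Neither source has the value: signal it as missing.
    return math.nan
```

Because the exposure-header key does not depend on the task label, it keeps working when `cp_testing` renames the ISR stage, which is what made the `taskMetadata`-only lookup return NaN.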
"is also the first step to deprecating and removing the `taskMetadata` lookup and connection entirely" --> :) :)
```diff
@@ -219,8 +219,14 @@ def run(self, inputCovariances, camera=None, detId=0):
             means, variances, and exposure times
             (`lsst.ip.isr.PhotonTransferCurveDataset`).
         """
+        # Find the ampNames from a non-dummy ptc.
```
In `cpPtcExtract`, the dummy datasets had the ampNames: https://github.com/lsst/cp_pipe/blob/main/python/lsst/cp/pipe/ptc/cpExtractPtcTask.py#L314 So are you actually building a pipeline that does not have `cpPtcExtract` as a previous step of `cpPtcSolve`?
The original code was relying on all input PTC datasets having the same unique set of `ampNames`. The bug in `ip_isr` allowed each instance of `PhotonTransferCurveDataset` to update the global list of `ampNames` for all instances, as it updated the class definition instead of just the instance definition. This meant that if someone was lazy about constructing the additional padding datasets (me), then the `Solve` task would attempt to iterate over the union of all `ampNames` found in all datasets. Fixing the bug ensured that the inputs only knew about their own `ampNames`, so the `np.unique` was no longer needed to prune the duplicates from the first input, but it did require that the first input have the same set of `ampNames` as all other inputs. Adding the requirement that the `ampNames` come from a non-DUMMY dataset allows the additional padding datasets to be lazy, and makes the content of the DUMMY datasets unimportant; they only serve to make the middleware accounting correct, and have no science value.
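The `ip_isr` bug described above is the classic Python pitfall of a mutable class-level attribute being shared by all instances. A toy sketch (the `PtcLike` class and amp names are made up; only the sharing behaviour mirrors the real `PhotonTransferCurveDataset` bug):

```python
class PtcLike:
    """Toy model of the class-attribute bug described above."""

    # BUG: a mutable default defined at class level is shared by
    # every instance, so one dataset's amps leak into all others.
    ampNamesShared = []

    def __init__(self):
        # FIX: create a fresh list per instance instead.
        self.ampNames = []

    def addAmp(self, name):
        self.ampNamesShared.append(name)  # mutates the class attribute
        self.ampNames.append(name)        # mutates only this instance


a, b = PtcLike(), PtcLike()
a.addAmp("C00")
assert b.ampNamesShared == ["C00"]  # leaked into the other instance
assert b.ampNames == []             # per-instance list stays clean
```

With the buggy attribute, every dataset effectively reported the union of all amp names ever added, which is why the old `np.unique` pruning appeared to work; the fix makes each dataset report only its own amps.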
PTC improvements: `ip_isr`.