DM-38163: Update PTC to avoid potential failures #174
Conversation
```python
# and a pair of references at that index.
for expRef, expId in expRefs:
    # This yields an exposure ref and an exposureId.
    exposure = expRef.get()
```
Can't we just get the metadata component rather than the full exposure? Or is there something else we need?
I simply didn't think of this at the time. Switched to get just the metadata.
Thanks for these changes. It looks good in general, but I would just like to understand their motivation a bit better (and probably add a bit more explanation in the code) before accepting.
```python
        butlerQC.put(outputs, outputRefs)

    def _guaranteeOutputs(self, inputDims, outputs, outputRefs):
```
Two questions:

- Why do we need this? I thought that `cpPtcExtract` was already introducing dummy exposures in order to match the inputs: https://github.com/lsst/cp_pipe/blob/main/python/lsst/cp/pipe/ptc/cpExtractPtcTask.py#L311 Or are you building a pipeline that will only use the classes in `cpPtcSolve` without running `cpPtcExtract` first?
- It's not that clear to me how the function is implementing the matching.
The issue arises when running in the `cp_testing` environment. There is a check during the ISR stage to restrict the inputs to flat exposures only, returning `pipeBase.NoWorkFound` for the other exposures. This is fine in other stages, such as flat combine, as whatever exposures are passed as input are used to construct a single output. However, `cpPtcExtract` is expecting to write an output for all inputs to the pipeline, so it ends up in a situation where (for my test case) `len(inputRefs.inputExp) = 124` but `len(outputRefs.outputCovariances) = 229`. The existing code can supply a dummy dataset for each input, but that leaves 105 expected outputs that are not generated. This new function iterates over the full set of `outputRefs.outputCovariances`, constructing the missing outputs and inserting them into the expected output location.

As for the matching: the function iterates over all of the outputs, and if that output's `exposureId` is found in the `inputDims` list, then it needs to extract that entry from the existing outputs (which are currently indexed by the ordering of `exposureId` in `inputDims`) and insert it into the location expected in the list of `outputRefs`. If the expected output is not in the existing output list, then the input exposure was ignored during ISR, and so a dummy dataset needs to be inserted into that location.
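The matching described above can be sketched in plain Python. This is only an illustration of the logic, not the real task code: `guarantee_outputs` and the dict-based dummy dataset are hypothetical stand-ins, and the real `_guaranteeOutputs` works with butler dataset refs rather than bare exposureIds.

```python
def guarantee_outputs(input_dims, outputs, output_refs):
    """Pad ``outputs`` so there is one entry per expected output ref.

    ``input_dims`` lists the exposureIds that actually produced an
    output (in order), and ``output_refs`` lists the exposureIds of
    every expected output.  Plain-Python sketch of the matching
    described above; the dummy-dataset shape here is made up.
    """
    padded = []
    for exp_id in output_refs:
        if exp_id in input_dims:
            # Existing outputs are indexed by the ordering of
            # exposureId in input_dims; pull the matching entry.
            padded.append(outputs[input_dims.index(exp_id)])
        else:
            # Input was skipped during ISR (NoWorkFound): insert a
            # dummy dataset so the middleware accounting stays correct.
            padded.append({"exposureId": exp_id, "dummy": True})
    return padded
```

With 124 real outputs and 229 expected refs, the returned list has 229 entries, dummies filling every slot whose exposure never made it through ISR.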
How do you end up with `len(outputRefs.outputCovariances) = 229` after running `cpPtcExtract`, if the input number was only 124?
```diff
@@ -792,40 +843,35 @@ def getGainFromFlatPair(self, im1Area, im2Area, imStatsCtrl, mu1, mu2,
         return gain

-    def getReadNoiseFromMetadata(self, taskMetadata, ampName):
+    def getReadNoise(self, exposure, taskMetadata, ampName):
         """Gets readout noise for an amp from ISR metadata.
```
Update the docstring with an explanation if the function is now doing more.
```diff
-        # overscan-subtracted overscan during ISR.
-        expectedKey = f"RESIDUAL STDEV {ampName}"
+        # Try from the exposure first.
+        expectedKey = f"LSST ISR OVERSCAN RESIDUAL SERIAL STDEV {ampName}"
```
How does this noise from the exposure metadata potentially differ from the noise from the task metadata? And why did you need to add this extra check?
They should be identical. I added this because I was finding that all my read noise values were being set to NaN. The `cp_testing` pipeline renames the `IsrTask` stage of the pipeline to `cptPtcIsr` to make it unique within the pipeline, and that changes the key within the `taskMetadata` object that stores that information. I originally tried making that key configurable, but decided that that was a work-around to a work-around, so I added the lookup in the exposure metadata instead. As this header key should be present in all ISR processed exposures from now on, this change is also the first step to deprecating and removing the `taskMetadata` lookup and connection entirely.
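The two-step lookup described here can be sketched as follows. This is a hedged illustration only: metadata objects are modelled as plain dicts, and `get_read_noise` is a hypothetical stand-in for the task's `getReadNoise`; only the key formats come from the discussion above.

```python
import math


def get_read_noise(exposure_metadata, task_metadata, amp_name):
    """Sketch of the exposure-first read-noise lookup.

    ``exposure_metadata`` stands in for the exposure's header;
    ``task_metadata`` maps task label -> per-task metadata dict.
    """
    # Try the exposure metadata first: this key is written by ISR
    # itself, so it survives any renaming of the ISR task label.
    exp_key = f"LSST ISR OVERSCAN RESIDUAL SERIAL STDEV {amp_name}"
    if exp_key in exposure_metadata:
        return exposure_metadata[exp_key]

    # Fall back to the task metadata, whose top-level key depends on
    # the pipeline's label for the ISR task (e.g. "isr" vs "cptPtcIsr").
    task_key = f"RESIDUAL STDEV {amp_name}"
    for label_metadata in task_metadata.values():
        if task_key in label_metadata:
            return label_metadata[task_key]

    # Neither source has the value: signal it as missing.
    return math.nan
```

Because the exposure-header key does not depend on the task label, it keeps working when `cp_testing` renames the ISR stage, which is what made the `taskMetadata`-only lookup return NaN.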
"is also the first step to deprecating and removing the `taskMetadata` lookup and connection entirely" --> :) :)
```diff
@@ -219,8 +219,14 @@ def run(self, inputCovariances, camera=None, detId=0):
             means, variances, and exposure times
             (`lsst.ip.isr.PhotonTransferCurveDataset`).
         """
+        # Find the ampNames from a non-dummy ptc.
```
In `cpPtcExtract`, the dummy datasets had the ampNames: https://github.com/lsst/cp_pipe/blob/main/python/lsst/cp/pipe/ptc/cpExtractPtcTask.py#L314 So are you actually building a pipeline that does not have `cpPtcExtract` as a previous step of `cpPtcSolve`?
The original code was relying on all input PTC datasets having the same unique set of `ampNames`. The bug in `ip_isr` allowed each instance of `PhotonTransferCurveDataset` to update the global list of `ampNames` for all instances, as it updated the class definition instead of just the instance definition. This meant that if someone was lazy about constructing the additional padding datasets (me), then the `Solve` task would attempt to iterate over the union of all `ampNames` found in all datasets. Fixing the bug ensured that the inputs only knew about their own `ampNames`, so the `np.unique` was no longer needed to prune the duplicates from the first input, but it did require that the first input have the same set of `ampNames` as all other inputs. Adding the requirement that the `ampNames` come from a non-DUMMY dataset allows the additional padding datasets to be lazy, and makes the content of the DUMMY datasets unimportant; they only serve to make the middleware accounting correct, and have no science value.
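The `ip_isr` bug described above is the classic Python pitfall of a mutable class-level attribute being shared by all instances. A toy sketch (the `PtcLike` class and amp names are made up; only the sharing behaviour mirrors the real `PhotonTransferCurveDataset` bug):

```python
class PtcLike:
    """Toy model of the class-attribute bug described above."""

    # BUG: a mutable default defined at class level is shared by
    # every instance, so one dataset's amps leak into all others.
    ampNamesShared = []

    def __init__(self):
        # FIX: create a fresh list per instance instead.
        self.ampNames = []

    def addAmp(self, name):
        self.ampNamesShared.append(name)  # mutates the class attribute
        self.ampNames.append(name)        # mutates only this instance


a, b = PtcLike(), PtcLike()
a.addAmp("C00")
assert b.ampNamesShared == ["C00"]  # leaked into the other instance
assert b.ampNames == []             # per-instance list stays clean
```

With the buggy attribute, every dataset effectively reported the union of all amp names ever added, which is why the old `np.unique` pruning appeared to work; the fix makes each dataset report only its own amps.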
PTC improvements: `ip_isr`.