DM-38309: Update PhotonTransferCurveDataset to reduce memory and fix index errors. #254

erykoff · 2023-03-20T20:18:34Z

No description provided.

plazas · 2023-03-21T16:29:10Z

python/lsst/ip/isr/ptcDataset.py

+        if covSqrtWeights is None:
+            covSqrtWeights = nanMatrix
+
+        # This first one I'm not sure of the type; list of tuples is okay?


A list (or NumPy array, if you changed everything to NumPy arrays) of tuples should be fine, I would think. If so, don't forget to delete this comment.

Oops, that was a vestigial comment, I'll remove it.

plazas · 2023-03-21T16:30:13Z

python/lsst/ip/isr/ptcDataset.py

+        try:
+            expIdsUsed = [(exp1, exp2) for ((exp1, exp2), m) in zip(pairs, mask) if m]
+        except ValueError:
+            warnings.warn("The PTC file was written incorrectly; you should rerun the "


How could the PTC file have been written incorrectly?

I'm trying to decide if more details in this message would be useful for the user that encounters this.

And why would re-running ptcSolve fix it? I'm also thinking of the subset of cpPtc.yaml that only runs isr and cpPtcExtract.

The file was written incorrectly because of a bug that I fixed in lsst/cp_pipe@a95b017

Notice the first line in the original version of that commit, which meant it was appending a list of a list to a list, giving too many dimensions.
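
For anyone reading this thread later, here is a minimal sketch of the failure mode described above. The exposure IDs, the variable names, and the helper `exp_ids_used` are hypothetical and only mirror the shape of the fallback in this diff; this is not the actual cp_pipe or ip_isr code.

```python
# Hypothetical exposure-ID pairs; the values are made up for illustration.
pair_a = (1001, 1002)
pair_b = (1003, 1004)

# Intended layout: a flat list of (exp1, exp2) tuples.
good_pairs = []
good_pairs.append(pair_a)
good_pairs.append(pair_b)

# Buggy layout: each pair is wrapped in an extra list before being appended,
# which adds one dimension too many.
bad_pairs = []
bad_pairs.append([pair_a])
bad_pairs.append([pair_b])

mask = [True, False]


def exp_ids_used(pairs, mask):
    """Return the unmasked pairs, tolerating the extra nesting level."""
    try:
        return [(exp1, exp2) for ((exp1, exp2), m) in zip(pairs, mask) if m]
    except ValueError:
        # Unpacking a one-element list into two names raises ValueError,
        # so fall back to pulling the pair out of its wrapper list.
        return [pair_list[0] for pair_list, m in zip(pairs, mask) if m]


print(exp_ids_used(good_pairs, mask))
print(exp_ids_used(bad_pairs, mask))
```

Both layouts give the same result through this helper, which is why the fallback can recover the exposure IDs from an old file, although a real fix would presumably still warn so that the file gets regenerated.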

plazas · 2023-03-21T16:39:49Z

python/lsst/ip/isr/ptcDataset.py

+                if m:
+                    expIdsUsed.append(pairList[0])
+
+        return expIdsUsed

    def getGoodAmps(self):


Do we need a docstring here too?

plazas · 2023-03-21T16:43:15Z

tests/test_ptcDataset.py

+        dataset.badAmps.append("C:0,1")
+        self.assertTrue(dataset.getGoodAmps() == [amp for amp in self.ampNames if amp != "C:0,1"])
+
+    def test_ptcDataset_pre_dm38309(self):


Do we need an analogous test for the partial PTC datasets created by cpPtcExtract?

Those are fine, in that the partial datasets didn't have the extra dimension. But it's also not something that is a common use-case anyway as those are not really user-facing.

plazas · 2023-03-21T16:47:41Z

python/lsst/ip/isr/ptcDataset.py

        Dictionary keyed by amp names containing the masked average of the
        means of the exposures in each flat pair. If needed, each array
        will be right-padded with np.nan to match the length of
        rawExpTimes.
-    photoCharge : `dict`, [`str`, `list`]
+    photoCharges : `dict`, [`str`, `np.ndarray`]
         Dictionary keyed by amp names containing the integrated photocharge
         for linearity calibration.



Below this line is the version. Does it need to be increased after the changes in this ticket? If so, will it affect backward compatibility?

I went back and forth on that, because as far as I can tell none of the changes here affect backwards compatibility. We have no code that uses photoCharge, for example, and I haven't modified the table format. (Though I do worry that some old tables may have been written incorrectly, but I think they would only be wrong if they were serialized to FITS, read, and re-written, which we don't do for the final dataset.)

czwa

I'm a bit concerned that removing all of the padding will cause issues if one amplifier has a different length than the rest, but if this updated version does work in that case without all the mess, then that's fine.

czwa · 2023-03-20T22:42:45Z

python/lsst/ip/isr/ptcDataset.py

+            Average of the means of the exposures in this pair.
+        rawVar : `float`, optional
+            Variance of the difference of the exposures in this pair.
+        photoCharge : `float`, optional


This is now plural in the definition in __init__, and so should be updated here.

Note that these are all singular (intentionally) since they are single values to be set for a partial PTC dataset.

czwa · 2023-03-20T22:48:44Z

python/lsst/ip/isr/ptcDataset.py

+                'COVARIANCES_MODEL': self.covariancesModel[ampName].ravel(),
+                'COVARIANCES_SQRT_WEIGHTS': self.covariancesSqrtWeights[ampName].ravel(),
+                'COVARIANCES_MODEL_NO_B': self.covariancesModelNoB[ampName].ravel(),
+                'FINAL_VARS': self.finalVars[ampName],


Is this safe against amplifiers having different lengths, or does the promotion of these to numpy.array allow astropy to ignore those differences?

Neither, really. The construction of the dataset is such that for a given detector, all amps have the same input pairs and therefore are all filled the same (though it may be with nans). I believe this did not work previously because (a) we were filtering out bad values on read, which messed up the lengths per amp; and (b) we were filtering out bad values on read inconsistently because of the astropy bug, which messed up the lengths within an amp.

Ok. I think I've convinced myself that we're fine, as the thing that concerned me (https://github.com/lsst/cp_pipe/blob/tickets/DM-38309/python/lsst/cp/pipe/ptc/cpSolvePtcTask.py#L438) is still being padded to keep the lengths consistent (https://github.com/lsst/cp_pipe/blob/main/python/lsst/cp/pipe/ptc/cpSolvePtcTask.py#L976).
All good from me on this ticket then.

Ah yes, I left all of this part the same. It's only the re-padding in the dataset that I removed.
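
As a rough illustration of the length argument in this thread, the sketch below contrasts filtering bad values per amp with keeping them as nans. The amp names are borrowed from the test in this PR, but the values and the `raw_means` dictionary are made up; this is not the dataset's actual serialization code.

```python
import numpy as np

# One entry per input flat pair for every amp; a bad measurement is stored as nan.
raw_means = {
    "C:0,0": np.array([100.0, 200.0, 300.0]),
    "C:0,1": np.array([101.0, np.nan, 299.0]),  # middle pair is bad for this amp
}

# Filtering bad values per amp changes the lengths, so the per-amp arrays
# no longer line up with each other or with the list of input pairs.
filtered = {amp: vals[np.isfinite(vals)] for amp, vals in raw_means.items()}
filtered_lengths = {amp: len(vals) for amp, vals in filtered.items()}

# Keeping the nans preserves one entry per input pair for every amp, so the
# arrays can be stacked into a regular 2D block without any re-padding.
stacked = np.vstack([raw_means[amp] for amp in raw_means])

print(filtered_lengths)
print(stacked.shape)
```

With the nans kept in place, every amp has the same length by construction, which is the property the table conversion relies on.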

erykoff added 5 commits March 10, 2023 14:05

Move ptcDataset tests from cp_pipe and add test for DM-38309 patch.

83d1dc7

Add ability to return expIdsUsed for old ptc datasets.

bbcbae5

Add protections for bad amps when converting to a table.

c1c5c00

Force astropy to use nans when reading tables.

56b7808

This is a workaround for astropy issue astropy/astropy#4708

Fix format of test data and add additional tests.

a54d903

erykoff requested review from czwa and plazas March 20, 2023 20:28

plazas reviewed Mar 21, 2023

plazas approved these changes Mar 21, 2023

czwa approved these changes Mar 21, 2023

Update PhotonTransferCurveDataset to use numpy arrays.

8730212

This also fixes various index errors, and simplifies the serialization by avoiding the need for padding.

erykoff force-pushed the tickets/DM-38309 branch from 570b244 to 8730212 Compare March 21, 2023 20:39

erykoff merged commit 1885775 into main Mar 22, 2023
1 check passed

erykoff deleted the tickets/DM-38309 branch March 22, 2023 15:32
