Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-38309: Update PhotonTransferCurveDataset to reduce memory and fix index errors. #254

Merged
merged 6 commits into from Mar 22, 2023

Conversation

erykoff
Copy link
Contributor

@erykoff erykoff commented Mar 20, 2023

No description provided.

@erykoff erykoff requested review from czwa and plazas March 20, 2023 20:28
if covSqrtWeights is None:
covSqrtWeights = nanMatrix

# This first one I'm not sure of the type; list of tuples is okay?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A list (or NumPy array, if you changed everything to NumPy arrays) of tuples should be fine, I would think. If so, don't forget to delete this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, that was a vestigial comment, I'll remove it.

try:
expIdsUsed = [(exp1, exp2) for ((exp1, exp2), m) in zip(pairs, mask) if m]
except ValueError:
warnings.warn("The PTC file was written incorrectly; you should rerun the "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How could the PTC file have been written incorrectly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to decide if more details in this message would be useful for the user that encounters this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why would re-running ptcSolve fix it? I'm also thinking of the subset of cpPtc.yaml that only runs isr and cpPtcExtract.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file was incorrectly written because of the bug here which I fixed lsst/cp_pipe@a95b017

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice the first line in the original version of that commit, which meant it was appending a list of a list to a list, giving too many dimensions.

if m:
expIdsUsed.append(pairList[0])

return expIdsUsed

def getGoodAmps(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a docstring here too?

dataset.badAmps.append("C:0,1")
self.assertTrue(dataset.getGoodAmps() == [amp for amp in self.ampNames if amp != "C:0,1"])

def test_ptcDataset_pre_dm38309(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need an analogous test for the partial PTC datasets created by cpPtcExtract?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are fine, in that the partial datasets didn't have the extra dimension. But it's also not something that is a common use-case anyway as those are not really user-facing.

Dictionary keyed by amp names containing the masked average of the
means of the exposures in each flat pair. If needed, each array
will be right-padded with np.nan to match the length of
rawExpTimes.
photoCharge : `dict`, [`str`, `list`]
photoCharges : `dict`, [`str`, `np.ndarray`]
Dictionary keyed by amp names containing the integrated photocharge
for linearity calibration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blow this line is the version. Does it need to be increased after the changes in this ticket? If so, will it affect backward compatibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on that, because as far as I can tell none of the changes here affect backwards compatibility. We have no code that uses photoCharge for example, and I haven't modified the table format. (Though I do worry that some old tables may have been written incorrectly, but I think they would only be wrong if they were serialized to fits, read, and re-written, which we don't do for the final dataset.

Copy link
Contributor

@czwa czwa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned that removing all of the padding will cause issues if one amplifier has a different length than the rest, but if this updated version does work in that case without all the mess, then that's fine.

Average of the means of the exposures in this pair.
rawVar : `float`, optional
Variance of the difference of the exposures in this pair.
photoCharge : `float`, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now plural in the definition in __init__, and so should be updated here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that these are all singular (intentionally) since they are single values to be set for a partial PTC dataset.

'COVARIANCES_MODEL': self.covariancesModel[ampName].ravel(),
'COVARIANCES_SQRT_WEIGHTS': self.covariancesSqrtWeights[ampName].ravel(),
'COVARIANCES_MODEL_NO_B': self.covariancesModelNoB[ampName].ravel(),
'FINAL_VARS': self.finalVars[ampName],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe against amplifiers having different lengths, or does the promotion of these to numpy.array allow astropy to ignore those differences?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither, really. The construction of the dataset is such that for a given detector, all amps have the same input pairs and therefore are all filled the same (though it may be with nans). I believe that previously the reason that this did not work is because (a) we were filtering out bad values on read, which messed up the lengths per amp; (b) we were filtering out bad values on read inconsistently because of the astropy bug, which messed up the lengths within an amp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I think I've convinced myself that we're fine, as the thing that concerned me (https://github.com/lsst/cp_pipe/blob/tickets/DM-38309/python/lsst/cp/pipe/ptc/cpSolvePtcTask.py#L438) is still being padded to keep the lengths consistent (https://github.com/lsst/cp_pipe/blob/main/python/lsst/cp/pipe/ptc/cpSolvePtcTask.py#L976)
All good from me on this ticket then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I left all of this part the same. It's only the re-padding in the dataset that I removed.

This also fixes various index errors, and simplifies the
serialization by avoiding the need for padding.
@erykoff erykoff merged commit 1885775 into main Mar 22, 2023
1 check passed
@erykoff erykoff deleted the tickets/DM-38309 branch March 22, 2023 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants