New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-38309: Update PhotonTransferCurveDataset to reduce memory and fix index errors. #254
Conversation
This is a workaround for astropy issue astropy/astropy#4708
python/lsst/ip/isr/ptcDataset.py
Outdated
if covSqrtWeights is None: | ||
covSqrtWeights = nanMatrix | ||
|
||
# This first one I'm not sure of the type; list of tuples is okay? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A list (or NumPy array, if you changed everything to NumPy arrays) of tuples should be fine, I would think. If so, don't forget to delete this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, that was a vestigial comment, I'll remove it.
try: | ||
expIdsUsed = [(exp1, exp2) for ((exp1, exp2), m) in zip(pairs, mask) if m] | ||
except ValueError: | ||
warnings.warn("The PTC file was written incorrectly; you should rerun the " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How could the PTC file have been written incorrectly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to decide if more details in this message would be useful for the user that encounters this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And why would re-running ptcSolve
fix it? I'm also thinking of the subset of cpPtc.yaml
that only runs isr
and cpPtcExtract
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file was incorrectly written because of the bug here which I fixed lsst/cp_pipe@a95b017
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice the first line in the original version of that commit, which meant it was appending a list of a list to a list, giving too many dimensions.
if m: | ||
expIdsUsed.append(pairList[0]) | ||
|
||
return expIdsUsed | ||
|
||
def getGoodAmps(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a docstring here too?
dataset.badAmps.append("C:0,1") | ||
self.assertTrue(dataset.getGoodAmps() == [amp for amp in self.ampNames if amp != "C:0,1"]) | ||
|
||
def test_ptcDataset_pre_dm38309(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need an analogous test for the partial PTC datasets created by cpPtcExtract
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are fine, in that the partial datasets didn't have the extra dimension. But it's also not something that is a common use-case anyway as those are not really user-facing.
Dictionary keyed by amp names containing the masked average of the | ||
means of the exposures in each flat pair. If needed, each array | ||
will be right-padded with np.nan to match the length of | ||
rawExpTimes. | ||
photoCharge : `dict`, [`str`, `list`] | ||
photoCharges : `dict`, [`str`, `np.ndarray`] | ||
Dictionary keyed by amp names containing the integrated photocharge | ||
for linearity calibration. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blow this line is the version. Does it need to be increased after the changes in this ticket? If so, will it affect backward compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went back and forth on that, because as far as I can tell none of the changes here affect backwards compatibility. We have no code that uses photoCharge
for example, and I haven't modified the table format. (Though I do worry that some old tables may have been written incorrectly, but I think they would only be wrong if they were serialized to fits, read, and re-written, which we don't do for the final dataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned that removing all of the padding will cause issues if one amplifier has a different length than the rest, but if this updated version does work in that case without all the mess, then that's fine.
Average of the means of the exposures in this pair. | ||
rawVar : `float`, optional | ||
Variance of the difference of the exposures in this pair. | ||
photoCharge : `float`, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now plural in the definition in __init__
, and so should be updated here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that these are all singular (intentionally) since they are single values to be set for a partial PTC dataset.
'COVARIANCES_MODEL': self.covariancesModel[ampName].ravel(), | ||
'COVARIANCES_SQRT_WEIGHTS': self.covariancesSqrtWeights[ampName].ravel(), | ||
'COVARIANCES_MODEL_NO_B': self.covariancesModelNoB[ampName].ravel(), | ||
'FINAL_VARS': self.finalVars[ampName], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe against amplifiers having different lengths, or does the promotion of these to numpy.array
allow astropy to ignore those differences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither, really. The construction of the dataset is such that for a given detector, all amps have the same input pairs and therefore are all filled the same (though it may be with nan
s). I believe that previously the reason that this did not work is because (a) we were filtering out bad values on read, which messed up the lengths per amp; (b) we were filtering out bad values on read inconsistently because of the astropy bug, which messed up the lengths within an amp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I think I've convinced myself that we're fine, as the thing that concerned me (https://github.com/lsst/cp_pipe/blob/tickets/DM-38309/python/lsst/cp/pipe/ptc/cpSolvePtcTask.py#L438) is still being padded to keep the lengths consistent (https://github.com/lsst/cp_pipe/blob/main/python/lsst/cp/pipe/ptc/cpSolvePtcTask.py#L976)
All good from me on this ticket then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I left all of this part the same. It's only the re-padding in the dataset that I removed.
This also fixes various index errors, and simplifies the serialization by avoiding the need for padding.
570b244
to
8730212
Compare
No description provided.