New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-41635: Add option to output model parameters and cov #37
Conversation
8068485
to
d684231
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some questions and pedantic comments, except for the one about checking covariance matrix symmetry which is actually important.
modelParams = pipeBase.connectionTypes.Output( | ||
doc="WCS parameter covariance.", | ||
name="gbdesAstrometricFit_modelParams", | ||
storageClass="ArrowNumpyDict", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About ArrowNumpyDicts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I don't see any other uses in the stack yet (probably because most devs also don't know it exists).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, now that I'm thinking about it, I remember that this task's outputs are what triggered Eli adding this storage class.
dtype=bool, | ||
doc=( | ||
"Save the parameters and covariance of the WCS model. Default to " | ||
"false because this can be very large." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if these needs to be added to the docstring necessarily but how many parameters is typical? Or is the possible range very large?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The range is really large because the number of model coefficients depends on the polynomial order for the two mappings, the number of detectors, and the number of visits. For a tract of HSC RC2, it's ~5000 parameters.
@@ -517,9 +539,17 @@ def run( | |||
outputWCSs = self._make_outputs(wcsf, inputVisitSummaries, exposureInfo) | |||
outputCatalog = wcsf.getOutputCatalog() | |||
starCatalog = wcsf.getStarCatalog() | |||
if self.config.saveModelParams: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer one-line ternary modelParams = self._output_model_params(wcsf) if self.config.saveModelParams else None
, but your taste may vary.
|
||
def _output_model_params(self, wcsf): | ||
"""Get the WCS model parameters and covariance and convert to a | ||
dictionary that will be readable as a pandas dataframe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming it returns a numpy ndarray by default (with column/row names?), so while I guess you can convert it to a dataframe why would you want to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ArrowNumpyDict
is a dictionary of numpy arrays. I guess I could just say that it can be converted to a table of some sort. Turning it into a dataframe is helpful for me for keeping track of what the parameters are, but maybe not the most efficient path.
@@ -1233,3 +1263,47 @@ def _make_outputs(self, wcsf, visitSummaryTables, exposureInfo): | |||
catalogs[visit] = catalog | |||
|
|||
return catalogs | |||
|
|||
def _output_model_params(self, wcsf): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd call this _compute_model_params
if it's returning them and not actually doing output itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just _get_model_params
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's just copying values and not really computing anything, ok.
modelParamDict = wcsf.mapCollection.getParamDict() | ||
modelCovariance = wcsf.getModelCovariance() | ||
|
||
modelParams = {"mapName": [], "coordinate": [], "parameter": [], "coefficientNumber": []} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't particularly matter with a short list but this could be a dict comp or defaultdict.
tests/test_gbdesAstrometricFit.py
Outdated
self.assertEqual(shape[0] + 4, shape[1]) | ||
# Check that covariance matrix is symmetric. | ||
covariance = (modelParams.iloc[:, 4:]).to_numpy() | ||
np.testing.assert_array_almost_equal(covariance, covariance.T) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why almost? Shouldn't they be identical, otherwise it's not a covariance matrix? Note that the default for this check is 1e-6 precision and that's way more than typical roundoff error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the test, the max absolute difference is 2e-19, but the max relative difference is 2e-6, so it doesn't pass (for 2 out of 10614564 elements).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to use assert_allclose instead then, since that allow for setting separate absolute and relative tolerances.
d684231
to
6d7d5ec
Compare
6d7d5ec
to
d09b7f4
Compare
No description provided.