Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-41635: Add option to output model parameters and cov #37

Merged
merged 1 commit into from Dec 13, 2023

Conversation

cmsaunders
Copy link
Collaborator

No description provided.

Copy link

@taranu taranu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions and pedantic comments, except for the one about checking covariance matrix symmetry which is actually important.

modelParams = pipeBase.connectionTypes.Output(
doc="WCS parameter covariance.",
name="gbdesAstrometricFit_modelParams",
storageClass="ArrowNumpyDict",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About ArrowNumpyDicts?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I don't see any other uses in the stack yet (probably because most devs also don't know it exists).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, now that I'm thinking about it, I remember that this task's outputs are what triggered Eli adding this storage class.

dtype=bool,
doc=(
"Save the parameters and covariance of the WCS model. Default to "
"false because this can be very large."
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if these needs to be added to the docstring necessarily but how many parameters is typical? Or is the possible range very large?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The range is really large because the number of model coefficients depends on the polynomial order for the two mappings, the number of detectors, and the number of visits. For a tract of HSC RC2, it's ~5000 parameters.

@@ -517,9 +539,17 @@ def run(
outputWCSs = self._make_outputs(wcsf, inputVisitSummaries, exposureInfo)
outputCatalog = wcsf.getOutputCatalog()
starCatalog = wcsf.getStarCatalog()
if self.config.saveModelParams:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer one-line ternary modelParams = self._output_model_params(wcsf) if self.config.saveModelParams else None, but your taste may vary.


def _output_model_params(self, wcsf):
"""Get the WCS model parameters and covariance and convert to a
dictionary that will be readable as a pandas dataframe.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming it returns a numpy ndarray by default (with column/row names?), so while I guess you can convert it to a dataframe why would you want to?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ArrowNumpyDict is a dictionary of numpy arrays. I guess I could just say that it can be converted to a table of some sort. Turning it into a dataframe is helpful for me for keeping track of what the parameters are, but maybe not the most efficient path.

@@ -1233,3 +1263,47 @@ def _make_outputs(self, wcsf, visitSummaryTables, exposureInfo):
catalogs[visit] = catalog

return catalogs

def _output_model_params(self, wcsf):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call this _compute_model_params if it's returning them and not actually doing output itself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just _get_model_params?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's just copying values and not really computing anything, ok.

modelParamDict = wcsf.mapCollection.getParamDict()
modelCovariance = wcsf.getModelCovariance()

modelParams = {"mapName": [], "coordinate": [], "parameter": [], "coefficientNumber": []}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't particularly matter with a short list but this could be a dict comp or defaultdict.

python/lsst/drp/tasks/gbdesAstrometricFit.py Show resolved Hide resolved
self.assertEqual(shape[0] + 4, shape[1])
# Check that covariance matrix is symmetric.
covariance = (modelParams.iloc[:, 4:]).to_numpy()
np.testing.assert_array_almost_equal(covariance, covariance.T)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why almost? Shouldn't they be identical, otherwise it's not a covariance matrix? Note that the default for this check is 1e-6 precision and that's way more than typical roundoff error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the test, the max absolute difference is 2e-19, but the max relative difference is 2e-6, so it doesn't pass (for 2 out of 10614564 elements).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to use assert_allclose instead then, since that allow for setting separate absolute and relative tolerances.

@cmsaunders cmsaunders merged commit b73e2dd into main Dec 13, 2023
3 checks passed
@cmsaunders cmsaunders deleted the tickets/DM-41635 branch December 13, 2023 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants