
DM-42927: Update cp_verify connections/classes/outputs for analysis_tools #46

Merged: 12 commits merged into main from tickets/DM-42927 on May 1, 2024

Conversation

czwa (Collaborator) commented Apr 12, 2024:

This reworks the output data products to use ArrowAstropy tables instead of YAML, but does not fully remove the existing YAML outputs (I've added DM-43893 to handle that).

czwa requested a review from erykoff April 12, 2024 22:36
erykoff (Contributor) left a comment:

I have done a first pass with a bunch of comments. But more importantly, I don't have a picture of how this is supposed to work, let alone how it works. A discussion may be in order, plus more documentation.

python/lsst/cp/verify/mergeResults.py (two resolved comment threads)
```python
defaults[column] = max(defaults[column],
                       *[table[column].shape[1] for table in inputResults])

# Pad vectors shorter than this:
```
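For orientation, here is a minimal, self-contained sketch of the kind of column padding this snippet drives, assuming each table stores per-amp vectors as 2-d columns; the names are illustrative, not the actual cp_verify code.

```python
import numpy as np
from astropy.table import Table

def pad_column(values, width, fill=np.nan):
    # Pad each row of a 2-d vector column out to `width` entries.
    padded = np.full((len(values), width), fill)
    for i, row in enumerate(values):
        padded[i, :len(row)] = row
    return padded

t1 = Table({"profile": [[1.0, 2.0], [3.0, 4.0]]})   # width 2
t2 = Table({"profile": [[5.0, 6.0, 7.0]]})          # width 3
width = max(t["profile"].shape[1] for t in (t1, t2))
for t in (t1, t2):
    t["profile"] = pad_column(t["profile"], width)  # both now width 3
```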
erykoff (Contributor) commented:

This padding stuff always makes me nervous. I actually don't know what vectors are being padded here (more comments would be helpful) or why some would be shorter. If it's something like a missing amp, then padding may be wrong, because we don't know which amps are missing (e.g., padding was causing off-by-one errors in the PTC previously).

czwa (Collaborator, Author) commented Apr 17, 2024:

I think this is doing the wrong thing in at least some cases, but I haven't been able to debug that yet (tracked in DM-43877: https://rubinobs.atlassian.net/browse/DM-43877). For more standard vectors (the serial and parallel profiles, for example) this seems to be working correctly.

erykoff (Contributor) commented Apr 17, 2024:

What is going into this table exactly? And what are the rows that need padding?
If the standard vectors don't need padding, we shouldn't have padding here at all. I'd rather let something crash here on unexpected input than pass on garbage.

czwa (Collaborator, Author) commented:

It's possible this isn't needed. I've added a raise here and am seeing what ci_cpp yields. If that passes, then the padding is fully unnecessary, and I'll remove it. Otherwise, I'll see what is causing the lengths to differ.
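A sketch of what that fail-fast check could look like (illustrative names; the actual raise added on the branch may differ):

```python
import numpy as np
from astropy.table import Table

def check_vector_widths(tables, column):
    # Refuse to merge if the vector columns disagree in width,
    # rather than silently padding the short ones.
    widths = {t[column].shape[1] for t in tables}
    if len(widths) > 1:
        raise ValueError(f"Inconsistent '{column}' widths: {sorted(widths)}")

t1 = Table({"profile": np.ones((2, 4))})
t2 = Table({"profile": np.ones((3, 4))})
check_vector_widths([t1, t2], "profile")  # same width: no exception
```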

czwa (Collaborator, Author) commented:

ci_cpp ran without complaint, so I think this means the padding can be removed.

erykoff (Contributor) commented:

I would not be shocked to find something around here crashing when we run the full camera, but then we can/should directly fix whatever is giving inconsistent outputs. So removing the padding is the right thing to do.

python/lsst/cp/verify/mergeResults.py (resolved comment thread)
```python
def repackStats(self, statisticsDict, dimensions):
    """Repack information into flat tables.

    This method should be redefined in subclasses.
```
erykoff (Contributor) commented:

If this should be redefined, why isn't it an ABC with no code?

czwa (Collaborator, Author) commented:

Another dropped docstring. This is a basic repacker for the simplest cases; more complicated cases will need to replace it. The docstring has been updated here and in verifyStats.py as well.
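To make that division of labor concrete, here is a hypothetical sketch of what a "basic repacker" for the simplest case might do; the dict layout and column names are assumptions, not the real cp_verify schema.

```python
from astropy.table import Table

def repack_stats(statistics_dict, detector_id):
    # Flatten a nested per-amplifier statistics dict into one flat
    # table row per amplifier.
    rows = []
    for amp_name, amp_stats in statistics_dict["AMP"].items():
        row = {"detector": detector_id, "amplifier": amp_name}
        row.update(amp_stats)  # scalar stats become columns
        rows.append(row)
    return Table(rows=rows)

stats = {"AMP": {"C00": {"MEAN": 100.2, "NOISE": 5.1},
                 "C01": {"MEAN": 99.8, "NOISE": 5.3}}}
table = repack_stats(stats, detector_id=42)  # 2 rows, 4 columns
```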

```python
def repackStats(self, statisticsDict, dimensions):
    """Repack information into flat tables.

    This method should be redefined in subclasses.
```
erykoff (Contributor) commented:

If this is a subclass redefinition, it doesn't actually need a docstring. You can add a comment `# docstring inherited` to make that clear.
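For illustration, the convention looks like this (stand-in classes, not the real cp_verify tasks):

```python
class Base:
    def repackStats(self, statisticsDict, dimensions):
        """Repack information into flat tables."""
        raise NotImplementedError

class Derived(Base):
    def repackStats(self, statisticsDict, dimensions):
        # docstring inherited
        return {}
```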

czwa (Collaborator, Author) commented Apr 19, 2024:

I've added a short introduction to how cp_verify works to the documentation directory. It builds, so I have not made any significant formatting errors in that document, but I do not intend it to be a final draft. As cp_verify is not currently part of the documentation build, I've added a new ticket, DM-43993, to finalize and correct this draft, add the remaining missing documentation, and ensure that cp_verify appears on pipelines.lsst.io. I'm happy to rewrite this draft for clarity and to fix factual errors, but would prefer to leave formatting and fine details for that future ticket.


* ``metadataStatKeywords``: these options define the tests that will be run on the ``task_metadata`` input. These tests need to be implemented by subclasses.

* ``catalogStatKeywords``: these options define the tests that will be run on the input catalog data. As before, these tests need to be implemented by subclasses.
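As a rough illustration of how a subclass could declare the two keyword options above, assuming the usual lsst.pex.config pattern (the field types, defaults, and docs here are guesses, not the actual cp_verify config):

```python
import lsst.pex.config as pexConfig

class HypotheticalVerifyStatsConfig(pexConfig.Config):
    metadataStatKeywords = pexConfig.DictField(
        keytype=str, itemtype=str, default={},
        doc="Metadata keywords to test, mapped to the level they apply to.",
    )
    catalogStatKeywords = pexConfig.DictField(
        keytype=str, itemtype=str, default={},
        doc="Catalog statistics to test, mapped to the level they apply to.",
    )
```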
erykoff (Contributor) commented:

What sort of catalogs are these? And where are they input from?

czwa (Collaborator, Author) commented:

These are catalogs from CharacterizeImageTask. The main use case is the brighter-fatter verification, where we expect that objects will have smaller sizes with BF correction than without. The text has been updated to indicate that a prior step in the pipeline should generate these, with a pointer to CharacterizeImageTask.

Verifying calibrations with cp_verify
#####################################

`cp_verify` is designed to make the verification of calibrations against the criteria listed in `DMTN-101 <https://dmtn-101.lsst.io>`_ as simple as possible. However, this has resulted in the tasks being rather abstract, with no clear connection between steps. This document should help explain how the tests are run, how those results are stored, and how they are passed to `analysis_tools` for conversion into metrics and plots.
erykoff (Contributor) commented:

I know you don't want a full review of formatting and everything, but the standard is that each sentence gets its own line in the RST doc files. This is important to do before the first merge so that subsequent changes are easier to read in the GitHub diff.

czwa (Collaborator, Author) commented:

Done here, so the actual documentation ticket will have cleaner diffs.

* ``catalogStatKeywords``: these options define the tests that will be run on the input catalog data. As before, these tests need to be implemented by subclasses.

The final two configuration options that control the behavior of the verification task are ``useIsrStatistics``, which defines whether results from the ``isrStatistics`` input should be considered, and ``hasMatrixCatalog``, which indicates whether a matrix catalog output will be constructed.

erykoff (Contributor) commented:

Even after reading the following paragraph, I'm still not sure what a "matrix" is. I think a concrete example of what a matrix quantity is would help a lot.

czwa (Collaborator, Author) commented:

Updated text in the outputMatrix description:

An example use case for this kind of matrix catalog is the crosstalk coefficients, which would be represented with columns for the source and target detector and amplifier, with the value column containing the coefficient with which the source pixels imprint onto the target pixels.
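A minimal sketch of that layout (the column names are illustrative, not the actual cp_verify schema):

```python
from astropy.table import Table

# Each row records how strongly pixels in the source amp imprint
# onto pixels in the target amp.
matrix = Table({
    "source_detector": [1, 1],
    "source_amplifier": ["C00", "C00"],
    "target_detector": [1, 1],
    "target_amplifier": ["C01", "C02"],
    "coefficient": [1.2e-4, 3.5e-5],
})
```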

```python
def exposureStatistics(self, statisticsDict):

@staticmethod
def mergeTable(inputResults, newStats=None):
    """Merge input tables, padding columns as needed.
```
erykoff (Contributor) commented:

I don't think we're doing padding any more.
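With the padding gone, the merge reduces to a plain vertical stack of consistently-shaped tables; a sketch of the idea, not the cp_verify method itself:

```python
from astropy.table import Table, vstack

merged = vstack([
    Table({"amp": ["C00"], "mean": [100.2]}),
    Table({"amp": ["C01"], "mean": [99.8]}),
])  # one table, two rows
```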

```python
def pack(self, statisticsDict, dimensions, outKey):
    """Repack information into flat tables.

    This method should be redefined in subclasses.
```
erykoff (Contributor) commented:

Should or may?

czwa (Collaborator, Author) commented:

Comment removed here; retained as "should" in the base class (newly added, as this method was skipped), with the clarification that it applies only if new stats have been calculated that need to be packed.

```diff
@@ -8,5 +8,6 @@ setupRequired(pex_exceptions)
 setupRequired(pipe_tasks)
 setupRequired(utils)
 setupRequired(pipe_base)
+setupRequired(analysis_tools)
```
erykoff (Contributor) commented:

Is this an okay dependency direction?

czwa (Collaborator, Author) commented:

I believe so. analysis_tools depends on a subset of the cp_verify dependencies, in addition to daf_butler, skymap, and geom. Those three only depend on even more fundamental packages, so I think this is safe from circular issues.

czwa merged commit 157a059 into main on May 1, 2024
3 checks passed
czwa deleted the tickets/DM-42927 branch on May 1, 2024 at 23:52