
DM-42927: Update cp_verify connections/classes/outputs for analysis_tools #46

Merged: 12 commits merged into main from tickets/DM-42927 on May 1, 2024

Conversation

czwa (Collaborator) commented Apr 12, 2024:

This reworks the output data products to use ArrowAstropy tables instead of YAML, but does not fully remove the existing YAML outputs (I've added DM-43893 to handle that).

czwa requested a review from erykoff April 12, 2024 22:36
erykoff (Contributor) left a comment:

I have done a first pass with a bunch of comments. But more importantly, I don't have a picture of how this is supposed to work, let alone how it works. A discussion may be in order, plus more documentation.

python/lsst/cp/verify/mergeResults.py (two resolved comment threads)
```python
defaults[column] = max(defaults[column],
                       *[table[column].shape[1] for table in inputResults])

# Pad vectors shorter than this:
```
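For orientation, here is a minimal, self-contained sketch of the kind of column padding this snippet drives, assuming each table stores per-amp vectors as 2-d columns; the names are illustrative, not the actual cp_verify code.

```python
import numpy as np
from astropy.table import Table

def pad_column(values, width, fill=np.nan):
    # Pad each row of a 2-d vector column out to `width` entries.
    padded = np.full((len(values), width), fill)
    for i, row in enumerate(values):
        padded[i, :len(row)] = row
    return padded

t1 = Table({"profile": [[1.0, 2.0], [3.0, 4.0]]})   # width 2
t2 = Table({"profile": [[5.0, 6.0, 7.0]]})          # width 3
width = max(t["profile"].shape[1] for t in (t1, t2))
for t in (t1, t2):
    t["profile"] = pad_column(t["profile"], width)  # both now width 3
```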
erykoff (Contributor) commented:

This padding stuff always makes me nervous. I actually don't know what vectors are being padded here (more comments would be helpful) or why some would be shorter. If it's something like a missing amp, then padding may be wrong, because we don't know which amps are missing (e.g., padding was causing off-by-one errors in the PTC previously).

czwa (Collaborator, Author) commented Apr 17, 2024:

I think this is doing the wrong thing in at least some cases, but I haven't been able to debug that yet (tracked in DM-43877: https://rubinobs.atlassian.net/browse/DM-43877). For more standard vectors (the serial and parallel profiles, for example) this seems to be working correctly.

erykoff (Contributor) commented Apr 17, 2024:

What is going into this table exactly? And what are the rows that need padding?
If the standard vectors don't need padding, we shouldn't have padding here at all. I'd rather let something crash here on unexpected input than pass on garbage.

czwa (Collaborator, Author) commented:

It's possible this isn't needed. I've added a raise here and am seeing what ci_cpp yields. If that passes, then the padding is fully unnecessary, and I'll remove it. Otherwise, I'll see what is causing the lengths to differ.
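A sketch of what that fail-fast check could look like (illustrative names; the actual raise added on the branch may differ):

```python
import numpy as np
from astropy.table import Table

def check_vector_widths(tables, column):
    # Refuse to merge if the vector columns disagree in width,
    # rather than silently padding the short ones.
    widths = {t[column].shape[1] for t in tables}
    if len(widths) > 1:
        raise ValueError(f"Inconsistent '{column}' widths: {sorted(widths)}")

t1 = Table({"profile": np.ones((2, 4))})
t2 = Table({"profile": np.ones((3, 4))})
check_vector_widths([t1, t2], "profile")  # same width: no exception
```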

czwa (Collaborator, Author) commented:

ci_cpp ran without complaint, so I think this means the padding can be removed.

erykoff (Contributor) commented:

I would not be shocked to find something around here crashing when we run the full camera, but then we can/should directly fix whatever is giving inconsistent outputs. So removing the padding is the right thing to do.

python/lsst/cp/verify/mergeResults.py (resolved comment thread)
```python
def repackStats(self, statisticsDict, dimensions):
    """Repack information into flat tables.

    This method should be redefined in subclasses.
```
erykoff (Contributor) commented:

If this should be redefined, why isn't it an ABC with no code?

czwa (Collaborator, Author) commented:

Another dropped docstring. This is a basic repacker for the simplest cases; more complicated cases will need to replace it. The docstring has been updated here and in verifyStats.py as well.
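To make that division of labor concrete, here is a hypothetical sketch of what a "basic repacker" for the simplest case might do; the dict layout and column names are assumptions, not the real cp_verify schema.

```python
from astropy.table import Table

def repack_stats(statistics_dict, detector_id):
    # Flatten a nested per-amplifier statistics dict into one flat
    # table row per amplifier.
    rows = []
    for amp_name, amp_stats in statistics_dict["AMP"].items():
        row = {"detector": detector_id, "amplifier": amp_name}
        row.update(amp_stats)  # scalar stats become columns
        rows.append(row)
    return Table(rows=rows)

stats = {"AMP": {"C00": {"MEAN": 100.2, "NOISE": 5.1},
                 "C01": {"MEAN": 99.8, "NOISE": 5.3}}}
table = repack_stats(stats, detector_id=42)  # 2 rows, 4 columns
```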

```python
def repackStats(self, statisticsDict, dimensions):
    """Repack information into flat tables.

    This method should be redefined in subclasses.
```
erykoff (Contributor) commented:

If this is a subclass redefinition, it doesn't actually need a docstring. You can add a comment `# docstring inherited` to make that clear.
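For illustration, the convention looks like this (stand-in classes, not the real cp_verify tasks):

```python
class Base:
    def repackStats(self, statisticsDict, dimensions):
        """Repack information into flat tables."""
        raise NotImplementedError

class Derived(Base):
    def repackStats(self, statisticsDict, dimensions):
        # docstring inherited
        return {}
```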

czwa (Collaborator, Author) commented Apr 19, 2024:

I've added a short introduction to how cp_verify works to the documentation directory. It builds, so I have not made any significant formatting errors in that document, but I do not intend it to be a final draft. As cp_verify is not currently part of the documentation build, I've added a new ticket, DM-43993, to finalize and correct this draft, add the remaining missing documentation, and ensure that cp_verify appears on pipelines.lsst.io. I'm happy to rewrite this draft for clarity and to fix factual errors, but would prefer to leave formatting and fine details for that future ticket.


* ``metadataStatKeywords``: these options define the tests that will be run on the ``task_metadata`` input. These tests need to be implemented by subclasses.

* ``catalogStatKeywords``: these options define the tests that will be run on the input catalog data. As before, these tests need to be implemented by subclasses.
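As a rough illustration of how a subclass could declare the two keyword options above, assuming the usual lsst.pex.config pattern (the field types, defaults, and docs here are guesses, not the actual cp_verify config):

```python
import lsst.pex.config as pexConfig

class HypotheticalVerifyStatsConfig(pexConfig.Config):
    metadataStatKeywords = pexConfig.DictField(
        keytype=str, itemtype=str, default={},
        doc="Metadata keywords to test, mapped to the level they apply to.",
    )
    catalogStatKeywords = pexConfig.DictField(
        keytype=str, itemtype=str, default={},
        doc="Catalog statistics to test, mapped to the level they apply to.",
    )
```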
erykoff (Contributor) commented:

What sort of catalogs are these? And where are they input from?

czwa (Collaborator, Author) commented:

These are catalogs from CharacterizeImageTask. The main use case is the brighter-fatter verification, where we expect that objects will have smaller sizes with BF correction than without. The text has been updated to indicate that a prior step in the pipeline should generate these, with a pointer to CharacterizeImageTask.

Verifying calibrations with cp_verify
#####################################

`cp_verify` is designed to make the verification of calibrations against the criteria listed in `DMTN-101 <https://dmtn-101.lsst.io>`_ as simple as possible. However, this has resulted in the tasks being rather abstract, with no clear connection between steps. This document should help explain how the tests are run, how those results are stored, and how they are passed to `analysis_tools` for conversion into metrics and plots.
erykoff (Contributor) commented:

I know you don't want a full review of formatting and everything, but the standard is that each sentence gets its own line in the RST doc files. This is important to do before the first merge so that subsequent changes are easier to read in the GitHub diff.

czwa (Collaborator, Author) commented:

Done here, so the actual documentation ticket will have cleaner diffs.

* ``catalogStatKeywords``: these options define the tests that will be run on the input catalog data. As before, these tests need to be implemented by subclasses.

The final two configuration options that control the behavior of the verification task are ``useIsrStatistics``, which defines whether results from the ``isrStatistics`` input should be considered, and ``hasMatrixCatalog``, which indicates whether a matrix catalog output will be constructed.

erykoff (Contributor) commented:

Even after reading the following paragraph, I'm still not sure what a "matrix" is. I think a concrete example of what a matrix quantity is would help a lot.

czwa (Collaborator, Author) commented:

Updated text in the outputMatrix description:

An example use case for this kind of matrix catalog is the crosstalk coefficients, which would be represented with columns for the source and target detector and amplifier, with the value column containing the coefficient with which the source pixels imprint onto the target pixels.
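A minimal sketch of that layout (the column names are illustrative, not the actual cp_verify schema):

```python
from astropy.table import Table

# Each row records how strongly pixels in the source amp imprint
# onto pixels in the target amp.
matrix = Table({
    "source_detector": [1, 1],
    "source_amplifier": ["C00", "C00"],
    "target_detector": [1, 1],
    "target_amplifier": ["C01", "C02"],
    "coefficient": [1.2e-4, 3.5e-5],
})
```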

```python
def exposureStatistics(self, statisticsDict):

@staticmethod
def mergeTable(inputResults, newStats=None):
    """Merge input tables, padding columns as needed.
```
erykoff (Contributor) commented:

I don't think we're doing padding any more.
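With the padding gone, the merge reduces to a plain vertical stack of consistently-shaped tables; a sketch of the idea, not the cp_verify method itself:

```python
from astropy.table import Table, vstack

merged = vstack([
    Table({"amp": ["C00"], "mean": [100.2]}),
    Table({"amp": ["C01"], "mean": [99.8]}),
])  # one table, two rows
```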

```python
def pack(self, statisticsDict, dimensions, outKey):
    """Repack information into flat tables.

    This method should be redefined in subclasses.
```
erykoff (Contributor) commented:

Should or may?

czwa (Collaborator, Author) commented:

Comment removed here; retained as "should" in the base class (newly added, as this method was skipped), with the clarification that it applies only if new stats have been calculated that need to be packed.

```diff
@@ -8,5 +8,6 @@ setupRequired(pex_exceptions)
 setupRequired(pipe_tasks)
 setupRequired(utils)
 setupRequired(pipe_base)
+setupRequired(analysis_tools)
```
erykoff (Contributor) commented:

Is this an okay dependency direction?

czwa (Collaborator, Author) commented:

I believe so. analysis_tools depends on a subset of the cp_verify dependencies, in addition to daf_butler, skymap, and geom. Those three only depend on even more fundamental packages, so I think this is safe from circular issues.

czwa merged commit 157a059 into main on May 1, 2024
3 checks passed
czwa deleted the tickets/DM-42927 branch on May 1, 2024 at 23:52