DM-42927: Update cp_verify connections/classes/outputs for analysis_tools #46
Conversation
Try to reduce complexity by making one code class with multiple subclasses for particular dimension combinations.
I have done a first pass with a bunch of comments. But more importantly, I don't have a picture of how this is supposed to work, let alone how it works. A discussion may be in order, plus more documentation.
defaults[column] = max(defaults[column],
                       *[table[column].shape[1] for table in inputResults])

# Pad vectors shorter than this:
This padding stuff always makes me nervous. I actually don't know what vectors are being padded here (more comments would be helpful) and why some would be shorter. If it's something like a missing amp then padding may be wrong because we don't know which are the missing amps (e.g. padding was causing off-by-one errors in the ptc previously).
I think this is doing the wrong thing in at least some cases, but I haven't been able to debug that yet (this ticket: https://rubinobs.atlassian.net/browse/DM-43877). For more standard vectors (the serial and parallel profile, for example) this seems to be working correctly.
What is going into this table exactly? And what are the rows that need padding?
If the standard vectors don't need padding we shouldn't have padding here at all. I'd rather let something crash here with unexpected input than pass on garbage.
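The fail-fast alternative being argued for here can be sketched with a small stdlib-only helper. This is illustrative only, not the actual cp_verify code; the function name, signature, and the representation of tables as dicts of row-vector lists are all assumptions:

```python
def check_column_lengths(tables, column):
    """Fail fast on inconsistent vector lengths instead of padding.

    ``tables`` is a list of dicts mapping column names to lists of row
    vectors.  Hypothetical helper for illustration; not cp_verify API.
    """
    lengths = {len(vec) for table in tables for vec in table[column]}
    if len(lengths) > 1:
        # Crash on unexpected input rather than pass on garbage.
        raise ValueError(
            f"Inconsistent lengths for column {column!r}: {sorted(lengths)}"
        )
    return lengths.pop()


# Consistent inputs pass through; a mismatch raises instead of padding.
tables = [{"profile": [[0.0] * 8, [0.0] * 8]}, {"profile": [[1.0] * 8]}]
print(check_column_lengths(tables, "profile"))  # → 8
```

The point of raising here is that a padded column silently misassigns values when (for example) an amp is missing, whereas an exception identifies exactly which input was malformed.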
It's possible this isn't needed. I've added a raise here, and am seeing what ci_cpp yields. If that passes, then the padding is fully unnecessary, and I'll remove it. Otherwise, I'll see what is causing the lengths to differ.
ci_cpp ran without complaint, so I think this means the padding can be removed.
I would not be shocked to find something around here crashing when we run the full camera, but then we can/should directly fix whatever is giving inconsistent outputs. So removing the padding is the right thing to do.
python/lsst/cp/verify/verifyCalib.py
Outdated
def repackStats(self, statisticsDict, dimensions):
    """Repack information into flat tables.

    This method should be redefined in subclasses.
If this should be redefined, why isn't it an ABC with no code?
Another dropped docstring. This is a basic repacker for the simplest cases, and more complicated cases will need to replace it. Docstring updated here and in verifyStats.py as well.
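The "basic repacker for the simplest cases" idea can be sketched as flattening a nested statistics dict into one row per amplifier. This is a hypothetical stdlib-only sketch; the real repackStats signature and the actual structure of statisticsDict may differ:

```python
def repack_stats(statistics_dict, dimensions):
    """Flatten {amp_name: {stat_name: value}} into one row per amp.

    Illustrative sketch of a basic repacker; not the cp_verify code.
    """
    rows = []
    for amp_name, stats in statistics_dict.items():
        row = dict(dimensions)      # e.g. {"detector": 42}
        row["amplifier"] = amp_name
        row.update(stats)           # one column per scalar statistic
        rows.append(row)
    return rows


rows = repack_stats(
    {"C00": {"MEAN": 100.0, "STDEV": 5.0},
     "C01": {"MEAN": 101.0, "STDEV": 4.0}},
    {"detector": 42},
)
print(len(rows))  # → 2
```

Each row dict corresponds to one row of the flat output table, with the dataId dimensions repeated in every row so the tables can later be concatenated across detectors and exposures.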
def repackStats(self, statisticsDict, dimensions):
    """Repack information into flat tables.

    This method should be redefined in subclasses.
If this is a subclass redefinition, it doesn't actually need a docstring. You can add a comment `# docstring inherited` to make that clear.
I've added a short introduction to how cp_verify works to the documentation directory. This builds, so I have not made any significant errors in the formatting of that document, but I do not intend for this to be a final draft. As cp_verify is not currently part of the documentation build, I've added a new ticket DM-43993 that will finalize and correct this draft, add the remaining missing documentation, and ensure that cp_verify appears on pipelines.lsst.io. I'm happy to rewrite this draft to fix clarity issues and factual errors, but would prefer to leave formatting and fine details for that future ticket.
* ``metadataStatKeywords``: these options define the tests that will be run on the input task metadata. These tests need to be implemented by subclasses.

* ``catalogStatKeywords``: these options define the tests that will be run on the input catalog data. As before, these tests need to be implemented by subclasses.
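A hedged sketch of what setting these options might look like in a config override. The field names come from the text above, but the config object and the value structures assigned to the keyword fields are assumptions for illustration only:

```python
from types import SimpleNamespace

# Hypothetical config object standing in for the task config; the
# keyword-dict structures below are assumed, not the real schema.
config = SimpleNamespace()
config.metadataStatKeywords = {"RESIDUAL STDEV": "amp"}
config.catalogStatKeywords = {"brighterFatter": "detector"}
config.useIsrStatistics = True
config.hasMatrixCatalog = False

print(config.useIsrStatistics)  # → True
```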
What sort of catalogs are these? And where are they input from?
These are catalogs from `CharacterizeImageTask`. The main case for this is in the brighter-fatter verification, where we expect that objects will have smaller sizes with BF correction than without. Text updated to indicate that a prior step in the pipeline should generate these, with a pointer to `CharacterizeImageTask`.
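The brighter-fatter expectation described above (corrected sizes smaller than uncorrected) can be sketched as a simple comparison of median sizes. This is an illustrative stdlib-only sketch under assumed inputs; the real test would operate on `CharacterizeImageTask` catalogs, not bare lists:

```python
def median(values):
    """Median of a non-empty list (stdlib-only helper)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])


def bf_correction_shrinks(sizes_uncorrected, sizes_corrected):
    """Check the brighter-fatter expectation that BF-corrected object
    sizes are smaller than uncorrected ones.  Illustrative only."""
    return median(sizes_corrected) < median(sizes_uncorrected)


print(bf_correction_shrinks([2.1, 2.2, 2.3], [2.0, 2.1, 2.2]))  # → True
```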
Verifying calibrations with cp_verify
#####################################

`cp_verify` is designed to make the verification of calibrations against the criteria listed in `DMTN-101 <https://dmtn-101.lsst.io>`_ as simple as possible. However, this has resulted in the tasks being rather abstract, with no clear connection between steps. This document should help explain how the tests are run, how those results are stored, and how they are passed to `analysis_tools` for conversion into metrics and plots.
I know you don't want a full review of formatting and everything, but the standard is that each sentence gets its own line in the rst doc files. This is important to do before the first merge so that subsequent changes are easier to read in the github diff.
Done here so the actual documentation ticket will have the cleaner diffs.
* ``catalogStatKeywords``: these options define the tests that will be run on the input catalog data. As before, these tests need to be implemented by subclasses.

The final two configuration options that control the behavior of the verification task are the ``useIsrStatistics`` configuration option, which defines if there are results from the ``isrStatistics`` input that should be considered, and the ``hasMatrixCatalog`` option, which indicates if a matrix catalog output will be constructed.
Even after reading the following paragraph, I'm still not sure what a "matrix" is. I think a concrete example of what a matrix quantity is would help a lot.
Updated text in the `outputMatrix` description:

An example use case for this kind of matrix catalog is the crosstalk coefficients, which would be represented with columns for source and target detector and amplifier, with the value column containing the coefficient at which the source pixels imprint onto the target pixels.
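The crosstalk example above can be sketched as flattening a coefficient mapping into long-format rows. The column names and the keying of the input dict are illustrative assumptions, not the actual cp_verify matrix-catalog schema:

```python
def matrix_rows(coefficients):
    """Flatten {(src_det, src_amp, tgt_det, tgt_amp): value} into
    long-format rows for a 'matrix' catalog.

    Hypothetical sketch; column names are assumed for illustration.
    """
    return [
        {
            "sourceDetector": src_det,
            "sourceAmplifier": src_amp,
            "targetDetector": tgt_det,
            "targetAmplifier": tgt_amp,
            "value": value,        # e.g. a crosstalk coefficient
        }
        for (src_det, src_amp, tgt_det, tgt_amp), value in coefficients.items()
    ]


rows = matrix_rows({(0, "C00", 0, "C01"): 1.2e-4})
print(rows[0]["value"])  # → 0.00012
```

Storing the matrix in this long (one-row-per-coefficient) form keeps the output a flat table, which is what the downstream `analysis_tools` conversion expects.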
def exposureStatistics(self, statisticsDict):
@staticmethod
def mergeTable(inputResults, newStats=None):
    """Merge input tables, padding columns as needed.
I don't think we're doing padding any more.
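With padding removed, a merge like this reduces to a strict concatenation that refuses mismatched column sets. The following is a sketch of that no-padding behaviour under assumed inputs (tables as lists of row dicts), not the actual `mergeTable` implementation:

```python
def merge_tables(input_results, new_stats=None):
    """Concatenate row-dict tables, requiring identical column sets
    rather than padding mismatches.

    Illustrative sketch only; not the cp_verify mergeTable code.
    """
    tables = list(input_results)
    if new_stats is not None:
        tables.append(new_stats)
    merged = []
    columns = None
    for table in tables:
        for row in table:
            if columns is None:
                columns = set(row)
            elif set(row) != columns:
                # Mismatched schemas indicate an upstream bug; raise
                # instead of padding with placeholder values.
                raise ValueError(
                    f"Column mismatch: {sorted(set(row) ^ columns)}"
                )
            merged.append(row)
    return merged


merged = merge_tables(
    [[{"amp": "C00", "MEAN": 1.0}], [{"amp": "C01", "MEAN": 2.0}]]
)
print(len(merged))  # → 2
```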
python/lsst/cp/verify/verifyFlat.py
Outdated
def pack(self, statisticsDict, dimensions, outKey):
    """Repack information into flat tables.

    This method should be redefined in subclasses.
Should or may?
Comment removed here, retained as "should" in the base class (newly added, as this method was skipped), with the clarification that that's only if new stats have been calculated that need to be `pack`ed.
@@ -8,5 +8,6 @@ setupRequired(pex_exceptions)
setupRequired(pipe_tasks)
setupRequired(utils)
setupRequired(pipe_base)
setupRequired(analysis_tools)
Is this an okay dependency direction?
I believe so. `analysis_tools` depends on a subset of the `cp_verify` dependencies, in addition to `daf_butler`, `skymap`, and `geom`. Those three only depend on even more fundamental packages, so I think this is safe from circular issues.
This reworks the output data products to use ArrowAstropy tables instead of yaml, but does not fully remove the existing yaml (which I've added DM-43893 to handle).