New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-25329: Fix LATISS instrument pipeline definitions for cp_pipe #228
Conversation
cpFlatMeasure: | ||
class: lsst.cp.pipe.cpFlatNormTask.CpFlatMeasureTask | ||
config: | ||
connections.inputExp: 'cpFlatProc' | ||
connections.outputStats: 'flatStats' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why did you add this block? It seems it was not there before. I guess this is the part that is defining the task that actually produces the master flat, correct? Or is the name indicating that this is (or will be) measuring some statistic on an already created master flat?
- The connections seem to be defining inputs and outputs. Where can the definitions be found? What is, for example,
cpFlatProc
("processed"?), and where can that be looked up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new block is needed to pick up the cp_pipe changes in how flats are made. Previously, the ISR amplifier statistics written to the isr_metadata files were being used. However, for HSC, this is incorrect, because the vignetting at the edge of the field of view skews the statistics measured there. The CpFlatMeasureTask was added to apply vignetting and remeasure the amp and detector level statistics.
The flat generation process follows the task order listed in the pipeline: IsrTask processes the raw exposures in parallel into cpFlatProc exposures, according to the config options listed. CpFlatMeasureTask measures and writes out statistics (again in parallel) for each amp/detector of the inputs. All of those statistics are then read by CpFlatNormTask (a single process for the flat construction process) to solve for a set of exposure-level scale factors that need to be applied to the cpFlatProc images before coadding them into the final flat.
The connection names are somewhat arbitrary and only need to be consistent between outputs/inputs of tasks in the pipeline. The definitions are defined in the Connections class associated with the Config class associated with the Task: https://github.com/lsst/cp_pipe/blob/master/python/lsst/cp/pipe/cpFlatNormTask.py#L35
Having flexible dataset definitions with connections means we don't need to worry about mappers or defining datasets in a yaml file in gen3.
cpFlatNorm: | ||
class: lsst.cp.pipe.cpFlatNormTask.CpFlatNormalizationTask | ||
config: | ||
connections.inputExps: 'isr_metadata' | ||
connections.outputScales: 'cpFlatNorm_yaml' | ||
connections.inputMDs: 'flatStats' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my comment above, what is "MD" and how do I find out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MD is an abbreviation for metadata. https://github.com/lsst/cp_pipe/blob/master/python/lsst/cp/pipe/cpFlatNormTask.py#L138 has the definition. This change is related to the fact that the input is no longer an Exposure, but a PropertyList containing the measured statistics.
class: lsst.cp.pipe.cpFlatNormTask.CpFlatMeasureTask | ||
config: | ||
connections.inputExp: 'cpFlatProc' | ||
connections.outputStats: 'flatStats' | ||
cpFlatNorm: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a separate task that normalizes the master flats? Or the individual input flats? If so, why do we need a separate task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the "reduce"/"gather" phase of the parallelization. All of the statistics need to be collected to create the normalization used in the final combination. The exact string "cpFlatNorm" is the task label used to construct the pipeline. This allows the same python Task to be used multiple times in the same pipeline, as long as they have unique lable names.
contracts: | ||
- isr.doFlat == False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, what are the "contracts"? (forgive me for the naïve questions). I see that under "isr" you already have "doFlat: False"; why put that here again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contract is a way to force something to be set one way, with the pipeline construction process failing if the contract is violated (so it fails fast instead of running with the wrong configuration). Contracts can also be used to configure parameters to match between tasks (like a list of mask planes to be set in one task, and then ignored in a second task).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that most of the changes are just correcting typos or adding some extra fields. I have a few comments just asking for clarification because I'm not that familiar with gen3 (in that sense, thank you for letting me review this ticket to gain more familiarity with it). I wonder if, in general, if some comment could be added at the top saying something like "for definitions see such and such documentation", but maybe these types of YAML files are not meant to have a lot of comments.
Fix LATISS pipeline instrument name and inconsistencies.