Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-25329: Fix LATISS instrument pipeline definitions for cp_pipe #228

Merged
merged 1 commit into from Jul 14, 2020

Conversation

czwa
Copy link
Contributor

@czwa czwa commented Jul 1, 2020

Fix LATISS pipeline instrument name and inconsistencies.

@plazas plazas self-requested a review July 3, 2020 22:21
Comment on lines +24 to +28
cpFlatMeasure:
class: lsst.cp.pipe.cpFlatNormTask.CpFlatMeasureTask
config:
connections.inputExp: 'cpFlatProc'
connections.outputStats: 'flatStats'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Why did you add this block? It seems it was not there before. I guess this is the part that is defining the task that actually produces the master flat, correct? Or is the name indicating that this is (or will be) measuring some statistic on an already created master flat?
  • The connections seem to be defining inputs and outputs. Where can the definitions be found? What is, for example, cpFlatProc ("processed"?), and where can that be looked up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new block is needed to pick up the cp_pipe changes in how flats are made. Previously, the ISR amplifier statistics written to the isr_metadata files were being used. However, for HSC, this is incorrect, because the vignetting at the edge of the field of view skews the statistics measured there. The CpFlatMeasureTask was added to apply vignetting and remeasure the amp and detector level statistics.

The flat generation process follows the task order listed in the pipeline: IsrTask processes the raw exposures in parallel into cpFlatProc exposures, according to the config options listed. CpFlatMeasureTask measures and writes out statistics (again in parallel) for each amp/detector of the inputs. All of those statistics are then read by CpFlatNormTask (a single process for the flat construction process) to solve for a set of exposure-level scale factors that need to be applied to the cpFlatProc images before coadding them into the final flat.

The connection names are somewhat arbitrary and only need to be consistent between outputs/inputs of tasks in the pipeline. The definitions are defined in the Connections class associated with the Config class associated with the Task: https://github.com/lsst/cp_pipe/blob/master/python/lsst/cp/pipe/cpFlatNormTask.py#L35

Having flexible dataset definitions with connections means we don't need to worry about mappers or defining datasets in a yaml file in gen3.

cpFlatNorm:
class: lsst.cp.pipe.cpFlatNormTask.CpFlatNormalizationTask
config:
connections.inputExps: 'isr_metadata'
connections.outputScales: 'cpFlatNorm_yaml'
connections.inputMDs: 'flatStats'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my comment above, what is "MD" and how do I find out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MD is an abbreviation for metadata. https://github.com/lsst/cp_pipe/blob/master/python/lsst/cp/pipe/cpFlatNormTask.py#L138 has the definition. This change is related to the fact that the input is no longer an Exposure, but a PropertyList containing the measured statistics.

class: lsst.cp.pipe.cpFlatNormTask.CpFlatMeasureTask
config:
connections.inputExp: 'cpFlatProc'
connections.outputStats: 'flatStats'
cpFlatNorm:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a separate task that normalizes the master flats? Or the individual input flats? If so, why do we need a separate task?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the "reduce"/"gather" phase of the parallelization. All of the statistics need to be collected to create the normalization used in the final combination. The exact string "cpFlatNorm" is the task label used to construct the pipeline. This allows the same python Task to be used multiple times in the same pipeline, as long as they have unique lable names.

Comment on lines 44 to 45
contracts:
- isr.doFlat == False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, what are the "contracts"? (forgive me for the naïve questions). I see that under "isr" you already have "doFlat: False"; why put that here again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contract is a way to force something to be set one way, with the pipeline construction process failing if the contract is violated (so it fails fast instead of running with the wrong configuration). Contracts can also be used to configure parameters to match between tasks (like a list of mask planes to be set in one task, and then ignored in a second task).

Copy link
Contributor

@plazas plazas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that most of the changes are just correcting typos or adding some extra fields. I have a few comments just asking for clarification because I'm not that familiar with gen3 (in that sense, thank you for letting me review this ticket to gain more familiarity with it). I wonder if, in general, if some comment could be added at the top saying something like "for definitions see such and such documentation", but maybe these types of YAML files are not meant to have a lot of comments.

@czwa czwa merged commit 667a8ce into master Jul 14, 2020
@timj timj deleted the tickets/DM-25329 branch May 12, 2022 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants