DM-28394: Add Tasks to write, transform, and consolidate ForcedSources #571
c668d73 to 2c8707c
Overall this looks good to me. I just have a few questions, mostly about documentation.
I wasn't sure if I should see a connection between the changes in the class PostprocessAnalysis
and in functors.py and the main work of adding the ForcedSources tasks. Were these just incidental?
@@ -114,7 +114,7 @@ class Functor(object):
 index levels defined by the `_dfLevels` attribute; by default, this is
 `column`.

-The `_columnLevels` and `_dfLevels` attributes should generally not need to
+The `_dfLevels` attributes should generally not need to
`_columnLevels` is still mentioned in the documentation above, at line 110.
key = lsst.pex.config.Field(
    doc="Column on which to join the two input tables on and make the primary key of the output",
    dtype=str,
    default="objectId",
Basic question, but is it absolutely guaranteed that `forced_src` and `forced_diff` will have the same objectIds for the same objects?
It's not guaranteed. You can construct a pipeline where you seed forced_src with one reference catalog and forced_diff with another. Most likely you'll get 0 objectIds matching. Which reminds me that I should add some contracts to the DRP.yaml so that users can't construct that pipeline.
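The mismatch described above can be caught before the join. A minimal sketch, assuming both inputs are pandas DataFrames that carry an `objectId` column; the helper name `check_key_overlap`, the flux column names, and the toy values are hypothetical and not taken from the pipeline:

```python
import pandas as pd


def check_key_overlap(left, right, key="objectId"):
    """Return the fraction of keys in `left` that also appear in `right`."""
    left_keys = set(left[key])
    right_keys = set(right[key])
    if not left_keys:
        return 0.0
    return len(left_keys & right_keys) / len(left_keys)


# Toy stand-ins for forced_src and forced_diff seeded from the same reference catalog.
forced_src = pd.DataFrame({"objectId": [1, 2, 3], "psfFlux": [10.0, 20.0, 30.0]})
forced_diff = pd.DataFrame({"objectId": [1, 2, 3], "psfDiffFlux": [0.1, -0.2, 0.3]})

# Toy table seeded from a different reference catalog: no shared ids.
forced_diff_other = pd.DataFrame({"objectId": [7, 8, 9], "psfDiffFlux": [0.0, 0.0, 0.0]})

same_overlap = check_key_overlap(forced_src, forced_diff)
other_overlap = check_key_overlap(forced_src, forced_diff_other)

# An inner join on the key silently drops unmatched rows, so check the overlap first.
merged = forced_src.merge(forced_diff, on="objectId", how="inner")
merged_other = forced_src.merge(forced_diff_other, on="objectId", how="inner")
```

With mismatched reference catalogs the inner join returns an empty table rather than raising, which is why a pipeline-level contract is the more robust guard.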
Transforms each wide, per-detector forcedSource parquet table per the
specification file (per-camera defaults found in ForcedSource.yaml).
All epochs that overlap-the patch are aggregated into one per-patch
Remove dash after 'overlap'.
dimensions=("instrument", "visit", "detector", "skymap", "tract")):

    inputCatalog = connectionTypes.Input(
        doc="Primary multi-epoch, per-detector, forced photometry catalog. "
Isn't this only for one visit, not multi-epoch?
Any time I say forced photometry I try to prepend the modifier "multi-epoch" or "multi-band", since we do both and it's confusing. Let's call it single-epoch here.
`detect_isTractInner`,`detect_isPatchInner`, so that user may dedupe for
science or compare duplicates for QA.

The resulting table includes multiple bands. Epochs are other useful
'Epochs and other useful'?
No de-duplication of rows is performed. Duplicate resolutions flags are
pulled in from the referenceCatalog: `detect_isPrimary`,
`detect_isTractInner`,`detect_isPatchInner`, so that user may dedupe for
Changing 'dedupe' to 'de-duplicate' would be a little easier to understand.
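For readers of the docstring, the de-duplication it leaves to the user can be sketched as below. This is only an illustration, assuming the output is read into a pandas DataFrame with the boolean flag columns named in the docstring; the `psfFlux` column and all values are invented:

```python
import pandas as pd

# Toy per-patch table; values and the psfFlux column are invented for illustration.
table = pd.DataFrame({
    "objectId": [101, 102, 201, 202],
    "detect_isPrimary": [True, False, True, False],
    "detect_isPatchInner": [True, False, True, True],
    "detect_isTractInner": [True, True, True, False],
    "psfFlux": [1.5, 1.4, 3.2, 3.3],
})

# For science: keep only rows flagged as the primary detection.
primary = table[table["detect_isPrimary"]]

# For QA: keep the non-primary rows to compare against their primary counterparts.
non_primary = table[~table["detect_isPrimary"]]
```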
0e32b3b to 13e29d2
They were some specializations that needed to be removed in order to transform a multi-level DataFrame that wasn't deepCoadd_obj. I split them out into their own commit to make it clearer.
13e29d2 to c632db9
- Functor was originally written to be applied to deepCoadd_obj tables. The hard-coded _columnLevels was unnecessary and was removed to enable the application of Functors to other multi-level DataFrames
- Remove default coord_ra and coord_dec functors, which pulled from the ref dataset only, from PostprocessAnalysis.
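To illustrate why hard-coded column levels get in the way, here is a rough sketch of reading the level names from a multi-level DataFrame itself. It uses plain pandas only and does not reproduce the Functor code; the level names `dataset` and `column` and the toy values are assumptions made for the example:

```python
import pandas as pd

# A two-level column index, unlike the layout a deepCoadd_obj-specific
# default would expect. Level names here are assumed for illustration.
columns = pd.MultiIndex.from_tuples(
    [("forced_src", "psfFlux"), ("forced_src", "coord_ra"), ("forced_diff", "psfFlux")],
    names=("dataset", "column"),
)
df = pd.DataFrame([[1.0, 10.0, 0.1], [2.0, 20.0, 0.2]], columns=columns)

# Discover the levels from the table rather than hard-coding them.
level_names = list(df.columns.names)

# Take a cross-section on one level; the remaining level becomes the columns.
src = df.xs("forced_src", axis=1, level="dataset")
```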
This series of tasks produces the ForcedSource table as specified in the DPDD.
c632db9 to cf146b2
This series of tasks produces the ForcedSource table
specified in the DPDD.