Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-28394: Add Tasks to write, transform, and consolidate ForcedSources #571

Merged
merged 2 commits into from Sep 16, 2021

Conversation

yalsayyad
Copy link
Contributor

This series of tasks produces the ForcedSource table
specified in the DPDD.

@yalsayyad yalsayyad force-pushed the tickets/DM-28394 branch 3 times, most recently from c668d73 to 2c8707c Compare September 9, 2021 04:18
Copy link
Contributor

@cmsaunders cmsaunders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good to me. I just have a few questions, mostly about documentation.

I wasn't sure if I should see a connection between the changes in the class PostprocessAnalysis and in functors.py and the main work of adding the ForcedSources tasks. Were these just incidental?

@@ -114,7 +114,7 @@ class Functor(object):
index levels defined by the `_dfLevels` attribute; by default, this is
`column`.

The `_columnLevels` and `_dfLevels` attributes should generally not need to
The `_dfLevels` attributes should generally not need to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_columnLevels is still mentioned in the documentation above here at line 110

key = lsst.pex.config.Field(
doc="Column on which to join the two input tables on and make the primary key of the output",
dtype=str,
default="objectId",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basic question, but is it absolutely guaranteed that forced_src and forced_diff will have the same objectIds for the same objects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not guaranteed. You can construct a pipeline where you seed forced_src with one reference catalog and forced_diff with another. Most likely you'll get 0 objectIds matching. Which reminds me that I should add some contracts to the DRP.yaml so that users can't construct that pipeline.


Transforms each wide, per-detector forcedSource parquet table per the
specification file (per-camera defaults found in ForcedSource.yaml).
All epochs that overlap-the patch are aggregated into one per-patch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove dash after 'overlap'.

dimensions=("instrument", "visit", "detector", "skymap", "tract")):

inputCatalog = connectionTypes.Input(
doc="Primary multi-epoch, per-detector, forced photometry catalog. "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this only for one visit, not multi-epoch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any time I say forced photometry I try to add prepend it with the modifier "multi-epoch" or "multi-band" since we do both and it's confusing. Let's call it single-epoch here.

`detect_isTractInner`,`detect_isPatchInner`, so that user may dedupe for
science or compare duplicates for QA.

The resulting table includes multiple bands. Epochs are other useful
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Epochs and other useful'?


No de-duplication of rows is performed. Duplicate resolutions flags are
pulled in from the referenceCatalog: `detect_isPrimary`,
`detect_isTractInner`,`detect_isPatchInner`, so that user may dedupe for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing 'dedupe' to 'de-duplicate' would be a little easier to understand.

@yalsayyad
Copy link
Contributor Author

Overall this looks good to me. I just have a few questions, mostly about documentation.

I wasn't sure if I should see a connection between the changes in the class PostprocessAnalysis and in functors.py and the main work of adding the ForcedSources tasks. Were these just incidental?

They were some specializations that needed to be removed in order to transfrorm a multi-level DataFrame that wasn't deepCoadd_obj. I split them out into their own commit to make it more clear.

- Functor was originally written to be applied to deepCoadd_obj tables.
  The hard-coded _columnLevels was unnecessary and was removed to
  enable the application of Functors to other multi-level DataFrames
- Remove default coord_ra and coord_dec functors, which pulled from
  the ref dataset only, from PostprocessAnalysis.
This series of tasks produces the ForcedSource table
as specified in the DPDD.
@yalsayyad yalsayyad merged commit ec8a224 into master Sep 16, 2021
@yalsayyad yalsayyad deleted the tickets/DM-28394 branch September 16, 2021 00:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants