DM-26082: Persist source-to-external reference matched catalogs in pipe_analysis to parquet #293

laurenam · 2020-08-30T20:21:43Z

The scripts in pipe_analysis match sources against external reference
catalogs (those used in the calibration stages) to create comparison
plots. This adds datasets to persist those matched catalogs at the
tract/visit level (although any sub-selection of patch/ccd can be made
for a given run) to parquet tables for use in further QA/validation
pursuits.

erykoff

Good, except for the term "denormalized".

erykoff · 2020-09-04T15:21:08Z

policy/datasets.yaml

@@ -1333,6 +1333,14 @@ analysisVisitTable_commonZp:
    storage: ParquetStorage
    python: lsst.pipe.tasks.parquetTable.ParquetTable
    template: plots/%(filter)s/tract-%(tract)d/visit-%(visit)d%(subdir)s/%(tract)d_%(visit)d_commonZp.parq
+analysisMatchRefVisitTable:
+    description: >
+        Per-visit table (for specific tract) of matched and denormalized


I'm not sure what "denormalized" means.

I understand your confusion, as I had the same reaction when I first encountered this term in the stack. I inherited/adopted it since it is fairly widely used, e.g. https://github.com/search?q=org%3Alsst+denormalized&type=Code
(Note, in particular, meas_astrom/python/lsst/meas/astrom/denormalizeMatches.py & https://github.com/lsst/obs_base/blob/master/policy/datasets.yaml#L676-L684, which may actually suggest I should include "Full" in the name 😉).
The crux of it is that a "normalized" catalog only contains data IDs (and maybe a coord) to minimize space when persisting. I'm explicitly referring to these as "denormalized" to indicate that they include the full set of catalog info for both the src & ref cats. If this adds more confusion than help, I'm happy to leave out that term (or use a more self-explanatory one if you have a suggestion!).

Can you replace this with denormalized (contains all columns from XX)?

Of course. And looking at the precedent set above, I am leaning strongly towards adding "Full" to the dataset names, i.e.:

analysisMatchFullRefVisitTable
analysisMatchFullRefCoaddTable_forced
analysisMatchFullRefCoaddTable_unforced

Do you agree?

erykoff · 2020-09-04T15:21:21Z

policy/datasets.yaml

@@ -1347,6 +1355,22 @@ analysisCoaddTable_unforced:
    storage: ParquetStorage
    python: lsst.pipe.tasks.parquetTable.ParquetTable
    template: plots/%(filter)s/tract-%(tract)d%(subdir)s/%(tract)d_unforced.parq
+analysisMatchRefCoaddTable_forced:
+    description: >
+        Per-tract table of matched and denormalized source-to-external reference


erykoff · 2020-09-04T15:21:30Z

policy/datasets.yaml

+    template: plots/%(filter)s/tract-%(tract)d%(subdir)s/%(tract)d_matchRef_forced.parq
+analysisMatchRefCoaddTable_unforced:
+    description: >
+        Per-tract table of matched and denormalized source-to-external reference


The scripts in pipe_analysis match sources against external reference catalogs (those used in the calibration stages) to create comparison plots. This adds datasets to persist those matched catalogs at the tract/visit level (although any sub-selection of patch/ccd can be made for a given run) to parquet tables for use in further QA/validation pursuits. The persisted tables are "denormalized", i.e., they contain all fields from the original source and external reference catalogs (with "src_" and "ref_" prefixes on the column names).
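To make the "denormalized" idea and the "src_"/"ref_" prefix convention concrete, here is a minimal, stdlib-only Python sketch that expands a normalized match list (just ID pairs plus a separation) into full rows. Everything in it is a hypothetical stand-in: the function name `denormalize_matches`, the `(src_id, ref_id, distance)` tuple layout, and the dict-of-dicts catalogs are illustrative assumptions, and this does not reproduce meas_astrom's denormalizeMatches.py or the pipe_analysis code, which work on stack catalog objects rather than plain dicts.

```python
"""Hypothetical sketch: expand normalized matches into denormalized rows.

Names and data layout are illustrative assumptions, not an LSST stack API.
"""
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def denormalize_matches(
    matches: List[Tuple[int, int, float]],
    src_records: Dict[int, Record],
    ref_records: Dict[int, Record],
) -> List[Record]:
    """Turn (src_id, ref_id, distance) tuples into full rows.

    Every column of the matched source record is copied with a "src_"
    prefix and every column of the matched reference record with a
    "ref_" prefix, so each row can be used without the original catalogs.
    """
    rows: List[Record] = []
    for src_id, ref_id, distance in matches:
        row: Record = {}
        for name, value in src_records[src_id].items():
            row["src_" + name] = value
        for name, value in ref_records[ref_id].items():
            row["ref_" + name] = value
        # Keep the match separation next to the copied columns.
        row["distance"] = distance
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Toy catalogs that share column names ("id", "ra", "mag").
    src = {101: {"id": 101, "ra": 150.001, "mag": 21.3}}
    ref = {9: {"id": 9, "ra": 150.0, "mag": 21.1}}
    print(denormalize_matches([(101, 9, 0.4)], src, ref)[0])
    # {'src_id': 101, 'src_ra': 150.001, 'src_mag': 21.3,
    #  'ref_id': 9, 'ref_ra': 150.0, 'ref_mag': 21.1, 'distance': 0.4}
```

The prefixes matter because source and reference catalogs typically share column names (here "id", "ra" and "mag"); without them, the second catalog's columns would overwrite the first's in a single flat table.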

laurenam force-pushed the tickets/DM-26082 branch from 065f786 to 0ba124f September 1, 2020 18:01

erykoff approved these changes Sep 4, 2020

laurenam force-pushed the tickets/DM-26082 branch from 0ba124f to b678b4e September 9, 2020 18:49

laurenam force-pushed the tickets/DM-26082 branch from b678b4e to 7ef4920 September 9, 2020 19:08

laurenam merged commit 6fa9fb8 into master Sep 9, 2020

laurenam commented Aug 30, 2020

erykoff left a comment

erykoff Sep 4, 2020

laurenam Sep 5, 2020

erykoff Sep 9, 2020

laurenam Sep 9, 2020

erykoff Sep 4, 2020

erykoff Sep 4, 2020