DM-37072: Expand HSC calibs and templates in central repo #44

hsinfang · 2022-12-16T23:25:49Z

No description provided.

kfindeisen

Everything looks reasonable, but MiddlewareInterface makes two assumptions that will break with the new repo. One is a hack that should be easy to patch, but the other is a fundamental assumption in how _export_skymap_and_templates handles tracts/patches. I'm not sure what the best way to deal with that is; can you let me know what you think?

kfindeisen · 2023-01-06T18:00:55Z

python/activator/middleware_interface.py

                                                skymap=self.skymap_name,
-                                                where=template_where))
+                                                where=template_where,
+                                                findFirst=True))


Would it be possible to add an (unticketed) TODO comment for adding findFirst to the calibs query? I think we would want to add it as soon as it's available.

kfindeisen · 2023-01-06T18:52:21Z

bin.src/make_hsc_rc2_export.py

+        # Need all detectors, even those without data, for visit definition
+        contents.saveDataIds(
+            butler.registry.queryDataIds(
+                {"detector"},
+                collections="HSC/RC2/defaults",
+                datasets="raw",
+            ).expanded()
+        )


I don't think this block is necessary here -- the other file had it because ap_verify_ci_cosmos_pdr2 (and similar ap_verify datasets) avoid using full focal planes to keep the download small.

kfindeisen · 2023-01-06T18:57:33Z

bin.src/make_hsc_rc2_export.py

+        logging.debug("Selecting refcats datasets")
+        records = butler.registry.queryDatasets(
+            datasetType=..., collections="refcats/DM-*"
+        )
+        contents.saveDatasets(records)


I don't understand the collections filter on this query. Why not ask for specific collections, like for coadds, or just everything in the refcats chain?

I think we just want everything in the refcats chain, so the query now uses the chain instead of picking individual runs.

kfindeisen · 2023-01-06T19:03:40Z

bin.src/make_hsc_rc2_export.py

+
+        # Save calibration collection
+        for collection in butler.registry.queryCollections(
+            expression=re.compile("^(HSC).*"),


I suggest just searching for "HSC/calib*" -- it's a bit less permissive than "HSC*" (we don't actually need a regular expression to represent this filter...)
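To illustrate the difference in permissiveness, here is a minimal sketch that compares the two filters on a made-up list of collection names. It uses Python's standard `fnmatch` module as a stand-in for the wildcard matching; how the butler registry itself interprets wildcard strings is not modeled here, and the collection names below are illustrative rather than taken from the actual repository.

```python
import fnmatch
import re

# Illustrative collection names (assumed, not read from a real repo).
names = [
    "HSC/calib",
    "HSC/calib/DM-28636",
    "HSC/defaults",
    "HSC/runs/RC2",
    "refcats",
]

# The regular expression from the diff: anything that starts with "HSC".
regex = re.compile("^(HSC).*")
by_regex = [name for name in names if regex.match(name)]

# The suggested glob: only names that start with "HSC/calib".
# fnmatchcase is used so the comparison is case-sensitive on every platform.
by_glob = [name for name in names if fnmatch.fnmatchcase(name, "HSC/calib*")]

print("regex:", by_regex)
print("glob: ", by_glob)
```

In this toy list the regular expression also picks up "HSC/defaults" and "HSC/runs/RC2", while the glob keeps only the two calibration-like names.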

kfindeisen · 2023-01-06T19:14:31Z

bin.src/make_hsc_rc2_export.py

+        logging.debug("Selecting datasets in HSC/calib")
+        records = butler.registry.queryDatasets(
+            datasetType=..., collections=re.compile("HSC/calib")
+        )
+        contents.saveDatasets(records)


I realize this is what I asked for, but out of curiosity, how much space do we need to copy all the calibs? Did you look into that with your local test repo?

(Also, unnecessary re.compile).

I estimate ~825G for these calibs

kfindeisen · 2023-01-06T19:38:23Z

bin.src/make_remote_butler.py

+            "HSC/calib/gen2/20180117",
+            "HSC/calib/DM-28636",
+            "HSC/calib/gen2/20180117/unbounded",
+            "HSC/calib/DM-28636/unbounded",


These two "unbounded" collections are runs, not calibration collections; can you check whether you actually need them in the chain? I thought I remembered that they were for something DRP-specific, but now I'm wondering whether it's the problem we tried to solve at https://github.com/lsst-dm/prompt_prototype/blob/main/python/activator/middleware_interface.py#L168 (that code will need to be updated either way, since right now it assumes there's an HSC/calib/unbounded).

Can the other specific collections be replaced with a query for calibration collections that start with HSC/calib/?

> Can the other specific collections be replaced with a query for calibration collections that start with HSC/calib/?

Not really, because there are two other CALIBRATION collections in /repo/main with that prefix, and including them would mean including data that aren't needed.

You are right. I don't really need the "unbounded" RUN collections. Datasets in the CALIBRATION collections have pointers to other RUN collections.

From this Slack thread, it looks to me like https://github.com/lsst-dm/prompt_prototype/blob/main/python/activator/middleware_interface.py#L168 is the right way to go for the moment. To be consistent, I'm adding the chain to the new repo (7a817ed).

It also means I'll make prompt processing's central repo more similar to /repo/main, and less similar to ap_verify_ci_cosmos_pdr2.

kfindeisen · 2023-01-06T19:42:18Z

bin.src/make_remote_butler.py

+    # Chain rerun collections to templates
+    current = butler.registry.getCollectionChain("templates")
+    addition = butler.registry.queryCollections("HSC/runs/*",
+                                                collectionTypes=CollectionType.RUN)


I suggest adding a comment that the export script should have guaranteed that there are only coadds in these collections.

kfindeisen · 2023-01-06T20:39:51Z

bin.src/make_hsc_rc2_export.py

+        logging.debug("Selecting skymaps datasets")
+        records = butler.registry.queryDatasets(
+            datasetType=..., collections="skymaps")
+        contents.saveDatasets(records)


Currently, MiddlewareInterface assumes that there is exactly one skymap in the central repository. Would it be possible to edit this script to return a specific skymap used by RC2? (Alternatively, you could edit MiddlewareInterface to be less restrictive, but since that involves rewriting how we identify which tract we want in _export_skymap_and_templates, it would be much harder.)

Now it exports only the skymap used by HSC-RC2.
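The single-skymap assumption discussed above can be made explicit with a small guard. This is only a sketch under stated assumptions: `require_single_skymap` is a hypothetical helper, not part of MiddlewareInterface or the butler API, and the skymap names are illustrative.

```python
from typing import Iterable


def require_single_skymap(skymap_names: Iterable[str]) -> str:
    """Return the only skymap name, or raise if there is not exactly one.

    Hypothetical helper: it only inspects a list of names and does not
    query a butler repository.
    """
    names = sorted(set(skymap_names))
    if len(names) != 1:
        raise RuntimeError(f"Expected exactly one skymap, found {len(names)}: {names}")
    return names[0]


# Illustrative name only; a repo exported with a single skymap passes the check.
print(require_single_skymap(["hsc_rc2"]))
```

A repository exported with every dataset in the skymaps collection could contain several names and would fail this check, which is why exporting only the RC2 skymap is the simpler fix.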

kfindeisen · 2023-01-10T21:27:20Z

Sorry, the standup meeting reminded me of one more issue: can you please update https://github.com/lsst-dm/prompt_prototype/blob/main/pipelines/HSC/ApPipe.yaml to use coaddName: goodSeeing? I suspect the task configurations in that pipeline also don't match the new calibs and refcats. (The parameters labeled as workarounds for DM-30210 don't need to be updated -- they are obsolete and can be safely deleted.)

hsinfang · 2023-01-11T23:25:07Z

About the repo assumptions from MiddlewareInterface: by importing just one skymap and adding an HSC/calib/unbounded chained collection, I think they should be okay now. Does that sound reasonable?

kfindeisen

Thanks for the fixes! My only remaining comment is about the pipeline: I think we need to get rid of the calibrate override and might be able to get rid of the isr one, but please double-check my reasoning.

kfindeisen · 2023-01-11T23:55:18Z

pipelines/HSC/ApPipe.yaml

+  coaddName: goodSeeing
 tasks:
  isr:
    class: lsst.ip.isr.IsrTask


Having looked a bit closer, I'm pretty sure the calibrate override in this file is not appropriate for the new refcats -- the included file looks for catalogs called gaia and panstarrs, which is the ap_verify convention. I'm not sure about the isr override -- do we have all the brighter-fatter and transmission curve HSC calibrations now?

You are right. I confirm that the new repo will have all those calibration data and special config overrides are no longer needed.

I'm leaving the DECam part of the pipeline configs untouched. Although we don't use DECam data for testing at the moment, we might in the future (?).

We use the DECam pipeline for unit(ish) testing; the repository for it is in tests/data/central_repo.

kfindeisen · 2023-01-12T00:09:29Z

Sorry, I just spotted one potential problem with the calibs change: it looks like MiddlewareInterface exports HSC/defaults and exports the calibration collections, but it does not export HSC/calib now that it's a chained collection. I suggest adding a line to https://github.com/lsst-dm/prompt_prototype/blob/tickets/DM-37072/python/activator/middleware_interface.py#L458.

Use findFirst so that multiple versions of templates can be stored in one chained collection, letting the butler pick the dataset to use. New collections with updated templates can be added to the chain; as in the usual butler convention, the order of collections in the chain determines the query results. findFirst is not used when querying calibrations because find-first queries on CALIBRATION-type collections are not supported yet.
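The find-first behavior described here can be sketched with a toy model. This is not the butler implementation: collections are modeled as plain dictionaries keyed by a (tract, patch) pair, and the collection names, data IDs, and `find_first` helper are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy model (assumption): a data ID is a (tract, patch) pair and each
# collection maps data IDs to a dataset label.
DataId = Tuple[int, int]
Collection = Dict[DataId, str]


def find_first(chain: List[str],
               collections: Dict[str, Collection],
               data_id: DataId) -> Optional[str]:
    """Return the dataset from the first collection in ``chain`` holding ``data_id``."""
    for name in chain:
        dataset = collections[name].get(data_id)
        if dataset is not None:
            return dataset
    return None


# Hypothetical template collections; the newer one only covers one patch.
collections = {
    "templates/new": {(9813, 40): "new-9813-40"},
    "templates/old": {(9813, 40): "old-9813-40", (9813, 41): "old-9813-41"},
}
# Order matters: collections earlier in the chain shadow later ones.
chain = ["templates/new", "templates/old"]

print(find_first(chain, collections, (9813, 40)))
print(find_first(chain, collections, (9813, 41)))
```

In this model, putting a new template collection at the front of the chain overrides older templates only where the new collection has data, and older templates remain available elsewhere.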

It makes a butler export file for selected HSC-RC2 datasets.

The current convention is to have a ticket number in the collection names as a workaround to version the unbounded datasets, and have the "HSC/calib/unbounded" (i.e. instrument.makeUnboundedCalibrationRunName()) CHAINED collection point to the latest set.

Two methods are used throughout the codebase, and they effectively produce the same name (such as "HSC/defaults"). Using a single method makes this less confusing.

Previously the top-level calibration collection (e.g. "HSC/calib") in the test repo was a CALIBRATION collection, so it was exported with the other CALIBRATION collections in the lines above. In the new repo, the top-level calibration collection will be a CHAINED collection of selected CALIBRATION collections, so it needs to be exported separately.
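A minimal sketch of why the top-level collection now needs its own export step, assuming a simplified registry where each collection name maps to a type string. The `collections_to_export` helper is hypothetical and does not call the butler export API; it only shows the selection logic.

```python
from typing import Dict, List


def collections_to_export(types: Dict[str, str], top_level: str) -> List[str]:
    """List every CALIBRATION collection, plus the top-level collection.

    Hypothetical helper: ``types`` maps a collection name to its type.
    The top-level collection is appended explicitly so that it is still
    exported when it is CHAINED rather than CALIBRATION.
    """
    selected = [name for name, kind in types.items() if kind == "CALIBRATION"]
    if top_level not in selected:
        selected.append(top_level)
    return selected


# Old layout (assumed): the top-level collection is itself CALIBRATION.
old_repo = {"HSC/calib": "CALIBRATION"}

# New layout (assumed): the top-level collection chains CALIBRATION collections.
new_repo = {
    "HSC/calib/DM-28636": "CALIBRATION",
    "HSC/calib/gen2/20180117": "CALIBRATION",
    "HSC/calib": "CHAINED",
}

print(collections_to_export(old_repo, "HSC/calib"))
print(collections_to_export(new_repo, "HSC/calib"))
```

Selecting by type alone would silently drop "HSC/calib" in the new layout, which is the problem the extra export line addresses.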

The new butler repo will have more calibration data in it. Therefore, config overrides in isr and calibrate are no longer needed. We can turn on the default calibrations, and just use ps1_pv3_3pi_20170110 as the reference catalog like in the default ap_pipe configs.

hsinfang force-pushed the tickets/DM-37072 branch 3 times, most recently from 2e37073 to cf5f45e, on December 19, 2022 22:42

kfindeisen requested changes Jan 6, 2023

hsinfang force-pushed the tickets/DM-37072 branch from 9fdc7a1 to b47d98b on January 9, 2023 23:25

hsinfang force-pushed the tickets/DM-37072 branch 2 times, most recently from c1ef512 to b181ff5, on January 11, 2023 20:58

kfindeisen approved these changes Jan 11, 2023

hsinfang added 8 commits January 12, 2023 16:29

Add make_hsc_rc2_export.py script

10296ad

It makes a butler export file for selected HSC-RC2 datasets.

Add an option to make a butler repo specifically for HSC-RC2

d88e281

Remove obsolete workarounds in pipeline yamls

3206327

Switch from using deepCoadd to using goodSeeingCoadd in HSC

c8e76ac

Unify the way to get the default collection name

54b9119

Two methods are used throughout the codebase, and they effectively produce the same name (such as "HSC/defaults"). Using a single method makes this less confusing.

hsinfang force-pushed the tickets/DM-37072 branch from 7a817ed to b9d12d0 on January 13, 2023 00:30

hsinfang merged commit 8d18aa6 into main Jan 13, 2023

hsinfang deleted the tickets/DM-37072 branch January 13, 2023 21:02

hsinfang mentioned this pull request Feb 14, 2023

DM-37072: Expand HSC calibs and templates in central repo #47

Merged
