DM-35051: Allow Prompt Processing to set up a local repo with existing files #23
Merged
Commits (18, all by kfindeisen):
021967c  Fix typo in raw upload script.
4d48020  Refactor WCS calculation from prep_butler.
54cbce9  Refactor bounding circle calculation from prep_butler.
ad7aa53  Refactor calculations out of prep_butler's export/import block.
aab8e93  Actually test for dataset existence when calling datasetExists.
ffe6aca  Factor out queries from test_prep_butler.
2613a51  Implement _query_missing_datasets.
8e89eb9  Use _query_missing_datasets to skip redundant export/imports.
76f00a9  Allow _check_imports to have a variable detector.
6d14661  Allow _check_imports to have variable shards.
cfc04fc  Expand double-registration test.
4475078  Add string representation for Visit.
9396b46  Clarify image-specific log messages.
e80460a  Handle clean failure when all exposures time out.
6694101  Add debugging logs for datasets.
9a74163  Log local repository creation.
57ef059  Refresh local butler before prepping it.
c00d100  Remove TODO that was addressed on Confluence.

Changes from all commits:
@@ -21,6 +21,8 @@
 __all__ = ["MiddlewareInterface"]

+import collections.abc
+import itertools
 import logging
 import os
 import os.path
@@ -144,33 +146,75 @@ def _init_ingester(self):
         self.rawIngestTask = lsst.obs.base.RawIngestTask(config=config,
                                                          butler=self.butler)

+    def _predict_wcs(self, detector: lsst.afw.cameraGeom.Detector, visit: Visit) -> lsst.afw.geom.SkyWcs:
+        """Calculate the expected detector WCS for an incoming observation.
+
+        Parameters
+        ----------
+        detector : `lsst.afw.cameraGeom.Detector`
+            The detector for which to generate a WCS.
+        visit : `Visit`
+            Predicted observation metadata for the detector.
+
+        Returns
+        -------
+        wcs : `lsst.afw.geom.SkyWcs`
+            An approximate WCS for ``visit``.
+        """
+        boresight_center = lsst.geom.SpherePoint(visit.ra, visit.dec, lsst.geom.degrees)
+        orientation = lsst.geom.Angle(visit.rot, lsst.geom.degrees)
+        flip_x = True if self.instrument.getName() == "DECam" else False
+        return lsst.obs.base.createInitialSkyWcsFromBoresight(boresight_center,
+                                                              orientation,
+                                                              detector,
+                                                              flipX=flip_x)
+
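Note: a minimal sketch of how _predict_wcs might be driven, assuming a configured MiddlewareInterface instance named `interface`; the Visit stand-in and all values below are hypothetical, filling in only the ra/dec/rot/detector fields the method actually reads:

    from types import SimpleNamespace

    # Hypothetical next-visit message; ra, dec, and rot are in degrees,
    # matching the lsst.geom.degrees conversions inside _predict_wcs.
    visit = SimpleNamespace(ra=155.4, dec=-4.7, rot=90.0, detector=25)
    wcs = interface._predict_wcs(interface.camera[visit.detector], visit)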
+    def _detector_bounding_circle(self, detector: lsst.afw.cameraGeom.Detector,
+                                  wcs: lsst.afw.geom.SkyWcs
+                                  ) -> (lsst.geom.SpherePoint, lsst.geom.Angle):
+        # Could return a sphgeom.Circle, but that would require a lot of
+        # sphgeom->geom conversions downstream. Even their Angles are different!
+        """Compute a small sky circle that contains the detector.
+
+        Parameters
+        ----------
+        detector : `lsst.afw.cameraGeom.Detector`
+            The detector for which to compute an on-sky bounding circle.
+        wcs : `lsst.afw.geom.SkyWcs`
+            The conversion from detector to sky coordinates.
+
+        Returns
+        -------
+        center : `lsst.geom.SpherePoint`
+            The center of the bounding circle.
+        radius : `lsst.geom.Angle`
+            The opening angle of the bounding circle.
+        """
+        radii = []
+        center = wcs.pixelToSky(detector.getCenter(lsst.afw.cameraGeom.PIXELS))
+        for corner in detector.getCorners(lsst.afw.cameraGeom.PIXELS):
+            radii.append(wcs.pixelToSky(corner).separation(center))
+        return center, max(radii)

Review comment on _detector_bounding_circle: "This is another good one to have refactored out, but I wonder if it shouldn't live in afw? Maybe in …"
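Note: the bounding-circle calculation reduces to "largest angular separation between the detector center and any corner". A stack-free sketch of the same idea on unit vectors (illustrative only, not part of the PR):

    import math

    def angular_separation(v1, v2):
        # Angle in radians between two unit vectors on the sphere.
        dot = sum(a * b for a, b in zip(v1, v2))
        return math.acos(max(-1.0, min(1.0, dot)))

    def bounding_circle(center, corners):
        # The smallest circle centered on `center` that contains every
        # corner is set by the farthest corner, as in the method above.
        return center, max(angular_separation(center, c) for c in corners)

    print(bounding_circle((1.0, 0.0, 0.0), [(0.0, 1.0, 0.0), (0.0, 0.0, 1.0)])[1])  # pi/2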
     def prep_butler(self, visit: Visit) -> None:
         """Prepare a temporary butler repo for processing the incoming data.

         Parameters
         ----------
-        visit : Visit
+        visit : `Visit`
             Group of snaps from one detector to prepare the butler for.
         """
-        _log.info(f"Preparing Butler for visit '{visit}'")
+        _log.info(f"Preparing Butler for visit {visit!r}")

+        detector = self.camera[visit.detector]
+        wcs = self._predict_wcs(detector, visit)
+        center, radius = self._detector_bounding_circle(detector, wcs)
+
+        # Need up-to-date census of what's already present.
+        self.butler.registry.refresh()
+
         with tempfile.NamedTemporaryFile(mode="w+b", suffix=".yaml") as export_file:
             with self.central_butler.export(filename=export_file.name, format="yaml") as export:
-                boresight_center = lsst.geom.SpherePoint(visit.ra, visit.dec, lsst.geom.degrees)
-                orientation = lsst.geom.Angle(visit.rot, lsst.geom.degrees)
-                detector = self.camera[visit.detector]
-                flip_x = True if self.instrument.getName() == "DECam" else False
-                wcs = lsst.obs.base.createInitialSkyWcsFromBoresight(boresight_center,
-                                                                     orientation,
-                                                                     detector,
-                                                                     flipX=flip_x)
-                # Compute the maximum sky circle that contains the detector.
-                radii = []
-                center = wcs.pixelToSky(detector.getCenter(lsst.afw.cameraGeom.PIXELS))
-                for corner in detector.getCorners(lsst.afw.cameraGeom.PIXELS):
-                    radii.append(wcs.pixelToSky(corner).separation(center))
-                radius = max(radii)
-
                 self._export_refcats(export, center, radius)
                 self._export_skymap_and_templates(export, center, detector, wcs)
                 self._export_calibs(export, visit.detector, visit.filter)
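Note: prep_butler is built around the generic Butler.export/Butler.import_ round trip through a temporary YAML file. A hedged sketch of that pattern in isolation (the repo paths and the "skymaps" collection name are made up, and transfer options for the datastore files are elided):

    import tempfile

    import lsst.daf.butler

    central = lsst.daf.butler.Butler("/path/to/central_repo")              # hypothetical
    local = lsst.daf.butler.Butler("/path/to/local_repo", writeable=True)  # hypothetical

    with tempfile.NamedTemporaryFile(mode="w+b", suffix=".yaml") as export_file:
        # Write a YAML description of the chosen datasets...
        with central.export(filename=export_file.name, format="yaml") as export:
            export.saveDatasets(central.registry.queryDatasets(
                "skyMap", collections="skymaps", findFirst=True))
        # ...then replay it into the local repository.
        local.import_(filename=export_file.name, format="yaml")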
@@ -209,11 +253,14 @@ def _export_refcats(self, export, center, radius):
         # collection, so we have to specify a list here. Replace this
         # with another solution ASAP.
         possible_refcats = ["gaia", "panstarrs", "gaia_dr2_20200414", "ps1_pv3_3pi_20170110"]
-        export.saveDatasets(self.central_butler.registry.queryDatasets(
-            possible_refcats,
-            collections=self.instrument.makeRefCatCollectionName(),
-            where=htm_where,
-            findFirst=True))
+        refcats = set(_query_missing_datasets(
+            self.central_butler, self.butler,
+            possible_refcats,
+            collections=self.instrument.makeRefCatCollectionName(),
+            where=htm_where,
+            findFirst=True))
+        _log.debug("Found %d new refcat datasets.", len(refcats))
+        export.saveDatasets(refcats)

     def _export_skymap_and_templates(self, export, center, detector, wcs):
         """Export the skymap and templates for this visit from the central
@@ -232,12 +279,12 @@ def _export_skymap_and_templates(self, export, center, detector, wcs):
         """
         # TODO: This exports the whole skymap, but we want to only export the
         # subset of the skymap that covers this data.
-        # TODO: We only want to import the skymap dimension once in init,
-        # otherwise we get a UNIQUE constraint error when prepping for the
-        # second visit.
-        export.saveDatasets(self.central_butler.registry.queryDatasets("skyMap",
-                            collections=self._COLLECTION_SKYMAP,
-                            findFirst=True))
+        skymaps = set(_query_missing_datasets(self.central_butler, self.butler,
+                                              "skyMap",
+                                              collections=self._COLLECTION_SKYMAP,
+                                              findFirst=True))
+        _log.debug("Found %d new skymap datasets.", len(skymaps))
+        export.saveDatasets(skymaps)
         # Getting only one tract should be safe: we're getting the
         # tract closest to this detector, so we should be well within
         # the tract bbox.
@@ -253,9 +300,12 @@ def _export_skymap_and_templates(self, export, center, detector, wcs):
         # TODO: alternately, we need to extract it from the pipeline? (best?)
         # TODO: alternately, can we just assume that there is exactly
         # one coadd type in the central butler?
-        export.saveDatasets(self.central_butler.registry.queryDatasets("*Coadd",
-                            collections=self._COLLECTION_TEMPLATE,
-                            where=template_where))
+        templates = set(_query_missing_datasets(self.central_butler, self.butler,
+                                                "*Coadd",
+                                                collections=self._COLLECTION_TEMPLATE,
+                                                where=template_where))
+        _log.debug("Found %d new template datasets.", len(templates))
+        export.saveDatasets(templates)

     def _export_calibs(self, export, detector_id, filter):
         """Export the calibs for this visit from the central butler.
@@ -272,17 +322,47 @@ def _export_calibs(self, export, detector_id, filter):
         # TODO: we can't filter by validity range because it's not
         # supported in queryDatasets yet.
         calib_where = f"detector={detector_id} and physical_filter='{filter}'"
+        calibs = set(_query_missing_datasets(
+            self.central_butler, self.butler,
+            ...,
+            collections=self.instrument.makeCalibrationCollectionName(),
+            where=calib_where))
+        if calibs:
+            for dataset_type, n_datasets in self._count_by_type(calibs):
+                _log.debug("Found %d new calib datasets of type '%s'.", n_datasets, dataset_type)
+        else:
+            _log.debug("Found 0 new calib datasets.")
         export.saveDatasets(
-            self.central_butler.registry.queryDatasets(
-                ...,
-                collections=self.instrument.makeCalibrationCollectionName(),
-                where=calib_where),
+            calibs,
             elements=[])  # elements=[] means do not export dimension records
         target_types = {CollectionType.CALIBRATION}
         for collection in self.central_butler.registry.queryCollections(...,
                                                                         collectionTypes=target_types):
             export.saveCollection(collection)

+    @staticmethod
+    def _count_by_type(refs):
+        """Count the number of dataset references of each type.
+
+        Parameters
+        ----------
+        refs : iterable [`lsst.daf.butler.DatasetRef`]
+            The references to classify.
+
+        Yields
+        ------
+        type : `str`
+            The name of a dataset type in ``refs``.
+        count : `int`
+            The number of elements of type ``type`` in ``refs``.
+        """
+        def get_key(ref):
+            return ref.datasetType.name
+
+        ordered = sorted(refs, key=get_key)
+        for k, g in itertools.groupby(ordered, key=get_key):
+            yield k, len(list(g))
+
     def _prep_collections(self):
         """Pre-register output collections in advance of running the pipeline.
         """
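Note: _count_by_type is the standard sort-then-itertools.groupby counting pattern; groupby merges only adjacent equal keys, hence the sort. A self-contained illustration with stand-in refs (the type names are made up):

    import itertools
    from types import SimpleNamespace

    refs = [SimpleNamespace(datasetType=SimpleNamespace(name=n))
            for n in ["bias", "flat", "bias", "flat", "flat"]]

    ordered = sorted(refs, key=lambda ref: ref.datasetType.name)
    for name, group in itertools.groupby(ordered, key=lambda ref: ref.datasetType.name):
        print(name, len(list(group)))
    # bias 2
    # flat 3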
@@ -418,4 +498,40 @@ def run_pipeline(self, visit: Visit, exposure_ids: set) -> None:
             # If this is a fresh (local) repo, then types like calexp,
             # *Diff_diaSrcTable, etc. have not been registered.
             result = executor.run(register_dataset_types=True)
-        _log.info(f"Pipeline successfully run on {len(result)} quanta.")
+        _log.info(f"Pipeline successfully run on {len(result)} quanta for "
+                  f"detector {visit.detector} of {exposure_ids}.")
+
+
+def _query_missing_datasets(src_repo: Butler, dest_repo: Butler,
+                            *args, **kwargs) -> collections.abc.Iterable[lsst.daf.butler.DatasetRef]:
+    """Return datasets that are present in one repository but not another.
+
+    Parameters
+    ----------
+    src_repo : `lsst.daf.butler.Butler`
+        The repository in which a dataset must be present.
+    dest_repo : `lsst.daf.butler.Butler`
+        The repository in which a dataset must not be present.
+    *args, **kwargs
+        Parameters for describing the dataset query. They have the same
+        meanings as the parameters of `lsst.daf.butler.Registry.queryDatasets`.
+
+    Returns
+    -------
+    datasets : iterable [`lsst.daf.butler.DatasetRef`]
+        The datasets that exist in ``src_repo`` but not ``dest_repo``.
+    """
+    try:
+        known_datasets = set(dest_repo.registry.queryDatasets(*args, **kwargs))
+    except lsst.daf.butler.registry.DataIdValueError as e:
+        _log.debug("Pre-export query with args '%s, %s' failed with %s",
+                   ", ".join(repr(a) for a in args),
+                   ", ".join(f"{k}={v!r}" for k, v in kwargs.items()),
+                   e)
+        # If dimensions are invalid, then *any* such datasets are missing.
+        known_datasets = set()
+
+    # Let exceptions from src_repo query raise: if it fails, that invalidates
+    # this operation.
+    return itertools.filterfalse(lambda ref: ref in known_datasets,
+                                 src_repo.registry.queryDatasets(*args, **kwargs))
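Note: the heart of _query_missing_datasets is "materialize what the destination already has, then lazily filter the source query against it". The same pattern with plain lists standing in for registry queries (illustrative only):

    import itertools

    def missing(src, dest):
        known = set(dest)  # what the destination repo already has
        # Lazily yield only the source entries the destination lacks.
        return itertools.filterfalse(lambda ref: ref in known, src)

    print(list(missing(["bias", "flat", "skyMap"], ["flat"])))
    # ['bias', 'skyMap']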