### Test case LVV-T2699: Verify implementation of Catalog Provenance Access

Verify that the provenance of available catalog data products can be listed and retrieved.

In [1]:
from lsst.daf.butler import Butler, DatasetProvenance

Initialize the butler, and define the collection (corresponding to w_2025_16 processing) we will use.

In [2]:
collection = "LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344"
butler = Butler("/repo/main", collections=[collection])

#### Define some data dimensions to use for querying datasets:

In [3]:
tract = 5063
patch = 24
visit = 2024110800246
detector = 4

#### Catalogs from single-visit images:

We will demonstrate extraction of provenance information for four catalogs: `source_unstandardized` (initial calibrated photometry), `source_detector` ("standardized" catalogs from the initial calibrated photometry), `source_all` (combination of all `source_detector` catalogs), and `source2` (final calibrated source catalog from visit images). This tracks part of the progression of detection, measurement, and calibration steps through the pipelines.

For each of these, we will:
1. Extract a catalog from the butler using the desired dataId constraints.
2. Print the table metadata to the screen.
3. Extract the provenance associated with the table, and print it to the screen.
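
The three steps above repeat for every dataset type in this notebook, so they could be wrapped in a small helper. The following is a minimal sketch and not part of the original test: `get_meta_and_provenance` is a hypothetical name, and the provenance parser is passed in as an argument (in this notebook that would be `DatasetProvenance.from_flat_dict`), so the helper assumes nothing beyond the `butler.get(...)` call and the `.meta` attribute already used here.

```python
def get_meta_and_provenance(butler, dataset_type, data_id, from_flat_dict):
    """Fetch a catalog, then return its metadata and parsed provenance.

    Hypothetical helper. ``from_flat_dict`` is assumed to behave like
    ``DatasetProvenance.from_flat_dict(meta, butler)`` as called in this
    notebook, i.e. to return a ``(provenance, dataset_ref)`` pair.
    """
    # Step 1: extract the catalog for the requested data ID.
    catalog = butler.get(dataset_type, dataId=data_id)
    # Step 2: the table metadata carries the flattened provenance keys.
    meta = dict(catalog.meta)
    # Step 3: rebuild structured provenance from the flat metadata.
    provenance, dataset_ref = from_flat_dict(meta, butler)
    return meta, provenance, dataset_ref
```

With the objects defined above, a call might look like `get_meta_and_provenance(butler, 'source_detector', {'visit': visit, 'detector': detector}, DatasetProvenance.from_flat_dict)`.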

#### source_unstandardized

In [4]:
# Retrieve a catalog:
src_unstd = butler.get('source_unstandardized',
                       dataId={'visit': visit, 'detector': detector})

In [5]:
# Print the table metadata:
src_unstd.meta

{'LSST.BUTLER.ID': '8e6222c0-8bf2-4e8f-9b49-c0f9ab4a8982',
 'LSST.BUTLER.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z',
 'LSST.BUTLER.DATASETTYPE': 'source_unstandardized',
 'LSST.BUTLER.DATAID.DETECTOR': 4,
 'LSST.BUTLER.DATAID.INSTRUMENT': 'LSSTComCam',
 'LSST.BUTLER.DATAID.VISIT': 2024110800246,
 'LSST.BUTLER.QUANTUM': '98b2a9c0-a1ef-416d-96f0-ccf508d1dba4',
 'LSST.BUTLER.INPUT.0.ID': 'b97a5838-24a4-4d83-9e14-c2c7fdd17afd',
 'LSST.BUTLER.INPUT.0.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T134626Z',
 'LSST.BUTLER.INPUT.0.DATASETTYPE': 'visit_summary',
 'LSST.BUTLER.INPUT.1.ID': '31b3f43c-a874-4489-b432-d4b3e30ceb97',
 'LSST.BUTLER.INPUT.1.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T134626Z',
 'LSST.BUTLER.INPUT.1.DATASETTYPE': 'refit_psf_star',
 'LSST.BUTLER.INPUT.2.ID': '2d130454-72f3-4515-a5ab-8bca272d982c',
 'LSST.BUTLER.INPUT.2.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T134626Z',
 'LSST.BUTLER.INPUT.2.DATASETTYPE

We see that this metadata records the RUN collection in which the `source_unstandardized` dataset was created, as well as the input datasets that contributed to it (`visit_summary`, `refit_psf_star`, `preliminary_visit_image_background`, `post_isr_image`, and `background_to_photometric_ratio`), along with the RUN collections where these are located. Note that to trace the provenance even further back in the pipeline, one would need to explore the provenance of these inputs (for example, `refit_psf_star`) and of the stages that came before their creation.
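
To make the structure of this flat metadata easier to read, the `LSST.BUTLER.INPUT.<n>.*` keys can be grouped into one record per input. This is an illustrative sketch in plain Python, independent of the Butler: `group_inputs` is a hypothetical helper, the key layout is inferred only from the output printed above, and the sample dictionary is an excerpt containing just the first two inputs.

```python
import re
from collections import defaultdict

# Pattern for input keys such as 'LSST.BUTLER.INPUT.0.DATASETTYPE'.
_INPUT_KEY = re.compile(r"^LSST\.BUTLER\.INPUT\.(\d+)\.(\w+)$")


def group_inputs(meta):
    """Return one dict per input dataset, ordered by input index."""
    grouped = defaultdict(dict)
    for key, value in meta.items():
        match = _INPUT_KEY.match(key)
        if match:
            index, field = int(match.group(1)), match.group(2).lower()
            grouped[index][field] = value
    return [grouped[i] for i in sorted(grouped)]


# Excerpt of the metadata printed above (first two inputs only).
meta_excerpt = {
    "LSST.BUTLER.DATASETTYPE": "source_unstandardized",
    "LSST.BUTLER.INPUT.0.ID": "b97a5838-24a4-4d83-9e14-c2c7fdd17afd",
    "LSST.BUTLER.INPUT.0.RUN": "LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T134626Z",
    "LSST.BUTLER.INPUT.0.DATASETTYPE": "visit_summary",
    "LSST.BUTLER.INPUT.1.ID": "31b3f43c-a874-4489-b432-d4b3e30ceb97",
    "LSST.BUTLER.INPUT.1.RUN": "LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T134626Z",
    "LSST.BUTLER.INPUT.1.DATASETTYPE": "refit_psf_star",
}

inputs = group_inputs(meta_excerpt)
input_types = [entry["datasettype"] for entry in inputs]
print(input_types)
```

Each record then holds the `id`, `run`, and `datasettype` of one input, which is the same information that the structured provenance exposes.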

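Pass this metadata to `DatasetProvenance.from_flat_dict` from `lsst.daf.butler`. As the later outputs in this notebook show, it returns a tuple of the `DatasetProvenance` and the `DatasetRef` of the dataset itself, so indexing with `[0]` selects the provenance.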

In [6]:
# Extract the provenance and print to the screen:
src_unstd_prov = DatasetProvenance.from_flat_dict(src_unstd.meta, butler)
src_unstd_prov[0]

DatasetProvenance(inputs=[SerializedDatasetRef(id=UUID('b97a5838-24a4-4d83-9e14-c2c7fdd17afd'), datasetType=SerializedDatasetType(name='visit_summary', storageClass='ExposureCatalog', dimensions=['instrument', 'visit'], parentStorageClass=None, isCalibration=False), dataId=SerializedDataCoordinate(dataId={'instrument': 'LSSTComCam', 'visit': 2024110800246}, records=None), run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T134626Z', component=None), SerializedDatasetRef(id=UUID('31b3f43c-a874-4489-b432-d4b3e30ceb97'), datasetType=SerializedDatasetType(name='refit_psf_star', storageClass='ArrowAstropy', dimensions=['instrument', 'visit'], parentStorageClass=None, isCalibration=False), dataId=SerializedDataCoordinate(dataId={'instrument': 'LSSTComCam', 'visit': 2024110800246}, records=None), run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T134626Z', component=None), SerializedDatasetRef(id=UUID('2d130454-72f3-4515-a5ab-8bca272d982c'), datasetType=SerializedDatasetType(name='

We see that the serialized dataset references (`SerializedDatasetRef`) for the inputs have been returned.

#### source_detector

In [7]:
# Retrieve a catalog:
src_det = butler.get('source_detector',
                     dataId={'visit': visit, 'detector': detector})

In [8]:
# Print the table metadata:
src_det.meta

{'LSST.BUTLER.ID': '74cd43a9-9bd0-4050-970e-3d543789cbd1',
 'LSST.BUTLER.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z',
 'LSST.BUTLER.DATASETTYPE': 'source_detector',
 'LSST.BUTLER.DATAID.DETECTOR': 4,
 'LSST.BUTLER.DATAID.INSTRUMENT': 'LSSTComCam',
 'LSST.BUTLER.DATAID.VISIT': 2024110800246,
 'LSST.BUTLER.QUANTUM': 'f77e3c63-962d-496a-8687-7d22b3d6f21f',
 'LSST.BUTLER.INPUT.0.ID': '8e6222c0-8bf2-4e8f-9b49-c0f9ab4a8982',
 'LSST.BUTLER.INPUT.0.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z',
 'LSST.BUTLER.INPUT.0.DATASETTYPE': 'source_unstandardized'}

In [9]:
# Extract the provenance and print to the screen:
src_det_prov = DatasetProvenance.from_flat_dict(src_det.meta, butler)
src_det_prov

(DatasetProvenance(inputs=[SerializedDatasetRef(id=UUID('8e6222c0-8bf2-4e8f-9b49-c0f9ab4a8982'), datasetType=SerializedDatasetType(name='source_unstandardized', storageClass='ArrowAstropy', dimensions=['instrument', 'detector', 'visit'], parentStorageClass=None, isCalibration=False), dataId=SerializedDataCoordinate(dataId={'instrument': 'LSSTComCam', 'detector': 4, 'visit': 2024110800246}, records=None), run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z', component=None)], quantum_id=UUID('f77e3c63-962d-496a-8687-7d22b3d6f21f'), extras={}),
 DatasetRef(DatasetType('source_detector', {band, instrument, day_obs, detector, physical_filter, visit}, ArrowAstropy), {instrument: 'LSSTComCam', detector: 4, visit: 2024110800246}, run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z', id=74cd43a9-9bd0-4050-970e-3d543789cbd1))

We see that `source_detector` has only the `source_unstandardized` as input, as it is a transformed version of the unstandardized source table.
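
This parent-child link can also be checked directly from the metadata: the `LSST.BUTLER.ID` printed for `source_unstandardized` is the same UUID that appears as `LSST.BUTLER.INPUT.0.ID` for `source_detector`. A minimal sketch of that comparison follows; `is_direct_input` is a hypothetical helper, and the dictionaries are excerpts with values copied from the outputs above.

```python
def is_direct_input(parent_meta, child_meta):
    """Return True if the parent dataset's ID is listed among the child's inputs."""
    parent_id = parent_meta["LSST.BUTLER.ID"]
    input_ids = {
        value
        for key, value in child_meta.items()
        if key.startswith("LSST.BUTLER.INPUT.") and key.endswith(".ID")
    }
    return parent_id in input_ids


# Excerpts of the metadata printed above.
src_unstd_meta = {
    "LSST.BUTLER.ID": "8e6222c0-8bf2-4e8f-9b49-c0f9ab4a8982",
    "LSST.BUTLER.DATASETTYPE": "source_unstandardized",
}
src_det_meta = {
    "LSST.BUTLER.ID": "74cd43a9-9bd0-4050-970e-3d543789cbd1",
    "LSST.BUTLER.DATASETTYPE": "source_detector",
    "LSST.BUTLER.INPUT.0.ID": "8e6222c0-8bf2-4e8f-9b49-c0f9ab4a8982",
    "LSST.BUTLER.INPUT.0.DATASETTYPE": "source_unstandardized",
}

linked = is_direct_input(src_unstd_meta, src_det_meta)
print(linked)
```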

#### source_all

In [10]:
# Retrieve a catalog:
# Note: `source_all` is a per-visit dataset; its data ID (printed below) has no detector entry.
src_all = butler.get('source_all',
                     dataId={'visit': visit, 'detector': detector})

In [11]:
# Print the table metadata:
src_all.meta

{'LSST.BUTLER.ID': 'dd47a1a7-74b3-4a1c-8e33-9f72d7be2d97',
 'LSST.BUTLER.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z',
 'LSST.BUTLER.DATASETTYPE': 'source_all',
 'LSST.BUTLER.DATAID.INSTRUMENT': 'LSSTComCam',
 'LSST.BUTLER.DATAID.VISIT': 2024110800246,
 'LSST.BUTLER.QUANTUM': '76e85758-ba7d-4653-bf87-90bf0ca97762',
 'LSST.BUTLER.INPUT.0.ID': 'dca387fe-90ef-4b84-bf14-72f81787efc8',
 'LSST.BUTLER.INPUT.0.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z',
 'LSST.BUTLER.INPUT.0.DATASETTYPE': 'source_detector',
 'LSST.BUTLER.INPUT.1.ID': '068665f5-39e8-480b-b8a5-570b7be8e895',
 'LSST.BUTLER.INPUT.1.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z',
 'LSST.BUTLER.INPUT.1.DATASETTYPE': 'source_detector',
 'LSST.BUTLER.INPUT.2.ID': '9cca86cf-3961-4cab-bc57-db9049148a41',
 'LSST.BUTLER.INPUT.2.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z',
 'LSST.BUTLER.INPUT.2.DATASETTYPE': 'source_detector',
 'LSST.BUTLER.INPUT.3

In [12]:
# Extract the provenance and print to the screen:
src_all_prov = DatasetProvenance.from_flat_dict(src_all.meta, butler)
src_all_prov

(DatasetProvenance(inputs=[SerializedDatasetRef(id=UUID('dca387fe-90ef-4b84-bf14-72f81787efc8'), datasetType=SerializedDatasetType(name='source_detector', storageClass='ArrowAstropy', dimensions=['instrument', 'detector', 'visit'], parentStorageClass=None, isCalibration=False), dataId=SerializedDataCoordinate(dataId={'instrument': 'LSSTComCam', 'detector': 0, 'visit': 2024110800246}, records=None), run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z', component=None), SerializedDatasetRef(id=UUID('068665f5-39e8-480b-b8a5-570b7be8e895'), datasetType=SerializedDatasetType(name='source_detector', storageClass='ArrowAstropy', dimensions=['instrument', 'detector', 'visit'], parentStorageClass=None, isCalibration=False), dataId=SerializedDataCoordinate(dataId={'instrument': 'LSSTComCam', 'detector': 1, 'visit': 2024110800246}, records=None), run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z', component=None), SerializedDatasetRef(id=UUID('9cca86cf-3961-4cab-bc57-d

We see that the `source_all` dataset consists of the `source_detector` catalogs from all 9 ComCam detectors.

#### source2

In [13]:
# Retrieve a catalog:
src = butler.get('source2',
                 dataId={'visit': visit, 'detector': detector})

In [14]:
# Print the table metadata:
src.meta

{'LSST.BUTLER.ID': 'b4ba9d6e-aa25-4547-90b3-c9fa368cb00d',
 'LSST.BUTLER.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z',
 'LSST.BUTLER.DATASETTYPE': 'source2',
 'LSST.BUTLER.DATAID.INSTRUMENT': 'LSSTComCam',
 'LSST.BUTLER.DATAID.VISIT': 2024110800246,
 'LSST.BUTLER.QUANTUM': '4dca3549-8f31-4ca3-99e6-c2bcfa4db778',
 'LSST.BUTLER.INPUT.0.ID': 'dd47a1a7-74b3-4a1c-8e33-9f72d7be2d97',
 'LSST.BUTLER.INPUT.0.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z',
 'LSST.BUTLER.INPUT.0.DATASETTYPE': 'source_all'}

In [15]:
# Extract the provenance and print to the screen:
src_prov = DatasetProvenance.from_flat_dict(src.meta, butler)
src_prov

(DatasetProvenance(inputs=[SerializedDatasetRef(id=UUID('dd47a1a7-74b3-4a1c-8e33-9f72d7be2d97'), datasetType=SerializedDatasetType(name='source_all', storageClass='ArrowAstropy', dimensions=['instrument', 'visit'], parentStorageClass=None, isCalibration=False), dataId=SerializedDataCoordinate(dataId={'instrument': 'LSSTComCam', 'visit': 2024110800246}, records=None), run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z', component=None)], quantum_id=UUID('4dca3549-8f31-4ca3-99e6-c2bcfa4db778'), extras={}),
 DatasetRef(DatasetType('source2', {band, instrument, day_obs, physical_filter, visit}, ArrowAstropy), {instrument: 'LSSTComCam', visit: 2024110800246}, run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250421T151307Z', id=b4ba9d6e-aa25-4547-90b3-c9fa368cb00d))

We see that the `source2` table is simply a subset of `source_all`, and thus has only `source_all` as an input.

We have demonstrated that the provenance of catalogs from single-visit images can be retrieved with LSST Science Pipelines tooling.
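
The input dataset types reported in the cells above can be chained to summarize the lineage of `source2`. The sketch below is illustrative only: the `INPUT_TYPES` mapping is transcribed by hand from the outputs and text above rather than queried from the Butler, and `trace_single_input_chain` is a hypothetical helper that walks upstream for as long as a dataset has exactly one input dataset type.

```python
# Input dataset types reported above, keyed by output dataset type.
INPUT_TYPES = {
    "source2": ["source_all"],
    "source_all": ["source_detector"],
    "source_detector": ["source_unstandardized"],
    "source_unstandardized": [
        "visit_summary",
        "refit_psf_star",
        "preliminary_visit_image_background",
        "post_isr_image",
        "background_to_photometric_ratio",
    ],
}


def trace_single_input_chain(dataset_type, input_types):
    """Follow provenance upstream while each step has exactly one input type."""
    chain = [dataset_type]
    while len(input_types.get(chain[-1], [])) == 1:
        chain.append(input_types[chain[-1]][0])
    return chain


chain = trace_single_input_chain("source2", INPUT_TYPES)
print(" <- ".join(chain))
```

The walk stops at `source_unstandardized`, where the provenance branches into several input dataset types.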

#### Catalogs from coadds:

We will demonstrate extraction of provenance information for three catalogs: `object_patch` (initial patch-level compilation of object catalogs), `object_all` (combination of all `object_patch` catalogs for each tract), and `object` (final calibrated object catalog from coadd images). This tracks part of the progression of detection, measurement, and calibration steps through the pipelines.

For each of these, we will:
1. Extract a catalog from the butler using the desired dataId constraints.
2. Print the table metadata to the screen.
3. Extract the provenance associated with the table, and print it to the screen.

#### object_patch

In [16]:
# Retrieve a catalog:
obj_patch = butler.get('object_patch',
                       dataId={'tract': tract, 'patch': patch, 'skymap': 'lsst_cells_v1'},
                       parameters={'columns': ['coord_ra', 'coord_dec', 'r_psfFlux', 'r_psfFluxErr']})

In [17]:
# Print the table metadata:
obj_patch.meta

{'LSST.BUTLER.ID': '04e7f8bc-5d3e-4dc4-8238-7b2babeb7dea',
 'LSST.BUTLER.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z',
 'LSST.BUTLER.DATASETTYPE': 'object_patch',
 'LSST.BUTLER.DATAID.PATCH': 24,
 'LSST.BUTLER.DATAID.SKYMAP': 'lsst_cells_v1',
 'LSST.BUTLER.DATAID.TRACT': 5063,
 'LSST.BUTLER.QUANTUM': 'd618b6ce-4e7f-4ed5-b994-44624d5903a2',
 'LSST.BUTLER.INPUT.0.ID': 'e4ba79f0-2578-4d0b-a6c1-875ddc0c6a45',
 'LSST.BUTLER.INPUT.0.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z',
 'LSST.BUTLER.INPUT.0.DATASETTYPE': 'object_epoch',
 'LSST.BUTLER.INPUT.1.ID': '81aaea92-756a-41fd-a7e5-1915e3e34812',
 'LSST.BUTLER.INPUT.1.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z',
 'LSST.BUTLER.INPUT.1.DATASETTYPE': 'object_unstandardized',
 'LSST.BUTLER.INPUT.2.ID': 'e0b42428-0451-4d89-9942-fca9dee981d8',
 'LSST.BUTLER.INPUT.2.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z',
 'LSST.BUTLER.INPUT.2.DATASETTYPE': 'object_ref_

We see that this metadata records the RUN collection in which the `object_patch` dataset was created, as well as the input datasets that contributed to it (`object_epoch`, `object_unstandardized`, `object_ref_measurement`, and `object_sersic_multiprofit`), along with the RUN collections where these are located. Note that to trace the provenance even further back in the pipeline, one would need to explore the inputs and the stages that came before those catalogs' creation.
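
Besides the RUN collection and the inputs, the flat metadata also records the data ID of the dataset itself under `LSST.BUTLER.DATAID.*`. As a small sketch (plain Python, with a hypothetical helper name and an excerpt of the values printed above), these entries can be collected back into a data ID dictionary with the lower-case dimension names used in the `butler.get` calls:

```python
DATAID_PREFIX = "LSST.BUTLER.DATAID."


def data_id_from_meta(meta):
    """Collect the 'LSST.BUTLER.DATAID.*' entries into a lower-cased data ID dict."""
    return {
        key[len(DATAID_PREFIX):].lower(): value
        for key, value in meta.items()
        if key.startswith(DATAID_PREFIX)
    }


# Excerpt of the object_patch metadata printed above.
obj_patch_meta = {
    "LSST.BUTLER.ID": "04e7f8bc-5d3e-4dc4-8238-7b2babeb7dea",
    "LSST.BUTLER.DATASETTYPE": "object_patch",
    "LSST.BUTLER.DATAID.PATCH": 24,
    "LSST.BUTLER.DATAID.SKYMAP": "lsst_cells_v1",
    "LSST.BUTLER.DATAID.TRACT": 5063,
}

data_id = data_id_from_meta(obj_patch_meta)
print(data_id)
```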

As before, pass this metadata to `DatasetProvenance.from_flat_dict` from `lsst.daf.butler`.

In [18]:
# Extract the provenance and print to the screen:
obj_patch_prov = DatasetProvenance.from_flat_dict(obj_patch.meta, butler)
obj_patch_prov[0]

DatasetProvenance(inputs=[SerializedDatasetRef(id=UUID('e4ba79f0-2578-4d0b-a6c1-875ddc0c6a45'), datasetType=SerializedDatasetType(name='object_epoch', storageClass='ArrowAstropy', dimensions=['skymap', 'tract', 'patch'], parentStorageClass=None, isCalibration=False), dataId=SerializedDataCoordinate(dataId={'skymap': 'lsst_cells_v1', 'tract': 5063, 'patch': 24}, records=None), run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z', component=None), SerializedDatasetRef(id=UUID('81aaea92-756a-41fd-a7e5-1915e3e34812'), datasetType=SerializedDatasetType(name='object_unstandardized', storageClass='DataFrame', dimensions=['skymap', 'tract', 'patch'], parentStorageClass=None, isCalibration=False), dataId=SerializedDataCoordinate(dataId={'skymap': 'lsst_cells_v1', 'tract': 5063, 'patch': 24}, records=None), run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z', component=None), SerializedDatasetRef(id=UUID('e0b42428-0451-4d89-9942-fca9dee981d8'), datasetType=SerializedDa

#### object_all

In [19]:
# Retrieve a catalog:
# Note: `object_all` is a per-tract dataset; its data ID (printed below) has no patch entry.
obj_all = butler.get('object_all',
                     dataId={'tract': tract, 'patch': patch, 'skymap': 'lsst_cells_v1'},
                     parameters={'columns': ['coord_ra', 'coord_dec', 'r_psfFlux', 'r_psfFluxErr']})

In [20]:
# Print the table metadata:
obj_all.meta

{'LSST.BUTLER.ID': '35a7d966-ffec-4ad9-a887-aa6a7c605760',
 'LSST.BUTLER.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z',
 'LSST.BUTLER.DATASETTYPE': 'object_all',
 'LSST.BUTLER.DATAID.SKYMAP': 'lsst_cells_v1',
 'LSST.BUTLER.DATAID.TRACT': 5063,
 'LSST.BUTLER.QUANTUM': '0ad3e562-dc43-425a-95f5-137303fe2143',
 'LSST.BUTLER.INPUT.0.ID': '9703c45e-761d-414a-9fc6-457e2babbb64',
 'LSST.BUTLER.INPUT.0.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z',
 'LSST.BUTLER.INPUT.0.DATASETTYPE': 'object_patch',
 'LSST.BUTLER.INPUT.1.ID': 'f613665b-9f50-4617-8a25-6d10d0c62c62',
 'LSST.BUTLER.INPUT.1.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z',
 'LSST.BUTLER.INPUT.1.DATASETTYPE': 'object_patch',
 'LSST.BUTLER.INPUT.2.ID': '9027c8fd-f856-4893-8111-906cdb4de534',
 'LSST.BUTLER.INPUT.2.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z',
 'LSST.BUTLER.INPUT.2.DATASETTYPE': 'object_patch',
 'LSST.BUTLER.INPUT.3.ID': 'c7d398b0-559

We see that `object_all` compiles all of the `object_patch` catalogs for the tract.

In [21]:
# Extract the provenance and print to the screen:
obj_all_prov = DatasetProvenance.from_flat_dict(obj_all.meta, butler)
obj_all_prov[0]

DatasetProvenance(inputs=[SerializedDatasetRef(id=UUID('9703c45e-761d-414a-9fc6-457e2babbb64'), datasetType=SerializedDatasetType(name='object_patch', storageClass='ArrowAstropy', dimensions=['skymap', 'tract', 'patch'], parentStorageClass=None, isCalibration=False), dataId=SerializedDataCoordinate(dataId={'skymap': 'lsst_cells_v1', 'tract': 5063, 'patch': 0}, records=None), run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z', component=None), SerializedDatasetRef(id=UUID('f613665b-9f50-4617-8a25-6d10d0c62c62'), datasetType=SerializedDatasetType(name='object_patch', storageClass='ArrowAstropy', dimensions=['skymap', 'tract', 'patch'], parentStorageClass=None, isCalibration=False), dataId=SerializedDataCoordinate(dataId={'skymap': 'lsst_cells_v1', 'tract': 5063, 'patch': 1}, records=None), run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z', component=None), SerializedDatasetRef(id=UUID('9027c8fd-f856-4893-8111-906cdb4de534'), datasetType=SerializedDatasetTyp

#### object

In [22]:
# Retrieve a catalog:
obj = butler.get('object',
                 dataId={'tract': tract, 'patch': patch, 'skymap': 'lsst_cells_v1'},
                 parameters={'columns': ['coord_ra', 'coord_dec', 'r_psfFlux', 'r_psfFluxErr']})

In [23]:
# Print the table metadata:
obj.meta

{'LSST.BUTLER.ID': '90c0434a-20f3-46fb-a536-470fee43aae8',
 'LSST.BUTLER.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z',
 'LSST.BUTLER.DATASETTYPE': 'object',
 'LSST.BUTLER.DATAID.SKYMAP': 'lsst_cells_v1',
 'LSST.BUTLER.DATAID.TRACT': 5063,
 'LSST.BUTLER.QUANTUM': 'a38a8568-992c-43e4-b52b-3cf483ea24ad',
 'LSST.BUTLER.INPUT.0.ID': '35a7d966-ffec-4ad9-a887-aa6a7c605760',
 'LSST.BUTLER.INPUT.0.RUN': 'LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z',
 'LSST.BUTLER.INPUT.0.DATASETTYPE': 'object_all'}

In [24]:
# Extract the provenance and print to the screen:
obj_prov = DatasetProvenance.from_flat_dict(obj.meta, butler)
obj_prov

(DatasetProvenance(inputs=[SerializedDatasetRef(id=UUID('35a7d966-ffec-4ad9-a887-aa6a7c605760'), datasetType=SerializedDatasetType(name='object_all', storageClass='ArrowAstropy', dimensions=['skymap', 'tract'], parentStorageClass=None, isCalibration=False), dataId=SerializedDataCoordinate(dataId={'skymap': 'lsst_cells_v1', 'tract': 5063}, records=None), run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z', component=None)], quantum_id=UUID('a38a8568-992c-43e4-b52b-3cf483ea24ad'), extras={}),
 DatasetRef(DatasetType('object', {skymap, tract}, ArrowAstropy), {skymap: 'lsst_cells_v1', tract: 5063}, run='LSSTComCam/runs/DRP/DP1/w_2025_16/DM-50344/20250419T223750Z', id=90c0434a-20f3-46fb-a536-470fee43aae8))

We see that the `object` table is a transformed version of a single input, `object_all`.

## Results

We have demonstrated that catalog data products produced by the LSST Science Pipelines have provenance information associated with them. This provenance is readily retrieved for all catalogs resulting from single-visit or coadd images. The result of this test is thus a *Pass*.