# Operations Rehearsal for Commissioning April 2024 Data Access Demo

Tutorial notebook for accessing data from the Operations Rehearsal for Commissioning in April 2024, with pointers to other resources, examples, and documentation. Most examples use the simulated ComCam data stream.

Last verified to run: 20 Mar 2024

LSST Science Pipelines Version: `w_2024_10`

## What version of the Science Pipelines am I using?

In [None]:
!eups list -s | grep lsst_distrib

For a summary of changes between versions, see https://lsst-dm.github.io/lsst_git_changelog/

Import additional python packages that we will need for this tutorial.

In [None]:
import numpy as np

In [None]:
import matplotlib.pyplot as plt
%matplotlib widget

## How do I find information on data processing campaigns?

See https://confluence.lsstcorp.org/display/DM/Campaigns for the list of productions that the campaign management team has acknowledged as routine and accepted to oversee.

In particular, there is a section for Operations Rehearsal #3 called "DRP on Simulated ComCam data at USDF" for the simulated ComCam data stream.

The repo and collection names are provided, which we will need for the next section.

## How do I instantiate a Butler for data access?

* https://github.com/rubin-dp0/tutorial-notebooks/blob/main/DP02_01_Introduction_to_DP02.ipynb

In [None]:
from lsst.daf.butler import Butler

In [None]:
# One can see all of the collections with the following command
# butler.registry.queryCollections()

In [None]:
repo = '/repo/ops-rehearsal-3-prep'
collection = 'u/homer/htc-test1'
# collection = 'u/homer/w_2024_10/DM-43228'

butler = Butler(repo, collections=collection)
registry = butler.registry

## How do I find the contents of a collection?

In [None]:
# Determine which dataset types exist in the collection
for datasetType in registry.queryDatasetTypes():
    if registry.queryDatasets(datasetType, collections=collection).any(execute=False, exact=False):
        # Limit search results to the data products
        if ('_config' not in datasetType.name) and ('_log' not in datasetType.name) and ('_metadata' not in datasetType.name) and ('_resource_usage' not in datasetType.name):
            print(datasetType)
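
The name filter in the loop above can be factored into a small helper, which makes it easier to reuse or extend. This is a minimal sketch: `BOOKKEEPING_MARKERS` and `is_data_product` are hypothetical names (not part of the Butler API), and the example names are illustrative stand-ins for what `registry.queryDatasetTypes()` returns.

```python
# Hypothetical helper; the markers mirror the four substrings tested in the loop above.
BOOKKEEPING_MARKERS = ('_config', '_log', '_metadata', '_resource_usage')


def is_data_product(name, markers=BOOKKEEPING_MARKERS):
    """Return True if a dataset type name contains none of the bookkeeping markers."""
    return not any(marker in name for marker in markers)


# Illustrative names only; real names come from registry.queryDatasetTypes().
example_names = ['calexp', 'calibrate_config', 'isr_log', 'visitTable', 'isr_metadata']
data_products = [name for name in example_names if is_data_product(name)]
print(data_products)
```

In the loop, the long condition would then reduce to `if is_data_product(datasetType.name):`.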

## Standard Data Model

We will be accessing several tabular data products. A quick reference for the Standard Data Model schema:

https://dm.lsst.org/sdm_schemas/browser/

## How do I get a summary of visits that are included in the collection?

In [None]:
list(butler.registry.queryDatasets('visitTable'))

In [None]:
visitTable = butler.get('visitTable')

In [None]:
visitTable

## How do I get a quick summary of the science performance of the individual visits?

In [None]:
list(butler.registry.queryDatasets('ccdVisitTable'))

In [None]:
ccdVisitTable = butler.get('ccdVisitTable')

In [None]:
ccdVisitTable

In [None]:
ccdVisitTable.columns

In [None]:
#in_band = np.where(ccdVisitTable['band'] == 'g')[0]
in_band = ccdVisitTable['band'] == 'g'
ccdVisitTable['band'][in_band]

In [None]:
f2c = {'u': 'purple', 'g': 'blue', 'r': 'green',
       'i': 'cyan', 'z': 'orange', 'y': 'red'}

plt.figure(dpi=200)
for bandname in f2c:
    in_band = ccdVisitTable['band'] == bandname
    if np.sum(in_band) > 0:
        plt.plot(ccdVisitTable['zenithDistance'][in_band], ccdVisitTable['zeroPoint'][in_band], 
                 'o', markersize=1, color=f2c[bandname], label=bandname)
plt.legend()
plt.xlabel('zenithDistance')
plt.ylabel('zeroPoint')

Notice the outlier detectors that have anomalous zeropoint values relative to the sample.
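
One way to pick out such outliers programmatically is a robust cut around the median. The sketch below is an assumption about how one might do this, not the method used in the pipeline: `flag_outliers` and the `n_mad` threshold are hypothetical, and the zero points are made-up numbers for a single band.

```python
from statistics import median


def flag_outliers(values, n_mad=5.0):
    """Return indices whose distance from the median exceeds n_mad scaled MADs."""
    center = median(values)
    # 1.4826 scales the MAD to the standard deviation of a normal distribution.
    mad = 1.4826 * median(abs(v - center) for v in values)
    if mad == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - center) > n_mad * mad]


# Made-up zero points (mag) for one band; the last value is deliberately anomalous.
zero_points = [31.02, 31.05, 30.98, 31.01, 31.04, 30.99, 29.50]
print(flag_outliers(zero_points))
```

Because the typical zero point differs between bands, a cut like this should be applied to each band separately, for example to `ccdVisitTable['zeroPoint'][in_band]`.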

Example analyses to try:
* https://rubinobs.atlassian.net/jira/software/c/projects/DM/boards/174?selectedIssue=DM-43070

## How do I figure out which tracts have data?

In [None]:
for dtype in sorted(registry.queryDatasetTypes(expression="*nImage*")):
    print(dtype.name)

In [None]:
nImage_refs = list(butler.registry.queryDatasets('deepCoadd_nImage'))

In [None]:
tracts = np.unique([ref.dataId['tract'] for ref in nImage_refs])
print(tracts)

bands = np.unique([ref.dataId['band'] for ref in nImage_refs])
print(bands)

In [None]:
# Check which tracts actually have a lot of visit coverage:
for tract in tracts:
    visits = list(butler.registry.queryDatasets('visitSummary', tract=tract, skymap='DC2', findFirst=True))
    print(tract, len(visits))
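
The loop above prints one count per tract. If the counts are collected into a dictionary instead, the well-covered tracts can be ranked directly. A minimal sketch, assuming a hypothetical helper `rank_tracts` and made-up counts (the tract numbers and visit counts below are illustrative, not results from the collection):

```python
def rank_tracts(counts, min_visits=1):
    """Return (tract, count) pairs with at least min_visits, most-covered first."""
    kept = [(tract, n) for tract, n in counts.items() if n >= min_visits]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up mapping of tract -> number of visitSummary datasets.
visit_counts = {3346: 118, 7445: 40, 3345: 2, 3533: 0}
for tract_id, n_visits in rank_tracts(visit_counts, min_visits=10):
    print(tract_id, n_visits)
```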

In [None]:
skymap = butler.get('skyMap', skymap='DC2')
tract = skymap.generateTract(3346)
sp2 = tract.getCtrCoord()
sp2

## How do I access a source table?

In [None]:
sourceTable_refs = sorted(butler.registry.queryDatasets('sourceTable_visit'))
sourceTable = butler.get(sourceTable_refs[0])
sourceTable

In [None]:
sourceTable.columns.values

## How do I access an object table?

In [None]:
objectTable_refs = sorted(butler.registry.queryDatasets('objectTable_tract'))
objectTable = butler.get(objectTable_refs[0])
objectTable

In [None]:
#for column in objectTable.columns.values:
#    print(column)

## How do I access DIA sources?

* https://github.com/lsst-dm/vv-team-notebooks/blob/tickets/PREOPS-4964/notebooks/PREOPS-4964-AuxTel-Lines.ipynb

In [None]:
repoSim = '/sdf/group/rubin/repo/ops-rehearsal-3-prep'
skymapNameSim = 'DC2'
instrumentNameSim = 'LSSTComCamSim'
collectionSim = 'u/homer/htc-test1'
butlerSim = Butler(repoSim, collections=collectionSim, skymap=skymapNameSim)

In [None]:
visitListSim = []
for item in butlerSim.registry.queryDatasets('diaSourceTable'):
    visitListSim.append(item.dataId.get('visit'))
print(len(set(visitListSim)))

In [None]:
testDiaSourceTableSim = butlerSim.get('diaSourceTable', visit=visitListSim[0])

In [None]:
testDiaSourceTableSim

In [None]:
testDiaSourceTableSim.columns

## How do I visualize a pixel-level image?

Suggested references:
* https://github.com/rubin-dp0/tutorial-notebooks/blob/main/DP02_03a_Image_Display_and_Manipulation.ipynb
* https://github.com/rubin-dp0/tutorial-notebooks/blob/main/DP02_03b_Image_Display_with_Firefly.ipynb
* https://github.com/yalsayyad/dm_notebooks/blob/master/examples/Firefly.ipynb

In [None]:
# Find a calexp

In [None]:
calexp_refs = sorted(registry.queryDatasets('calexp', band='i'))

In [None]:
len(calexp_refs)

In [None]:
calexp_refs[0]

In [None]:
calexp = butler.get(calexp_refs[0])

In [None]:
import lsst.afw.display as afwDisplay

In [None]:
afwDisplay.setDefaultBackend('matplotlib')

Inline image visualization

In [None]:
fig = plt.figure(figsize=(7, 6))
afw_display = afwDisplay.Display(fig)
afw_display.scale('asinh', 'zscale')
afw_display.mtv(calexp.image)
plt.gca().axis('on')

Interactive data visualization with Firefly. This will open a new tab.

In [None]:
afwDisplay.setDefaultBackend('firefly')
afw_display = afwDisplay.Display(frame=1)

In [None]:
afw_display.mtv(calexp)

## How do I access science performance metrics computed as part of the pipeline (Analysis Tools)?

Additional suggested resources:
* https://github.com/lsst-dm/analysis_tools_examples/blob/main/atoolsInvestigation.ipynb
* https://github.com/lsst-dm/analysis_tools_examples/blob/main/reconstructorDemo.ipynb
* https://github.com/lsst-dm/analysis_tools_examples/blob/main/data_access_demo.ipynb

In [None]:
from lsst.analysis.tools.tasks.reconstructor import reconstructAnalysisTools

In [None]:
dataId = {"tract": 7445, "skymap": "DC2"}
#dataId = {"tract": 3346, "skymap": "DC2"}

In [None]:
refs = sorted(butler.registry.queryDatasets("objectTableCore_metrics", collections=collection, dataId=dataId))

In [None]:
refs

In [None]:
objectTable_metrics = butler.get("objectTableCore_metrics", dataId=dataId, collections=collection)

In [None]:
for dtype in sorted(registry.queryDatasetTypes(expression="*analyzeObjectTableCore*")):
    print(dtype.name)

In [None]:
# Access the configuration
objectTable_config = butler.get("analyzeObjectTableCore_config", dataId=dataId, collections=collection)
# objectTable_config.toDict()

In [None]:
# objectTable_metrics.data

In [None]:
taskState, inputData = reconstructAnalysisTools(butler,
                                                collection=collection,
                                                label="analyzeObjectTableCore",
                                                dataId=dataId,
                                                callback=None)

In [None]:
inputData['data']

In [None]:
for action in taskState.atools:
    print(action.identity)

In [None]:
plotInfoDict = {"run": collection, "bands": "i", "tract": 7445, "tableName": "objectTable_tract"}
fig = taskState.atools.shapeSizeFractionalDiff(inputData["data"], plotInfo=plotInfoDict, skymap=inputData['skymap'], band="i")

The brighter-fatter correction was not turned on in the "Pass 1" iteration of the data reduction.

## How do I query science performance metrics and system telemetry (Sasquatch and EFD)?

https://sasquatch.lsst.io/user-guide/analysistools.html

https://github.com/lsst-dm/analysis_tools_examples/blob/main/sasquatch_analysis_tools_demo.ipynb

In [None]:
from lsst_efd_client import EfdClient
client = EfdClient("usdfdev_efd", db_name="lsst.dm")

In [None]:
topics = await client.get_topics()
for t in topics:
    print(t)

In [None]:
query = '''SELECT * FROM "lsst.dm.e1Diff" WHERE time > now() - 100d '''
#query = '''SELECT * FROM "lsst.dm.calexpMetrics" WHERE time > now() - 100d '''
df = await client.influx_client.query(query)
df.columns.values

In [None]:
np.unique(df['dataset_tag'])
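
The query string above follows a fixed pattern, so switching between topics (such as the commented-out `lsst.dm.calexpMetrics`) is less error-prone with a small builder. This is a sketch only: `recent_query` is a hypothetical helper that assembles the string and does not contact the database.

```python
def recent_query(topic, days=100):
    """Build an InfluxQL query selecting all fields of a topic over the last `days` days."""
    days = int(days)
    if days <= 0:
        raise ValueError("days must be a positive integer")
    # The topic is double-quoted because its name contains dots.
    return f'SELECT * FROM "{topic}" WHERE time > now() - {days}d'


print(recent_query("lsst.dm.e1Diff"))
print(recent_query("lsst.dm.calexpMetrics", days=30))
```

The resulting string can be passed to `client.influx_client.query(...)` exactly as in the cell above.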

## Calexp Metrics

In [None]:
from lsst_efd_client import EfdClient
efd_client = EfdClient("usdfdev_efd", db_name="lsst.dm")
query = '''SELECT * FROM "lsst.dm.calexpMetrics" WHERE time > '2024-03-10' and time < '2024-03-14' '''
res = await efd_client.influx_client.query(query)
res.columns

"""
Results in:

Index(['band', 'band_1', 'dataset_tag', 'dataset_tag_1', 'dataset_type',
       'detector', 'detector_1', 'exposure', 'id', 'instrument',
       'instrument_1', 'patch', 'physical_filter', 'physical_filter_1',
       'psfArea', 'psfSigma', 'reference_package',
       'reference_package_timestamp', 'reference_package_version', 'run',
       'run_1', 'run_timestamp', 'skymap', 'timestamp', 'tract', 'visit',
       'visit_1'],
      dtype='object')
"""

In [None]:
len(res)

In [None]:
res

In [None]:
np.unique(res['band'])

In [None]:
plt.figure()
plt.scatter(res['detector'], res['psfSigma'])
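
The scatter plot shows every measurement. A per-detector summary can make systematic differences between detectors easier to see. The sketch below uses only the standard library and made-up numbers; `median_by_key` is a hypothetical helper, not part of the EFD client.

```python
from collections import defaultdict
from statistics import median


def median_by_key(keys, values):
    """Group values by key and return {key: median of its values}, sorted by key."""
    grouped = defaultdict(list)
    for key, value in zip(keys, values):
        grouped[key].append(value)
    return {key: median(vals) for key, vals in sorted(grouped.items())}


# Made-up detector numbers and psfSigma values (pixels), for illustration only.
detectors = [0, 0, 1, 1, 1, 2]
psf_sigma = [1.8, 2.0, 1.7, 1.9, 2.1, 2.4]
print(median_by_key(detectors, psf_sigma))
```

Since `res` is a pandas DataFrame, the equivalent one-liner on the real data would be `res.groupby('detector')['psfSigma'].median()`.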

## How do I access science performance plots generated as part of the pipeline (Plot Navigator)?

https://usdf-rsp.slac.stanford.edu/plot-navigator/dashboard_gen3

Try a repo and collection for Operations Rehearsal 3.

## How do I create new science performance plots and metrics to be computed as part of the pipeline (Analysis Tools)? 

Suggested resources with examples:

* https://pipelines.lsst.io/v/daily/modules/lsst.analysis.tools/getting-started.html#analysis-tools-getting-started
* https://github.com/lsst-dm/analysis_tools_examples/blob/main/atoolsInvestigation.ipynb
* https://github.com/lsst-dm/analysis_tools_examples/blob/main/data_access_demo.ipynb

## How do I access and visualize survey property maps?

https://github.com/rubin-dp0/tutorial-notebooks/blob/main/DP02_03c_Survey_Property_Maps.ipynb

https://github.com/LSSTDESC/skyproj/tree/main/tutorial

In [None]:
import skyproj

In [None]:
for dtype in sorted(registry.queryDatasetTypes(expression="*consolidated_map*")):
    print(dtype.name)

In [None]:
sorted(registry.queryDatasets('deepCoadd_psf_maglim_consolidated_map_weighted_mean'))

In [None]:
hspmap = butler.get('deepCoadd_psf_maglim_consolidated_map_weighted_mean', band='i', skymap='DC2')

In [None]:
fig, ax = plt.subplots(figsize=(8, 5))
sp = skyproj.McBrydeSkyproj(ax=ax, lon_0=65.0)
sp.draw_hspmap(hspmap)
sp.draw_colorbar(label='PSF Maglim (i-band)')
plt.show()

del fig, ax, sp

The fields are small when viewed at this scale. Try zooming in, for example, on the COSMOS field at (RA, Dec) = (150, 2).
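
To zoom in, it helps to compute the longitude and latitude limits of a box around the field center. The following is a minimal sketch; `zoom_ranges` and the 1-degree half-width are assumptions chosen for illustration.

```python
import math


def zoom_ranges(ra_deg, dec_deg, half_width_deg=1.0):
    """Return (lon_range, lat_range) in degrees for a box centered on (ra_deg, dec_deg)."""
    # Widen the RA span by 1/cos(dec) so the box subtends the same angle
    # on the sky in both directions.
    half_width_ra = half_width_deg / math.cos(math.radians(dec_deg))
    lon_range = (ra_deg - half_width_ra, ra_deg + half_width_ra)
    lat_range = (dec_deg - half_width_deg, dec_deg + half_width_deg)
    return lon_range, lat_range


# Field center quoted in the text above.
lon_range, lat_range = zoom_ranges(150.0, 2.0)
print(lon_range, lat_range)
```

These limits could then be supplied to the map-drawing call, for example as `lon_range` and `lat_range` keyword arguments to `draw_hspmap` if the installed skyproj version accepts them; see the skyproj tutorial linked above for the supported options.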