# Pre-bootcamp exercises: accessing data products via butler

**Description:** Introduction to data access with the Butler using a small test dataset from HSC, [rc2_subset](https://github.com/lsst/rc2_subset).

**Contact authors:** Keith Bechtol

**Last verified to run:** 2023-05-02

**LSST Science Pipelines version:** w_2023_17

You can use an existing sandbox repo (prepared for this exercise) to skip the processing steps and go straight to accessing data products via the Butler, e.g., object tables and source tables. Alternatively, run this notebook after executing the data processing steps in `process_rc2_subset.sh` to access the reduced data products in your own repo.

Check the version of the stack you are using:

In [None]:
!eups list -s | grep lsst_distrib

## Preliminaries

In [None]:
import lsst.daf.butler as dafButler

In [None]:
import os

# Point to the existing sandbox repo if you prefer to skip the processing steps
collections = ['u/bechtol']
repo = '/sdf/group/rubin/user/bechtol/bootcamp_2023/rc2_subset/SMALL_HSC/'

# Use your own instance of the repo if you have processed rc2_subset yourself
# (uncomment the two lines below; they rely on the `os` import above)
#collections = ['u/%s'%os.environ['USER']]
#repo = '/sdf/group/rubin/user/%s/bootcamp_2023/rc2_subset/SMALL_HSC/'%(os.environ['USER'])
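The choice between the sandbox repo and your own repo can also be wrapped in a small function, so that switching is a single flag rather than commenting and uncommenting lines. This is a minimal sketch: `choose_repo` and its `use_sandbox` flag are hypothetical names, and the paths simply mirror the ones in the cell above, assuming the same `/sdf/group/rubin/user/<user>/bootcamp_2023` layout.

```python
import os


def choose_repo(use_sandbox=True, user=None):
    """Return (repo, collections) for the sandbox or a personal rc2_subset repo.

    Hypothetical helper: the path layout mirrors the notebook cell above.
    """
    if use_sandbox:
        # The pre-processed sandbox repo prepared for this exercise.
        user = "bechtol"
    elif user is None:
        # Fall back to the login name, as the commented-out lines above do.
        user = os.environ["USER"]
    repo = f"/sdf/group/rubin/user/{user}/bootcamp_2023/rc2_subset/SMALL_HSC/"
    collections = [f"u/{user}"]
    return repo, collections


print(choose_repo(use_sandbox=True))
```

Passing `use_sandbox=False` selects your own repo, using the `USER` environment variable unless a `user` name is given explicitly.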

In [None]:
butler = dafButler.Butler(repo, collections=collections)
registry = butler.registry

Check which dataset types are present in the collection:

In [None]:
for datasetType in registry.queryDatasetTypes():
    if registry.queryDatasets(datasetType, collections=collections).any(execute=False, exact=False):
        print(datasetType)

## Object tables

In [None]:
refs = sorted(registry.queryDatasets("objectTable_tract"))
print(len(refs))

In [None]:
refs[0].dataId

In [None]:
objectTable = butler.get(refs[0])
objectTable

In [None]:
objectTable.columns.values
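The object table has many columns. Assuming the per-band columns follow a `<band>_<quantity>` naming pattern (for example `i_psfFlux`; check this against the column names printed above), a small helper can pick out the columns for a single band. `columns_for_band` is a hypothetical name, and the example column list is illustrative only.

```python
def columns_for_band(columns, band):
    """Return the column names that start with '<band>_', in their original order.

    Assumes per-band columns are prefixed with the band name and an underscore.
    """
    prefix = f"{band}_"
    return [name for name in columns if name.startswith(prefix)]


# Illustrative column names only; in the notebook, pass objectTable.columns instead.
example_columns = ["coord_ra", "coord_dec", "g_psfFlux", "i_psfFlux", "i_psfFluxErr"]
print(columns_for_band(example_columns, "i"))
```

With the real table, `objectTable[columns_for_band(objectTable.columns, "i")]` would select just those columns, since a pandas DataFrame accepts a list of column names for indexing.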

## Source tables

In [None]:
refs = sorted(registry.queryDatasets("sourceTable_visit"))

In [None]:
for ref in refs:
    print(ref.dataId.full)
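Each printed data ID identifies one visit. To see how many visits are available in each band, you can group the data IDs by their `band` key. The sketch below uses plain dicts with made-up visit numbers as stand-ins. With the real refs you would pass `ref.dataId.full` for each ref, assuming it supports mapping-style `["band"]` and `["visit"]` lookups. `visits_by_band` is a hypothetical helper.

```python
from collections import defaultdict


def visits_by_band(data_ids):
    """Group visit IDs by band from an iterable of mapping-like data IDs."""
    grouped = defaultdict(list)
    for data_id in data_ids:
        grouped[data_id["band"]].append(data_id["visit"])
    # Sort bands and visit lists so the output order is stable.
    return {band: sorted(visits) for band, visits in sorted(grouped.items())}


# Stand-in data IDs with made-up visit numbers (not taken from rc2_subset).
example_ids = [
    {"instrument": "HSC", "visit": 3, "band": "i"},
    {"instrument": "HSC", "visit": 1, "band": "r"},
    {"instrument": "HSC", "visit": 2, "band": "i"},
]
print(visits_by_band(example_ids))
```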

In [None]:
sourceTable = butler.get(refs[-1])
sourceTable

In [None]:
sourceTable.columns.values
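Flux columns in these tables are, to our understanding, reported in nanojanskys (nJy); confirm this against the table schema before relying on it. The AB magnitude system defines m_AB = -2.5 log10(f / 3631 Jy). For fluxes in nJy, this becomes m_AB = -2.5 log10(f / nJy) + 31.4. A minimal sketch of the conversion, with `njy_to_abmag` as a hypothetical helper name:

```python
import math


def njy_to_abmag(flux_njy):
    """Convert a flux in nanojanskys to an AB magnitude.

    m_AB = -2.5 * log10(f / 3631 Jy) = -2.5 * log10(f / nJy) + 2.5 * log10(3631e9),
    where 2.5 * log10(3631e9) is approximately 31.4.
    Returns NaN for non-positive fluxes, which have no defined magnitude.
    """
    if flux_njy <= 0:
        return math.nan
    return -2.5 * math.log10(flux_njy / 3631e9)


print(njy_to_abmag(1.0))
```

To convert a whole column at once, you can apply the same formula with NumPy, for example `-2.5 * np.log10(sourceTable["psfFlux"]) + 31.4`. This assumes a `psfFlux` column is present in the printed column list.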

## Run analysis_tools interactively

This section demonstrates running `analysis_tools` interactively in a notebook, passing in-memory data as input to compute metrics and create diagnostic plots.

In [None]:
from lsst.analysis.tools.atools import ShapeSizeFractionalDiff
from lsst.analysis.tools.interfaces._task import _StandinPlotInfo
from lsst.analysis.tools.interfaces._actions import NoPlot

In [None]:
atool = ShapeSizeFractionalDiff()
# Disable the summary plot
atool.produce.plot.addSummaryPlot = False

# Uncomment to skip plotting and produce only metric values
#atool.produce.plot = NoPlot()

# This helps simplify some of the configuration
# by ensuring that appropriate keys are set to
# load columns that are needed in later steps.
# This happens automatically when an AnalysisTool
# is used as a single unit.
atool.populatePrepFromProcess()  # Needed to run the tool interactively
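The comment above describes `populatePrepFromProcess` as ensuring that the tool loads the columns needed in later steps. As a conceptual illustration only (this is not the `analysis_tools` implementation, and every name below is hypothetical), the idea amounts to collecting the column requirements of each downstream action and taking their union:

```python
def required_columns(actions):
    """Union the column names required by each action, preserving first-seen order.

    Each action is a hypothetical (name, columns) pair standing in for a
    later-stage action that declares which table columns it reads.
    """
    seen = []
    for _name, columns in actions:
        for column in columns:
            if column not in seen:
                seen.append(column)
    return seen


# Hypothetical actions and column names, for illustration only.
process_actions = [
    ("starSelector", ["extendedness"]),
    ("starSize", ["ixx", "iyy", "ixy"]),
    ("psfSize", ["ixxPSF", "iyyPSF", "ixyPSF"]),
]
print(required_columns(process_actions))
```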

Notice that the returned metric values match the summary statistics displayed on the plot.

In [None]:
results = atool(objectTable, band='i', skymap=None, plotInfo=_StandinPlotInfo())
results
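Summary statistics for diagnostics like this are often robust estimators such as the median and a normalized median absolute deviation, sigma_MAD = 1.4826 × MAD. For the statistics that `ShapeSizeFractionalDiff` actually reports, use the `results` printed above. The sketch below is only an illustration of how these two estimators can be computed with the standard library. The helper name `median_and_sigma_mad` and the input values are hypothetical.

```python
import statistics


def median_and_sigma_mad(values):
    """Return (median, sigma_MAD) for a sequence of numbers.

    sigma_MAD = 1.4826 * median(|x - median(x)|); the factor makes it match
    the standard deviation for normally distributed data.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return med, 1.4826 * mad


# Made-up fractional size differences, for illustration only.
print(median_and_sigma_mad([0.01, -0.02, 0.0, 0.03, -0.01]))
```

The normalized MAD is less sensitive to outliers than the standard deviation, which helps when a few poorly measured stars would otherwise dominate the summary.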