<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=170 style="padding: 10px"> 
<b>Little Demo: Butler 2</b> <br>
Contact author(s): Melissa Graham <br>
Last verified to run: 2024-06-27 <br>
LSST Science Pipelines version: Weekly 2024_16 <br>
Container Size: medium

The `butler` is powerful middleware to query and retrieve LSST data.

This little demo shows how to explore the `butler` and find out what data is available.

For more information about the `butler`, see the [butler documentation](https://pipelines.lsst.io/v/weekly/modules/lsst.daf.butler/index.html), the [butler FAQ](https://pipelines.lsst.io/middleware/faq.html#frequently-asked-questions), and [tutorial notebooks](https://github.com/rubin-dp0/tutorial-notebooks) 04a and 04b.

## 1. Set up

Import packages.

In [None]:
import lsst.daf.butler as dafButler

Instantiate the `butler`.

In [None]:
butler = dafButler.Butler('dp02', collections='2.2i/runs/DP0.2')

## 2. The registry

The `registry` contains entries for all data products.

In [None]:
registry = butler.registry

### 2.1. Collections

Datasets in the `registry` are organized into [collections](https://pipelines.lsst.io/v/weekly/modules/lsst.daf.butler/organizing.html#collections).

Find all collections with the word "raw" in their name.

In [None]:
for c in sorted(registry.queryCollections("*raw*")):
    print(c)
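
The `*` in the expression above acts as a wildcard. As a rough illustration of that matching only, and not of how the `registry` implements it, the sketch below applies the same kind of pattern to a list of made-up names with the standard-library `fnmatch` module. The names and the `match_names` helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical collection names, invented for this illustration only.
names = ["example/raw/all", "example/calib", "example/runs/test", "skymaps"]


def match_names(names, pattern):
    """Return the sorted names that match a shell-style wildcard pattern."""
    return sorted(n for n in names if fnmatchcase(n, pattern))


raw_like = match_names(names, "*raw*")
for name in raw_like:
    print(name)
```

Only names containing "raw" anywhere in the string pass the `*raw*` pattern.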

### 2.2. Dataset types

The `butler` contains a variety of [dataset types](https://pipelines.lsst.io/middleware/faq.html#querydatasettypes).

For example, print all dataset types with names that start with `calexp`.

In [None]:
for dt in sorted(registry.queryDatasetTypes('calexp*')):
    print(dt)

## 3. The dataId

The `dataId` is a dictionary-like identifier ([dataId documentation](https://pipelines.lsst.io/modules/lsst.daf.butler/dimensions.html#data-ids)).

A `DatasetType` (see above) lists the `dimensions` that make up its `dataId`.

For example, print the `dataId` dimensions for a `calexp`.

In [None]:
dt = registry.getDatasetType('calexp')
print("Name:", dt.name)
print("Dimensions:", dt.dimensions)
print("Storage Class:", dt.storageClass)

### 3.1. Query datasets

A `calexp` is uniquely identified by a `dataId` containing only visit and detector.

For a `calexp`, define a minimal `dataId` and get the band from the full `dataId`.

In [None]:
dataId = {'visit': 192350, 'detector': 175}
for ref in registry.queryDatasets('calexp', dataId=dataId):
    print("band: ", ref.dataId['band'])
    print(' ')
    print('full dataId')
    print(ref.dataId)
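
The relationship between a minimal and a full `dataId` can be pictured with plain dictionaries. This sketch does not call the `butler`: all values are made up, the keys `instrument` and `band` are included only as examples of implied keys, and `extra_keys` is a hypothetical helper.

```python
# Hypothetical values, invented for illustration; not taken from any data release.
minimal = {"visit": 1, "detector": 2}
full = {"instrument": "ExampleCam", "visit": 1, "detector": 2, "band": "x"}


def extra_keys(minimal, full):
    """Check that the two dataIds agree, then return the keys only the full one has."""
    for key, value in minimal.items():
        if full.get(key) != value:
            raise ValueError(f"dataIds disagree on {key!r}")
    return sorted(set(full) - set(minimal))


implied = extra_keys(minimal, full)
print("keys added in the full dataId:", implied)
```

The minimal `dataId` is enough to select the dataset; the extra keys are information that follows from it.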

Visit alone is insufficient to uniquely specify a `calexp`.

Define a `dataId` with only visit and print the number of `calexps` (detectors) for that visit.

In [None]:
dataId = {'visit': 192350}
datasetRefs = set(registry.queryDatasets('calexp', dataId=dataId))
print(f"Found {len(datasetRefs)} detectors for this visit.")

Print the `dataId` of the first two results.

In [None]:
for i, ref in enumerate(datasetRefs):
    if i < 2:
        print(ref.dataId)
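
Because `datasetRefs` is a Python `set`, the order of iteration is not guaranteed, so "the first two" can differ between runs. One option, sketched here with a hypothetical stand-in class (`FakeRef`) rather than real dataset references, is to sort by detector before slicing. The detector numbers are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRef:
    """Hypothetical stand-in for a dataset reference; holds only a detector number."""
    detector: int

    @property
    def dataId(self):
        # Mimic dictionary-style access to the detector value.
        return {"detector": self.detector}


# A set of stand-in references with arbitrary detector numbers.
refs = {FakeRef(d) for d in (175, 3, 42, 88)}

# Sort on the detector value so the selection is reproducible, then keep two.
first_two = sorted(refs, key=lambda ref: ref.dataId["detector"])[:2]
for ref in first_two:
    print(ref.dataId)
```

With real references, the same `sorted(..., key=...)` pattern should apply as long as `ref.dataId['detector']` is available, which is an assumption to check against the `dataId` printed above.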

## 4. Retrieving data

With `datasetRefs` already defined, retrieve the first two `calexps` from the set.

As an example of accessing metadata, simply print their detector numbers.

In [None]:
for i, ref in enumerate(datasetRefs):
    if i < 2:
        calexp = butler.get(ref)
        print(i, ' calexp.detector.getId(): ', calexp.detector.getId())
        del calexp