<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=250 style="padding: 10px"> 
<b>Butler Access for Images and Catalogs</b> <br>
Contact author(s): Alex Drlica-Wagner <br>
Last verified to run: 2022-06-03 <br>
LSST Science Pipelines version: Weekly 2022_22 <br>
Container Size: medium <br>
Targeted learning level: beginner <br>

_While developing, use the following code cell to check that the code conforms to standards, but then delete the cell and "Kernel --> Restart Kernel and Clear All Outputs" before saving and committing._

In [None]:
#%load_ext pycodestyle_magic
#%flake8_on
#import logging
#logging.getLogger("flake8").setLevel(logging.FATAL)

**Description:** Use the Butler to access a cutout image and associated catalog. Plot both.

**Skills:** Butler queries, image and catalog plotting

**LSST Data Products:** `deepCoadd` images, `deepCoadd_forced_src` catalogs

**Packages:** lsst.daf.butler, lsst.afw.display

**Credit:** Based on notebooks developed by Alex Drlica-Wagner in the context of the LSST Stack Club and Michael Wood-Vasey for DC2. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.



**Get Support:**
Find DP0-related documentation and resources at <a href="https://dp0-1.lsst.io">dp0-1.lsst.io</a>. Questions are welcome as new topics in the <a href="https://community.lsst.org/c/support/dp0">Support - Data Preview 0 Category</a> of the Rubin Community Forum. Rubin staff will respond to all questions posted there.

## 1. Introduction

This notebook demonstrates how to access a cutout image and the associated source catalog overlapping that image. It then demonstrates how to plot the image and catalog together.

### 1.1 Package Imports

In [None]:
# general python packages
import numpy as np
import matplotlib.pyplot as plt

# LSST packages
import lsst.daf.butler as dafButler
import lsst.afw.display as afwDisplay
import lsst.geom as geom
import lsst.sphgeom
import lsst.afw.coord as afwCoord

# Set plotting configuration
plt.style.use('tableau-colorblind10')
plt.rcParams['figure.figsize'] = (8.0, 8.0)
afwDisplay.setDefaultBackend('matplotlib')

## 2. Retrieve Image Cutout from the Butler

We start by creating a butler pointed at the DP0.2 collection.

In [None]:
repo = 'dp02'
collections = '2.2i/runs/DP0.2'
butler = dafButler.Butler(repo, collections=collections)

We will need the skymap object to look up the processing units (tract and patch) associated with specific sky coordinates.

In [None]:
skymap = butler.get("skyMap")

We use the LSST Science Pipelines geometry (`import lsst.geom as geom`) and coordinate (`import lsst.afw.coord as afwCoord`) packages to identify the tract and patch containing our location of interest. We define a bounding box of the desired size of our cutout. Then we pass this information to the butler to return a cutout at our specified location.

In [None]:
# Define our coordinate and other cutout information
ra, dec = 55.064, -29.783
band = 'r'
cutoutSideLength = 201

# Convert to LSST geometry objects
radec = geom.SpherePoint(ra, dec, geom.degrees)
cutoutSize = geom.ExtentI(cutoutSideLength, cutoutSideLength)

# Look up the tract, patch for the RA, Dec
tractInfo = skymap.findTract(radec)
patchInfo = tractInfo.findPatch(radec)
patch = tractInfo.getSequentialPatchIndex(patchInfo)

# Define the bounding box of our cutout
xy = geom.PointI(tractInfo.getWcs().skyToPixel(radec))
bbox = geom.BoxI(xy - cutoutSize//2, cutoutSize)

# Package everything up to be passed to the butler
coaddId = {'tract': tractInfo.getId(), 'patch': patch, 'band': band}
parameters = {'bbox': bbox}
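
To make the bounding-box arithmetic concrete, the following plain-Python sketch mimics what `geom.BoxI(xy - cutoutSize//2, cutoutSize)` computes: a minimum corner plus an extent, with an inclusive maximum. The helper `cutout_bounds` and the center pixel values are hypothetical, chosen only for illustration. They are not part of the LSST Science Pipelines.

```python
def cutout_bounds(center_x, center_y, side_length):
    """Return (min_x, min_y, max_x, max_y), all inclusive, for a square box.

    Mirrors the integer arithmetic of `xy - cutoutSize//2` followed by
    an extent of `side_length` pixels along each axis.
    """
    half = side_length // 2
    min_x = center_x - half
    min_y = center_y - half
    # The maximum is inclusive: a box of N pixels spans min .. min + N - 1
    max_x = min_x + side_length - 1
    max_y = min_y + side_length - 1
    return min_x, min_y, max_x, max_y


# Hypothetical tract pixel position of the target (illustrative values only)
bounds = cutout_bounds(12000, 8000, 201)
print(bounds)
```

With an odd side length such as 201, the target pixel sits exactly at the center of the box, with 100 pixels on either side.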

In [None]:
# Retrieve the image
print("Retrieving image from:\n", coaddId)
datasetType = 'deepCoadd'
cutout_image = butler.get(datasetType, parameters=parameters,
                          dataId=coaddId)
print("The size of the cutout is: ", cutout_image.image.array.shape)

In [None]:
# Display the image cutout
fig = plt.figure()
afw_display = afwDisplay.Display(1)
afw_display.scale('asinh', 'zscale')
afw_display.mtv(cutout_image.image)
#plt.gca().axis('off')

Note that the cutout image retains the x,y coordinate indexes from the tract that it was extracted from.
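
Because the cutout keeps tract pixel coordinates, indexing the underlying array directly requires subtracting the cutout's minimum corner. The sketch below shows that offset in plain Python. The helper `tract_to_array_index` and the numerical values are hypothetical, and it assumes the array is indexed as `[row, column]`, that is `[y, x]`.

```python
def tract_to_array_index(x, y, x0, y0):
    """Convert tract pixel coordinates to (row, column) array indices.

    (x0, y0) is the minimum corner of the cutout in tract coordinates.
    Arrays are indexed [row, column], which corresponds to [y, x].
    """
    return y - y0, x - x0


# Illustrative values: a cutout whose minimum corner is (11900, 7900)
row, col = tract_to_array_index(12000, 8000, 11900, 7900)
print(row, col)
```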

## 3. Retrieve catalog data from the Butler

The TAP service is the recommended way to retrieve catalog data for a notebook, and there are several other [tutorials](https://github.com/rubin-dp0/tutorial-notebooks) that demonstrate how to use the TAP service.

However, the Butler can also be used to access catalog data. We can investigate the table schema for a specific source catalog by appending `_schema` to the `datasetType`. Note that this does not require you to specify the `dataId`.


In [None]:
schema_coadd_src = butler.get('deepCoadd_forced_src_schema')
schema_coadd_src.asAstropy()

The table `schema` stores information about the columns in the table. Each of the commented-out lines below, if uncommented, prints the schema to the screen in a different way.

In [None]:
# schema_coadd_src.schema
# schema_coadd_src.schema.getNames()
# schema_coadd_src.schema.getOrderedNames()
print('Number of columns in this table = ', 
      len(schema_coadd_src.schema.getNames()))

Perhaps you want to search for all schema elements that contain the term 'psf'.

In [None]:
# Define an array that is all of the column names
all_names = schema_coadd_src.schema.getOrderedNames()

# Loop over the names and look for the term 'psf'
for i, name in enumerate(all_names):
    if name.find('psf') >= 0:
        print(i, name)
del all_names
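
The same search can be written as a list comprehension using Python's `in` operator. This is a minimal sketch on a hypothetical stand-in list rather than the real output of `schema.getOrderedNames()`, so the indices shown here will not match those of the actual schema.

```python
# Hypothetical stand-in for schema.getOrderedNames(); the real list is
# much longer and its ordering will differ.
column_names = [
    'coord_ra',
    'coord_dec',
    'base_SdssShape_psf_xx',
    'base_SdssShape_psf_yy',
    'base_SdssShape_psf_xy',
    'base_SdssShape_instFlux',
]

# Keep (index, name) pairs for every column whose name contains 'psf'
matches = [(i, name) for i, name in enumerate(column_names) if 'psf' in name]

for i, name in matches:
    print(i, name)
```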

Probably you will want to know more about the values in these columns. You can do that by printing the documentation string in the schema.

In [None]:
# Turn the schema into a python dictionary, to be able to call a column by name
schema_dict = schema_coadd_src.schema.extract('*')

# Print the associated docstring for each of the named columns of interest
for name in ['base_SdssShape_psf_xx', 'base_SdssShape_psf_yy',
             'base_SdssShape_psf_xy']:
    doc = schema_dict[name].getField().getDoc()
    units = schema_dict[name].getField().getUnits()
    print(name, '[%s]'%units, ' = ', doc)

Refer to the Data Products Definition Document (DPDD) at [dp0-2.lsst.io](https://dp0-2.lsst.io) to find out more about the columns.

The full catalogs are very large, and it is not feasible to retrieve them in their entirety. Instead, we retrieve only the catalog data for the same tract and patch that we grabbed the cutout from (specified in `coaddId`).

In [None]:
print("Retrieving catalog from:\n", coaddId)
datasetType = 'deepCoadd_forced_src'
coadd_src = butler.get(datasetType, dataId=coaddId)
print(f"Retrieved {len(coadd_src)} catalog objects")

In [None]:
# Show the table contents if desired
# coadd_src.asAstropy()

Convert the source catalog to a Pandas dataframe (see the first tutorial) for easy interaction.
The following cells offer options for printing the column names or the data values.

In [None]:
data = coadd_src.asAstropy().to_pandas()
print(data.columns)
data.head()

We can now plot the locations of sources in the patch that we requested. Note that the `coord_ra` and `coord_dec` columns are in radians, so we need to convert them to degrees. We also plot a star at the location of our target RA, Dec.

In [None]:
fig = plt.figure()
plt.plot(np.degrees(coadd_src['coord_ra']),
         np.degrees(coadd_src['coord_dec']),
         'o', ms=2, alpha=0.5)
plt.plot(ra, dec, '*', ms=25, mec='k')
plt.xlabel('RA (deg)')
plt.ylabel('Dec (deg)')
plt.title('Butler deepCoadd_forced_src objects in '
          f'tract {tractInfo.getId()} patch {patch}')

As we noted, the `coord_ra` and `coord_dec` columns have units of _radians_. As an exercise, you could use what you've learned from above to confirm this by accessing the table schema. (Also note that you can scroll up and find the answer in the outputs from a cell you already executed.) 
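
If the schema is not at hand, a quick range check can also hint at the units. The sketch below uses only the standard `math` module and the target coordinates from this notebook. The helper `looks_like_radians` is hypothetical, and it is only a heuristic: small angles expressed in degrees would pass it too.

```python
import math


def looks_like_radians(ra_values, dec_values):
    """Heuristic: RA within [0, 2*pi) and Dec within [-pi/2, pi/2]."""
    ra_ok = all(0.0 <= ra < 2.0 * math.pi for ra in ra_values)
    dec_ok = all(-math.pi / 2.0 <= dec <= math.pi / 2.0
                 for dec in dec_values)
    return ra_ok and dec_ok


# Target coordinates used in this notebook, in degrees
ra_deg, dec_deg = 55.064, -29.783
ra_rad, dec_rad = math.radians(ra_deg), math.radians(dec_deg)

print(looks_like_radians([ra_rad], [dec_rad]))  # radian inputs pass
print(looks_like_radians([ra_deg], [dec_deg]))  # degree inputs fail here
print(math.degrees(ra_rad), math.degrees(dec_rad))
```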

We can also select only the catalog sources within our cutout and plot them.

In [None]:
sel = np.array([bbox.contains(s.getX(), s.getY()) for s in coadd_src])
cutout_src = coadd_src[sel]
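
The selection above builds a boolean mask with one entry per source. As a rough sketch of the same idea on plain NumPy arrays, the box test can be written with vectorized comparisons. The positions and box limits below are made up for illustration, and the behavior at the box edges for fractional pixel positions may differ from `bbox.contains`.

```python
import numpy as np

# Hypothetical source centroids in tract pixel coordinates
x = np.array([11895.2, 11950.0, 12050.7, 12150.3])
y = np.array([7950.0, 7899.0, 8000.5, 8050.0])

# Hypothetical inclusive limits of a 201 x 201 pixel box
min_x, min_y, max_x, max_y = 11900, 7900, 12100, 8100

# A source is inside only if both coordinates fall within the limits
inside = (x >= min_x) & (x <= max_x) & (y >= min_y) & (y <= max_y)

print(inside)
print(x[inside], y[inside])
```

Only the third source lies inside the box on both axes, so it is the only one kept by the mask.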

In [None]:
fig = plt.figure()
plt.plot(np.degrees(cutout_src['coord_ra']),
         np.degrees(cutout_src['coord_dec']),
         'o', ms=4, alpha=0.5)
plt.plot(ra, dec, '*', ms=25, mec='k')
plt.xlabel('RA (deg)')
plt.ylabel('Dec (deg)')
plt.title('Butler deepCoadd_forced_src objects within the cutout in '
          f'tract {tractInfo.getId()} patch {patch}')

## 4. Overplot Catalog and Image

Finally, we want to put everything together to plot the locations of catalog sources on top of our cutout image. We already have the image and source catalog, so this is just an exercise in plotting.

In [None]:
# Display the image cutout
fig = plt.figure()
afw_display = afwDisplay.Display(1)
afw_display.scale('asinh', 'zscale')
afw_display.mtv(cutout_image.image)
plt.gca().axis('off')

# We use display buffering to avoid re-drawing the image
#  after each source is plotted
with afw_display.Buffering():
    for s in cutout_src:
        afw_display.dot('+', s.getX(), s.getY(), ctype=afwDisplay.RED)
        afw_display.dot('o', s.getX(), s.getY(), size=20, ctype='orange')

It looks like we've succeeded in detecting the bright sources; however, there are also many catalog sources with little detectable signal in the image. To clean things up, let's apply a signal-to-noise cut on the flux of the sources.

In [None]:
# Display the image cutout
fig = plt.figure()
afw_display = afwDisplay.Display(1)
afw_display.scale('asinh', 'zscale')
afw_display.mtv(cutout_image.image)
plt.gca().axis('off')

# We use display buffering to avoid re-drawing the image
#  after each source is plotted
with afw_display.Buffering():
    for s in cutout_src:
        # Skip sources with a signal-to-noise ratio below 10
        snr = s['base_SdssShape_instFlux']/s['base_SdssShape_instFluxErr']
        if snr < 10:
            continue
        afw_display.dot('+', s.getX(), s.getY(), ctype=afwDisplay.RED)
        afw_display.dot('o', s.getX(), s.getY(), size=20, ctype='orange')
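
The signal-to-noise cut can also be expressed as a boolean mask instead of a test inside the loop. The following sketch uses made-up flux and error values in plain NumPy arrays, which stand in for the `base_SdssShape_instFlux` and `base_SdssShape_instFluxErr` columns. It is an illustration of the cut, not a replacement for the cell above.

```python
import numpy as np

# Hypothetical instrumental fluxes and their uncertainties
flux = np.array([1500.0, 40.0, 900.0, 5.0])
flux_err = np.array([30.0, 10.0, 45.0, 10.0])

# Signal-to-noise ratio for each source
snr = flux / flux_err

# Keep only sources with a signal-to-noise ratio of at least 10
keep = snr >= 10

print(snr)
print(keep)
```

A mask like `keep` could then be used to select rows before plotting, which avoids repeating the threshold test inside the display loop.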