<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=250 style="padding: 10px"> 
<b>Introduction to Jupyter Notebooks for Data Preview 0.2</b> <br>
Contact author: Melissa Graham <br>
Last verified to run: <i>yyyy-mm-dd</i> <br>
LSST Science Pipelines version: Weekly <i>yyyy_xx</i> <br>
Container size: medium <br>
Targeted learning level: beginner <br>

In [None]:
# %load_ext pycodestyle_magic
# %flake8_on
# import logging
# logging.getLogger("flake8").setLevel(logging.FATAL)

**Description:** An introduction to using Jupyter Notebooks and Rubin python packages to access LSST data products (images and catalogs).

**Skills:** Execute python code in a Jupyter Notebook. Use the TAP service to retrieve Object catalog data. Use the Butler to retrieve and display a deepCoadd image.

**LSST Data Products:** TAP dp02_dc2_catalogs.Object table. Butler deepCoadd image.

**Packages:** lsst.rsp.get_tap_service, lsst.rsp.retrieve_query, lsst.daf.butler, lsst.afw.display, lsst.geom, pandas, matplotlib

**Credit:** Originally developed by Melissa Graham and the Rubin Community Engagement Team in the context of the Rubin DP0.1. Please consider acknowledging Melissa Graham if this notebook is used for the preparation of journal articles or software releases.

**Get Support:**
Find DP0-related documentation and resources at <a href="https://dp0-2.lsst.io">dp0-2.lsst.io</a>. Questions are welcome as new topics in the <a href="https://community.lsst.org/c/support/dp0">Support - Data Preview 0 Category</a> of the Rubin Community Forum. Rubin staff will respond to all questions posted there.

## 1.0. Introduction

This Jupyter Notebook provides an introduction to how notebooks work. It demonstrates how to execute code and markdown text cells, how to import Python packages and learn about their modules, and provides links to further documentation.

This Notebook also demonstrates the basic functionality of the Rubin Science Platform (RSP) installed at the Interim Data Facility (IDF; the Google Cloud), such as how to use the TAP service to query and retrieve catalog data; matplotlib to plot catalog data; the LSST Butler package to query and retrieve image data; and the LSST afwDisplay package to display images.

This Notebook uses the Data Preview 0.2 (DP0.2) data set. This data set uses a subset of the DESC's Data Challenge 2 (DC2) simulated images, which have been *reprocessed* by Rubin Observatory using Version 23 of the LSST Science Pipelines. More information about the simulated data can be found in the <a href="https://ui.adsabs.harvard.edu/abs/2021ApJS..253...31L/abstract">DESC's DC2 paper</a> and in the <a href="https://dp0-2.lsst.io">DP0.2 data release documentation</a>.

### 1.1. How to Use a Jupyter Notebook

Jupyter Notebooks contain a mix of code, output, visualizations, and narrative text. The most comprehensive source for documentation about Jupyter Notebooks is https://jupyter-notebook.readthedocs.io, but there are many great beginner-level tutorials and demos out there. Usually a web search of a question, like "how to make a table in markdown jupyter notebook", will yield several good examples. Often the answers will be found in <a href="https://stackoverflow.com/">StackOverflow</a>.

A Jupyter Notebook is a series of cells. There are three types of cells: code, markdown, and raw. This text was generated from a markdown cell. Up in the menu bar you will find a drop-down menu to set the cell type.

> **Important:** All of the code cells in a notebook should be executed in the order that they appear.

Click in the following code cell: with the cursor in the cell, simultaneously press "shift" and "enter" (or "return") to execute the cell code.

In [None]:
# This is a code cell. Press shift-enter to execute.
# The # makes these lines comments, not code. They are not executed.
print('Hello, world!')

<!---
This is a markdown cell.
Press shift-enter to execute, and see the formatted text reappear.
-->

Double click on THESE WORDS IN THIS MARKDOWN CELL to see the markdown source code.

#### FAQ (Jupyter Notebooks)

>**Important: How do I emergency-stop a notebook?** If a code cell is taking a long time to execute (for example you accidentally tried to retrieve an entire catalog or tried to print 100,000 rows) and you need to kill it, go to "Kernel" in the top menu bar and select "Restart Kernel and Clear All Outputs". It might still take a few tens of seconds, but it will stop the process and restart the kernel.

**How can I quickly execute all the cells?** Go to the top menu bar and select "Kernel", then "Restart Kernel and Run All Cells".

**What is a kernel?** The kernel is the computational engine for the notebook. In this case, we are using a Python3 kernel. You can think of the kernel as a live compiler. Restarting the kernel means that all defined variables or functions are removed from memory, and all code cells revert to an "unexecuted" state.
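
As a rough illustration of why a restart clears variables and why cell order matters, the sketch below uses a plain dictionary as a stand-in for the kernel's memory. This is only an analogy: a real kernel is a separate process, and the names `kernel_memory`, `cell_1`, and `cell_2` are made up for this example.

```python
# A plain dictionary stands in for the kernel's memory (an analogy only;
# a real kernel is a separate process, not a dictionary).
kernel_memory = {}

# "Cell 1" defines a variable; "Cell 2" uses it.
cell_1 = "n_objects = 10"
cell_2 = "doubled = n_objects * 2"

# Executing the cells in order works, because cell 1 ran first.
exec(cell_1, kernel_memory)
exec(cell_2, kernel_memory)
print(kernel_memory["doubled"])  # prints 20

# "Restarting the kernel" empties the memory.
kernel_memory = {}

# Executing cell 2 alone now fails: n_objects is no longer defined.
try:
    exec(cell_2, kernel_memory)
    out_of_order_failed = False
except NameError as error:
    out_of_order_failed = True
    print("NameError:", error)
```

The same `NameError` appears in a real notebook when a cell is executed before the cell that defines the variables it needs.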

**How can I view a table of contents for this notebook?** Click on the icon of a bullet list in the leftmost vertical menu bar, and an automatically-generated ToC will appear at left. Click on the icon of the file folder at the top of the leftmost vertical menu bar to return to a directory view.

**Which version of the LSST Science Pipelines am I using?** Look along the bottom bar of this browser window, and find the version of the LSST Science Pipelines that was selected as the "image". It is probably "Recommended (Weekly yyyy_ww)", and it should match the verified version listed in the notebook's header. Alternatively, uncomment the two lines in the following code cell and execute the cell.

In [None]:
# ! echo $IMAGE_DESCRIPTION
# ! eups list -s | grep lsst_distrib

### 1.2. Package Imports

Most Jupyter Notebooks start out by importing all the packages they will need in the first code cell.

Complete knowledge of these packages is not required in order to complete this tutorial, but here is a bit of basic information and some links for further learning.

**numpy**: A fundamental package for scientific computing with arrays in Python. Its comprehensive documentation is available at <a href="https://numpy.org">numpy.org</a>, and it includes quickstart beginner guides. (The numpy package is not used in this notebook, but is imported as a demonstration because it is a very commonly-used package.) <br>

**matplotlib**: This package is a comprehensive library for creating static, animated, and interactive visualizations in Python. Its comprehensive documentation is at <a href="https://matplotlib.org/">matplotlib.org</a>. The <a href="https://matplotlib.org/stable/gallery/index.html">matplotlib gallery</a> is a great place to start and links to examples. <br>
 
**pandas**: A package which allows users to deal efficiently with tabular data in dataframes. Learn more in the <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html">Pandas documentation</a>. <br>

**astropy**: A python package of useful astronomy tools. Learn more in the <a href="http://docs.astropy.org/en/stable/_modules/astropy/table/table.html">astropy documentation</a>.
 
**lsst**: These packages are all from the <a href="https://pipelines.lsst.io/">LSST Science Pipelines</a>.

Import the packages used in this notebook by executing the following cell.

In [None]:
# Import general python packages used by scientists
import numpy
import matplotlib
import matplotlib.pyplot as plt

# Import packages for Section 2.0 Catalog Access
import pandas
pandas.set_option('display.max_rows', 1000)
from lsst.rsp import get_tap_service, retrieve_query

# Import packages for Section 3.0 Image Access
import lsst.daf.butler as dafButler
import lsst.geom
import lsst.afw.display as afwDisplay

#### Learn more about the imported python packages

Print the version of numpy and matplotlib.

In [None]:
print('numpy version: ', numpy.__version__)
print('matplotlib version: ', matplotlib.__version__)

View a pop-up list of any package's modules by writing the package name, then a period, and then pressing tab. Use the up and down arrows to scroll through the pop-up list. This works whether or not the line is commented-out. In the cell below, `numpy.` is commented-out because it is not an executable code statement, and if the # were not there, this cell would fail to execute (try it -- remove the #, press shift-enter, and watch it fail).

In [None]:
# numpy.
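
The same kind of list can also be produced in code with Python's built-in `dir()` function. The sketch below uses the standard-library `math` module so that it runs anywhere; the variable name `public_names` is chosen just for this example, and the same approach should work for `numpy` once it is imported.

```python
import math

# dir() returns the names defined by a module; skip private names
# (those that start with an underscore).
public_names = [name for name in dir(math) if not name.startswith("_")]

print(len(public_names), "public names in math")
print(public_names[:5])
```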

Use the "help" function to view the help documentation for a package. Remove the # symbol to un-comment any one line, and execute the following cell. Help documentation can be really long. Re-comment the line by replacing the #, then re-execute the cell and the output will go away.

In [None]:
# help(numpy)
# help(matplotlib)
# help(numpy.abs)
# help(matplotlib.pyplot)

## 2.0. Catalog Access

### 2.1. Table Access Protocol (TAP) service

Table Access Protocol (TAP) provides standardized access to the catalog data for discovery, search, and retrieval. Full <a href="http://www.ivoa.net/documents/TAP">documentation for TAP</a> is provided by the International Virtual Observatory Alliance (IVOA).

The TAP service uses a query language similar to SQL (Structured Query Language) called ADQL (Astronomical Data Query Language). The <a href="http://www.ivoa.net/documents/latest/ADQL.html">documentation for ADQL</a> includes more information about syntax and keywords.

> **Notice:** Not all ADQL functionality is supported by the RSP for Data Preview 0. 

Start the TAP service. If a "Patching auth into..." warning is returned, it is safe to ignore it.

In [None]:
service = get_tap_service()

### 2.2. Exploring catalog tables and columns with TAP

This example uses the DP0.2 Object catalog, which contains sources detected in the coadded images (also called stacked, combined, or deepCoadd images). 

Catalog contents can also be explored with the <a href="https://dm.lsst.org/sdm_schemas/browser/dp02.html">DP0.2 schema browser</a>.

Results from a TAP service search are best displayed using one of two functions:<br>
`.to_table()`: convert results to an astropy table. <br>
`.to_table().to_pandas()`: convert to an astropy table and then to a Pandas dataframe.

> **Warning:** Do not use the `.to_table().show_in_notebook()` method. This can cause issues in the RSP JupyterLab environment that make your notebook hang indefinitely.

The three optional exercises below teach different ways to explore using the TAP service. They show how to use the TAP service with ADQL statements to discover what catalogs exist and which columns they contain. Each cell uses a different method to display the TAP search results. Remove all of the # and execute each cell, and see that they create a lot of output -- add the # back to each line and re-execute the cell, and the output will go away.

#### 2.2.1. Exercise 1
Retrieve and display a list of all the table names and descriptions that are available via the TAP server.

In [None]:
# my_adql_query = "SELECT description, table_name FROM TAP_SCHEMA.tables"
# results = service.search(my_adql_query)
# results_table = results.to_table().to_pandas()
# results_table

#### 2.2.2. Exercise 2
Retrieve and display a list of the field names (column names) in the DP0.2 Object catalog's TAP schema. Note that the results variable can be given any name; here, 'res' is used instead of 'results'.

In [None]:
# my_adql_query = "SELECT * from TAP_SCHEMA.columns "+\
#                 "WHERE table_name = 'dp02_dc2_catalogs.Object'"
# res = service.search(my_adql_query)
# print(res.fieldnames)

#### 2.2.3. Exercise 3
Retrieve the names, data types, description, and units for all columns in the Object catalog. Display the number of columns.

In [None]:
# my_adql_query = "SELECT column_name, datatype, description, unit "+\
#                 "FROM TAP_SCHEMA.columns "+\
#                 "WHERE table_name = 'dp02_dc2_catalogs.Object'"
# results = service.search(my_adql_query)
# results_table = results.to_table().to_pandas()
# print('Number of columns available in the Object catalog: ', len(results_table))

Display all 991 column names and their information. It's so much output! Comment-out the line in the cell and re-execute the cell to make all that output disappear.

In [None]:
# results_table

Only display names and descriptions for columns that contain the string 'detect'. Try other strings like 'cModelFlux', 'coord', 'extendedness', or 'deblend'.

In [None]:
# my_string = 'detect'
# for col,des in zip(results_table['column_name'],results_table['description']):
#     if col.find(my_string) > -1:
#         print('%-40s %-200s' % (col,des))
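
The substring filter above can be tried without a TAP connection. The sketch below uses a small hand-written list as a stand-in for `results_table`: the column names appear elsewhere in this tutorial, but the descriptions are placeholders and not the real schema text. It also uses Python's `in` operator, which is equivalent to the `find(...) > -1` test.

```python
# Stand-in for results_table: (column_name, description) pairs.
# The column names appear in this tutorial; the descriptions are
# placeholders, not the real schema text.
columns = [
    ("coord_ra", "placeholder description"),
    ("coord_dec", "placeholder description"),
    ("detect_isPrimary", "placeholder description"),
    ("r_cModelFlux", "placeholder description"),
    ("r_extendedness", "placeholder description"),
]

my_string = "coord"

# `my_string in name` is equivalent to `name.find(my_string) > -1`.
matches = [(name, des) for name, des in columns if my_string in name]

for name, des in matches:
    print(f"{name:<40s} {des}")
```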

### 2.3. Retrieving data with TAP

#### 2.3.1. Converting fluxes to magnitudes

The object and source catalogs store only fluxes. There are hundreds of flux-related columns, and to store them also as magnitudes would be redundant, and a waste of space.

All flux units are nanojanskies ($nJy$). The <a href="https://en.wikipedia.org/wiki/AB_magnitude">AB Magnitudes Wikipedia page</a> provides a concise resource for users unfamiliar with AB magnitudes and jansky fluxes. To convert $nJy$ to AB magnitudes use: $m_{AB} = -2.5\log_{10}(f_{nJy}) + 31.4$.

As demonstrated in Section 2.3.2, to add columns of magnitudes after retrieving columns of flux, users can do this:<br>
`results_table['r_calibMag'] = -2.50 * numpy.log10(results_table['r_calibFlux']) + 31.4`<br>
`results_table['r_cModelMag'] = -2.50 * numpy.log10(results_table['r_cModelFlux']) + 31.4`

As demonstrated in Section 2.3.3, to retrieve columns of fluxes *as magnitudes* in an ADQL query, users can do this:<br>
`scisql_fluxToAbMag(g_calibFlux/1e32) as g_calibMag`.
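
As a quick check of the conversion, the sketch below computes the AB magnitude of a 360 nJy source in two ways: with the nanojansky formula above, and with the standard AB definition in cgs units ($m_{AB} = -2.5\log_{10}(f_{\nu}) - 48.6$, with $f_{\nu}$ in erg/s/cm$^2$/Hz). The two agree because 1 nJy is $10^{-32}$ erg/s/cm$^2$/Hz, which is consistent with the division by 1e32 in the ADQL expression, assuming that function expects a flux in cgs units. The function names here are made up for this example.

```python
import math


def njy_to_ab_mag(flux_njy):
    """Convert a flux in nanojanskies to an AB magnitude."""
    return -2.5 * math.log10(flux_njy) + 31.4


def cgs_to_ab_mag(flux_cgs):
    """Convert a flux in erg/s/cm^2/Hz to an AB magnitude."""
    return -2.5 * math.log10(flux_cgs) - 48.6


flux_njy = 360.0

# 1 nJy = 1e-9 Jy, and 1 Jy = 1e-23 erg/s/cm^2/Hz, so 1 nJy = 1e-32 cgs.
flux_cgs = flux_njy / 1e32

mag_from_njy = njy_to_ab_mag(flux_njy)
mag_from_cgs = cgs_to_ab_mag(flux_cgs)

print(f"{mag_from_njy:.2f}")  # prints 25.01
print(f"{mag_from_cgs:.2f}")  # prints 25.01
```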

#### 2.3.2. Ten objects of any kind

To quickly demonstrate how to retrieve data from the Object catalog, use a cone search and request that only 10 records be returned. Figure 2 of the <a href="https://ui.adsabs.harvard.edu/abs/2021ApJS..253...31L/abstract">DESC's DC2 paper</a> shows that the sky region covered by the DC2 simulation contains the coordinates RA,Dec = 62,-37.

> **Warning:** The Object catalog contains hundreds of millions of rows. Searches that do not specify a region and/or a maximum number of records can take a long time, and return far too many rows to display in a notebook.

Retrieve coordinates, the detect_isPrimary flag, r-band fluxes, and r-band extendedness for 10 objects within a radius of 0.5 degrees of 62,-37.

In [None]:
use_center_coords = "62, -37"

In [None]:
my_adql_query = "SELECT coord_ra, coord_dec, detect_isPrimary, "+\
                "r_calibFlux, r_cModelFlux, r_extendedness "+\
                "FROM dp02_dc2_catalogs.Object "+\
                "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "+\
                "CIRCLE('ICRS', "+use_center_coords+", 0.5)) = 1 "

results = service.search(my_adql_query, maxrec=10)
results_table = results.to_table()
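
The query string above is assembled by hand. As a sketch of an alternative, the hypothetical helper below (`build_cone_query` is not part of any Rubin package) builds the same kind of cone-search string from its parts, which can make it easier to change the columns, center, or radius.

```python
def build_cone_query(table, columns, ra_deg, dec_deg, radius_deg):
    """Return an ADQL cone-search query string (hypothetical helper)."""
    column_list = ", ".join(columns)
    return (
        f"SELECT {column_list} "
        f"FROM {table} "
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1"
    )


query = build_cone_query(
    table="dp02_dc2_catalogs.Object",
    columns=["coord_ra", "coord_dec", "detect_isPrimary",
             "r_calibFlux", "r_cModelFlux", "r_extendedness"],
    ra_deg=62,
    dec_deg=-37,
    radius_deg=0.5,
)
print(query)
```

The resulting string could be passed to `service.search(query, maxrec=10)` in the same way as `my_adql_query`.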

In [None]:
results_table['r_calibMag'] = -2.50 * numpy.log10(results_table['r_calibFlux']) + 31.4
results_table['r_cModelMag'] = -2.50 * numpy.log10(results_table['r_cModelFlux']) + 31.4

In [None]:
results_table

#### 2.3.3. Ten thousand bright, star-like objects

In addition to a cone search, impose query restrictions that detect_isPrimary is True (this excludes blended "parent" sources and keeps their deblended "child" sources instead), that the calibrated flux is greater than 360 nJy (about 25th mag), and that the extendedness parameters are 0 (star-like sources).

Retrieve g-, r- and i-band magnitudes for 10000 bright objects that are likely to be stars.

In [None]:
results = service.search("SELECT coord_ra, coord_dec, "
                         "scisql_fluxToAbMag(g_calibFlux/1e32) as g_calibMag, "
                         "scisql_fluxToAbMag(r_calibFlux/1e32) as r_calibMag, "
                         "scisql_fluxToAbMag(i_calibFlux/1e32) as i_calibMag "
                         "FROM dp02_dc2_catalogs.Object "
                         "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
                         "CIRCLE('ICRS', "+use_center_coords+", 1.0)) = 1 "
                         "AND detect_isPrimary = 1 "
                         "AND g_calibFlux > 360 "
                         "AND r_calibFlux > 360 "
                         "AND i_calibFlux > 360 "
                         "AND g_extendedness = 0 "
                         "AND r_extendedness = 0 "
                         "AND i_extendedness = 0",
                         maxrec=10000)
results_table = results.to_table()
print(len(results_table))
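
The query above repeats the same expression for each of the g, r, and i bands. As a sketch of how that repetition could be generated rather than typed, the snippet below builds the per-band pieces with a loop. The variable names (`bands`, `min_flux_njy`, `select_mags`, `flux_cuts`, `star_cuts`) are made up for this example.

```python
bands = ["g", "r", "i"]
min_flux_njy = 360

# One magnitude expression per band for the SELECT clause.
select_mags = ", ".join(
    f"scisql_fluxToAbMag({band}_calibFlux/1e32) as {band}_calibMag"
    for band in bands
)

# One flux cut and one extendedness cut per band for the WHERE clause.
flux_cuts = " ".join(
    f"AND {band}_calibFlux > {min_flux_njy}" for band in bands
)
star_cuts = " ".join(f"AND {band}_extendedness = 0" for band in bands)

print(select_mags)
print(flux_cuts)
print(star_cuts)
```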

The table display will automatically truncate.

In [None]:
results_table

Put the results into a pandas dataframe for easy access to contents. This data is used to create a color-magnitude diagram in Section 2.5.

In [None]:
data = results_table.to_pandas()

For users unfamiliar with Pandas, here are some optional lines of code that demonstrate how to print the column names, the 'coord_ra' column info, or the 'coord_ra' column values. Uncomment (remove #) and execute the cell to view the demo output.

In [None]:
# data.columns

In [None]:
# data['coord_ra']

In [None]:
# data['coord_ra'].values

### 2.5. Make a color-magnitude diagram

Use the plot function of the matplotlib.pyplot package (which was imported as plt). The plot function parameters are described in full at <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot">this matplotlib website</a>, but briefly they are: x values, y values, symbol shape ('o' is circle), marker size (ms), and marker transparency (alpha).

In [None]:
plt.plot(data['r_calibMag'].values - data['i_calibMag'].values,
         data['g_calibMag'].values, 'o', ms=2, alpha=0.2)

plt.xlabel('mag_r - mag_i')
plt.ylabel('mag_g')

plt.xlim([-0.5, 2.0])
plt.ylim([25.5, 16.5])

plt.show()

This plot generates many questions, such as "Why are the colors quantized?" and "Are those all really stars?". The answers are beyond the scope of this notebook, and are left as potential topics of scientific analysis that could be done with the DC2 data set.

## 3.0. Image Access

The two most common types of images that DP0 delegates will interact with are calexps and deepCoadds.

**calexp**: A single image in a single filter.

**deepCoadd**: A combination of single images into a deep stack or Coadd.
 
The LSST Science Pipelines processes and stores images in tracts and patches.

**tract**: A portion of sky within the LSST all-sky tessellation (sky map); divided into patches.

**patch**: A quadrilateral sub-region of a tract, of a size that fits easily into memory on desktop computers.
 
To retrieve and display an image at a desired coordinate, users have to specify their image type, tract, and patch.

### 3.1. Create an instance of the butler

The butler (<a href="https://pipelines.lsst.io/modules/lsst.daf.butler/index.html">documentation</a>) is an LSST Science Pipelines software package to fetch LSST data without having to know its location or format. The butler can also be used to explore and discover what data exists. Other tutorials demonstrate the full butler functionality.

Define the butler data configuration and collection. These will be the same for all DP0.2 data sets.

In [None]:
config = 'dp02'
collection = '2.2i/runs/DP0.2/v23_0_2/PREOPS-905/step_all'

Create an instance of the butler using the configuration and the collection. It will return an informative statement about credentials being found. If a warning message about a "version mismatch between CFITSIO header" is returned, it is a known bug that is being resolved.

In [None]:
butler = dafButler.Butler(config, collections=collection)

### 3.2. Identify and retrieve a deepCoadd

There is a cool-looking DC2 galaxy cluster at RA = 03h42m59.0s, Dec = -32d16m09s (in degrees, 55.745834, -32.269167).
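
The degree values quoted above follow from the sexagesimal coordinates. The sketch below shows the arithmetic with two small helper functions (`hms_to_degrees` and `dms_to_degrees` are names made up for this example and are not part of lsst.geom).

```python
def hms_to_degrees(hours, minutes, seconds):
    """Convert right ascension from hours, minutes, seconds to degrees."""
    # 24 hours of right ascension span 360 degrees, so 1 hour = 15 degrees.
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dms_to_degrees(sign, degrees, arcminutes, arcseconds):
    """Convert declination from degrees, arcminutes, arcseconds to degrees.

    The sign (+1 or -1) is passed separately so that declinations between
    -1 and 0 degrees are handled correctly.
    """
    return sign * (degrees + arcminutes / 60.0 + arcseconds / 3600.0)


ra_deg = hms_to_degrees(3, 42, 59.0)
dec_deg = dms_to_degrees(-1, 32, 16, 9)

print(f"{ra_deg:.5f}, {dec_deg:.5f}")
```

The results agree with the values used in the next cell to within rounding in the last decimal place.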

Use lsst.geom to define a SpherePoint for the cluster's coordinates (<a href="https://pipelines.lsst.io/modules/lsst.geom/index.html">lsst.geom documentation</a>).

In [None]:
my_ra_deg = 55.745834
my_dec_deg = -32.269167

my_spherePoint = lsst.geom.SpherePoint(my_ra_deg*lsst.geom.degrees,
                                       my_dec_deg*lsst.geom.degrees)
print(my_spherePoint)

Retrieve the DC2 sky map from the butler and use it to identify the tract and patch for the cluster's coordinates (<a href="http://doxygen.lsst.codes/stack/doxygen/x_masterDoxyDoc/skymap.html">skymap documentation</a>).

In [None]:
skymap = butler.get('skyMap')

tract = skymap.findTract(my_spherePoint)
patch = tract.findPatch(my_spherePoint)

my_tract = tract.tract_id
my_patch = patch.getSequentialIndex()

print('my_tract: ', my_tract)
print('my_patch: ', my_patch)

Use the butler to retrieve the deep i-band Coadd for the tract and patch.

In [None]:
dataId = {'band': 'i', 'tract': my_tract, 'patch': my_patch}
my_deepCoadd = butler.get('deepCoadd', dataId=dataId)

### 3.3. Display the image with afwDisplay
Image data retrieved with the butler can be displayed several different ways. A simple option is to use the LSST Science Pipelines package afwDisplay. There is some <a href="https://pipelines.lsst.io/modules/lsst.afw.display/index.html">documentation for afwDisplay</a> available, and other DP0 tutorials go into more detail about all the display options (e.g., overlaying mask data to show bad pixels).

Set the backend of afwDisplay to matplotlib.

In [None]:
afwDisplay.setDefaultBackend('matplotlib')

Use afwDisplay to show the image data retrieved.

In [None]:
# Create a matplotlib.pyplot figure.
fig = plt.figure(figsize=(10, 8))

# Create an afwDisplay Display object for frame 1.
afw_display = afwDisplay.Display(1)

# Set the scale for the pixel shading.
afw_display.scale('asinh', 'zscale')

# Display the image data.
afw_display.mtv(my_deepCoadd.image)

# Keep the x and y axes (pixel coordinates) visible.
plt.gca().axis('on')

To learn more about the afwDisplay package and its tasks, use the help function.

In [None]:
# help(afw_display.scale)
# help(afw_display.mtv)