<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=250 style="padding: 10px"> 
<b>Introduction to Jupyter Notebooks for Data Preview 0.2</b> <br>
Contact author: Melissa Graham <br>
Last verified to run: <i>yyyy-mm-dd</i> <br>
LSST Science Pipelines version: Weekly <i>yyyy_xx</i> <br>
Container size: medium <br>
Targeted learning level: beginner <br>

In [None]:
%load_ext pycodestyle_magic
%flake8_on
import logging
logging.getLogger("flake8").setLevel(logging.FATAL)

**Description:** An introduction to using Jupyter Notebooks and Rubin python packages to access LSST data products (images and catalogs).

**Skills:** Execute Python code in a Jupyter Notebook. Use the TAP service to retrieve catalog data. Use the Butler to retrieve and display images.

**LSST Data Products:** _dp02_test_PREOPS863_00.Object_

**Packages:** lsst.rsp.get_tap_service, lsst.rsp.retrieve_query, lsst.daf.butler, lsst.afw.display, lsst.geom, pandas, matplotlib

**Credit:** Originally developed by Melissa Graham and the Rubin Community Engagement Team for Rubin DP0.1. Please consider acknowledging Melissa Graham if this notebook is used for the preparation of journal articles or software releases.

**Get Support:**
Find DP0-related documentation and resources at <a href="https://dp0-2.lsst.io">dp0-2.lsst.io</a>. Questions are welcome as new topics in the <a href="https://community.lsst.org/c/support/dp0">Support - Data Preview 0 Category</a> of the Rubin Community Forum. Rubin staff will respond to all questions posted there.

### 1.0 Introduction

This Jupyter Notebook provides an introduction to how notebooks work. It demonstrates how to execute code and markdown text cells, how to import Python packages and learn about their modules, and provides links to further documentation.

This Notebook also demonstrates the basic functionality of the Rubin Science Platform (RSP) installed at the Interim Data Facility (IDF; the Google Cloud), such as how to use the TAP service to query and retrieve catalog data; `matplotlib` to plot catalog data; the LSST `Butler` package to query and retrieve image data; and the LSST `afwDisplay` package to display images.

This Notebook uses the Data Preview 0.2 (DP0.2) data set. This data set uses a subset of the DESC's Data Challenge 2 (DC2) simulated images, which have been *reprocessed* by Rubin Observatory using Version 23 of the LSST Science Pipelines. More information about the simulated data can be found in the <a href="https://ui.adsabs.harvard.edu/abs/2021ApJS..253...31L/abstract">DESC's DC2 paper</a>.

<!---
This is a markdown cell. Press shift-enter to execute, and see the formatted text reappear.
-->

#### 1.1 How to Use a Jupyter Notebook

Jupyter Notebooks contain a mix of code, output, visualizations, and narrative text. The most comprehensive source for documentation about Jupyter Notebooks is https://jupyter-notebook.readthedocs.io, but there are many great beginner-level tutorials and demos out there. Usually a web search of a question, like "how to make a table in markdown jupyter notebook", will yield several good examples. Often the answers will be found in <a href="https://stackoverflow.com/">StackOverflow</a>.

A Jupyter Notebook is a series of cells. There are three types of cells: markdown, code, and raw. This text was generated from a markdown cell. Up in the menu bar you will find a drop-down menu to set the cell type.

*Action: Double click on these words and this cell will transform from formatted text to the markdown source code used to create it.*

*Action: Click in the following code cell. When your cursor is in a cell, simultaneously press "shift" and "enter" (or "return") to execute the cell code.*

In [None]:
# This is a code cell. Press shift-enter to execute.
# The # makes these lines comments, not code. They are not executed.
print('Hello, world!')

**It is important that all of the code cells in a notebook are executed in the order that they appear.** 

Not all of the code cells produce output like the one above, which has a print statement, but they are all doing something (e.g., importing packages, defining variables). If you want to execute all of the cells in a notebook in order, go to the top menu bar and select Kernel --> Restart Kernel and Run All Cells.

**What is a kernel?** The kernel is the computational engine for the notebook. In this case, we are using a Python 3 kernel. You can think of the kernel as a live interpreter, if that's helpful. Restarting the kernel means that all defined variables and functions are removed from memory, and all code cells revert to an "unexecuted" state.

**View a table of contents for this notebook.** Click on the icon of a bullet list in the leftmost vertical menu bar, and an automatically generated ToC will appear at left. Click on the icon of the file folder at the top of the leftmost vertical menu bar to return to a directory view.


#### 1.2 Emergency Stop a Notebook
If a code cell is taking a long time to execute (for example, if you accidentally tried to retrieve an entire catalog or print 100,000 rows) and you need to kill it, go to Kernel --> Restart Kernel and Clear All Outputs. It might still take a few tens of seconds, but it will stop the process and restart the kernel.

#### 1.3 Package Imports
You will find that many Jupyter Notebooks start out by importing all the packages they will need in the first code cell.

You do not need to know anything about these packages to complete this tutorial, but here is a bit of extra information about numpy and matplotlib for new users. 

 * The **numpy** package is a fundamental package for scientific computing with arrays in Python. Its comprehensive documentation is available at <a href="https://numpy.org">numpy.org</a>, and it includes quickstart beginner guides. (The numpy package is not used in this notebook, but is imported as a demonstration because it is a very commonly-used package.) <br>

 * The **matplotlib** package is a comprehensive library for creating static, animated, and interactive visualizations in Python. Its comprehensive documentation is at <a href="https://matplotlib.org/">matplotlib.org</a>. The <a href="https://matplotlib.org/stable/gallery/index.html">matplotlib gallery</a> is a great place to start and links to copy-and-pastable code. <br>
 
 * The **pandas** package allows users to deal efficiently with tabular data in `dataframes` (<a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html">Pandas Documentation</a>). <br>
 
 * The **lsst** packages are all from the <a href="https://pipelines.lsst.io/">LSST Science Pipelines</a>.

*Action: Import packages by executing this cell.*

In [None]:
# Import general Python packages used by scientists
import numpy
import matplotlib
import matplotlib.pyplot as plt

# Import packages for Section 2.0 Catalog Access
import pandas
from lsst.rsp import get_tap_service, retrieve_query

# Import packages for Section 3.0 Image Access
import lsst.daf.butler as dafButler
import lsst.geom
import lsst.afw.display as afwDisplay

# Display up to 1000 rows of a pandas dataframe before truncating
pandas.set_option('display.max_rows', 1000)

# Print the versions of the general packages
print('numpy version: ', numpy.__version__)
print('matplotlib version: ', matplotlib.__version__)

*Action: Execute the code cell below to check which version of the LSST Science Pipelines you are using.*

It should match the verified version listed in the notebook's header, and also the version listed along the bottom of the JupyterLab window.

In [None]:
! echo $IMAGE_DESCRIPTION
! eups list -s | grep lsst_distrib

#### 1.4 Learn About Python Packages

*Action: Put your cursor after one of the periods in the following cell and press tab to view a list of that package's attributes (modules, functions, and classes). Use the down arrow to scroll.*

Note that the # symbol is there to comment out the lines because `numpy.` and `plt.` are not executable code statements. If the # were not there, this cell would fail to execute (try it -- remove the #, press shift-enter, and watch it fail).

In [None]:
# numpy.
# plt.
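Tab completion is interactive. A minimal programmatic alternative is Python's built-in `dir()` function, which lists a package's attribute names; the sketch below filters out the private ones (those starting with an underscore). The helper name `public_names` is hypothetical and used only for this illustration.

```python
import numpy


def public_names(module):
    """Return the sorted attribute names of a module, skipping private ones."""
    return sorted(name for name in dir(module) if not name.startswith('_'))


# List what tab completion would offer for numpy, minus private names.
names = public_names(numpy)
print(len(names), 'public names in numpy')
print(names[:10])
```

The same helper works on any imported module, for example `public_names(plt)`.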

*Action: Remove the # symbol to 'uncomment' one line, and execute the cell.*

You will see the help documentation for that one package appear below the cell. Help documentation can be really long. 'Recomment' the line with a #, re-execute the cell, and the output will go away.

In [None]:
# help(numpy)
# help(matplotlib)
# help(numpy.abs)
# help(matplotlib.pyplot)

### 2.0 Catalog Access

#### 2.1 Table Access Protocol (TAP) service

Table Access Protocol (TAP) provides standardized access to the catalog data for discovery, search, and retrieval. Full <a href="http://www.ivoa.net/documents/TAP">documentation for TAP</a> is provided by the International Virtual Observatory Alliance (IVOA).

The TAP service uses a query language similar to SQL (Structured Query Language) called ADQL (Astronomical Data Query Language). The <a href="http://www.ivoa.net/documents/latest/ADQL.html">documentation for ADQL</a> includes more information about syntax and keywords.

**Hazard Warning:** Not all ADQL functionality is supported yet in the RSP at the IDF for DP0. 

*Action: Start your TAP service. (This cell produces no output, unlike the cells above that have print statements. If a "Patching auth into..." warning is returned, it is safe to ignore it.)* 

In [None]:
service = get_tap_service()

#### 2.2 Exploring catalog tables and columns with TAP
For this example we use the DP0.2 `Object` catalog, which contains sources detected in the coadded images (also called stacked, combined, or deepCoadd images). 

Results from a TAP service search are best displayed using one of two functions:<br>
.to_table() --> an astropy table (astropy is a python package of useful astronomy tools; <a href="http://docs.astropy.org/en/stable/_modules/astropy/table/table.html">AstroPy Documentation</a>). <br>
.to_table().to_pandas() --> a pandas dataframe (pandas is a python package for dealing with tabular data; <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html">Pandas Documentation</a>).

**Hazard Warning:** Do not use the .to_table().show_in_notebook() method. It can cause issues in the RSP JupyterLab environment that make your notebook hang indefinitely.

The next three executable cells are optional. They show how to use the TAP service to discover which catalogs exist and which columns they contain. Each cell uses a different method to display the TAP search results. Remove all of the # symbols and execute each cell to see its (lengthy) output; add the # symbols back and re-execute the cell to make the output go away.

*Action: Retrieve the table names and descriptions of available tables. Show the results as an AstroPy table.*

In [None]:
# results = service.search("SELECT description, table_name "
#                          "FROM TAP_SCHEMA.tables")
# results_tab = results.to_table()
# results_tab

*Action: Print the field names (column names) of the `TAP_SCHEMA.columns` table, i.e., the metadata fields that are recorded for each column of the DP0.2 `Object` catalog. Note that the result variable can be given any name, e.g., `res`.*

In [None]:
# res = service.search("SELECT * from TAP_SCHEMA.columns "
#                      "WHERE table_name = 'dp02_test_PREOPS863_00.Object'")
# print(res.fieldnames)

*Action: Retrieve the names, data types, description, and units for all columns in the `Object` catalog. This time, show the results as a pandas dataframe.*

In [None]:
# results = service.search("SELECT column_name, datatype, description, unit "
#                          "FROM TAP_SCHEMA.columns "
#                          "WHERE table_name = 'dp02_test_PREOPS863_00.Object'")
# results.to_table().to_pandas()

#### 2.3 Retrieving data with TAP

##### 2.3.1 Ten objects of any kind

To quickly demonstrate how to retrieve data from the `Object` catalog, we use a cone search and request that only 10 records be returned. Figure 2 of the <a href="https://ui.adsabs.harvard.edu/abs/2021ApJS..253...31L/abstract">DESC's DC2 paper</a> shows that the sky region covered by the DC2 simulation contains coordinates RA, Dec = 62, -37. The queries below are centered on RA, Dec = 57.5, -36.5, which also lies within the DC2 region.

<b> Hazard Warning: </b> The `Object` catalog contains hundreds of millions of rows. Searches that do not specify a region and/or a maximum number of records can take a long time, and return far too many rows to display in a notebook.

*Action: Retrieve coordinates, the `detect_isPrimary` flag, and the r-band fluxes and extendedness for 10 objects within a radius of 0.5 degrees of RA, Dec = 57.5, -36.5.*

In [None]:
use_center_coords = "57.5, -36.5"

In [None]:
results = service.search("SELECT coord_ra, coord_dec, detect_isPrimary, "
                         "r_calibFlux, r_cModelFlux, r_extendedness "
                         "FROM dp02_test_PREOPS863_00.Object "
                         "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
                         "CIRCLE('ICRS', "+use_center_coords+", 0.5)) = 1 ",
                         maxrec=10)
results_tab = results.to_table()
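The query above is assembled by concatenating string fragments. As a minimal sketch, the same cone search can be built with an f-string helper; the function name `cone_search_query` and its parameters are hypothetical (not part of `lsst.rsp`), and the resulting string would be passed to `service.search(...)` just as above.

```python
def cone_search_query(table, columns, ra_deg, dec_deg, radius_deg):
    """Build an ADQL cone-search query string (hypothetical helper).

    The center coordinates and the radius are all in decimal degrees.
    """
    column_list = ', '.join(columns)
    return (f"SELECT {column_list} "
            f"FROM {table} "
            f"WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
            f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1")


query = cone_search_query('dp02_test_PREOPS863_00.Object',
                          ['coord_ra', 'coord_dec', 'detect_isPrimary',
                           'r_calibFlux', 'r_cModelFlux', 'r_extendedness'],
                          57.5, -36.5, 0.5)
print(query)
# With the TAP service from Section 2.1, this could be run as:
# results = service.search(query, maxrec=10)
```

Passing numbers rather than a pre-formatted string keeps the center and radius easy to change in one place.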

The flux units are nanojanskies (nJy). To convert to AB magnitudes: $m_{AB} = -2.5\log_{10}(f_{nJy}) + 31.4$.
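A minimal sketch of this conversion as reusable functions (the names `flux_to_abmag` and `abmag_to_flux` are hypothetical, not LSST API). The zero point 31.4 follows from the AB reference flux of 3631 Jy = 3.631e12 nJy, because 2.5 log10(3.631e12) is about 31.4. Both functions accept scalars or NumPy arrays, so they could also be applied to whole table columns.

```python
import numpy


def flux_to_abmag(flux_njy):
    """Convert flux in nanojanskies (nJy) to AB magnitude."""
    return -2.5 * numpy.log10(flux_njy) + 31.4


def abmag_to_flux(mag_ab):
    """Convert AB magnitude back to flux in nJy (inverse of flux_to_abmag)."""
    return 10.0 ** ((31.4 - mag_ab) / 2.5)


# The AB reference flux, 3631 Jy = 3.631e12 nJy, corresponds to magnitude ~0.
print(flux_to_abmag(3.631e12))
# A magnitude of 25 corresponds to roughly 363 nJy.
print(abmag_to_flux(25.0))
```

Non-positive fluxes, which can occur for faint or noisy measurements, give NaN or infinite magnitudes (with a NumPy RuntimeWarning) rather than raising an error.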

In [None]:
results_tab['r_calibMag'] = -2.50 * numpy.log10(results_tab['r_calibFlux']) + 31.4
results_tab['r_cModelMag'] = -2.50 * numpy.log10(results_tab['r_cModelFlux']) + 31.4

In [None]:
# results_tab

##### 2.3.2 Ten thousand bright, star-like objects

In addition to a cone search, we impose query restrictions that `detect_isPrimary` is True (this excludes deblended "parent" sources and duplicate detections in the overlap regions between patches and tracts), that the calibrated flux in each of g, r, and i is greater than 360 nJy (about 25th mag), and that the `extendedness` parameters are 0 (star-like sources).
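As a worked check of the "about 25th mag" figure, applying the flux-to-magnitude conversion from Section 2.3.1 to the 360 nJy cut gives:

$$
m_{AB} = -2.5\log_{10}(360) + 31.4 \approx -2.5 \times 2.556 + 31.4 \approx 25.0
$$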

*Action: Retrieve g, r, and i fluxes for up to 10000 bright objects within 1 degree of the search center that are likely to be stars. (No output.)*

In [None]:
results = service.search("SELECT coord_ra, coord_dec, detect_isPrimary, "
                         "g_calibFlux, r_calibFlux, i_calibFlux "
                         "FROM dp02_test_PREOPS863_00.Object "
                         "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
                         "CIRCLE('ICRS', "+use_center_coords+", 1.0)) = 1 "
                         "AND detect_isPrimary = 1 "
                         "AND g_calibFlux > 360 "
                         "AND r_calibFlux > 360 "
                         "AND i_calibFlux > 360 "
                         "AND g_extendedness = 0 "
                         "AND r_extendedness = 0 "
                         "AND i_extendedness = 0",
                         maxrec=10000)
results_tab = results.to_table()

In [None]:
results_tab['g_calibMag'] = -2.50 * numpy.log10(results_tab['g_calibFlux']) + 31.4
results_tab['r_calibMag'] = -2.50 * numpy.log10(results_tab['r_calibFlux']) + 31.4
results_tab['i_calibMag'] = -2.50 * numpy.log10(results_tab['i_calibFlux']) + 31.4

*Action: Put the results into a `pandas` dataframe for easy access to contents. (This cell produces no output).*

In [None]:
data = results_tab.to_pandas()

This data is used to create a color-magnitude diagram in Section 2.5.

#### 2.4 The Pandas package -- a brief demo

If you're unfamiliar with pandas, a Python package for dealing with tabular data (<a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html">Pandas Documentation</a>), here are some optional lines of code that demonstrate how to print the column names, the `coord_ra` column (with its index and data type), or only the `coord_ra` column values as an array. Uncomment (remove the #) and execute each cell to view the demo output.

In [None]:
# data.columns

In [None]:
# data['coord_ra']

In [None]:
# data['coord_ra'].values
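Beyond printing columns, dataframes make it easy to derive new columns and to select rows with boolean masks. A minimal sketch on a small stand-in dataframe (the magnitude values are made-up placeholders, not DP0.2 data) that computes the same r - i color used in Section 2.5:

```python
import pandas

# Stand-in for the `data` dataframe above; these magnitudes are invented.
demo = pandas.DataFrame({'g_calibMag': [18.2, 21.7, 24.9],
                         'r_calibMag': [17.6, 20.9, 23.8],
                         'i_calibMag': [17.4, 20.1, 22.9]})

# Derive an r - i color column from two existing columns.
demo['r_i'] = demo['r_calibMag'] - demo['i_calibMag']

# Keep only rows brighter than g = 22 (a smaller magnitude is brighter).
bright = demo[demo['g_calibMag'] < 22.0]
print(bright)
```

The same two operations would work on `data`, e.g., `data[data['g_calibMag'] < 22.0]`.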

#### 2.5 Making a color-magnitude diagram

To make our diagram, we use the `plot` function of the `matplotlib.pyplot` package, which we imported as `plt`.
The `plot` function parameters are described in full at <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot">this matplotlib website</a>, but briefly they are: x values, y values, a format string for the marker shape ('o' is a circle), marker size (`ms`), and marker transparency (`alpha`).

*Action: Use `plt.plot` to display a color magnitude diagram.*

In [None]:
plt.plot(data['r_calibMag'].values-data['i_calibMag'].values,
         data['g_calibMag'].values,
         'o', ms=2, alpha=0.2)

# Label the axes.
plt.xlabel('mag_r - mag_i')
plt.ylabel('mag_g')

# Limit the x-axis.
plt.xlim([-0.5, 2.0])

# Flip the y-axis.
plt.ylim([25.5, 16.5])

plt.show()

This plot generates many questions, such as "Why are the colors quantized?" and "Are those all really stars?". The answers are beyond the scope of this notebook, and are left as potential topics of scientific analysis that could be done with the DC2 data set.

### 3.0 Image Access
The two most common types of images that DP0 delegates will interact with are `calexp` and `deepCoadd`:
 * `calexp` -- a single calibrated image in a single filter
 * `deepCoadd` -- a combination of many single images into a deep stack, or coadd
 
The LSST Science Pipelines processes and stores images in `tracts` and `patches`:
 * `tract` -- a portion of sky within the LSST all-sky tessellation (sky map); divided into patches
 * `patch` -- a quadrilateral sub-region of a tract, of a size that fits easily into memory on desktop computers
 
To retrieve and display an image at a desired coordinate, users will have to specify their image type and the tract and patch they want. This tutorial demonstrates how to do that.

#### 3.1 Finding and retrieving an image with the `butler`
In this notebook, images are accessed via the `butler` (<a href="https://pipelines.lsst.io/modules/lsst.daf.butler/index.html">documentation</a>), an LSST Science Pipelines software package that fetches the LSST data you want without your needing to know its location or format.

The `butler` can also be used to explore what data exist, to decide which images you want, and to fetch the same type of catalog data that was retrieved with the TAP service above. Other DP0 tutorials demonstrate the full butler functionality.

*Action: Define the data repository and collection. (No output.)* The active values below point to the DP0.2 pilot collection for tract 4431; the commented-out lines are the corresponding DP0.1 values.

In [None]:
# DP0.1 repository and collection, kept for reference:
# repo = 's3://butler-us-central1-dp01'
# collection = '2.2i/runs/DP0.1'

# DP0.2 pilot repository and collection (available at data-int):
repo = 's3://butler-us-central1-panda-dev/dc2/butler-external.yaml'
collection = '2.2i/runs/DP0.2/v23_0_1_rc1/PREOPS-905/pilot_tract4431'

*Action: Create an instance of the `butler` using the repo and collection. (No output, though you may see a report that your butler access credentials were found.)*

In [None]:
butler = dafButler.Butler(repo, collections=collection)

For this example, let's retrieve an image of a cool-looking DC2 galaxy cluster that we already know is at coordinates: <br>RA = 03h42m59.0s, Dec = -32d16m09s. In decimal degrees these coordinates are 55.745834, -32.269167.
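The sexagesimal-to-decimal conversion works as follows: RA hours are multiplied by 15 (360 degrees / 24 hours), minutes and seconds are divided by 60 and 3600, and the sign of Dec applies to the whole value. A minimal plain-Python sketch (the helper names are hypothetical; `astropy.coordinates.SkyCoord` can perform the same conversion):

```python
def ra_hms_to_deg(hours, minutes, seconds):
    """Convert right ascension from hours, minutes, seconds to decimal degrees."""
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dec_dms_to_deg(sign, degrees, arcmin, arcsec):
    """Convert declination to decimal degrees; sign is +1 or -1.

    The sign is passed separately so that values such as -0d30m00s are
    handled correctly (a plain -0 degrees would lose the sign).
    """
    return sign * (degrees + arcmin / 60.0 + arcsec / 3600.0)


ra_deg = ra_hms_to_deg(3, 42, 59.0)       # RA = 03h42m59.0s
dec_deg = dec_dms_to_deg(-1, 32, 16, 9)   # Dec = -32d16m09s
print(round(ra_deg, 6), round(dec_deg, 6))
```

The results agree with the decimal values quoted above to within about 1e-6 degrees.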

*Action: Use `lsst.geom` to define a SpherePoint for your coordinates.* (Full `lsst.geom` package <a href="https://pipelines.lsst.io/modules/lsst.geom/index.html">documentation</a>.)

In [None]:
my_spherePoint = lsst.geom.SpherePoint(55.745834*lsst.geom.degrees,
                                       -32.269167*lsst.geom.degrees)
print(my_spherePoint)

*Action: Get the sky map from the `butler` and use `findTract` and `findPatch`.* (Full `skymap` <a href="http://doxygen.lsst.codes/stack/doxygen/x_masterDoxyDoc/skymap.html">documentation</a>.)

In [None]:
skymap = butler.get('skyMap')
my_tract = skymap.findTract(my_spherePoint)
my_patch = my_tract.findPatch(my_spherePoint)
my_patch_id = my_tract.getSequentialPatchIndex(my_patch)
print('my_tract = ', my_tract)
print('my_patch = ', my_patch)
print('my_patch_id = ', my_patch_id)

<br>

**Hazard Warning:** Patch formats changed recently. The "(3,2)" tuple format for patches is part of the "Generation 2" ("Gen 2") Butler and is formally deprecated. The new "Gen 3" format is a single integer. To convert from the "Gen 2" format of (i,j) to the "Gen 3" format, use (7 x j) + i, where 7 is the number of patches along the x-axis of a DC2 tract. In this case, (7 x 2) + 3 = 17.
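The conversion above is a row-major index: with 7 patches per row, patch (i, j) lies j full rows plus i columns from the first patch. A minimal plain-Python sketch of both directions follows; the helper names are hypothetical, and the notebook itself relies on the `skymap` method `getSequentialPatchIndex` used above.

```python
def gen2_to_gen3_patch(i, j, n_patches_x=7):
    """Convert a Gen 2 (i, j) patch tuple to a Gen 3 sequential patch index.

    n_patches_x is the number of patches along the x-axis of a tract;
    the default of 7 matches the (7 x j) + i rule for DC2.
    """
    return n_patches_x * j + i


def gen3_to_gen2_patch(index, n_patches_x=7):
    """Invert the conversion: recover (i, j) from a sequential index."""
    j, i = divmod(index, n_patches_x)
    return i, j


print(gen2_to_gen3_patch(3, 2))   # 17
print(gen3_to_gen2_patch(17))     # (3, 2)
```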

*Action: Use the butler to retrieve the deep i-band coadded image for the tract and patch. (No output.)*

In [None]:
dataId = {'band': 'i', 'tract': 4431, 'patch': 17}
my_deepCoadd = butler.get('deepCoadd', dataId=dataId)

#### 3.2 Displaying an image with `afwDisplay`
Image data retrieved with the butler can be displayed several different ways. A simple option is to use the LSST Science Pipelines package `afwDisplay`. There is some <a href="https://pipelines.lsst.io/modules/lsst.afw.display/index.html">documentation for afwDisplay</a> available, and other DP0 tutorials go into more detail about all the display options (e.g., overlaying mask data to show bad pixels).

*Action: Set the backend of `afwDisplay` to `matplotlib`. (No output).*

In [None]:
afwDisplay.setDefaultBackend('matplotlib')

*Action: Use afwDisplay to show the image data retrieved.* (Patience: this takes a few seconds to render.)

In [None]:
fig = plt.figure(figsize=(10, 8))     # create a matplotlib.pyplot figure
afw_display = afwDisplay.Display(1)   # create a Display object for frame 1
afw_display.scale('asinh', 'zscale')  # asinh stretch with zscale limits
afw_display.mtv(my_deepCoadd.image)   # display the image data you retrieved with the butler
plt.gca().axis('off')                 # hide the x and y axes (ticks, labels, frame)
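The `'asinh'` scaling is roughly linear for faint pixels and logarithmic for bright ones, so faint structure and bright cores can be seen at once. The sketch below is a simplified, stand-alone asinh stretch in NumPy for illustration only: it is not the `afwDisplay` implementation, it uses plain min/max limits instead of `zscale`, and the `softening` parameter is a hypothetical choice.

```python
import numpy


def asinh_stretch(pixels, softening=0.1):
    """Map pixel values to [0, 1] with an arcsinh stretch.

    Values are first rescaled linearly to [0, 1] using the array min and max
    (the array must not be constant), then passed through asinh(x / softening),
    normalized so that the maximum maps to 1.
    """
    pixels = numpy.asarray(pixels, dtype=float)
    lo, hi = numpy.nanmin(pixels), numpy.nanmax(pixels)
    scaled = (pixels - lo) / (hi - lo)
    return numpy.arcsinh(scaled / softening) / numpy.arcsinh(1.0 / softening)


stretched = asinh_stretch([0.0, 1.0, 10.0, 100.0])
print(stretched)
```

A smaller `softening` boosts faint values more strongly; the stretched array could be shown with `plt.imshow`.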

<br>
To learn more about the `afwDisplay` package and its methods, use the help function.

In [None]:
# help(afw_display.scale)
# help(afw_display.mtv)