<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=250 style="padding: 10px"> 
<b>Catalog Queries with TAP (Table Access Protocol)</b> <br>
Contact authors: Leanne Guy <br>
Last verified to run: 2022-11-22 <br>
LSST Science Pipelines version: Weekly 2022_40 <br>
Container Size: medium <br>
Targeted learning level: beginner <br>

**Description:** Explore the DP0.2 catalogs via TAP and execute complex queries to retrieve data.

**Skills:** Use the TAP service. Query catalog data with ADQL. Visualize retrieved datasets.

**LSST Data Products:** Object, ForcedSource, CcdVisit tables.

**Packages:** lsst.rsp, bokeh, pandas

**Credit:**
Originally developed by Leanne Guy in the context of the Rubin DP0.1.

**Get Support:**
Find DP0-related documentation and resources at <a href="https://dp0-2.lsst.io">dp0-2.lsst.io</a>. Questions are welcome as new topics in the <a href="https://community.lsst.org/c/support/dp0">Support - Data Preview 0 Category</a> of the Rubin Community Forum. Rubin staff will respond to all questions posted there.

## 1. Introduction

This notebook provides a beginner-level demonstration of how to use the Table Access Protocol (TAP) server and ADQL (Astronomy Data Query Language) to query and retrieve data from the DP0.2 catalogs.

TAP provides standardized access to catalog data for discovery, search, and retrieval.
Full <a href="http://www.ivoa.net/documents/TAP">documentation for TAP</a> is provided by the International Virtual Observatory Alliance (IVOA).
ADQL is similar to SQL (Structured Query Language).
The <a href="http://www.ivoa.net/documents/latest/ADQL.html">documentation for ADQL</a> includes more information about syntax and keywords.

**Recommendation: review the ADQL recipes and other advice to optimize TAP queries** provided in the documentation for the DP0-era RSP's <a href="https://dp0-2.lsst.io/data-access-analysis-tools/index.html">Data Access and Analysis Tools</a>.

> **Warning:** Not all ADQL functionality is supported yet in the DP0 RSP.

### 1.1. Package imports

In [None]:
# Import general python packages
import time
import numpy as np
import matplotlib.pyplot as plt
import pandas
from pandas.testing import assert_frame_equal
from astropy import units as u
from astropy.coordinates import SkyCoord

# Import the Rubin TAP service utilities
from lsst.rsp import get_tap_service, retrieve_query

# Bokeh and holoviews for interactive visualization
import bokeh
from bokeh.io import output_file, output_notebook, show
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource, CDSView, GroupFilter, HoverTool
from bokeh.plotting import figure
from bokeh.transform import factor_cmap
import holoviews as hv

### 1.2. Define functions and parameters

Set the maximum number of rows to display from pandas, and configure bokeh to generate output in notebook cells when show() is called.

In [None]:
pandas.set_option('display.max_rows', 20)
output_notebook()

In general, the order of results from database queries cannot be assumed to be the same every time.
This function sorts the data so that we can compare the results dataframes even if the records are not in the same order from the query.
Since it is sorting, it also needs to reset the incremental index with `set_index`.

In [None]:
def sort_dataframe(df, sort_key='objectId'):
    df = df.sort_values(sort_key)
    df.set_index(np.array(range(len(df))), inplace=True)
    return df
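
The idea behind `sort_dataframe` can be sketched without pandas: two result sets that hold the same rows in a different order only compare equal after both are sorted on a key that is unique per row. The rows below are made-up values for illustration, not catalog data, and `sort_rows` is a hypothetical helper.

```python
# Two hypothetical result sets with identical rows in a different order.
rows_a = [
    {"objectId": 30, "coord_ra": 62.1},
    {"objectId": 10, "coord_ra": 61.9},
    {"objectId": 20, "coord_ra": 62.0},
]
rows_b = [rows_a[1], rows_a[2], rows_a[0]]


def sort_rows(rows, sort_key="objectId"):
    """Return a new list of rows ordered by the given key."""
    return sorted(rows, key=lambda row: row[sort_key])


# A direct comparison is order-sensitive, so it reports a mismatch.
print(rows_a == rows_b)  # False

# The sorted copies match row by row.
print(sort_rows(rows_a) == sort_rows(rows_b))  # True
```

The sort key must be unique per row for the comparison to be reliable, which is why a different key is needed when `objectId` repeats.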

## 2. Explore the DP0.2 schema 

### 2.1. Create the Rubin TAP Service client

Get an instance of the TAP service, and assert that it exists.

In [None]:
service = get_tap_service()
assert service is not None
assert service.baseurl == "https://data.lsst.cloud/api/tap"

### 2.2. Schema discovery

To find out what schemas, tables and columns exist, query the Rubin TAP schema.

This information is also available in the "Data Products Definitions" section of the <a href="http://dp0-2.lsst.io">DP0.2 documentation</a>.

Create the query to find out what schemas are in the Rubin TAP_SCHEMA, execute it, and see that a TAP Results object is returned.

In [None]:
query = "SELECT * FROM tap_schema.schemas"
results = service.search(query)
print(type(results))

Convert the results to an astropy table and display it.

In [None]:
results = service.search(query).to_table()
results

### 2.3. The DP0.2 catalogs

All the DP0 tables (catalogs) are in the "dp02_dc2_catalogs" schema (table collection).

Search for the DP0 schema name and store as a variable.

In [None]:
schema_names = results['schema_name']
for name in schema_names:
    if name.find('dp02') > -1:
        dp02_schema_name = name
        break
print("DP0.2 schema is " + dp02_schema_name)

Let's explore the tables in the DP0.2 schema, ordering them by their `table_index`.
This is the order in which they are presented to the user in the RSP Portal.
The results list the tables in the DP0.2 schema, the same tables that are presented via the Portal GUI, together with a description of each.

In [None]:
query = "SELECT * FROM tap_schema.tables " \
        "WHERE tap_schema.tables.schema_name = '" \
        + dp02_schema_name + "' order by table_index ASC"
print(query)

results = service.search(query)
results = results.to_table()
results

<br>
Here are some definitions to help you understand the contents of the TAP schema. 

* `schema` - database terminology for the abstract design that represents the storage of data in a database. 
* `tap_schema` - the specific schema describing the TAP service. All TAP services must support a set of tables in a schema named TAP_SCHEMA that describe the tables and columns included in the service.
* `table collection` - a collection of tables. e.g., `dp02_dc2_catalogs`
* `table` - a collection of related data held in a table format in a database, e.g., the Object table (`dp02_dc2_catalogs.Object`)
* `results` - the query result set. The TAP service returns data from a query as a `TAPResults` object. Find more about `TAPResults` [here](https://pyvo.readthedocs.io/en/latest/api/pyvo.dal.TAPResults.html).

## 3. Querying the DP0.2 Object catalog

The Object catalog (dp02_dc2_catalogs.Object) contains sources detected in the coadded images (also called stacked or combined images).

### 3.1. Getting the columns available for a given table

Request the column names, data types, descriptions, and units for all columns in the Object catalog, and display as a Pandas table (which will automatically truncate).

> **Notice:** There are 990 columns available in the Object table.

In [None]:
results = service.search("SELECT column_name, datatype, description, unit from TAP_SCHEMA.columns "
                         "WHERE table_name = 'dp02_dc2_catalogs.Object'")
results.to_table().to_pandas()

There is no need to read through all the columns, which are also available in the "Data Products Definitions" section of the <a href="http://dp0-2.lsst.io">DP0.2 documentation</a>.

The next cell loops over all column names and, if a user-defined search string such as "coord" is found, prints the column name.

In [None]:
search_string = 'coord'
for cname in results['column_name']:
    if cname.find(search_string) > -1:
        print(cname)

Clean up.

In [None]:
del results

#### Exercise for the learner

Search for other columns, such as those related to Flux, or all columns related to the g-band filter.
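
A case-insensitive variant of the column search may be a convenient starting point for this exercise. This is a minimal sketch: `find_columns` is a hypothetical helper, and the example list only contains column names that appear in the queries of this notebook.

```python
def find_columns(column_names, search_string, prefix_only=False):
    """Return the column names that match search_string, ignoring case.

    With prefix_only=True the name must start with search_string, which
    avoids accidental matches in the middle of a name.
    """
    needle = search_string.lower()
    if prefix_only:
        return [name for name in column_names
                if name.lower().startswith(needle)]
    return [name for name in column_names if needle in name.lower()]


# A few column names taken from the queries in this notebook.
example_columns = ["objectId", "coord_ra", "coord_dec", "detect_isPrimary",
                   "g_cModelFlux", "r_cModelFlux", "i_cModelFlux",
                   "r_psfFlux", "r_extendedness"]

print(find_columns(example_columns, "flux"))
print(find_columns(example_columns, "g_", prefix_only=True))
```

Assuming the column query results are still in memory, the same helper could be applied to `results['column_name']` instead of the example list.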

### 3.2. Cone search specifying the maximum number of records

>**RA, Dec constraints yield faster queries:**
The TAP-accessible tables are sharded by coordinate (RA, Dec).
ADQL query statements that include constraints by coordinate do not require a whole-catalog search, and are typically faster (and can be much faster) than ADQL query statements which only include constraints for other columns.

A cone search on the Object table will be a common TAP query.
In this example, a circle centered on (RA, Dec) = (62.0, -37.0), with a radius of 1 degree is used.

Define the central coordinates and search radius using AstroPy `SkyCoord` and units.

In [None]:
center_coords = SkyCoord(62, -37, frame='icrs', unit='deg')
search_radius = 1.0*u.deg

print(center_coords)
print(search_radius)

The TAP queries take the center coordinates and the search radius -- both in units of degrees -- as strings, so also define strings to use in the query statements below.

In [None]:
use_center_coords = "62, -37"
use_radius = "1.0"
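
Instead of typing the coordinate strings by hand, the cone-search clause could also be assembled from numeric values. The sketch below is one possible way to do that; `cone_search_clause` is a hypothetical helper, not part of `lsst.rsp` or the TAP service, and it produces the same `CONTAINS(POINT(...), CIRCLE(...)) = 1` pattern used in the queries of this notebook.

```python
def cone_search_clause(ra_deg, dec_deg, radius_deg,
                       ra_col="coord_ra", dec_col="coord_dec"):
    """Build the ADQL clause that selects rows inside a circular region.

    All three numeric arguments are in degrees.
    """
    return (
        f"CONTAINS(POINT('ICRS', {ra_col}, {dec_col}), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1"
    )


clause = cone_search_clause(62.0, -37.0, 1.0)
print(clause)
```

The numeric inputs could presumably be taken from `center_coords.ra.deg`, `center_coords.dec.deg`, and `search_radius.value`, which keeps the query text and the AstroPy objects consistent.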

For debugging and testing queries, it is often useful to only return a few records for expediency.
This can be done in one of two ways, setting the `TOP` field in a query, or setting the `maxrec` parameter in the TAP service query.
The two methods produce identical results, as demonstrated below.

Define the maximum records to return.

In [None]:
max_rec = 5

#### 3.2.1. Set TOP

Use the "TOP" option to set the maximum number of records.

Build a query to find bright objects with magnitude < 18 and an extendedness = 0 in the r-band filter.
Recall that it is always recommended to set `detect_isPrimary = True` (which means the source has no deblended children, to avoid returning both deblended *and* blended objects).
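
The magnitude cut relies on the server-side function `scisql_nanojanskyToAbMag`, which converts a flux in nanojansky to an AB magnitude. The sketch below assumes that function follows the standard AB definition, in which a flux of 3631 Jy corresponds to magnitude 0; the two Python helpers are hypothetical and only illustrate the arithmetic.

```python
import math

# AB zero point: a source of 3631 Jy (3631e9 nJy) has magnitude 0.
AB_ZERO_POINT_NJY = 3631e9


def nanojansky_to_ab_mag(flux_njy):
    """Convert a flux in nanojansky to an AB magnitude."""
    return -2.5 * math.log10(flux_njy / AB_ZERO_POINT_NJY)


def ab_mag_to_nanojansky(mag):
    """Convert an AB magnitude to a flux in nanojansky."""
    return AB_ZERO_POINT_NJY * 10 ** (-0.4 * mag)


# Flux threshold equivalent to the magnitude cut "< 18.0" used in the query.
flux_limit = ab_mag_to_nanojansky(18.0)
print(f"mag 18.0 corresponds to {flux_limit:.0f} nJy")
print(f"round trip: {nanojansky_to_ab_mag(flux_limit):.3f} mag")
```

Because brighter objects have smaller magnitudes, the condition "magnitude < 18" selects objects with a flux above this threshold.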

Order the results by descending flux, both in this example and the next -- otherwise, results are returned in arbitrary order and we would not be able to confirm that both methods for setting the number of records are equivalent.

> **Warning:** Combining TOP with ORDER BY in ADQL queries can be dangerous: the query may take an unexpectedly long time because the database first sorts all matching rows and only *then* extracts the top N elements. It is best to combine TOP and ORDER BY only if the query's constraints first cut down the number of objects that need to be sorted (which is the case in the examples below).

Execute the query, and confirm that only 5 records were retrieved.

This query usually takes <2 minutes.

In [None]:
%%time
query = "SELECT TOP " + str(max_rec) + " " + \
        "objectId, coord_ra, coord_dec, detect_isPrimary, " + \
        "g_cModelFlux, r_cModelFlux, r_extendedness, r_inputCount " + \
        "FROM dp02_dc2_catalogs.Object " + \
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), " + \
        "CIRCLE('ICRS', " + use_center_coords + ", " + use_radius + ")) = 1 " + \
        "AND detect_isPrimary = 1 " + \
        "AND r_extendedness = 0 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) < 18.0 " + \
        "ORDER by r_cModelFlux DESC"
results = service.search(query)
assert len(results) == max_rec

#### 3.2.2. Set maxrec

Execute the same query using the maxrec parameter instead of the TOP, name the output "results1" instead of "results", and confirm that only 5 records were retrieved.

This query usually takes <2 minutes.

In [None]:
%%time
query = "SELECT objectId, coord_ra, coord_dec, detect_isPrimary, " + \
        "g_cModelFlux, r_cModelFlux, r_extendedness, r_inputCount " + \
        "FROM dp02_dc2_catalogs.Object " + \
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), " + \
        "CIRCLE('ICRS', " + use_center_coords + ", " + use_radius + ")) = 1 " + \
        "AND detect_isPrimary = 1 " + \
        "AND r_extendedness = 0 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) < 18.0 " + \
        "ORDER by r_cModelFlux DESC"
results1 = service.search(query, maxrec=max_rec)
assert len(results1) == max_rec

#### 3.2.3. Show the results from TOP and maxrec are identical

Convert the results to pandas data frames, and assert that the contents of the two tables are identical.
There will be no output if they are identical.

In [None]:
assert_frame_equal(sort_dataframe(results.to_table().to_pandas()),
                   sort_dataframe(results1.to_table().to_pandas()))

For more detailed analysis of results, converting to a pandas dataframe is often very useful.

In [None]:
results_table = results.to_table().to_pandas()
results_table

In [None]:
results1_table = results1.to_table().to_pandas()
results1_table

Clean up.

In [None]:
del results, results1, results_table, results1_table

### 3.3. Cone search with double table join 

Create a similar query as above, for bright, non-extended Objects within a 1 degree radius, but now join with the ForcedSource table to obtain PSF photometry from the r-band processed visit images (PVIs), and join with the CcdVisit table to obtain the visits' modified Julian dates (MJDs).

Do not set a maximum number of records to return, because the goal is to generate a table containing the ForcedSource photometry for all of the Objects of interest. There is no need to use ORDER BY for this example, although it could be used.

This query usually takes <2 minutes.

In [None]:
%%time
query = "SELECT fs.forcedSourceId, fs.objectId, fs.ccdVisitId, fs.detect_isPrimary, " + \
        "fs.band, scisql_nanojanskyToAbMag(fs.psfFlux) as psfMag, ccd.obsStartMJD, " + \
        "scisql_nanojanskyToAbMag(obj.r_psfFlux) as obj_rpsfMag " + \
        "FROM dp02_dc2_catalogs.ForcedSource as fs " + \
        "JOIN dp02_dc2_catalogs.CcdVisit as ccd " + \
        "ON fs.ccdVisitId = ccd.ccdVisitId " + \
        "JOIN dp02_dc2_catalogs.Object as obj " + \
        "ON fs.objectId = obj.objectId " + \
        "WHERE CONTAINS(POINT('ICRS', obj.coord_ra, obj.coord_dec), " + \
        "CIRCLE('ICRS', " + use_center_coords + ", " + use_radius + ")) = 1 " + \
        "AND obj.detect_isPrimary = 1 " + \
        "AND obj.r_extendedness = 0 " + \
        "AND scisql_nanojanskyToAbMag(obj.r_cModelFlux) > 17.5 " + \
        "AND scisql_nanojanskyToAbMag(obj.r_cModelFlux) < 18.0 " + \
        "AND fs.band = 'r' "
results = service.search(query)

Take a look at how the output results are formatted from joined tables. Notice the `obj_rpsfMag` column has the same value for all rows with the same `objectId`. The table join from ForcedSource to Object is a many-to-one match.

In [None]:
results.to_table().to_pandas()
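
The many-to-one structure of this join can be illustrated with a small pure-Python sketch. All identifiers and magnitudes below are made-up values, not catalog data: each hypothetical ForcedSource row picks up the values of the single Object row it refers to, so the Object values repeat.

```python
from collections import Counter

# Hypothetical Object rows: one row per objectId.
objects = {
    101: {"obj_rpsfMag": 17.6},
    102: {"obj_rpsfMag": 17.9},
}

# Hypothetical ForcedSource rows: several rows per objectId.
forced_sources = [
    {"forcedSourceId": 1, "objectId": 101, "psfMag": 17.61},
    {"forcedSourceId": 2, "objectId": 101, "psfMag": 17.58},
    {"forcedSourceId": 3, "objectId": 102, "psfMag": 17.92},
    {"forcedSourceId": 4, "objectId": 101, "psfMag": 17.60},
]

# Inner join on objectId: each ForcedSource row gains its Object's columns.
joined = [
    {**fs, **objects[fs["objectId"]]}
    for fs in forced_sources
    if fs["objectId"] in objects
]

for row in joined:
    print(row)

# Count the ForcedSource rows per Object, similar in spirit to
# np.unique(..., return_counts=True) applied to the real results.
counts = Counter(row["objectId"] for row in joined)
print(dict(counts))
```

The joined table has one row per ForcedSource, which is why the number of rows returned by the query is much larger than the number of unique Objects.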

Investigate how many unique Objects were returned, and the distribution of the number of ForcedSources (visit images) per Object.

In [None]:
unique_objectIds, counts = np.unique(results['objectId'], return_counts=True)
print(len(unique_objectIds), ' unique objects returned')

plt.hist(counts, bins=20)
plt.xlabel('Number of ForcedSources')
plt.ylabel('Number of Objects')
plt.show()

Plot the time series of ForcedSource r-band photometry for one of the Objects.
For this example, just take the first unique objectId from the list.

In [None]:
use_objectId = unique_objectIds[0]

tx = np.where(results['objectId'] == use_objectId)[0]

plt.axhline(results['obj_rpsfMag'][tx[0]], ls='solid', color='black',
            label='Object r-band PSF Magnitude (from deepCoadd)')
plt.plot(results['obsStartMJD'][tx], results['psfMag'][tx],
         'o', ms=10, mew=0, alpha=0.5, color='firebrick',
         label='ForcedSource r-band PSF Magnitude (from PVIs)')
plt.xlabel('MJD')
plt.ylabel('r-band magnitude')
plt.title('objectId = ' + str(use_objectId))
plt.legend(loc='lower left')
plt.show()

Whether or not that randomly chosen Object is truly variable is left as an exercise for the learner.

Clean up.

In [None]:
del results, use_objectId, unique_objectIds, counts

Here is an example of an alternative query string for obtaining the same data in the case where the `objectId` is already known -- meaning, a case in which the Object catalog does not need to be queried via any columns aside from `objectId`.

> "SELECT fs.forcedSourceId, fs.objectId, fs.ccdVisitId, fs.detect_isPrimary, " + \
        "fs.band, scisql_nanojanskyToAbMag(fs.psfFlux) as psfMag, ccd.obsStartMJD, " + \
        "scisql_nanojanskyToAbMag(obj.r_psfFlux) as obj_rpsfMag " + \
        "FROM dp02_dc2_catalogs.ForcedSource as fs " + \
        "JOIN dp02_dc2_catalogs.CcdVisit as ccd " + \
        "ON fs.ccdVisitId = ccd.ccdVisitId " + \
        "JOIN dp02_dc2_catalogs.Object as obj " + \
        "ON fs.objectId = obj.objectId " + \
        "WHERE fs.objectId = " + str(use_objectId) + " " + \
        "AND fs.band = 'r' "

## 4. Visualize large data sets resulting from a query

Now we will do some interactive analysis with the data we have above. We will use bokeh to create interactive plots so that we can explore the dataset, using multiple panels showing different representations of the same dataset. A selection applied to either panel will highlight the selected points in the other panel.

<a href="https://bokeh.org/">Bokeh Documentation</a> <br>
<a href="https://holoviews.org/">Holoviews Documentation</a>

### 4.1. Data query

Select Objects within a central search area, with g-, r-, and i-band magnitudes between 17 and 23 mag, and with a measured r-band extendedness parameter (that is not NaN or NULL).

Return the coordinates, gri magnitudes, and r-band extendedness. 

> **Notice:** The results are being converted to a Pandas table in the same line as the search is executed.

This query usually takes <2 minutes.

In [None]:
%%time
query = "SELECT objectId, detect_isPrimary, " + \
        "coord_ra AS ra, coord_dec AS dec, " + \
        "scisql_nanojanskyToAbMag(g_cModelFlux) AS mag_g_cModel, " + \
        "scisql_nanojanskyToAbMag(r_cModelFlux) AS mag_r_cModel, " + \
        "scisql_nanojanskyToAbMag(i_cModelFlux) AS mag_i_cModel, " + \
        "r_extendedness " + \
        "FROM dp02_dc2_catalogs.Object " + \
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), " + \
        "CIRCLE('ICRS', " + use_center_coords + ", " + use_radius + ")) = 1 " + \
        "AND detect_isPrimary = 1 " + \
        "AND scisql_nanojanskyToAbMag(g_cModelFlux) > 17.0 " + \
        "AND scisql_nanojanskyToAbMag(g_cModelFlux) < 23.0 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) > 17.0 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) < 23.0 " + \
        "AND scisql_nanojanskyToAbMag(i_cModelFlux) > 17.0 " + \
        "AND scisql_nanojanskyToAbMag(i_cModelFlux) < 23.0 " + \
        "AND r_extendedness IS NOT NULL "
results = service.search(query).to_table().to_pandas()

Display the Pandas table.

In [None]:
results

### 4.2. Data preparation

The basis for any data visualization is the underlying data.
We will prepare ColumnDataSource (CDS) from the data returned by the query above that can be passed directly to bokeh.
The CDS is the core of bokeh plots.
Bokeh automatically creates a CDS from data passed as python lists or numpy arrays.
CDS are useful as they allow data to be shared between multiple plots and renderers, enabling brushing and linking.
A CDS is essentially a collection of sequences of data that have their own unique column name. 

Getting the data preparation phase right is key to creating powerful visualizations. 
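
The requirement that every column of a CDS has the same number of rows can be checked before any plotting code runs. The sketch below uses plain Python lists with made-up values; `check_column_lengths` is a hypothetical helper for illustration and is not part of bokeh, which performs its own validation.

```python
def check_column_lengths(columns):
    """Return the common row count, or raise ValueError on a mismatch."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Columns have different lengths: {lengths}")
    return next(iter(lengths.values()), 0)


# Hypothetical columns: each name maps to a sequence of equal length.
columns = {
    "ra": [61.9, 62.0, 62.1],
    "dec": [-37.1, -37.0, -36.9],
    "gmag": [18.2, 19.5, 21.0],
}
print(check_column_lengths(columns))  # 3

# A derived column is computed element-wise, so its length matches.
columns["target_ra"] = [ra - 62.0 for ra in columns["ra"]]
print(check_column_lengths(columns))  # 3
```

Deriving new columns element-wise from existing ones, as done for the offsets and colors in this notebook, keeps the lengths consistent by construction.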

Recall the center coordinates that we defined with AstroPy SkyCoord.
Get the central RA and Dec as values (datatype float).

In [None]:
center_ra = center_coords.ra.deg
center_dec = center_coords.dec.deg
print(center_ra, center_dec)

Create a map (a dictionary) that associates the `r_extendedness` column with the Object being a star or a galaxy.

In [None]:
object_map = {0.0: 'star', 1.0: 'galaxy'}

Create a python dictionary to store the data from the query and pass to the ColumnDataSource.
All columns in a CDS must have the same length.

In [None]:
data = dict(ra=results['ra'], dec=results['dec'],
            target_ra=results['ra']-center_ra,
            target_dec=results['dec']-center_dec,
            gmi=results['mag_g_cModel']-results['mag_i_cModel'],
            gmag=results['mag_g_cModel'],
            rmag=results['mag_r_cModel'],
            imag=results['mag_i_cModel'])
source = ColumnDataSource(data=data)

# Additional data can be added to the Column Data Source after creation
source.data['objectId'] = results['objectId']
source.data['r_extendedness'] = results['r_extendedness']

Use the object type map we created to define `object_type`, and view it.

In [None]:
source.data['object_type'] = results['r_extendedness'].map(object_map)
source.data['object_type']

### 4.3. Plot a color-magnitude diagram

We will use bokeh to plot a color-magnitude (g vs. g-i) diagram making use of the cModel magnitudes.
Hover over the points in the plot to see their values.

Define the plot aesthetics and tools.

In [None]:
plot_options = {'plot_height': 400, 'plot_width': 400,
                'tools': ['box_select', 'reset', 'box_zoom', 'help']}

Define the hover tool.

In [None]:
tooltips = [
    ("Col (g-i)", "@gmi"),
    ("Mag (g)", "@gmag"),
    ("Mag (r)", "@rmag"),
    ("Mag (i)", "@imag"),
    ("Type", "@object_type")
]
hover_tool_cmd = HoverTool(tooltips=tooltips)

Create a Colour-Magnitude Diagram, color coding the different object types.

In [None]:
p = figure(title="Colour - Magnitude Diagram",
           x_axis_label='g-i', y_axis_label='g',
           x_range=(-1.8, 4.3), y_range=(23.5, 16),
           **plot_options)

Define a palette for the object types.

In [None]:
object_type_palette = ['darkred', 'lightgrey']
p.add_tools(hover_tool_cmd)
p.circle(x='gmi', y='gmag', source=source,
         size=3, alpha=0.6,
         legend_field="object_type",
         color=factor_cmap('object_type',
                           palette=object_type_palette,
                           factors=['star', 'galaxy']),
         hover_color="darkblue")

Show the interactive plot.

In [None]:
show(p)

### 4.4. Plot a color-color (r-i vs. g-r) diagram

We will add a color-color (r-i vs. g-r) diagram and make use of the advanced linking features of bokeh to enable brushing and linking between the color-magnitude diagram and this color-color plot.
The CMD above is very crowded; here we filter on object type and plot stars only.

In [None]:
source.data['rmi'] = results['mag_r_cModel'] - results['mag_i_cModel']
source.data['gmr'] = results['mag_g_cModel'] - results['mag_r_cModel']

Use a GroupFilter to select rows from the CDS that satisfy `object_type` = "star".

In [None]:
stars = CDSView(source=source,
                filters=[GroupFilter(column_name='object_type', group="star")])

Define the various options for the plot, and create the hover tool.

In [None]:
plot_options = {'plot_height': 350, 'plot_width': 350,
                'tools': ['box_zoom', 'box_select',
                          'lasso_select', 'reset', 'help']}

hover_tool = HoverTool(tooltips=[("(RA,DEC)", "(@ra, @dec)"),
                                 ("(g-r,g)", "(@gmr, @gmag)"),
                                 ("objectId", "@objectId"),
                                 ("type", "@object_type")])

Create the three plots: spatial, colour-magnitude, and colour-colour. Plot all three together on a grid.

In [None]:
# Spatial plot
title_spatial = 'Spatial centred on (RA,DEC) = '+use_center_coords

fig_spatial = figure(title=title_spatial,
                     x_axis_label="Delta RA", y_axis_label="Delta DEC",
                     **plot_options)
fig_spatial.circle(x='target_ra', y='target_dec',
                   source=source, view=stars,
                   size=4.0, alpha=0.6,
                   color='teal', hover_color='firebrick')
fig_spatial.add_tools(hover_tool)

# Colour-magnitude plot
fig_cmag = figure(title="Colour-Magnitude Diagram",
                  x_axis_label="g-r", y_axis_label="g",
                  x_range=(-1.0, 2.0), y_range=(23.5, 16),
                  **plot_options)
fig_cmag.circle(x='gmr', y='gmag', source=source, view=stars,
                size=4.0, alpha=0.6,
                color='teal', hover_color='firebrick')
fig_cmag.add_tools(hover_tool)

# Colour-colour plot
fig_cc = figure(title="Colour-Colour Diagram",
                x_axis_label="g-r", y_axis_label="r-i",
                x_range=(-1.0, 2.0), y_range=(-1.0, 2.5),
                **plot_options)
fig_cc.circle(x='gmr', y='rmi', source=source, view=stars,
              size=4.0, alpha=0.6,
              color='teal', hover_color='firebrick')
fig_cc.add_tools(hover_tool)

# Plot all three on a grid
p = gridplot([[fig_spatial, fig_cmag, fig_cc]])
show(p)

Use the hover tool to see information about individual datapoints (e.g., the `objectId`).
This information should appear automatically as you hover the mouse over the datapoints in any of the plots.
Notice the data points highlighted in red on one panel with the hover tool are also highlighted on the other panels.

Click on the selection box icon (with a "+" sign) or the selection lasso icon found in the upper right corner of the figure.
Use the selection box and selection lasso to make various selections in any panel by clicking and dragging.
The selected data points will be highlighted in the other panels.

Clean up.

In [None]:
del results, data, p, stars, source

## 5. Asynchronous TAP queries

So far, we have executed all queries as synchronous queries. This means that the query will continue executing in the notebook until it is finished. You can see when the Jupyter cell is running by the asterisk to the left of the cell. For synchronous queries, the cell will continue to run until the query completes and the results are returned. The asterisk will then become a number. This is a good option for short queries that take seconds to minutes.

For longer queries, or for running multiple queries at the same time, an asynchronous query may be more suitable. Asynchronous queries allow you to execute more python while the query runs on the database. Results can be retrieved later on. This is especially important for queries that are long or may return a lot of results. It also safeguards long queries against network outages or timeouts.

Below, we confirm that the results from the asynchronous query are the same as from the synchronous query.
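
The difference between the two execution modes can be sketched with the Python standard library. This is only an analogy: `slow_query` is a hypothetical stand-in that sleeps briefly, and a local thread takes the place of the remote database, whereas a real asynchronous TAP job runs on the server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_query(n_rows):
    """Stand-in for a database query: sleep briefly, then return rows."""
    time.sleep(0.2)
    return list(range(n_rows))


# Synchronous: the call blocks until the rows come back.
sync_rows = slow_query(5)

# Asynchronous: submit the work, keep going, and collect the rows later.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_query, 5)
    other_work = sum(range(1000))   # runs while the "query" is in progress
    async_rows = future.result()    # blocks only here, until the rows arrive

print(other_work)
print(sync_rows == async_rows)
```

Both modes return the same rows; the asynchronous one simply frees the caller to do something else in the meantime.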

### 5.1. Define the query

We use a table join cone search query on ForcedSource and CcdVisit, but with fewer conditions and a smaller radius than above.

In [None]:
use_radius = "0.1"

query = "SELECT fs.forcedSourceId, fs.objectId, fs.ccdVisitId, " + \
        "fs.detect_isPrimary, fs.band, " + \
        "scisql_nanojanskyToAbMag(fs.psfFlux) as psfMag, ccd.obsStartMJD " + \
        "FROM dp02_dc2_catalogs.ForcedSource as fs " + \
        "JOIN dp02_dc2_catalogs.CcdVisit as ccd " + \
        "ON fs.ccdVisitId = ccd.ccdVisitId " + \
        "WHERE CONTAINS(POINT('ICRS', fs.coord_ra, fs.coord_dec), " + \
        "CIRCLE('ICRS', " + use_center_coords + ", " + use_radius + ")) = 1 " + \
        "AND scisql_nanojanskyToAbMag(fs.psfFlux) > 17.5 " + \
        "AND scisql_nanojanskyToAbMag(fs.psfFlux) < 18.0 " + \
        "AND fs.band = 'r' "

print(query)

### 5.2. Synchronous query

Run the synchronous query and wait for the results, as in the sections above.

This query usually takes <2 minutes.

In [None]:
%%time
results = service.search(query).to_table().to_pandas()

In [None]:
len_sync_results = len(results)
print(len_sync_results)

### 5.3. Asynchronous query

Create and submit the job. This step does not run the query yet.

Then get the job URL and the job phase. It will be "pending" as we have not yet started the job.

In [None]:
job = service.submit_job(query)

print('Job URL is', job.url)

print('Job phase is', job.phase)

Run the job. You will see that the cell completes executing, even though the query is still running.

In [None]:
job.run()

Option: Use the following cell to tell python to wait for the job to finish if you don't want to run anything else while waiting.
The cell will continue executing until the job is finished.

In [None]:
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
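
Conceptually, waiting for a job amounts to polling its phase until it reaches one of the requested values. The sketch below illustrates that idea with a hypothetical `FakeJob` class and a hypothetical `wait_for_phase` helper; it is not the implementation behind `job.wait`, and the timeout is an added assumption.

```python
import time


class FakeJob:
    """Hypothetical stand-in for a TAP job; it completes after a few polls."""

    def __init__(self, polls_until_done=3):
        self._remaining = polls_until_done

    @property
    def phase(self):
        # Each read of the phase counts as one poll of the server.
        if self._remaining > 0:
            self._remaining -= 1
            return "EXECUTING"
        return "COMPLETED"


def wait_for_phase(job, phases=("COMPLETED", "ERROR"),
                   poll_interval=0.01, timeout=5.0):
    """Poll job.phase until it is one of phases or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        phase = job.phase
        if phase in phases:
            return phase
        if time.monotonic() > deadline:
            raise TimeoutError(f"Job still in phase {phase}")
        time.sleep(poll_interval)


final_phase = wait_for_phase(FakeJob())
print("Job phase is", final_phase)
```

Including `ERROR` in the list of phases matters: without it, a failed job would never reach the awaited phase and the loop would only end at the timeout.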

`raise_if_error` is a useful function that raises an exception if there was a problem with the query.

In [None]:
job.raise_if_error()

Once the job completes successfully, you can fetch the results.

In [None]:
async_results = job.fetch_result().to_table().to_pandas()

In [None]:
assert len(async_results) == len_sync_results

Assert that the results are the same as obtained from executing synchronous queries. We've previously used the `sort_dataframe` function with the default sort key, `objectId`, but this time the `objectId` is not unique in our results dataframes, and so we must use `forcedSourceId`.

In [None]:
assert_frame_equal(sort_dataframe(results, sort_key='forcedSourceId'),
                   sort_dataframe(async_results, sort_key='forcedSourceId'))

### 5.4. Retrieving the results from a previous asynchronous job
Job results may still be available from previously run queries. You can retrieve these results if you know the URL of the job.
This includes jobs executed in the Portal: find the URL for the query there and use it to retrieve the results.

In [None]:
retrieved_job = retrieve_query(job.url)

retrieved_results = retrieved_job.fetch_result().to_table().to_pandas()

assert len(retrieved_results) == len_sync_results

assert_frame_equal(sort_dataframe(retrieved_results, sort_key='forcedSourceId'),
                   sort_dataframe(async_results, sort_key='forcedSourceId'))

### 5.5. Deleting a job
Once the job is finished and you have retrieved your results, you can delete the job and the results from the server. If you do not, the results will be deleted automatically after a period of time.

In [None]:
job.delete()

Clean up.

In [None]:
del results, async_results, retrieved_results