# User-Uploaded Catalogs
<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=250 style="padding: 10px"> 
<br>
Contact authors: Yumi Choi <br>
Last verified to run: 2024-06-03 <br>
LSST Science Pipelines version: Weekly 2024_16 <br>
Container Size: small <br>
Targeted learning level: Beginner <br>

In [None]:
%load_ext pycodestyle_magic
%flake8_on
import logging
logging.getLogger("flake8").setLevel(logging.FATAL)

**Description:** Use the TAP upload functionality for user-supplied tables and join them with DP0.3 catalogs.

**Skills:** Use the TAP service to upload a table and join it to an LSST table with ADQL.

**LSST Data Products:** TAP tables dp03_catalogs_10yr.SSObject, dp03_catalogs_10yr.MPCORB, dp03_catalogs_10yr.DiaSource

**Packages:** lsst.rsp.get_tap_service

**Credit:**
Developed by Yumi Choi. This tutorial is based on <a href="https://dp0-3.lsst.io/tutorials-dp0-3/portal-dp0-3-5.html">a Portal tutorial</a> by Christina Williams for using user-supplied tables in queries for DP0.3 and <a href="https://github.com/rubin-dp0/cst-dev/blob/main/MLG_sandbox/DP03/gaia_hack_day_Feb2024.ipynb">a Jupyter Notebook</a> by Melissa Graham and Jake Kurlander for accessing Gaia data and matching with DP0.3 data for solar system objects.

**Get Support:** Find DP0-related documentation and resources at <a href="https://dp0.lsst.io">dp0.lsst.io</a>. Questions are welcome as new topics in the <a href="https://community.lsst.org/c/support/dp0">Support - Data Preview 0 Category</a> of the Rubin Community Forum. Rubin staff will respond to all questions posted there.

## 1. Introduction

This notebook illustrates the process of uploading user-provided tables via the TAP service and integrating them into queries for DP0.3. It focuses on two types of user-provided tables: (1) tables generated outside the RSP, and (2) tables created within the RSP but retrieved from an external database using [PyVO](https://github.com/astropy/pyvo).

### 1.1. Import packages

Import general Python packages and the Rubin Table Access Protocol (TAP) service.

[PyVO](https://pyvo.readthedocs.io/en/latest/) is a package providing access to remote data and services of the Virtual Observatory (VO) using Python.

In [None]:
import os
import getpass
import matplotlib.pyplot as plt
import numpy as np
import pyvo
from astropy.table import Table
from lsst.rsp import get_tap_service

### 1.2. Define parameters

Set a few style parameters for the plots.

In [None]:
plt.style.use('tableau-colorblind10')
params = {'axes.labelsize': 12,
          'font.size': 12,
          'legend.fontsize': 10}
plt.rcParams.update(params)

Define the path to the home directory, where outputs generated by this tutorial will be saved.

In [None]:
my_home_dir = '/home/' + getpass.getuser() + '/'
print(my_home_dir)

Start the TAP service and assert that it exists.

In [None]:
rsp_tap = get_tap_service("ssotap")
assert rsp_tap is not None

## 2. Spatial and temporal cross-match to diaSources

The scientific scenario here is that a list of expected locations of moving objects
across multiple nights is in-hand, and the question is whether LSST detected them.

The query below uploads this list to the TAP service and cross-matches to the table of detected sources 
that results from difference image analysis: the `diaSource` table
(learn more about [DP0.3 catalogs](https://dp0-3.lsst.io/data-products-dp0-3/index.html#dp0-3-data-products-definition-document-dpdd)).

The list of identifier indices, right ascensions, declinations, and Modified Julian Dates (id, ra, dec, and mjd) is
provided in a file (nine rows, i.e., nine expected locations and times).

The format for user-uploadable tables is to have the column names in the first row, with no # symbol at the start of the row.
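
As a quick illustration of that layout, the sketch below builds a small table as text and checks that its first row holds plain column names. The helper `has_plain_header` and the sample values are hypothetical and are not part of the tutorial data.

```python
import io

# Hypothetical example content: column names first, then one row per location.
sample = io.StringIO(
    "id ra dec mjd\n"
    "1 10.50 -5.20 60500.10\n"
    "2 10.60 -5.30 60501.10\n"
)


def has_plain_header(handle):
    """Return True if the first row is a list of column names without '#'."""
    first = handle.readline().strip()
    if not first or first.startswith("#"):
        return False
    # Column names should not parse as numbers.
    for name in first.split():
        try:
            float(name)
            return False
        except ValueError:
            continue
    return True


print(has_plain_header(sample))
```

A file that passes this kind of check should be readable with `format='ascii.basic'`, as done for `ut1` below.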

In [None]:
fnm1 = 'data/dp03_06_user_table_1.cat'

**Option:** View the contents of the file with `more`, or return the word count (lines, words, characters) with `wc`.

In [None]:
# os.system('more ' + fnm1)
# os.system('wc ' + fnm1)

Read the file as user table 1, `ut1`.

In [None]:
ut1 = Table.read(fnm1, format='ascii.basic')

Define a query to cross match the coordinates and dates in the file to detections in the DP0.3 `diaSource` catalog.

This query applies a spatial threshold of 10 arcseconds (0.00278 degrees) and a temporal threshold of half a day.
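
The unit conversion behind those numbers can be checked with a few lines of arithmetic. This is only a sketch; the variable names are illustrative and are not used elsewhere in the notebook.

```python
# Thresholds in convenient units (values taken from the text above).
radius_arcsec = 10.0
dt_days = 0.5

# ADQL CIRCLE expects the radius in degrees: 3600 arcseconds per degree.
radius_deg = radius_arcsec / 3600.0
print(f"radius = {radius_deg:.5f} deg")

# Half a day expressed in hours, for reference.
print(f"time window = +/- {dt_days * 24.0:.0f} hours")
```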

Notice that the columns from the user-uploaded table are being renamed in the query, e.g., `ut1.ra AS ut1_ra`.
If this renaming is not done, the TAP service will rename the columns from the user table as `ra2` and `dec2`
to distinguish them from the `ra` and `dec` columns from the `DiaSource` table.

In [None]:
query = """
    SELECT dias.ra, dias.dec, dias.midPointMjdTai, dias.ssObjectId,
    ut1.ra AS ut1_ra, ut1.dec AS ut1_dec, ut1.mjd AS ut1_mjd, ut1.id AS ut1_id
    FROM dp03_catalogs_10yr.DiaSource AS dias, TAP_UPLOAD.ut1 AS ut1
    WHERE CONTAINS(POINT('ICRS', dias.ra, dias.dec),
    CIRCLE('ICRS', ut1.ra, ut1.dec, 0.00278))=1
    AND ABS(dias.midPointMjdTai - ut1.mjd) < 0.5
    ORDER BY dias.ssObjectId
    """

Create the job by submitting the query and then run it asynchronously.

In [None]:
job = rsp_tap.submit_job(query, uploads={"ut1": ut1})
job.run()

Check that the job is completed.

In [None]:
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)

Retrieve the results and display them.

In [None]:
results = job.fetch_result().to_table()
results

In the table above, notice that every row has the same `ssObjectId`.
This is because the file was created to contain the detections of a single moving object across multiple nights.

Notice also that whereas the user-uploaded table had 9 rows, there are 15 matches to the `diaSource` table.
Six of the rows had two matches.
This means that six times, this moving object was detected twice within 10 arcseconds in the same night.
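
One way to count matches per uploaded row is to tally the `ut1_id` values. The sketch below uses a short hypothetical list in place of `results['ut1_id']`, so the counts it prints are not those of the real query.

```python
from collections import Counter

# Hypothetical ut1_id values, standing in for results['ut1_id'].
matched_ids = [1, 1, 2, 3, 3, 4]

# Number of diaSource matches for each uploaded row.
matches_per_row = Counter(matched_ids)

# Uploaded rows that matched more than one diaSource.
multi = sorted(i for i, n in matches_per_row.items() if n > 1)
print(f"{len(matches_per_row)} uploaded rows matched, "
      f"{len(multi)} of them more than once: {multi}")
```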

**Create a plot out of this data.**
Moving objects do not exhibit a constant rate of motion on the sky in terms of, e.g., arcsec per minute.
However, for the purposes of plotting the results of this cross-matching, calculate the spatial and temporal
offsets between the detected `diaSource`s and the user uploaded expectations, and then plot the spatial vs. the temporal offsets.
Eight of these will be zero (no offset; perfect match) but for the additional, newly-discovered detections in
the `diaSource` catalog which were not in the user-uploaded table, there should be larger spatial offsets
for larger temporal offsets.

In [None]:
cos_dec = np.cos(np.deg2rad(results['dec']))
delta_ra = cos_dec * np.abs(results['ra'] - results['ut1_ra'])
delta_dec = np.abs(results['dec'] - results['ut1_dec'])
s_off = 3600.0 * np.sqrt(delta_ra**2 + delta_dec**2)
t_off = 24.0 * 60.0 * np.abs(results['midPointMjdTai'] - results['ut1_mjd'])
del cos_dec, delta_ra, delta_dec
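
For a single pair of positions, the same flat-sky calculation can be written as a small function. This is a sketch with hypothetical coordinates; `sky_offset_arcsec` is an illustrative name, and the approximation assumes small separations and no wrap across RA = 0/360 degrees.

```python
import math


def sky_offset_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation in arcseconds; all inputs in degrees."""
    cos_dec = math.cos(math.radians(dec1))
    d_ra = cos_dec * abs(ra1 - ra2)
    d_dec = abs(dec1 - dec2)
    return 3600.0 * math.hypot(d_ra, d_dec)


# Hypothetical positions: 0.001 deg apart in declination only.
print(sky_offset_arcsec(10.0, -5.0, 10.0, -5.001))
```

The printed value is roughly 3.6 arcseconds, since 0.001 degrees times 3600 is 3.6.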

In [None]:
fig = plt.figure(figsize=(6, 4))
plt.plot(t_off, s_off, 'o', ms=10, alpha=0.3, mew=0)
plt.xlabel('time difference (minutes)')
plt.ylabel('sky distance (arcseconds)')
plt.title('distance between same-night observations')
plt.show()

> Figure 1: Above, the sky distance in arcseconds is plotted versus the time difference in minutes for the 15 detections cross-matched with the 9 anticipated coordinates and times in the user-uploaded table.

Clean up.

In [None]:
del t_off, s_off
del fnm1, ut1, query, job, results

## 3. ADQL table join with a user-supplied table of SSObject IDs

This section demonstrates how to upload a user-supplied table and join it with a DP0.3 table using a list of SSObjectIds. The same example table of SSObjectIds used in Step 2 of <a href="https://dp0-3.lsst.io/tutorials-dp0-3/portal-dp0-3-5.html">this Portal tutorial</a> is stored in the local `data/` directory. The table includes two SSObjectIds.

### 3.1. Read and upload a user-supplied table of SSObject IDs and execute query against the `diaSource` table

In [None]:
t_cat_2 = Table.read('./data/portal_tut05_useruploadcat2.cat', format='ascii.basic')

In [None]:
query = """
    SELECT ut2.ssObjectId_user, dia.ssObjectId, dia.ra, dia.dec
    FROM TAP_UPLOAD.t2 as ut2
    INNER JOIN dp03_catalogs_10yr.diaSource as dia
    ON ut2.ssObjectId_user = dia.ssObjectId
    """

results = rsp_tap.search(query, uploads={"t2": t_cat_2}).to_table()
results

In [None]:
uniqIds = np.unique(results['ssObjectId'])
print('Confirm that %d matched unique objects are returned.' % (len(uniqIds)))

For a quick sanity check, create a plot showing the sky distribution of these two unique objects over 10 years.

In [None]:
for i in uniqIds:
    add = results['ssObjectId'] == i
    plt.scatter(results['ra'][add], results['dec'][add], 
                label='ssObjectId:'+str(i))
plt.xlabel('ra [deg]')
plt.ylabel('dec [deg]')
plt.legend()
plt.show()

## 4. Gaia data for DP0.3 asteroids

The previous two sections cover pre-existing user-supplied tables. This section demonstrates how to retrieve an external table (here, from the Gaia Archive) using `PyVO` in real time within the RSP, and then upload it for use in a joint query with the LSST database. This provides direct access, from within the RSP, to any external database that offers a TAP service URL. For example, the <a href="https://datalab.noirlab.edu">Data Lab</a> TAP service URL is: https://datalab.noirlab.edu/tap.

### 4.1. Access the Gaia database and retrieve Gaia data for main-belt asteroids (MBAs)

Get an instance of the `Gaia TAP` service using `PyVO`, and assert that it exists.

In [None]:
gaia_tap_url = 'https://gea.esac.esa.int/tap-server/tap'
gaia_tap = pyvo.dal.TAPService(gaia_tap_url)
assert gaia_tap is not None
assert gaia_tap.baseurl == gaia_tap_url

Query MBAs from the Gaia database following the population definition used by the JPL Horizons small body database query tool (https://ssd.jpl.nasa.gov/tools/sbdb_query.html): 2.0 < `a` < 3.2 au and `q` > 1.666 au. To expedite the query, require more than 200 observations per object and limit the number of objects retrieved to 1000. Save the result table in a pandas DataFrame to facilitate string modification in the following steps.

In [None]:
Nobj = 1000

query = """
    SELECT TOP {} denomination, inclination,
           eccentricity, semi_major_axis
    FROM gaiadr3.sso_orbits
    WHERE num_observations > 200
    AND semi_major_axis > 2.0
    AND semi_major_axis < 3.2 
    AND semi_major_axis*(1-eccentricity) > 1.666
    """.format(Nobj)

t_gaia = gaia_tap.search(query).to_table().to_pandas()
t_gaia
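
The orbital cuts in the query's WHERE clause can be tried on individual orbits with a small helper. This is a sketch only: `is_main_belt` and the example orbits are hypothetical, and the perihelion distance is computed as `q = a(1 - e)`, as in the query.

```python
def is_main_belt(a_au, ecc):
    """Apply the cuts 2.0 < a < 3.2 au and q = a(1 - e) > 1.666 au."""
    q_au = a_au * (1.0 - ecc)
    return 2.0 < a_au < 3.2 and q_au > 1.666


# Hypothetical orbits: (semi-major axis in au, eccentricity).
print(is_main_belt(2.7, 0.10))   # q = 2.43 au
print(is_main_belt(2.1, 0.30))   # q = 1.47 au
```

The first orbit passes both cuts; the second lies inside the semi-major axis range but fails the perihelion cut.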

Make plots showing eccentricity (`e`) and inclination (`i`) against semi-major axis (`a`) of the retrieved MBAs. These plots should resemble the second figure in Section 2.2.2 of the notebook tutorial: <a href="https://github.com/rubin-dp0/tutorial-notebooks/blob/main/DP03_02_Main_Belt_Asteroids.ipynb">DP03_02_Main_Belt_Asteroids.ipynb</a>.

In [None]:
fig = plt.figure(figsize=(13, 5))

plt.subplot(121)
plt.scatter(t_gaia['semi_major_axis'], t_gaia['eccentricity'], 
            s=3, alpha=0.3)
plt.xlabel('a [au]')
plt.ylabel('e')

plt.subplot(122)
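# Assumption: the Gaia inclination column is in degrees, so convert
# to radians before taking the sine.
sin_i = np.sin(np.deg2rad(t_gaia['inclination']))
plt.scatter(t_gaia['semi_major_axis'], sin_i,
            s=3, alpha=0.3)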
plt.xlabel('a [au]')
plt.ylabel(r'sin($i$)')
plt.show()

The `MPCORB` table in DP0.3 follows the standard format for MPC designation, which is the year, then a space, and then an identifier that is several characters, capital letters and numbers. The `denomination` in some Gaia tables uses underscores instead of spaces and/or lower-case letters instead of capital letters. Thus, the `denomination` column needs to be converted to the standard MPC designation format.

In [None]:
t_gaia['mpc_desig1'] = t_gaia['denomination'].str.replace('_', ' ').str.upper()
t_gaia['mpc_desig2'] = t_gaia['denomination'].str.replace('_', ' ')
t_gaia
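
The same conversion can be tried on plain strings outside of pandas. The sketch below uses hypothetical denomination strings, and `to_mpc_designation` is an illustrative helper name.

```python
def to_mpc_designation(denomination):
    """Replace underscores with spaces and upper-case the result."""
    return denomination.replace("_", " ").upper()


# Hypothetical denomination strings, for illustration only.
for name in ["2001_ab12", "1999 XY3"]:
    print(repr(name), "->", repr(to_mpc_designation(name)))
```

A string that is already in the standard format passes through unchanged.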

Save the result as a csv file in `my_home_dir` for the next step as well as for any follow-up analysis in the future. This is optional. 

In [None]:
# t_gaia.to_csv(my_home_dir+'gaia_mba.cat', index=False)

### 4.2. Upload the Gaia table and join with DP0.3 tables

The TAP service does not accept uploads in a Pandas DataFrame format. Thus, conversion to an Astropy Table is needed.

In [None]:
t_cat_3 = Table.from_pandas(t_gaia)
t_cat_3.dtype

Uploading a table that includes columns with a Unicode string dtype does not work. Convert Unicode columns (dtype.kind='U') to bytestrings (dtype.kind='S') using the `convert_unicode_to_bytestring` method.

In [None]:
t_cat_3.convert_unicode_to_bytestring()

In [None]:
query = """
    SELECT mpc.mpcDesignation, mpc.ssObjectId,
           ut3.denomination, ut3.mpc_desig2,
           dia.ssObjectId, dia.midPointMjdTai, dia.mag, dia.band
    FROM dp03_catalogs_10yr.MPCORB AS mpc
    INNER JOIN TAP_UPLOAD.t3 as ut3
    ON ut3.mpc_desig1 = mpc.mpcDesignation
    INNER JOIN dp03_catalogs_10yr.DiaSource as dia
    ON dia.ssObjectId = mpc.ssObjectId
    WHERE dia.band = 'g'
    """

t_lsst = rsp_tap.search(query, uploads={"t3": t_cat_3}).to_table()

Select one random asteroid and save its data entry as a small table called `random_mba`.

In [None]:
uniqueObj = np.random.choice(t_lsst['ssObjectId'], 1)[0]
print(uniqueObj)
random_mba = t_lsst[t_lsst['ssObjectId'] == uniqueObj]

Retrieve Gaia individual observations for the selected asteroid. 

Occasionally, the following error message may occur: 'DALServiceError: 401 Client Error: 401 for url: https://gea.esac.esa.int/tap-server/tap/sync'. To avoid it, get an instance of the `Gaia TAP` service again before running the query below.

In [None]:
gaia_tap_url = 'https://gea.esac.esa.int/tap-server/tap'
gaia_tap = pyvo.dal.TAPService(gaia_tap_url)
assert gaia_tap is not None
assert gaia_tap.baseurl == gaia_tap_url

In [None]:
query = """
    SELECT epoch_utc, g_mag
    FROM gaiadr3.sso_observation
    WHERE denomination = '{}'
    """.format(random_mba['mpc_desig2'][0])

t_gaia_obs = gaia_tap.search(query).to_table()

Convert the Coordinated Universal Time (UTC) observation date to a Modified Julian Date (MJD) observation date.

From the [documentation](https://gea.esac.esa.int/archive/documentation/GDR2/Gaia_archive/chap_datamodel/sec_dm_sso_tables/ssec_dm_sso_observation.html), `epoch_utc` is the Gaiacentric epoch UTC, expressed in days since JD 2455197.5 (2010-01-01T00:00:00), while LSST records observing time in MJD. Since MJD = JD - 2400000.5, the conversion is MJD = `epoch_utc` + 55197.0 (days).

In [None]:
t_gaia_obs['epoch_mjd'] = t_gaia_obs['epoch_utc'] + 55197.0
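
The offset between the Gaia reference epoch and the MJD zero point can be sanity-checked with the standard library. This is a sketch under the assumption that the Gaia epoch is counted in days from 2010-01-01T00:00:00 (JD 2455197.5); it ignores leap seconds and differences between time scales.

```python
from datetime import datetime

# MJD counts days from 1858-11-17T00:00:00.
MJD_ZERO = datetime(1858, 11, 17)

# Assumption: the Gaia epoch is counted in days from 2010-01-01T00:00:00.
gaia_ref = datetime(2010, 1, 1)

# Offset to add to a Gaia epoch to obtain an MJD.
offset_days = (gaia_ref - MJD_ZERO).total_seconds() / 86400.0
print(offset_days)
```

Under that assumption the offset is a whole number of days, 55197.0.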

Plot the `g`-band light curve for the selected asteroid.

In [None]:
fig = plt.figure(figsize=(6, 4))
plt.plot(t_gaia_obs['epoch_mjd'], t_gaia_obs['g_mag'],
         'o', color='cyan', label='Gaia G')
plt.plot(random_mba['midPointMjdTai'], random_mba['mag'],
         'o', color='dodgerblue', label='LSST g')
ymin = min(t_gaia_obs['g_mag'].min(), min(random_mba['mag']))
ymax = max(t_gaia_obs['g_mag'].max(), max(random_mba['mag']))
plt.ylim(ymax+0.1, ymin-0.1)
plt.legend(loc='upper left')
plt.xlabel('MJD')
plt.ylabel('mag')
plt.title(random_mba['mpcDesignation'][0])
plt.show()

## 5. Exercises for the learner

1. Generate your own user table and perform a spatial and temporal search of the DiaSource table to look for a sample of solar system bodies observed in a specific part of the sky at a specific time. Save the query result table, and use it to search the SSSource table for all existing observations by matching on SSObjectId.