<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=250 style="padding: 10px"> 
<b>DiaObject Sample Identification</b> <br>
Contact author: Melissa Graham <br>
Last verified to run: <i>yyyy-mm-dd</i> <br>
LSST Science Pipelines version: Weekly <i>yyyy_xx</i> <br>
Container Size: medium <br>
Targeted learning level: intermediate <br>

In [None]:
# %load_ext pycodestyle_magic
# %flake8_on
# import logging
# logging.getLogger("flake8").setLevel(logging.FATAL)

**Description:** To use the DiaObject table parameters to identify a sample of time-variable objects of interest.

**Skills:** Use the TAP service and the DP0.2 DiaObject and DiaSource tables.

**LSST Data Products:** TAP tables dp02_dc2_catalogs.DiaObject and DiaSource.

**Packages:** lsst.rsp, astropy.cosmology, numpy, matplotlib

**Credit:** Originally developed by Melissa Graham and the Rubin Community Engagement Team for Data Preview 0. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.

**Get Support:**
Find DP0-related documentation and resources at <a href="https://dp0-2.lsst.io">dp0-2.lsst.io</a>. Questions are welcome as new topics in the <a href="https://community.lsst.org/c/support/dp0">Support - Data Preview 0 Category</a> of the Rubin Community Forum. Rubin staff will respond to all questions posted there.

## 1. Introduction

This notebook demonstrates how to use the DiaObject table's lightcurve summary statistics to identify samples of objects for further study, and how to retrieve and plot lightcurves for DiaObjects of interest from the DiaSource table.

As an example, a sample of potential Type Ia supernovae is identified and retrieved from the DiaObject table.

Learn more about the contents of the DiaObject and DiaSource tables in the <a href="https://dp0-2.lsst.io">DP0.2 Documentation</a>.

### 1.1. Package Imports

**lsst.rsp:** The LSST Science Pipelines package for RSP functionality such as the TAP service (<a href="http://pipelines.lsst.io">pipelines.lsst.io</a>).

**astropy.cosmology:** An open-source package of cosmology tools (<a href="https://docs.astropy.org/en/stable/cosmology/index.html"> the astropy cosmology documentation</a>).

In [None]:
import time
from IPython.display import Image

import numpy
import matplotlib.pyplot as plt

from lsst.rsp import get_tap_service

from astropy.cosmology import FlatLambdaCDM
cosmo = FlatLambdaCDM(H0=70, Om0=0.3)

### 1.2. Define functions and parameters

Start the TAP service.

In [None]:
service = get_tap_service()

Set a few parameters to use later, when plotting lightcurves.

In [None]:
plt.style.use('tableau-colorblind10')
plt.rcParams.update({'font.size':16})

plot_filter_labels = ['u', 'g', 'r', 'i', 'z', 'y']
plot_filter_colors = ['darkviolet', 'darkgreen', 'firebrick', 'darkorange', 'brown', 'black']
plot_filter_symbols = ['o', '^', 'v', 's', '*', 'p']

## 2. Understand the DiaObject table's contents

**DiaSource Table:** Measurements for sources detected with a signal-to-noise ratio > 5 in the difference-images (i.e., lightcurves).
Note that these measurements are not forced photometry at the locations of all DiaObjects, but detections only.

**DiaObject Table:** Summary parameters for DiaSources associated by coordinate (including lightcurve summary statistics).

Option to print all of the available column names for the DiaObject table.

In [None]:
# results = service.search("SELECT column_name from TAP_SCHEMA.columns "\
#                          "WHERE table_name = 'dp02_dc2_catalogs.DiaObject'")
# for column_name in results['column_name']:
#     print(column_name)

### 2.1. Review these descriptions for selected lightcurve summary statistics

The DiaObject table includes the following lightcurve summary statistics, which are derived from the contents of the DiaSource table.

**nDiaSources:** The number of difference-image detections in any filter (i.e., number of DiaSources associated with a given DiaObject).
 
The following statistics are all based on _difference-image point source (PS) flux values_ for each filter [f].<br>
**[f]PSFluxMin:** The faintest flux. <br>
**[f]PSFluxMax:** The brightest flux. <br>
**[f]PSFluxMean:** The average flux. <br>
**[f]PSFluxSigma:** The standard deviation of the fluxes. <br>
**[f]PSFluxMAD:** The median absolute deviation of the fluxes (i.e., the median distance from the median). <br>
**[f]PSFluxChi2:** The Chi2 statistic for the scatter of the fluxes around the mean. <br>
**[f]PSFluxNdata:** The number of data points used to compute [f]PSFluxChi2. <br>
**[f]PSFluxSkew:** A measure of asymmetry in the distribution of fluxes about the mean (where 0 means symmetric). <br>
**[f]PSFluxStetsonJ:** A variability index developed for Cepheids (defined in <a href="https://ui.adsabs.harvard.edu/abs/1996PASP..108..851S/abstract">Stetson 1996</a>). <br>
**[f]PSFluxPercentile05, 25, 50, 75, 95:** Derived from the cumulative distribution of flux values. <br>

The following statistics are all based on the _direct-image total (TOT) flux values_ for each filter [f]. <br>
**[f]TOTFluxMean:** The average flux.  <br>
**[f]TOTFluxSigma:** The standard deviation of the fluxes.  <br>
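
To make these definitions concrete, the following minimal sketch computes a few analogous statistics for a made-up list of difference-image fluxes, using only the Python standard library. The flux values and the helper name `summarize_fluxes` are hypothetical, and the exact estimators used by the LSST Science Pipelines (e.g., sample versus population standard deviation) may differ from the ones assumed here.

```python
import statistics


def summarize_fluxes(fluxes):
    """Return simple summary statistics for a list of fluxes (illustrative only)."""
    n = len(fluxes)
    mean = statistics.fmean(fluxes)
    sigma = statistics.stdev(fluxes)  # sample standard deviation (an assumption)
    # Moment-based skewness: 0 for a symmetric distribution.
    m2 = sum((f - mean) ** 2 for f in fluxes) / n
    m3 = sum((f - mean) ** 3 for f in fluxes) / n
    skew = m3 / m2 ** 1.5
    return {
        'min': min(fluxes),
        'max': max(fluxes),
        'mean': mean,
        'sigma': sigma,
        'median': statistics.median(fluxes),
        'skew': skew,
        'ndata': n,
    }


# Hypothetical difference-image fluxes in nJy; negative values are allowed.
toy_fluxes = [-200.0, -100.0, 0.0, 100.0, 200.0]
stats = summarize_fluxes(toy_fluxes)
print(stats)
```

Because the toy fluxes are symmetric about zero, the mean, median, and skew all come out as zero, which is a quick sanity check on the definitions.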

> **Note:** The DP0.2 DiaObject table is missing some variability characterization parameters (<a href="https://dmtn-118.lsst.io">DMTN-118</a>) and host association parameters (<a href="https://dmtn-151.lsst.io">DMTN-151</a>) which will exist for future data releases.

### 2.2. Plot summary-statistic histograms for a random sample of DiaObjects

To learn a bit more about these lightcurve summary statistics, plot histograms of their values for a random selection of DiaObjects.

First, retrieve a random sample of DiaObjects. This query takes about 1.5 minutes.

In [None]:
%%time

results = service.search("SELECT ra, decl, diaObjectId, nDiaSources, "
                         "rPSFluxMin, rPSFluxMax, rPSFluxMean, rPSFluxSigma, "
                         "rPSFluxMAD, rPSFluxChi2, rPSFluxNdata, rPSFluxSkew, "
                         "rPSFluxStetsonJ, rPSFluxPercentile05, rPSFluxPercentile25, "
                         "rPSFluxPercentile50, rPSFluxPercentile75, rPSFluxPercentile95, "
                         "rTOTFluxMean, rTOTFluxSigma "
                         "FROM dp02_dc2_catalogs.DiaObject ",
                         maxrec=100000)

DiaObjs = results.to_table()
del results

Plot the distribution of the number of DiaSources per DiaObject (i.e., the total number of difference-image detections in any filter; at left), and the distribution of the number of r-band DiaSources per DiaObject (at right).

In [None]:
fig, ax = plt.subplots( 1,2, figsize=(14,4), sharey=False, sharex=False)

ax[0].hist(DiaObjs['nDiaSources'], bins=50, log=True, color='grey')
ax[0].set_xlabel('nDiaSources')
ax[0].set_ylabel('log(Number of DiaObjects)')

ax[1].hist(DiaObjs['rPSFluxNdata'], bins=50, log=True, color=plot_filter_colors[2])
ax[1].set_xlabel('rPSFluxNdata')

plt.tight_layout()
plt.show()

Plot the distribution of minimum and maximum r-band PS flux from the difference-image detections.
The PS fluxes can be negative because they are measured on the difference images.

In [None]:
fig, ax = plt.subplots( 1,2, figsize=(14,4), sharey=False, sharex=False)

ax[0].hist(DiaObjs['rPSFluxMin'], bins=100, log=True, color=plot_filter_colors[2])
ax[0].set_xlabel('rPSFluxMin')
ax[0].set_ylabel('log(Number of DiaObjects)')

ax[1].hist(DiaObjs['rPSFluxMax'], bins=100, log=True, color=plot_filter_colors[2])
ax[1].set_xlabel('rPSFluxMax')

plt.tight_layout()
plt.show()

It is left as an exercise for the learner to investigate typical values of rPSFluxMean, rPSFluxSigma, rPSFluxMAD, rPSFluxChi2, rPSFluxSkew, and/or rPSFluxStetsonJ.

Plot the distributions of the DiaObject parameters (mean and sigma) derived from the total fluxes from the direct-images. Note that the TOT fluxes _cannot be_ negative because they are measured on the direct images.

In [None]:
fig, ax = plt.subplots( 1,2, figsize=(14,4), sharey=False, sharex=False)

ax[0].hist(DiaObjs['rTOTFluxMean'], bins=50, log=True, color=plot_filter_colors[2])
ax[0].set_xlabel('rTOTFluxMean')
ax[0].set_ylabel('log(Number of DiaObjects)')

ax[1].hist(DiaObjs['rTOTFluxSigma'], bins=50, log=True, color=plot_filter_colors[2])
ax[1].set_xlabel('rTOTFluxSigma')

plt.tight_layout()
plt.show()

### 2.3. Investigate the DiaSource data for one random, bright, well-sampled DiaObject

Choose a DiaObject that was detected >40 times in an r-band difference image, had an average total (direct-image) flux > 1e6, and an average PS difference-image flux > 5e5.

In [None]:
tx = numpy.where((DiaObjs['rPSFluxNdata'] > 40) & \
                 (DiaObjs['rTOTFluxMean'] > 1000000) & \
                 (DiaObjs['rPSFluxMean'] > 500000))[0]

use_index = tx[0]
use_diaObjectId = DiaObjs['diaObjectId'][tx[0]]
del tx

Retrieve the DiaSource data for this DiaObject. (A UnitsWarning about nanojansky not supported by VOUnit is OK to ignore.)

In [None]:
results = service.search("SELECT ra, decl, diaObjectId, diaSourceId, "
                         "filterName, midPointTai, psFlux, totFlux "
                         "FROM dp02_dc2_catalogs.DiaSource "
                         "WHERE diaObjectId = "+str(use_diaObjectId))

DiaSrcs = results.to_table()
del results

fx = numpy.where(DiaSrcs['filterName'] == 'r')[0]

Plot the difference-image (PSFlux) and direct-image (TOTFlux) lightcurves. 
Mark the DiaObject minimum and maximum flux with solid lines; average flux with dashed lines; and the standard deviation in flux with dotted lines.

In [None]:
fig, ax = plt.subplots( 2, figsize=(14,8), sharey=False, sharex=False)

ax[0].plot(DiaSrcs['midPointTai'][fx], DiaSrcs['psFlux'][fx], 
           plot_filter_symbols[2], ms=15, mew=0, alpha=0.5, color=plot_filter_colors[2])
ax[0].axhline(DiaObjs['rPSFluxMin'][use_index])
ax[0].axhline(DiaObjs['rPSFluxMax'][use_index])
ax[0].axhline(DiaObjs['rPSFluxMean'][use_index], ls='dashed')
ax[0].axhline(DiaObjs['rPSFluxMean'][use_index] - DiaObjs['rPSFluxSigma'][use_index], ls='dotted')
ax[0].axhline(DiaObjs['rPSFluxMean'][use_index] + DiaObjs['rPSFluxSigma'][use_index], ls='dotted')
ax[0].set_xlabel('Modified Julian Date')
ax[0].set_ylabel('PS Flux')
ax[0].set_title('Difference-Image PSFlux r-band Lightcurve')

ax[1].plot(DiaSrcs['midPointTai'][fx], DiaSrcs['totFlux'][fx], 
         plot_filter_symbols[2], ms=15, mew=0, alpha=0.5, color=plot_filter_colors[2])
ax[1].axhline(DiaObjs['rTOTFluxMean'][use_index], ls='dashed')
ax[1].axhline(DiaObjs['rTOTFluxMean'][use_index] - DiaObjs['rTOTFluxSigma'][use_index], ls='dotted')
ax[1].axhline(DiaObjs['rTOTFluxMean'][use_index] + DiaObjs['rTOTFluxSigma'][use_index], ls='dotted')
ax[1].set_xlabel('Modified Julian Date')
ax[1].set_ylabel('Total Flux')
ax[1].set_title('Direct-Image TOTFlux r-band Lightcurve')

plt.tight_layout()
plt.show()

Plot the distribution (left) and normalized cumulative distribution (right) of difference-image fluxes (PSFlux), along with the relevant DiaObject characterization parameters (e.g., mean, sigma, and percentiles).
Mark the average flux with dashed lines and the standard deviation in flux with dotted lines.
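
For readers unfamiliar with how percentiles relate to a cumulative distribution, here is a minimal standard-library sketch. The flux values are invented, the function names are hypothetical, and the nearest-rank percentile definition is an assumption; the DiaObject percentile columns may be computed with a different interpolation scheme.

```python
import math


def empirical_cdf(values):
    """Return sorted values and the cumulative fraction at or below each one."""
    ordered = sorted(values)
    n = len(ordered)
    fractions = [(k + 1) / n for k in range(n)]
    return ordered, fractions


def nearest_rank_percentile(values, percent):
    """Smallest value with at least `percent` percent of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical r-band difference-image fluxes in nJy.
toy_fluxes = [50.0, 10.0, 40.0, 20.0, 30.0, 60.0, 80.0, 70.0, 100.0, 90.0]
ordered, fractions = empirical_cdf(toy_fluxes)
median_flux = nearest_rank_percentile(toy_fluxes, 50)
p95_flux = nearest_rank_percentile(toy_fluxes, 95)
print(median_flux, p95_flux)
```

Plotting `fractions` against `ordered` as a step function gives the same kind of curve as the cumulative histogram, with each percentile value falling on that curve.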

In [None]:
fig, ax = plt.subplots( 1,2, figsize=(14,4), sharey=False, sharex=False)

ax[0].hist(DiaSrcs['psFlux'][fx], bins=20, color=plot_filter_colors[2])
ax[0].axvline(DiaObjs['rPSFluxMean'][use_index], ls='dashed')
ax[0].axvline(DiaObjs['rPSFluxMean'][use_index] - DiaObjs['rPSFluxSigma'][use_index], ls='dotted')
ax[0].axvline(DiaObjs['rPSFluxMean'][use_index] + DiaObjs['rPSFluxSigma'][use_index], ls='dotted')
ax[0].set_xlabel('r-band PS Flux')
ax[0].set_ylabel('Number of DiaSources')
ax[0].text(-0.18e6,12.5,'skew=%5.2f' % DiaObjs['rPSFluxSkew'][use_index])

ax[1].hist(DiaSrcs['psFlux'][fx], bins=len(fx), color=plot_filter_colors[2],\
           cumulative=True, density=True, histtype='step')
ax[1].plot(DiaObjs['rPSFluxPercentile05'][use_index], 0.05, '*', ms=10, color='black', \
           label='percentiles: 0.05, 0.25, 0.50, 0.75, and 0.95')
ax[1].plot(DiaObjs['rPSFluxPercentile25'][use_index], 0.25, '*', ms=10, color='black')
ax[1].plot(DiaObjs['rPSFluxPercentile50'][use_index], 0.50, '*', ms=10, color='black')
ax[1].plot(DiaObjs['rPSFluxPercentile75'][use_index], 0.75, '*', ms=10, color='black')
ax[1].plot(DiaObjs['rPSFluxPercentile95'][use_index], 0.95, '*', ms=10, color='black')
ax[1].set_xlabel('r-band PS Flux')
ax[1].set_ylabel('Cumulative Fraction')
ax[1].legend(loc='upper left', fontsize=14)

plt.tight_layout()
plt.show()

Clean up by deleting a few arrays that are no longer needed.

In [None]:
del DiaObjs, DiaSrcs, fx
del use_index, use_diaObjectId

## 3. Identify DiaObjects that are potential Type Ia Supernovae

For this example, a sample of potential low-redshift, well-sampled Type Ia supernovae (SNIa) is identified.

Compared to other types of supernovae, SNIa have homogeneous lightcurves with very similar peak absolute brightnesses (about -19 mag in B-band), and similar rise and decline times (i.e., similar observed lightcurve amplitudes and durations, for a survey with a given limiting magnitude).

In LSST-like data sets with an r-band limiting magnitude of about 24.5 mag, such as the DC2 simulation, low-redshift SNIa (0.1 < _z_ < 0.3) lightcurves would look approximately like those plotted in the image below.

In this example, the expected peak brightnesses, amplitude, and duration for the lightcurves of low-redshift SNIa are used to construct a TAP query on the DiaObject table to identify a sample of potential SNIa.

In [None]:
Image(filename='data/template_lowz_Vband.png') 

### 3.1. Establish lightcurve parameter constraints to identify potential SNIa

Define the desired redshift boundaries, the known approximate peak absolute magnitude for SNIa (-19 mag), and the desired peak range to use to create the TAP query. This range roughly includes the intrinsic diversity in SNIa brightness and color, and host-galaxy reddening.

In [None]:
redshift_min = 0.1
redshift_max = 0.3

snia_peak_mag = -19.0
snia_peak_mag_range = 0.5

Use the astropy.cosmology package to convert redshift to distance modulus. 
Define the minimum and maximum peak apparent r-band magnitudes -- allowing for the intrinsic diversity range specified above -- to use in the 
TAP query.
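
As a cross-check that does not require astropy, the minimal sketch below integrates the flat Lambda-CDM expansion history numerically to obtain the distance modulus, then applies the same peak-magnitude arithmetic. It assumes the H0 = 70 and Om0 = 0.3 values set in Section 1.1 and ignores radiation; the function name `distance_modulus` and the Simpson's-rule step count are illustrative choices, not part of any library.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def distance_modulus(z, h0=70.0, om0=0.3, steps=1000):
    """Distance modulus for a flat Lambda-CDM universe (matter + Lambda only)."""
    def inv_e(zp):
        return 1.0 / math.sqrt(om0 * (1.0 + zp) ** 3 + (1.0 - om0))

    # Simpson's rule (steps must be even) for the integral of 1/E from 0 to z.
    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * inv_e(k * h)
    comoving_mpc = (C_KM_S / h0) * total * h / 3.0
    lum_dist_mpc = (1.0 + z) * comoving_mpc
    # mu = 5 log10(d_L / 10 pc), with d_L converted from Mpc to pc.
    return 5.0 * math.log10(lum_dist_mpc * 1.0e6 / 10.0)


peak_abs_mag = -19.0
peak_range = 0.5
mr_min = distance_modulus(0.1) + peak_abs_mag - peak_range
mr_max = distance_modulus(0.3) + peak_abs_mag + peak_range
print('%5.2f %5.2f' % (mr_min, mr_max))
```

The two printed limits should land roughly at 18.8 and 22.5 mag, bracketing the apparent peak magnitudes expected for SNIa between redshifts 0.1 and 0.3.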

In [None]:
snia_peak_mr_min = cosmo.distmod(redshift_min).value + snia_peak_mag - snia_peak_mag_range
snia_peak_mr_max = cosmo.distmod(redshift_max).value + snia_peak_mag + snia_peak_mag_range
print('The minimum and maximum apparent r-band magnitudes '
      'to use in the TAP query are %5.2f and %5.2f mag.' %
      (snia_peak_mr_min, snia_peak_mr_max))

Define maximum magnitudes in the g- and i-bands to use in the TAP query. The point of this is to simply enforce detection in at least the three filters g, r, and i. With knowledge of SNIa colors, this could be made more constraining, but for the purposes of this example these values are fine.

In [None]:
snia_peak_mg_max = 24.0
snia_peak_mi_max = 24.0

Define the r-band minimum and maximum lightcurve amplitudes to use in the TAP query (i.e., the difference between the brightest and faintest detections in the difference image, in magnitudes). Well-sampled, low-redshift SNIa should be observed to change by at least 1.5 mag (_z_=0.3), and up to 5.5 mag (_z_=0.1).
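
The amplitude defined here is a difference of two magnitudes, which can be sketched as follows. This assumes fluxes in nanojansky and the standard AB zero point of 31.4 mag for that unit; the function names and flux values are hypothetical. A magnitude is undefined for zero or negative flux, so an object whose faintest difference-image flux is negative has no defined amplitude.

```python
import math


def njy_to_abmag(flux_njy):
    """Convert a positive flux in nanojansky to an AB magnitude."""
    if flux_njy <= 0:
        return float('nan')  # magnitudes are undefined for non-positive flux
    return -2.5 * math.log10(flux_njy) + 31.4


def amplitude_mag(flux_min_njy, flux_max_njy):
    """Magnitude difference between the faintest and brightest detections."""
    return njy_to_abmag(flux_min_njy) - njy_to_abmag(flux_max_njy)


# Hypothetical faintest and brightest difference-image fluxes for one object.
faint, bright = 1000.0, 10000.0
print(njy_to_abmag(bright), amplitude_mag(faint, bright))
```

A factor of 10 in flux corresponds to 2.5 mag, so the 1.5 to 5.5 mag amplitude range corresponds to flux ratios of roughly 4 to 160.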

In [None]:
snia_ampl_mr_min = 1.5
snia_ampl_mr_max = 5.5

Define the minimum and maximum number of DiaSources (i.e., difference-image detections) to use in the TAP query.
The goal of this example is to identify potential _well-sampled_ Type Ia supernovae, and here a minimum of 15 detections is used.
Since the DC2 dataset was simulated using a baseline observing strategy (and does not include deep drilling fields), there are no more than 100 visits per year per field.
Any DiaObject with >100 DiaSources had a duration >1 year, and is not a SNIa.

In [None]:
snia_nDiaSources_min = 15
snia_nDiaSources_max = 100

Define the minimum and maximum lightcurve duration, in days. 
The duration is the time between the first and last difference-image detection in any filter.
As seen in the template lightcurve plot above, SNIa at redshifts 0.1 < _z_ < 0.3 will have durations of 50 to 300 days.

In [None]:
snia_duration_min = 50
snia_duration_max = 300

> **Important Note:** Of the above parameters defined to identify potential SNIa, only the lightcurve duration is _not_ represented in the DiaObject table, and cannot be included as a constraint in the TAP query used below.
Instead, the lightcurve durations are calculated and used in Section 3.3.

### 3.2. Retrieve a sample of potentially SNIa-like DiaObjects

Only retrieve 1000 DiaObjects for this example.
When the TAP query completes, transfer the results to an astropy table.

This TAP query takes about a minute.

In [None]:
%%time

results = service.search("SELECT ra, decl, diaObjectId, nDiaSources, "
                         "scisql_fluxToAbMag(rPSFluxMin/1e32) AS rMagMax, "
                         "scisql_fluxToAbMag(rPSFluxMax/1e32) AS rMagMin, "
                         "scisql_fluxToAbMag(gPSFluxMax/1e32) AS gMagMin, "
                         "scisql_fluxToAbMag(iPSFluxMax/1e32) AS iMagMin, "
                         "scisql_fluxToAbMag(rPSFluxMin/1e32) - scisql_fluxToAbMag(rPSFluxMax/1e32) AS rMagAmp "
                         "FROM dp02_dc2_catalogs.DiaObject "
                         "WHERE nDiaSources > "+str(snia_nDiaSources_min)+" "
                         "AND nDiaSources < "+str(snia_nDiaSources_max)+" "
                         "AND scisql_fluxToAbMag(rPSFluxMax/1e32) > "+str(snia_peak_mr_min)+" "
                         "AND scisql_fluxToAbMag(rPSFluxMax/1e32) < "+str(snia_peak_mr_max)+" "
                         "AND scisql_fluxToAbMag(gPSFluxMax/1e32) < "+str(snia_peak_mg_max)+" "
                         "AND scisql_fluxToAbMag(iPSFluxMax/1e32) < "+str(snia_peak_mi_max)+" "
                         "AND scisql_fluxToAbMag(rPSFluxMin/1e32) - scisql_fluxToAbMag(rPSFluxMax/1e32) < "+str(snia_ampl_mr_max)+" "
                         "AND scisql_fluxToAbMag(rPSFluxMin/1e32) - scisql_fluxToAbMag(rPSFluxMax/1e32) > "+str(snia_ampl_mr_min)+" ",
                         maxrec=1000)

DiaObjs = results.to_table()
del results

Option to display the table.

In [None]:
# DiaObjs

Plot histograms to characterize a few of the DiaObject parameters for this initial sample.

In [None]:
fig, ax = plt.subplots( 1,3, figsize=(14,4), sharey=False, sharex=False)

ax[0].hist(DiaObjs['rMagMin'], bins=20, color=plot_filter_colors[2])
ax[0].set_xlabel('Brightest Detected r-band Magnitude')
ax[0].set_ylabel('Number of Potential SNIa')

ax[1].hist(DiaObjs['rMagAmp'], bins=20, color=plot_filter_colors[2])
ax[1].set_xlabel('Amplitude in r-band Magnitude')

ax[2].hist(DiaObjs['nDiaSources'], bins=20, color='grey')
ax[2].set_xlabel('Number of DiaSources')

plt.tight_layout()
plt.show()

### 3.3. Calculate lightcurve duration and identify potential SNIa

The lightcurve duration -- time between the first and last detected DiaSource in any filter -- is not included in the DiaObject table.
It is calculated below, using all of the DiaSources for each DiaObject.
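
The duration calculation itself is simple once the detection times are grouped by object, as in this minimal sketch. The object IDs and dates are invented and the helper name `durations_by_object` is hypothetical; it operates on plain (ID, time) pairs rather than on TAP results.

```python
from collections import defaultdict


def durations_by_object(rows):
    """Map each object ID to (latest - earliest) detection time in days."""
    times = defaultdict(list)
    for object_id, mjd in rows:
        times[object_id].append(mjd)
    return {oid: max(t) - min(t) for oid, t in times.items()}


# Hypothetical (diaObjectId, midPointTai) pairs for two objects.
toy_rows = [
    (101, 59600.0), (101, 59612.5), (101, 59671.0),
    (202, 59800.0), (202, 60210.0),
]
durations = durations_by_object(toy_rows)
# Keep only objects whose duration falls in the 50 to 300 day SNIa range.
candidates = [oid for oid, d in durations.items() if 50 < d < 300]
print(durations, candidates)
```

Here object 101 spans 71 days and is kept, while object 202 spans 410 days and is rejected as too long-lived to be a SNIa.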

This cell issues one query per DiaObject, and takes about 4 minutes to calculate the lightcurve duration for the 1000 DiaObjects.

In [None]:
%%time

DiaObjs['duration'] = numpy.zeros(len(DiaObjs), dtype='float')

for j,DiaObjId in enumerate(DiaObjs['diaObjectId']):
    results = service.search("SELECT diaObjectId, midPointTai "
                             "FROM dp02_dc2_catalogs.DiaSource "
                             "WHERE diaObjectId = "+str(DiaObjId))
    results = results.to_table()
    DiaObjs['duration'][j] = numpy.max(results['midPointTai']) - numpy.min(results['midPointTai'])
    del results

Select only DiaObjects that have lightcurve durations within the specified range for SNIa.

In [None]:
tx = numpy.where((DiaObjs['duration'] > snia_duration_min) & \
                 (DiaObjs['duration'] < snia_duration_max))[0]
print(len(tx))

Make a few histograms to characterize the lightcurve durations.

In [None]:
fig, ax = plt.subplots( 1,2, figsize=(14,4), sharey=False, sharex=False)

ax[0].hist(DiaObjs['duration'][tx], bins=20, color='grey')
ax[0].set_xlabel('Lightcurve Duration [days]')
ax[0].set_ylabel('Number of Potential SNIa')

ax[1].plot(DiaObjs['duration'][tx], DiaObjs['rMagAmp'][tx], 'o', color='grey')
ax[1].set_xlabel('Lightcurve Duration [days]')
ax[1].set_ylabel('Amplitude in r-band [mag]')

plt.tight_layout()
plt.show()

### 3.4. Plot multi-band lightcurves for the first 20 potential SNIa

In [None]:
fig, ax = plt.subplots( 5,4, figsize=(14,14), sharey=False, sharex=False)

x = 0

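for i in range(5):
    for j in range(4):
        # Fewer than 20 candidates may pass the duration cut; hide unused panels.
        if x >= len(tx):
            ax[i,j].axis('off')
            continue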
        results = service.search("SELECT ra, decl, diaObjectId, diaSourceId, "
                                 "filterName, midPointTai, "
                                 "scisql_fluxToAbMag(psFlux/1e32) AS psAbMag "
                                 "FROM dp02_dc2_catalogs.DiaSource "
                                 "WHERE diaObjectId = "+str(DiaObjs['diaObjectId'][tx[x]]))
        results = results.to_table()

        for f, filt in enumerate(plot_filter_labels):
            fx = numpy.where(results['filterName'] == filt)[0]
            ax[i,j].plot(results['midPointTai'][fx], results['psAbMag'][fx], 
                         plot_filter_symbols[f], ms=15, mew=0, alpha=0.5, 
                         color=plot_filter_colors[f])
            del fx
        
        x += 1
        del results
        ax[i,j].invert_yaxis()

plt.tight_layout()
plt.show()

## 4. Further Exercises

From here, there are a variety of exercises that a learner could undertake.

1. Change the query constraints and identify a sample of high-redshift SNIa.
2. Tighten the query constraints to identify only SNIa with pre-peak observations.
3. Apply a lightcurve-template fitter, or try a photometric classifier on the sample.