<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=250 style="padding: 10px"> 
<b>DiaObject Sample Identification</b> <br>
Contact author: Melissa Graham <br>
Last verified to run: <i>yyyy-mm-dd</i> <br>
LSST Science Pipelines version: Weekly <i>yyyy_xx</i> <br>
Container Size: medium <br>
Targeted learning level: intermediate <br>

In [None]:
# %load_ext pycodestyle_magic
# %flake8_on
# import logging
# logging.getLogger("flake8").setLevel(logging.FATAL)

**Description:** To use the DiaObject catalog parameters to identify samples of time-variable objects of interest.

**Skills:** Use the TAP service. Use the DP0.2 time-domain catalogs.

**LSST Data Products:** DP0.2 DiaObject and DiaSource table data.

**Packages:** lsst.rsp, astropy.cosmology, numpy, matplotlib

**Credit:** Originally developed by Melissa Graham and the Rubin Community Engagement Team for Data Preview 0. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.

**Get Support:**
Find DP0-related documentation and resources at <a href="https://dp0-2.lsst.io">dp0-2.lsst.io</a>. Questions are welcome as new topics in the <a href="https://community.lsst.org/c/support/dp0">Support - Data Preview 0 Category</a> of the Rubin Community Forum. Rubin staff will respond to all questions posted there.

## 1. Introduction

This notebook guides the use of DiaObject summary parameters to identify samples of objects for further study.

Learn more about the contents of the DiaObject and DiaSource tables in the <a href="https://dp0-2.lsst.io">DP0.2 Documentation</a>.

### 1.1. Package Imports

**lsst.rsp:** The LSST Science Pipelines package for RSP functionality such as the TAP service (<a href="http://pipelines.lsst.io">pipelines.lsst.io</a>).

**astropy.cosmology:** An open-source package of cosmology tools (<a href="https://docs.astropy.org/en/stable/cosmology/index.html">the astropy cosmology documentation</a>).

In [None]:
import time

import numpy
import matplotlib.pyplot as plt
plt.style.use('tableau-colorblind10')

from lsst.rsp import get_tap_service

from astropy.cosmology import FlatLambdaCDM
cosmo = FlatLambdaCDM(H0=70, Om0=0.3)

### 1.2. Define functions and parameters

Start the TAP service.

In [None]:
service = get_tap_service()

Set a few parameters to use when plotting lightcurves.

In [None]:
plot_filter_labels = ['u', 'g', 'r', 'i', 'z', 'y']
plot_filter_colors = ['darkviolet', 'darkgreen', 'firebrick', 'darkorange', 'brown', 'black']
plot_filter_symbols = ['o', '^', 'v', 's', '*', 'p']

## 2. Understand the DiaObject parameters

Option to print all of the available column names for the DiaObject table.

In [None]:
# results = service.search("SELECT column_name from TAP_SCHEMA.columns "\
#                          "WHERE table_name = 'dp02_dc2_catalogs.DiaObject'")
# for column_name in results['column_name']:
#     print(column_name)

**Brief descriptions for a selection of the DiaObject parameters.**

nDiaSources --> The number of difference-image detections.
 
The following are all statistics of the _detected difference-image point source (PS) flux values_ for each filter [f].<br>
[f]PSFluxMin --> The faintest flux. <br>
[f]PSFluxMax --> The brightest flux. <br>
[f]PSFluxMean --> The average flux. <br>
[f]PSFluxSigma --> The standard deviation of the fluxes. <br>
[f]PSFluxMAD --> The mean absolute deviation of the fluxes (i.e., the average distance from the mean). <br>
[f]PSFluxChi2 --> The Chi2 statistic for the scatter of the fluxes around the mean. <br>
[f]PSFluxNdata --> The number of data points used to compute [f]PSFluxChi2. <br>
[f]PSFluxSkew --> A measure of asymmetry in the distribution of fluxes about the mean (where 0 means symmetric). <br>
[f]PSFluxStetsonJ --> A variability index developed for Cepheids (defined in <a href="https://ui.adsabs.harvard.edu/abs/1996PASP..108..851S/abstract">Stetson 1996</a>). <br>
[f]PSFluxPercentile05, 25, 50, 75, 95 --> Derived from the cumulative distribution of flux values. <br>

The following are statistics of the _total (TOT) direct-image flux values_ for each filter [f]. <br>
[f]TOTFluxMean --> The average flux.  <br>
[f]TOTFluxSigma --> The standard deviation of the fluxes.  <br>

> **Note:** The DP0.2 DiaObject table is missing some variability characterization parameters (<a href="https://dmtn-118.lsst.io">DMTN-118</a>) and host association parameters (<a href="https://dmtn-151.lsst.io">DMTN-151</a>) which will exist for future data releases.
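
To make the statistics described above concrete, the following minimal sketch computes a few of them for a small set of hypothetical flux values, using only the Python standard library. The flux values are invented for illustration, and the estimators shown (sample standard deviation, population-form skewness) are assumptions; the LSST Science Pipelines may use different definitions or weighting.

```python
import statistics

# Hypothetical difference-image fluxes (nJy) for one object in one filter.
# These values are invented for illustration only.
fluxes = [120.0, -40.0, 310.0, 560.0, 480.0, 150.0, -10.0, 30.0]

flux_min = min(fluxes)                 # analogous to [f]PSFluxMin
flux_max = max(fluxes)                 # analogous to [f]PSFluxMax
flux_mean = statistics.fmean(fluxes)   # analogous to [f]PSFluxMean
flux_sigma = statistics.stdev(fluxes)  # sample standard deviation (assumed N - 1)

# Mean absolute deviation: the average distance of each flux from the mean.
flux_mad = statistics.fmean(abs(f - flux_mean) for f in fluxes)

# Skewness as the mean cubed standardized deviation (population form, assumed).
pop_sigma = statistics.pstdev(fluxes)
flux_skew = sum(((f - flux_mean) / pop_sigma) ** 3 for f in fluxes) / len(fluxes)

print('min = %.1f, max = %.1f, mean = %.1f' % (flux_min, flux_max, flux_mean))
print('sigma = %.1f, MAD = %.1f, skew = %.2f' % (flux_sigma, flux_mad, flux_skew))
```

A positive skew here reflects a tail of bright detections above the mean, which is the kind of asymmetry expected for a transient that is faint for most epochs and bright for a few.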


### 2.1. Retrieve a random sample of DiaObjects

In [None]:
%%time

results = service.search("SELECT ra, decl, diaObjectId, nDiaSources, "
                         "rPSFluxMin, rPSFluxMax, rPSFluxMean, rPSFluxSigma, "
                         "rPSFluxMAD, rPSFluxChi2, rPSFluxNdata, rPSFluxSkew, "
                         "rPSFluxStetsonJ, rPSFluxPercentile05, rPSFluxPercentile25, "
                         "rPSFluxPercentile50, rPSFluxPercentile75, rPSFluxPercentile95, "
                         "rTOTFluxMean, rTOTFluxSigma "
                         "FROM dp02_dc2_catalogs.DiaObject ",
                         maxrec=100000)

DiaObjs = results.to_table()
del results

### 2.2. Plot histograms to characterize the DiaObject parameters

Below, the left plot shows the distribution of the number of DiaSources per DiaObject (i.e., the total number of difference-image detections in any filter), and the right plot shows the distribution of the number of r-band DiaSources per DiaObject.

In [None]:
fig, ax = plt.subplots( 1,2, figsize=(10,3), sharey=False, sharex=False)

ax[0].hist(DiaObjs['nDiaSources'], bins=50, log=True, color='grey')
ax[0].set_xlabel('nDiaSources')
ax[0].set_ylabel('log(Number of DiaObjects)')

ax[1].hist(DiaObjs['rPSFluxNdata'], bins=50, log=True, color=plot_filter_colors[2])
ax[1].set_xlabel('rPSFluxNdata')

plt.show()

Below, a grid of distributions of the DiaObject parameters derived from the PS fluxes from the difference-image detections. Note that the PS fluxes can be negative because they are measured on the difference images.

In [None]:
fig, ax = plt.subplots( 2,4, figsize=(14,6), sharey=False, sharex=False)
        
ax[0,0].hist(DiaObjs['rPSFluxMin'], bins=50, log=True, color='grey')
ax[0,0].set_xlabel('rPSFluxMin')
ax[0,0].set_ylabel('log(Number of DiaObjects)')
ax[0,1].hist(DiaObjs['rPSFluxMax'], bins=50, log=True, color='grey')
ax[0,1].set_xlabel('rPSFluxMax')
ax[0,2].hist(DiaObjs['rPSFluxMean'], bins=50, log=True, color='grey')
ax[0,2].set_xlabel('rPSFluxMean')
ax[0,3].hist(DiaObjs['rPSFluxSigma'], bins=50, log=True, color='grey')
ax[0,3].set_xlabel('rPSFluxSigma')

ax[1,0].hist(DiaObjs['rPSFluxMAD'], bins=50, log=True, color='grey')
ax[1,0].set_xlabel('rPSFluxMAD')
ax[1,0].set_ylabel('log(Number of DiaObjects)')
ax[1,1].hist(DiaObjs['rPSFluxChi2'], bins=50, log=True, color='grey')
ax[1,1].set_xlabel('rPSFluxChi2')
ax[1,2].hist(DiaObjs['rPSFluxSkew'], bins=50, log=True, color='grey')
ax[1,2].set_xlabel('rPSFluxSkew')
ax[1,3].hist(DiaObjs['rPSFluxStetsonJ'], bins=50, log=True, color='grey')
ax[1,3].set_xlabel('rPSFluxStetsonJ')

plt.tight_layout()
plt.show()

Below, the distributions of the DiaObject parameters (mean and sigma) derived from the total fluxes from the direct-images. Note that the TOT fluxes _cannot be_ negative because they are measured on the direct images.

In [None]:
fig, ax = plt.subplots( 1,2, figsize=(8,3), sharey=False, sharex=False)

ax[0].hist(DiaObjs['rTOTFluxMean'], bins=50, log=True, color='grey')
ax[0].set_xlabel('rTOTFluxMean')
ax[0].set_ylabel('log(Number of DiaObjects)')
ax[1].hist(DiaObjs['rTOTFluxSigma'], bins=50, log=True, color='grey')
ax[1].set_xlabel('rTOTFluxSigma')

plt.tight_layout()
plt.show()

### 2.3. Investigate one random, bright, well-sampled DiaObject

Choose a DiaObject that was detected >40 times in an r-band difference image, had an average total (direct-image) flux > 1e6, and an average PS difference-image flux > 5e5.

In [None]:
tx = numpy.where((DiaObjs['rPSFluxNdata'] > 40) & \
                 (DiaObjs['rTOTFluxMean'] > 1000000) & \
                 (DiaObjs['rPSFluxMean'] > 500000))[0]

use_index = tx[0]
use_diaObjectId = DiaObjs['diaObjectId'][tx[0]]
del tx

Retrieve the lightcurve from the DiaSource table.

In [None]:
results = service.search("SELECT ra, decl, diaObjectId, diaSourceId, "
                         "filterName, midPointTai, psFlux, totFlux "
                         "FROM dp02_dc2_catalogs.DiaSource "
                         "WHERE diaObjectId = "+str(use_diaObjectId))

DiaSrcs = results.to_table()
del results

fx = numpy.where(DiaSrcs['filterName'] == 'r')[0]

Below, plot the difference-image (PSFlux) and direct-image (TOTFlux) lightcurves.

In [None]:
fig, ax = plt.subplots( 2, figsize=(14,6), sharey=False, sharex=False)

ax[0].plot(DiaSrcs['midPointTai'][fx], DiaSrcs['psFlux'][fx], 
           plot_filter_symbols[2], ms=15, mew=0, alpha=0.5, color=plot_filter_colors[2])
ax[0].axhline(DiaObjs['rPSFluxMin'][use_index])
ax[0].axhline(DiaObjs['rPSFluxMax'][use_index])
ax[0].axhline(DiaObjs['rPSFluxMean'][use_index], ls='dashed')
ax[0].axhline(DiaObjs['rPSFluxMean'][use_index] - DiaObjs['rPSFluxSigma'][use_index], ls='dotted')
ax[0].axhline(DiaObjs['rPSFluxMean'][use_index] + DiaObjs['rPSFluxSigma'][use_index], ls='dotted')
ax[0].set_xlabel('Modified Julian Date')
ax[0].set_ylabel('PS Flux')
ax[0].set_title('Difference-Image PS Flux r-band Lightcurve')

ax[1].plot(DiaSrcs['midPointTai'][fx], DiaSrcs['totFlux'][fx], 
         plot_filter_symbols[2], ms=15, mew=0, alpha=0.5, color=plot_filter_colors[2])
ax[1].axhline(DiaObjs['rTOTFluxMean'][use_index], ls='dashed')
ax[1].axhline(DiaObjs['rTOTFluxMean'][use_index] - DiaObjs['rTOTFluxSigma'][use_index], ls='dotted')
ax[1].axhline(DiaObjs['rTOTFluxMean'][use_index] + DiaObjs['rTOTFluxSigma'][use_index], ls='dotted')
ax[1].set_xlabel('Modified Julian Date')
ax[1].set_ylabel('Total Flux')
ax[1].set_title('Direct-Image Total Flux r-band Lightcurve')

plt.tight_layout()
plt.show()

print('Above, solid lines mark the DiaObject minimum and maximum flux; '\
      'dashed lines the average flux, and dotted lines the standard deviation in flux.')

<br>

Below, plot the distribution (left) and normalized cumulative distribution (right) of difference-image fluxes (PSFlux), along with the relevant DiaObject characterization parameters (e.g., mean, sigma, and percentiles).

In [None]:
fig, ax = plt.subplots( 1,2, figsize=(14,4), sharey=False, sharex=False)

ax[0].hist(DiaSrcs['psFlux'][fx], bins=20, color=plot_filter_colors[2])
ax[0].axvline(DiaObjs['rPSFluxMean'][use_index], ls='dashed')
ax[0].axvline(DiaObjs['rPSFluxMean'][use_index] - DiaObjs['rPSFluxSigma'][use_index], ls='dotted')
ax[0].axvline(DiaObjs['rPSFluxMean'][use_index] + DiaObjs['rPSFluxSigma'][use_index], ls='dotted')
ax[0].set_xlabel('r-band PS Flux')
ax[0].set_ylabel('Number of DiaSources')
ax[0].set_title('Distribution of r-band PS Fluxes (skew = '+str(DiaObjs['rPSFluxSkew'][use_index])+')')

ax[1].hist(DiaSrcs['psFlux'][fx], bins=len(fx), color=plot_filter_colors[2],\
           cumulative=True, density=True, histtype='step')
ax[1].plot(DiaObjs['rPSFluxPercentile05'][use_index], 0.05, '*', ms=10, color='black', \
           label='percentiles: 0.05, 0.25, 0.50, 0.75, and 0.95')
ax[1].plot(DiaObjs['rPSFluxPercentile25'][use_index], 0.25, '*', ms=10, color='black')
ax[1].plot(DiaObjs['rPSFluxPercentile50'][use_index], 0.50, '*', ms=10, color='black')
ax[1].plot(DiaObjs['rPSFluxPercentile75'][use_index], 0.75, '*', ms=10, color='black')
ax[1].plot(DiaObjs['rPSFluxPercentile95'][use_index], 0.95, '*', ms=10, color='black')
ax[1].set_xlabel('r-band PS Flux')
ax[1].set_ylabel('Cumulative Fraction of DiaSources')
ax[1].set_title('Normalized Cumulative Distribution of r-band PS Fluxes')
ax[1].legend(loc='upper left')

plt.tight_layout()
plt.show()

print('Above left, dashed lines mark the average flux and dotted lines the standard deviation in flux.')
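
The relation between the percentile parameters and the cumulative distribution plotted above can be sketched with a nearest-rank estimate. This is an illustrative assumption: the function name `empirical_percentile` is hypothetical, the flux values are invented, and the pipelines may interpolate between data points rather than use the nearest rank.

```python
def empirical_percentile(values, percent):
    """Return the smallest value at which the cumulative fraction reaches percent/100.

    Uses the nearest-rank definition with integer arithmetic:
    rank = ceil(percent * N / 100), clipped to at least 1.
    """
    ordered = sorted(values)
    rank = max(1, -(-percent * len(ordered) // 100))  # integer ceiling division
    return ordered[rank - 1]


# Hypothetical r-band difference-image fluxes (nJy), invented for illustration.
example_fluxes = [10.0 * k for k in range(1, 21)]

for percent in (5, 25, 50, 75, 95):
    print('Percentile %02d: %.1f' % (percent, empirical_percentile(example_fluxes, percent)))
```

Reading the plot the other way around, each black star marks the flux at which the stepped cumulative histogram reaches the corresponding fraction.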

Clean up.

In [None]:
del DiaObjs, DiaSrcs, fx

## 3. Identify a Type Ia Supernova sample

For this example, a sample of potential low-redshift Type Ia supernovae (SNIa) is identified.

Compared to other types of supernovae, SNIa have homogeneous light curves with very similar peak absolute brightnesses (about -19 mag in B-band), and similar rise and decline times (i.e., similar durations for a given limiting magnitude).

In LSST-like data sets such as the DC2 simulation, low-redshift SNIa (0.1 < _z_ < 0.3) would be bright enough for detection (<24.5 mag) for up to one year.
<br><br>

### 3.1. Establish TAP query constraints for a SNIa sample



#### 3.1.1 Apparent Magnitude

Define the desired redshift boundaries, the approximate peak absolute magnitude for SNIa, and the range around that peak magnitude to consider.

Use the astropy.cosmology package to convert redshift to distance modulus and define the range of peak apparent r-band magnitudes, assuming that Type Ia supernovae have an intrinsic brightness of about -19 magnitudes.

Define maximum magnitudes in the g- and i-bands to enforce detection in at least the three filters g, r, and i.

Define the r-band minimum and maximum lightcurve amplitudes to consider (i.e., the difference between the brightest and faintest detections in the difference image, in magnitudes).
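
The conversion from redshift to peak apparent magnitude follows m = M + mu, where mu = 5 log10(d_L / 10 pc) is the distance modulus and d_L is the luminosity distance. The notebook uses astropy.cosmology for this step; the sketch below is only a standard-library approximation for a flat Lambda-CDM cosmology with the same H0 and Om0, neglecting radiation. The function name `distmod_flat_lcdm` and the trapezoidal integration are assumptions made for illustration.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def distmod_flat_lcdm(z, h0=70.0, om0=0.3, n_steps=1000):
    """Approximate distance modulus for flat Lambda-CDM (radiation neglected)."""
    def inv_e(zp):
        # 1 / E(z) for matter plus a cosmological constant in a flat universe.
        return 1.0 / math.sqrt(om0 * (1.0 + zp) ** 3 + (1.0 - om0))

    # Trapezoidal integration of dz / E(z) from 0 to z.
    dz = z / n_steps
    integral = 0.5 * (inv_e(0.0) + inv_e(z))
    integral += sum(inv_e(i * dz) for i in range(1, n_steps))
    integral *= dz

    # Luminosity distance in Mpc: (1 + z) times the comoving distance.
    d_lum_mpc = (1.0 + z) * (C_KM_S / h0) * integral

    # mu = 5 log10(d_L / 10 pc), with d_L converted from Mpc to pc.
    return 5.0 * math.log10(d_lum_mpc * 1.0e6 / 10.0)


peak_abs_mag = -19.0   # approximate SNIa peak absolute magnitude
peak_mag_range = 0.5   # allowed range around the peak, in magnitudes

sketch_mr_min = distmod_flat_lcdm(0.1) + peak_abs_mag - peak_mag_range
sketch_mr_max = distmod_flat_lcdm(0.3) + peak_abs_mag + peak_mag_range
print('Approximate peak r-band range: %5.2f to %5.2f mag' % (sketch_mr_min, sketch_mr_max))
```

The brightest limit comes from the nearest redshift minus the magnitude range, and the faintest limit from the farthest redshift plus the range, which is the same logic applied in the cell that follows.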

In [None]:
redshift_min = 0.1
redshift_max = 0.3
snia_peak_mag = -19.0
snia_peak_mag_range = 0.5

snia_peak_mr_min = cosmo.distmod(redshift_min).value + snia_peak_mag - snia_peak_mag_range
snia_peak_mr_max = cosmo.distmod(redshift_max).value + snia_peak_mag + snia_peak_mag_range
print('Min and max apparent r-band magnitudes are %5.2f and %5.2f mag.' %
      (snia_peak_mr_min, snia_peak_mr_max))

snia_peak_mg_max = 24.0
snia_peak_mi_max = 24.0

snia_rampl_min = 1.5
snia_rampl_max = 6.0

#### 3.1.2. Number of DiaSources

The goal is to identify potential _well-sampled_ Type Ia supernovae, so define the minimum number of lightcurve points (number of DiaSources).

Since the DC2 dataset was simulated using a baseline observing strategy (and does not include deep drilling fields), there are no more than 100 visits per year per field. Any DiaObject with >100 DiaSources had a duration >1 year, and is not a SNIa.

In [None]:
nDiaSources_min = 15
nDiaSources_max = 100

### 3.2. Retrieve potential SNeIa from the DiaObjects table

The query takes about a minute.

When the query completes, transfer the results to an astropy table.
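
The query converts DiaObject fluxes, which are stored in nJy, to AB magnitudes by dividing by 1e32 (1 nJy = 1e-32 erg/s/cm^2/Hz) and calling `scisql_fluxToAbMag`. The sketch below reproduces that arithmetic offline, assuming the standard AB relation m = -2.5 log10(f) - 48.6 for f in erg/s/cm^2/Hz, which reduces to m = -2.5 log10(f_nJy) + 31.4. The function name `njy_to_abmag` is hypothetical, and returning NaN for non-positive fluxes is a choice made for this sketch, not a statement about how the database function behaves.

```python
import math


def njy_to_abmag(flux_njy):
    """AB magnitude for a flux in nanojansky (assumed standard AB zero point)."""
    if flux_njy <= 0:
        # A magnitude is undefined for zero or negative flux; this sketch returns NaN.
        return float('nan')
    return -2.5 * math.log10(flux_njy) + 31.4


# Hypothetical brightest and faintest r-band difference-image fluxes (nJy).
example_flux_max = 10000.0
example_flux_min = 1000.0

example_mag_min = njy_to_abmag(example_flux_max)  # brightest detection, smallest magnitude
example_mag_max = njy_to_abmag(example_flux_min)  # faintest detection, largest magnitude
example_mag_amp = example_mag_max - example_mag_min

print('Brightest: %.2f mag, faintest: %.2f mag, amplitude: %.2f mag'
      % (example_mag_min, example_mag_max, example_mag_amp))
```

Because the brightest flux maps to the smallest magnitude, the query labels the magnitude of `rPSFluxMax` as `rMagMin` and the magnitude of `rPSFluxMin` as `rMagMax`.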

In [None]:
%%time

results = service.search("SELECT ra, decl, diaObjectId, nDiaSources, "
                         "scisql_fluxToAbMag(rPSFluxMin/1e32) AS rMagMax, "
                         "scisql_fluxToAbMag(rPSFluxMax/1e32) AS rMagMin, "
                         "scisql_fluxToAbMag(gPSFluxMax/1e32) AS gMagMin, "
                         "scisql_fluxToAbMag(iPSFluxMax/1e32) AS iMagMin, "
                         "scisql_fluxToAbMag(rPSFluxMin/1e32) - scisql_fluxToAbMag(rPSFluxMax/1e32) AS rMagAmp "
                         "FROM dp02_dc2_catalogs.DiaObject "
                         "WHERE nDiaSources > "+str(nDiaSources_min)+" "
                         "AND nDiaSources < "+str(nDiaSources_max)+" "
                         "AND scisql_fluxToAbMag(rPSFluxMax/1e32) > "+str(snia_peak_mr_min)+" "
                         "AND scisql_fluxToAbMag(rPSFluxMax/1e32) < "+str(snia_peak_mr_max)+" "
                         "AND scisql_fluxToAbMag(gPSFluxMax/1e32) < "+str(snia_peak_mg_max)+" "
                         "AND scisql_fluxToAbMag(iPSFluxMax/1e32) < "+str(snia_peak_mi_max)+" "
                         "AND scisql_fluxToAbMag(rPSFluxMin/1e32) - scisql_fluxToAbMag(rPSFluxMax/1e32) < "+str(snia_rampl_max)+" "
                         "AND scisql_fluxToAbMag(rPSFluxMin/1e32) - scisql_fluxToAbMag(rPSFluxMax/1e32) > "+str(snia_rampl_min)+" ",
                         maxrec=1000)

DiaObjs = results.to_table()
del results

Option to display the table.

In [None]:
# DiaObjs

### 3.3. Plot histograms to further characterize these potential SNeIa

In [None]:
fig, ax = plt.subplots( 1,3, figsize=(14,3), sharey=False, sharex=False)

ax[0].hist(DiaObjs['rMagMin'], bins=20)
ax[0].set_xlabel('Brightest Detected r-band Magnitude')

ax[1].hist(DiaObjs['rMagAmp'], bins=20)
ax[1].set_xlabel('Amplitude in r-band Magnitude')

ax[2].hist(DiaObjs['nDiaSources'], bins=20)
ax[2].set_xlabel('Number of Difference-Image Detections')

plt.show()

### 3.4. Use DiaSources to calculate lightcurve duration

DiaSources are the difference-image photometry (lightcurves).

The sample of interest used in this notebook, low-redshift Type Ia supernovae, would be bright enough for detection (<24.5 mag) for up to one year.

Calculate the lightcurve duration -- the difference between the dates of last and first DiaSource -- for all of the DiaObjects returned.

This query takes about 3.5 minutes.

In [None]:
%%time

DiaObjs['duration'] = numpy.zeros(len(DiaObjs), dtype='float')

for j,DiaObjId in enumerate(DiaObjs['diaObjectId']):
    results = service.search("SELECT ra, decl, diaObjectId, diaSourceId, "
                             "filterName, midPointTai, psFlux, psFluxErr "
                             "FROM dp02_dc2_catalogs.DiaSource "
                             "WHERE diaObjectId = "+str(DiaObjId))
    results = results.to_table()
    DiaObjs['duration'][j] = numpy.max(results['midPointTai']) - numpy.min(results['midPointTai'])
    del results
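
The duration calculation above (last detection date minus first detection date, per DiaObject) can be sketched offline as follows. The rows are hypothetical (diaObjectId, midPointTai) pairs invented for illustration, and the 365-day cut is only an example threshold.

```python
# Hypothetical (diaObjectId, midPointTai) rows, invented for illustration.
rows = [(101, 59600.1), (101, 59630.4), (101, 59712.9),
        (202, 59580.0), (202, 60310.5),
        (303, 59901.2)]

# Track the earliest and latest detection date for each object.
mjd_range = {}
for obj_id, mjd in rows:
    first, last = mjd_range.get(obj_id, (mjd, mjd))
    mjd_range[obj_id] = (min(first, mjd), max(last, mjd))

# Duration in days: date of last detection minus date of first detection.
durations = {obj_id: last - first for obj_id, (first, last) in mjd_range.items()}

# Example selection: objects whose detections span less than one year.
short_lived = sorted(obj_id for obj_id, days in durations.items() if days < 365.0)

for obj_id in sorted(durations):
    print('diaObjectId %d: duration %.1f days' % (obj_id, durations[obj_id]))
print('Objects with duration < 365 days:', short_lived)
```

Note that an object with a single detection has a duration of zero by this definition, so a duration cut alone does not guarantee a well-sampled lightcurve; the minimum number of DiaSources set earlier addresses that.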

### 3.5. Make plots to characterize the lightcurve durations

In [None]:
fig, ax = plt.subplots( 1,3, figsize=(14,3), sharey=False, sharex=False)

ax[0].hist(DiaObjs['duration'], bins=20)
ax[0].set_xlabel('Lightcurve Duration (Any Filter)')

ax[1].plot(DiaObjs['duration'], DiaObjs['rMagAmp'], 'o')
ax[1].set_xlabel('Lightcurve Duration (Any Filter)')
ax[1].set_ylabel('Amplitude in r-band Magnitude')

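ax[2].plot(DiaObjs['duration'], DiaObjs['nDiaSources'], 'o')
ax[2].set_xlabel('Lightcurve Duration (Any Filter)')
ax[2].set_ylabel('Number of Difference-Image Detections')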

plt.show()

### 3.6. Plot lightcurves of potential SNIa

Consider "potential SNIa" to be the DiaObjects with lightcurve durations shorter than 250 days (well under one year). The r-band amplitude constraint (between 1.5 and 6.0 mag) was already applied in the TAP query in Section 3.2.

For plotting, use the filter labels, colors, and symbols defined in Section 1.2.

Print the number of DiaObjects that have a duration of less than 250 days.

In [None]:
tx = numpy.where(DiaObjs['duration'] < 250)[0]
print(len(tx))

In [None]:
fig, ax = plt.subplots(len(DiaObjs[tx]), figsize=(14,20), sharey=False, sharex=False)

for i, j in enumerate(tx):
    results = service.search("SELECT ra, decl, diaObjectId, diaSourceId, "
                             "filterName, midPointTai, "
                             "scisql_fluxToAbMag(psFlux/1e32) AS psAbMag "
                             "FROM dp02_dc2_catalogs.DiaSource "
                             "WHERE diaObjectId = "+str(DiaObjs['diaObjectId'][j]))
    results = results.to_table()

    # Use the plotting parameters defined in Section 1.2.
    for f, filt in enumerate(plot_filter_labels):
        fx = numpy.where(results['filterName'] == filt)[0]
        ax[i].plot(results['midPointTai'][fx], results['psAbMag'][fx],
                   plot_filter_symbols[f], ms=15, mew=0, alpha=0.5, color=plot_filter_colors[f])
        del fx
    
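    # Invert the y-axis so that brighter (smaller) magnitudes are at the top.
    ax[i].set_ylim([numpy.max(results['psAbMag'])+0.3, numpy.min(results['psAbMag'])-0.3])
    ax[i].set_ylabel('Magnitude')

    del results

plt.tight_layout()
plt.show()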

## 4. Further Work

The next steps towards science with the sample of interest might include applying a lightcurve template fitter, or photometric classification codes, to the sample.

Another analysis option would be to use the DiaForcedSource table for the lightcurves, in order to include photometry measured below the detection limit.