<b>Nuclear Transient Searches and Contamination</b> <br>
Contact author: K. Decker French <br>
Last verified to run: 2022-08-17<br>
LSST Science Pipelines version: Weekly 2022_22 <br>
Container Size: medium <br>

**Description:** Use the DP0.2 Object, DiaObject, and DiaSource tables to investigate noise that will be present when searching for nuclear transients, and to identify transients near selected host galaxies.

**Skills:** Use the TAP service and the DP0.2 Object, DiaObject and DiaSource tables.

**LSST Data Products:** TAP tables dp02_dc2_catalogs.Object, DiaObject and DiaSource.

**Packages:** lsst.rsp, astropy.cosmology, astropy.table, numpy, matplotlib

**Credit:** Developed by K. Decker French, based on material originally developed by Leanne Guy, Melissa Graham, Jeff Carlin and the Rubin Community Engagement Team for Data Preview 0. This work was made possible through the Preparing for Astrophysics with LSST Program, supported by the Heising-Simons Foundation and managed by Las Cumbres Observatory.

# 1. Introduction and setup

This notebook demonstrates an example science case investigating nuclear transients in galaxies. The simulated dataset in DP0.2 does not contain AGN variability or Tidal Disruption Events (TDEs), so we can use the data to characterize the underlying noise that will be present in real lightcurves of galaxy nuclei. This notebook also shows an example of using host galaxy properties to split variable sources by host galaxy color and look for correlations with that color.

We will use three tables: 
- (1) Object: "Properties of the astronomical objects detected and measured on the deep coadded images."
- (2) DiaObject: "Properties of time-varying astronomical objects based on association of data from one or more spatially-related DiaSource detections on individual single-epoch difference images."
- (3) DiaSource: "Properties of transient-object detections on the single-epoch difference images."

The schema for each table can be found here: https://dm.lsst.org/sdm_schemas/browser/dp02.html

This notebook was developed using DP0.2 tutorials 2, 7a and 7b (Leanne Guy, Melissa Graham, Jeff Carlin and the Rubin Community Engagement Team for Data Preview 0), which contain more information on TAP queries and lightcurve analyses.

## 1.1. Package Imports

**lsst.rsp:** The LSST Science Pipelines package for RSP functionality such as the TAP service (<a href="http://pipelines.lsst.io">pipelines.lsst.io</a>).

**astropy.cosmology:** An open-source package of cosmology tools (<a href="https://docs.astropy.org/en/stable/cosmology/index.html">the astropy cosmology documentation</a>).

In [None]:
#package imports
import time
from IPython.display import Image

import numpy as np
import matplotlib.pyplot as plt
from astropy.table import Table

from lsst.rsp import get_tap_service

from astropy.cosmology import FlatLambdaCDM
cosmo = FlatLambdaCDM(H0=70, Om0=0.3)

## 1.2. Set up TAP service

In [None]:
# start TAP service
service = get_tap_service()

# 2. Identify transient sources in the nuclei of galaxies

DP0.2 does not contain AGN variability or Tidal Disruption Events (TDEs), so we can use the DiaObject and DiaSource results to investigate likely contaminants for future nuclear transient searches.



## 2.1. Select Galaxies 

To start, we identify galaxies in the Object catalog by requiring refExtendedness = 1, which selects objects that appear extended in the reference band (typically the i band). We also require detect_isPrimary = 1 to keep only primary detections: deblended objects (not their blended parents) that lie in the inner region of a patch and tract, so that each object appears only once.

Later in this notebook we will select on host galaxy properties, so we start by restricting the sample to objects brighter than r = 18 mag.

This query, with max_rec=1000, took about 15 seconds of wall time.

In [None]:
%%time

max_rec = 1000 

query = "SELECT objectId, coord_ra, coord_dec, detect_isPrimary, " + \
        "scisql_nanojanskyToAbMag(u_cModelFlux) AS u_cModelMag, " + \
        "scisql_nanojanskyToAbMag(r_cModelFlux) AS r_cModelMag, refExtendedness " + \
        "FROM dp02_dc2_catalogs.Object " + \
        "WHERE detect_isPrimary = 1 " + \
        "AND refExtendedness = 1 AND scisql_nanojanskyToAbMag(r_cModelFlux) < 18"

galaxies = service.search(query, maxrec=max_rec)

Inspect the results:

In [None]:
galaxies

## 2.2. Identify nuclear "transients" in these galaxies

For each galaxy, we now need to query the DiaObject catalog to find transient events in the nuclei of these galaxies. 

We will cross-match using the coordinates returned by the query above. Note: In general, transient-host matching can be complex (see for example [Gagliano et al. 2021](https://ui.adsabs.harvard.edu/abs/2021ApJ...908..170G/abstract), [Qin et al. 2022](https://ui.adsabs.harvard.edu/abs/2022ApJS..259...13Q/abstract)). However, for this application, transients within 0.5 arcsec of a galaxy's coordinates are likely to be associated with that galaxy. As a further caveat, we keep only one DiaObject per Object (the first one returned by the query).

To study the lightcurves of the resulting objects, we also require more than 20 g-band data points (gPSFluxNdata > 20).

For this example, we'll select the total flux measurements gTOTFluxMean and gTOTFluxSigma. These measurements contain flux from both the host galaxy and any transient emission. We use the TOTFlux measurements from DiaObject and DiaSource rather than the flux measurements from ForcedSource, because the DiaObject sources are deblended. We also select gPSFluxMean, the mean g-band PSF flux measured on the difference images, to check the typical difference-image flux for these sources.

This next cell executes a large number of queries (one per galaxy, so 1000 if max_rec was not modified), but runs reasonably quickly (about 3 minutes of wall time for 1000 queries).
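The per-galaxy ADQL query used in the loop can be factored into a small function, which also makes the 0.5 arcsec to degree conversion explicit (0.5/3600 ≈ 0.000139 deg). This is a minimal sketch, assuming the same DP0.2 DiaObject column names as the notebook's own query; `build_nuclear_diaobject_query` is a hypothetical helper, not part of lsst.rsp or the TAP service.

```python
def build_nuclear_diaobject_query(ra_deg, dec_deg, radius_arcsec=0.5, min_ndata=20):
    """Return an ADQL cone search on DiaObject around one galaxy center.

    Hypothetical helper: it reproduces the query written inline in the notebook.
    """
    radius_deg = radius_arcsec / 3600.0  # arcsec -> degrees
    return ("SELECT ra, decl, diaObjectId, gPSFluxNdata, "
            "scisql_nanojanskyToAbMag(gTOTFluxMean) AS gTOTMagMean, "
            "gPSFluxMean, "
            "scisql_nanojanskyToAbMagSigma(gTOTFluxMean, gTOTFluxSigma) AS gTOTMagSigma "
            "FROM dp02_dc2_catalogs.DiaObject "
            "WHERE CONTAINS(POINT('ICRS', ra, decl), "
            f"CIRCLE('ICRS', {ra_deg:.6f}, {dec_deg:.6f}, {radius_deg:.6f})) = 1 "
            f"AND gPSFluxNdata > {min_ndata}")


# Example: the query for a galaxy centered at (RA, Dec) = (62.0, -37.0) deg
query = build_nuclear_diaobject_query(62.0, -37.0)
print(query)
# In the notebook, this string would then be passed to service.search(query)
```

Keeping the radius and the minimum number of detections as parameters makes it easy to test how sensitive the match fraction is to the 0.5 arcsec choice.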

In [None]:
%%time

#set up a Table to collect our results
ra_arr = galaxies['coord_ra']
dec_arr = galaxies['coord_dec']
oid = galaxies['objectId']
results_table = Table({'ra': ra_arr, 
                       'dec': dec_arr,
                       'objectID': oid,
                       'u_cModelMag': galaxies['u_cModelMag'],
                       'r_cModelMag': galaxies['r_cModelMag'],
                       'diaObjectID': np.zeros_like(oid),
                       'matches': np.zeros_like(ra_arr),
                       'gTOTMagMean': np.zeros_like(ra_arr),
                       'gTOTMagSigma': np.zeros_like(ra_arr),
                       'gPSFluxMean': np.zeros_like(ra_arr)})

# cone-search radius: 0.5 arcsec expressed in degrees (0.5/3600)
radius = "0.000139"

#iterate over each galaxy selected above from Object
for jj, res in enumerate(galaxies):
    ra = res['coord_ra']
    dec = res['coord_dec']
    center_coords = '{0:.6f}, {1:.6f}'.format(ra, dec)
    query = "SELECT ra, decl, diaObjectId, gPSFluxNdata, " + \
            "scisql_nanojanskyToAbMag(gTOTFluxMean) AS gTOTMagMean, " + \
            "gPSFluxMean, " + \
            "scisql_nanojanskyToAbMagSigma(gTOTFluxMean, gTOTFluxSigma) AS gTOTMagSigma " + \
            "FROM dp02_dc2_catalogs.DiaObject " + \
            "WHERE CONTAINS(POINT('ICRS', ra, decl), " + \
            "CIRCLE('ICRS', " + center_coords + ", " + radius + ")) = 1 " + \
            "AND gPSFluxNdata > 20"
    result_ds = service.search(query)
    if len(result_ds) == 0:
        # nothing returned: results_table['matches'] stays 0
        continue
    # keep only the first DiaObject returned for this galaxy
    results_table['diaObjectID'][jj] = result_ds['diaObjectId'][0]
    results_table['matches'][jj] = len(result_ds)
    results_table['gTOTMagMean'][jj] = result_ds['gTOTMagMean'][0]
    results_table['gTOTMagSigma'][jj] = result_ds['gTOTMagSigma'][0]
    results_table['gPSFluxMean'][jj] = result_ds['gPSFluxMean'][0]


Inspect the results:

In [None]:
results_table

Count how many galaxies had a match in DiaObject within 0.5" of the center:

In [None]:
np.count_nonzero(results_table['matches'] > 0)

Roughly one-third to two-thirds of the galaxies have a match, depending on the run.

## 2.3. Plot lightcurves

Next, let's query the DiaSource table for the matched DiaObjects to get their lightcurves, and plot some of them. The next cell is adapted directly from tutorial notebook 7a and plots the first 5 matched objects.

In [None]:
#only select the DiaObject sources that are matched
matched = results_table[results_table['matches'] > 0]

#define plot symbols and colors
plot_filter_labels = ['u', 'g', 'r', 'i', 'z', 'y']
plot_filter_colors = {'u' : '#56b4e9', 'g' : '#008060', 'r' : '#ff4000',
                     'i' : '#850000', 'z' : '#6600cc', 'y' : '#000000'}
plot_filter_symbols = {'u' : 'o', 'g' : '^', 'r' : 'v', 'i' : 's', 'z' : '*', 'y' : 'p'}

fig, ax = plt.subplots(5, 1, figsize=(10, 10), sharey=False, sharex=False)


for i in range(5):
    # totFlux (host + transient) converted to AB magnitudes
    result = service.search("SELECT ra, decl, diaObjectId, diaSourceId, "
                            "filterName, midPointTai, "
                            "scisql_nanojanskyToAbMag(totFlux) AS totAbMag "
                            "FROM dp02_dc2_catalogs.DiaSource "
                            "WHERE diaObjectId = "+str(matched['diaObjectID'][i]))

    for filt in plot_filter_labels:
        fx = np.where(result['filterName'] == filt)[0]
        ax[i].plot(result['midPointTai'][fx], result['totAbMag'][fx],
                   plot_filter_symbols[filt], ms=10, mew=0, alpha=0.5,
                   color=plot_filter_colors[filt], label=filt)
        del fx

    ax[i].invert_yaxis()
    ax[i].set_title(matched['diaObjectID'][i])

    if i == 4:
        ax[i].xaxis.set_label_text('MJD (days)')
    ax[i].yaxis.set_label_text('mag')


    del result

plt.tight_layout()
plt.legend(bbox_to_anchor=(1.04,1), loc="upper left")
plt.show()

These lightcurves look spurious, consistent with our expectation that there should be no true nuclear transients in this data set (unless we happen to catch a SN Ia near a galaxy nucleus).

## 2.4. Analyze DiaObject information

We can use the information from the DiaObject table to see the range of lightcurve statistics for the selected sources.

For example, the next cell shows a histogram of the mean flux in the difference images.

In [None]:
plt.hist(matched['gPSFluxMean'],bins=40,range=np.array([-20000,20000]))
plt.axvline(0,color='k')
plt.xlabel('gPSFluxMean (nJy)')
plt.ylabel('Number')
plt.show()

The distribution looks roughly symmetric around zero, and many of the sources have negative mean fluxes in the difference images. This could be caused by either of two effects: noise, given the absence of any true nuclear transients, or real transients such as SNe Ia contaminating the reference (template) images.

Zooming in on the center of this plot, we can test whether the mean and median fluxes are positive:

In [None]:
plt.hist(matched['gPSFluxMean'],bins=40,range=np.array([-2000,2000]))
plt.axvline(0,color='k')
plt.axvline(np.mean(matched['gPSFluxMean']),label='Mean',color='red')
plt.axvline(np.median(matched['gPSFluxMean']),label='Median',color='purple')
plt.xlabel('gPSFluxMean (nJy)')
plt.ylabel('Number')
plt.legend()
plt.show()

The mean and median are both positive, although barely. 
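Whether "barely positive" is meaningful depends on the uncertainty of each statistic. One way to check, which is not part of the original analysis, is to compare the mean with its standard error and to bootstrap the median. The sketch below uses synthetic fluxes in place of `matched['gPSFluxMean']`; `mean_median_with_errors` is a hypothetical helper, and the 16th to 84th percentile interval is an assumed choice of roughly 1-sigma range.

```python
import numpy as np


def mean_median_with_errors(flux, n_boot=2000, seed=0):
    """Mean with its standard error, and median with a bootstrap 16-84% interval."""
    # convert masked columns to plain floats (masked -> NaN), then drop NaNs
    flux = np.ma.filled(np.ma.asarray(flux, dtype=float), np.nan)
    flux = flux[np.isfinite(flux)]
    mean = flux.mean()
    sem = flux.std(ddof=1) / np.sqrt(flux.size)
    rng = np.random.default_rng(seed)
    # resample with replacement and recompute the median of each resample
    resamples = rng.choice(flux, size=(n_boot, flux.size), replace=True)
    boot_medians = np.median(resamples, axis=1)
    lo, hi = np.percentile(boot_medians, [16, 84])
    return mean, sem, np.median(flux), lo, hi


# synthetic stand-in for the difference-image fluxes (nJy)
fake = np.random.default_rng(1).normal(loc=50.0, scale=1000.0, size=500)
mean, sem, med, lo, hi = mean_median_with_errors(fake)
print(f"mean = {mean:.1f} +/- {sem:.1f} nJy; median = {med:.1f} nJy (16-84%: {lo:.1f} to {hi:.1f})")
```

If the interval for the median, or the mean plus or minus its standard error, comfortably includes zero, the small positive offset is consistent with noise.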

Next, let's explore the total flux parameters. gTOTMagSigma, the scatter of the g-band total-flux lightcurve expressed in magnitudes, describes the underlying noise we will have to deal with in future searches for nuclear transients.
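To build intuition for gTOTMagSigma, it helps to see how a flux scatter maps to a magnitude scatter. A minimal sketch, assuming the scisql functions used in the query follow the standard AB definition (a 1 nJy source has m = 31.4) and first-order error propagation, $\sigma_m = (2.5/\ln 10)\,\sigma_F/F$; the two Python functions are hypothetical stand-ins, not the scisql implementations.

```python
import numpy as np

# Assumption: AB magnitude of a 1 nJy source, -2.5*log10(1e-9 / 3631) ~= 31.4
AB_ZP_NJY = 31.4


def njy_to_abmag(flux_njy):
    """AB magnitude of a flux in nanojanskys (stand-in for scisql_nanojanskyToAbMag)."""
    return -2.5 * np.log10(flux_njy) + AB_ZP_NJY


def njy_to_abmag_sigma(flux_njy, flux_sigma_njy):
    """Magnitude scatter from flux scatter, via first-order error propagation."""
    return 2.5 / np.log(10) * np.abs(flux_sigma_njy / flux_njy)


# A lightcurve with a 10% rms flux scatter has a magnitude scatter of ~0.11 mag
print(njy_to_abmag(1000.0))               # 1000 nJy -> 23.9 mag
print(njy_to_abmag_sigma(1000.0, 100.0))  # ~0.1086 mag
```

So a gTOTMagSigma of 0.2 mag corresponds to a fractional rms scatter of roughly 18% in the total (host plus nucleus) flux.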

In [None]:
plt.hist(matched['gTOTMagSigma'])
plt.xlabel('gTOTMagSigma (mag)')
plt.ylabel('Number')
plt.show()

We can select some of the sources with high sigma (gTOTMagSigma > 0.2 mag) and inspect their g-band lightcurves.

In [None]:
subset = matched[(matched['gTOTMagSigma'] > 0.2)]

fig, ax = plt.subplots(5, 1, figsize=(10, 10), sharey=False, sharex=False)


for i in range(5):
    # totFlux (host + transient) converted to AB magnitudes
    result = service.search("SELECT ra, decl, diaObjectId, diaSourceId, "
                            "filterName, midPointTai, "
                            "scisql_nanojanskyToAbMag(totFlux) AS totAbMag "
                            "FROM dp02_dc2_catalogs.DiaSource "
                            "WHERE diaObjectId = "+str(subset['diaObjectID'][i]))

    # plot only the g band, since the selection used gTOTMagSigma
    filt = 'g'
    fx = np.where(result['filterName'] == filt)[0]
    ax[i].plot(result['midPointTai'][fx], result['totAbMag'][fx],
               plot_filter_symbols[filt], ms=10, mew=0, alpha=0.5,
               color=plot_filter_colors[filt], label=filt)
    del fx

    ax[i].invert_yaxis()
    ax[i].set_title(subset['diaObjectID'][i])

    if i == 4:
        ax[i].xaxis.set_label_text('MJD (days)')
    ax[i].yaxis.set_label_text('mag')


    del result

plt.tight_layout()
plt.legend(bbox_to_anchor=(1.04,1), loc="upper left")
plt.show()

Some of these sources have a single large outlier driving up sigma; others seem to show real variability, which could be confused with AGN-like damped-random-walk variability.
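One way to separate the two cases, not used in this notebook, is to compare the standard deviation of a lightcurve with a robust scatter estimate such as the median absolute deviation (MAD), which a single outlier barely affects. The sketch below uses a synthetic g-band lightcurve; in practice `mags` would be the g-band magnitudes returned by the DiaSource query. `robust_sigma` is a hypothetical helper, and 1.4826 is the usual factor that scales the MAD to a Gaussian sigma.

```python
import numpy as np


def robust_sigma(mags):
    """Scatter estimated from the median absolute deviation (MAD).

    The 1.4826 factor makes the MAD equal the standard deviation for Gaussian noise.
    """
    mags = np.asarray(mags, dtype=float)
    return 1.4826 * np.median(np.abs(mags - np.median(mags)))


rng = np.random.default_rng(42)
# synthetic lightcurve: 30 epochs with 0.05 mag Gaussian noise around 20 mag
lc = 20.0 + rng.normal(0.0, 0.05, size=30)
lc_outlier = lc.copy()
lc_outlier[10] -= 2.0  # one spurious point, 2 mag brighter

print("clean:   std = %.3f, robust = %.3f" % (np.std(lc), robust_sigma(lc)))
print("outlier: std = %.3f, robust = %.3f" % (np.std(lc_outlier), robust_sigma(lc_outlier)))
```

A lightcurve whose standard deviation greatly exceeds its robust scatter is likely dominated by a few outliers, whereas genuine stochastic variability raises both estimates together.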

# 3. Select candidate transients based on host galaxy properties

Next, let's use the information about the host galaxies from the Object catalog. We'll first look at the galaxy colors to see if the [classic galaxy bimodality](https://ui.adsabs.harvard.edu/abs/2004ApJ...600..681B/abstract) is observed. 

In [None]:
#visualize color bimodality
urcolor = galaxies['u_cModelMag'] - galaxies['r_cModelMag']
plt.hist(urcolor,bins=20,range=np.array([0,3.5]))
plt.xlabel('u-r color')
plt.ylabel('Number')
plt.title('Galaxies from Object search')
plt.show()

We see clear red and blue peaks. Because we have only included galaxies with r < 18 mag (Section 2.1), which keeps the sample at relatively low redshift where k-corrections are small, the lack of k-corrections has not washed out the bimodality. One avenue to explore is using the ugrizy host magnitudes to estimate k-corrections.

Next, let's see whether the host galaxies with a DiaObject match have a different u-r color distribution.
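A bright apparent-magnitude cut limits the redshift range because, at fixed apparent magnitude, more distant galaxies must be more luminous. The sketch below computes the distance modulus in the same flat ΛCDM cosmology the notebook sets up with `FlatLambdaCDM(H0=70, Om0=0.3)`; in the notebook, `cosmo.distmod(z)` returns the same quantity. It is written with NumPy only to stay self-contained, assumes no radiation term, and ignores k-corrections; `distance_modulus` is a hypothetical helper.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def distance_modulus(z, H0=70.0, Om0=0.3, n_steps=1000):
    """Distance modulus for flat LCDM (no radiation), via the trapezoidal rule."""
    zz = np.linspace(0.0, z, n_steps + 1)
    inv_E = 1.0 / np.sqrt(Om0 * (1.0 + zz) ** 3 + (1.0 - Om0))
    # comoving distance integral: int_0^z dz' / E(z')
    integral = np.sum(0.5 * (inv_E[1:] + inv_E[:-1]) * np.diff(zz))
    d_c_mpc = C_KM_S / H0 * integral
    d_l_pc = (1.0 + z) * d_c_mpc * 1e6  # luminosity distance in parsecs
    return 5.0 * np.log10(d_l_pc / 10.0)


# absolute magnitude (no k-correction) of an r = 18 galaxy at a few redshifts
for z in (0.05, 0.1, 0.2):
    print(f"z = {z}: DM = {distance_modulus(z):.2f}, M_r = {18 - distance_modulus(z):.2f}")
```

At z = 0.1 an r = 18 galaxy already has M_r ≈ -20.3, so at higher redshift only increasingly luminous galaxies pass the cut.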

In [None]:
urcolor = matched['u_cModelMag'] - matched['r_cModelMag']
plt.hist(urcolor,bins=20,range=np.array([0,3.5]))
plt.xlabel('u-r color')
plt.ylabel('Number')
plt.title('Galaxies from DiaObject match')
plt.show()

We observe fewer red galaxies in the sample after cross-matching with the DiaObject catalog. This may be due to a smaller SN Ia contribution in red host galaxies, or to a systematic effect in how DiaObjects are created.

Next, we can divide the host galaxies into red and blue and look at the lightcurve variation of each. Here, we use gTOTMagSigma for each lightcurve from the DiaObject catalog to parameterize the typical noise in the nuclear variability of each type of galaxy.
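The drop in red galaxies can be quantified by comparing the red fraction before and after the match, with a binomial uncertainty on each. A minimal sketch, assuming a red/blue boundary at u-r = 2 (the boundary used for the red bin later in this section); `red_fraction` is a hypothetical helper, and the short arrays are illustrative only.

```python
import numpy as np


def red_fraction(u_mag, r_mag, red_cut=2.0):
    """Fraction of galaxies with u - r > red_cut, with a binomial standard error."""
    # masked values (e.g. from TAP results) become NaN and are dropped
    u = np.ma.filled(np.ma.asarray(u_mag, dtype=float), np.nan)
    r = np.ma.filled(np.ma.asarray(r_mag, dtype=float), np.nan)
    ur = u - r
    ur = ur[np.isfinite(ur)]
    n = ur.size
    if n == 0:
        return np.nan, np.nan
    frac = np.count_nonzero(ur > red_cut) / n
    err = np.sqrt(frac * (1.0 - frac) / n)
    return frac, err


# illustrative magnitudes: u-r = 2.0, 3.0, 0.5, 2.5 -> two of four are red
frac, err = red_fraction([20.0, 21.0, 19.0, 22.0], [18.0, 18.0, 18.5, 19.5])
print(f"red fraction = {frac:.2f} +/- {err:.2f}")
# In the notebook one might compare, for example:
# red_fraction(galaxies['u_cModelMag'], galaxies['r_cModelMag'])
# red_fraction(matched['u_cModelMag'], matched['r_cModelMag'])
```

If the two fractions differ by several times their combined uncertainty, the deficit of red hosts is unlikely to be a small-number fluctuation.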

In [None]:
red = np.where((urcolor > 2) & (urcolor < 3.5))[0]
blue = np.where((urcolor > 1.7) & (urcolor < 2))[0]

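# centers and half-widths of the blue (1.7 < u-r < 2) and red (2 < u-r < 3.5) color bins
colors = np.array([1.85, 2.75])
colors_range = np.array([0.15, 0.75])
# mean gTOTMagSigma per bin, with the standard error of the mean as its uncertainty
sigma = np.zeros_like(colors)
sigma_unc = np.zeros_like(colors)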
#blue
sigma[0] = np.mean(matched['gTOTMagSigma'][blue])
sigma_unc[0] = np.std(matched['gTOTMagSigma'][blue])/np.sqrt(np.size(blue))
#red
sigma[1] = np.mean(matched['gTOTMagSigma'][red])
sigma_unc[1] = np.std(matched['gTOTMagSigma'][red])/np.sqrt(np.size(red))

plt.errorbar(colors,sigma,xerr=colors_range,yerr=sigma_unc,linestyle='None',marker='o')
plt.xlabel('u-r color')
plt.ylabel('Average st. dev. of total g-band (mag)')
plt.show()


Red galaxies show roughly $1.5\times$ more noise than blue galaxies. This effect may be related to the high Sérsic indices (centrally concentrated light profiles) of quiescent red galaxies (typically $n\sim4$) compared to blue star-forming galaxies (which typically have $n\sim1$ exponential disks): nuclear transients may be harder to detect in galaxies with high central densities. With a larger sample (and k-corrections applied), differing trends could be explored for green valley galaxies and other populations. (Note: in some runs these results change because of events with large outliers. When this notebook was tested with much larger samples (~10,000 galaxies), the trend of higher average sigma for redder colors appeared to hold.)
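Exploring finer color bins, such as a separate green-valley bin, requires generalizing the two-bin calculation above to arbitrary bin edges. A minimal sketch, assuming the same statistics as the notebook (mean of gTOTMagSigma per bin, `np.std` with its default ddof=0 divided by $\sqrt{N}$, strict inequalities at bin edges); `binned_mean_sem` is a hypothetical helper and the example arrays are synthetic.

```python
import numpy as np


def _as_float(x):
    """Plain float array, with any masked entries replaced by NaN."""
    return np.ma.filled(np.ma.asarray(x, dtype=float), np.nan)


def binned_mean_sem(color, sigma, edges):
    """Mean and standard error of `sigma` in bins of `color`; empty bins give NaN."""
    color, sigma, edges = _as_float(color), _as_float(sigma), _as_float(edges)
    centers = 0.5 * (edges[1:] + edges[:-1])
    half_widths = 0.5 * np.diff(edges)
    means = np.full(centers.size, np.nan)
    sems = np.full(centers.size, np.nan)
    for k in range(centers.size):
        in_bin = (color > edges[k]) & (color < edges[k + 1]) & np.isfinite(sigma)
        n = np.count_nonzero(in_bin)
        if n > 0:
            means[k] = sigma[in_bin].mean()
            sems[k] = sigma[in_bin].std() / np.sqrt(n)
    return centers, half_widths, means, sems


# synthetic example with the notebook's two bins: 1.7-2 (blue) and 2-3.5 (red)
centers, half_widths, means, sems = binned_mean_sem(
    [1.8, 1.9, 2.5, 3.0], [0.1, 0.2, 0.3, 0.5], [1.7, 2.0, 3.5])
print(centers, half_widths, means, sems)
# With real data one might try, e.g., edges = [1.0, 1.7, 2.0, 2.3, 3.5] and
# pass urcolor and matched['gTOTMagSigma'], then plot with plt.errorbar.
```

The returned centers and half-widths plug directly into `plt.errorbar` as `x` and `xerr`, matching the notebook's plot.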