# SNIa Host Association

Contact author: Melissa Graham

Date last verified to run: Fri Dec 30, 2022

RSP environment version: Weekly 2022_40

## 1. Introduction

The DP0.2 `DiaObjects` catalog **does not contain** the host association parameters described in the Rubin Observatory Data Products Definitions Document (DPDD; <a href="https://docushare.lsstcorp.org/docushare/dsweb/Get/LSE-163/LSE-163_DataProductsDefinitionDocumentDPDD.pdf">ls.st/dpdd</a>) and in the Data Management Tech Note on "Host Galaxy Association for DIAObjects" (<a href="https://dmtn-151.lsst.io/">DMTN-151</a>), but future data releases will.

As listed in Table 3 of Version 3.7 of the DPDD, the future `DiaObjects` catalog will contain the following columns.

 * `nearbyObj` = Closest `Objects` (3 stars and 3 galaxies) in the Data Release database.
 * `nearbyObjDist` = Distances to `nearbyObj`.
 * `nearbyExtObj` = Three extended `Objects` with lowest separations in the Data Release database.
 * `nearbyExtObjSep` = Separations of `nearbyExtObj`.

Whereas `nearbyObj` would be identified based on the on-sky two-dimensional distances using only the object centroids, the `nearbyExtObj` "separations" would be calculated with respect to the transient location using the second moments of each Object’s luminosity profile (as described in DMTN-151).

In this notebook, two options for measuring the "separation" between SNIa coordinates and a potential host galaxy are considered.

1. **Effective Kron separation.**  
The Kron radius for a galaxy is based on the first image moment (Kron 1980).
It is more typically used for flux measurements than for galaxy size and shape, as twice the Kron radius will capture >90% of the galaxy flux.
In the DP0.2 `Object` table, the Kron radius for a given filter, e.g., `r_kronRad`, is the square root of the product of the semi-major and semi-minor axes of the galaxy, and it is given in arcseconds.
The _effective_ Kron separation is the distance between the SNIa and the galaxy barycenter, divided by the galaxy's Kron radius.
In this notebook we explore both the r- and i-band effective Kron separations.

2. **Effective elliptical separation.**
This is based on the adaptive second moments of the galaxy's luminosity profile.
In the DP0.2 `Object` table, the second moments are provided in the `shape_xx` ($Ixx$), `shape_yy` ($Iyy$), and `shape_xy` ($Ixy$) columns, in arcseconds.
Where $x$ and $y$ are RA and Declination, respectively, and $ \delta x = x_{SN}-x_{galaxy} $ and $ \delta y = y_{SN}-y_{galaxy} $ are in arcsec, the effective elliptical separation $R$ can be calculated with the equations below. See also <a href="https://sextractor.readthedocs.io/en/latest/Position.html#ellipse-parameters-cxx-cyy-cxy">this description and graphic of the ellipse parameters CXX, CYY, CXY in the Source Extractor documentation by E. Bertin</a>.
$$ C_{xx} = {I_{yy} \over {I_{xx}I_{yy} - I_{xy}^2}} $$ 
$$ C_{yy} = {I_{xx} \over {I_{xx}I_{yy} - I_{xy}^2}} $$ 
$$ C_{xy} = {-2 I_{xy} \over {I_{xx}I_{yy} - I_{xy}^2}} $$
$$ R^2 = C_{xx}(\delta x)^2 + C_{yy}(\delta y)^2 + C_{xy}(\delta x)(\delta y) $$
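
The four equations above can be sketched as a single function. This is a minimal illustration, not the Rubin pipeline implementation: the function name and the input values are hypothetical, and it assumes the offsets and the second moments are given in consistent units (moments in the square of the offset unit).

```python
import numpy as np


def effective_elliptical_separation(dx, dy, ixx, iyy, ixy):
    """Return R for offsets (dx, dy) and second moments (ixx, iyy, ixy)."""
    det = ixx * iyy - ixy**2  # shared denominator of Cxx, Cyy, Cxy
    cxx = iyy / det
    cyy = ixx / det
    cxy = -2.0 * ixy / det
    return np.sqrt(cxx * dx**2 + cyy * dy**2 + cxy * dx * dy)


# Circular profile (Ixx = Iyy, Ixy = 0): R is the offset divided by sqrt(Ixx).
r_circular = effective_elliptical_separation(3.0, 4.0, 4.0, 4.0, 0.0)

# Profile elongated along x: the same offset is "closer" along x than along y.
r_along_x = effective_elliptical_separation(2.0, 0.0, 4.0, 1.0, 0.0)
r_along_y = effective_elliptical_separation(0.0, 2.0, 4.0, 1.0, 0.0)

print(r_circular, r_along_x, r_along_y)  # 2.5 1.0 2.0
```

The function also accepts NumPy arrays, so it can be applied to whole catalog columns at once.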

As we will see in Sections 2 and 3, even a separation distance that accounts for a host galaxy's luminosity profile is not sufficient for identifying the correct host galaxy 100% of the time.

As host association is more challenging at low-redshifts, in the future `DiaObjects` will also be cross-matched to a catalog of nearby low-redshift galaxies (e.g., NGC/IC or similar), with the following columns added to the `DiaObjects` table (as in Table 3 of the DPDD).
 * `nearbyLowzGal` = External catalog name of the nearest low-redshift galaxy.
 * `nearbyLowzGalSep` = Separation of `nearbyLowzGal`.
 
Using priors based on galaxy morphology and photometric-redshifts, combined with information about the SN type or brightness, can also improve host association algorithms.

SN host association is also an active area of ongoing research.
Two recent papers exploring host association options include:
 * DELIGHT: Deep Learning Identification of Galaxy Hosts of Transients using Multiresolution Images (<a href="https://ui.adsabs.harvard.edu/abs/2022AJ....164..195F/abstract">Förster et al. 2022</a>)
 * GHOST: Using Only Host Galaxy Information to Accurately Associate and Distinguish Supernovae (<a href="https://ui.adsabs.harvard.edu/abs/2021ApJ...908..170G/abstract">Gagliano et al. 2021</a>) 
 
<br>

**In this notebook.**

**Section 2.** Choose one true low-z SNIa with a large offset from its big bright host.
Explore the properties of its true host, and identify nearby objects using the measures above.
Visualize the environment by displaying the deepCoadd image and marking the SN, its host, and interloping galaxies.
 
**Section 3.** Retrieve a large sample of SNIa, calculate the properties of their hosts, and associate them with nearby objects using the measures above. Determine what fraction of true hosts are identified as the "nearest" galaxy, or in the top three, as a function of redshift.

**Appendix.** Steps through the process for identifying the one true low-z SNIa with a large offset from its big bright host, used in Section 2.
 
> **Important!** Keep in mind that the demonstration in this notebook is _very_ simple! In practice, when doing science with SNIa and their host galaxies, association algorithms incorporate SNIa and host galaxy information as priors, such as brightness, redshift, and morphology. 

### 1.1. Import packages

In [None]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import time
import gc

from astropy.wcs import WCS
from astropy.units import UnitsWarning
from astropy.coordinates import SkyCoord
from astropy.coordinates import match_coordinates_sky
import astropy.units as u

from lsst.rsp import get_tap_service
import lsst.afw.display as afwDisplay
from lsst.daf.butler import Butler
import lsst.geom as geom

### 1.2. Set global parameters and functions

In [None]:
butler = Butler('dp02', collections='2.2i/runs/DP0.2')
skymap = butler.get('skyMap')

In [None]:
service = get_tap_service()

In [None]:
pd.set_option('display.max_rows', 20)
afwDisplay.setDefaultBackend('matplotlib')

In [None]:
plot_filter_labels = ['u', 'g', 'r', 'i', 'z', 'y']
plot_filter_colors = {'u' : '#56b4e9', 'g' : '#008060', 'r' : '#ff4000',
                      'i' : '#850000', 'z' : '#6600cc', 'y' : '#000000'}
plot_filter_symbols = {'u' : 'o', 'g' : '^', 'r' : 'v', 
                       'i' : 's', 'z' : '*', 'y' : 'p'}

In [None]:
def cutout_coadd(butler, ra, dec, band='r', datasetType='deepCoadd',
                 skymap=None, cutoutSideLength=51, **kwargs):
    """
    Produce a cutout from a coadd at the given ra, dec position.

    Adapted from DC2 tutorial notebook by Michael Wood-Vasey.

    Parameters
    ----------
    butler: lsst.daf.butler.Butler
        Servant providing access to a data repository
    ra: float
        Right ascension of the center of the cutout, in degrees
    dec: float
        Declination of the center of the cutout, in degrees
    band: string
        Filter of the image to load
    datasetType: string ['deepCoadd']
        Which type of coadd to load.  Doesn't support 'calexp'
    skymap: lsst.afw.skyMap.SkyMap [optional]
        Pass in to avoid the Butler read.  Useful if you have lots of them.
    cutoutSideLength: int [optional]
        Size of the cutout region in pixels.

    Returns
    -------
    MaskedImage
    """
    radec = geom.SpherePoint(ra, dec, geom.degrees)
    cutoutSize = geom.ExtentI(cutoutSideLength, cutoutSideLength)

    if skymap is None:
        skymap = butler.get("skyMap")

    # Look up the tract, patch for the RA, Dec
    tractInfo = skymap.findTract(radec)
    patchInfo = tractInfo.findPatch(radec)
    xy = geom.PointI(tractInfo.getWcs().skyToPixel(radec))
    bbox = geom.BoxI(xy - cutoutSize // 2, cutoutSize)
    patch = tractInfo.getSequentialPatchIndex(patchInfo)

    coaddId = {'tract': tractInfo.getId(), 'patch': patch, 'band': band}
    parameters = {'bbox': bbox}

    cutout_image = butler.get(datasetType, parameters=parameters,
                              dataId=coaddId)

    return cutout_image

## 2. Explore the true and potential hosts for one SNIa

In the Appendix below, low-z SNIa with large host offsets from big bright host galaxies are identified.

One of them is selected here for further exploration: an SNIa with several other galaxies interloping between it and its true host.

This SNIa has `id_truth_type` = MS_9684_23_3.

Retrieve the `ra`, `dec`, and `host_galaxy` identifier for this SNIa from the `TruthSummary` table.

In [None]:
%%time

query = "SELECT id_truth_type, ra, dec, host_galaxy "\
        "FROM dp02_dc2_catalogs.TruthSummary "\
        "WHERE id_truth_type = 'MS_9684_23_3'"
results = service.search(query).to_table().to_pandas()

sn_ra = float(results.loc[0, 'ra'])
sn_dec = float(results.loc[0, 'dec'])
sn_coords = SkyCoord(ra=sn_ra*u.degree, dec=sn_dec*u.degree)
sn_host_id = int(results.loc[0, 'host_galaxy'])
del query, results

### 2.1. True host

Retrieve `objectId` for the true host from the `MatchesTruth` table.

In [None]:
%%time

query = "SELECT match_objectId "\
        "FROM dp02_dc2_catalogs.MatchesTruth "\
        "WHERE id = "+str(sn_host_id)
results = service.search(query).to_table().to_pandas()

sn_host_objectId = int(results.loc[0, 'match_objectId'])
del query, results

Retrieve information about the true host from the `Object` table.

The shape and Kron parameters are all in units of arcseconds.

In [None]:
%%time

query = "SELECT objectId, coord_ra, coord_dec, x, y, refExtendedness, "\
        "shape_xx, shape_xy, shape_yy, r_kronRad, i_kronRad "\
        "FROM dp02_dc2_catalogs.Object "\
        "WHERE objectId = "+str(sn_host_objectId)
Host = service.search(query).to_table().to_pandas()
del query

Print the differences in RA and Declination between the SNIa and its true host. Note that the cos(Dec) term is not included here: these are differences in coordinates, not on-sky distances.

In [None]:
print('delta-RA (SN-host): ', 3600.0 * (sn_ra - Host.loc[0, 'coord_ra']), ' arcsec')
print('delta-Dec (SN-host): ', 3600.0 * (sn_dec - Host.loc[0, 'coord_dec']), ' arcsec')
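
To make the role of the cos(Dec) term concrete, the sketch below converts coordinate differences into approximate on-sky offsets using the small-angle approximation. The function name and the coordinates are hypothetical; for the actual calculation this notebook relies on `astropy`.

```python
import numpy as np


def tangent_plane_offsets(ra1, dec1, ra2, dec2):
    """Small-angle on-sky offsets (arcsec) of point 2 relative to point 1.

    All inputs are in degrees.
    """
    dra = (ra2 - ra1 + 180.0) % 360.0 - 180.0  # wrap RA difference into [-180, 180)
    dx = 3600.0 * dra * np.cos(np.radians(dec1))  # RA offset shrinks by cos(Dec)
    dy = 3600.0 * (dec2 - dec1)
    return dx, dy


# A 2 arcsec difference in the RA coordinate at Dec = -60 deg
# corresponds to an on-sky offset of about 1 arcsec.
dx, dy = tangent_plane_offsets(57.5, -60.0, 57.5 + 2.0 / 3600.0, -60.0)
print(dx, dy)
```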

Calculate the 2-dimensional (scalar) offset between SNIa and true host, and add it to the `Host` table.

In [None]:
Host['sn_offset'] = np.zeros(1, dtype='float')
host_coords = SkyCoord(ra=float(Host.loc[0, 'coord_ra'])*u.degree, 
                       dec=float(Host.loc[0, 'coord_dec'])*u.degree)
Host.loc[0, 'sn_offset'] = sn_coords.separation(host_coords).arcsec
print('sn_offset = %10.6f arcsec' % Host.loc[0, 'sn_offset'])

Calculate the separation distance in Kron radii (in r- and i-band), and add it to the `Host` table.

In [None]:
Host['sn_kronSep_r'] = np.zeros(1, dtype='float')
Host.loc[0, 'sn_kronSep_r'] = Host.loc[0, 'sn_offset'] / Host.loc[0, 'r_kronRad']
Host['sn_kronSep_i'] = np.zeros(1, dtype='float')
Host.loc[0, 'sn_kronSep_i'] = Host.loc[0, 'sn_offset'] / Host.loc[0, 'i_kronRad']
# The Kron separations are ratios of two lengths, so they are dimensionless.
print('sn_kronSep_r = %10.6f' % Host.loc[0, 'sn_kronSep_r'])
print('sn_kronSep_i = %10.6f' % Host.loc[0, 'sn_kronSep_i'])

Calculate the separation distance in elliptical radii, and add it to the `Host` table as `sn_R`.

In [None]:
temp = host_coords.spherical_offsets_to(sn_coords)
xr = 3600.0 * temp[0].deg
yr = 3600.0 * temp[1].deg
del temp
print('xr: %6.3f   yr: %6.3f   (%6.3f)' % (xr, yr, np.sqrt(xr**2 + yr**2)))

In [None]:
Ixx = Host.loc[0, 'shape_xx']
Iyy = Host.loc[0, 'shape_yy']
Ixy = Host.loc[0, 'shape_xy']
print('Ixx: %6.3f   Iyy: %6.3f   Ixy: %6.3f' % (Ixx, Iyy, Ixy))

In [None]:
# The denominator is Ixx*Iyy - Ixy^2, as in the equations of Section 1.
Cxx = Iyy / ((Ixx * Iyy) - Ixy**2)
Cyy = Ixx / ((Ixx * Iyy) - Ixy**2)
Cxy = -2.0 * (Ixy) / ((Ixx * Iyy) - Ixy**2)
print('Cxx: %7.4f   Cyy: %7.4f   Cxy: %7.4f' % (Cxx, Cyy, Cxy))

In [None]:
Host['sn_R'] = np.zeros(1, dtype='float')
Host.loc[0, 'sn_R'] = np.sqrt((Cxx * xr**2) + (Cyy * yr**2) + (Cxy * xr * yr))
print('sn_R = ', Host.loc[0, 'sn_R'])

In [None]:
del xr, yr, Ixx, Iyy, Ixy, Cxx, Cyy, Cxy

In [None]:
Host

**Summary:** For the true host, the separation distance in effective Kron radii is <3, but the separation distance in effective elliptical radii is almost 5.

### 2.2. Potential hosts

Retrieve all `Objects` within 1 arcminute of the true SNIa.

As usual with the `Objects` catalog, impose that `detect_isPrimary` be true.

Only retrieve extended objects (i.e., reject point sources, which are likely stars) by requiring that `refExtendedness` = 1.

In [None]:
%%time

str_snra = str(np.round(sn_ra, 6))
str_sndec = str(np.round(sn_dec, 6))
str_rdeg = str(np.round(60.0 / 3600.0, 6))
query = "SELECT objectId, coord_ra, coord_dec, x, y, refExtendedness, "\
        "shape_xx, shape_xy, shape_yy, r_kronRad, i_kronRad "\
        "FROM dp02_dc2_catalogs.Object "\
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "\
        "CIRCLE('ICRS', "+str_snra+", "+str_sndec+", "+str_rdeg+")) = 1 "\
        "AND detect_isPrimary = 1 AND refExtendedness = 1"
Obj = service.search(query).to_table().to_pandas()
del str_snra, str_sndec, str_rdeg, query
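
The string concatenation above can also be wrapped in a small helper so that the same cone search is easy to reuse. This is only a sketch: `cone_search_query` is a hypothetical function, not part of the RSP or any TAP library, and it only assembles the ADQL text.

```python
def cone_search_query(table, columns, ra_col, dec_col,
                      ra_deg, dec_deg, radius_arcsec, extra_where=()):
    """Assemble an ADQL cone-search string (hypothetical helper)."""
    radius_deg = radius_arcsec / 3600.0
    clauses = [
        f"CONTAINS(POINT('ICRS', {ra_col}, {dec_col}), "
        f"CIRCLE('ICRS', {ra_deg:.6f}, {dec_deg:.6f}, {radius_deg:.6f})) = 1"
    ]
    clauses.extend(extra_where)  # additional constraints joined with AND
    return (f"SELECT {', '.join(columns)} FROM {table} "
            f"WHERE {' AND '.join(clauses)}")


# Illustrative coordinates; the table and column names are those used above.
query = cone_search_query(
    'dp02_dc2_catalogs.Object', ['objectId', 'coord_ra', 'coord_dec'],
    'coord_ra', 'coord_dec', 57.5, -36.5, 60.0,
    extra_where=['detect_isPrimary = 1', 'refExtendedness = 1'])
print(query)
```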

In [None]:
# Obj

Calculate the 2-dimensional (scalar) offset between SNIa and each `Object`.

In [None]:
Obj['sn_offset'] = np.zeros(len(Obj), dtype='float')
for i in range(len(Obj)):
    coord_obj = SkyCoord(ra=Obj.loc[i, 'coord_ra']*u.degree,
                         dec=Obj.loc[i, 'coord_dec']*u.degree)
    Obj.loc[i, 'sn_offset'] = sn_coords.separation(coord_obj).arcsec
    del coord_obj
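
The row-by-row loop above is easy to read but slow for large tables. As a dependency-light cross-check, the separations for all objects can be computed at once with the haversine formula in NumPy. This is a sketch with hypothetical coordinates; the function name is not from any library, and `astropy`'s `SkyCoord.separation` remains the method used in this notebook.

```python
import numpy as np


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    h = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    return np.degrees(2.0 * np.arcsin(np.sqrt(h))) * 3600.0


# One reference position compared against an array of object positions.
obj_ra = np.array([57.5, 57.5, 57.51])
obj_dec = np.array([-36.5, -36.49, -36.5])
seps = angular_separation_arcsec(57.5, -36.5, obj_ra, obj_dec)
print(seps)
```

The third object differs only in RA by 0.01 degrees, yet its separation is smaller than 36 arcsec because of the cos(Dec) factor.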

Calculate the separation distance in Kron radii (in r- and i-band).

In [None]:
Obj['sn_kronSep_r'] = np.zeros(len(Obj), dtype='float')
Obj['sn_kronSep_i'] = np.zeros(len(Obj), dtype='float')
for i in range(len(Obj)):
    if np.isfinite(Obj['r_kronRad'][i]):
        Obj.loc[i, 'sn_kronSep_r'] = Obj['sn_offset'][i] / Obj['r_kronRad'][i]
    else:
        Obj.loc[i, 'sn_kronSep_r'] = 999.9
    if np.isfinite(Obj['i_kronRad'][i]):
        Obj.loc[i, 'sn_kronSep_i'] = Obj['sn_offset'][i] / Obj['i_kronRad'][i]
    else:
        Obj.loc[i, 'sn_kronSep_i'] = 999.9
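
The same sentinel logic can be written without a loop. The sketch below uses hypothetical arrays in place of the table columns; the additional requirement that the Kron radius be positive is an assumption added here to avoid division by zero, and is not part of the loop above.

```python
import numpy as np

offsets = np.array([1.0, 2.0, 3.0, 4.0])        # hypothetical offsets [arcsec]
kron_radii = np.array([2.0, np.nan, 0.0, 8.0])  # hypothetical Kron radii

# Only divide where the radius is finite and positive; keep 999.9 elsewhere.
usable = np.isfinite(kron_radii) & (kron_radii > 0.0)
kron_sep = np.full(offsets.shape, 999.9)
kron_sep[usable] = offsets[usable] / kron_radii[usable]

print(kron_sep)
```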

Calculate the separation distance in elliptical radii.

In [None]:
%%time

Obj['sn_R'] = np.zeros(len(Obj), dtype='float')
for i in range(len(Obj)):
    obj_coords = SkyCoord(ra=Obj.loc[i, 'coord_ra']*u.degree,
                          dec=Obj.loc[i, 'coord_dec']*u.degree)
    temp = obj_coords.spherical_offsets_to(sn_coords)
    xr = 3600.0 * temp[0].deg
    yr = 3600.0 * temp[1].deg
    del temp, obj_coords
    Ixx = Obj.loc[i, 'shape_xx']
    Iyy = Obj.loc[i, 'shape_yy']
    Ixy = Obj.loc[i, 'shape_xy']
    # The denominator is Ixx*Iyy - Ixy^2, as in the equations of Section 1.
    Cxx = Iyy / ((Ixx * Iyy) - Ixy**2)
    Cyy = Ixx / ((Ixx * Iyy) - Ixy**2)
    Cxy = -2.0 * (Ixy) / ((Ixx * Iyy) - Ixy**2)
    Obj.loc[i, 'sn_R'] = np.sqrt((Cxx * xr**2) + (Cyy * yr**2) + (Cxy * xr * yr))
    del Ixx, Iyy, Ixy, Cxx, Cyy, Cxy

Plot distributions of host offset measures, with the corresponding value for the true host as a vertical line.

In [None]:
fig = plt.figure(figsize=(4, 2))
plt.hist(Obj.loc[:, 'sn_offset'], bins=50, histtype='step', color='grey', label='nearby objects')
plt.axvline(Host.loc[0, 'sn_offset'], color='dodgerblue', label='true host')
plt.xlabel('SN Offset [arcsec]')
plt.ylabel('# Nearby Objects')
plt.legend(loc='best')
plt.show()

In [None]:
fig = plt.figure(figsize=(4, 2))
tx = np.where(Obj.loc[:, 'sn_kronSep_r'] < 900.0)[0]
plt.hist(Obj.loc[tx, 'sn_kronSep_r'], bins=50, histtype='step', color='orange', label='r-band')
del tx
tx = np.where(Obj.loc[:, 'sn_kronSep_i'] < 900.0)[0]
plt.hist(Obj.loc[tx, 'sn_kronSep_i'], bins=50, histtype='step', color='brown', label='i-band')
del tx
plt.axvline(Host.loc[0, 'sn_kronSep_r'], color='dodgerblue', lw=3, alpha=0.3, label='true host (r)')
plt.axvline(Host.loc[0, 'sn_kronSep_i'], color='dodgerblue', lw=1, ls='dashed', label='true host (i)')
plt.xlabel('SN Separation [Kron radii]')
plt.ylabel('# Nearby Objects')
plt.legend(loc='upper right')
plt.show()

In [None]:
fig = plt.figure(figsize=(4, 2))
plt.hist(Obj.loc[:, 'sn_R'], bins=50, histtype='step', color='grey', label='nearby objects')
plt.axvline(Host.loc[0, 'sn_R'], color='dodgerblue', label='true host')
plt.xlabel('SN Separation [elliptical radii]')
plt.ylabel('# Nearby Objects')
plt.legend(loc='best')
plt.show()

It is clear that for this SNIa, the true host is not in the top three by any measure. 

> **Important!** This is not a surprise! Remember that this particular SNIa was selected for exploration because it was expected to be a challenging case.

### 2.3. Visualize the environment

Create a cutout of the r-band `deepCoadd` at the location of the SNIa.

In [None]:
cutout = cutout_coadd(butler, sn_ra, sn_dec, band='r', cutoutSideLength=201)

Convert the SNIa coordinates into pixels.

In [None]:
sn_radec = geom.SpherePoint(sn_ra, sn_dec, geom.degrees)
wcs = cutout.getWcs()
sn_xy = geom.PointI(wcs.skyToPixel(sn_radec))

Display the cutout with a red plus for the SNIa location and a green circle for the host.

In [None]:
fig, ax = plt.subplots()
display = afwDisplay.Display(frame=fig)
display.scale('asinh', 'zscale')
display.mtv(cutout.image)
with display.Buffering():
    display.dot('+', sn_xy.getX(), sn_xy.getY(), ctype=afwDisplay.RED)
    display.dot('o', Host.loc[0, 'x'], Host.loc[0, 'y'], size=10, ctype='green')
plt.show()

Display the cutout as above, but add colored circles for the four `Objects` with the lowest effective Kron separations in the r-band. (Using the i-band identifies the same four `Objects`.)

In [None]:
sx = np.argsort(Obj.loc[:, 'sn_kronSep_r'])

fig, ax = plt.subplots()
display = afwDisplay.Display(frame=fig)
display.scale('asinh', 'zscale')
display.mtv(cutout.image)
with display.Buffering():
    display.dot('+', sn_xy.getX(), sn_xy.getY(), ctype=afwDisplay.RED)
    display.dot('o', Host.loc[0, 'x'], Host.loc[0, 'y'], size=10, ctype='green')
    display.dot('o', Obj.loc[sx[0], 'x'], Obj.loc[sx[0], 'y'], size=4, ctype='yellow')        
    display.dot('o', Obj.loc[sx[1], 'x'], Obj.loc[sx[1], 'y'], size=4, ctype='orange')        
    display.dot('o', Obj.loc[sx[2], 'x'], Obj.loc[sx[2], 'y'], size=4, ctype='red')        
    display.dot('o', Obj.loc[sx[3], 'x'], Obj.loc[sx[3], 'y'], size=4, ctype='magenta')     
plt.show()
del sx

Display the cutout as above, but add colored circles for the four `Objects` with the lowest effective elliptical separations.

In [None]:
sx = np.argsort(Obj.loc[:, 'sn_R'])

fig, ax = plt.subplots()
display = afwDisplay.Display(frame=fig)
display.scale('asinh', 'zscale')
display.mtv(cutout.image)
with display.Buffering():
    display.dot('+', sn_xy.getX(), sn_xy.getY(), ctype=afwDisplay.RED)
    display.dot('o', Host.loc[0, 'x'], Host.loc[0, 'y'], size=10, ctype='green')
    display.dot('o', Obj.loc[sx[0], 'x'], Obj.loc[sx[0], 'y'], size=4, ctype='yellow')        
    display.dot('o', Obj.loc[sx[1], 'x'], Obj.loc[sx[1], 'y'], size=4, ctype='orange')        
    display.dot('o', Obj.loc[sx[2], 'x'], Obj.loc[sx[2], 'y'], size=4, ctype='red')        
    display.dot('o', Obj.loc[sx[3], 'x'], Obj.loc[sx[3], 'y'], size=4, ctype='magenta')        
plt.show()
del sx

In [None]:
del sn_ra, sn_dec, sn_coords, sn_host_id, sn_host_objectId
del Host, host_coords, Obj
del cutout, sn_radec, wcs, sn_xy

## 3. Evaluate the frequency of host association errors

Select all true SNIa (`truth_type` = 3) within a 2-degree radius of RA, Dec = 57.5, -36.5 from the `TruthSummary` table.

In [None]:
%%time

query = "SELECT id_truth_type, id, ra, dec, truth_type, redshift, host_galaxy "\
        "FROM dp02_dc2_catalogs.TruthSummary "\
        "WHERE CONTAINS(POINT('ICRS', ra, dec), CIRCLE('ICRS', 57.5, -36.5, 2)) = 1 "\
        "AND truth_type = 3"
TrueSNIa = service.search(query).to_table().to_pandas()
del query

In [None]:
TrueSNIa

Use the `host_galaxy` column to retrieve the `Object` catalog identifiers from the `match_objectId` column from the `MatchesTruth` table.

In [None]:
tuple_string_hostId = '('
for i in range(len(TrueSNIa)):
    tuple_string_hostId += str(TrueSNIa.loc[i, 'host_galaxy'])
    if i < len(TrueSNIa)-1: tuple_string_hostId += ', '
    else: tuple_string_hostId += ')'
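
The same parenthesized list can be built with `str.join`, which avoids tracking the last element by hand. The identifiers below are hypothetical stand-ins for the `host_galaxy` column.

```python
host_ids = [1250001, 1250002, 1250003]  # hypothetical host_galaxy values

# Join the identifiers with commas and wrap them in parentheses for "WHERE id IN (...)".
tuple_string = '(' + ', '.join(str(h) for h in host_ids) + ')'

print(tuple_string)  # (1250001, 1250002, 1250003)
```

An empty list would produce `()`, which is not a valid `IN` clause, so it is worth checking that the list is non-empty before submitting the query.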

The following cell typically takes about a minute.

In [None]:
%%time

query = "SELECT id, match_objectId "\
        "FROM dp02_dc2_catalogs.MatchesTruth "\
        "WHERE id IN "+tuple_string_hostId+" "\
        "AND match_objectId >= 0"
TrueSNIaHosts = service.search(query).to_table().to_pandas()
del query

In [None]:
TrueSNIaHosts

Add the `match_objectId` to the `TrueSNIa` table as new column `host_objectId`.

In [None]:
%%time

TrueSNIa['host_objectId'] = np.zeros(len(TrueSNIa), dtype='int') - 1
for i in range(len(TrueSNIa)):
    tx = np.where(TrueSNIaHosts.loc[:, 'id'] == str(TrueSNIa.loc[i, 'host_galaxy']))[0]
    if len(tx) == 1:
        TrueSNIa.loc[i, 'host_objectId'] = TrueSNIaHosts.loc[tx[0], 'match_objectId']
    del tx
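
For large tables, the `np.where` search inside the loop can be replaced by a dictionary lookup. The sketch below uses hypothetical arrays in place of the two tables and assumes, as the comparison above does, that the `id` values are strings. Identifiers that appear more than once are left at -1, mirroring the `len(tx) == 1` condition. Keeping the values as integers throughout matters, because 64-bit identifiers of this size cannot be represented exactly as floats.

```python
from collections import Counter

import numpy as np

# Hypothetical stand-ins for TrueSNIa['host_galaxy'] and the MatchesTruth columns.
host_galaxy = np.array([11, 12, 13, 14])
match_id = np.array(['11', '13', '14', '14'])
match_object_id = np.array([1650000000000000001, 1650000000000000002,
                            1650000000000000003, 1650000000000000004],
                           dtype=np.int64)

# Keep only identifiers that occur exactly once in the match table.
counts = Counter(match_id.tolist())
lookup = {key: value
          for key, value in zip(match_id.tolist(), match_object_id.tolist())
          if counts[key] == 1}

# Unmatched or ambiguous hosts get -1.
host_object_id = np.array([lookup.get(str(h), -1) for h in host_galaxy.tolist()],
                          dtype=np.int64)
print(host_object_id)
```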

In [None]:
TrueSNIa

In [None]:
tx = np.where(TrueSNIa.loc[:, 'host_objectId'] >= 0)[0]
print('Number of TrueSNIa with a true host galaxy matched to the Object table: ', len(tx))
del tx

In [None]:
del TrueSNIaHosts

In [None]:
# TrueSNIa

Retrieve all the `Objects` that are in the same region as the true SNIa.

Impose that `detect_isPrimary` be true.

However, do not exclude objects that are point sources (i.e., do NOT require that `refExtendedness` be = 1).

This is a big query and it takes a few minutes.

In [None]:
%%time

query = "SELECT objectId, coord_ra, coord_dec, x, y, refExtendedness, "\
        "shape_xx, shape_xy, shape_yy, r_kronRad, i_kronRad "\
        "FROM dp02_dc2_catalogs.Object "\
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "\
        "CIRCLE('ICRS', 57.5, -36.5, 2)) = 1 "\
        "AND detect_isPrimary = 1"
Obj = service.search(query).to_table().to_pandas()

In [None]:
# Obj

### 3.1. Characterize all true hosts

Calculate the separations for the true hosts.

In [None]:
TrueSNIa['host_detected'] = np.zeros(len(TrueSNIa), dtype='int')
TrueSNIa['host_offset'] = np.zeros(len(TrueSNIa), dtype='float')
TrueSNIa['host_kronSep'] = np.zeros(len(TrueSNIa), dtype='float')
TrueSNIa['host_ellRad'] = np.zeros(len(TrueSNIa), dtype='float')

This cell takes a couple of minutes.

In [None]:
%%time

for i in range(len(TrueSNIa)):
    
    sn_coords = SkyCoord(ra=TrueSNIa.loc[i, 'ra']*u.degree, 
                         dec=TrueSNIa.loc[i, 'dec']*u.degree)
    
    if TrueSNIa.loc[i, 'host_objectId'] >= 0:
        cx = np.where(TrueSNIa.loc[i, 'host_objectId'] == Obj.loc[:, 'objectId'])[0]
        if len(cx) == 1:
            TrueSNIa.loc[i, 'host_detected'] = 1
            
            host_coords = SkyCoord(ra=Obj.loc[cx[0], 'coord_ra']*u.degree,
                                   dec=Obj.loc[cx[0], 'coord_dec']*u.degree)
            
            offset = float(sn_coords.separation(host_coords).arcsec)
            kronRad = float(Obj.loc[cx[0], 'r_kronRad'])
            
            TrueSNIa.loc[i, 'host_offset'] = offset
            TrueSNIa.loc[i, 'host_kronSep'] = offset / kronRad
            
            Ixx = float(Obj.loc[cx[0], 'shape_xx'])
            Iyy = float(Obj.loc[cx[0], 'shape_yy'])
            Ixy = float(Obj.loc[cx[0], 'shape_xy'])
            
            temp = host_coords.spherical_offsets_to(sn_coords)
            xr = 3600.0 * temp[0].deg
            yr = 3600.0 * temp[1].deg
            # The denominator is Ixx*Iyy - Ixy^2, as in the equations of Section 1.
            Cxx = Iyy / ((Ixx * Iyy) - Ixy**2)
            Cyy = Ixx / ((Ixx * Iyy) - Ixy**2)
            Cxy = -2.0 * (Ixy) / ((Ixx * Iyy) - Ixy**2)
            TrueSNIa.loc[i, 'host_ellRad'] = np.sqrt((Cxx * xr**2) + (Cyy * yr**2) + (Cxy * xr * yr))
            del Cxx, Cyy, Cxy, xr, yr, temp
            
            del Ixx, Iyy, Ixy, kronRad, offset, host_coords
        del cx
    del sn_coords

In [None]:
TrueSNIa

Plot the distributions of true host offsets.

In [None]:
tx = np.where(TrueSNIa.loc[:, 'host_detected'] >= 1)[0]

In [None]:
fig = plt.figure(figsize=(4, 2))
plt.hist(np.log10(TrueSNIa.loc[tx, 'host_offset']), bins=50, histtype='step', log=True, color='grey')
plt.xlabel('log10(Host Offset) [arcsec]')
plt.ylabel('# SNIa')
plt.show()

In [None]:
fig = plt.figure(figsize=(4, 2))
plt.hist(TrueSNIa.loc[tx, 'host_kronSep'], bins=50, histtype='step', log=True, color='orange')
plt.xlabel('Host Separation [Kron radii]')
plt.ylabel('# SNIa')
plt.show()

In [None]:
fig = plt.figure(figsize=(4, 2))
plt.hist(TrueSNIa.loc[tx, 'host_ellRad'], bins=50, histtype='step', log=True, color='orange')
plt.xlabel('Host Separation [elliptical radii]')
plt.ylabel('# SNIa')
plt.show()

In [None]:
del tx

### 3.2. Characterize all associated Objects (potential hosts)

For each true SNIa that has a true host which was matched to the `Object` catalog (i.e., `TrueSNIa['host_objectId']` >= 0), identify whether the true host is the most nearby or in the top three nearby galaxies, via the three separation distances.

The following are = 1 if the true host is the `Obj` with the lowest:
 * `host_nearest_offset` : two-dimensional sky distance
 * `host_nearest_kronSep` : r-band _effective_ Kron radius (Kron separation)
 * `host_nearest_ellRad` : _effective_ elliptical radius (based on second moments)

The following are = 1 if the true host is within the top three `Obj` with the lowest:
 * `host_topthree_offset` : two-dimensional sky distance
 * `host_topthree_kronSep` : r-band _effective_ Kron radius (Kron separation)
 * `host_topthree_ellRad` : _effective_ elliptical radius (based on second moments)
 
The above columns are = 0 if the true host was matched to the `Object` catalog, but wasn't the nearest or in the top three, and they are = -1 if the true host was not matched to the `Object` catalog.

In [None]:
TrueSNIa['host_nearest_offset'] = np.zeros(len(TrueSNIa), dtype='int') - 1
TrueSNIa['host_nearest_kronSep'] = np.zeros(len(TrueSNIa), dtype='int') - 1
TrueSNIa['host_nearest_ellRad'] = np.zeros(len(TrueSNIa), dtype='int') - 1
TrueSNIa['host_topthree_offset'] = np.zeros(len(TrueSNIa), dtype='int') - 1
TrueSNIa['host_topthree_kronSep'] = np.zeros(len(TrueSNIa), dtype='int') - 1
TrueSNIa['host_topthree_ellRad'] = np.zeros(len(TrueSNIa), dtype='int') - 1

In [None]:
tx = np.where(TrueSNIa.loc[:, 'host_objectId'] >= 0)[0]
TrueSNIa.loc[tx, 'host_nearest_offset'] = int(0)
TrueSNIa.loc[tx, 'host_nearest_kronSep'] = int(0)
TrueSNIa.loc[tx, 'host_nearest_ellRad'] = int(0)
TrueSNIa.loc[tx, 'host_topthree_offset'] = int(0)
TrueSNIa.loc[tx, 'host_topthree_kronSep'] = int(0)
TrueSNIa.loc[tx, 'host_topthree_ellRad'] = int(0)
del tx

In [None]:
# The denominator is Ixx*Iyy - Ixy^2, as in the equations of Section 1.
Obj['Cxx'] = Obj['shape_yy'] / \
            ((Obj['shape_xx'] * Obj['shape_yy']) - Obj['shape_xy']**2)
Obj['Cyy'] = Obj['shape_xx'] / \
            ((Obj['shape_xx'] * Obj['shape_yy']) - Obj['shape_xy']**2)
Obj['Cxy'] = -2.0 * Obj['shape_xy'] / \
            ((Obj['shape_xx'] * Obj['shape_yy']) - Obj['shape_xy']**2)

This computation is intense, and can take a while.

Use `temp_N` to restrict the computation to a subset of the SNIa.

For 5000 SNIa, it takes about 4 minutes to compute.

In [None]:
temp_N = 5000

In [None]:
%%time

TrueSNIa['host_assoc_attempt'] = np.zeros(len(TrueSNIa), dtype='int')

for i in range(min(temp_N, len(TrueSNIa))):

    if TrueSNIa.loc[i, 'host_objectId'] >= 0:
        temp_hostId = TrueSNIa.loc[i, 'host_objectId']
        sn_coords = SkyCoord(ra=TrueSNIa.loc[i, 'ra']*u.degree, dec=TrueSNIa.loc[i, 'dec']*u.degree)
        
        temp_limit = 10.0 / 3600.0
        if TrueSNIa.loc[i, 'host_offset'] > 1.0:
            temp_limit = 10.0 * TrueSNIa.loc[i, 'host_offset'] / 3600.0
        tx = np.where((np.abs(Obj.loc[:, 'coord_ra'] - TrueSNIa.loc[i, 'ra']) < temp_limit)
                     & (np.abs(Obj.loc[:, 'coord_dec'] - TrueSNIa.loc[i, 'dec']) < temp_limit))[0]
        
        if len(tx) < 3:
            temp_limit = 90.0 / 3600.0
            tx = np.where((np.abs(Obj.loc[:, 'coord_ra'] - TrueSNIa.loc[i, 'ra']) < temp_limit)
                         & (np.abs(Obj.loc[:, 'coord_dec'] - TrueSNIa.loc[i, 'dec']) < temp_limit))[0]
                    
        if len(tx) >= 3:
            TrueSNIa.loc[i, 'host_assoc_attempt'] = 1

            coords = SkyCoord(ra=Obj.loc[tx, 'coord_ra'].values*u.degree,
                              dec=Obj.loc[tx, 'coord_dec'].values*u.degree)
            offsets = np.asarray(sn_coords.separation(coords).arcsec, dtype='float')    
            kronSeps = np.asarray(offsets / Obj.loc[tx, 'r_kronRad'].values, dtype='float')
            temps = coords.spherical_offsets_to(sn_coords)
            xr = 3600.0 * temps[0].deg
            yr = 3600.0 * temps[1].deg
            ellRads = np.sqrt((Obj.loc[tx, 'Cxx'].values * xr**2) + \
                              (Obj.loc[tx, 'Cyy'].values * yr**2) + \
                              (Obj.loc[tx, 'Cxy'].values * xr * yr))
            
            sx1 = np.argsort(offsets)
            sx2 = np.argsort(kronSeps)
            sx3 = np.argsort(ellRads)
                                    
            if (Obj.loc[tx[sx1[0]], 'objectId'] == temp_hostId) \
            | (Obj.loc[tx[sx1[1]], 'objectId'] == temp_hostId) \
            | (Obj.loc[tx[sx1[2]], 'objectId'] == temp_hostId):
                TrueSNIa.loc[i, 'host_topthree_offset'] = 1
            if (Obj.loc[tx[sx1[0]], 'objectId'] == temp_hostId):
                TrueSNIa.loc[i, 'host_nearest_offset'] = 1
                
            if (Obj.loc[tx[sx2[0]], 'objectId'] == temp_hostId) \
            | (Obj.loc[tx[sx2[1]], 'objectId'] == temp_hostId) \
            | (Obj.loc[tx[sx2[2]], 'objectId'] == temp_hostId):
                TrueSNIa.loc[i, 'host_topthree_kronSep'] = 1
            if (Obj.loc[tx[sx2[0]], 'objectId'] == temp_hostId):
                TrueSNIa.loc[i, 'host_nearest_kronSep'] = 1
            
            if (Obj.loc[tx[sx3[0]], 'objectId'] == temp_hostId) \
            | (Obj.loc[tx[sx3[1]], 'objectId'] == temp_hostId) \
            | (Obj.loc[tx[sx3[2]], 'objectId'] == temp_hostId):
                TrueSNIa.loc[i, 'host_topthree_ellRad'] = 1
            if (Obj.loc[tx[sx3[0]], 'objectId'] == temp_hostId):
                TrueSNIa.loc[i, 'host_nearest_ellRad'] = 1
                
            del coords, offsets, kronSeps, temps, xr, yr, ellRads
            del sx1, sx2, sx3
                
        del temp_limit, sn_coords, tx, temp_hostId
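
The nearest and top-three bookkeeping in the loop above repeats the same pattern for each separation measure. It can be condensed into one helper, sketched here with hypothetical identifiers and separations; `rank_flags` is an illustrative name, not an existing function.

```python
import numpy as np


def rank_flags(object_ids, separations, host_id, top_n=3):
    """Return (is_nearest, in_top_n) for host_id, ranking by ascending separation."""
    order = np.argsort(separations)  # NaN separations sort to the end
    ranked_ids = np.asarray(object_ids)[order]
    is_nearest = bool(ranked_ids[0] == host_id)
    in_top_n = bool(np.isin(host_id, ranked_ids[:top_n]))
    return is_nearest, in_top_n


ids = np.array([101, 102, 103, 104, 105])
seps = np.array([4.0, 0.7, np.nan, 1.5, 0.9])  # one object has no valid separation

print(rank_flags(ids, seps, 102))  # (True, True)
print(rank_flags(ids, seps, 104))  # (False, True)
print(rank_flags(ids, seps, 101))  # (False, False)
```

Because `np.argsort` places NaN values last, objects without a valid Kron radius or shape measurement are never ranked ahead of objects with finite separations.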

In [None]:
# TrueSNIa

Show the number of true SNIa for which host association was attempted.

In [None]:
tx = np.where(TrueSNIa['host_assoc_attempt'] > 0)[0]
print(len(tx))
del tx

### 3.3. Fraction of true hosts associated with their SNIa

Plot, as a function of redshift, the number of true SNIa for which the true host was the nearest object, or in the top three, by each of the separation measures.

In [None]:
tx = np.where((TrueSNIa.loc[:, 'host_assoc_attempt'] > 0))[0]
tx1 = np.where(TrueSNIa.loc[tx, 'host_nearest_offset'] == 1)[0]
tx2 = np.where(TrueSNIa.loc[tx, 'host_topthree_offset'] == 1)[0]
tx3 = np.where(TrueSNIa.loc[tx, 'host_nearest_kronSep'] == 1)[0]
tx4 = np.where(TrueSNIa.loc[tx, 'host_topthree_kronSep'] == 1)[0]
tx5 = np.where(TrueSNIa.loc[tx, 'host_nearest_ellRad'] == 1)[0]
tx6 = np.where(TrueSNIa.loc[tx, 'host_topthree_ellRad'] == 1)[0]

In [None]:
fig = plt.figure(figsize=(5, 3))
plt.hist(TrueSNIa.loc[tx, 'redshift'], bins=20, histtype='step', 
         log=False, color='grey', label='total')
plt.hist(TrueSNIa.loc[tx[tx1], 'redshift'], bins=20, histtype='step', 
         log=False, color='dodgerblue', label='offset')
plt.hist(TrueSNIa.loc[tx[tx3], 'redshift'], bins=20, histtype='step', 
         log=False, color='darkorange', label='kron radii')
plt.hist(TrueSNIa.loc[tx[tx5], 'redshift'], bins=20, histtype='step', 
         log=False, color='brown', label='elliptical radii')
plt.xlabel('Redshift')
plt.ylabel('# SNIa')
plt.title('The True Host was the Nearest Object')
plt.legend(loc='upper left')
plt.show()

In [None]:
fig = plt.figure(figsize=(5, 3))
plt.hist(TrueSNIa.loc[tx, 'redshift'], bins=20, histtype='step', 
         log=False, color='grey', label='total')
plt.hist(TrueSNIa.loc[tx[tx2], 'redshift'], bins=20, histtype='step', 
         log=False, color='dodgerblue', label='offset')
plt.hist(TrueSNIa.loc[tx[tx4], 'redshift'], bins=20, histtype='step', 
         log=False, color='darkorange', label='kron radii')
plt.hist(TrueSNIa.loc[tx[tx6], 'redshift'], bins=20, histtype='step', 
         log=False, color='brown', label='elliptical radii')
plt.xlabel('Redshift')
plt.ylabel('# SNIa')
plt.title('The True Host was in the Top Three')
plt.legend(loc='upper left')
plt.show()

In [None]:
del tx, tx1, tx2, tx3, tx4, tx5, tx6

Print out the fraction of SNIa for which the true host was the nearest, or in the top three, by redshift bin.

In [None]:
zbins = [[0.0, 0.2], [0.2, 0.4], [0.4, 0.6], [0.6, 0.8], [0.8, 1.0]]

print('          offset              kron radii           elliptical radii')
print('z1  z2    nearest  topthree   nearest  topthree    nearest topthree')
for zbin in zbins:
    # SNIa in this redshift bin for which an association was attempted.
    tx = np.where((TrueSNIa.loc[:, 'redshift'] >= zbin[0])
                  & (TrueSNIa.loc[:, 'redshift'] < zbin[1])
                  & (TrueSNIa.loc[:, 'host_assoc_attempt'] > 0))[0]
    # Skip empty bins to avoid dividing by zero below.
    if len(tx) == 0:
        continue
    tx1 = np.where(TrueSNIa.loc[tx, 'host_nearest_offset'] == 1)[0]
    tx2 = np.where(TrueSNIa.loc[tx, 'host_topthree_offset'] == 1)[0]
    tx3 = np.where(TrueSNIa.loc[tx, 'host_nearest_kronSep'] == 1)[0]
    tx4 = np.where(TrueSNIa.loc[tx, 'host_topthree_kronSep'] == 1)[0]
    tx5 = np.where(TrueSNIa.loc[tx, 'host_nearest_ellRad'] == 1)[0]
    tx6 = np.where(TrueSNIa.loc[tx, 'host_topthree_ellRad'] == 1)[0]

    print('%3.1f %3.1f   %4.2f     %4.2f       %4.2f     %4.2f        %4.2f     %4.2f' %
          (zbin[0], zbin[1],
           len(tx1)/len(tx), len(tx2)/len(tx),
           len(tx3)/len(tx), len(tx4)/len(tx),
           len(tx5)/len(tx), len(tx6)/len(tx)))

When I originally ran Section 4 with `temp_N = 5000`, the above table values were as follows.

> **Important!** Although the above demonstration shows that 20-30% of the low-redshift SNIa do not have their true host identified as the nearest galaxy by these three measures, the true host _is_ included in the top three at least 90% of the time.

> And the above demonstration is very simple! In practice, when doing science with SNIa and their host galaxies, association algorithms incorporate SNIa and host galaxy information as priors, such as brightness, redshift, and morphology. This notebook has not tried to do anything like that.
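
The per-bin fractions printed by the loop earlier in this subsection could also be computed without explicit index arrays. The following is only a sketch on invented data, not the notebook's method: the `toy` table stands in for `TrueSNIa`, and it assumes the flag columns hold 0 or 1, as the `== 1` tests in the loop suggest.

```python
import pandas as pd

# Toy stand-in for TrueSNIa: the column names follow the notebook,
# but these six rows are invented for illustration only.
toy = pd.DataFrame({
    'redshift': [0.05, 0.15, 0.25, 0.35, 0.45, 0.55],
    'host_assoc_attempt': [1, 1, 1, 0, 1, 1],
    'host_nearest_offset': [1, 0, 1, 1, 0, 0],
    'host_topthree_offset': [1, 1, 1, 1, 1, 0],
})

# Keep only SNIa for which an association was attempted.
attempted = toy[toy['host_assoc_attempt'] > 0]

# Left-closed bins [0.0, 0.2), [0.2, 0.4), ... match the >= and < tests in the loop.
edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
zbin = pd.cut(attempted['redshift'], bins=edges, right=False)

# The mean of a 0/1 flag column is the fraction of rows that are flagged.
# Bins without any SNIa come out as NaN instead of raising ZeroDivisionError.
flag_cols = ['host_nearest_offset', 'host_topthree_offset']
fractions = attempted.groupby(zbin, observed=False)[flag_cols].mean()
print(fractions.round(2))
```

With the real table, `flag_cols` would also list the `kronSep` and `ellRad` flag columns.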

In [None]:
del TrueSNIa, Obj

<br>
<br>

# Appendix

## A. Find a low-z SNIa with a large offset from a big bright host

Start with all true SNIa with z < 0.2 within a 2-degree radius of (RA, Dec) = (57.5, -36.5), near the center of the DC2 simulation area.

In [None]:
%%time

query = "SELECT id_truth_type, id, ra, dec, truth_type, redshift, host_galaxy "\
        "FROM dp02_dc2_catalogs.TruthSummary "\
        "WHERE CONTAINS(POINT('ICRS', ra, dec), CIRCLE('ICRS', 57.5, -36.5, 2)) = 1 "\
        "AND truth_type = 3 AND redshift < 0.2"
TrueSNIa = service.search(query).to_table().to_pandas()
del query

In [None]:
# TrueSNIa

Make a tuple-formatted string of the `host_galaxy` column, whose values correspond to the `id` column in both the `TruthSummary` and `MatchesTruth` tables.

In [None]:
# Join the host identifiers into a string such as '(id1, id2, id3)'.
tuple_string_hostId = '(' + ', '.join(
    str(host_id) for host_id in TrueSNIa['host_galaxy']) + ')'

In [None]:
# print(tuple_string_hostId)

Use the `MatchesTruth` table to retrieve the `objectId` for the hosts.

In [None]:
%%time

query = "SELECT id, truth_type, id_truth_type, match_objectId "\
        "FROM dp02_dc2_catalogs.MatchesTruth "\
        "WHERE id IN "+tuple_string_hostId
TrueSNIaHosts = service.search(query).to_table().to_pandas()
del query

In [None]:
# TrueSNIaHosts

Make a tuple-formatted string of the `match_objectId` column. 

One of the hosts has `match_objectId` = `<NA>` (no matched `Object`), so skip it.

In [None]:
# Skip hosts without a matched Object (match_objectId is <NA>).
# Collecting the valid identifiers first guarantees a well-formed string,
# even when the skipped host is the last row of the table.
valid_objectIds = [str(objId) for objId in TrueSNIaHosts['match_objectId']
                   if str(objId) != '<NA>']
tuple_string_objectId = '(' + ', '.join(valid_objectIds) + ')'
del valid_objectIds

In [None]:
# print(tuple_string_objectId)
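
Both tuple strings above follow the same pattern, so they could be built by one small helper. The sketch below is illustrative only: `format_adql_in_list` is a hypothetical name, not part of the notebook or of any TAP library, and the identifiers are invented. It assumes missing matches appear as pandas `<NA>` values, as in the `match_objectId` column.

```python
import pandas as pd


def format_adql_in_list(values):
    """Return a string such as '(1, 2, 3)', skipping missing entries.

    Hypothetical helper for illustration; not part of the notebook.
    """
    kept = [str(v) for v in values if not pd.isna(v)]
    if len(kept) == 0:
        raise ValueError('no valid identifiers to put in the IN list')
    return '(' + ', '.join(kept) + ')'


# Invented identifiers; the nullable Int64 dtype mimics a missing match.
toy_ids = pd.Series([1001, pd.NA, 1003], dtype='Int64')
in_list = format_adql_in_list(toy_ids)
print(in_list)
```

Raising on an empty list is a design choice made here so that a malformed `IN ()` clause is never sent to the TAP service.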

Retrieve measurements for the host galaxies from the `Object` table.

In [None]:
%%time

query = "SELECT objectId, coord_ra, coord_dec, footprintArea, "\
        "scisql_nanojanskyToAbMag(r_cModelFlux) as r_cModelMag "\
        "FROM dp02_dc2_catalogs.Object "\
        "WHERE objectId IN "+tuple_string_objectId
Obj = service.search(query).to_table().to_pandas()
del query

In [None]:
# Obj

Add the host information to the `TrueSNIa` table.

In [None]:
TrueSNIa['host_objectId'] = np.zeros(len(TrueSNIa), dtype='int')
TrueSNIa['host_ra'] = np.zeros(len(TrueSNIa), dtype='float')
TrueSNIa['host_dec'] = np.zeros(len(TrueSNIa), dtype='float')
TrueSNIa['host_footprintArea'] = np.zeros(len(TrueSNIa), dtype='float')
TrueSNIa['host_r_cModelMag'] = np.zeros(len(TrueSNIa), dtype='float')
TrueSNIa['host_offset'] = np.zeros(len(TrueSNIa), dtype='float')

for i in range(len(TrueSNIa)):
    str_id = str(TrueSNIa.loc[i, 'host_galaxy'])
    tx = np.where(str_id == TrueSNIaHosts.loc[:, 'id'])[0]
    if len(tx) == 1:
        tx2 = np.where(TrueSNIaHosts.loc[tx[0], 'match_objectId'] == Obj.loc[:, 'objectId'])[0]
        if len(tx2) == 1:
            TrueSNIa.loc[i, 'host_objectId'] = Obj.loc[tx2[0], 'objectId']
            TrueSNIa.loc[i, 'host_ra'] = Obj.loc[tx2[0], 'coord_ra']
            TrueSNIa.loc[i, 'host_dec'] = Obj.loc[tx2[0], 'coord_dec']
            TrueSNIa.loc[i, 'host_footprintArea'] = Obj.loc[tx2[0], 'footprintArea']
            TrueSNIa.loc[i, 'host_r_cModelMag'] = Obj.loc[tx2[0], 'r_cModelMag']
            coord_SNIa = SkyCoord(ra=TrueSNIa.loc[i, 'ra']*u.degree, 
                                  dec=TrueSNIa.loc[i, 'dec']*u.degree)
            coord_host = SkyCoord(ra=Obj.loc[tx2[0], 'coord_ra']*u.degree, 
                                  dec=Obj.loc[tx2[0], 'coord_dec']*u.degree)
            offset = coord_SNIa.separation(coord_host)
            TrueSNIa.loc[i, 'host_offset'] = offset.arcsec
            del coord_SNIa, coord_host, offset
        del tx2
    del tx

In [None]:
# TrueSNIa
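
The row-by-row loop above could, as an alternative, be expressed as two table joins followed by a vectorized separation. This is a sketch under stated assumptions rather than the notebook's method: the three small tables are invented stand-ins for `TrueSNIa`, `TrueSNIaHosts`, and `Obj`, the helper `angular_sep_arcsec` is hypothetical and uses the haversine formula instead of `SkyCoord.separation`, and `id` is assumed to be stored as a string because the loop compares it with `str(host_galaxy)`.

```python
import numpy as np
import pandas as pd


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = (np.sin((dec2 - dec1) / 2.0)**2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2.0)**2)
    return np.degrees(2.0 * np.arcsin(np.sqrt(a))) * 3600.0


# Invented rows standing in for TrueSNIa, TrueSNIaHosts, and Obj.
sn = pd.DataFrame({'host_galaxy': [11, 12, 13],
                   'ra': [57.500, 57.600, 57.700],
                   'dec': [-36.500, -36.400, -36.300]})
hosts = pd.DataFrame({'id': ['11', '12', '13'],
                      'match_objectId': pd.array([901, pd.NA, 903],
                                                 dtype='Int64')})
obj = pd.DataFrame({'objectId': pd.array([901, 903], dtype='Int64'),
                    'coord_ra': [57.500, 57.700],
                    'coord_dec': [-36.499, -36.300]})

# Cast host_galaxy to a string so that it can be joined on the string id.
sn['host_id_str'] = sn['host_galaxy'].astype(str)
merged = (sn.merge(hosts, how='left', left_on='host_id_str', right_on='id')
            .merge(obj, how='left', left_on='match_objectId',
                   right_on='objectId'))

# Rows without a matched Object keep NaN coordinates, so their offset is NaN.
merged['host_offset'] = angular_sep_arcsec(merged['ra'], merged['dec'],
                                           merged['coord_ra'],
                                           merged['coord_dec'])
print(merged[['host_galaxy', 'objectId', 'host_offset']])
```

One difference from the loop: unmatched hosts end up with NaN here rather than the zeros used to initialize the columns, so later selections such as `host_offset > 0.001` would still exclude them.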

Plot the host galaxy brightness vs. footprint area.

In [None]:
fig = plt.figure(figsize=(4, 2))
tx = np.where((TrueSNIa.loc[:, 'host_footprintArea'] > 1.0) 
              & (TrueSNIa.loc[:, 'host_r_cModelMag'] > 1.0))[0]
plt.plot(np.log10(TrueSNIa.loc[tx, 'host_footprintArea']), 
         TrueSNIa.loc[tx, 'host_r_cModelMag'], 'o', alpha=0.3)
plt.xlabel('log10(host_footprintArea)')
plt.ylabel('host_r_cModelMag')
plt.ylim([22, 14])  # inverted axis: brighter hosts appear at the top
plt.show()
del tx

Plot the SNIa offset from its host galaxy vs. redshift.

In [None]:
fig = plt.figure(figsize=(4, 2))
tx = np.where(TrueSNIa.loc[:, 'host_offset'] > 0.001)[0]
plt.plot(TrueSNIa.loc[tx, 'redshift'], 
         np.log10(TrueSNIa.loc[tx, 'host_offset']), 'o', alpha=0.3)
plt.xlabel('redshift')
plt.ylabel('log10(host_offset)')
plt.show()
del tx

Select SNIa that are well-offset from their large bright (but not too bright) host.

In [None]:
tx = np.where((TrueSNIa.loc[:, 'host_footprintArea'] > 25000)
              & (TrueSNIa.loc[:, 'host_r_cModelMag'] < 18)
              & (TrueSNIa.loc[:, 'host_r_cModelMag'] > 17)
              & (TrueSNIa.loc[:, 'host_offset'] > 10.0))[0]

for x in tx:
    str_snra = str(np.round(float(TrueSNIa.loc[x, 'ra']), 6))
    str_sndec = str(np.round(float(TrueSNIa.loc[x, 'dec']), 6))
    str_rdeg = str(np.round(float(TrueSNIa.loc[x, 'host_offset'] / 3600.0), 6))
    query = "SELECT objectId "\
            "FROM dp02_dc2_catalogs.Object "\
            "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "\
            "CIRCLE('ICRS', "+str_snra+", "+str_sndec+", "+str_rdeg+")) = 1 "\
            "AND detect_isPrimary = 1 AND refExtendedness = 1"
    results = service.search(query)
    del query
    print('%3i %13s %8.6f %7.1f %5.2f %5.2f %3i' % 
          (x, TrueSNIa.loc[x, 'id_truth_type'], TrueSNIa.loc[x, 'redshift'], 
           TrueSNIa.loc[x, 'host_footprintArea'], TrueSNIa.loc[x, 'host_r_cModelMag'],
           TrueSNIa.loc[x, 'host_offset'], len(results)))
    del results, str_snra, str_sndec, str_rdeg

del tx
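
The query string assembled inside the loop above could be factored into a small function, which makes the unit conversion of the radius explicit. This is a minimal sketch: `build_cone_count_query` is a hypothetical name, the table and column names are copied from the loop, and the example coordinates and offset are invented.

```python
def build_cone_count_query(ra_deg, dec_deg, radius_arcsec):
    """Return an ADQL cone search for extended primary Objects.

    Hypothetical helper for illustration; not part of the notebook.
    """
    # ADQL CIRCLE takes its radius in degrees, host_offset is in arcsec.
    radius_deg = radius_arcsec / 3600.0
    return (
        "SELECT objectId "
        "FROM dp02_dc2_catalogs.Object "
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
        f"CIRCLE('ICRS', {ra_deg:.6f}, {dec_deg:.6f}, {radius_deg:.6f})) = 1 "
        "AND detect_isPrimary = 1 AND refExtendedness = 1"
    )


# Invented coordinates and offset, for illustration only.
example_query = build_cone_count_query(57.5, -36.5, 18.0)
print(example_query)
```

The fixed `.6f` format pads with trailing zeros, unlike `str(np.round(..., 6))` in the loop, but both give the same numerical values to the service.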

Let's choose to explore SNIa MS_9684_23_3 in Section 2 above.

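There are 16 other extended (non-point-source) objects that lie closer to it on the sky than its host does, i.e., within a circle centered on the SNIa with a radius equal to the host offset.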

Clean up the arrays that are not needed anymore.

In [None]:
del TrueSNIa, TrueSNIaHosts, Obj