<b>DiaObject / TruthSummary Catalog Matching</b> <br>
Contact author: Douglas Tucker <br>
Last verified to run: 2022-12-08<br>
LSST Science Pipelines version: Weekly 2022_40 <br>
Container Size: medium <br>

**Description:** This notebook was prompted by a discussion in the Community Forum concerning the identification of candidate SNe Ia in the DP0.2 data set, as described in the DP0.2 Tutorial Notebook <a href="https://github.com/rubin-dp0/tutorial-notebooks/blob/main/07a_DiaObject_Samples.ipynb">07a_DiaObject_Samples.ipynb</a>.  Here, we use the method described in that notebook to identify candidate SNe Ia from the DP0.2 DiaObject and DiaSource tables, match the resulting catalog of candidate DP0.2 SNe Ia to the TruthSummary table that is a basis of the original simulated data, and then analyze the results.

In short, we find that about 95% (313 of 331) of the candidate SNe Ia identified in DP0.2 are well matched (within 1 arcsec) to SNe Ia in the TruthSummary table.

We also look at the inverse problem:  how many of the variable objects (variable stars and SNe Ia) from the TruthSummary table are detected in the DiaObject table?  Here we find the not-unexpected result that many of the variable sources in the much deeper TruthSummary table are too faint to be detected by the DIA pipeline.

For the catalog matching step, we make use of the astropy function `match_coordinates_sky`, a fast, KD-tree-based method for matching two catalogs.  For those interested just in the catalog-matching step, please see Sections 2.3 and 3.3 below.

**Credit:** The material for identifying candidate SNeIa from the DiaObject and DiaSource tables was developed by Melissa Graham for the DP0.2 Tutorial Notebook <a href="https://github.com/rubin-dp0/tutorial-notebooks/blob/main/07a_DiaObject_Samples.ipynb">07a_DiaObject_Samples.ipynb</a>.

## 1. Setup

Import packages.

In [None]:
import time
from IPython.display import Image

import math
from astropy.units import UnitsWarning
from astropy.coordinates import SkyCoord
from astropy.coordinates import match_coordinates_sky
import astropy.units as u
from astropy.table import Table
import pandas as pd
import numpy
import matplotlib.pyplot as plt

from lsst.rsp import get_tap_service

from astropy.cosmology import FlatLambdaCDM

Set plotting parameters.

In [None]:
plt.style.use('tableau-colorblind10')

Set the cosmology to use with the astropy.cosmology package.

In [None]:
cosmo = FlatLambdaCDM(H0=70, Om0=0.3)

Set identification parameters for candidate 0.1 < z < 0.3 SNe Ia, per DP0.2 Tutorial Notebook <a href="https://github.com/rubin-dp0/tutorial-notebooks/blob/main/07a_DiaObject_Samples.ipynb">07a_DiaObject_Samples.ipynb</a>.

In [None]:
redshift_min = 0.1
redshift_max = 0.3

snia_peak_mag = -19.0
snia_peak_mag_range = 0.5

snia_peak_mr_min = cosmo.distmod(redshift_min).value + snia_peak_mag - snia_peak_mag_range
snia_peak_mr_max = cosmo.distmod(redshift_max).value + snia_peak_mag + snia_peak_mag_range

snia_peak_mg_max = 24.0
snia_peak_mi_max = 24.0

snia_ampl_mr_min = 1.5
snia_ampl_mr_max = 5.5

snia_nDiaSources_min = 15
snia_nDiaSources_max = 100

snia_duration_min = 50
snia_duration_max = 300
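
To make the r-band peak-magnitude window concrete, here is a minimal, standalone sketch that recomputes the distance modulus for a flat ΛCDM cosmology with the same `H0` and `Om0` by direct numerical integration, rather than via `cosmo.distmod`.  The helper name `flat_lcdm_distmod` and the number of integration steps are illustrative assumptions, not part of the tutorial notebook.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def flat_lcdm_distmod(z, h0=70.0, om0=0.3, n_steps=1000):
    """Distance modulus for flat LambdaCDM (hypothetical helper; no radiation term)."""
    def inv_e(zp):
        return 1.0 / math.sqrt(om0 * (1.0 + zp) ** 3 + (1.0 - om0))

    # Trapezoidal integration of 1/E(z') from 0 to z.
    dz = z / n_steps
    integral = 0.5 * (inv_e(0.0) + inv_e(z))
    integral += sum(inv_e(i * dz) for i in range(1, n_steps))
    integral *= dz

    d_lum_mpc = (1.0 + z) * (C_KM_S / h0) * integral  # luminosity distance in Mpc
    return 5.0 * math.log10(d_lum_mpc * 1.0e6 / 10.0)  # relative to 10 pc


peak_abs_mag, peak_range = -19.0, 0.5
mr_min = flat_lcdm_distmod(0.1) + peak_abs_mag - peak_range
mr_max = flat_lcdm_distmod(0.3) + peak_abs_mag + peak_range
print(f"r-band peak window: {mr_min:.2f} to {mr_max:.2f} mag")
```

Under these assumptions the window runs from roughly 18.8 to 22.5 mag, which is the range applied to the brightest r-band detection in the query in Section 2.1.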

Start the TAP service.

In [None]:
service = get_tap_service()

## 2.  Do Candidate SNe Ia from the DiaObject table have suitable matches in the TruthSummary table?

### 2.1. Retrieve a sample of potentially SNIa-like DiaObjects


Again, this follows the methodology of the DP0.2 Tutorial Notebook <a href="https://github.com/rubin-dp0/tutorial-notebooks/blob/main/07a_DiaObject_Samples.ipynb">07a_DiaObject_Samples.ipynb</a>, except we are **not** restricting the query to the first 1000 entries returned.

The following cell typically takes much less than a minute.

In [None]:
%%time

results = service.search("SELECT ra, decl, diaObjectId, nDiaSources, "
                         "scisql_nanojanskyToAbMag(rPSFluxMin) AS rMagMax, "
                         "scisql_nanojanskyToAbMag(rPSFluxMax) AS rMagMin, "
                         "scisql_nanojanskyToAbMag(gPSFluxMax) AS gMagMin, "
                         "scisql_nanojanskyToAbMag(iPSFluxMax) AS iMagMin, "
                         "scisql_nanojanskyToAbMag(rPSFluxMin)"
                         " - scisql_nanojanskyToAbMag(rPSFluxMax) AS rMagAmp "
                         "FROM dp02_dc2_catalogs.DiaObject "
                         "WHERE nDiaSources > "+str(snia_nDiaSources_min)+" "
                         "AND nDiaSources < "+str(snia_nDiaSources_max)+" "
                         "AND scisql_nanojanskyToAbMag(rPSFluxMax) > "+str(snia_peak_mr_min)+" "
                         "AND scisql_nanojanskyToAbMag(rPSFluxMax) < "+str(snia_peak_mr_max)+" "
                         "AND scisql_nanojanskyToAbMag(gPSFluxMax) < "+str(snia_peak_mg_max)+" "
                         "AND scisql_nanojanskyToAbMag(iPSFluxMax) < "+str(snia_peak_mi_max)+" "
                         "AND scisql_nanojanskyToAbMag(rPSFluxMin)"
                         " - scisql_nanojanskyToAbMag(rPSFluxMax) < "+str(snia_ampl_mr_max)+" "
                         "AND scisql_nanojanskyToAbMag(rPSFluxMin)"
                         " - scisql_nanojanskyToAbMag(rPSFluxMax) > "+str(snia_ampl_mr_min)+" ")

DiaObjs = results.to_table()
del results
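
As a readability aid, the `WHERE` clause above can also be assembled from a list of conditions.  The sketch below is one possible way to do this; it shows only two of the cuts, uses stand-in parameter values, and only builds and prints the query string (it does not contact the TAP service).

```python
# Stand-in values; in the notebook these come from the parameter cell in Section 1.
snia_nDiaSources_min, snia_nDiaSources_max = 15, 100
snia_ampl_mr_min, snia_ampl_mr_max = 1.5, 5.5

# r-band amplitude expression, as used in the query above.
r_amp = ("scisql_nanojanskyToAbMag(rPSFluxMin)"
         " - scisql_nanojanskyToAbMag(rPSFluxMax)")

# Only two of the cuts are shown, to keep the sketch short.
conditions = [
    f"nDiaSources > {snia_nDiaSources_min}",
    f"nDiaSources < {snia_nDiaSources_max}",
    f"{r_amp} > {snia_ampl_mr_min}",
    f"{r_amp} < {snia_ampl_mr_max}",
]

query = ("SELECT ra, decl, diaObjectId, nDiaSources "
         "FROM dp02_dc2_catalogs.DiaObject "
         "WHERE " + " AND ".join(conditions))
print(query)
```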

Note that we found 6570 DiaObjects over the entire DP0.2 footprint that meet the above criteria.

In [None]:
DiaObjs

### 2.1. Calculate lightcurve duration and identify potential SNIa

The lightcurve duration -- time between the first and last detected DiaSource in any filter -- is not included in the DiaObject table.
It is calculated below, using all of the DiaSources for each DiaObject.

Time is reported in the DiaSource table as `midPointTai`, which uses the TAI time standard (<a href="https://en.wikipedia.org/wiki/International_Atomic_Time">International Atomic Time</a>) and is given in days (in particular, as "<a href="https://en.wikipedia.org/wiki/Julian_day">Modified Julian Days</a>").
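
For orientation, a Modified Julian Day can be turned into a calendar date with the Python standard library alone.  This is a rough sketch: the helper name `mjd_to_datetime` is hypothetical, the example value is made up, and the offset between TAI and UTC (tens of seconds) is ignored, which is negligible for durations measured in days.

```python
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)  # MJD 0 corresponds to 1858-11-17 00:00


def mjd_to_datetime(mjd):
    """Convert a Modified Julian Day to a calendar date (TAI-UTC offset ignored)."""
    return MJD_EPOCH + timedelta(days=float(mjd))


example_mjd = 60000.5  # illustrative value, not taken from the DiaSource table
print(mjd_to_datetime(example_mjd))
```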

Here we deviate slightly from the current (2022-10-25) version of <a href="https://github.com/rubin-dp0/tutorial-notebooks/blob/main/07a_DiaObject_Samples.ipynb">07a_DiaObject_Samples.ipynb</a>, using a combination of the ADQL `WHERE IN` command plus some `pandas`-based manipulations to achieve the same results.  (Kudos to Fritz Mueller of the Data Management Team who pointed out the speedy `WHERE IN` command in response to the <a href="https://community.lsst.org/t/query-objects-providing-a-list-of-ids/7256">Community Forum Post 7256</a>.)

First, create a list of diaObjectIds:

In [None]:
diaObjectId_list = DiaObjs['diaObjectId'].tolist()

Next, convert this Python list into a string containing the comma-separated diaObjectIds:

In [None]:
diaObjectId_list_str = ','.join(map(str,diaObjectId_list))
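
For illustration, this is what the joined string and the resulting `WHERE ... IN` clause look like for a short, made-up list of IDs (real `diaObjectId` values are much longer integers).  The names `toy_ids` and `toy_query` are placeholders for this sketch only.

```python
toy_ids = [101, 102, 103]  # made-up IDs for illustration only
toy_ids_str = ','.join(map(str, toy_ids))

# Same query template as used for the real list of diaObjectIds.
toy_query = """
        SELECT diaObjectId, midPointTai
        FROM dp02_dc2_catalogs.DiaSource
        WHERE diaObjectId IN (%s)
        """ % (toy_ids_str)

print(toy_ids_str)
print(toy_query)
```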

Now we can query for the `midPointTai` timestamp of every DiaSource associated with each of these DiaObjects.  This cell typically takes about 1 minute to run.

In [None]:
%%time

query = """
        SELECT diaObjectId, midPointTai 
        FROM dp02_dc2_catalogs.DiaSource 
        WHERE diaObjectId IN (%s)
        """ % (diaObjectId_list_str)

#print(query)

results = service.search(query)

df_tmp = results.to_table().to_pandas()

del results

_Optional:_  Take a quick look at the results from this query.  There are typically many `midPointTai`'s -- one per DiaSource for that DiaObject -- for each `diaObjectId`:

In [None]:
#df_tmp

We can use the pandas `groupby` function to find the earliest and latest `midPointTai` associated with each diaObject.  We can estimate the duration for each diaObject either by subtracting the minimum `midPointTai` from the maximum `midPointTai` for that diaObject, or by using the numpy `ptp` ("peak-to-peak") function to find the range of `midPointTai` values for each diaObject.  The cell below does both.

In [None]:
df_tmp1 = df_tmp.groupby("diaObjectId")['midPointTai'].agg(["min", "max", numpy.ptp])
df_tmp1['duration'] = df_tmp1['max'] - df_tmp1['min']
df_tmp1.reset_index(inplace=True)
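
The same pattern on a tiny, made-up table may make the aggregation easier to follow.  The values below are invented for illustration; the duration window (50 to 300 days) is the one set in Section 1.

```python
import pandas as pd

# Made-up detections: two objects with three and two DiaSources, respectively.
toy = pd.DataFrame({
    "diaObjectId": [1, 1, 1, 2, 2],
    "midPointTai": [60000.0, 60010.5, 60075.0, 60100.0, 60130.25],
})

# Earliest and latest detection per object, then the time span between them.
toy_span = toy.groupby("diaObjectId")["midPointTai"].agg(["min", "max"])
toy_span["duration"] = toy_span["max"] - toy_span["min"]
toy_span = toy_span.reset_index()

# Keep only objects whose duration falls inside the SN Ia window.
in_window = (toy_span["duration"] > 50) & (toy_span["duration"] < 300)
print(toy_span[in_window])
```

Here object 1 spans 75 days and is kept, while object 2 spans about 30 days and is dropped.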

_Optional:_  Take a quick look at the contents of df_tmp1:

In [None]:
#df_tmp1

Let's convert the DiaObjs Table to a pandas DataFrame:

In [None]:
df_DiaObjs = DiaObjs.to_pandas()

We will now merge the `df_DiaObjs` DataFrame with the `df_tmp1` DataFrame by their `diaObjectId` values.  Now we have all the info we need for each diaObject.  Note that the contents of the `ptp` column are the same as the manually generated contents of the `duration` column.

In [None]:
df_DiaObjs_new = pd.merge(df_DiaObjs, df_tmp1, on='diaObjectId', how='inner')

_Optional:_  Take a quick look at the contents of df_DiaObjs_new:

In [None]:
#df_DiaObjs_new

Now, let us select only DiaObjects in the `df_DiaObjs_new` DataFrame that have lightcurve durations within the specified range for SNIa.  Note, there are 331 candidate SNeIa that meet these criteria over the whole area and time period of the DP0.2 (simulated) observations.

**These 331 candidate SNe Ia compose the sample we wish to match with the TruthSummary table.**

In [None]:
mask = (df_DiaObjs_new['duration'] > snia_duration_min) & (df_DiaObjs_new['duration'] < snia_duration_max)
df_DiaObjs_cand_SNeIa = df_DiaObjs_new[mask].copy(deep=True)
df_DiaObjs_cand_SNeIa.reset_index(inplace=True, drop=True)
df_DiaObjs_cand_SNeIa

### 2.2. Retrieve all variable stars and SNe Ia from the TruthSummary table.

Many of these potential SNe Ia will not be SNe Ia.  For example, many will be variable stars.  We will match our list of potential SNe Ia with both SNe Ia *and* variable stars from the TruthSummary table.

First, we will download the id, ra, dec, is_pointsource, is_variable, mag_r, redshift, and truth_type for all stars (`truth_type=2`) and SNe Ia (`truth_type=3`).

The following cell can take several minutes to run...

In [None]:
%%time

query = """SELECT id, ra, dec, is_pointsource, is_variable, mag_r, redshift, truth_type 
           FROM dp02_dc2_catalogs.TruthSummary 
           WHERE 
           truth_type=2 OR truth_type=3 
        """

print(query)

results = service.search(query)
df_ts23 = results.to_table().to_pandas()
del results

df_ts23

### 2.3. Match the catalog of candidate SNe Ia from the DiaObject/DiaSource tables with the catalog of SNe Ia from the TruthSummary table.


We'll match the two DataFrames using a KD-tree matching routine provided by the astropy package.  See https://docs.astropy.org/en/stable/coordinates/matchsep.html?highlight=matching#matching-catalogs

First, we need to create two `SkyCoord` arrays -- one for df_DiaObjs_cand_SNeIa and one for df_ts23:

In [None]:
c_df_DiaObjs_cand_SNeIa = SkyCoord(ra=df_DiaObjs_cand_SNeIa.loc[:,'ra'].values*u.degree, dec=df_DiaObjs_cand_SNeIa.loc[:,'decl'].values*u.degree)
c_df_ts23 = SkyCoord(ra=df_ts23.loc[:,'ra'].values*u.degree, dec=df_ts23.loc[:,'dec'].values*u.degree)

We will use the astropy function `match_coordinates_sky` to match the two catalogs.

**Note that `match_coordinates_sky` finds the nearest match in the reference catalog (the second parameter, here `c_df_ts23`) for each object in the source catalog (the first parameter, here `c_df_DiaObjs_cand_SNeIa`).  The nearest match could be very distant (and thus unphysical); so you may wish to apply a limit on the separation after the match to remove unphysical matches.** Here, we will consider all matches, at least for the time being.

`match_coordinates_sky` returns 3 arrays, each with one entry per object in the source catalog c_df_DiaObjs_cand_SNeIa:  `idx` is the array of indices into c_df_ts23 giving the nearest match for each object, `d2d` is the array of 2D (on-sky) separations between each object and its match in c_df_ts23, and `d3d` is the array of the corresponding 3D separations. Since we do not include a third coordinate for distance for spatial matches (a unit distance is assumed), we ignore `d3d`.


In [None]:
idx, d2d, d3d = match_coordinates_sky(c_df_DiaObjs_cand_SNeIa, c_df_ts23)

### 2.4. Analysis of the match.


First, we note that there is indeed a match for each of the 331 candidate SNe Ia in c_df_DiaObjs_cand_SNeIa.  (That said, since we have not imposed a limiting match radius, some of these matches may be bad/unphysical).

In [None]:
len(d2d)

Now let's look at these matches in df_ts23.  Note that most are indeed `truth_type=3` (i.e., SNe Ia).

In [None]:
df_ts23.loc[idx]

Let's make a copy of df_ts23 and reset its index, to simplify merging it with df_DiaObjs_cand_SNeIa later.

In [None]:
df_ts23_cand_SNeIa = df_ts23.loc[idx].copy(deep=True)
df_ts23_cand_SNeIa.reset_index(inplace=True, drop=True)

_Optional_: view the contents of df_ts23_cand_SNeIa:

In [None]:
#df_ts23_cand_SNeIa

The index for each row in df_ts23_cand_SNeIa (`0, 1, 2, 3, ...`) is now the same as the index of the corresponding DiaObject in df_DiaObjs_cand_SNeIa; so we can perform this simple merge, joining by the index:

In [None]:
df_merged = pd.merge(df_DiaObjs_cand_SNeIa,df_ts23_cand_SNeIa, left_index=True, right_index=True)
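
The index bookkeeping behind this merge can be seen on a toy example.  The two small DataFrames and the `toy_idx` array below are made up; `toy_idx` stands in for the `idx` array returned by `match_coordinates_sky`.

```python
import numpy as np
import pandas as pd

# Made-up catalogs: three candidates and four reference rows.
cand = pd.DataFrame({"diaObjectId": [11, 12, 13], "ra": [10.0, 20.0, 30.0]})
truth = pd.DataFrame({"id": ["a", "b", "c", "d"], "ra": [30.0, 10.0, 99.0, 20.0]})

# Pretend this came from match_coordinates_sky(cand_coords, truth_coords).
toy_idx = np.array([1, 3, 0])

# After the reset, row i of truth_matched is the match for row i of cand.
truth_matched = truth.loc[toy_idx].reset_index(drop=True)
toy_merged = pd.merge(cand, truth_matched, left_index=True, right_index=True)
print(toy_merged)
```

Columns that share a name in both inputs receive `_x` (left) and `_y` (right) suffixes, which is why the cells below refer to `ra_x` and `ra_y`.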

Now we have the relevant TruthSummary info for each matched DiaObject in the df_DiaObjs_cand_SNeIa DataFrame all in one df_merged DataFrame:

In [None]:
df_merged

Let's plot the DiaObjects sky positions.  Grey triangles are matched with TruthSummary SNe Ia; Cyan circles are matched with TruthSummary variable stars.  Note, most of the variables are near corners of the field, which may indicate some sort of mis-match.

In [None]:
color_truth_type = {2 : '#56b4e9', 3 : 'grey'}
marker_truth_type = {2 : 'o', 3 : '^'}

groups = df_merged.groupby('truth_type')
for name, group in groups:
    plt.plot(group.ra_x, group.decl, marker=marker_truth_type[name], linestyle='', markersize=5, label=name, color=color_truth_type[name])

plt.legend()

Now, let's calculate the separation in arcseconds between each DiaObject in the merged DataFrame and its TruthSummary match.  We could do this with `astropy`, but here we will just do a quick calculation using the following function.

(Alternatively, we could have just used the results from `idx, d2d, d3d = match_coordinates_sky(c_df_DiaObjs_cand_SNeIa, c_df_ts23)`, via `df_merged['d2d_arcsec'] = d2d.arcsec`.  Note that the results from `match_coordinates_sky` differ from those from the following `sepDegGetArray` function by c. 1 milli-arcsec.)

In [None]:
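# Return angular separation (in degrees) of two points on the Celestial Sphere
def sepDegGetArray(raDeg1, decDeg1, raDeg2, decDeg2):
    """Angular separation in degrees, via the spherical law of cosines.

    Inputs are in degrees and may be scalars, numpy arrays, or pandas Series.
    """

    # Convert all inputs from degrees to radians.
    raRad1  = numpy.radians(raDeg1)
    decRad1 = numpy.radians(decDeg1)
    raRad2  = numpy.radians(raDeg2)
    decRad2 = numpy.radians(decDeg2)

    cosSep = numpy.sin(decRad1)*numpy.sin(decRad2) + numpy.cos(decRad1)*numpy.cos(decRad2)*numpy.cos(raRad1-raRad2)
    # Guard against rounding that pushes cosSep marginally outside [-1, 1],
    # which would make arccos return NaN.
    cosSep = numpy.clip(cosSep, -1.0, 1.0)
    sepRad = numpy.arccos(cosSep)
    sepDeg = numpy.degrees(sepRad)

    return sepDeg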

In [None]:
df_merged['sepArcsec'] = 3600.*sepDegGetArray(df_merged['ra_x'],df_merged['decl'],df_merged['ra_y'],df_merged['dec'])
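
As a side note, the arccos form used above can lose precision for very small separations, because the cosine of a tiny angle is extremely close to 1; this may contribute to the small differences with respect to `match_coordinates_sky`.  The haversine formula is a common alternative that behaves better in that regime.  The following is a sketch only, with a hypothetical helper name and made-up test coordinates; it is not used elsewhere in this notebook.

```python
import numpy as np


def sep_arcsec_haversine(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Angular separation in arcsec via the haversine formula (illustrative helper)."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1_deg, dec1_deg, ra2_deg, dec2_deg))
    hav = (np.sin((dec2 - dec1) / 2.0) ** 2
           + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2.0) ** 2)
    return np.degrees(2.0 * np.arcsin(np.sqrt(hav))) * 3600.0


# Two made-up points separated by 0.5 arcsec in declination.
ra = np.array([61.5])
dec = np.array([-32.5])
print(sep_arcsec_haversine(ra, dec, ra, dec + 0.5 / 3600.0))
```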

_Optional_:  view the contents of df_merged with the new `sepArcsec` column:

In [None]:
#df_merged

Now let's plot the histogram of separations.  There are clearly some "matches" that are separated by more than a degree (3600 arcsec)!

In [None]:
df_merged.hist('sepArcsec', bins=50)
plt.xlabel('separation [arcsec]')
plt.ylabel('Number')

Let's plot DiaObjects matched with TruthSummary variable stars separately from those matched with TruthSummary SNe Ia.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(10, 3), sharey=False, sharex=False)

ax[0].hist(df_merged[df_merged.truth_type==2]['sepArcsec'], bins=20, color='grey')
ax[0].set_title('Matched to a Variable Star')
ax[0].set_xlabel('separation [arcsec]')
ax[0].set_ylabel('Number matches')
ax[0].grid(True)

ax[1].hist(df_merged[df_merged.truth_type==3]['sepArcsec'], bins=20, color='grey')
ax[1].set_title('Matched to an SNIa')
ax[1].set_xlabel('separation [arcsec]')
ax[1].set_ylabel('Number matches')
ax[1].grid(True)

plt.tight_layout()
plt.show()

Let's zoom in for both of these.  A 1.0-arcsec match radius sounds like a pretty reasonable limit for a good match.

In [None]:
# Limit to a range of 1.00 arcsec
# (named sep_range to avoid shadowing Python's built-in range):
sep_range = [0.00, 1.00]

fig, ax = plt.subplots(1, 2, figsize=(10, 3), sharey=False, sharex=False)

ax[0].hist(df_merged[df_merged.truth_type==2]['sepArcsec'], bins=100, range=sep_range, color='grey')
ax[0].set_title('Matched to a Variable Star')
ax[0].set_xlabel('separation [arcsec]')
ax[0].set_ylabel('Number matches')
ax[0].grid(True)

ax[1].hist(df_merged[df_merged.truth_type==3]['sepArcsec'], bins=100, range=sep_range, color='grey')
ax[1].set_title('Matched to an SNIa')
ax[1].set_xlabel('separation [arcsec]')
ax[1].set_ylabel('Number matches')
ax[1].grid(True)

plt.tight_layout()
plt.show()

Yes, a match radius of 1.00 arcsec is indeed reasonable.  It looks like there are three legitimate matches between the SNIa-candidate DiaObjects and variable stars out of a total of 331 SNIa-candidate DiaObjects.  The overwhelming majority of SNIa-candidate DiaObjects (313 out of 331) are matched to genuine SNe Ia in the TruthSummary table:

In [None]:
len( df_merged[ (df_merged.truth_type==3) & (df_merged.sepArcsec < 1.00) ]['sepArcsec'] )
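
The same counting can be done for all truth types at once.  The sketch below uses a small, made-up table with the same column names as `df_merged`; the numbers are invented and are not the notebook's results.

```python
import pandas as pd

# Made-up merged table: truth_type 2 = star, 3 = SN Ia.
toy_merged = pd.DataFrame({
    "truth_type": [3, 3, 3, 2, 2, 3],
    "sepArcsec": [0.05, 0.12, 0.30, 0.08, 4200.0, 950.0],
})

toy_merged["good_match"] = toy_merged["sepArcsec"] < 1.00

# Number of good matches per truth type, and the SN Ia fraction of all candidates.
good_counts = toy_merged[toy_merged["good_match"]].groupby("truth_type").size()
snia_fraction = good_counts.get(3, 0) / len(toy_merged)
print(good_counts)
print(f"Fraction well matched to SNe Ia: {snia_fraction:.2f}")
```

With the counts quoted above, the corresponding fraction for the real sample is 313/331, or about 0.946.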

Let's plot the sky distribution again, but color-code the symbols by `sepArcsec`.  It is clear here that the stars in the upper right of the plot are all bad matches.

In [None]:
plt.scatter(df_merged.ra_x, df_merged.decl, marker='o', s=20, c=df_merged.sepArcsec)
plt.colorbar()

Finally, for this subsection, let's plot the redshifts of the 313 DiaObjects that are well matched with SNe Ia in the TruthSummary table.  

In [None]:
mask = (df_merged.truth_type==3) & (df_merged.sepArcsec < 1.00)
df_merged[mask].hist('redshift', bins=25)

So, yes, this is quite reasonable.  The original cuts performed on the DiaObject/DiaSource tables from the DP0.2 Tutorial Notebook <a href="https://github.com/rubin-dp0/tutorial-notebooks/blob/main/07a_DiaObject_Samples.ipynb">07a_DiaObject_Samples.ipynb</a> were intentionally "back-of-the-envelope" to find candidate low-redshift SNe Ia, and this is what we found.  The resulting redshift range was a bit larger (0.1 < z < 0.5) than originally sought (0.1 < z < 0.3), but still very reasonable for a "back-of-the-envelope" search, and nearly 95% (313./331.) of the DiaObject/DiaSource candidates did turn out to be well matched with SNe Ia in the TruthSummary table.

## 3.  What fraction of variable objects in the TruthSummary table show up in the DiaObject table?

This is sort of the reverse question from the one in Section 2.  

There, we extracted a set of candidate SNe Ia from the DiaObjects table using the observed properties from the DiaObject and DiaSource tables and tried to find their matches in the TruthSummary table.

Here, we are looking at the TruthSummary table, and want to see if the variable objects found there can also be found in the DiaObject table.  (Note:  we don't expect all the entries in the TruthSummary table to have corresponding entries in the DiaObject table, since the TruthSummary table goes deeper than the Rubin detection thresholds.)

### 3.1 Query all variable objects within a small sky area within the TruthSummary table

First, let's grab an RA,DEC-restricted sample of variable objects from the TruthSummary table.  This cell typically takes less than a minute to run.

In [None]:
%%time

query = """SELECT id, ra, dec, is_variable, mag_r, redshift, truth_type 
           FROM dp02_dc2_catalogs.TruthSummary 
           WHERE ra >= 61.0 AND ra <= 62.0 AND dec >= -33.0 AND dec <= -32.0
           AND is_variable=1
        """

print(query)

results = service.search(query)
df_ts123 = results.to_table().to_pandas()
del results

df_ts123

It looks like there are a lot of variable stars (`truth_type=2`).  

### 3.2 Query all objects within the same small sky area within the DiaObjects table

Next, we will query all the objects in the DiaObject table within the same RA,DEC range.  (Although we use `maxrec`, this limit is actually not reached.  In the RA,DEC range used, there are 100,643 entries returned.)  The query is similar to the DiaObjs query in Section 2.1, but with fewer restrictions in the `WHERE` clause.  Basically, we want just about everything in the DiaObject table within a limited RA,DEC range.  The following cell typically takes less than a minute to run.

In [None]:
%%time

results = service.search("SELECT ra, decl, diaObjectId, nDiaSources, "
                         "scisql_nanojanskyToAbMag(rPSFluxMin) AS rMagMax, "
                         "scisql_nanojanskyToAbMag(rPSFluxMax) AS rMagMin, "
                         "scisql_nanojanskyToAbMag(gPSFluxMax) AS gMagMin, "
                         "scisql_nanojanskyToAbMag(iPSFluxMax) AS iMagMin, "
                         "scisql_nanojanskyToAbMag(rPSFluxMin)"
                         " - scisql_nanojanskyToAbMag(rPSFluxMax) AS rMagAmp "
                         "FROM dp02_dc2_catalogs.DiaObject "
                         "WHERE ra BETWEEN 61.0 AND 62.0 AND decl BETWEEN -33.0 AND -32.0",
                         maxrec=1000000)

DiaObjsAll = results.to_table()
del results

In [None]:
DiaObjsAll

Let's convert the `astropy` table to a pandas DataFrame.

In [None]:
df_DiaObjsAll = DiaObjsAll.to_pandas()

### 3.3. Match the catalog of variable objects from the TruthSummary table with the catalog of objects from the DiaObject table.

As in Section 2.3, we'll match the two DataFrames using the astropy function, `match_coordinates_sky`.  (As before, details for this astropy function can be found here: https://docs.astropy.org/en/stable/coordinates/matchsep.html?highlight=matching#matching-catalogs.)

As in Section 2.3, we first need to create two `SkyCoord` arrays -- one for df_ts123 and one for df_DiaObjsAll:

In [None]:
c_df_ts123 = SkyCoord(ra=df_ts123.loc[:,'ra'].values*u.degree, dec=df_ts123.loc[:,'dec'].values*u.degree)
c_df_DiaObjsAll = SkyCoord(ra=df_DiaObjsAll.loc[:,'ra'].values*u.degree, dec=df_DiaObjsAll.loc[:,'decl'].values*u.degree)

As in Section 2.3, we will use the astropy function `match_coordinates_sky` to match the two catalogs.

That said, there is one difference:  before, we wanted to find the best match in the TruthSummary catalog (DataFrame df_ts23) for each entry in our DiaObject-based SNIa-candidate catalog (DataFrame df_DiaObjs_cand_SNeIa); here, we want to find the best match in the DiaObject catalog (DataFrame df_DiaObjsAll) for each entry in our TruthSummary catalog (DataFrame df_ts123).  So, in the current case, the reference catalog (the second parameter in `match_coordinates_sky`) is c_df_DiaObjsAll, and the source catalog (the first parameter in `match_coordinates_sky`) is c_df_ts123.

Recall that `match_coordinates_sky` returns 3 arrays, each with one entry per object in the source catalog (c_df_ts123): idx is the array of indices into the reference catalog (c_df_DiaObjsAll) giving the nearest match for each source object, d2d is the array of 2D separations between each source object and its match in the reference catalog, and d3d is the array of the corresponding 3D separations. Since we do not include a third coordinate for distance for spatial matches (a unit distance is assumed), we ignore d3d.


In [None]:
idx, d2d, d3d = match_coordinates_sky(c_df_ts123, c_df_DiaObjsAll)

### 3.4. Analysis of the match.

As expected, we have as many matches as there are rows in df_ts123 (i.e., 11325).

In [None]:
len(idx)

Let's perform the same type of merge join of the two DataFrames that we did in Section 2.4.  Here, we also include the separations (in arcsec) calculated by `match_coordinates_sky` as a comparison with those calculated by our function `sepDegGetArray`; note that the two methods yield values that typically differ by less than 1 milli-arcsec.

In [None]:
df_DiaObjsAll_new = df_DiaObjsAll.loc[idx].copy(deep=True)
df_DiaObjsAll_new.reset_index(inplace=True, drop=True)

df_merged_allmatch = pd.merge(df_ts123, df_DiaObjsAll_new, left_index=True, right_index=True)
# Here ra_x/dec come from TruthSummary and ra_y/decl come from DiaObject.
df_merged_allmatch['sepArcsec'] = 3600.*sepDegGetArray(df_merged_allmatch['ra_x'],df_merged_allmatch['dec'],df_merged_allmatch['ra_y'],df_merged_allmatch['decl'])
df_merged_allmatch['d2d_arcsec'] = d2d.arcsec

Let's take a look at the resulting merged DataFrame:

In [None]:
df_merged_allmatch

Based on our results in Section 2, let's define a good match as one with a match separation of less than 1.0 arcsec.

In [None]:
df_merged_allmatch.loc[:,'good_match'] = (df_merged_allmatch.loc[:,'sepArcsec'] < 1.00)

_Optional:_  Look at the updated contents of df_merged_allmatch:

In [None]:
#df_merged_allmatch

Let's focus on the TruthSummary variable stars first.  Let's plot the histogram of `mag_r` for "All Matches" and for just the "Good Matches".  Note that the "Good Matches" are mostly brighter than about `mag_r` = 22.  This clearly indicates some sort of magnitude limit for strong detection by the DIA pipeline.

In [None]:
ax = df_merged_allmatch[(df_merged_allmatch.truth_type==2)].hist('mag_r', label='All Matches')
df_merged_allmatch[(df_merged_allmatch.truth_type==2) & (df_merged_allmatch.good_match)].hist('mag_r', label='Good Matches', ax=ax)
plt.legend()
plt.xlabel('mag_r')
plt.ylabel('Number')
plt.title('TruthSummary Table Variable Stars')
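
One way to quantify the trend seen in the histograms is to compute the fraction of good matches in bins of `mag_r`.  The sketch below does this on a made-up table with the same column names as `df_merged_allmatch`; the magnitudes, flags, and bin edges are assumptions for illustration.

```python
import pandas as pd

# Made-up variable stars with a good-match flag.
toy = pd.DataFrame({
    "mag_r": [18.2, 19.5, 20.1, 21.4, 22.3, 23.7, 24.9, 25.5],
    "good_match": [True, True, True, True, False, True, False, False],
})

bins = [18, 20, 22, 24, 26]  # assumed bin edges in mag
toy["mag_bin"] = pd.cut(toy["mag_r"], bins=bins)

# The mean of a boolean column is the fraction of True values in each bin.
recovered = toy.groupby("mag_bin", observed=True)["good_match"].mean()
print(recovered)
```

In this toy table the recovered fraction falls from 1.0 in the two brightest bins to 0.0 in the faintest one.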


Unfortunately, the TruthSummary table does not record the input brightness of its SNe Ia.  This information is likely available elsewhere, but finding it is beyond the purview of this notebook.  Let us use redshift instead.  The results are not as clear-cut as for the TruthSummary variable stars, but, for the TruthSummary SNe Ia, the "Good Matches" do tend to have lower redshifts than "All Matches".  Since it is expected that higher-redshift SNe Ia will tend to have fainter apparent magnitudes, this also suggests some sort of magnitude limit for detection by the DIA pipeline.

In [None]:
ax = df_merged_allmatch[(df_merged_allmatch.truth_type==3)].hist('redshift', label='All Matches')
df_merged_allmatch[(df_merged_allmatch.truth_type==3) & (df_merged_allmatch.good_match)].hist('redshift', label='Good Matches', ax=ax)
plt.legend()
plt.xlabel('redshift')
plt.ylabel('Number')
plt.title('TruthSummary Table SNe Ia')
