<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=250 style="padding: 10px"> 
<br>
<b> (Little demo) Exploring the DiaObject Duplication Issues in DP0.2</b> <br>
Contact author(s): Ryan Lau <br>
Last verified to run: 2024-08-02 <br>
LSST Science Pipelines version: Weekly 2024_04 <br>
Container Size: medium

## 1. Introduction

The purpose of this notebook is to demonstrate and characterize the DiaObject duplication issue identified in notebook DP02_07b "Variable Stars in DP0.2," where the same variable star or transient can be associated with multiple diaObjectIds. The key goal of this notebook is to inform how this issue may or may not affect your scientific analysis of transients and/or variables. 

In Section 2, we demonstrate the diaObject duplication issue using the known RR Lyrae variable presented as an example in notebook DP02_07b. This known RR Lyrae star has two diaObjectIds associated with it within a 0.5'' radius, which highlights the issue with the source association algorithm in the LSST Pipeline that should associate diaSources with diaObjects within a 0.5'' radius. 

Lastly, in Section 3, we conduct a broader investigation of the diaObject duplication issue. We estimate the occurrence rate of duplicate diaObjectIds using a random sample of diaObjects from the diaObject catalog, and we investigate what conditions might trigger the diaObject duplication. We also investigate the types of sources affected by this using the TruthSummary table. 

### 1.1 Package Imports

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from lsst.rsp import get_tap_service


### 1.2 Define Functions and Parameters

Setting the plot format parameters

In [None]:
%matplotlib inline
plt.style.use('tableau-colorblind10')
params = {'axes.labelsize': 24,
          'font.size': 20,
          'legend.fontsize': 14,
          'xtick.major.width': 3,
          'xtick.minor.width': 2,
          'xtick.major.size': 12,
          'xtick.minor.size': 6,
          'xtick.direction': 'in',
          'xtick.top': True,
          'lines.linewidth': 3,
          'axes.linewidth': 3,
          'axes.labelweight': 3,
          'axes.titleweight': 3,
          'ytick.major.width': 3,
          'ytick.minor.width': 2,
          'ytick.major.size': 12,
          'ytick.minor.size': 6,
          'ytick.direction': 'in',
          'ytick.right': True,
          'figure.figsize': [10, 8],
          'figure.facecolor': 'White'}
plt.rcParams.update(params)

Start the TAP service, which we will use for all data retrieval in this notebook.

In [None]:
service = get_tap_service("tap")

## 2. Demonstration of DiaObject Duplication Issue on a Known RR Lyrae Variable

In this section, we present an example of the DiaObject duplication issue using the known RR Lyrae variable that was also used as the example variable in notebook DP02_07b. 



### 2.1 Identifying Two DiaObjectIds associated with known variable from DiaSources

As in DP02_07b, the known RR Lyrae star we will use is at position (RA, Dec) = (62.1479031, -35.799138) degrees. 

We define the coordinates (ra and dec) of the known position of this variable, and then obtain the diaSource properties, including the associated diaObjectIds, total flux, and detector coordinate positions, by conducting the following search. Note that we use a narrow search radius of 0.5'' (0.000139 degrees), which is the radius used for associating a diaSource with a known diaObject. Lastly, we sort the diaSources by 'midPointTai' and 'diaSourceId' as primary and secondary keys, respectively, to list the diaSources in temporal order.

In [None]:
ra_known_rrl = 62.1479031
dec_known_rrl = -35.799138

results = service.search("SELECT ra, decl, diaObjectId, diaSourceId, ccdVisitId,"
                             "filterName, midPointTai, psFlux, totFlux, totFluxErr, "
                             "apFlux, apFluxErr,psFluxErr, snr, x, y "
                             "FROM dp02_dc2_catalogs.DiaSource "
                             "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
                             "CIRCLE('ICRS'," + str(ra_known_rrl) + ", "
                             + str(dec_known_rrl) + ", 0.000139)) = 1 ", maxrec=100000)
DiaSrcs = results.to_table()
DiaSrcs.sort(keys = ['midPointTai', 'diaSourceId'])
del results
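
As a minimal sketch of how the 0.5'' radius maps onto the value 0.000139 used in the query above, the helper below builds the same kind of ADQL cone-search string from a radius given in arcseconds. The function name `cone_search_adql` and its default column list are hypothetical and only for illustration; the sketch only builds the string and does not contact the TAP service.

```python
def cone_search_adql(table, ra_deg, dec_deg, radius_arcsec,
                     columns="ra, decl, diaObjectId, diaSourceId, midPointTai"):
    """Return an ADQL cone-search string (hypothetical helper).

    The radius is given in arcseconds and converted to degrees,
    which is the unit the ADQL CIRCLE function expects.
    """
    radius_deg = radius_arcsec / 3600.0
    return (
        f"SELECT {columns} "
        f"FROM {table} "
        f"WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg:.6f})) = 1"
    )


# Same table and position as the RR Lyrae search above.
query = cone_search_adql("dp02_dc2_catalogs.DiaSource",
                         62.1479031, -35.799138, 0.5)
print(query)
```

The resulting string could be passed to `service.search(query, maxrec=100000)` in the same way as the hand-written query.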

We can identify the unique diaObjects from the resulting 'DiaSrcs' table.

In [None]:
DiaObjIds = list(set(DiaSrcs['diaObjectId']))
print(DiaObjIds)
print('The first DiaObject assigned to this target is %s' %DiaSrcs['diaObjectId'][0])

If the DIA source association is working properly, there should be only one DiaObject returned by the 0.5'' search around a true astrophysical source. The occurrence of a duplicate diaObjectId indicates an issue with the source association. Note that the first diaObjectId created for this target is 1651589610221862935. The duplicate diaObjectId is 1651589610221864014.

The following cell will identify the MJD when the duplicate DiaObject was created and show where in the DiaSrcs table this occurs.

In [None]:
mjd_dup = DiaSrcs[np.where(DiaSrcs['diaObjectId']==1651589610221864014)]['midPointTai'][0]
DiaSrcs[np.where(DiaSrcs['midPointTai']==mjd_dup)]
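
The creation date of each diaObject can also be found without hard-coding an ID. The sketch below uses made-up arrays standing in for the time-sorted `midPointTai` and `diaObjectId` columns (the values are not from DP0.2) and relies on `np.unique(..., return_index=True)`, which returns the index of the first occurrence of each unique value.

```python
import numpy as np

# Made-up, time-sorted values standing in for DiaSrcs columns.
mjd = np.array([59600.1, 59610.2, 59620.3, 59620.3, 59630.4])
obj_id = np.array([101, 101, 101, 202, 202])

# Index of the first row in which each diaObjectId appears.
ids, first_idx = np.unique(obj_id, return_index=True)
first_mjd = dict(zip(ids.tolist(), mjd[first_idx].tolist()))
print(first_mjd)

# Order the IDs by creation time: the earliest is the original diaObject,
# any later ones are candidate duplicates.
creation_order = ids[np.argsort(mjd[first_idx], kind="stable")]
print(creation_order)
```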

Surprisingly, the DiaObject duplication seems to have occurred on the same visit in which a new diaSource was associated with the original diaObject (1651589610221862935), and the two diaSources have nearly identical measured properties (except for 'totFlux'). It is uncertain whether the discrepancy in totFlux is associated with the duplication issue. 

In Sec. 3.2, we explore the occurrence rate of DiaObject duplication on the same visit.

#### 2.1.1 Note on the DiaObject Duplication issue in the same visit

It is worth noting that a new diaSource within the same visit can result in both the creation of a duplicate diaObject and a new diaSource of that duplicated diaObject (i.e. 3 diaSources in the same visit of the same astrophysical object). The following source exhibits this issue.

In [None]:
ra_known_var = 49.8859616
dec_known_var = -44.5203187

results = service.search("SELECT ra, decl, diaObjectId, diaSourceId, ccdVisitId,"
                             "filterName, midPointTai, psFlux, totFlux, totFluxErr, "
                             "apFlux, apFluxErr,psFluxErr, snr, x, y "
                             "FROM dp02_dc2_catalogs.DiaSource "
                             "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
                             "CIRCLE('ICRS'," + str(ra_known_var) + ", "
                             + str(dec_known_var) + ", 0.000139)) = 1 ", maxrec=100000)
DiaSrcs_var = results.to_table()
DiaSrcs_var.sort(keys = ['midPointTai', 'diaSourceId'])
del results

In the following cell, we show the 3 diaSources that were created from a single visit at MJD = 60686.08.

In [None]:
DiaSrcs_var[DiaSrcs_var['midPointTai']==60686.0835662]
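
Visits that produced more than one diaSource at the same position can also be found by counting rows per `ccdVisitId`, which avoids an exact floating-point match on `midPointTai`. This is a minimal sketch on made-up arrays (the IDs below are not real DP0.2 values).

```python
import numpy as np

# Made-up columns: visit 13 produced three diaSources.
visit = np.array([11, 12, 13, 13, 13, 14])
obj_id = np.array([101, 101, 101, 202, 202, 202])

# Count how many diaSources each visit contributed.
visits, counts = np.unique(visit, return_counts=True)
multi = visits[counts > 1]

# Boolean mask selecting every row that belongs to such a visit.
in_multi = np.isin(visit, multi)
print(multi, obj_id[in_multi])
```

With an astropy table, the same mask could be applied as `DiaSrcs_var[in_multi]` after building `visit` from the `ccdVisitId` column.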

### 2.2 Light curve and coordinate properties of DiaSources associated with known RR Lyrae variable

We can plot the y-band light curve of both diaObjects to investigate whether they exhibit distinct brightnesses and/or levels of variability.

In [None]:
fig, ax = plt.subplots(1, figsize=(12, 8))
DiaObjIds = list(set(DiaSrcs['diaObjectId']))
filt = 'y'


for j in np.arange(len(DiaObjIds)):
    fx = np.where((DiaSrcs['diaObjectId']==DiaObjIds[j]) & (DiaSrcs['filterName']==filt))[0]
    ax.plot(DiaSrcs['midPointTai'][fx], DiaSrcs['totFlux'][fx],
            ['p','v','^','o'][j], ms=10, color=['r','g','b','y'][j],
            mew=2, mec=['r','g','b','y'][j],
            alpha=0.3, label=DiaObjIds[j])

#Plotting line indicating MJD where duplicate DiaObject is created
plt.axvline(x = mjd_dup,color = 'black', ls = '--', lw = 1)

ax.set_xlabel('Modified Julian Date')
ax.set_ylabel('TotFlux')
ax.set_title('Forced PSF flux measured on the direct image')
ax.legend(loc='lower right')

plt.show()

Other than the initial low value of the first diaSource associated with the duplicate diaObject (1651589610221864014), the light curve appears to indicate that both diaObjects are indeed associated with the same variable star.

In the following cell, we plot the measured ra and dec offsets from the known coordinates of the variable star for both diaObjects (in all filters).

In [None]:
fig, ax = plt.subplots(1, figsize=(10, 10))
DiaObjIds = list(set(DiaSrcs['diaObjectId']))


for j in np.arange(len(DiaObjIds)):
    fx = np.where(DiaSrcs['diaObjectId']==DiaObjIds[j])
    ax.plot((ra_known_rrl-DiaSrcs['ra'][fx])*3600, (dec_known_rrl-DiaSrcs['decl'][fx])*3600,
               ['p','v','^','o'][j], ms=10, mew=2, mec=['r','g','b','y'][j],
               alpha=0.5, color='none', label='%s' % DiaObjIds[j])

ax.set_xlabel('ra offset (arcsec)')
ax.set_ylabel('dec offset (arcsec)')
ax.legend(loc='lower left')


plt.show()

As shown in the offset coordinate plot, the diaSources associated with both diaObjects are within 0.5'' of each other and therefore should not have been split into two diaObjects.
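
Note that a difference in ra is a coordinate offset; the on-sky offset is smaller by a factor of cos(dec). The sketch below shows one way to compute small-angle, on-sky offsets. The helper name `sky_offsets_arcsec` is hypothetical, the test position is made up, and the sign convention (measured minus reference) is the opposite of the one used in the plot above.

```python
import numpy as np


def sky_offsets_arcsec(ra_ref, dec_ref, ra, dec):
    """Small-angle on-sky offsets in arcsec; all inputs in degrees.

    The ra difference is scaled by cos(dec) so that both axes measure
    true angular distance on the sky (hypothetical helper).
    """
    d_ra = (np.asarray(ra) - ra_ref) * np.cos(np.radians(dec_ref)) * 3600.0
    d_dec = (np.asarray(dec) - dec_ref) * 3600.0
    return d_ra, d_dec


# Made-up source displaced by 1e-4 deg (0.36'') in ra only.
ra_ref, dec_ref = 62.1479031, -35.799138
d_ra, d_dec = sky_offsets_arcsec(ra_ref, dec_ref, [ra_ref + 1e-4], [dec_ref])
sep = np.hypot(d_ra, d_dec)
print(sep)
```

At this declination the on-sky separation is about 0.81 times the raw ra difference, so sources within 0.5'' in coordinate offset remain within 0.5'' on the sky.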

## 3. Broader Investigation of DiaObject Duplication Issue

In this section, we estimate the occurrence rate of the diaObject duplication issue by obtaining a random sample of diaObjects from the DiaObject catalog, and verify the types of objects that are affected by this using the TruthSummary table.

### 3.1. Estimated Occurrence Rate of DiaObject Duplication Issue

First, we grab a sample of 200 DiaObjects that have more than 10 detections, which helps ensure they are associated with real astrophysical transients/variables rather than spurious diaSource detections. Note that `TOP 200` returns the first 200 matching rows, which we treat as a random sample. 

In [None]:
nDiaSources_min = 10

results = service.search("SELECT TOP 200 "
                         "ra, decl, diaObjectId, nDiaSources "
                         "FROM dp02_dc2_catalogs.DiaObject "
                         "WHERE nDiaSources > "+str(nDiaSources_min)+" ")
DiaObjs = results.to_table()
del results

In the following cell, we conduct a coordinate search through the DiaSource catalog for the sample of diaObjects obtained above to identify DiaSources within a 0.5'' radius of the diaObject coordinates. 

The resulting DiaSources will notably include the diaObjectId(s) they are associated with. The diaObject duplication issue will manifest as multiple diaObjectIds for a specific coordinate position.

We will keep track of the number of duplicate diaObjects (nDuplicates), their diaObjectIds (DiaObjIds), the magnitude of the last diaSource preceding the duplication (DupMags) and of the first diaSource following it (DupMags_2), the filter(s) of the visits before and after the duplication occurred (DupFilters), the dates (in MJD) before and after the duplication occurred (DupVisitMJD), and the time (in days) between the original and duplicate visits (DupMJDDiff). We also keep track of a "control" DiaSource magnitude (ControlMag), which we take to be the magnitude of the 10th DiaSource returned for each DiaObject.

The following cell should take less than 1 minute to execute.

In [None]:
NDup = np.zeros(len(DiaObjs))
NDupDiaObj = np.zeros(len(DiaObjs),dtype = 'object')
DupMJDDiff = np.zeros(len(DiaObjs),dtype = 'object')
NDupMag = np.zeros(len(DiaObjs),dtype = 'object')
NDupMag_2 = np.zeros(len(DiaObjs),dtype = 'object')
ControlMag = np.zeros(len(DiaObjs),dtype = 'object')
NDupDate = np.zeros(len(DiaObjs),dtype = 'object')
NDupFilt = np.zeros(len(DiaObjs),dtype = 'object')
for i in np.arange(len(DiaObjs)):
    ra = DiaObjs['ra'][i]
    decl = DiaObjs['decl'][i]
    results = service.search("SELECT ra, decl, diaObjectId, diaSourceId, ccdVisitId,"
                             "scisql_nanojanskyToAbMag(totFlux) AS mag,"
                             "filterName, midPointTai "
                             "FROM dp02_dc2_catalogs.DiaSource "
                             "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
                             "CIRCLE('ICRS'," + str(ra) + ", "
                             + str(decl) + ", 0.000139)) = 1 ", maxrec=100000) #0.5'' radius coordinate search
    DiaSrcs = results.to_table()
    DiaSrcs.sort(keys = ['midPointTai', 'diaSourceId'])
    del results
    NDup[i]=len(list(set(DiaSrcs['diaObjectId'])))-1
    ddot = [DiaSrcs['diaObjectId'][0]]
    ddates = []
    ddatesdiff = []
    dfilt = []
    dmag = []
    dmag_2 = []
    ctrlmag = []
    ctrlmag +=[DiaSrcs['mag'][9]] # mag of 10th diasource as control
    for x in np.arange(len(DiaSrcs)-1):
        if DiaSrcs['diaObjectId'][x]!=DiaSrcs['diaObjectId'][x+1] and DiaSrcs['diaObjectId'][x+1] not in ddot:
            ddot+=[DiaSrcs['diaObjectId'][x+1]]
            ddates+=[[DiaSrcs['midPointTai'][x],DiaSrcs['midPointTai'][x+1]]]
            ddatesdiff+=[DiaSrcs['midPointTai'][x+1]-DiaSrcs['midPointTai'][x]]
            dfilt+=[[DiaSrcs['filterName'][x],DiaSrcs['filterName'][x+1]]]
            dmag+=[DiaSrcs['mag'][x]] #mag of diasource source before duplication
            dmag_2+=[DiaSrcs['mag'][x+1]] #mag of diasource after duplication
    NDupDiaObj[i] = ddot
    NDupDate[i] = ddates
    NDupFilt[i] = dfilt
    NDupMag[i] = dmag
    NDupMag_2[i] = dmag_2
    ControlMag[i] = ctrlmag
    DupMJDDiff[i] = ddatesdiff
DiaObjs['nDuplicates']=NDup
DiaObjs['DiaObjIds']=NDupDiaObj
DiaObjs['DupFilters']=NDupFilt
DiaObjs['DupMags']=NDupMag
DiaObjs['DupMags_2']=NDupMag_2
DiaObjs['ControlMag']=ControlMag
DiaObjs['DupVisitMJD'] = NDupDate
DiaObjs['DupMJDDiff'] = DupMJDDiff
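
The core of the loop above is the detection of the first appearance of each new diaObjectId in a time-sorted list. As a sketch, that logic can be isolated in a small function and checked on made-up input. The function name `find_duplication_events` is hypothetical, and the IDs and dates below are invented for illustration.

```python
def find_duplication_events(obj_ids, mjds):
    """Return (new_id, mjd_before, mjd_after) for each new diaObjectId.

    Inputs must be sorted by time. An event is recorded the first time
    an ID appears that differs from the previous row and has not been
    seen before, mirroring the condition used in the loop above.
    """
    seen = {obj_ids[0]}
    events = []
    for k in range(len(obj_ids) - 1):
        nxt = obj_ids[k + 1]
        if nxt != obj_ids[k] and nxt not in seen:
            seen.add(nxt)
            events.append((nxt, mjds[k], mjds[k + 1]))
    return events


# Made-up sequence: ID 202 appears in the same visit as 101 (dt = 0),
# and ID 303 appears half a day after the preceding diaSource.
events = find_duplication_events([101, 101, 202, 101, 202, 303],
                                 [1.0, 2.0, 2.0, 5.0, 9.0, 9.5])
dt = [after - before for _, before, after in events]
print(events, dt)
```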

Uncomment to display the DiaObjects with duplicates (i.e. nDuplicates > 0).

In [None]:
#DiaObjs[DiaObjs['nDuplicates']>0]

We can estimate the occurrence frequency of the diaObject duplication issue by calculating what fraction of the DiaObject sample has at least one duplicate diaObject.

In [None]:
len(DiaObjs[DiaObjs['nDuplicates']>0])/len(DiaObjs)
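
Because the fraction is estimated from a finite sample, it can be useful to attach an uncertainty to it. The sketch below uses the normal approximation to the binomial standard error, sqrt(p(1-p)/n). The counts are illustrative placeholders, not the output of the cell above, and the function name is hypothetical.

```python
import math


def fraction_with_error(n_hits, n_total):
    """Fraction and its binomial standard error (normal approximation)."""
    p = n_hits / n_total
    return p, math.sqrt(p * (1.0 - p) / n_total)


# Illustrative counts only: 100 affected objects in a sample of 200.
p, err = fraction_with_error(100, 200)
print(f"{p:.3f} +/- {err:.3f}")
```

For the notebook's sample, `n_hits` would be `len(DiaObjs[DiaObjs['nDuplicates']>0])` and `n_total` would be `len(DiaObjs)`.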

The result of the above cell indicates that at least ~50% of the sampled diaObjects are affected by the diaObject duplication issue, which could present significant problems for statistical analyses of transients and variables. 

### 3.2. Investigating Conditions when DiaObject Duplication Occurs

In this subsection, we investigate the conditions that may give rise to the DiaObject duplication issue. For the DiaObjects with duplicates, we will plot (1) a histogram of the number of DiaObjects as a function of $\Delta$t, the time between the original and duplicate visits in days, and (2) a histogram of the number of DiaObjects as a function of the DiaObject brightness (AB mag) preceding the DiaObject duplication.

In [None]:
#flattening array
dupdt=sum(list(DiaObjs['DupMJDDiff']), [])

#Plotting histogram of delta t, the difference between duplicate DiaObject creation.
plt.hist(dupdt, density=False, bins=100) 
plt.ylabel('N_DiaObjects')
plt.xlabel(r'$\Delta$t (d)')
plt.show()

#zoom - 10 day

plt.hist(dupdt, density=False,range = [0,10], bins=100)
plt.ylabel('N_DiaObjects')
plt.xlabel(r'$\Delta$t (d)')
plt.xlim(0,10)
plt.show()

#zoom - 0.2 day

plt.hist(dupdt, density=False,range = [0,0.2], bins=100)
plt.ylabel('N_DiaObjects')
plt.xlabel(r'$\Delta$t (d)')
plt.xlim(0,0.2)
plt.show()

In [None]:
#Percentage of duplicates that occurred within 50 days between visits
len(np.array(dupdt)[np.array(dupdt)<50])/len(dupdt)*100

In [None]:
#Percentage of duplicates that occurred in the same visit
len(np.array(dupdt)[np.array(dupdt)==0])/len(dupdt)*100

We see that ~80% of the duplicates occurred with less than 50 days between visits, and that duplication within the same visit (i.e. $\Delta$t=0) accounted for ~8% of the duplication events.

In the following histogram, we investigate the dependence of the DiaObject duplication events on source brightness before duplication, after duplication, and a control distribution (the mag of the 10th DiaSource from a DiaObject).

In [None]:
#flattening arrays
dupmag = sum(list(DiaObjs['DupMags']), [])
dupmag_2 = sum(list(DiaObjs['DupMags_2']), [])
dupmag_control = sum(list(DiaObjs['ControlMag']), [])

plt.hist(dupmag, density=False, bins=100,range = [0,30], color = 'tab:blue', alpha = 0.5,label = 'Before Duplication')  
plt.hist(dupmag_2, density=False, bins=100,range = [0,30], color = 'tab:red',alpha = 0.5,label = 'After Duplication')  
plt.hist(dupmag_control, density=False, bins=100,range = [0,30], color = 'grey',alpha = 0.5, label = 'Control',zorder=0)  
plt.legend()
plt.ylabel('N_DiaObjects')
plt.xlabel('Mag (converted from totFlux)')
plt.show()

There do not appear to be significant differences between the three distributions, but the DiaSource mag after duplication may exhibit a trend where it is slightly fainter. Mag = 0 corresponds to 'masked' values.
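
Masked or undefined magnitudes arise because a magnitude cannot be computed from a zero or negative flux. The sketch below applies the standard AB relation for fluxes in nanojansky, m = -2.5 log10(f) + 31.4, which is assumed here to match what `scisql_nanojanskyToAbMag` computes, and keeps only finite values for plotting. The function name and the flux values are made up for illustration.

```python
import numpy as np


def njy_to_abmag(flux_njy):
    """AB magnitude from flux in nJy; NaN where the flux is not positive."""
    flux = np.asarray(flux_njy, dtype=float)
    mag = np.full(flux.shape, np.nan)
    ok = flux > 0
    mag[ok] = -2.5 * np.log10(flux[ok]) + 31.4
    return mag


# Made-up fluxes: 3631 Jy (mag ~ 0), 1000 nJy, zero, and negative.
mags = njy_to_abmag([3631.0e9, 1000.0, 0.0, -50.0])

# Drop undefined values before building a histogram.
finite_mags = mags[np.isfinite(mags)]
print(mags, finite_mags)
```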

In [None]:
# This position is unusual: the same visit returns several diaSources
# for the same astrophysical object (see Sec. 2.1.1).
# This may indicate a diaSource-level duplication that is related to the
# diaObject duplication issue.
ra_dup_src = 49.8859616
dec_dup_src = -44.5203187

results = service.search("SELECT ra, decl, diaObjectId, diaSourceId, ccdVisitId,"
                             "filterName, midPointTai, psFlux, totFlux, totFluxErr, "
                             "apFlux, apFluxErr,psFluxErr, snr, x, y "
                             "FROM dp02_dc2_catalogs.DiaSource "
                             "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
                             "CIRCLE('ICRS'," + str(ra_dup_src) + ", "
                             + str(dec_dup_src) + ", 0.000139)) = 1 ", maxrec=100000)
DiaSrcs_DupSrc = results.to_table()
DiaSrcs_DupSrc.sort(keys = ['midPointTai', 'diaSourceId'])
del results

In [None]:
DiaSrcs_DupSrc[DiaSrcs_DupSrc['midPointTai']==60686.0835662]

### 3.3. Investigating DiaObject Truth Type affected by Duplication Issue

In order to investigate the type of sources (i.e. variables vs transients) that are producing the duplication issue, we will grab a random sample of diaObjects associated with variable stars and SNe and identify how many diaObjects of each type have duplicates. In order to verify the diaObject type, we utilize the TruthSummary table, the DESC DC2 truth catalog as described in arXiv:2101.04855.

**First, we will explore the variable stars with diaObject duplicates**. We conduct a (1''-radius) coordinate search in the TruthSummary table around every diaObject in the sample obtained in Sec. 3.1, flag those matched to a variable star, and then count how many of the flagged diaObjects have duplicates. The following cell should take around 30 seconds to execute.


In [None]:
Var = np.zeros(len(DiaObjs))
for i in np.arange(len(DiaObjs)):
    ra = DiaObjs['ra'][i]
    decl = DiaObjs['decl'][i]
    results = service.search("SELECT ts.ra, ts.dec, ts.is_variable, ts.truth_type "
                             "FROM dp02_dc2_catalogs.TruthSummary AS ts "
                             "WHERE CONTAINS(POINT('ICRS', ts.ra, ts.dec), "
                             "CIRCLE('ICRS'," + str(ra) + ", "
                             + str(decl) + ", 0.00028)) = 1 ", maxrec=100000)
    SrcTruth = results.to_table()
    if 1 in SrcTruth['is_variable'] and 2 in SrcTruth[SrcTruth['is_variable']==1]['truth_type']:
        Var[i] = 1
    else:
        Var[i] = 0
    del results
DiaObjs['Var']=Var

print('Out of the %s diaObjects that are associated with a variable star'
      ' (i.e. is_variable = 1 and truth_type = 2), %s have duplicates.' % 
      (len(DiaObjs[DiaObjs['Var']==1]),len(DiaObjs[(DiaObjs['nDuplicates']>0) & (DiaObjs['Var']==1)])))
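
The classification test used in the loop above requires that the same truth row is both variable and a star. As a sketch, the equivalent vectorized condition can be written with a combined boolean mask. The function name `has_variable_star` is hypothetical and the arrays are made-up truth matches, not DP0.2 values.

```python
import numpy as np


def has_variable_star(is_variable, truth_type):
    """True if any matched truth row has is_variable = 1 and truth_type = 2."""
    is_variable = np.asarray(is_variable)
    truth_type = np.asarray(truth_type)
    return bool(np.any((is_variable == 1) & (truth_type == 2)))


# Made-up matches: a non-variable star, a variable non-star, a variable star.
match_a = has_variable_star([0, 1, 1], [2, 1, 2])

# Made-up matches: a variable non-star next to a non-variable star.
match_b = has_variable_star([1, 0], [1, 2])
print(match_a, match_b)
```

The second case returns False because no single row satisfies both conditions, which is the same outcome as the two-step membership test in the cell above.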


The results of the search in the previous cell indicate that ~50% of the diaObjects associated with a variable star have duplicate diaObjects.

**However,** variables will likely have more visits than transients, so a fairer comparison normalizes the number of diaObjects with duplicates by the total number of DiaSources associated with them, as done in the following cell.

In [None]:
nVar = len(DiaObjs[(DiaObjs['nDuplicates']>0) & (DiaObjs['Var']==1)])
nVarDiaSrcs = np.sum(DiaObjs[(DiaObjs['nDuplicates']>0) & (DiaObjs['Var']==1)]['nDiaSources'])
nVar_unc = np.sqrt(nVar)/nVarDiaSrcs

print('For variables, ',np.round(nVar/nVarDiaSrcs*100,3),'+/-',np.round(nVar_unc*100,3),'% of the DiaSources resulted in a DiaObject duplication event.')

**Now we explore the SNe with diaObject duplicates.** In order to grab a sample of Type Ia SN candidates, we utilize the search parameters described in the DP02_07a notebook "DiaObject Sample Identification" (see notebook 07a for more details).

In [None]:
snia_peak_mr_min = 18.82
snia_peak_mr_max = 22.46

snia_ampl_mr_min = 1.5
snia_ampl_mr_max = 5.5

snia_peak_mg_max = 24.0
snia_peak_mi_max = 24.0

snia_nDiaSources_min = 15
snia_nDiaSources_max = 100

snia_duration_min = 50
snia_duration_max = 300

Conducting a search for 500 diaObjects associated with Type Ia SN candidates.

In [None]:
results = service.search("SELECT TOP 500 "
                         "ra, decl, diaObjectId, nDiaSources, "
                         "scisql_nanojanskyToAbMag(rPSFluxMin) AS rMagMax, "
                         "scisql_nanojanskyToAbMag(rPSFluxMax) AS rMagMin, "
                         "scisql_nanojanskyToAbMag(gPSFluxMax) AS gMagMin, "
                         "scisql_nanojanskyToAbMag(iPSFluxMax) AS iMagMin "
                         "FROM dp02_dc2_catalogs.DiaObject "
                         "WHERE nDiaSources > "+str(snia_nDiaSources_min)+" "
                         "AND nDiaSources < "+str(snia_nDiaSources_max)+" "
                         "AND scisql_nanojanskyToAbMag(rPSFluxMax) > "+str(snia_peak_mr_min)+" "
                         "AND scisql_nanojanskyToAbMag(rPSFluxMax) < "+str(snia_peak_mr_max)+" "
                         "AND scisql_nanojanskyToAbMag(gPSFluxMax) < "+str(snia_peak_mg_max)+" "
                         "AND scisql_nanojanskyToAbMag(iPSFluxMax) < "+str(snia_peak_mi_max)+" "
                         "AND scisql_nanojanskyToAbMag(rPSFluxMin)"
                         " - scisql_nanojanskyToAbMag(rPSFluxMax) < "+str(snia_ampl_mr_max)+" "
                         "AND scisql_nanojanskyToAbMag(rPSFluxMin)"
                         " - scisql_nanojanskyToAbMag(rPSFluxMax) > "+str(snia_ampl_mr_min)+" ")

DiaObjsSN = results.to_table()
del results

In the following cell, we identify duplicates in the sample of SN candidates. The following cell takes around 2 minutes to execute.

In [None]:
NDup = np.zeros(len(DiaObjsSN))
for i in np.arange(len(DiaObjsSN)):
    ra = DiaObjsSN['ra'][i]
    decl = DiaObjsSN['decl'][i]
    results = service.search("SELECT ra, decl, diaObjectId, diaSourceId, ccdVisitId,"
                             "filterName, midPointTai "
                             "FROM dp02_dc2_catalogs.DiaSource "
                             "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
                             "CIRCLE('ICRS'," + str(ra) + ", "
                             + str(decl) + ", 0.000139)) = 1 ", maxrec=100000) #0.5'' radius coordinate search
    DiaSrcs = results.to_table()
    del results
    NDup[i]=len(list(set(DiaSrcs['diaObjectId'])))-1
DiaObjsSN['nDuplicates']=NDup

Lastly, we utilize the TruthSummary table to confirm which candidates are true SNe and determine how many of the diaObjects associated with true SNe have duplicates. The following cell takes around 3 minutes to execute.

In [None]:
SN = np.zeros(len(DiaObjsSN))
for i in np.arange(len(DiaObjsSN)):
    ra = DiaObjsSN['ra'][i]
    decl = DiaObjsSN['decl'][i]
    results = service.search("SELECT ts.ra, ts.dec, ts.is_variable, ts.truth_type "
                             "FROM dp02_dc2_catalogs.TruthSummary AS ts "
                             "WHERE CONTAINS(POINT('ICRS', ts.ra, ts.dec), "
                             "CIRCLE('ICRS'," + str(ra) + ", "
                             + str(decl) + ", 0.00028)) = 1 ", maxrec=100000)
    SrcTruth = results.to_table()
    if 3 in SrcTruth['truth_type']:
        SN[i] = 1
    else:
        SN[i] = 0
    del results
DiaObjsSN['SN']=SN
print('Out of the %s diaObjects that are associated with a supernova (i.e. truth_type = 3), %s have duplicates.' 
      % (len(DiaObjsSN[DiaObjsSN['SN']==1]),len(DiaObjsSN[(DiaObjsSN['nDuplicates']>0) & (DiaObjsSN['SN']==1)])))

The results of the search in the previous cell indicate that ~20% of the diaObjects associated with a SN have duplicate diaObjects. 

We now normalize the number of duplicates by the total number of DiaSources and can compare to the normalized occurrence rate from the variables.

In [None]:
nSN = len(DiaObjsSN[(DiaObjsSN['nDuplicates']>0) & (DiaObjsSN['SN']==1)])
nSNDiaSrcs = np.sum(DiaObjsSN[(DiaObjsSN['nDuplicates']>0) & (DiaObjsSN['SN']==1)]['nDiaSources'])
nSN_unc = np.sqrt(nSN)/nSNDiaSrcs

print('For supernovae, ',np.round(nSN/nSNDiaSrcs*100,3),'+/-',np.round(nSN_unc*100,3),'% of the DiaSources resulted in a DiaObject duplication event.')
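
One way to make the comparison between the two normalized rates quantitative is to express their difference in units of the combined uncertainty. The sketch below uses the same sqrt(N) uncertainty as the cells above and adds the two uncertainties in quadrature. The function names are hypothetical and the counts are placeholders, not results from this notebook.

```python
import math


def rate_with_poisson_error(n_events, n_sources):
    """Rate per DiaSource and its sqrt(N) uncertainty."""
    return n_events / n_sources, math.sqrt(n_events) / n_sources


def difference_in_sigma(rate_a, err_a, rate_b, err_b):
    """Absolute difference of two rates in units of their combined error."""
    return abs(rate_a - rate_b) / math.hypot(err_a, err_b)


# Placeholder counts only, chosen for easy arithmetic.
r_var, e_var = rate_with_poisson_error(25, 1000)
r_sn, e_sn = rate_with_poisson_error(16, 800)
n_sigma = difference_in_sigma(r_var, e_var, r_sn, e_sn)
print(f"{n_sigma:.2f} sigma")
```

A difference well below about 2 sigma would be consistent with the two rates being comparable within uncertainties.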

Within uncertainties, the normalized occurrence rate of the DiaObject duplication event is comparable for SNe and variables. The variable or transient behavior of the astrophysical object is therefore unlikely to influence the DiaObject duplication issue. 