# LVV-T1068: Coaddition for Deep Detection

**Written By: Bryce Kalmbach**

**Last updated: 10-03-2019**

**Tested on Stack Version: w_2019_38**

## Requirements:

[OSS-REQ-0157](https://docushare.lsst.org/docushare/dsweb/Get/LSE-030#page=60)

1. Fraction of all detections on deep detection coadds caused by unremoved artifacts shall not exceed 0.1%

## Proposed Test Case:

1. We will obtain detection catalogs from the ComCam/LSSTCam deep coaddition and compare them to external data sets (such as HSC, HST, DLS, or CFHTLenS). Sources in the ComCam/LSSTCam dataset that have no corresponding detection in the external data will be considered false detections.

In [None]:
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 18})

In [None]:
import os
import json
import numpy as np
import pandas as pd
import h5py
import lsst.verify
from astropy import units as u
from astropy.coordinates import SkyCoord
from lsst.daf.persistence import Butler

### Set parameters for testing

* `test_bandpass`: The bandpass of the catalogs that the notebook will test

* `mag_lims`: Keep detections with magnitudes in between `[bright_limit, faint_limit]`

In [None]:
test_bandpass = 'HSC-R'

mag_lims = [20., 26.] 
# According to survey info: https://hsc.mtk.nao.ac.jp/ssp/survey/, HSC-R goes down to 27.1 in DEEP when finished
# We will set faint limit at 26.0 for testing currently since HSC

### Load butler for HSC Deep and Ultra-Deep `deepCoadd`

We will compare the coadd source catalogs from the HSC Ultra-Deep data to
the coadd source catalog from HSC Deep. Since the Ultra-Deep fields
lie completely within the HSC Deep footprint, any area covered by the Ultra-Deep
catalogs is also covered by the Deep catalogs. If we then compare
detection results as a function of magnitude, we do not have to worry about the
different depths of the two surveys as long as we stay brighter than the magnitude
limit of the HSC Deep field.

In [None]:
deep_repo_dir = '/datasets/hsc/repo/rerun/DM-13666/DEEP'
deep_butler = Butler(deep_repo_dir)

In [None]:
u_deep_repo_dir = '/datasets/hsc/repo/rerun/DM-13666/UDEEP'
u_deep_butler = Butler(u_deep_repo_dir)

In [None]:
# Without the Gen 3 butler we need to read the filesystem to list the available tracts
deep_tracts = os.listdir(os.path.join(deep_repo_dir, 'deepCoadd', test_bandpass))

In [None]:
u_deep_tracts = os.listdir(os.path.join(u_deep_repo_dir, 'deepCoadd', test_bandpass))

Identify the tract numbers that overlap and then identify the specific patches in each tract that overlap between the two datasets.

In [None]:
overlap_tracts = list(set(deep_tracts).intersection(set(u_deep_tracts)))

In [None]:
overlap_tracts

In [None]:
overlap_patches = {}
for tract in overlap_tracts:
    deep_patches = os.listdir(os.path.join(deep_repo_dir, 'deepCoadd', test_bandpass, tract))
    u_deep_patches = os.listdir(os.path.join(u_deep_repo_dir, 'deepCoadd', test_bandpass, tract))
    common_patches = []
    for deep_patch in deep_patches:
        # only want folder names since these are the actual patch ids
        if len(deep_patch.split('.')) == 1:
            if deep_patch in u_deep_patches:
                common_patches.append(deep_patch)
    overlap_patches[tract] = common_patches
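
The overlap search above can be sketched without touching the filesystem. The version below is a minimal illustration, assuming each repository listing is already available as a dictionary of directory entries per tract. The helper names `is_patch_id` and `find_overlap` and the listings themselves are hypothetical and made up for the example.

```python
def is_patch_id(name):
    """Patch directories have names like '4,5'; files carry an extension."""
    return '.' not in name


def find_overlap(deep_listing, u_deep_listing):
    """Map each shared tract to the sorted patch ids present in both listings.

    Each listing is a dict {tract: [directory entries]}, standing in for
    what os.listdir would return for that tract.
    """
    overlap = {}
    for tract in sorted(set(deep_listing) & set(u_deep_listing)):
        deep_patches = {p for p in deep_listing[tract] if is_patch_id(p)}
        u_deep_patches = {p for p in u_deep_listing[tract] if is_patch_id(p)}
        overlap[tract] = sorted(deep_patches & u_deep_patches)
    return overlap


# Hypothetical listings, invented for illustration only
deep_listing = {'9813': ['4,5', '4,6', '4,5.fits', '5,5'], '9697': ['0,0']}
u_deep_listing = {'9813': ['4,5', '5,5', '6,6'], '8524': ['1,1']}

overlap = find_overlap(deep_listing, u_deep_listing)
print(overlap)
```

Using set intersections keeps the result independent of the order in which the directory entries are listed.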

### Compile the source catalogs from a given tract for comparison
We pick the tract with the most overlapping patches and loop over all of its patches,
collecting the sources from each catalog into a pandas dataframe.

#### Only keep deep sources within ultradeep field

There are two ultradeep fields, and the HSC field of view has a diameter of 1.5 degrees. Thus, we keep only deep sources within 0.75 degrees of the center of the ultradeep field that contains the test tract, to make sure we are not taking deep sources outside of the ultradeep area when we try to match.
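
As a rough illustration of this radial cut, the sketch below computes great-circle separations with the haversine formula in plain Python rather than with astropy. The function names, the source positions, and the `radius_deg` parameter are assumptions made for the example; the value 0.75 matches the `0.75*u.deg` cut used in the compilation loop of this notebook.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))


def within_radius(sources, center, radius_deg):
    """Keep the (ra, dec) pairs that lie no farther than radius_deg from center."""
    return [s for s in sources
            if angular_separation_deg(s[0], s[1], center[0], center[1]) <= radius_deg]


# Field center 02h18m00s, -5d: RA = (2 + 18/60) * 15 = 34.5 deg
center = (34.5, -5.0)
# Made-up source positions in degrees
sources = [(34.5, -5.0), (34.9, -5.2), (35.6, -5.0), (34.5, -4.0)]

kept = within_radius(sources, center, radius_deg=0.75)
print(kept)
```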

In [None]:
ud_1_center = SkyCoord('02h18m00s -5', unit=(u.hourangle, u.deg))
ud_2_center = SkyCoord('10h00m29s +2d12m21s', unit=(u.hourangle, u.deg))

In [None]:
num_patches = []
for tract_on in overlap_tracts:
    num_patches.append(len(overlap_patches[tract_on]))

In [None]:
# Pick tract with most overlap patches
test_tract = overlap_tracts[np.argmax(num_patches)]

In [None]:
deep_src_df = None
u_deep_src_df = None
for test_patch_idx in range(len(overlap_patches[test_tract])):
    print(overlap_patches[test_tract][test_patch_idx])
    deep_src = deep_butler.get('deepCoadd_forced_src', tract=int(test_tract), 
                               patch=overlap_patches[test_tract][test_patch_idx], 
                               filter=test_bandpass)
    u_deep_src = u_deep_butler.get('deepCoadd_forced_src', tract=int(test_tract), 
                                   patch=overlap_patches[test_tract][test_patch_idx], 
                                   filter=test_bandpass)
    deep_photo_calib = deep_butler.get('deepCoadd_photoCalib', tract=int(test_tract), 
                                       patch=overlap_patches[test_tract][test_patch_idx], 
                                       filter=test_bandpass)
    ud_photo_calib = u_deep_butler.get('deepCoadd_photoCalib', tract=int(test_tract), 
                                       patch=overlap_patches[test_tract][test_patch_idx], 
                                       filter=test_bandpass)
    deep_flux = deep_src['base_PsfFlux_instFlux']
    deep_mags = deep_photo_calib.instFluxToMagnitude(deep_src, 'base_PsfFlux')
    u_deep_mags = ud_photo_calib.instFluxToMagnitude(u_deep_src, 'base_PsfFlux')
    
    if u_deep_src_df is None:
        u_deep_src_df = u_deep_src.asAstropy().to_pandas()
        u_deep_src_df = u_deep_src_df[['coord_ra', 'coord_dec', 'deblend_nChild']]
        u_deep_src_df['mag'] = u_deep_mags[:, 0]
        u_deep_src_df['mag_err'] = u_deep_mags[:, 1]
        
        u_deep_sample = SkyCoord(np.degrees(u_deep_src_df['coord_ra'])*u.deg, 
                                 np.degrees(u_deep_src_df['coord_dec'])*u.deg)

        # Identify field 1 or 2
        if u_deep_sample[0].separation(ud_1_center) < 2*u.deg:
            ud_center = ud_1_center
        else:
            ud_center = ud_2_center
        
        
    else:
        temp_ud_src_df = u_deep_src.asAstropy().to_pandas()
        temp_ud_src_df = temp_ud_src_df[['coord_ra', 'coord_dec', 'deblend_nChild']]
        temp_ud_src_df['mag'] = u_deep_mags[:, 0]
        temp_ud_src_df['mag_err'] = u_deep_mags[:, 1]
        u_deep_src_df = pd.concat([u_deep_src_df, temp_ud_src_df], sort=False)

    if deep_src_df is None:
        deep_src_df = deep_src.asAstropy().to_pandas()
        deep_src_df = deep_src_df[['coord_ra', 'coord_dec', 'deblend_nChild']]
        
        deep_coords = SkyCoord(np.degrees(deep_src_df['coord_ra'])*u.deg, 
                               np.degrees(deep_src_df['coord_dec'])*u.deg)
        deep_sep = deep_coords.separation(ud_center)
        deep_keep = np.where(deep_sep <= 0.75*u.deg)
        deep_coords = deep_coords[deep_keep]
        deep_src_df = deep_src_df.iloc[deep_keep].reset_index(drop=True)
        
        deep_src_df['mag'] = deep_mags[deep_keep[0], 0]
        deep_src_df['mag_err'] = deep_mags[deep_keep[0], 1]

    else:
        temp_deep_src_df = deep_src.asAstropy().to_pandas()
        temp_deep_src_df = temp_deep_src_df[['coord_ra', 'coord_dec', 'deblend_nChild']]

        deep_coords = SkyCoord(np.degrees(temp_deep_src_df['coord_ra'])*u.deg, 
                               np.degrees(temp_deep_src_df['coord_dec'])*u.deg)
        deep_sep = deep_coords.separation(ud_center)
        deep_keep = np.where(deep_sep <= 0.75*u.deg)
        deep_coords = deep_coords[deep_keep]
        temp_deep_src_df = temp_deep_src_df.iloc[deep_keep].reset_index(drop=True)
        
        temp_deep_src_df['mag'] = deep_mags[deep_keep[0], 0]
        temp_deep_src_df['mag_err'] = deep_mags[deep_keep[0], 1]
        deep_src_df = pd.concat([deep_src_df, temp_deep_src_df], sort=False)

### Remove entries for deblended parent objects to avoid double counting
Since the parent objects that are deblended into child objects 
are included in the source catalogs we exclude them in the rest of the analysis
to avoid double counting.

Also remove any Deep sources with magnitudes outside the range we specified at the beginning. The magnitude cut is applied to the Deep catalog only.

In [None]:
deep_src_df = deep_src_df.query('deblend_nChild == 0 and mag >= %f and mag <= %f' % tuple(mag_lims)).reset_index(drop=True)
u_deep_src_df = u_deep_src_df.query('deblend_nChild == 0').reset_index(drop=True)
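
The selection above can be sketched on plain Python dictionaries. This is only an illustration under the assumption that each row carries a `deblend_nChild` count and a `mag` value; the helper `select_sources` and the rows are hypothetical.

```python
def select_sources(rows, mag_lims=None):
    """Drop deblended parents; optionally keep only mag within [bright, faint]."""
    kept = []
    for row in rows:
        if row['deblend_nChild'] != 0:
            continue  # a parent: its children appear as separate rows
        if mag_lims is not None and not (mag_lims[0] <= row['mag'] <= mag_lims[1]):
            continue  # outside the magnitude window (NaN also fails this test)
        kept.append(row)
    return kept


# Made-up rows for illustration
rows = [
    {'id': 1, 'deblend_nChild': 2, 'mag': 21.0},  # parent with two children
    {'id': 2, 'deblend_nChild': 0, 'mag': 21.5},
    {'id': 3, 'deblend_nChild': 0, 'mag': 22.3},
    {'id': 4, 'deblend_nChild': 0, 'mag': 26.8},  # fainter than the faint limit
    {'id': 5, 'deblend_nChild': 0, 'mag': 19.2},  # brighter than the bright limit
]

# Deep-style selection: parents removed and magnitude window applied
deep_ids = [r['id'] for r in select_sources(rows, mag_lims=(20., 26.))]
# Ultra-Deep-style selection: parents removed, no magnitude cut
u_deep_ids = [r['id'] for r in select_sources(rows)]
print(deep_ids, u_deep_ids)
```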

### Use astropy to spatially match the catalogs

In [None]:
deep_coords = SkyCoord(np.degrees(deep_src_df['coord_ra'])*u.deg, 
                       np.degrees(deep_src_df['coord_dec'])*u.deg)
u_deep_coords = SkyCoord(np.degrees(u_deep_src_df['coord_ra'])*u.deg, 
                         np.degrees(u_deep_src_df['coord_dec'])*u.deg)

In [None]:
# Check that deep coverage does span ultra-deep coverage
fig = plt.figure(figsize=(10, 8))
plt.scatter(deep_coords.ra.deg, deep_coords.dec.deg, label='deep')
plt.scatter(u_deep_coords.ra.deg, u_deep_coords.dec.deg, alpha=0.05, label='u_deep')
leg = plt.legend()
for lh in leg.legendHandles: 
    lh.set_alpha(1.0)
plt.xlabel('ra')
plt.ylabel('dec')
plt.title('Coverage map: Deep vs Ultra-Deep')

In [None]:
matched_idx, d2d, d3d = deep_coords.match_to_catalog_sky(u_deep_coords)
max_sep = 0.5*u.arcsec
# Reject results outside max separation
matched_idx[np.where(d2d > max_sep)] = -99

In [None]:
# Make sure to only take the closest match to something in the reference catalog
unique_idx, idx_counts = np.unique(matched_idx, return_counts=True)
for idx_match, idx_count in zip(unique_idx, idx_counts):
    if idx_match == -99:
        continue
    elif idx_count == 1:
        continue
    else:
        duplicate_deep_idx = np.where(matched_idx == idx_match)[0]
        duplicate_distances = d2d[duplicate_deep_idx]
        min_dist_idx = np.argsort(duplicate_distances)
        matched_idx[duplicate_deep_idx[min_dist_idx[1:]]] = -99
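
The two cells above reject matches beyond the maximum separation and then keep only the closest claimant of each reference object. A vectorized alternative to the loop, not the method used in this notebook, can be sketched with `np.lexsort`. The function name `keep_closest_match` and the input arrays are hypothetical, and separations are plain floats in arcseconds rather than astropy quantities.

```python
import numpy as np


def keep_closest_match(matched_idx, sep, max_sep, sentinel=-99):
    """Return a copy of matched_idx in which matches beyond max_sep are rejected
    and each reference index is kept only for its nearest claimant."""
    result = np.array(matched_idx, dtype=int)
    sep = np.asarray(sep, dtype=float)
    result[sep > max_sep] = sentinel
    # Sort by reference index first, then by separation within each index
    order = np.lexsort((sep, result))
    sorted_idx = result[order]
    # A sorted row is a duplicate if it repeats the previous row's reference index
    is_dup = np.zeros(len(result), dtype=bool)
    is_dup[1:] = sorted_idx[1:] == sorted_idx[:-1]
    is_dup &= sorted_idx != sentinel
    result[order[is_dup]] = sentinel
    return result


# Made-up matches: reference index per source and separation in arcsec
matched = [0, 0, 1, 2, 2, 2]
separation = [0.30, 0.10, 0.70, 0.20, 0.05, 0.40]

cleaned = keep_closest_match(matched, separation, max_sep=0.5)
print(cleaned)
```

In this toy input the third source is rejected for exceeding the separation limit, and references 0 and 2 are each kept only for their nearest source.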

In [None]:
found_objects_idx = np.where(matched_idx > -1)[0]
unmatched_idx = np.where(matched_idx < -1)[0]

False positive fraction is 1 minus the fraction of matched deep catalog objects over total objects in the catalog.

$\text{false positive fraction} = 1.0 - \frac{\text{matched deep objects}}{\text{total deep objects}}$

In [None]:
false_positive_frac = 1. - len(found_objects_idx)/len(deep_coords)
print(false_positive_frac)
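
As a worked example of the formula, the sketch below computes the fraction from two counts and compares it with the 0.1% limit stated in the requirement. The counts are invented for illustration, and the names `false_positive_fraction` and `THRESHOLD` are hypothetical.

```python
def false_positive_fraction(n_matched, n_total):
    """1 - matched/total, treating every unmatched source as a false detection."""
    if n_total <= 0:
        raise ValueError('n_total must be positive')
    return 1.0 - n_matched / n_total


THRESHOLD = 0.001  # 0.1%, the limit quoted from OSS-REQ-0157

# Made-up counts: 9995 of 10000 deep sources found a match
frac = false_positive_fraction(n_matched=9995, n_total=10000)
passes = frac <= THRESHOLD
print('fraction = %.4f, passes = %s' % (frac, passes))
```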

### Set up `lsst.verify`

We have a metric package for catalogs called `verify_catalogs`. In its JSON files we add our metrics and the design specs that are required for commissioning.

In [None]:
METRIC_PACKAGE = "verify_catalogs"
metrics = lsst.verify.MetricSet.load_metrics_package(METRIC_PACKAGE)
specs = lsst.verify.SpecificationSet.load_metrics_package(METRIC_PACKAGE)

In [None]:
metrics

In [None]:
specs

### Test against requirements

To show reports from `lsst_verify` we calculate the parameters we want to test and format them as `Measurement` objects, with additional information saved as `Datum` objects that we can use to make the diagnostic plots below.

In [None]:
falseDeepDetect_meas = lsst.verify.Measurement('coadd_detection.falseDeepDetect',
                                               false_positive_frac)
falseDeepDetect_meas.extras['matched_detection_mags'] = lsst.verify.Datum(deep_src_df.iloc[found_objects_idx]['mag'], unit=u.mag,
                                                                         label='Matched Detection Magnitudes',
                                                                         description='Magnitudes of rows in Deep catalog matched to Ultradeep objects')
falseDeepDetect_meas.extras['deep_mags'] = lsst.verify.Datum(deep_src_df['mag'], unit=u.mag,
                                                            label='Deep Catalog Mags',
                                                            description='Magnitudes of sources in deep catalog')
falseDeepDetect_meas.extras['matched_ra'] = lsst.verify.Datum(deep_coords.ra.deg[found_objects_idx], unit=u.deg,
                                                                        label='Matched RA',
                                                                        description='RA of matched deep catalog objects')
falseDeepDetect_meas.extras['matched_dec'] = lsst.verify.Datum(deep_coords.dec.deg[found_objects_idx], unit=u.deg,
                                                                        label='Matched Dec',
                                                                        description='Dec of matched deep catalog objects')
falseDeepDetect_meas.extras['unmatched_ra'] = lsst.verify.Datum(deep_coords.ra.deg[unmatched_idx], unit=u.deg,
                                                                        label='Unmatched RA',
                                                                        description='RA of unmatched deep catalog objects')
falseDeepDetect_meas.extras['unmatched_dec'] = lsst.verify.Datum(deep_coords.dec.deg[unmatched_idx], unit=u.deg,
                                                                        label='Unmatched Dec',
                                                                        description='Dec of unmatched deep catalog objects')

Once all values are calculated for metrics we add them to a `Job`.

In [None]:
job = lsst.verify.Job(metrics=metrics, specs=specs)
job.measurements.insert(falseDeepDetect_meas)

We add available metadata to the job. This metadata can be used to differentiate tests of the same metrics in Squash. Here we add the bandpass as metadata, but we could also add in information like the dataset we are using to test.

In [None]:
job.meta.update({'filter': '%s' % test_bandpass})

We can now run the job and print out a report.

In [None]:
job.report().show()

### Push job results to Squash

Here we push the results to the [Squash dashboard](https://chronograf-demo.lsst.codes/) so we can track measurements over time interactively.

First we point at the API. Currently we are pushing our results to the sandbox database.

In [None]:
squash_api_url = "https://squash-restful-api-sandbox.lsst.codes"

Enter credentials to get access. Only authenticated users can push to Squash.

In [None]:
#import getpass
#username = getpass.getuser()
#password = getpass.getpass(prompt='Password for user `{}`: '.format(username))

In [None]:
# Current hack to get CI working with notebooks and chronograf
# Uses a password in a read-only file readable only by the user
username = os.environ['USER']
with open(os.path.join(os.environ['HOME'], 'bk_abc.txt')) as f:
    password = f.readline()
password = password.rstrip('\n')  # Remove the trailing newline, if present
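
Since this approach relies on the password file being readable only by the user, a check of the file mode before reading can be sketched as follows. This is a hedged illustration for POSIX systems; the helper `read_password`, the temporary file, and its contents are hypothetical and not part of the notebook's workflow.

```python
import os
import stat
import tempfile


def read_password(path):
    """Read the first line of path, refusing files that group or others can access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError('%s is accessible by group or others (mode %o)' % (path, mode))
    with open(path) as f:
        return f.readline().rstrip('\n')


# Demonstration on a throwaway file with a placeholder secret
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'secret.txt')
    with open(demo_path, 'w') as f:
        f.write('not-a-real-password\n')
    os.chmod(demo_path, 0o400)  # read-only, owner only
    demo_password = read_password(demo_path)
print(demo_password)
```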

In [None]:
import requests
credentials = {'username': username, 'password': password}

If this is your first time you can register as a new user by uncommenting the lines below.

In [None]:
# r = requests.post('{}/register'.format(squash_api_url), json=credentials)
# r.json()

In [None]:
r = requests.post('{}/auth'.format(squash_api_url), json=credentials)
r.json()

In [None]:
headers = {'Authorization': 'JWT {}'.format(r.json()['access_token'])}

The following two cells upload the metric definitions to Squash and are a one-time setup procedure.

In [None]:
r = requests.post('{}/metrics'.format(squash_api_url),
                json={'metrics': metrics.json},
                headers=headers)
r.json()

In [None]:
r = requests.post('{}/specs'.format(squash_api_url),
                json={'specs': specs.json},
                headers=headers)
r.json()

Here we add some more metadata that is required for Squash.

In [None]:
job.meta.update({'packages': {}})
job.meta.update({'env': {'env_name': 'jenkins'}})

Finally, we dispatch the results of the `Job` we ran to Squash and can view them on the Squash dashboards.

In [None]:
job.dispatch(api_user=username, api_password=password, api_url=squash_api_url)

### Compare the detections between the catalogs
We now compare the number of Deep detections matched to the Ultra-Deep catalog
with the total number of detections in the Deep catalog.

In [None]:
fig = plt.figure(figsize=(10, 8))
n, bins, _ = plt.hist(falseDeepDetect_meas.extras['matched_detection_mags'].quantity, 
                      histtype='step', range=mag_lims, bins=int(mag_lims[1] - mag_lims[0])*4,
                      label='Matched Detections to Ultra-Deep Catalog', lw=3)
n2, bins, _ = plt.hist(falseDeepDetect_meas.extras['deep_mags'].quantity, histtype='step', 
                       bins=bins, label='Total Deep Detections', lw=3)
plt.legend(loc=2)
plt.xlabel('Magnitude')
plt.ylabel('Number of detections')
plt.title('Number of Detections in Deep catalog')

In [None]:
bin_spacing = bins[1]-bins[0]

fig = plt.figure(figsize=(10, 8))
plt.plot(bins[:-1]+bin_spacing/2., 1.-n/n2, lw=3, marker='o', markersize=10)
plt.xlabel('Magnitude')
plt.ylabel('1.0 - (matched/total detections)')
false_positive_frac = 1 - np.sum(n)/np.sum(n2)
plt.title('False Positive Ratio by Magnitude. Overall False Pos. Fraction = %.4f' % false_positive_frac)
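
The per-bin ratio plotted above divides by the total count in each bin, which is undefined for empty bins. The sketch below shows one way to guard against that by marking empty bins as NaN. It assumes plain arrays of magnitudes; the function `binned_false_positive_fraction`, its `bins_per_mag` parameter, and the sample magnitudes are hypothetical.

```python
import numpy as np


def binned_false_positive_fraction(matched_mags, all_mags, mag_lims, bins_per_mag=4):
    """Return bin centers and 1 - matched/total per magnitude bin (NaN if a bin is empty)."""
    n_bins = int((mag_lims[1] - mag_lims[0]) * bins_per_mag)
    edges = np.linspace(mag_lims[0], mag_lims[1], n_bins + 1)
    n_matched, _ = np.histogram(matched_mags, bins=edges)
    n_total, _ = np.histogram(all_mags, bins=edges)
    frac = np.full(n_bins, np.nan)
    filled = n_total > 0  # only divide where the bin has sources
    frac[filled] = 1.0 - n_matched[filled] / n_total[filled]
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, frac


# Made-up magnitudes: four sources in the first bin, three of them matched
all_mags = [20.2, 20.5, 20.8, 20.9]
matched_mags = [20.2, 20.5, 20.8]

centers, frac = binned_false_positive_fraction(matched_mags, all_mags,
                                               mag_lims=(20., 22.), bins_per_mag=1)
print(centers, frac)
```

Matplotlib skips NaN values when drawing lines, so empty bins appear as gaps rather than as misleading points.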

In [None]:
fig = plt.figure(figsize=(10, 8))
plt.scatter(falseDeepDetect_meas.extras['unmatched_ra'].quantity, 
            falseDeepDetect_meas.extras['unmatched_dec'].quantity)
plt.title('Unmatched objects by location')
plt.xlabel('RA')
plt.ylabel('Dec')