# LVV-T960: Relative Astrometric Performance

**Written By: Bryce Kalmbach**

**Last updated: 07-22-2019**

**Tested on Stack Version: w_2019_28**

## Requirements:

[OSS-REQ-0388](https://docushare.lsst.org/docushare/dsweb/Get/LSE-030#page=68)

1. For all pairs of sources separated by ~5 arcminutes, the median error in the measured separations is <= 10 milliarcseconds.

2. No more than 10% of the source pairs separated by ~5 arcminutes have separation errors greater than 20 milliarcseconds.

3. For all pairs of sources separated by ~20 arcminutes, the median error in the measured separations is <= 10 milliarcseconds.

4. No more than 10% of the source pairs separated by ~20 arcminutes have separation errors greater than 20 milliarcseconds.

5. For all pairs of sources separated by ~200 arcminutes, the median error in the measured separations is <= 15 milliarcseconds.

6. No more than 10% of the source pairs separated by ~200 arcminutes have separation errors greater than 30 milliarcseconds.
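
The six requirements reduce to two checks per separation scale: a median limit and an outlier-fraction limit. The following is a minimal sketch of that structure, not part of the notebook itself. The names `REQUIREMENTS_MAS`, `MAX_OUTLIER_FRACTION`, and `check_scale` are hypothetical, and the example errors are made up for illustration.

```python
import statistics

# Thresholds from the six requirements above, in milliarcseconds.
# Keys are the nominal pair separations in arcminutes.
REQUIREMENTS_MAS = {
    5: {"median_max": 10.0, "outlier_limit": 20.0},
    20: {"median_max": 10.0, "outlier_limit": 20.0},
    200: {"median_max": 15.0, "outlier_limit": 30.0},
}
MAX_OUTLIER_FRACTION = 0.10


def check_scale(abs_errors_mas, scale_arcmin):
    """Return (median_ok, outlier_ok) for one separation scale."""
    limits = REQUIREMENTS_MAS[scale_arcmin]
    median_error = statistics.median(abs_errors_mas)
    n_outliers = sum(1 for err in abs_errors_mas
                     if err > limits["outlier_limit"])
    outlier_fraction = n_outliers / len(abs_errors_mas)
    return (median_error <= limits["median_max"],
            outlier_fraction <= MAX_OUTLIER_FRACTION)


# Made-up absolute separation errors (mas), for illustration only.
example_errors = [2.0, 4.0, 5.0, 7.0, 8.0, 9.0, 11.0, 12.0, 14.0, 25.0]
print(check_scale(example_errors, 5))
```

Keeping the thresholds in one table makes it harder for a plot and its corresponding test to drift apart.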

## Proposed Test Case:

1. Image a region that overlaps the Gaia footprint (we will use Gaia as astrometric truth).  Repeat at different airmasses.

2. Run source detection and astrometric measurement on images from step 1.

3. Calculate the separation between all sources detected in step 2.

4. Compare source separations from step 3 to the same source separations as measured by Gaia.

5. Examine distribution of source separation errors from step 4 for all pairs of sources separated by ~5 arcminutes.  Verify that the median error in these measurements is <= 10 milliarcseconds.

6. Verify that no more than 10% of the source pairs separated by ~5 arcminutes have separation errors greater than 20 milliarcseconds.

7. Examine distribution of source separation errors from step 4 for all pairs of sources separated by ~20 arcminutes.  Verify that the median error in these measurements is <= 10 milliarcseconds.

8. Verify that no more than 10% of the source pairs separated by ~20 arcminutes have separation errors greater than 20 milliarcseconds.

9. Examine distribution of source separation errors from step 4 for all pairs of sources separated by ~200 arcminutes.  Verify that the median error in these measurements is <= 15 milliarcseconds.

10. Verify that no more than 10% of the source pairs separated by ~200 arcminutes have separation errors greater than 30 milliarcseconds.

### Import necessary tools

In [None]:
import os
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
import lsst.daf.persistence as daf_persistence

from astropy.coordinates import SkyCoord
from astropy import units as u

from itertools import combinations

In [None]:
# Make our plots nice and readable
plt.rcParams.update({'font.size': 18})

### Set parameters for testing

* `test_bandpass`: The bandpass in which HSC astrometry is tested against Gaia.

* `faint_mag_lim`: If set to `None`, the notebook keeps every source in the visit. Computing separations for every pair can take a long time, and we may also want to see how astrometry changes as a function of magnitude, so this can be set to keep only sources brighter than this magnitude limit in the `test_bandpass`.
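
To see why pair calculations get expensive, note that the number of unordered pairs grows quadratically with the number of sources. The sketch below is illustrative only. `n_pairs` is a hypothetical helper, and the source counts other than 200 (the sample size used later in this notebook) are arbitrary.

```python
from math import comb


def n_pairs(n_sources):
    """Number of unordered source pairs among n_sources objects."""
    return comb(n_sources, 2)


# 200 matches the subsample used later; the larger counts are arbitrary.
for n in (200, 2000, 20000):
    print(n, n_pairs(n))
```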

In [None]:
test_bandpass = 'HSC-R'

faint_mag_lim = None

### Identify HSC Data to use

We want to get data from a single visit for this requirement, so we choose a visit from the HSC Wide dataset. See https://hsc-release.mtk.nao.ac.jp/doc/index.php/database/ for information on which tracts are included in the Wide data. We randomly choose tract 9348 for testing. To test a different band, change `test_bandpass` above.

In [None]:
# Load a butler for the HSC Wide data
depth = 'WIDE'
band = test_bandpass
butler = daf_persistence.Butler('/datasets/hsc/repo/rerun/DM-13666/%s/'%(depth))

In [None]:
# Find a visit in the WIDE data for specified band in the tract 9348
warp_list = os.listdir('/datasets/hsc/repo/rerun/DM-13666/WIDE/deepCoadd/%s/9348/0,0' % band)
warp_list.sort()
visit = int(warp_list[0].split('-')[-1].split('.')[0])
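
The cell above takes the visit number from the last dash-separated field of the first warp file name. As a hedged illustration of that parsing, the sketch below uses made-up file names. The real names in the repository may differ, and `visit_from_warp_filename` is a hypothetical helper.

```python
def visit_from_warp_filename(filename):
    """Extract the trailing integer visit ID from a warp file name."""
    last_field = filename.split('-')[-1]    # e.g. '11440.fits'
    return int(last_field.split('.')[0])    # drop the extension


# Hypothetical file names, for illustration only.
example_warps = ['warp-HSC-R-9348-0,0-11442.fits',
                 'warp-HSC-R-9348-0,0-11440.fits']
example_warps.sort()
print(visit_from_warp_filename(example_warps[0]))
```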

In [None]:
subset = butler.subset('src', filter=band, visit=visit)

In [None]:
# Load in sources from visit making exceptions for bad ccd 9 and focusing ccds.
hsc_sources_df = None
ccd_lims = []
for dataId in subset.cache:
    if dataId['ccd'] % 10 == 0:
        print('On CCD #%i' % dataId['ccd'])
    try:
        src_cat = butler.get('src', dataId=dataId)
        calexp = butler.get('calexp', dataId=dataId)
        calib = calexp.getPhotoCalib()
        src_cat_df = src_cat.asAstropy()
        src_cat_df = src_cat_df[['id', 'coord_ra', 'coord_dec',
                                 'base_PsfFlux_instFlux']]
        src_cat_df = src_cat_df.to_pandas()
        mag = []
        for src_flux in src_cat_df['base_PsfFlux_instFlux'].values:
            mag.append(calib.instFluxToMagnitude(src_flux))
        src_cat_df['mag'] = mag
        # Start the combined table with the first CCD, then append the rest
        if hsc_sources_df is None:
            hsc_sources_df = src_cat_df
        else:
            hsc_sources_df = hsc_sources_df.append(src_cat_df, sort=False)
        ccd_lims.append([np.degrees(np.max(src_cat_df['coord_ra'])),
                         np.degrees(np.min(src_cat_df['coord_ra'])),
                         np.degrees(np.max(src_cat_df['coord_dec'])),
                         np.degrees(np.min(src_cat_df['coord_dec']))])
    except daf_persistence.butlerExceptions.NoResults as inst:
        print('No results for CCD #%i' % dataId['ccd'])

In [None]:
# Total number of HSC Sources
len(hsc_sources_df)

In [None]:
hsc_sources_df.head()

In [None]:
if faint_mag_lim is None:
    hsc_final_df = hsc_sources_df
else:
    hsc_final_df = hsc_sources_df.query('mag < %f' % faint_mag_lim)

In [None]:
hsc_sources_coords = SkyCoord(hsc_final_df['coord_ra']*u.rad, hsc_final_df['coord_dec']*u.rad)

In [None]:
fig = plt.figure(figsize=(8, 8))
plt.scatter(hsc_sources_coords.ra.deg, hsc_sources_coords.dec.deg, s=8, lw=0)
plt.xlabel('RA (deg)')
plt.ylabel('dec (deg)')
plt.title('HSC Sources in Visit %i' %visit)
ax = plt.gca()
ax.set_xticks(ax.get_xticks()[1::2]) # Clean up ticks in RA

### Load Gaia data

We have previously created a pandas dataframe with Gaia data that overlaps the HSC Wide data footprint. Here we load it in and select the data in the region of the visit.

In [None]:
# Load in cached gaia data
gaia_df = pd.read_pickle('/project/danielsf/gaia_hsc_overlap_pandas.pickle')

In [None]:
gaia_df.head()

In [None]:
# Only select data that falls in the bounds of the HSC CCDs
gaia_visit_df = pd.DataFrame([], columns=gaia_df.columns)
for ccd_corners in ccd_lims:
    gaia_visit_df = gaia_visit_df.append(gaia_df.query('ra < %f and ra > %f and dec < %f and dec > %f' % (ccd_corners[0], ccd_corners[1],
                                                                                                          ccd_corners[2], ccd_corners[3])))

In [None]:
# Pick low proper motion sources (gaia proper motions are in mas/year)
low_pm = np.where((gaia_visit_df['pmra'] > -5) & (gaia_visit_df['pmra'] < 5) &
                  (gaia_visit_df['pmdec'] > -5) & (gaia_visit_df['pmdec'] < 5))[0]
gaia_visit_df = gaia_visit_df.iloc[low_pm]
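
The proper motion cut limits how far a Gaia source can have moved, per axis, between the Gaia reference epoch and the HSC observation. The sketch below shows the arithmetic only. The epoch gaps are assumptions, since the notebook does not state the epochs involved, and `max_drift_mas` is a hypothetical helper.

```python
def max_drift_mas(pm_limit_mas_per_yr, epoch_gap_yr):
    """Largest per-axis positional drift (mas) allowed by a proper motion cut."""
    return abs(pm_limit_mas_per_yr) * abs(epoch_gap_yr)


# 5 mas/yr is the cut used above; the epoch gaps (years) are hypothetical.
for gap_yr in (1.0, 2.0, 3.0):
    print(gap_yr, max_drift_mas(5.0, gap_yr))
```

Under these assumed gaps, sources near the cut could drift by an amount comparable to the 10 milliarcsecond requirement, which is worth keeping in mind when interpreting the results.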

In [None]:
gaia_coords = SkyCoord(gaia_visit_df['ra']*u.deg, gaia_visit_df['dec']*u.deg)

In [None]:
fig = plt.figure(figsize=(8, 8))
plt.scatter(gaia_coords.ra.deg, gaia_coords.dec.deg, s=8, lw=0)
plt.xlabel('RA (deg)')
plt.ylabel('dec (deg)')
plt.title('Gaia Sources in Visit %i' %visit)
ax = plt.gca()
ax.set_xticks(ax.get_xticks()[1::2]) # Clean up ticks in RA

### Use astropy to match Gaia sources to HSC sources

We will use the `match_to_catalog_sky` method from astropy to do the catalog match.
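
For intuition, `match_to_catalog_sky` returns, for each source in the calling catalog, the index of its nearest neighbor in the other catalog together with the on-sky separation. Every source receives a match, however distant. The brute-force sketch below illustrates the idea in pure Python. It is not how astropy implements the search, the helper names are hypothetical, and the positions are made up.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(math.sqrt(h)))


def match_nearest(reference, catalog):
    """For each reference (ra, dec) in degrees, return (index, separation_deg)
    of the closest catalog entry. Brute force, O(N*M)."""
    matches = []
    for ra_ref, dec_ref in reference:
        seps = [angular_separation_deg(ra_ref, dec_ref, ra, dec)
                for ra, dec in catalog]
        best = min(range(len(seps)), key=seps.__getitem__)
        matches.append((best, seps[best]))
    return matches


# Made-up positions in degrees, for illustration only.
gaia_like = [(150.000, 2.000), (150.100, 2.050)]
hsc_like = [(150.1001, 2.0500), (149.9999, 2.0000), (150.300, 2.200)]
for idx, sep in match_nearest(gaia_like, hsc_like):
    print(idx, round(sep * 3600.0, 3), 'arcsec')
```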

In [None]:
matched_idx, sep2d, sep3d = gaia_coords.match_to_catalog_sky(hsc_sources_coords)

In [None]:
fig = plt.figure(figsize=(10,8))
plt.scatter(gaia_coords.ra.deg, gaia_coords.dec.deg, c=sep2d.arcsec*1000, s=20, vmax=50)
cb = plt.colorbar()
plt.xlabel('RA (deg)')
plt.ylabel('dec (deg)')
cb.set_label('Distance to match (milliarcsec)')
ax = plt.gca()
ax.set_xticks(ax.get_xticks()[1::2]) # Clean up ticks in RA

### Make matched catalog

In [None]:
# Use hsc_sources_coords to get everything in degrees
matched_list = []
for keep_idx, gaia_idx in zip(matched_idx, np.arange(len(gaia_coords))):
    gaia_row = gaia_visit_df.iloc[gaia_idx]
    hsc_row = hsc_final_df.iloc[keep_idx]
    matched_list.append([hsc_row['id'], hsc_sources_coords[keep_idx].ra.deg,
                         hsc_sources_coords[keep_idx].dec.deg,
                         gaia_row['source_id'], gaia_row['ra'], gaia_row['dec']])
matched_df = pd.DataFrame(matched_list, columns=['HSC_id', 'HSC_ra', 'HSC_dec', 
                                       'gaia_id', 'gaia_ra', 'gaia_dec'])

### Find separations in all pairs of sources

The matched catalog contains one row per Gaia source in the visit. Computing separations for every possible pair would take a long time, so we randomly select a subset of the matched objects and form all pairs among them.

In [None]:
num_unique_objects = len(matched_df)
print("Number of matched objects: %i" % num_unique_objects)

In [None]:
use_objects = 200
rand_state = np.random.RandomState(98)
pairs_list = list(combinations(rand_state.choice(np.arange(num_unique_objects), 
                                                 size=use_objects, replace=False),
                               2))

In [None]:
def calc_separations(catalog, pairs_list):
    cat_seps = np.empty((len(pairs_list), 2))
    hsc_locs = SkyCoord(catalog['HSC_ra']*u.deg, catalog['HSC_dec']*u.deg)
    gaia_locs = SkyCoord(catalog['gaia_ra']*u.deg, catalog['gaia_dec']*u.deg)
    seps = []
    j = 0
    for pair_1, pair_2 in pairs_list:
        if j % 5000 == 0:
            print('Calculating Separation %i out of %i' % (j, len(pairs_list)))
        pair_seps = []
        pair_seps.append(hsc_locs[pair_1].separation(hsc_locs[pair_2]).arcsec)
        pair_seps.append(gaia_locs[pair_1].separation(gaia_locs[pair_2]).arcsec)
        seps.append(pair_seps)
        j += 1
    cat_seps[:] = seps
        
    return cat_seps

In [None]:
visit_seps = calc_separations(matched_df, pairs_list)

In [None]:
sep_df = pd.DataFrame(visit_seps, columns=['sep_hsc', 'sep_gaia'])

In [None]:
sep_df['diff'] = sep_df['sep_gaia'] - sep_df['sep_hsc']

### Plot results against requirements

Now select the pairs whose Gaia separations fall within 30 arcseconds of 5, 20, and 200 arcminutes.
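
The selection applied in each of the following sections is a window around the nominal separation. A minimal sketch of that windowing, using plain lists instead of a DataFrame query, is shown below. `select_window` is a hypothetical helper and the separations are made up.

```python
def select_window(separations_arcsec, centre_arcmin, half_width_arcsec=30.0):
    """Return indices of pairs whose separation lies strictly within
    half_width_arcsec of the nominal separation."""
    centre = centre_arcmin * 60.0  # arcminutes to arcseconds
    return [i for i, sep in enumerate(separations_arcsec)
            if centre - half_width_arcsec < sep < centre + half_width_arcsec]


# Made-up pair separations in arcseconds, for illustration only.
example_seps = [250.0, 285.0, 300.0, 329.9, 330.0, 1195.0, 12010.0]
for scale in (5, 20, 200):
    print(scale, select_window(example_seps, scale))
```

The comparisons are strict, so a pair exactly on the window edge (330.0 arcseconds here) is excluded, matching the `>` and `<` used in the queries.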

#### 5 arcmin tests

In [None]:
lims = 30 # Get 30 arcseconds on either side of defined separation
five_arcmin = 60*5 # five arcminutes in arcseconds
five_arcmin_df = sep_df.query('sep_gaia > %i-%i and sep_gaia < %i+%i' % (five_arcmin, lims,
                                                                         five_arcmin, lims))

In [None]:
fig = plt.figure(figsize=(10, 8))
median_diff_5_arcmin = np.median(np.abs(five_arcmin_df['diff']))
plt.hist(np.abs(five_arcmin_df['diff']), range=(0, 0.1))
plt.axvline(median_diff_5_arcmin, 0, 1, 
            c='k', label='Median Difference: %.2f (mas)' % (median_diff_5_arcmin*1000), lw=4)
plt.axvline(0.01, 0, 1, c='r', label='Requirement = 10 milliarcsec', lw=4)
plt.legend()
plt.xlabel('Difference in Measured Separation for sources separated by ~5 arcmin (arcsec)')
plt.ylabel('Number of Pairs')

In [None]:
fig = plt.figure(figsize=(10, 8))
n, bins, _ = plt.hist(np.abs(five_arcmin_df['diff']), range=(0, 0.1),
                      cumulative=True, density=True)
current_outlier_frac_5 = n[np.where(bins < 0.02)[0][-1]]
plt.axhline(current_outlier_frac_5, 0, 1, c='k', 
            label='Outlier Percentage = %.2f%s' % ((1.-current_outlier_frac_5)*100, '%'), 
            lw=4)
plt.axhline(0.9, 0, 1, c='r', ls='--', label='90th percentile', lw=4)
plt.axvline(0.020, 0, 1, c='r', label='Requirement: Outlier Fraction (> 20mas) <= 10%', lw=4)
plt.legend(loc=4)
plt.xlabel('Difference in Measured Separation for sources separated by ~5 arcmin (arcsec)')
plt.ylabel('Cumulative Fraction of Pairs')
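
Note that the cumulative histogram above is normalized over the plotted range (0 to 0.1 arcsec), so pairs with larger differences are not counted when the outlier fraction is read from it. As a cross-check, the fraction can be computed directly from the values. The sketch below is one way to do that. `outlier_fraction` is a hypothetical helper and the differences are made up.

```python
def outlier_fraction(abs_diffs_arcsec, threshold_arcsec):
    """Fraction of pairs whose absolute separation difference exceeds
    the threshold. Returns nan when there are no pairs."""
    if not abs_diffs_arcsec:
        return float('nan')
    n_out = sum(1 for d in abs_diffs_arcsec if d > threshold_arcsec)
    return n_out / len(abs_diffs_arcsec)


# Made-up absolute differences in arcseconds; the last value lies
# outside the 0 to 0.1 arcsec histogram range.
example_diffs = [0.005, 0.008, 0.012, 0.018, 0.025, 0.150]
print(outlier_fraction(example_diffs, 0.020))
```

In this example two of six pairs exceed 20 mas. A histogram limited to 0.1 arcsec would see only five pairs and one outlier.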

#### 20 arcmin tests

In [None]:
lims = 30 # Get 30 arcseconds on either side of defined separation
twenty_arcmin = 60*20 # 20 arcminutes in arcseconds
twenty_arcmin_df = sep_df.query('sep_gaia > %i-%i and sep_gaia < %i+%i' % (twenty_arcmin, lims,
                                                                           twenty_arcmin, lims))

In [None]:
fig = plt.figure(figsize=(10, 8))
median_diff_20_arcmin = np.median(np.abs(twenty_arcmin_df['diff']))
plt.hist(np.abs(twenty_arcmin_df['diff']), range=(0, 0.1))
plt.axvline(median_diff_20_arcmin, 0, 1, 
            c='k', label='Median Difference: %.2f (mas)' % (median_diff_20_arcmin*1000), lw=4)
plt.axvline(0.01, 0, 1, c='r', label='Requirement = 10 milliarcsec', lw=4)
plt.legend()
plt.xlabel('Difference in Measured Separation for sources separated by ~20 arcmin (arcsec)')
plt.ylabel('Number of Pairs')

In [None]:
fig = plt.figure(figsize=(10, 8))
n, bins, _ = plt.hist(np.abs(twenty_arcmin_df['diff']), range=(0, 0.1),
                      cumulative=True, density=True)
current_outlier_frac_20 = n[np.where(bins < 0.02)[0][-1]]
plt.axhline(current_outlier_frac_20, 0, 1, c='k', 
            label='Outlier Percentage = %.2f%s' % ((1.-current_outlier_frac_20)*100, '%'), 
            lw=4)
plt.axhline(0.9, 0, 1, c='r', ls='--', label='90th percentile', lw=4)
plt.axvline(0.020, 0, 1, c='r', label='Requirement: Outlier Fraction (> 20mas) <= 10%', lw=4)
plt.legend(loc=4)
plt.xlabel('Difference in Measured Separation for sources separated by ~20 arcmin (arcsec)')
plt.ylabel('Cumulative Fraction of Pairs')

#### 200 arcmin tests

In [None]:
lims = 30 # Get 30 arcseconds on either side of defined separation
two_hundred_arcmin = 60*200 # 200 arcminutes in arcseconds
two_hundred_arcmin_df = sep_df.query('sep_gaia > %i-%i and sep_gaia < %i+%i' % (two_hundred_arcmin, 
                                                                                lims,
                                                                                two_hundred_arcmin, 
                                                                                lims))

In [None]:
fig = plt.figure(figsize=(10, 8))
median_diff_200_arcmin = np.median(np.abs(two_hundred_arcmin_df['diff']))
plt.hist(np.abs(two_hundred_arcmin_df['diff']), range=(0, 0.1))
plt.axvline(median_diff_200_arcmin, 0, 1, 
            c='k', label='Median Difference: %.2f (mas)' % (median_diff_200_arcmin*1000), lw=4)
plt.axvline(0.015, 0, 1, c='r', label='Requirement = 15 milliarcsec', lw=4)
plt.legend()
plt.xlabel('Difference in Measured Separation for sources separated by ~200 arcmin (arcsec)')
plt.ylabel('Number of Pairs')

In [None]:
fig = plt.figure(figsize=(10, 8))
n, bins, _ = plt.hist(np.abs(two_hundred_arcmin_df['diff']), range=(0, 0.1),
                      cumulative=True, density=True)
current_outlier_frac_200 = n[np.where(bins < 0.03)[0][-1]]
plt.axhline(current_outlier_frac_200, 0, 1, c='k', 
            label='Outlier Percentage = %.2f%s' % ((1.-current_outlier_frac_200)*100, '%'), 
            lw=4)
plt.axhline(0.9, 0, 1, c='r', ls='--', label='90th percentile', lw=4)
plt.axvline(0.030, 0, 1, c='r', label='Requirement: Outlier Fraction (> 30mas) <= 10%', lw=4)
plt.legend(loc=4)
plt.xlabel('Difference in Measured Separation for sources separated by ~200 arcmin (arcsec)')
plt.ylabel('Cumulative Fraction of Pairs')

### Test against requirements

In [None]:
class RequirementFailure(ValueError):
    "Requirement not met."

In [None]:
# Set up for potential error messages
error_msg = ""
error_present = False
error_val = 0

In [None]:
# Test separation differences on ~5 arcmin scale
if median_diff_5_arcmin*1000. > 10.:
    error_present = True
    error_val += 1
    error_msg += str('Error #%i: \n' % error_val +
                     'Median difference in separations on ~5 arcmin scale is greater than 10 milliarcsec. ' + 
                     'Test Value = %.2f mas. \n' % (median_diff_5_arcmin*1000.))

In [None]:
# Test Outlier Fraction of separation differences on ~5 arcmin scale
if (1.-current_outlier_frac_5)*100 > 10.:
    error_present = True
    error_val += 1
    error_msg += str('Error #%i: \n' % error_val + 
                     'Separation Difference Outlier Fraction on ~5 arcmin scales ' + 
                     '(differences in pair separations > 20 mas) ' +
                     'is greater than 10%s. Test Value = %.2f%s \n' % ('%', 
                                                                    (1.-current_outlier_frac_5)*100, 
                                                                    '%'))

In [None]:
# Test separation differences on ~20 arcmin scale
if median_diff_20_arcmin*1000. > 10.:
    error_present = True
    error_val += 1
    error_msg += str('Error #%i: \n' % error_val +
                     'Median difference in separations on ~20 arcmin scale is greater than 10 milliarcsec. ' + 
                     'Test Value = %.2f mas. \n' % (median_diff_20_arcmin*1000.))

In [None]:
# Test Outlier Fraction of separation differences on ~20 arcmin scale
if (1.-current_outlier_frac_20)*100 > 10.:
    error_present = True
    error_val += 1
    error_msg += str('Error #%i: \n' % error_val + 
                     'Separation Difference Outlier Fraction on ~20 arcmin scales ' + 
                     '(differences in pair separations > 20 mas) ' +
                     'is greater than 10%s. Test Value = %.2f%s \n' % ('%', 
                                                                    (1.-current_outlier_frac_20)*100, 
                                                                    '%'))

In [None]:
# Test separation differences on ~200 arcmin scale
if median_diff_200_arcmin*1000. > 15.:
    error_present = True
    error_val += 1
    error_msg += str('Error #%i: \n' % error_val +
                     'Median difference in separations on ~200 arcmin scale is greater than 15 milliarcsec. ' + 
                     'Test Value = %.2f mas. \n' % (median_diff_200_arcmin*1000.))
elif np.isnan(median_diff_200_arcmin):
    error_present = True
    error_val += 1
    error_msg += str('Error #%i: \n' % error_val +
                     'Cannot calculate median difference in separations for objects '+
                     'with 200 arcmin separation. No objects with 200 arcmin separation. Test value == nan. \n')

In [None]:
# Test Outlier Fraction of separation differences on ~200 arcmin scale
if (1.-current_outlier_frac_200)*100 > 10.:
    error_present = True
    error_val += 1
    error_msg += str('Error #%i: \n' % error_val + 
                     'Separation Difference Outlier Fraction on ~200 arcmin scales ' + 
                     '(differences in pair separations > 30 mas) ' +
                     'is greater than 10%s. Test Value = %.2f%s \n' % ('%', 
                                                                    (1.-current_outlier_frac_200)*100, 
                                                                    '%'))
elif np.isnan(current_outlier_frac_200):
    error_present = True
    error_val += 1
    error_msg += str('Error #%i: \n' % error_val +
                     'Cannot calculate Separation Difference Outlier Fraction for objects '+
                     'with 200 arcmin separation. No objects with 200 arcmin separation. Test value == nan. \n')

In [None]:
if error_present is True:
    error_msg = str('%i Total Errors: \n' % error_val + error_msg)
    raise RequirementFailure(error_msg)