# LVV-T1278: Relative Astrometric Performance

**Written By: Bryce Kalmbach**

**Last updated: 08-06-2019**

**Tested on Stack Version: w_2019_29**

## Requirements:

[OSS-REQ-0388](https://docushare.lsst.org/docushare/dsweb/Get/LSE-030#page=66)

1. For all pairs of sources separated by ~5 arcminutes median error in these measurements is <= 10 milliarcseconds.

2. No more than 10% of the source pairs separated by ~5 arcminutes have separation errors greater than 20 milliarcseconds.

3. For all pairs of sources separated by ~20 arcminutes median error in these measurements is <= 10 milliarcseconds.

4. No more than 10% of the source pairs separated by ~20 arcminutes have separation errors greater than 20 milliarcseconds.

5. For all pairs of sources separated by ~200 arcminutes median error in these measurements is <= 15 milliarcseconds.

6. No more than 10% of the source pairs separated by ~200 arcminutes have separation errors greater than 30 milliarcseconds.

## Proposed Test Case:

1. Image an average field. Repeat at different airmasses.

2. Run source detection and astrometric measurement on images from step 1

3. Calculate the separations between all sources detected in step 2

4. Compare source separations from step 3. Calculate RMS for each pair across set of visits.

5. Examine distribution of source separation RMS from step 4 for all pairs of sources separated by ~ 5 arcminutes. Verify that the median in these measurements is <= 10 milliarcseconds

6. Verify that no more than 10% of the source pairs separated by ~ 5 arcminutes have separation RMS greater than 20 milliarcseconds

7. Examine distribution of source separation RMS from step 4 for all pairs of sources separated by ~ 20 arcminutes. Verify that the median in these measurements is <= 10 milliarcseconds

8. Verify that no more than 10 percent of source pairs separated by ~ 20 arcminutes have source separation RMS greater than 20 milliarcseconds

9. Examine distribution of source separation RMS from step 4 for all pairs separated by ~ 200 arcminutes. Verify that the median in these measurements is <= 15 milliarcseconds.

10. Verify that no more than 10 percent of source pairs separated by ~ 200 arcminutes have source separation RMS greater than 30 milliarcseconds.
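
Steps 4 through 6 can be sketched in plain Python. This is a minimal illustration, not the notebook's implementation: the helper names `separation_rms` and `summarize`, and the separation values, are made up for the example, and the RMS here is the population RMS about the per-pair mean.

```python
import math
import statistics


def separation_rms(separations_mas):
    """Population RMS of one pair's separation about its mean across visits (mas)."""
    mean_sep = statistics.fmean(separations_mas)
    squared_dev = [(s - mean_sep) ** 2 for s in separations_mas]
    return math.sqrt(statistics.fmean(squared_dev))


def summarize(rms_per_pair_mas, outlier_threshold_mas):
    """Return (median RMS, fraction of pairs whose RMS exceeds the threshold)."""
    median_rms = statistics.median(rms_per_pair_mas)
    n_outliers = sum(1 for r in rms_per_pair_mas if r > outlier_threshold_mas)
    return median_rms, n_outliers / len(rms_per_pair_mas)


# Hypothetical separations (mas) of three ~5 arcmin pairs over three visits
pair_separations = [
    [300000.0, 300004.0, 299996.0],
    [300010.0, 300010.0, 300010.0],
    [299950.0, 300000.0, 300050.0],
]
rms_values = [separation_rms(p) for p in pair_separations]
median_rms, outlier_frac = summarize(rms_values, outlier_threshold_mas=20.0)
print('median RMS = %.2f mas, outlier fraction = %.2f' % (median_rms, outlier_frac))
```

With only two visits, the population RMS about the mean equals half the absolute difference between the two separations, so a two-visit test reduces to comparing differences in measured separation.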

### Import necessary tools

In [None]:
import os
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import h5py

In [None]:
from lsst.daf.persistence import Butler
import lsst.daf.persistence as daf_persistence
from lsst.afw.table import MultiMatch

from astropy.coordinates import SkyCoord, Angle
from astropy import units as u

from itertools import combinations

import lsst.verify

In [None]:
# Make our plots nice and readable
plt.rcParams.update({'font.size': 18})

### Set parameters for testing

* `test_bandpass`: The bandpass in which the notebook tests astrometry

* `mag_lims`: Keep stars with magnitudes in between `[bright_limit, faint_limit]`

* `num_visits`: Number of visits to use

In [None]:
test_bandpass = 'HSC-R'

mag_lims = [17.5, 21.5]

num_visits = 2

### Identify HSC Data to use

We start from a single visit in the HSC Wide dataset and later pair it with an overlapping visit for comparison. https://hsc-release.mtk.nao.ac.jp/doc/index.php/database/ has info on which tracts are included in the Wide data. We have an hdf5 file with the visit data located at `/project/danielsf/valid_hsc_visit_extent.h5`. From this file we load in a visit to test.

In [None]:
# Load a butler for the HSC Wide data
depth = 'WIDE'
band = test_bandpass
butler = daf_persistence.Butler('/datasets/hsc/repo/rerun/DM-13666/%s/'%(depth))

In [None]:
f = h5py.File('/project/danielsf/valid_hsc_visit_extent.h5', 'r')

Load the hdf5 data into a pandas dataframe.

In [None]:
hsc_data_df = pd.DataFrame([])
for key in list(f.keys()):
    if key == 'filter':
        hsc_data_df[key] = np.array(f[key][()], dtype=str)
    else:
        hsc_data_df[key] = f[key][()]

In [None]:
hsc_data_df.head()

Select a random visit from the data observed in the desired test bandpass and find a visit which is centered near the same area so we have overlap for comparison.

In [None]:
unique_visits = np.unique(hsc_data_df.query('filter == "%s"' % test_bandpass)['visit'].values)
rand_state = np.random.RandomState(123)
test_visit_1 = rand_state.choice(unique_visits)
print(test_visit_1)

We will search for another visit that overlaps the test visit.

In [None]:
test_visit_df = hsc_data_df.query('visit == %i and ccd == %i' % (test_visit_1, 0))

In [None]:
ra_min = test_visit_df['ra_min'].iloc[0]
ra_max = test_visit_df['ra_max'].iloc[0]
dec_min = test_visit_df['dec_min'].iloc[0]
dec_max = test_visit_df['dec_max'].iloc[0]

In [None]:
overlap_visits_df = hsc_data_df.query(str('ra_center > %f and ra_center < %f ' +
                                          'and dec_center > %f and dec_center < %f') % (ra_min, ra_max,
                                                                                        dec_min, dec_max))

In [None]:
overlap_visits = overlap_visits_df.query('filter == "%s" and ccd == %i' % (test_bandpass, 0))['visit'].values

In [None]:
test_visit_2 = overlap_visits[overlap_visits != test_visit_1][0]
print(test_visit_1, test_visit_2)
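
The overlap search above amounts to a point-in-box test on visit centers. Below is a minimal sketch in plain Python with made-up coordinates. The function name `center_in_box` is hypothetical, and the branch for boxes that cross RA = 0 is an addition for illustration, not something the cells above do.

```python
def center_in_box(ra, dec, ra_min, ra_max, dec_min, dec_max):
    """Return True if (ra, dec) lies strictly inside the box. All values in degrees."""
    in_dec = dec_min < dec < dec_max
    if ra_min <= ra_max:
        in_ra = ra_min < ra < ra_max
    else:
        # Box crosses the RA = 0/360 boundary (assumed convention: ra_min > ra_max)
        in_ra = ra > ra_min or ra < ra_max
    return in_ra and in_dec


# Hypothetical visit centers: (visit id, ra_center, dec_center)
centers = [(101, 150.05, 2.10), (102, 151.50, 2.10), (103, 150.08, 2.12)]
box = dict(ra_min=150.0, ra_max=150.2, dec_min=2.0, dec_max=2.2)
overlapping = [visit for visit, ra, dec in centers if center_in_box(ra, dec, **box)]
print(overlapping)
```

Because the box comes from a single CCD, this is a conservative test: it only finds visits whose pointing center lands on that CCD's footprint.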

Below we define methods to create a matched catalog for sources and create `objects` made up of individual sources detected in a single visit.

In [None]:
def get_matched_catalog(subset, visit_list):
    """
    Create a matched catalog from a subset, keeping only sources from the visits in visit_list.
    """

    matched_cat = None
    calexps = {}          

    for data_ref in subset:
        data_id = data_ref.dataId
        if data_id['visit'] not in visit_list:
            continue
        try:
            src_cat = data_ref.get('src')
        except Exception:
            print('Error in Visit #%i, CCD #%i. Skipping it.' % (data_id['visit'], data_id['ccd']))
            continue
        calexps[data_id['visit']] = data_ref.get('calexp')
        if matched_cat is None:
            id_fmt = {'visit':np.int64}
            matched_cat = MultiMatch(src_cat.schema, id_fmt)
        matched_cat.add(src_cat, data_id)
        
    final_catalog = matched_cat.finish()
    
    return final_catalog, calexps

In [None]:
# Note that Gen 2 butler does not like numpy ints
subset = butler.subset('src', filter=test_bandpass)

In [None]:
final_catalog, calexps = get_matched_catalog(subset, [test_visit_1, test_visit_2])

In [None]:
# Total number of HSC Sources
len(final_catalog)

In [None]:
# Only keep the columns we need going forward and convert to pandas dataframe
final_catalog = final_catalog.asAstropy()
final_catalog = final_catalog[['id', 'coord_ra', 'coord_dec', 'base_PsfFlux_instFlux', 'object', 'visit']]
final_catalog = final_catalog.to_pandas()

In [None]:
# Add in magnitude information for cuts
mag = []
for obj_row in final_catalog.values:
    calib = calexps[obj_row[-1]].getPhotoCalib()
    mag.append(calib.instFluxToMagnitude(obj_row[-3]))
final_catalog['mag'] = mag
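
For reference, the standard AB magnitude relation for a flux in nanojansky can be sketched as follows. This assumes a spatially constant calibration factor, here the hypothetical parameter `njy_per_count`, and is not a claim about what `PhotoCalib.instFluxToMagnitude` does internally.

```python
import math

# AB zero point for fluxes in nanojansky: 3631 Jy = 3631e9 nJy
AB_ZERO_POINT_NJY = 2.5 * math.log10(3631e9)


def inst_flux_to_ab_mag(inst_flux, njy_per_count):
    """Convert an instrumental flux (counts) to an AB magnitude.

    njy_per_count is a hypothetical, spatially constant calibration factor.
    Non-positive fluxes have no defined magnitude and return NaN.
    """
    flux_njy = inst_flux * njy_per_count
    if flux_njy <= 0:
        return float('nan')
    return -2.5 * math.log10(flux_njy) + AB_ZERO_POINT_NJY


mag_lims = [17.5, 21.5]
example_mag = inst_flux_to_ab_mag(inst_flux=5000.0, njy_per_count=2.0)
keep = mag_lims[0] < example_mag < mag_lims[1]
print('mag = %.3f, keep = %s' % (example_mag, keep))
```

Sources with negative measured flux yield NaN, and NaN fails both comparisons in a magnitude cut, so such sources drop out of the sample.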

In [None]:
final_catalog.head()

Here we make cuts based on the magnitude limits set at the beginning.

In [None]:
hsc_final_df = final_catalog.query('mag > %f and mag < %f' % (mag_lims[0], mag_lims[1]))

In [None]:
hsc_sources_coords = SkyCoord(hsc_final_df['coord_ra'].values*u.rad, hsc_final_df['coord_dec'].values*u.rad)

And now plot all the sources that we've kept.

In [None]:
fig = plt.figure(figsize=(8, 8))
plt.scatter(hsc_sources_coords.ra.deg, hsc_sources_coords.dec.deg, s=8, lw=0)
plt.xlabel('RA (deg)')
plt.ylabel('dec (deg)')
plt.title('HSC Sources in Matched Catalog')
ax = plt.gca()
ax.set_xticks(ax.get_xticks()[1::2]) # Clean up ticks in RA

### Find separations in all pairs of sources

To get source separations to compare we need to only keep the objects that appear in both visits.

In [None]:
# Make pairs of all objects with detections in both visits
# Faster to use numpy array than loop over pandas df
# Currently keeps only the objects present in all visits
unique, counts = np.unique(hsc_final_df['object'].values, return_counts=True)
in_all = unique[np.where(counts == num_visits)[0]]
num_unique_objects = len(in_all)
print("Number of Objects present in all visits: %i" % num_unique_objects)
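
Counting rows per object assumes each object has at most one source per visit. A stricter variant, sketched here in plain Python with made-up `(object, visit)` rows, checks the set of visits in which each object appears. The function name `objects_in_all_visits` is hypothetical.

```python
from collections import defaultdict


def objects_in_all_visits(rows, visits):
    """Return sorted object IDs that have at least one source in every visit.

    rows is an iterable of (object_id, visit) tuples.
    """
    visits_seen = defaultdict(set)
    for object_id, visit in rows:
        visits_seen[object_id].add(visit)
    wanted = set(visits)
    return sorted(obj for obj, seen in visits_seen.items() if wanted <= seen)


# Hypothetical rows: object 2 has two sources, but both in visit 10
rows = [(1, 10), (1, 20), (2, 10), (2, 10), (3, 20), (4, 10), (4, 20)]
in_all_strict = objects_in_all_visits(rows, visits=[10, 20])
print(in_all_strict)
```

In this example object 2 has a row count of two, so a count-based selection would keep it even though it was never detected in visit 20.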

In [None]:
keep_catalog = hsc_final_df[hsc_final_df['object'].isin(in_all)]

Make a catalog for each visit so we can compare the results.

In [None]:
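# Sort by object ID so that row i refers to the same object in both visits
keep_visit_1 = keep_catalog.query('visit == %i' % test_visit_1).sort_values('object')
keep_visit_2 = keep_catalog.query('visit == %i' % test_visit_2).sort_values('object')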

In order to speed things up we randomly select `use_objects` number of objects to calculate the separations. It takes a long time to calculate the separations for *all* possible pairs of objects in the visit.

In [None]:
use_objects = 300
rand_state = np.random.RandomState(98)
pairs_list = list(combinations(rand_state.choice(np.arange(num_unique_objects), 
                                                 size=use_objects, replace=False),
                               2))
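
The number of pairs grows quadratically with the number of objects, which is why a subsample is used. A short sketch with the standard library shows the counts. The catalog size of 1000 is a made-up value, and `random.Random` will not reproduce the indices drawn by `np.random.RandomState`.

```python
import math
import random
from itertools import combinations

num_unique_objects = 1000  # hypothetical catalog size
use_objects = 300

# Draw use_objects distinct indices, then form every unordered pair of them
rng = random.Random(98)
chosen = rng.sample(range(num_unique_objects), use_objects)
pairs_list = list(combinations(chosen, 2))

n_pairs_subsample = len(pairs_list)
n_pairs_full = math.comb(num_unique_objects, 2)
print('%i pairs from the subsample, %i from the full catalog'
      % (n_pairs_subsample, n_pairs_full))
```

For 300 objects this gives n(n-1)/2 = 44850 pairs, against 499500 for the hypothetical full catalog of 1000 objects.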

In [None]:
def calc_pairwise_separations(cat_ra, cat_dec, pairs_list, cat_units):
    
    """
    Calculate the separation between pairs of objects found in a catalog.
    
    Inputs
    ------
    cat_ra: list of floats
        The ra coordinates of the catalog objects in units given by cat_units
        
    cat_dec: list of floats
        The dec coordinates of the catalog objects in units given by cat_units
        
    pairs_list: list of len-2 lists
        The indices of pairs of catalogs objects for which to calculate the separations
        
    cat_units: astropy Unit
        The units of the coordinates
        
    Returns
    -------
    cat_seps: list of floats
        The separations of the pairs of objects defined in pairs_list given in arcsec
    """
    
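    # Convert to plain arrays first so pandas Series inputs are handled consistently
    cat_locs = SkyCoord(np.asarray(cat_ra)*cat_units, np.asarray(cat_dec)*cat_units)
    cat_seps = []
    for j, (pair_1, pair_2) in enumerate(pairs_list):
        # Report progress periodically since this loop can be slow
        if j % 5000 == 0:
            print('Calculating Separation %i out of %i' % (j, len(pairs_list)))
        pair_sep = cat_locs[pair_1].separation(cat_locs[pair_2]).arcsec
        cat_seps.append(pair_sep)
        
    return cat_seps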
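
To make the quantity being computed explicit, here is a standalone sketch of the great-circle separation between two sky positions using the Vincenty formula, which stays numerically stable at small separations. It is an illustration in plain `math`, not a replacement for `SkyCoord.separation`, and the function name `angular_separation_arcsec` is hypothetical.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds. Inputs are in radians."""
    d_ra = ra2 - ra1
    # Numerator and denominator of the Vincenty formula on a unit sphere
    num = math.hypot(
        math.cos(dec2) * math.sin(d_ra),
        math.cos(dec1) * math.sin(dec2)
        - math.sin(dec1) * math.cos(dec2) * math.cos(d_ra),
    )
    den = (math.sin(dec1) * math.sin(dec2)
           + math.cos(dec1) * math.cos(dec2) * math.cos(d_ra))
    return math.degrees(math.atan2(num, den)) * 3600.0


# Two points on the equator, 5 arcminutes apart in RA
sep_equator = angular_separation_arcsec(0.0, 0.0, math.radians(5.0 / 60.0), 0.0)
print('%.6f arcsec' % sep_equator)
```

On the equator a 5 arcminute offset in RA corresponds to a 300 arcsecond separation. Away from the equator the same RA offset yields a smaller separation, which is why coordinate differences are not compared directly.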

In [None]:
print("Calculating Separations for Visit %i" % test_visit_1)
visit_1_seps = calc_pairwise_separations(keep_visit_1['coord_ra'], keep_visit_1['coord_dec'], pairs_list, u.rad)

In [None]:
print("Calculating Separations for Visit %i" % test_visit_2)
visit_2_seps = calc_pairwise_separations(keep_visit_2['coord_ra'], keep_visit_2['coord_dec'], pairs_list, u.rad)

In [None]:
visit_seps = np.array([visit_1_seps, visit_2_seps]).T
sep_df = pd.DataFrame(visit_seps, columns=['sep_visit_1', 'sep_visit_2'])

In [None]:
sep_df['diff'] = sep_df['sep_visit_1'] - sep_df['sep_visit_2']

### Setup `lsst_verify`

Following the `verify_demo` [notebook](https://github.com/LSSTScienceCollaborations/StackClub/blob/master/Validation/verify_demo.ipynb) we create a metric package for astrometry and call it `verify_astrometry`. In the JSON files we add our metrics and the design specs that are required for commissioning.

In [None]:
METRIC_PACKAGE = "verify_astrometry"
metrics = lsst.verify.MetricSet.load_metrics_package(METRIC_PACKAGE)
specs = lsst.verify.SpecificationSet.load_metrics_package(METRIC_PACKAGE)

In [None]:
metrics

In [None]:
specs

### Test against requirements

To show reports from `lsst_verify` we calculate the parameters we want to test and format them as `Measurement` objects, with additional information saved as `Datum` objects that we use to make the diagnostic plots below.

In [None]:
lims = 30 # Get 30 arcseconds on either side of defined separation
five_arcmin = 60*5 # five arcminutes in arcseconds
five_arcmin_df = sep_df.query('sep_visit_1 > %i-%i and sep_visit_1 < %i+%i' % (five_arcmin, lims,
                                                                               five_arcmin, lims))

In [None]:
median_diff_5_arcmin = np.median(np.abs(five_arcmin_df['diff']))*1000*u.mas
am1_meas = lsst.verify.Measurement('relative_astrometry.AM1', median_diff_5_arcmin)

am1_meas.extras['meas_errors'] = lsst.verify.Datum(np.abs(five_arcmin_df['diff']).values*1000*u.mas,
                                                   label='Differences in Measurement (mas)',
                                                   description='Differences in measurements of pairs of sources on 5-arcmin. scale')

In [None]:
outlier_frac_5_arcmin = len(np.where((np.abs(five_arcmin_df['diff'])*1000) > 20)[0]) / len(five_arcmin_df)
af1_meas = lsst.verify.Measurement('relative_astrometry.AF1', outlier_frac_5_arcmin)
af1_meas.extras['meas_errors'] = lsst.verify.Datum(np.abs(five_arcmin_df['diff']).values*1000*u.mas,
                                                   label='Differences in Measurement (mas)',
                                                   description='Differences in measurements of pairs of sources on 5-arcmin. scale')

In [None]:
lims = 30 # Get 30 arcseconds on either side of defined separation
twenty_arcmin = 60*20 # twenty arcminutes in arcseconds
twenty_arcmin_df = sep_df.query('sep_visit_1 > %i-%i and sep_visit_1 < %i+%i' % (twenty_arcmin, lims,
                                                                                 twenty_arcmin, lims))

In [None]:
median_diff_20_arcmin = np.median(np.abs(twenty_arcmin_df['diff']))*1000*u.mas
am2_meas = lsst.verify.Measurement('relative_astrometry.AM2', median_diff_20_arcmin)

am2_meas.extras['meas_errors'] = lsst.verify.Datum(np.abs(twenty_arcmin_df['diff']).values*1000*u.mas,
                                                   label='Differences in Measurement (mas)',
                                                   description='Differences in measurements of pairs of sources on 20-arcmin. scale')

In [None]:
outlier_frac_20_arcmin = len(np.where((np.abs(twenty_arcmin_df['diff'])*1000) > 20)[0]) / len(twenty_arcmin_df)
af2_meas = lsst.verify.Measurement('relative_astrometry.AF2', outlier_frac_20_arcmin)
af2_meas.extras['meas_errors'] = lsst.verify.Datum(np.abs(twenty_arcmin_df['diff']).values*1000*u.mas,
                                                   label='Differences in Measurement (mas)',
                                                   description='Differences in measurements of pairs of sources on 20-arcmin. scale')
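
The per-scale cells above repeat the same three steps: select pairs in a window around the target separation, take the median absolute difference, and compute the outlier fraction. A compact sketch in plain Python follows. The function name `evaluate_scale` and the sample values are made up, and the thresholds are copied from the requirements listed at the top.

```python
import statistics

# scale (arcmin): (median limit, outlier threshold), both in mas
REQUIREMENTS = {5: (10.0, 20.0), 20: (10.0, 20.0), 200: (15.0, 30.0)}
MAX_OUTLIER_FRACTION = 0.10


def evaluate_scale(seps_arcsec, diffs_arcsec, scale_arcmin, half_width_arcsec=30.0):
    """Summarize separation errors for pairs within half_width of the scale.

    Returns None when no pairs fall inside the window.
    """
    center = scale_arcmin * 60.0
    errors_mas = [abs(d) * 1000.0
                  for s, d in zip(seps_arcsec, diffs_arcsec)
                  if abs(s - center) < half_width_arcsec]
    if not errors_mas:
        return None
    median_limit, outlier_threshold = REQUIREMENTS[scale_arcmin]
    median_mas = statistics.median(errors_mas)
    outlier_fraction = sum(e > outlier_threshold for e in errors_mas) / len(errors_mas)
    return {
        'n_pairs': len(errors_mas),
        'median_mas': median_mas,
        'outlier_fraction': outlier_fraction,
        'median_ok': median_mas <= median_limit,
        'outlier_ok': outlier_fraction <= MAX_OUTLIER_FRACTION,
    }


# Hypothetical separations (arcsec) and visit-to-visit differences (arcsec)
seps = [290.0, 310.0, 329.0, 400.0, 1200.0]
diffs = [0.005, -0.012, 0.025, 0.5, 0.001]
for scale in (5, 20, 200):
    print(scale, evaluate_scale(seps, diffs, scale))
```

Returning `None` for an empty window makes the 200 arcminute case explicit: pairs drawn from only two overlapping visits may not reach that scale at all.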

Once all values are calculated for metrics we add them to a `Job`.

In [None]:
job = lsst.verify.Job(metrics=metrics, specs=specs)
job.measurements.insert(am1_meas)
job.measurements.insert(am2_meas)
job.measurements.insert(af1_meas)
job.measurements.insert(af2_meas)

We add available metadata to the job. This metadata can be used to differentiate tests of the same metrics in Squash. Here we add the bandpass as metadata, but we could also add in information like the dataset we are using to test.

In [None]:
job.meta.update({'filter': '%s' % test_bandpass})

We can now run the job and print out a report.

In [None]:
job.report().show()

### Push job results to Squash

Here we push the results to the [Squash dashboard](https://chronograf-demo.lsst.codes/) so we can track measurements over time interactively.

First we point at the api. Currently we are pushing our results to the sandbox database.

In [None]:
squash_api_url = "https://squash-restful-api-sandbox.lsst.codes"

Enter credentials to get access. Only authenticated users can push to Squash.

In [None]:
#import getpass
#username = getpass.getuser()
#password = getpass.getpass(prompt='Password for user `{}`: '.format(username))

In [None]:
# Current hack to get CI working with notebooks and chronograf
# Uses a password in a read-only file readable only by the user
username = os.environ['USER']
with open(os.path.join(os.environ['HOME'], 'bk_abc.txt')) as f:
    password = f.readline()
password = password.rstrip('\n') # Remove trailing newline character, if present

In [None]:
import requests
credentials = {'username': username, 'password': password}

If this is your first time you can register as a new user by uncommenting the lines below.

In [None]:
# r = requests.post('{}/register'.format(squash_api_url), json=credentials)
# r.json()

In [None]:
r = requests.post('{}/auth'.format(squash_api_url), json=credentials)
r.json()

In [None]:
headers = {'Authorization': 'JWT {}'.format(r.json()['access_token'])}

The following two cells upload the metric definitions to Squash and are a one-time setup procedure.

In [None]:
r = requests.post('{}/metrics'.format(squash_api_url),
                json={'metrics': metrics.json},
                headers=headers)
r.json()

In [None]:
r = requests.post('{}/specs'.format(squash_api_url),
                json={'specs': specs.json},
                headers=headers)
r.json()

Here we add some more metadata that is required for Squash.

In [None]:
job.meta.update({'packages': {}})
job.meta.update({'env': {'env_name': 'jenkins'}})

Finally, we dispatch the results of the `Job` we ran to Squash and can view them on the Squash dashboards.

In [None]:
job.dispatch(api_user=username, api_password=password, api_url=squash_api_url)

### Plot results against requirements

Pick out pairs with separations of ~5, 20, and 200 arcminutes and compare the visit-to-visit differences in measured separation against the requirements.

#### 5 arcmin tests

1. For all pairs of sources separated by ~5 arcminutes median error in these measurements is <= 10 milliarcseconds.

2. No more than 10% of the source pairs separated by ~5 arcminutes have separation errors greater than 20 milliarcseconds.

In [None]:
fig = plt.figure(figsize=(10, 8))
plt.hist(am1_meas.extras['meas_errors'].quantity, range=(0, 100), bins=20)
plt.axvline(am1_meas.quantity.value, 0, 1, 
            c='k', label='Median Difference: %.2f mas' % am1_meas.quantity.value, lw=4)
thresh = specs['relative_astrometry.AM1.design'].threshold.value
plt.axvline(thresh,
            0, 1, c='r', label='Requirement = %.1f milliarcsec' % thresh,
            lw=4)
plt.legend()
plt.xlabel('%s' % am1_meas.extras['meas_errors'].label)
plt.ylabel('Number of Pairs')
plt.title('%s' % am1_meas.extras['meas_errors'].description)

In [None]:
fig = plt.figure(figsize=(10, 8))
bins = np.linspace(0, 100, 21)
bins = np.append(bins, np.max(af1_meas.extras['meas_errors'].quantity.value)+0.01)
n, bins, _ = plt.hist(af1_meas.extras['meas_errors'].quantity.value, bins=bins,
                      cumulative=True, density=True)
plt.xlim((0, 100))
plt.axhline(1. - af1_meas.quantity.value, 0, 1, c='k', 
            label='Outlier Percentage = %.2f%s' % ((af1_meas.quantity.value)*100, '%'), 
            lw=4)
plt.axhline(0.9, 0, 1, c='r', ls='--', label='Requirement: Outlier Fraction (> 20mas) <= 10%', lw=4)
plt.legend(loc=4)
plt.xlabel('%s' % af1_meas.extras['meas_errors'].label)
plt.ylabel('Cumulative Fraction of Pairs')
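
The cumulative plot encodes the outlier requirement as a condition on the empirical CDF: the fraction of pairs at or below the 20 mas threshold must be at least 0.9. A small sketch with made-up error values shows the relation. The function name `empirical_cdf` is hypothetical.

```python
def empirical_cdf(values, x):
    """Fraction of values that are less than or equal to x."""
    return sum(v <= x for v in values) / len(values)


# Hypothetical separation errors in mas
errors_mas = [2.0, 5.0, 8.0, 12.0, 15.0, 18.0, 19.0, 21.0, 35.0, 60.0]
threshold_mas = 20.0

outlier_frac = sum(e > threshold_mas for e in errors_mas) / len(errors_mas)
cdf_at_threshold = empirical_cdf(errors_mas, threshold_mas)
passes = cdf_at_threshold >= 0.9
print('CDF at threshold = %.2f, outlier fraction = %.2f, passes = %s'
      % (cdf_at_threshold, outlier_frac, passes))
```

The CDF at the threshold and the outlier fraction always sum to one, so the black line at `1 - outlier fraction` lying on or above the dashed line at 0.9 is equivalent to meeting the 10% limit.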

#### 20 arcmin tests

3. For all pairs of sources separated by ~20 arcminutes median error in these measurements is <= 10 milliarcseconds.

4. No more than 10% of the source pairs separated by ~20 arcminutes have separation errors greater than 20 milliarcseconds.

In [None]:
fig = plt.figure(figsize=(10, 8))
plt.hist(am2_meas.extras['meas_errors'].quantity, range=(0, 100), bins=20)
plt.axvline(am2_meas.quantity.value, 0, 1, 
            c='k', label='Median Difference: %.2f mas' % am2_meas.quantity.value, lw=4)
thresh = specs['relative_astrometry.AM2.design'].threshold.value
plt.axvline(thresh,
            0, 1, c='r', label='Requirement = %.1f milliarcsec' % thresh,
            lw=4)
plt.legend()
plt.xlabel('%s' % am2_meas.extras['meas_errors'].label)
plt.ylabel('Number of Pairs')
plt.title('%s' % am2_meas.extras['meas_errors'].description)

In [None]:
fig = plt.figure(figsize=(10, 8))
bins = np.linspace(0, 100, 21)
bins = np.append(bins, np.max(af2_meas.extras['meas_errors'].quantity.value)+0.01)
n, bins, _ = plt.hist(af2_meas.extras['meas_errors'].quantity.value, bins=bins,
                      cumulative=True, density=True)
plt.xlim((0, 100))
plt.axhline(1. - af2_meas.quantity.value, 0, 1, c='k', 
            label='Outlier Percentage = %.2f%s' % ((af2_meas.quantity.value)*100, '%'), 
            lw=4)
plt.axhline(0.9, 0, 1, c='r', ls='--', label='Requirement: Outlier Fraction (> 20mas) <= 10%', lw=4)
plt.legend(loc=4)
plt.xlabel('%s' % af2_meas.extras['meas_errors'].label)
plt.ylabel('Cumulative Fraction of Pairs')

#### 200 arcmin tests

5. For all pairs of sources separated by ~200 arcminutes median error in these measurements is <= 15 milliarcseconds.

6. No more than 10% of the source pairs separated by ~200 arcminutes have separation errors greater than 30 milliarcseconds.

In [None]:
# lims = 30 # Get 30 arcseconds on either side of defined separation
# two_hundred_arcmin = 60*200 # 200 arcminutes in arcseconds
# two_hundred_arcmin_df = sep_df.query('sep_visit_1 > %i-%i and sep_visit_1 < %i+%i' % (two_hundred_arcmin, 
#                                                                                       lims,
#                                                                                       two_hundred_arcmin, 
#                                                                                       lims))

In [None]:
# fig = plt.figure(figsize=(10, 8))
# median_diff_200_arcmin = np.median(np.abs(two_hundred_arcmin_df['diff']))
# plt.hist(np.abs(two_hundred_arcmin_df['diff']), range=(0, 0.1), bins=20)
# plt.axvline(median_diff_200_arcmin, 0, 1, 
#             c='k', label='Median Difference: %.2f mas' % (median_diff_200_arcmin*1000), lw=4)
# plt.axvline(0.015, 0, 1, c='r', label='Requirement = 15 milliarcsec', lw=4)
# plt.legend()
# plt.xlabel('Difference in Measured Separation for sources separated by ~200 arcmin (arcsec)')
# plt.ylabel('Number of Pairs')

In [None]:
# fig = plt.figure(figsize=(10, 8))
# n, bins, _ = plt.hist(np.abs(two_hundred_arcmin_df['diff']), range=(0, 0.1), bins=20,
#                       cumulative=True, density=True)
# current_outlier_frac_200 = n[np.where(bins < 0.03)[0][-1]]
# plt.axhline(current_outlier_frac_200, 0, 1, c='k', 
#             label='Outlier Percentage = %.2f%s' % ((1.-current_outlier_frac_200)*100, '%'), 
#             lw=4)
# plt.axhline(0.9, 0, 1, c='r', ls='--', label='Requirement: Outlier Fraction (> 30mas) <= 10%', lw=4)
# plt.legend(loc=4)
# plt.xlabel('Difference in Measured Separation for sources separated by ~200 arcmin (arcsec)')
# plt.ylabel('Cumulative Fraction of Pairs')