# Rubin (DP0) & Roman (Troxel+23) cross-match

Author: Melissa Graham

Date: Mon Oct 28 2024

RSP Image: Weekly 2024_37

Goal: Cross-match DP0.2 Objects with the Roman simulation from Troxel et al. (2023).

## Introduction

The same simulation, DESC's Data Challenge 2 (DC2), underlies both the
simulated data products of Rubin's Data Preview 0 (DP0) and the simulated
Roman data presented in Troxel et al. (2023).

Thus, it is possible to cross-match the catalogs and obtain infrared
photometry for DP0 Objects.

Roman DC2 Simulated Images and Catalogs at IRSA IPAC:<br>
https://irsa.ipac.caltech.edu/data/theory/Roman/Troxel2023/overview.html

Troxel et al. (2023):<br>
https://academic.oup.com/mnras/article/522/2/2801/7076879?login=false

## Set up

In [None]:
import os
from astropy.io import fits
from astropy.coordinates import SkyCoord
from astropy.coordinates import match_coordinates_sky
import astropy.units as u
import numpy as np
import matplotlib.pyplot as plt
from lsst.rsp import get_tap_service, retrieve_query

Start the Rubin TAP service for DP0.2 catalog access.

In [None]:
service = get_tap_service("tap")

Use colorblind-friendly colors for the LSST filters.

In [None]:
lsst_filt_clrs = {'u': '#0c71ff', 'g': '#49be61', 'r': '#c61c00',
                  'i': '#ffc200', 'z': '#f341a2', 'y': '#5d0000'}

Define colors to use for the Roman filters.

In [None]:
roman_filt_clrs = {'y': 'limegreen', 'j': 'magenta',
                   'h': 'cyan', 'f': 'grey'}

## Simulated Roman data

The simulated Roman detection catalogs from Troxel et al. (2023) are available as gzipped FITS table files.

For example: <br>
https://irsa.ipac.caltech.edu/data/theory/Roman/Troxel2023/detection/dc2_det_50.93_-38.8.fits.gz

Files can be downloaded with `wget` and decompressed with `gunzip`, for example:

```
filename = 'dc2_det_50.93_-38.8.fits'
path_and_filename = 'https://irsa.ipac.caltech.edu/data/theory/Roman/Troxel2023/detection/'+filename+'.gz'
os.system('wget '+path_and_filename)
os.system('gunzip '+filename+'.gz')
```
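The same download-and-decompress step can also be done in pure Python, without shelling out to `wget` and `gunzip`. The sketch below uses only the standard library (`urllib.request`, `gzip`, `shutil`); `download_and_gunzip` is a hypothetical helper, not part of any RSP package, and it assumes the URL ends in `.gz`.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path


# Hypothetical helper (not from the original notebook).
def download_and_gunzip(url, outdir="."):
    """Download a .gz file from `url` and decompress it into `outdir`.

    Assumes the URL ends in '.gz'. Returns the path of the decompressed file.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    gz_name = url.rsplit("/", 1)[-1]            # e.g. dc2_det_50.93_-38.8.fits.gz
    gz_path = outdir / gz_name
    out_path = outdir / gz_name[:-len(".gz")]   # strip the .gz suffix
    urllib.request.urlretrieve(url, gz_path)    # fetch the compressed file
    with gzip.open(gz_path, "rb") as fin, open(out_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)           # stream-decompress to disk
    gz_path.unlink()                            # like gunzip, remove the .gz file
    return out_path


# Example usage (requires network access):
# base = 'https://irsa.ipac.caltech.edu/data/theory/Roman/Troxel2023/detection/'
# download_and_gunzip(base + 'dc2_det_50.93_-38.8.fits.gz')
```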

### Pick a Troxel file and read it

A set of unzipped FITS files containing detections in the simulated Roman data from Troxel et al. (2023)
has already been downloaded and is available in the RSP's `/project/` directory.

The RA and Dec of each file's center are encoded in its file name.
Get a list of all available files and plot the file centers.

In [None]:
import glob

# List the already-downloaded FITS files (sorted, as `ls` would list them).
lines = sorted(glob.glob('/project/melissagraham2/troxel2023/*fits'))
print('Number of files available: ', len(lines))

In [None]:
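t1 = []
t2 = []
for line in lines:
    name = os.path.basename(line)
    # Split names like 'dc2_det_52.21_-40.3.fits' into '52.21' and '-40.3'
    # rather than assuming fixed-width RA and Dec strings.
    ra_str, dec_str = name[name.find('det_')+4:name.rfind('.fits')].split('_')
    t1.append(ra_str)
    t2.append(dec_str)
allra = np.asarray(t1, dtype='float')
allde = np.asarray(t2, dtype='float')
del t1, t2, lines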

In [None]:
fig = plt.figure(figsize=(6, 4))
plt.plot(allra, allde, 'o', ms=4)
plt.xlabel('RA')
plt.ylabel('Dec')
plt.title('Already-downloaded Troxel FITS file centers')
plt.show()

Pick one file to work with in this demo.

In [None]:
fnm = '/project/melissagraham2/troxel2023/dc2_det_52.21_-40.3.fits'
hdul = fits.open(fnm)
data = hdul[1].data

Print the column names.

In [None]:
data.columns

Store the data in `numpy` arrays.

In [None]:
roman_ra = np.asarray(data['alphawin_j2000'], dtype='float')
roman_dec = np.asarray(data['deltawin_j2000'], dtype='float')
roman_y = np.asarray(data['mag_auto_Y106'], dtype='float')
roman_j = np.asarray(data['mag_auto_J129'], dtype='float')
roman_h = np.asarray(data['mag_auto_H158'], dtype='float')
roman_f = np.asarray(data['mag_auto_F184'], dtype='float')
roman_ye = np.asarray(data['magerr_auto_Y106'], dtype='float')
roman_je = np.asarray(data['magerr_auto_J129'], dtype='float')
roman_he = np.asarray(data['magerr_auto_H158'], dtype='float')
roman_fe = np.asarray(data['magerr_auto_F184'], dtype='float')
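The ten assignments above follow a single pattern, so a dictionary keyed by filter name is a compact alternative. In this sketch `read_roman_photometry` and `ROMAN_BANDS` are hypothetical names; it relies only on column access by name (`table['mag_auto_Y106']`), which both the FITS table data and a NumPy structured array support, so the test uses a structured array as a stand-in.

```python
import numpy as np

# Hypothetical names (not from the original notebook).
ROMAN_BANDS = ("Y106", "J129", "H158", "F184")


def read_roman_photometry(table, bands=ROMAN_BANDS):
    """Return dicts of MAG_AUTO magnitudes and errors keyed by Roman filter."""
    mags = {b: np.asarray(table[f"mag_auto_{b}"], dtype=float) for b in bands}
    errs = {b: np.asarray(table[f"magerr_auto_{b}"], dtype=float) for b in bands}
    return mags, errs


# In the notebook: mags, errs = read_roman_photometry(data)
# mags['Y106'] then holds the same values as roman_y.
```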

Clean up.

In [None]:
del fnm, hdul, data

## Simulated Rubin data

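### Get DP0 objects

Query the DP0.2 `Object` table for primary objects (`detect_isPrimary = 1`) within 0.08 degrees of the mean coordinates of the Roman detections, returning cModel AB magnitudes and their errors in all six LSST filters.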

In [None]:
sra = str(np.round(np.mean(roman_ra), 3))
sde = str(np.round(np.mean(roman_dec), 3))
query = "SELECT coord_ra, coord_dec, "\
        "scisql_nanojanskyToAbMag(u_cModelFlux) AS umag, "\
        "scisql_nanojanskyToAbMag(g_cModelFlux) AS gmag, "\
        "scisql_nanojanskyToAbMag(r_cModelFlux) AS rmag, "\
        "scisql_nanojanskyToAbMag(i_cModelFlux) AS imag, "\
        "scisql_nanojanskyToAbMag(z_cModelFlux) AS zmag, "\
        "scisql_nanojanskyToAbMag(y_cModelFlux) AS ymag, "\
        "scisql_nanojanskyToAbMagSigma(u_cModelFlux, u_cModelFluxErr) AS umage, "\
        "scisql_nanojanskyToAbMagSigma(g_cModelFlux, g_cModelFluxErr) AS gmage, "\
        "scisql_nanojanskyToAbMagSigma(r_cModelFlux, r_cModelFluxErr) AS rmage, "\
        "scisql_nanojanskyToAbMagSigma(i_cModelFlux, i_cModelFluxErr) AS image, "\
        "scisql_nanojanskyToAbMagSigma(z_cModelFlux, z_cModelFluxErr) AS zmage, "\
        "scisql_nanojanskyToAbMagSigma(y_cModelFlux, y_cModelFluxErr) AS ymage "\
        "FROM dp02_dc2_catalogs.Object "\
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "\
        "CIRCLE('ICRS', "+sra+", "+sde+", 0.08)) = 1 "\
        "AND detect_isPrimary = 1"
print(query)
del sra, sde
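The query above repeats the same two expressions for each of the six filters. As a sketch, the string can be built in a loop; `build_cone_query` is a hypothetical helper, and it produces the same ADQL as the cell above (center rounded to three decimals, as with `np.round(..., 3)`).

```python
# Hypothetical helper (not from the original notebook).
def build_cone_query(ra_deg, dec_deg, radius_deg, bands="ugrizy",
                     table="dp02_dc2_catalogs.Object"):
    """Return an ADQL cone-search query for cModel AB magnitudes and errors."""
    mags = [f"scisql_nanojanskyToAbMag({b}_cModelFlux) AS {b}mag" for b in bands]
    errs = [f"scisql_nanojanskyToAbMagSigma({b}_cModelFlux, {b}_cModelFluxErr) "
            f"AS {b}mage" for b in bands]
    columns = ", ".join(["coord_ra", "coord_dec"] + mags + errs)
    return (f"SELECT {columns} "
            f"FROM {table} "
            "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
            f"CIRCLE('ICRS', {ra_deg:.3f}, {dec_deg:.3f}, {radius_deg})) = 1 "
            "AND detect_isPrimary = 1")
```

In the notebook this would be called as `build_cone_query(np.mean(roman_ra), np.mean(roman_dec), 0.08)`.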

In [None]:
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)

In [None]:
results = job.fetch_result().to_table()
print('Number of DP0.2 objects: ', len(results))

In [None]:
rubin_ra = np.asarray(results['coord_ra'], dtype='float')
rubin_dec = np.asarray(results['coord_dec'], dtype='float')
rubin_u = np.asarray(results['umag'], dtype='float')
rubin_g = np.asarray(results['gmag'], dtype='float')
rubin_r = np.asarray(results['rmag'], dtype='float')
rubin_i = np.asarray(results['imag'], dtype='float')
rubin_z = np.asarray(results['zmag'], dtype='float')
rubin_y = np.asarray(results['ymag'], dtype='float')
rubin_ue = np.asarray(results['umage'], dtype='float')
rubin_ge = np.asarray(results['gmage'], dtype='float')
rubin_re = np.asarray(results['rmage'], dtype='float')
rubin_ie = np.asarray(results['image'], dtype='float')
rubin_ze = np.asarray(results['zmage'], dtype='float')
rubin_ye = np.asarray(results['ymage'], dtype='float')
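The magnitudes and errors above come from the `scisql_nanojanskyToAbMag` and `scisql_nanojanskyToAbMagSigma` functions applied to cModel fluxes in nJy. Assuming these implement the standard AB definition (zero point 3631 Jy) and first-order error propagation, the conversion is m = -2.5 log10(f / 3631 Jy), which is about 31.4 - 2.5 log10(f_nJy), and sigma_m = (2.5 / ln 10) sigma_f / f, about 1.0857 sigma_f / f. A plain-Python sketch with hypothetical function names; it does not handle non-positive fluxes.

```python
import math


# Hypothetical helpers mirroring (by assumption) the scisql conversions.
def njy_to_ab_mag(flux_njy):
    """AB magnitude from a flux in nanojanskys: m = -2.5 log10(f / 3631 Jy)."""
    return -2.5 * math.log10(flux_njy * 1e-9 / 3631.0)


def njy_to_ab_mag_sigma(flux_njy, flux_err_njy):
    """First-order magnitude error: sigma_m = (2.5 / ln 10) * sigma_f / f."""
    return 2.5 / math.log(10) * flux_err_njy / flux_njy
```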

Clean up.

In [None]:
del query, job, results

## Visualize the data

### Photometry

In [None]:
fig = plt.figure(figsize=(6, 4))
plt.hist(rubin_u, bins=50, histtype='step', color=lsst_filt_clrs['u'], label='u')
plt.hist(rubin_g, bins=50, histtype='step', color=lsst_filt_clrs['g'], label='g')
plt.hist(rubin_r, bins=50, histtype='step', color=lsst_filt_clrs['r'], label='r')
plt.hist(rubin_i, bins=50, histtype='step', color=lsst_filt_clrs['i'], label='i')
plt.hist(rubin_z, bins=50, histtype='step', color=lsst_filt_clrs['z'], label='z')
plt.hist(rubin_y, bins=50, histtype='step', color=lsst_filt_clrs['y'], label='y')
plt.xlim([18, 30])
plt.xlabel('Apparent Magnitude')
plt.ylabel('Number of Objects')
plt.legend(loc='upper left')
plt.title('Rubin DP0 Objects')
plt.show()

In [None]:
fig = plt.figure(figsize=(6, 4))
yx = np.where(roman_y < 40)[0]
jx = np.where(roman_j < 40)[0]
hx = np.where(roman_h < 40)[0]
fx = np.where(roman_f < 40)[0]
plt.hist(roman_y[yx], bins=50, histtype='step', color=roman_filt_clrs['y'], label='Y106')
plt.hist(roman_j[jx], bins=50, histtype='step', color=roman_filt_clrs['j'], label='J129')
plt.hist(roman_h[hx], bins=50, histtype='step', color=roman_filt_clrs['h'], label='H158')
plt.hist(roman_f[fx], bins=50, histtype='step', color=roman_filt_clrs['f'], label='F184')
del yx, jx, hx, fx
plt.xlim([18, 30])
plt.xlabel('Apparent Magnitude')
plt.ylabel('Number of Objects')
plt.legend(loc='upper left')
plt.title('Troxel DC2 Objects')
plt.show()

In [None]:
fig = plt.figure(figsize=(6, 4))
plt.hist(rubin_y, bins=50, histtype='step', color=lsst_filt_clrs['y'], label='LSST y')
yx = np.where(roman_y < 40)[0]
plt.hist(roman_y[yx], bins=50, histtype='step', color=roman_filt_clrs['y'], label='Roman Y106')
del yx
plt.xlim([18, 30])
plt.xlabel('Apparent Magnitude')
plt.ylabel('Number of Objects')
plt.legend(loc='upper left')
plt.show()

In [None]:
fig = plt.figure(figsize=(6, 4))
plt.plot(rubin_y, rubin_ye, 'o', ms=2, mew=0, alpha=0.2, color=lsst_filt_clrs['y'])
yx = np.where(roman_y < 30)[0]
plt.plot(roman_y[yx], roman_ye[yx], 'o', ms=2, mew=0, alpha=0.2, color=roman_filt_clrs['y'])
del yx
plt.xlim([20, 30])
plt.ylim([0.0, 2])
plt.xlabel('Apparent Magnitude')
plt.ylabel('Magnitude Error')
plt.plot(0, 0, 'o', ms=4, mew=0, color=lsst_filt_clrs['y'], label='Rubin y')
plt.plot(0, 0, 'o', ms=4, mew=0, color=roman_filt_clrs['y'], label='Roman Y106')
plt.legend(loc='upper left')
plt.show()

### Coordinates

In [None]:
fig = plt.figure(figsize=(6, 6))
plt.plot(rubin_ra, rubin_dec, 'o', ms=3, mew=0, alpha=0.3, color='darkorange', label='Rubin')
plt.plot(roman_ra, roman_dec, 'o', ms=3, mew=0, alpha=0.3, color='dodgerblue', label='Roman')
plt.xlabel('RA')
plt.ylabel('Dec')
plt.legend(loc='upper left')
plt.show()

Zoom in on a small region, showing only objects with y < 25 mag, to check that the coordinates align.

The black cross marks a scale of one arcsecond.

In [None]:
fig = plt.figure(figsize=(4, 4))

ru_x = np.where(rubin_y < 25)[0]
ro_x = np.where(roman_y < 25)[0]
plt.plot(rubin_ra[ru_x], rubin_dec[ru_x], 'o', ms=5, mew=0, alpha=0.3, color='darkorange', label='Rubin ('+str(len(ru_x))+')')
plt.plot(roman_ra[ro_x], roman_dec[ro_x], 'o', ms=3, mew=0, alpha=0.8, color='dodgerblue', label='Roman ('+str(len(ro_x))+')')
del ru_x, ro_x

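# Draw a cross 1" across on the sky. Along RA, 1" on the sky spans
# 1"/cos(Dec) in RA coordinates, so the horizontal bar is scaled.
tra = 52.2225
tde = -40.3325
ts = 0.5/3600.0
ts_ra = ts/np.cos(np.radians(tde))
plt.plot([tra-ts_ra, tra+ts_ra], [tde, tde], lw=1, ls='solid', color='black')
plt.plot([tra, tra], [tde-ts, tde+ts], lw=1, ls='solid', color='black')
plt.text(tra + 2.0*ts_ra, tde, '1"')
del tra, tde, ts, ts_ra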

cra = 52.215
cde = -40.325
cw = 0.01
plt.xlim([cra-cw, cra+cw])
plt.ylim([cde-cw, cde+cw])
del cra, cde, cw

plt.xlabel('RA')
plt.ylabel('Dec')
plt.title('Zoom in, bright objects only')
plt.legend(loc='upper left')
plt.show()

The coordinates of the bright objects in the two catalogs appear to agree to within about an arcsecond.

## Cross match

### Down-select to bright objects

Because the simulated Roman survey is deeper, it contains many more detected objects.

Since this is only a demonstration of how to cross-match, restrict both tables to objects with y <= 25 mag (LSST y for Rubin, Y106 for Roman).

In a real analysis, such cuts would be driven by the science use case.

In [None]:
print('Number of objects:')
print('Rubin: ', len(rubin_y))
print('Roman: ', len(roman_y))
print(' ')

print('Number of objects with y <= 25 mag:')
ru_x = np.where(rubin_y <= 25.0)[0]
ro_x = np.where(roman_y <= 25.0)[0]
print('Rubin: ', len(ru_x))
print('Roman: ', len(ro_x))

### Use astropy to match coordinates

Create `astropy` `SkyCoord` objects holding the coordinates of the bright objects in each catalog, to facilitate the cross-match.

In [None]:
rubin_coord = SkyCoord(ra=rubin_ra[ru_x]*u.degree, dec=rubin_dec[ru_x]*u.degree, frame='icrs')
roman_coord = SkyCoord(ra=roman_ra[ro_x]*u.degree, dec=roman_dec[ro_x]*u.degree, frame='icrs')

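Use `match_coordinates_sky`, which finds the nearest Roman object on the sky for each Rubin object. It returns the index of that object in `roman_coord` (`idx`), the on-sky separation (`d2d`), and the 3D distance (`d3d`, not used here).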

https://docs.astropy.org/en/latest/api/astropy.coordinates.match_coordinates_sky.html

In [None]:
idx, d2d, d3d = match_coordinates_sky(rubin_coord, roman_coord)
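To make the nearest-neighbor step concrete, here is a brute-force NumPy sketch of the same idea without astropy (which uses a KD-tree over Cartesian unit vectors to do this efficiently). `radec_to_xyz` and `nearest_sky_match` are hypothetical names for illustration only. The pairwise array has N1 x N2 x 3 elements, so this approach only suits small catalogs.

```python
import numpy as np


# Hypothetical helpers (not from the original notebook).
def radec_to_xyz(ra_deg, dec_deg):
    """Convert RA/Dec in degrees to unit vectors on the sphere, shape (N, 3)."""
    ra = np.radians(np.asarray(ra_deg, dtype=float))
    dec = np.radians(np.asarray(dec_deg, dtype=float))
    return np.column_stack([np.cos(dec) * np.cos(ra),
                            np.cos(dec) * np.sin(ra),
                            np.sin(dec)])


def nearest_sky_match(ra1, dec1, ra2, dec2):
    """For each position in catalog 1, find the nearest position in catalog 2.

    Returns (idx, sep_arcsec): catalog-2 entry idx[i] is nearest to entry i.
    """
    xyz1 = radec_to_xyz(ra1, dec1)
    xyz2 = radec_to_xyz(ra2, dec2)
    # Chord length between every pair of unit vectors, shape (N1, N2).
    chord = np.linalg.norm(xyz1[:, None, :] - xyz2[None, :, :], axis=2)
    idx = np.argmin(chord, axis=1)
    nearest_chord = chord[np.arange(len(idx)), idx]
    # Convert chord to angle; 2*arcsin(c/2) stays accurate for tiny separations.
    sep_arcsec = np.degrees(2.0 * np.arcsin(nearest_chord / 2.0)) * 3600.0
    return idx, sep_arcsec
```

Applied to `rubin_ra[ru_x]`, `rubin_dec[ru_x]`, `roman_ra[ro_x]`, `roman_dec[ro_x]`, it should agree with `idx` and `d2d.arcsec` up to ties and rounding.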

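Plot the distribution of on-sky separations between each bright Rubin object and its nearest bright Roman neighbor, for separations below 5 arcsec.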

In [None]:
fig = plt.figure(figsize=(4, 3))
tx = np.where(d2d.arcsec < 5.0)[0]
plt.hist(d2d.arcsec[tx], bins=100, log=True)
del tx
plt.xlabel('Offset in arcsec')
plt.ylabel('Number of matched objects')
plt.title('Where 2d distance is <5 arcsec')
plt.show()

Based on the histogram above, 0.5" appears to be a reasonable maximum separation for declaring a cross-match.

This approximation is adequate for this demo, which does not assess purity or
completeness, or perform any kind of probabilistic matching.
Many science goals would require a more rigorous treatment.

In [None]:
max_off_arcsec = 0.5
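As a numerical companion to the histogram, one can count how many bright Rubin objects would be declared matched for several candidate radii; where the count stops growing quickly, additional pairs are more likely to be chance alignments. `count_within` is a hypothetical helper.

```python
import numpy as np


# Hypothetical helper (not from the original notebook).
def count_within(sep_arcsec, radii_arcsec):
    """Number of nearest-neighbor separations below each candidate radius."""
    sep = np.asarray(sep_arcsec, dtype=float)
    return {r: int(np.sum(sep < r)) for r in radii_arcsec}


# In the notebook: count_within(d2d.arcsec, [0.25, 0.5, 1.0, 2.0])
```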

Create an array to hold the index of the Roman object that has been cross-matched for each Rubin object.

Set a default value of -1 for Rubin objects without an associated Roman object, including faint objects that were never considered for the cross-match.

In [None]:
rubin_rox = np.full(len(rubin_y), -1, dtype=int)

Store the index of the cross-matched Roman object for each Rubin object in `rubin_rox`.

In [None]:
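# Keep only nearest neighbors closer than max_off_arcsec. idx indexes the
# bright Roman subset, so map it back to the full Roman arrays via ro_x.
for i in range(len(ru_x)):
    if d2d.arcsec[i] < max_off_arcsec:
        rubin_rox[ru_x[i]] = ro_x[idx[i]]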

In [None]:
tx = np.where(rubin_rox >= 0)[0]
print('Number of bright Rubin objects with a Roman object within', max_off_arcsec, 'arcsec: ', len(tx))
del tx
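The loop above can be vectorized with a boolean mask. Because `match_coordinates_sky` picks the nearest neighbor independently for each Rubin object, the result is not guaranteed to be one-to-one, so the sketch also counts Roman objects claimed by more than one Rubin object. `assign_matches` and `n_shared_roman` are hypothetical names.

```python
import numpy as np


# Hypothetical helpers (not from the original notebook).
def assign_matches(n_rubin, ru_x, ro_x, idx, sep_arcsec, max_off_arcsec=0.5):
    """Vectorized version of the matching loop.

    Returns an array of length n_rubin holding, for each Rubin object, the
    index of its matched Roman object in the full Roman arrays, or -1.
    """
    ru_x, ro_x, idx = np.asarray(ru_x), np.asarray(ro_x), np.asarray(idx)
    rubin_rox = np.full(n_rubin, -1, dtype=int)
    good = np.asarray(sep_arcsec) < max_off_arcsec  # bright Rubin objects matched
    rubin_rox[ru_x[good]] = ro_x[idx[good]]
    return rubin_rox


def n_shared_roman(rubin_rox):
    """Count Roman objects matched to more than one Rubin object."""
    matched = rubin_rox[rubin_rox >= 0]
    _, counts = np.unique(matched, return_counts=True)
    return int(np.sum(counts > 1))


# In the notebook:
# rox = assign_matches(len(rubin_y), ru_x, ro_x, idx, d2d.arcsec, max_off_arcsec)
```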

## Visualize cross-matched objects

### Compare y-band magnitudes

In [None]:
fig = plt.figure(figsize=(6, 4))
tx = np.where(rubin_rox[ru_x] >= 0)[0]
plt.plot(rubin_y[ru_x[tx]], roman_y[rubin_rox[ru_x[tx]]], 'o', ms=3, mew=0, color='grey')
del tx
plt.plot([18, 25], [18, 25], lw=1, ls='solid', color='lightgrey')
plt.xlim([18, 25])
plt.ylim([18, 25])
plt.xlabel('Rubin y (mag)')
plt.ylabel('Roman Y106 (mag)')
plt.title('Compare y-band Magnitudes')
plt.show()
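Because `rubin_rox` is -1 everywhere except at matched bright objects, `np.flatnonzero(rubin_rox >= 0)` returns the same indices as `ru_x[tx]` in the cell above. The sketch below uses that to collect matched pairs, then summarizes the magnitude comparison with a median offset and a robust scatter (1.4826 times the median absolute deviation). LSST y and Roman Y106 are different filters, so the two magnitudes need not agree exactly. `matched_pairs` and `offset_stats` are hypothetical names.

```python
import numpy as np


# Hypothetical helpers (not from the original notebook).
def matched_pairs(rubin_rox):
    """Return (rubin_idx, roman_idx) index arrays for all cross-matched pairs."""
    rubin_idx = np.flatnonzero(rubin_rox >= 0)
    return rubin_idx, rubin_rox[rubin_idx]


def offset_stats(mag_a, mag_b):
    """Median of (mag_a - mag_b) and a robust scatter, 1.4826 * MAD."""
    diff = np.asarray(mag_a, dtype=float) - np.asarray(mag_b, dtype=float)
    diff = diff[np.isfinite(diff)]  # drop NaN/inf values
    med = np.median(diff)
    return med, 1.4826 * np.median(np.abs(diff - med))


# In the notebook:
# ru, ro = matched_pairs(rubin_rox)
# offset_stats(rubin_y[ru], roman_y[ro])
```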

### Optical/IR color-color diagram

In [None]:
fig = plt.figure(figsize=(6, 4))
tx = np.where(rubin_rox[ru_x] >= 0)[0]
plt.plot(rubin_z[ru_x[tx]]-rubin_y[ru_x[tx]], 
         roman_j[rubin_rox[ru_x[tx]]]-roman_h[rubin_rox[ru_x[tx]]],
         'o', ms=3, mew=0, alpha=0.3, color='grey', label='all')
del tx
tx = np.where((rubin_rox[ru_x] >= 0) &
              (rubin_y[ru_x] > 18) &
              (rubin_y[ru_x] < 20))[0]
plt.plot(rubin_z[ru_x[tx]]-rubin_y[ru_x[tx]], 
         roman_j[rubin_rox[ru_x[tx]]]-roman_h[rubin_rox[ru_x[tx]]],
         '*', ms=5, mew=0, alpha=0.8, color='black', label='18 < y < 20 mag')
del tx
plt.xlim([-0.8, 1.2])
plt.ylim([-0.4, 0.6])
plt.xlabel('Rubin z-y Color')
plt.ylabel('Roman J-H Color')
plt.title('Optical/IR Color-Color Diagram')
plt.legend(loc='upper left')
plt.show()

Above, structure in the color-color diagram can be seen for the brighter objects, which are probably stars.