# 05 — Catalogs, Coordinates & Crossmatch (RA/Dec)

## What you’ll learn
- How to compute angular separation on the sphere
- How to crossmatch two catalogs with a nearest-neighbor search
- How to choose a match radius (trade-off: completeness vs contamination)
- A realistic pitfall: chance alignments in crowded fields

We’ll create synthetic catalogs near a chosen sky location.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.spatial import cKDTree

np.random.seed(4)


## 1) Spherical distance (small-angle approximation vs exact)

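For small separations (arcsecond scale), the sky is locally flat and a good approximation is:
\[
\Delta\theta \approx \sqrt{ (\Delta\alpha \cos\delta)^2 + (\Delta\delta)^2 }
\]

where \(\Delta\alpha\) and \(\Delta\delta\) are the RA and Dec differences in radians and \(\delta\) is the declination of the pair (the code below uses the mean of the two declinations).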

We'll implement both:
- a **small-angle** approximation (fast, accurate for tiny angles)
- an **exact** great-circle distance (robust)


In [None]:
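def angsep_small(ra1, dec1, ra2, dec2):
    """Small-angle separation in radians; inputs in radians."""
    # scale the RA difference by cos(mean Dec) to get an on-sky offset
    dra = (ra2 - ra1) * np.cos(0.5*(dec1+dec2))
    ddec = (dec2 - dec1)
    return np.sqrt(dra*dra + ddec*ddec)

def angsep_exact(ra1, dec1, ra2, dec2):
    """Great-circle separation in radians (haversine formula); inputs in radians."""
    s = np.sin((dec2-dec1)/2)**2 + np.cos(dec1)*np.cos(dec2)*np.sin((ra2-ra1)/2)**2
    # clip guards against rounding pushing s slightly outside [0, 1]
    return 2*np.arcsin(np.sqrt(np.clip(s, 0, 1)))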

# quick check
ra1, dec1 = np.deg2rad(10.0), np.deg2rad(-5.0)
ra2, dec2 = np.deg2rad(10.0001), np.deg2rad(-4.9999)
print("small (arcsec):", np.rad2deg(angsep_small(ra1,dec1,ra2,dec2))*3600)
print("exact (arcsec):", np.rad2deg(angsep_exact(ra1,dec1,ra2,dec2))*3600)

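One caveat worth checking: the small-angle function above subtracts right ascensions directly, so a pair that straddles RA = 0°/360° gets a wildly wrong separation. The haversine form is unaffected because it depends only on the squared sine of half the RA difference. The sketch below illustrates this; `angsep_small_wrapped` is a hypothetical variant (not part of this notebook's code) that wraps the RA difference into [-π, π) first, and the test coordinates are made up.

```python
import numpy as np

ARCSEC = np.deg2rad(1.0 / 3600.0)  # one arcsecond in radians

def angsep_small(ra1, dec1, ra2, dec2):
    # same small-angle formula as above (inputs in radians)
    dra = (ra2 - ra1) * np.cos(0.5 * (dec1 + dec2))
    return np.hypot(dra, dec2 - dec1)

def angsep_small_wrapped(ra1, dec1, ra2, dec2):
    # hypothetical variant: wrap the RA difference into [-pi, pi) first
    dra = (ra2 - ra1 + np.pi) % (2.0 * np.pi) - np.pi
    dra = dra * np.cos(0.5 * (dec1 + dec2))
    return np.hypot(dra, dec2 - dec1)

def angsep_exact(ra1, dec1, ra2, dec2):
    # haversine formula (inputs in radians)
    s = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return 2 * np.arcsin(np.sqrt(np.clip(s, 0, 1)))

# two sources 0.0002 deg apart in RA, on either side of RA = 0
ra1, dec1 = np.deg2rad(359.9999), np.deg2rad(20.0)
ra2, dec2 = np.deg2rad(0.0001), np.deg2rad(20.0)

sep_naive = angsep_small(ra1, dec1, ra2, dec2) / ARCSEC
sep_wrapped = angsep_small_wrapped(ra1, dec1, ra2, dec2) / ARCSEC
sep_exact = angsep_exact(ra1, dec1, ra2, dec2) / ARCSEC

print("naive small-angle [arcsec]:  ", sep_naive)
print("wrapped small-angle [arcsec]:", sep_wrapped)
print("exact [arcsec]:              ", sep_exact)
```

The wrapped and exact values should agree to well below a micro-arcsecond, while the naive value comes out at hundreds of degrees.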

## 2) Make two catalogs: “truth” and “observed”

Catalog A: true positions  
Catalog B: the same sources measured with random position errors, plus extra spurious sources that have no counterpart in A


In [None]:
# pick a field center
ra0_deg, dec0_deg = 210.0, 54.0
ra0, dec0 = np.deg2rad(ra0_deg), np.deg2rad(dec0_deg)

N_true = 1500

# uniform in a small square patch: half-width 0.3 deg (0.6 deg on a side)
patch = np.deg2rad(0.3)
ra_true  = ra0  + np.random.uniform(-patch, patch, size=N_true) / np.cos(dec0)
dec_true = dec0 + np.random.uniform(-patch, patch, size=N_true)

# Gaussian measurement errors (sigma = 0.5 arcsec per axis)
sig_arcsec = 0.5
sig = np.deg2rad(sig_arcsec/3600)

ra_obs  = ra_true  + np.random.normal(0, sig, size=N_true) / np.cos(dec0)
dec_obs = dec_true + np.random.normal(0, sig, size=N_true)

# add spurious sources to observed catalog
N_spur = 200
ra_spur  = ra0  + np.random.uniform(-patch, patch, size=N_spur) / np.cos(dec0)
dec_spur = dec0 + np.random.uniform(-patch, patch, size=N_spur)

raB  = np.concatenate([ra_obs, ra_spur])
decB = np.concatenate([dec_obs, dec_spur])

catA = pd.DataFrame({"ra_rad": ra_true, "dec_rad": dec_true})
catB = pd.DataFrame({"ra_rad": raB, "dec_rad": decB})

catA.head(), catB.head()


## 3) Crossmatch using a KD-tree

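For small patches away from the poles, we can work in a flat-sky approximation (a simplified “tangent plane”) about the field center:
- x = RA * cos(dec0)
- y = Dec

Euclidean distances in (x, y) then approximate angular separations, so we can use nearest-neighbor matching with `cKDTree`.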


In [None]:
def project_tan(ra, dec, dec_ref):
    # flat-sky approximation: compress RA by cos(dec_ref); returns an (N, 2) array in radians
    x = ra * np.cos(dec_ref)
    y = dec
    return np.vstack([x,y]).T

A_xy = project_tan(catA.ra_rad.values, catA.dec_rad.values, dec0)
B_xy = project_tan(catB.ra_rad.values, catB.dec_rad.values, dec0)

tree = cKDTree(B_xy)
dist, idx = tree.query(A_xy, k=1)

# convert to arcsec using small-angle scaling
dist_arcsec = np.rad2deg(dist) * 3600

print("median match distance [arcsec]:", np.median(dist_arcsec))
print("95th percentile [arcsec]:", np.percentile(dist_arcsec, 95))

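The flat projection above relies on the field being small and far from both the poles and RA = 0°/360°. A commonly used alternative, sketched here on the assumption that all-sky robustness is wanted, is to build the KD-tree on 3D unit vectors and convert the chord length d back to an angle with θ = 2 arcsin(d/2). The helper names `radec_to_unit` and `chord_to_angle` are hypothetical, and the small polar catalogs are made up for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def radec_to_unit(ra, dec):
    """RA/Dec in radians -> (N, 3) array of unit vectors."""
    cosd = np.cos(dec)
    return np.column_stack([cosd * np.cos(ra), cosd * np.sin(ra), np.sin(dec)])

def chord_to_angle(d):
    """Chord length between unit vectors -> angle in radians."""
    return 2.0 * np.arcsin(np.clip(0.5 * d, 0.0, 1.0))

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
n = 500

# illustrative catalog A: sources close to the north pole, spread over all RA
raA = rng.uniform(0.0, 2.0 * np.pi, n)
decA = np.deg2rad(rng.uniform(89.0, 89.9, n))

# catalog B: the same sources with 0.5 arcsec of Dec-only jitter, shuffled
sig = np.deg2rad(0.5 / 3600.0)
perm = rng.permutation(n)
raB = raA[perm]
decB = decA[perm] + rng.normal(0.0, sig, n)

tree = cKDTree(radec_to_unit(raB, decB))
chord, idx = tree.query(radec_to_unit(raA, decA), k=1)
sep_arcsec = np.rad2deg(chord_to_angle(chord)) * 3600.0

# B[j] is the counterpart of A[perm[j]], so a correct match has perm[idx[i]] == i
frac_correct = np.mean(perm[idx] == np.arange(n))
print("correctly matched fraction:", frac_correct)
print("median separation [arcsec]:", np.median(sep_arcsec))
```

A flat (RA * cos(dec0), Dec) projection would struggle in this field, since cos(Dec) changes rapidly near the pole and the sources wrap around in RA. The unit-vector tree needs no reference declination at all.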

## 4) Choose a match radius

If the radius is too small → miss real matches (incomplete)  
If too large → accept wrong matches (contamination)

We’ll compute “how many matches” as a function of radius.


In [None]:
radii = np.linspace(0.2, 3.0, 30)  # arcsec
nmatch = [(dist_arcsec <= r).sum() for r in radii]

plt.figure(figsize=(7,4))
plt.plot(radii, nmatch, marker="o")
plt.xlabel("match radius [arcsec]")
plt.ylabel("# matches (A→B)")
plt.title("Matches vs. radius")
plt.tight_layout()
plt.show()

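The trade-off can also be estimated before looking at any matches. This is a back-of-the-envelope sketch that assumes independent Gaussian errors of equal sigma on both axes (so true-match separations follow a Rayleigh distribution) and a uniform density of unrelated sources (so chance neighbors are Poisson-distributed). The numbers are taken from the synthetic setup above; the function names `completeness` and `chance_prob` are hypothetical.

```python
import numpy as np

# numbers from the synthetic setup in this notebook
sigma = 0.5                      # per-axis position error [arcsec]
n_B = 1500 + 200                 # sources in catalog B (observed + spurious)
side_arcsec = 0.6 * 3600.0       # the patch is 0.6 deg on a side
density = n_B / side_arcsec**2   # approximate B sources per arcsec^2

def completeness(R):
    # Rayleigh CDF: fraction of true matches with separation <= R
    return 1.0 - np.exp(-R**2 / (2.0 * sigma**2))

def chance_prob(R):
    # Poisson probability of at least one unrelated source within R
    return 1.0 - np.exp(-density * np.pi * R**2)

for R in (0.5, 1.0, 1.5, 3.0):
    print(f"R = {R:3.1f} arcsec: completeness = {completeness(R):.4f}, "
          f"chance = {chance_prob(R):.5f}")
```

Under these assumptions a radius of about 3 sigma (1.5 arcsec here) recovers roughly 99% of true matches while the chance-match probability per source stays well below 1%. In a denser field, the chance term grows in proportion to the source density.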

## 5) Inspect the distance distribution

A common pattern:
- a tight peak at small separations (real matches)
- a tail (chance matches / confusion)


In [None]:
plt.figure(figsize=(7,4))
plt.hist(dist_arcsec, bins=50)
plt.xlabel("nearest-neighbor distance [arcsec]")
plt.ylabel("count")
plt.title("A→B nearest-neighbor distances")
plt.tight_layout()
plt.show()

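Because the catalogs are synthetic, the peak can be checked against theory. Assuming independent Gaussian errors of equal sigma on both axes, true-match separations follow a Rayleigh distribution with quantiles r_p = sigma * sqrt(-2 ln(1 - p)). The sketch below regenerates catalogs with the same geometry as the notebook, but with its own random generator and an arbitrary seed, so the exact numbers will differ from the cells above. It then labels each nearest neighbor as true or chance using the known ordering of catalog B.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(4)  # seed chosen arbitrarily for this sketch

# same field geometry and error model as the notebook
ra0, dec0 = np.deg2rad(210.0), np.deg2rad(54.0)
patch = np.deg2rad(0.3)
sig_arcsec = 0.5
sig = np.deg2rad(sig_arcsec / 3600.0)
n_true, n_spur = 1500, 200

ra_true = ra0 + rng.uniform(-patch, patch, n_true) / np.cos(dec0)
dec_true = dec0 + rng.uniform(-patch, patch, n_true)
ra_obs = ra_true + rng.normal(0.0, sig, n_true) / np.cos(dec0)
dec_obs = dec_true + rng.normal(0.0, sig, n_true)
ra_spur = ra0 + rng.uniform(-patch, patch, n_spur) / np.cos(dec0)
dec_spur = dec0 + rng.uniform(-patch, patch, n_spur)

# flat-sky projection, as in section 3
A_xy = np.column_stack([ra_true * np.cos(dec0), dec_true])
B_xy = np.column_stack([np.concatenate([ra_obs, ra_spur]) * np.cos(dec0),
                        np.concatenate([dec_obs, dec_spur])])

dist, idx = cKDTree(B_xy).query(A_xy, k=1)
dist_arcsec = np.rad2deg(dist) * 3600.0

# B was built as [observed, spurious], so the true counterpart of A[i] is B[i]
is_true = idx == np.arange(n_true)

# Rayleigh predictions for the median and 95th percentile
med_pred = sig_arcsec * np.sqrt(2.0 * np.log(2.0))
p95_pred = sig_arcsec * np.sqrt(-2.0 * np.log(0.05))

print("fraction of nearest neighbors that are the true match:", is_true.mean())
print("median [arcsec]: measured", np.median(dist_arcsec), "predicted", med_pred)
print("95th pct [arcsec]: measured", np.percentile(dist_arcsec, 95), "predicted", p95_pred)
```

With every A source having a counterpart in B, nearly all nearest neighbors are true matches, so the tail in this particular setup is small. It becomes prominent when some A sources have no counterpart or the field is much denser.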

## 6) Build the matched catalog at a chosen radius


In [None]:
R = 1.5  # arcsec
keep = dist_arcsec <= R
matched = pd.DataFrame({
    "raA_deg": np.rad2deg(catA.ra_rad.values[keep]),
    "decA_deg": np.rad2deg(catA.dec_rad.values[keep]),
    "raB_deg": np.rad2deg(catB.ra_rad.values[idx[keep]]),
    "decB_deg": np.rad2deg(catB.dec_rad.values[idx[keep]]),
    "sep_arcsec": dist_arcsec[keep],
})
matched.head(), len(matched)

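A one-way A→B match can assign the same B source to several A sources. One common refinement is a mutual nearest-neighbor match, which keeps a pair only if each source is the other's nearest neighbor. The sketch below shows the idea on a toy example; `mutual_nearest` is a hypothetical helper, and the positions are made-up flat-sky coordinates in arcsec.

```python
import numpy as np
from scipy.spatial import cKDTree

def mutual_nearest(A_xy, B_xy, radius):
    """Return (iA, iB, sep) for pairs that are each other's nearest
    neighbor and lie within `radius` (same units as the coordinates)."""
    dAB, iAB = cKDTree(B_xy).query(A_xy, k=1)  # for each A: nearest B
    _, iBA = cKDTree(A_xy).query(B_xy, k=1)    # for each B: nearest A
    iA = np.arange(len(A_xy))
    mutual = (iBA[iAB] == iA) & (dAB <= radius)
    return iA[mutual], iAB[mutual], dAB[mutual]

# toy positions in arcsec (made-up values)
A = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
B = np.array([[0.2, 0.0], [10.1, 10.0], [50.0, 50.0]])

iA, iB, sep = mutual_nearest(A, B, radius=1.5)
pairs = list(zip(iA.tolist(), iB.tolist()))
print("mutual pairs (A index, B index):", pairs)  # [(0, 0), (2, 1)]
print("separations [arcsec]:", sep)
```

Here A[1] also has B[0] as its nearest neighbor within the radius, but B[0] prefers A[0], so the duplicate assignment is dropped.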

## Try it
- Increase the spurious-source count and see how the tail grows.
- Increase measurement error and see how the best radius increases.
- Try a *mutual* nearest-neighbor match (A→B and B→A) to reduce false matches.

You now have the core building blocks of positional crossmatching: angular separations, nearest-neighbor search, and match-radius selection.

---
### Working with real FITS/WCS/catalogs
For real data, install **Astropy** (and optionally Photutils/Astroquery). Astropy handles FITS files and WCS, and `SkyCoord.match_to_catalog_sky` performs nearest-neighbor crossmatching on the sphere.
