# Advanced Astroquery Analysis Report  
## MAST + TESS: Download a light curve, detrend it, and find a transit with BLS

**Goal:** Use `astroquery.mast` to fetch a real **TESS light curve** from MAST, then perform:
- cleaning + detrending  
- **Box Least Squares (BLS)** period search  
- phase-folded transit visualization  
- simple physical inference (depth → radius ratio)

> Requires internet access (the data are downloaded from MAST).  
> Date: 2026-01-07


## 0) Setup

We use:
- `astroquery.mast` to discover + download TESS light curves
- `astropy.io.fits` to read FITS light curve files
- `astropy.timeseries.BoxLeastSquares` for transit search
- `numpy/matplotlib` for analysis & plots

If any packages are missing, uncomment the install line below.


In [None]:
# If needed (run once):
# !pip -q install astroquery astropy numpy matplotlib


In [None]:
import numpy as np
import matplotlib.pyplot as plt

from astropy.io import fits
from astropy.timeseries import BoxLeastSquares
from astropy.stats import sigma_clip

from astroquery.mast import Catalogs, Observations


## 1) Choose a target

You can use any star name that the MAST name resolver recognizes; the resolved position is then matched against the TESS Input Catalog (TIC).

Examples that often work:
- `HD 209458` (classic transiting hot Jupiter)
- `WASP-12`
- `Pi Men`
- `TOI 700`

If resolution fails for your chosen name, try another or use a TIC ID directly.


In [None]:
target_name = "HD 209458"   # change me
print("Target:", target_name)


## 2) Resolve to a TIC ID (TESS Input Catalog)

We query the TIC catalog and take the top match.

Tip: If you already know the TIC ID, set `tic_id = <number>` and skip this step.


In [None]:
tic = Catalogs.query_object(target_name, radius=0.02, catalog="Tic")
tic[:5]


In [None]:
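# Pick the best match: sort by angular distance when that column is present, then take the first row
if len(tic) == 0:
    raise RuntimeError("No TIC match found. Try another name or a larger radius.")
if "dstArcSec" in tic.colnames:
    tic.sort("dstArcSec")
row0 = tic[0]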
tic_id = int(row0["ID"])
ra = float(row0["ra"])
dec = float(row0["dec"])

print("TIC ID:", tic_id)
print("RA/Dec:", ra, dec)


## 3) Find TESS light curve products in MAST

We’ll search observations around the TIC position and filter to **TESS** light curves.

MAST has multiple product types:
- Light curves (e.g., *SPOC* light curves)
- Target pixel files (TPFs)
- Quick-look or HLSP products

Here we focus on light curves because they’re easiest to start with.


In [None]:
obs = Observations.query_region(f"{ra} {dec}", radius="0.02 deg")
# Filter to the TESS mission and to time-series (light curve) observations
obs_tess = obs[(obs["obs_collection"] == "TESS") & (obs["dataproduct_type"] == "timeseries")]

print("All observations:", len(obs))
print("TESS observations:", len(obs_tess))
obs_tess[:5]


In [None]:
# Get the product list and filter to light curve (LC) files
products = Observations.get_product_list(obs_tess)
# Heuristic: science products whose sub-group is "LC".
# (The description text reads "Light curves", so matching on "lightcurve" finds nothing.)
lc_products = products[
    (products["productType"] == "SCIENCE") &
    (products["productSubGroupDescription"] == "LC")
]

print("All products:", len(products))
print("Light curve-like products:", len(lc_products))
lc_products[:10]


## 4) Download a light curve

We’ll download the first light curve product (you can choose another by inspecting `lc_products`).

Downloads go into a local directory (default: `mastDownload/`).


In [None]:
# Choose one product
if len(lc_products) == 0:
    raise RuntimeError("No light curve products found. Try a different target or radius.")

prod = lc_products[0]
print("Selected product:")
print(prod)


In [None]:
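from astropy.table import Table  # needed to wrap the selected row

# download_products expects a table of products; Table(row) builds a one-row table
manifest = Observations.download_products(Table(prod), mrp_only=False)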
manifest


In [None]:
noting = manifest["Local Path"][0]
print("Downloaded to:", noting)


## 5) Read the FITS light curve + choose a flux column

Many TESS light curves include:
- `TIME` (BJD - 2457000, in days; often written BTJD)
- `PDCSAP_FLUX` (systematics-corrected flux; usually preferred)
- `SAP_FLUX` (simple aperture photometry, without systematics correction)

We will:
- use `PDCSAP_FLUX` if available, else fallback to `SAP_FLUX`
- remove NaNs
- sigma-clip outliers
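
The cleaning steps listed above can be sketched on plain arrays. This is a minimal illustration, not the notebook's exact cell: `clean_light_curve` is a hypothetical helper, the MAD-based clip stands in for `astropy.stats.sigma_clip`, and the optional `quality` argument assumes the file provides a `QUALITY` column in which 0 marks unflagged cadences.

```python
import numpy as np

def clean_light_curve(time, flux, quality=None, sigma=5.0):
    """Drop non-finite and flagged cadences, normalize, and clip outliers.

    Hypothetical helper for illustration; `quality` is assumed to follow the
    convention that 0 means no flag is set.
    """
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)

    good = np.isfinite(time) & np.isfinite(flux)
    if quality is not None:
        good &= np.asarray(quality) == 0

    time, flux = time[good], flux[good]
    flux = flux / np.median(flux)  # normalize to a median of 1

    # Robust scatter estimate: 1.4826 * MAD approximates the standard
    # deviation for Gaussian noise.
    mad = np.median(np.abs(flux - np.median(flux)))
    scatter = 1.4826 * mad
    keep = np.abs(flux - 1.0) <= sigma * scatter
    return time[keep], flux[keep]


# Synthetic example: one NaN, one flagged cadence, one strong outlier.
rng = np.random.default_rng(0)
t_raw = np.arange(200) * 0.01
f_raw = 1000.0 + rng.normal(0.0, 1.0, t_raw.size)
f_raw[10] = np.nan
f_raw[50] = 1100.0          # outlier
q_raw = np.zeros(t_raw.size, dtype=int)
q_raw[20] = 1               # flagged cadence

t_clean, f_clean = clean_light_curve(t_raw, f_raw, quality=q_raw)
print(len(t_raw), "->", len(t_clean))
```

A symmetric clip like this one can also remove points inside a deep transit, so for transit work it may be preferable to clip only upward outliers.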


In [None]:
with fits.open(noting, memmap=False) as hdul:
    hdul.info()
    data = hdul[1].data
    cols = data.columns.names
    print("Columns:", cols)


In [None]:
time = data["TIME"]

if "PDCSAP_FLUX" in cols:
    flux = data["PDCSAP_FLUX"]
    flux_err = data["PDCSAP_FLUX_ERR"] if "PDCSAP_FLUX_ERR" in cols else None
    flux_name = "PDCSAP_FLUX"
elif "SAP_FLUX" in cols:
    flux = data["SAP_FLUX"]
    flux_err = data["SAP_FLUX_ERR"] if "SAP_FLUX_ERR" in cols else None
    flux_name = "SAP_FLUX"
else:
    raise RuntimeError("No known flux column found (expected PDCSAP_FLUX or SAP_FLUX).")

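# Keep only cadences with finite time and flux (errors stay aligned with flux)
mask = np.isfinite(time) & np.isfinite(flux)
time = np.array(time[mask], dtype=float)
flux = np.array(flux[mask], dtype=float)
if flux_err is not None:
    flux_err = np.array(flux_err[mask], dtype=float)

# Normalize flux (and errors) by the median flux
med = np.nanmedian(flux)
flux_norm = flux / med
if flux_err is not None:
    flux_err = flux_err / med

# Sigma-clip outliers; the looser lower bound avoids clipping deep transits
clipped = sigma_clip(flux_norm, sigma_lower=20, sigma_upper=5, maxiters=3)
m = ~np.ma.getmaskarray(clipped)

time = time[m]
flux_norm = flux_norm[m]
if flux_err is not None:
    flux_err = flux_err[m]

print("Using:", flux_name, "N=", len(time))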


## 6) Plot raw normalized light curve

This is your first “data sanity check”.


In [None]:
fig, ax = plt.subplots(figsize=(10,4))
ax.plot(time, flux_norm, ".", ms=2)
ax.set_xlabel("Time (TESS BJD - 2457000)")
ax.set_ylabel("Normalized flux")
ax.set_title(f"{target_name} (TIC {tic_id}) — {flux_name} (normalized)")
plt.show()


## 7) Detrend (simple)

There are many detrending strategies. Here’s a simple, robust approach:
- compute a moving median baseline
- divide the flux by the baseline

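This removes slow trends while preserving transits, provided the window is several times longer than the transit. A window comparable to the transit duration will flatten the transit itself.

Tune `window` (measured in cadences) to the cadence and the expected transit duration.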
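
One way to choose the window is to convert a target time span into a number of cadences. The sketch below is a minimal illustration: `median_filter_window` is a hypothetical helper, and the factor of 3 and the 3-hour transit duration are assumptions, not values taken from the data.

```python
import numpy as np

def median_filter_window(time, transit_duration, factor=3.0):
    """Odd window length (in cadences) spanning `factor` transit durations."""
    cadence = np.median(np.diff(time))          # days per sample
    n = int(round(factor * transit_duration / cadence))
    n = max(n, 3)
    return n if n % 2 == 1 else n + 1

# 2-minute cadence with an assumed 3-hour transit
time_demo = np.arange(0.0, 1.0, 2.0 / 1440.0)
window_demo = median_filter_window(time_demo, transit_duration=3.0 / 24.0)
print(window_demo)  # 271 cadences, about 9 hours
```

At 2-minute cadence a 101-point window covers only about 3.4 hours, which is comparable to a hot-Jupiter transit, so a longer window is usually the safer starting point.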


In [None]:
def moving_median(y, window):
    # simple moving median (odd window recommended)
    w = int(window)
    if w < 3:
        return y.copy()
    if w % 2 == 0:
        w += 1
    pad = w//2
    ypad = np.pad(y, (pad, pad), mode="edge")
    out = np.empty_like(y, dtype=float)
    for i in range(len(y)):
        out[i] = np.median(ypad[i:i+w])
    return out

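# Heuristic window: about 0.75 d of data, i.e. several times a typical transit duration,
# so the median filter follows slow trends without flattening the transit itself
cadence = np.nanmedian(np.diff(time))
window = int(round(0.75 / cadence))
baseline = moving_median(flux_norm, window=window)
flux_det = flux_norm / baseline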

fig, ax = plt.subplots(figsize=(10,4))
ax.plot(time, flux_det, ".", ms=2)
ax.set_xlabel("Time (TESS BJD - 2457000)")
ax.set_ylabel("Detrended flux")
ax.set_title("Detrended light curve (moving-median divide)")
plt.show()


## 8) Transit search with Box Least Squares (BLS)

BLS is designed for **boxy dips** (transits/eclipses).
We scan a period grid and find the strongest signal.

You should set:
- period range (days)
- transit duration grid (days)

For hot Jupiters, typical periods: 1–10 days.  
For longer-period planets, expand the search.
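
To make the idea of a box search concrete, here is a toy version on synthetic data. It is a simplified sketch and not the algorithm that `astropy.timeseries.BoxLeastSquares` implements: `toy_box_search` is a hypothetical helper that folds the data at each trial period, averages it in phase bins about one transit duration wide, and scores the deepest bin. All injected values (period, epoch, duration, depth, noise) are assumptions.

```python
import numpy as np

def toy_box_search(time, flux, periods, duration):
    """For each trial period, fold the data, average it in phase bins about one
    transit duration wide, and score the deepest bin against the median bin."""
    scores = np.zeros(len(periods))
    for k, period in enumerate(periods):
        phase = (time % period) / period                 # in [0, 1)
        nbins = max(int(period / duration), 2)
        idx = np.minimum((phase * nbins).astype(int), nbins - 1)
        sums = np.bincount(idx, weights=flux, minlength=nbins)
        counts = np.bincount(idx, minlength=nbins)
        means = sums[counts > 0] / counts[counts > 0]
        scores[k] = np.median(means) - means.min()
    return scores

# Synthetic light curve: 27 days with box-shaped transits (all values assumed)
rng = np.random.default_rng(1)
t_demo = np.arange(0.0, 27.0, 0.01)
true_period, true_t0, true_dur, true_depth = 3.0, 1.0, 0.12, 0.01
f_demo = 1.0 + rng.normal(0.0, 0.001, t_demo.size)
dt = (t_demo - true_t0 + 0.5 * true_period) % true_period - 0.5 * true_period
f_demo[np.abs(dt) < 0.5 * true_dur] -= true_depth

trial_periods = np.linspace(2.0, 4.0, 2001)
scores = toy_box_search(t_demo, f_demo, trial_periods, duration=true_dur)
best_trial = trial_periods[np.argmax(scores)]
print(f"Recovered period: {best_trial:.3f} d")
```

The period range here is kept narrow on purpose: with this simple score, integer multiples of the true period line the transits up just as well, which is one reason the choice of period range matters.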


In [None]:
# Period search grid (days)
periods = np.linspace(0.5, 10.0, 5000)

# Transit duration grid (days) — a few hours is typical
durations = np.linspace(0.05, 0.3, 20)  # 1.2h to 7.2h

bls = BoxLeastSquares(time, flux_det)
res = bls.power(periods, durations)

best = np.argmax(res.power)
best_period = res.period[best]
best_duration = res.duration[best]
best_t0 = res.transit_time[best]
best_depth = res.depth[best]

best_period, best_duration, best_t0, best_depth


### Plot BLS periodogram

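A strong, isolated peak suggests a candidate periodic transit. Peaks at integer multiples or fractions of the best period are usually aliases of the same signal.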
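
One common way to quantify how much a peak stands out is the signal detection efficiency (SDE): the peak height above the mean of the spectrum, in units of its standard deviation. The sketch below uses a hypothetical helper and a made-up spectrum; in this notebook it would be applied to `res.power`. What counts as a significant value depends on the data set, so no threshold is assumed here.

```python
import numpy as np

def signal_detection_efficiency(power):
    """Peak height above the mean of the spectrum, in units of its standard deviation."""
    power = np.asarray(power, dtype=float)
    return (power.max() - power.mean()) / power.std()

# Hypothetical spectrum: flat with a single spike at index 40
power_demo = np.zeros(100)
power_demo[40] = 1.0
sde = signal_detection_efficiency(power_demo)
print(f"SDE = {sde:.2f}")  # SDE = 9.95
```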


In [None]:
fig, ax = plt.subplots(figsize=(10,4))
ax.plot(res.period, res.power, lw=1)
ax.set_xlabel("Period (days)")
ax.set_ylabel("BLS power")
ax.set_title("BLS periodogram")
ax.axvline(best_period, ls="--")
plt.show()

print(f"Best period:   {best_period:.6f} d")
print(f"Best duration: {best_duration:.4f} d ({best_duration*24:.2f} hours)")
print(f"Best t0:       {best_t0:.4f}")
print(f"Depth:         {best_depth:.6g} (in normalized flux units)")


## 9) Phase-fold and plot the transit

We fold on the best period and bin for visibility.


In [None]:
def phase_fold(time, period, t0):
    phase = (time - t0 + 0.5*period) % period - 0.5*period
    return phase

phase = phase_fold(time, best_period, best_t0)
order = np.argsort(phase)

ph = phase[order]
fl = flux_det[order]

# Simple binning for plot
def bin_series(x, y, nbins=200):
    bins = np.linspace(x.min(), x.max(), nbins+1)
    idx = np.clip(np.digitize(x, bins) - 1, 0, nbins - 1)  # keep the maximum x in the last bin
    xb = []
    yb = []
    for i in range(nbins):
        m = idx == i
        if np.any(m):
            xb.append(np.median(x[m]))
            yb.append(np.median(y[m]))
    return np.array(xb), np.array(yb)

xb, yb = bin_series(ph, fl, nbins=200)

fig, ax = plt.subplots(figsize=(10,4))
ax.plot(ph, fl, ".", ms=2, alpha=0.3, label="data")
ax.plot(xb, yb, "-", lw=2, label="binned median")
ax.set_xlabel("Phase (days)")
ax.set_ylabel("Detrended flux")
ax.set_title("Phase-folded light curve (best BLS period)")
ax.legend()
plt.show()


## 10) Simple physical inference

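Ignoring limb darkening, the depth of a non-grazing transit is approximately the fraction of the stellar disk that the planet covers: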

\[
\delta \approx (R_p / R_\star)^2
\]

So a quick radius ratio estimate:

\[
R_p / R_\star \approx \sqrt{\delta}
\]

This ignores limb darkening and grazing effects, but is a useful first-order estimate.
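
As a worked example of the relation above, the radius ratio can be turned into a physical radius once a stellar radius is supplied. This is a sketch under stated assumptions: the 1% depth and the Sun-sized star are illustrative inputs rather than measurements, and `planet_radius_rjup` is a hypothetical helper.

```python
import math

R_SUN_KM = 695_700.0   # nominal solar radius (IAU 2015)
R_JUP_KM = 71_492.0    # nominal equatorial Jupiter radius (IAU 2015)

def planet_radius_rjup(depth, r_star_rsun):
    """Planet radius in Jupiter radii from depth = (Rp/R*)^2."""
    return math.sqrt(depth) * r_star_rsun * R_SUN_KM / R_JUP_KM

# Assumed inputs: a 1% deep transit of a Sun-sized star
depth_demo = 0.01
rp_rjup = planet_radius_rjup(depth_demo, r_star_rsun=1.0)
print(f"Rp/R* = {math.sqrt(depth_demo):.3f}, Rp = {rp_rjup:.3f} R_Jup")
# Rp/R* = 0.100, Rp = 0.973 R_Jup
```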


In [None]:
delta = max(best_depth, 0)  # depth should be positive in BLS output
rp_rs = np.sqrt(delta)

print(f"Estimated depth δ ≈ {delta:.6g}")
print(f"Estimated radius ratio Rp/R* ≈ sqrt(δ) ≈ {rp_rs:.4f}")


## 11) Report-style summary

This block prints the “headline” results you’d put in a short analysis memo.


In [None]:
report = {
    "target_name": target_name,
    "tic_id": int(tic_id),
    "flux_used": flux_name,
    "n_points_used": int(len(time)),
    "best_period_days": float(best_period),
    "best_duration_hours": float(best_duration * 24.0),
    "best_depth": float(best_depth),
    "rp_over_rs_estimate": float(rp_rs),
}
report


## 12) Next steps (more advanced)

To go from “candidate detection” to “publishable”:
1. Use a better detrending model (e.g., splines, GP, or mission systematics models)
2. Fit a physical transit model (e.g., `batman`) with limb darkening
3. Check odd/even transit depths (eclipsing binary test)
4. Verify against known ephemerides (ExoFOP / literature)
5. Use pixel-level vetting (Target Pixel File analysis) for contamination checks
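
The odd/even check in step 3 can be sketched with plain arrays. This is a minimal illustration on synthetic data with alternating depths, as an eclipsing binary folded at half its true period would show; `odd_even_depths` is a hypothetical helper and all injected values are assumptions.

```python
import numpy as np

def odd_even_depths(time, flux, period, t0, duration):
    """Mean in-transit depth for even- and odd-numbered transits."""
    cycle = np.round((time - t0) / period).astype(int)   # transit number
    dt = time - t0 - cycle * period                      # time from mid-transit
    in_transit = np.abs(dt) < 0.5 * duration
    baseline = np.median(flux[~in_transit])
    even = in_transit & (cycle % 2 == 0)
    odd = in_transit & (cycle % 2 == 1)
    return baseline - flux[even].mean(), baseline - flux[odd].mean()

# Synthetic eclipsing-binary-like signal: alternating depths of 1% and 2%
t_eb = np.arange(0.0, 27.0, 0.01)
period_eb, t0_eb, dur_eb = 3.0, 1.0, 0.12
cycle_eb = np.round((t_eb - t0_eb) / period_eb).astype(int)
dt_eb = t_eb - t0_eb - cycle_eb * period_eb
in_tr = np.abs(dt_eb) < 0.5 * dur_eb
f_eb = np.ones_like(t_eb)
f_eb[in_tr & (cycle_eb % 2 == 0)] -= 0.01
f_eb[in_tr & (cycle_eb % 2 == 1)] -= 0.02

depth_even, depth_odd = odd_even_depths(t_eb, f_eb, period_eb, t0_eb, dur_eb)
print(f"even: {depth_even:.3f}, odd: {depth_odd:.3f}")  # even: 0.010, odd: 0.020
```

In this notebook the call would be `odd_even_depths(time, flux_det, best_period, best_t0, best_duration)`. A clear mismatch between the two depths is a warning sign rather than proof; astropy's `BoxLeastSquares.compute_stats` also reports odd and even depths and may be a more convenient route.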
