# Radio Astrophysics Data Analysis — A Practical Jupyter Guide

This mini-course is a set of **hands-on notebooks** for the most common workflows in radio astronomy.
It focuses on ideas that transfer across facilities (VLA/MeerKAT/LOFAR/GMRT/ASKAP/ALMA, single-dish surveys, pulsar backends), while keeping examples **small and runnable**.

It is designed to be useful even if you *don't* have radio-specific libraries installed.
- Core dependencies used here: **NumPy, SciPy, Matplotlib, Pandas**
- Optional (recommended) radio stack: **Astropy, SpectralCube, RadioBeam, PyUVData, Dask/Xarray**
- Interferometry workhorses (external): **CASA / casatasks**, **WSClean**, **AOFlagger**

> If you want to process real Measurement Sets (MS) or do production imaging, you will almost certainly use CASA and/or WSClean.
> These notebooks still run without them (they use synthetic data and fallbacks).

## Notebooks in this pack

1. **01 — Radio Data Basics & Formats**  
   What’s in an MS/UVFITS/FITS cube/filterbank, units, axes, metadata sanity checks.

2. **02 — RFI: Detection, Flagging & QA**  
   Robust statistics, time–frequency masks, baseline-dependent RFI intuition, “don’t overflag” checks.

3. **03 — Calibration (Conceptual + Practical)**  
   Flux scale, bandpass, complex gains, polarization leakage basics, calibration tables & diagnostic plots.

4. **04 — Interferometric Imaging**  
   UV coverage, weighting (natural/robust/uniform), dirty image/PSF, CLEAN, primary beam, mosaics.

5. **05 — Spectral Line Cubes (H I / CO / recombination lines)**  
   Continuum subtraction, cube building, moment maps, matched filtering, stacking.

6. **06 — Time Domain & Dynamic Spectra**  
   Waterfalls, bandpass flattening, dedispersion intuition, simple burst detection.

7. **07 — Polarization & Faraday Rotation**  
   Stokes parameters, leakage, RM synthesis basics, debiasing polarized intensity.

8. **08 — Source Finding, Catalogs & Crossmatch**  
   Source extraction, completeness/false positives, spectral index maps, astrometry & associations.

## How to use this guide
- Run cells top-to-bottom.
- Swap synthetic inputs for your real data when ready.
- Do the “Try it” exercises in each notebook.
- Keep the diagnostic plots — radio reduction lives and dies by QA plots.


## Radio data you’ll meet in the wild

**Interferometers** (VLA/MeerKAT/LOFAR/ALMA…)
- *Visibilities* \(V(u,v,\nu,t)\): complex correlations stored as **Measurement Sets (MS)** or **UVFITS**.
- Imaging is an inversion + deconvolution problem (gridding → dirty image + PSF → CLEAN/selfcal).
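
The "dirty image + PSF" step can be illustrated with plain FFTs. This is a minimal sketch on a toy grid, not a real gridder: the random uv sampling, the grid size, and the source positions are all assumptions made for illustration. It shows that the dirty image is the sky convolved with the dirty beam (PSF), which is why deconvolution (CLEAN) is needed afterwards.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64

# Toy sky: two point sources on an otherwise empty n x n grid
sky = np.zeros((n, n))
sky[20, 30] = 1.0
sky[40, 12] = 0.5

# Toy uv sampling: pick ~15% of the Fourier cells at random (assumed, not a real array layout)
sampled = rng.random((n, n)) < 0.15
# A real-valued sky has Hermitian visibilities, so (u, v) and (-u, -v) are sampled together
sampled |= np.roll(np.flip(sampled), shift=1, axis=(0, 1))
# An interferometer does not measure the zero spacing (total flux)
sampled[0, 0] = False

vis = np.fft.fft2(sky) * sampled                 # "observed" gridded visibilities
dirty = np.fft.ifft2(vis).real                   # dirty image
psf = np.fft.ifft2(sampled.astype(float)).real   # dirty beam, peak at pixel (0, 0)

# Normalise so that an isolated unit point source would peak at 1 in the dirty image
dirty /= psf[0, 0]
psf /= psf[0, 0]

highest_sidelobe = np.abs(psf).ravel()[1:].max()
print(f"sampled fraction of uv cells: {sampled.mean():.2f}")
print(f"highest PSF sidelobe relative to peak: {highest_sidelobe:.2f}")
print(f"dirty image value at the brighter source: {dirty[20, 30]:.2f}")
```

The value at the brighter source differs from 1 because sidelobes of the fainter source land on it; denser uv coverage lowers the sidelobes and shrinks that error.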

**Single-dish / mapping**
- Time-ordered data or gridded maps; scanning strategies; baseline subtraction; beam effects.

**Spectral line cubes**
- 3D data \(I(x,y,\nu)\) with channelized spectra; continuum subtraction and moment maps are common.

**Time-domain backends** (pulsars / FRBs)
- Filterbank or voltages; dynamic spectra \(P(\nu,t)\); dedispersion and transient searches.
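
To make "dedispersion" concrete: the cold-plasma delay between two frequencies is roughly 4.148808 ms × DM × (ν_lo⁻² − ν_hi⁻²), with frequencies in GHz and DM in pc cm⁻³. The sketch below applies that relation as integer-sample shifts (incoherent dedispersion). The function names, the toy band, and the DM value are assumptions chosen for illustration.

```python
import numpy as np

K_DM_MS = 4.148808  # dispersion constant in ms GHz^2 cm^3 / pc

def dispersion_delay_ms(dm, freq_ghz, ref_ghz):
    """Delay (ms) at freq_ghz relative to ref_ghz for dispersion measure dm (pc/cm^3)."""
    freq_ghz = np.asarray(freq_ghz, dtype=float)
    return K_DM_MS * dm * (freq_ghz ** -2.0 - float(ref_ghz) ** -2.0)

def dedisperse(dyn, freq_ghz, dt_s, dm):
    """Incoherent dedispersion of dyn (nfreq, ntime) by integer-sample shifts.

    np.roll wraps samples around the time axis; real pipelines pad or trim instead.
    """
    delays_s = dispersion_delay_ms(dm, freq_ghz, np.max(freq_ghz)) * 1e-3
    shifts = np.rint(delays_s / dt_s).astype(int)
    out = np.empty_like(dyn)
    for i, s in enumerate(shifts):
        out[i] = np.roll(dyn[i], -s)
    return out

# Toy check: inject a dispersed impulse and confirm that dedispersion realigns it
nfreq_dd, ntime_dd, dt_s, dm = 64, 1000, 1e-3, 100.0
freq_dd = np.linspace(1.0, 1.5, nfreq_dd)  # GHz (toy)
t0 = 200                                   # arrival sample at the top of the band
shifts = np.rint(dispersion_delay_ms(dm, freq_dd, freq_dd.max()) * 1e-3 / dt_s).astype(int)
dyn_dd = np.zeros((nfreq_dd, ntime_dd))
dyn_dd[np.arange(nfreq_dd), t0 + shifts] = 1.0

ts_raw = dyn_dd.sum(axis=0)
ts_dd = dedisperse(dyn_dd, freq_dd, dt_s, dm).sum(axis=0)
print(f"delay across the band: {dispersion_delay_ms(dm, 1.0, 1.5):.1f} ms")
print("peak of summed time series before / after:", ts_raw.max(), ts_dd.max())
```

Before dedispersion the pulse is smeared over a couple of hundred samples, so the band-summed peak is tiny; afterwards every channel lines up at `t0`.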

### A quick workflow decision tree
- *Do you have an MS/UVFITS?* → flagging → calibration → imaging → selfcal → science products (re-flag after calibration if new RFI shows up).
- *Do you have a FITS cube?* → QA axes/units → continuum subtraction → line analysis.
- *Do you have a dynamic spectrum?* → bandpass flatten → RFI mask → search / dedisperse.


## Core QA plots (make these early)

**RFI / data quality**
- Amplitude vs time, phase vs time (per antenna/baseline)
- Bandpass amplitude vs frequency
- Waterfall (time–frequency) plots

**Imaging**
- UV coverage plot (do you have holes? short spacings?)
- Dirty image + PSF (sidelobes tell you how hard CLEAN will be)
- Residual image histogram (noise close to Gaussian? any stripes/rings?)
- Dynamic range estimate: \(\max(I)/\sigma\), with \(\sigma\) the rms noise measured off-source or in the residual image
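
A dynamic range estimate needs a noise value that the source itself does not inflate. The sketch below compares a plain standard deviation with a MAD-based estimate on a toy image. The image size, flux values, and the helper name `mad_sigma` are illustrative assumptions; on real data the noise is better measured in a source-free region or in the residual image.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy image: Gaussian noise (sigma = 1e-4, assumed Jy/beam) plus one bright Gaussian source
sigma_true = 1e-4
image = rng.normal(0.0, sigma_true, size=(256, 256))
yy, xx = np.mgrid[:256, :256]
image += 0.05 * np.exp(-0.5 * ((xx - 128) ** 2 + (yy - 128) ** 2) / 3.0 ** 2)

def mad_sigma(a):
    """MAD-based noise estimate; insensitive to a small fraction of source pixels."""
    a = np.asarray(a, dtype=float)
    return 1.4826 * np.nanmedian(np.abs(a - np.nanmedian(a)))

sigma_naive = image.std()         # inflated by the source
sigma_robust = mad_sigma(image)   # close to the injected noise level
dynamic_range = np.nanmax(image) / sigma_robust

print(f"std: {sigma_naive:.2e}   MAD sigma: {sigma_robust:.2e}")
print(f"dynamic range ~ {dynamic_range:.0f}")
```

Using the plain standard deviation here would understate the dynamic range by roughly an order of magnitude, because the source dominates the variance.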

**Spectral line**
- Representative spectra and baseline regions
- Moment maps + uncertainty maps
- Channel rms vs frequency (flagged channels pop out)
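
Two of the spectral-line checks above can be sketched on a toy cube: a per-channel rms (where a bad channel stands out) and a moment-0 value with a propagated uncertainty. Everything here is an assumption for illustration: the cube shape, the noise levels, the velocity window, and the simplification that the line is identical in every pixel. On real cubes the rms should come from line-free regions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy cube with axes (channel, y, x); all numbers are illustrative
nchan, ny, nx = 80, 24, 24
vel = np.linspace(-200.0, 200.0, nchan)   # km/s
dv = vel[1] - vel[0]
sigma_chan = np.full(nchan, 0.02)
sigma_chan[10] = 0.2                      # one noisy channel, e.g. unflagged RFI
cube = rng.normal(0.0, 1.0, size=(nchan, ny, nx)) * sigma_chan[:, None, None]
cube += 0.3 * np.exp(-0.5 * (vel / 20.0) ** 2)[:, None, None]  # line centred on v = 0

# Channel rms from a robust estimator; bad channels stand out
flat = cube.reshape(nchan, -1)
med = np.median(flat, axis=1, keepdims=True)
rms = 1.4826 * np.median(np.abs(flat - med), axis=1)
bad = rms > 3.0 * np.median(rms)

# Moment 0 over a fixed velocity window, with uncertainty for independent channels:
# sigma_M0 = dv * sqrt(sum_i sigma_i^2)
win = (np.abs(vel) < 60.0) & ~bad
mom0 = cube[win].sum(axis=0) * dv
mom0_err = dv * np.sqrt(np.sum(rms[win] ** 2))

print("noisy channels:", np.flatnonzero(bad))
print(f"moment 0: {np.median(mom0):.2f} +/- {mom0_err:.2f} (cube units x km/s)")
print(f"pixel-to-pixel scatter of moment 0: {np.std(mom0):.2f}")
```

The pixel-to-pixel scatter should be close to the propagated uncertainty; a large mismatch on real data usually points to correlated channels or residual continuum.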


---

## Optional: check for useful packages

If you're on a machine with internet access you can install common Python helpers:
- `pip install astropy spectral-cube radio-beam pyuvdata xarray dask`

(Interferometry pipelines are typically done in CASA/WSClean; Python tools often help with QA, custom analysis, and visualization.)


In [None]:
# Environment check (safe to run)
import importlib, sys

def has(pkg: str) -> bool:
    try:
        importlib.import_module(pkg)
        return True
    except Exception:
        return False

pkgs = [
    'numpy','scipy','matplotlib','pandas',
    'astropy','spectral_cube','radio_beam','pyuvdata','xarray','dask'
]
print('Python:', sys.version.split()[0])
for p in pkgs:
    print(f"{p:12s}:", '✅' if has(p) else '—')
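
The check above only reports whether an import succeeds. As an optional extension, this sketch also prints installed versions with the standard-library `importlib.metadata`. The `DISTRIBUTIONS` mapping and the `installed_version` helper are illustrative names, not part of the original notebook. Note that distribution names (used by `pip`) can differ from import names.

```python
from importlib import metadata

# Import name -> distribution name as used by pip (they differ for some packages)
DISTRIBUTIONS = {
    "numpy": "numpy",
    "astropy": "astropy",
    "spectral_cube": "spectral-cube",
    "radio_beam": "radio-beam",
    "pyuvdata": "pyuvdata",
}

def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

for module_name, dist_name in DISTRIBUTIONS.items():
    print(f"{module_name:14s}: {installed_version(dist_name) or 'not installed'}")
```

Recording versions next to your QA plots makes it easier to reproduce a reduction later.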


## A tiny “analysis toolbox” we’ll reuse

These helpers are intentionally light-weight:
- quick image/spectrum/waterfall plots
- robust noise estimates (MAD)
- a simple RFI mask example (robust-threshold flagging on a waterfall)

They work on synthetic arrays and can be adapted to real data products (cal tables, cubes, dynamic spectra).


In [None]:
import numpy as np
import matplotlib.pyplot as plt

# ---------- plotting ----------

def plot_spectrum(freq, spec, title=None, xlabel='Frequency', ylabel='Amplitude'):
    plt.figure(figsize=(7,3))
    plt.plot(freq, spec)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    if title:
        plt.title(title)
    plt.tight_layout()
    plt.show()


def plot_waterfall(dyn, freq=None, time=None, title=None, xlabel='Time index', ylabel='Frequency index', vmin=None, vmax=None):
    # dyn expected shape: (nfreq, ntime)
    plt.figure(figsize=(7,4))
    extent = None
    if freq is not None and time is not None:
        extent = [time[0], time[-1], freq[0], freq[-1]]
        xlabel = 'Time'
        ylabel = 'Frequency'
    plt.imshow(dyn, origin='lower', aspect='auto', extent=extent, vmin=vmin, vmax=vmax)
    plt.colorbar(label='Power / arbitrary')
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    if title:
        plt.title(title)
    plt.tight_layout()
    plt.show()

# ---------- robust stats ----------

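def mad_std(x, axis=None, keepdims=False):
    '''Median absolute deviation, scaled to match std for Gaussian noise.'''
    x = np.asarray(x)
    med = np.nanmedian(x, axis=axis, keepdims=True)
    mad = np.nanmedian(np.abs(x - med), axis=axis, keepdims=keepdims)
    return 1.4826 * mad


def robust_zscore(x, axis=None):
    '''Robust z-score, (x - median) / MAD-based sigma, computed along `axis`.'''
    x = np.asarray(x)
    med = np.nanmedian(x, axis=axis, keepdims=True)
    # keepdims=True keeps the reduced axis so sigma broadcasts against x
    # (without it, a (nfreq, ntime) array with axis=1 raises a shape error)
    sigma = mad_std(x, axis=axis, keepdims=True)
    return (x - med) / (sigma + 1e-12)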

# ---------- simple RFI mask ----------

def rfi_mask_waterfall(dyn, thresh=6.0, axis=1):
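    '''
    Simple RFI mask for a dynamic spectrum (nfreq, ntime).
    - Compute robust z-scores along an axis (default axis=1: per channel, along time)
    - Flag samples where |z| > thresh

    With axis=1 a persistent narrowband tone is absorbed by the channel median
    and is not flagged; axis=0 (per time sample, along frequency) targets that case.
    This is NOT a replacement for AOFlagger, but is a good teaching baseline.
    '''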
    z = robust_zscore(dyn, axis=axis)
    return np.abs(z) > thresh
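
Why use the MAD rather than a plain standard deviation? The factor 1.4826 is 1/0.6745, the inverse of the 75th-percentile point of a unit Gaussian, so the scaled MAD matches σ for clean Gaussian noise while staying almost unchanged when a few samples are contaminated. The quick demonstration below uses a stand-alone helper (`mad_scale`, a hypothetical name chosen to avoid clashing with the toolbox functions) and an assumed 2% contamination.

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.normal(0.0, 1.0, size=10_000)   # unit-variance noise
x[:200] += 50.0                          # 2% of samples hit by strong interference

def mad_scale(a):
    """Scaled median absolute deviation of a 1-D sample."""
    a = np.asarray(a, dtype=float)
    return 1.4826 * np.nanmedian(np.abs(a - np.nanmedian(a)))

print(f"np.std   : {np.std(x):.2f}")
print(f"mad_scale: {mad_scale(x):.2f}")
```

The standard deviation is dominated by the outliers, while the MAD-based value stays close to the true noise level of 1. That is what lets a fixed threshold such as 6σ remain meaningful on data that still contain RFI.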


## Tiny synthetic example: dynamic spectrum + naive RFI masking

We’ll create a toy waterfall with:
- thermal noise
- a narrowband persistent interferer
- a broadband impulsive burst

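Then we’ll apply a simple robust-threshold mask. Because the z-scores are computed per channel along time, the mask catches only the brightest samples of the burst and is blind to the persistent tone: a constant offset is absorbed by that channel’s median.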


In [None]:
import numpy as np

# synthetic dynamic spectrum
nfreq, ntime = 256, 400
freq = np.linspace(1.0, 1.5, nfreq)   # GHz (toy)
time = np.linspace(0.0, 200.0, ntime) # s  (toy)

rng = np.random.default_rng(7)
dyn = rng.normal(0, 1.0, size=(nfreq, ntime))

# narrowband RFI: a bright tone in a few channels
rfi_ch = (freq > 1.23) & (freq < 1.235)
dyn[rfi_ch, :] += 8.0

# broadband burst: a short time window across all freqs
burst_t = (time > 95) & (time < 100)
dyn[:, burst_t] += 5.0 * np.exp(-((freq-1.33)/0.08)**2)[:, None]

plot_waterfall(dyn, freq=freq, time=time, title='Synthetic dynamic spectrum')

mask = rfi_mask_waterfall(dyn, thresh=6.0, axis=1)

dyn_clean = dyn.copy()
dyn_clean[mask] = np.nan  # pretend we flagged

plot_waterfall(mask.astype(float), freq=freq, time=time, title='Mask (1=flagged)', vmin=0, vmax=1)
plot_waterfall(dyn_clean, freq=freq, time=time, title='After naive flagging')

# collapsed spectrum + time series (simple diagnostics)
spec = np.nanmean(dyn_clean, axis=1)
ts = np.nanmean(dyn_clean, axis=0)

plot_spectrum(freq, spec, title='Mean spectrum after flagging', xlabel='Frequency [GHz]', ylabel='Mean power')
plot_spectrum(time, ts, title='Mean time series after flagging', xlabel='Time [s]', ylabel='Mean power')
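
One possible follow-up is to add two collapsed-statistic passes to the per-sample mask: one across frequency for persistent tones and one across time for broadband bursts. This is a sketch under the same toy assumptions, not a substitute for AOFlagger. The helper `robust_z` and the three-pass scheme are illustrative choices, and the waterfall is rebuilt so the cell runs on its own. It ends with a flagged-fraction report as a simple "don't overflag" check.

```python
import numpy as np

def robust_z(a, axis=None):
    """Median/MAD z-score; keepdims makes the result broadcast for any axis."""
    a = np.asarray(a, dtype=float)
    med = np.nanmedian(a, axis=axis, keepdims=True)
    mad = np.nanmedian(np.abs(a - med), axis=axis, keepdims=True)
    return (a - med) / (1.4826 * mad + 1e-12)

# Rebuild the same toy waterfall (same seed and parameters as the cell above)
nfreq, ntime = 256, 400
freq = np.linspace(1.0, 1.5, nfreq)
time = np.linspace(0.0, 200.0, ntime)
rng = np.random.default_rng(7)
dyn = rng.normal(0.0, 1.0, size=(nfreq, ntime))
rfi_ch = (freq > 1.23) & (freq < 1.235)
dyn[rfi_ch, :] += 8.0
burst_t = (time > 95) & (time < 100)
dyn[:, burst_t] += 5.0 * np.exp(-((freq - 1.33) / 0.08) ** 2)[:, None]

# Pass 1: per-channel z-score along time (the naive mask)
sample_mask = np.abs(robust_z(dyn, axis=1)) > 6.0

# Pass 2: z-score the per-channel medians across frequency to catch persistent tones
chan_level = np.nanmedian(dyn, axis=1)
bad_chan = np.abs(robust_z(chan_level)) > 6.0

# Pass 3: z-score the band-averaged time series to catch broadband bursts
time_level = np.nanmean(dyn[~bad_chan], axis=0)
bad_time = np.abs(robust_z(time_level)) > 6.0

full_mask = sample_mask | bad_chan[:, None] | bad_time[None, :]

print("channels flagged as persistent RFI:", np.flatnonzero(bad_chan))
print("time samples flagged as broadband :", np.flatnonzero(bad_time))
print(f"flagged fraction: pass 1 only {sample_mask.mean():.2%}, combined {full_mask.mean():.2%}")
```

Averaging before thresholding raises the signal-to-noise ratio of faint but coherent RFI, which is why the collapsed passes find what the per-sample threshold misses. If the combined flagged fraction climbs far above what the waterfall plot suggests, the thresholds are probably too aggressive.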


---

### Next: start with `01_radio_data_basics_formats.ipynb` (or adapt these notebooks to your pipeline)

Adapt the notebook sequence to the data you are working with (e.g., **VLA MS**, **MeerKAT MS**, **LOFAR**, **single-dish spectra**, **H I cube**, **filterbank**), and add tool-specific commands (CASA/WSClean/AOFlagger) alongside the Python analysis where your pipeline needs them.
