<a href="https://colab.research.google.com/github/sundarjhu/Astrostatistics2025/blob/main/Lesson08.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Core imports used across problems
import numpy as np
from math import sqrt, ceil
from scipy.stats import norm

np.set_printoptions(suppress=True, precision=6)


## Problem 1: Single-pixel detection (right tail; use `.cdf`)

### A CCD image has background noise modeled as $\mathscr{N}(\mu, \sigma^2)$ with $\mu$=1200 ADU and $\sigma$=15 ADU. A pixel on the CCD reads 1248 ADU. What is the probability that this reading is purely due to noise?

**Solution**: Let $X$ be the random variable associated with the level of background that is detected by the pixel. We are then looking for the probability $P(X \geq 1248 | \mu=1200, \sigma=15)$.

From the definition of the CDF, $P(X \ge x) = 1 - F_X(x)$.

Therefore, we can use the `.cdf` method, remembering to use the `loc` and `scale` arguments to pass the mean and standard deviation.

In [None]:
mu, sigma = 1200.0, 15.0
x = 1248.0
p_right = 1.0 - norm.cdf(x, loc=mu, scale=sigma)
print(f"P(X >= {x}) = {p_right:.6g}")

P(X >= 1248.0) = 0.000687138


For this problem, we could also have obtained an approximate estimate by standardising the variable:
$Z \equiv \displaystyle{X-\mu\over \sigma} = \displaystyle{1248-1200\over 15} = 48/15 = 3.2$.
That is, the observed value lies 3.2 standard deviations above the mean.

From the Empirical Rule for normal distributions, we know that the probability of being within 3 standard deviations of the mean is approximately 99.7%. Since the observed value lies more than 3 standard deviations above the mean, the required probability must be less than (1-0.997)/2 = 0.0015.

Note that we are dividing by two because (1-0.997) is the two-tailed extreme probability and we are only interested in the right-tailed case.
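The standardisation argument can be checked numerically. The sketch below is only an illustration: it uses `norm.sf` (the survival function, $1 - F_X$) on the standardised value instead of `1 - norm.cdf`, and the names `p_exact` and `p_bound` are introduced here for convenience.

```python
from scipy.stats import norm

mu, sigma = 1200.0, 15.0   # background mean and standard deviation (ADU)
x = 1248.0                 # observed pixel value (ADU)

z = (x - mu) / sigma             # standardised value
p_exact = norm.sf(z)             # right-tail probability P(Z >= z) for a standard normal
p_bound = (1.0 - 0.997) / 2.0    # Empirical Rule upper bound, valid because z > 3

print(f"z = {z:.2f}")
print(f"exact right tail     = {p_exact:.6g}")
print(f"Empirical Rule bound = {p_bound:.6g}")
```

The exact tail probability should come out below the Empirical Rule bound, and it should agree with the value obtained from `.cdf` with `loc` and `scale`.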

### Solution: the probability (about 0.07%) is far below the usual significance threshold of 5%, so this reading is unlikely to be due to noise alone.

## Problem 2: Central probability around a mean (use `.cdf`)

### A sub-mm heterodyne detector measures the centroids of spectral lines with resolution 1.8 km s$^{-1}$ (that is, the location of the centroid is distributed according to $\mathscr{N}(\mu, \sigma^2)$ with $\mu=0$, $\sigma=1.8$). What fraction of measured centroids falls within $\pm 2.5$ km s$^{-1}$ of the true value?

**Solution**: We require the central probability $P(|X| \leq 2.5|\mu=0, \sigma=1.8) = P(-2.5\leq X\leq 2.5|\mu=0, \sigma=1.8)$.

From the definition of the CDF, $P(-2.5\leq X\leq 2.5|\mu=0, \sigma=1.8) = F_X(2.5)-F_X(-2.5)$.

Once again, we can use the `.cdf` method with the `loc` and `scale` arguments.

In [None]:
mu, sigma = 0.0, 1.8
a, b = -2.5, 2.5
p_central = norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)
print(f"P({a} <= X <= {b}) = {p_central:.6f}")

P(-2.5 <= X <= 2.5) = 0.835133


As before, we could have obtained an approximate estimate by standardising:
$Z\equiv\displaystyle{X-\mu\over\sigma}=\displaystyle{2.5-0\over 1.8}\approx 1.39$.
From the Empirical Rule, we know that the 1-$\sigma$ central probability is 68% and the 2-$\sigma$ value is 95%, so the required probability should lie between these two values.
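As a quick numerical check of this bracketing argument, the sketch below evaluates the central probability of a standard normal at $z = 2.5/1.8$ and at 1 and 2 standard deviations. The variable names are chosen here for illustration only.

```python
from scipy.stats import norm

sigma = 1.8         # centroid standard deviation (km/s)
half_range = 2.5    # half-width of the interval of interest (km/s)

z = half_range / sigma                       # standardised half-width, about 1.39
p_z = norm.cdf(z) - norm.cdf(-z)             # central probability within +/- z
p_1sigma = norm.cdf(1.0) - norm.cdf(-1.0)    # Empirical Rule: about 68%
p_2sigma = norm.cdf(2.0) - norm.cdf(-2.0)    # Empirical Rule: about 95%

print(f"z = {z:.4f}")
print(f"1-sigma: {p_1sigma:.4f}  |  z-sigma: {p_z:.4f}  |  2-sigma: {p_2sigma:.4f}")
```

Because standardising is an exact change of variables, `p_z` should match the value computed with `loc` and `scale`.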

## Problem 3: False-alarm threshold for source extraction (use `.ppf`)

### An aperture has background counts distributed according to $\mathscr{N}(\mu=500,\ \sigma^2=22^2)$. What is the threshold $T$ such that the right-tail **false-alarm rate** is 0.5%?

**Solution**: the false-alarm rate is the probability that a background fluctuation is mistakenly identified as a true signal. A right-tail false-alarm rate of 0.5% means the threshold is such that $P(X\geq T|\mu, \sigma) = 0.005$. Equivalently, $F_X(T) = 1 - 0.005$, so $T = F_X^{-1}(0.995)$. To solve for $T$, we use the `.ppf` method (the inverse of the CDF), making sure to pass the `loc` and `scale` arguments.


In [None]:
mu, sigma = 500.0, 22.0
alpha_right = 0.005
T = norm.ppf(1 - alpha_right, loc=mu, scale=sigma)
print(f"Detection threshold T for 0.5% right-tail false alarm: {T:.6f}")

Detection threshold T for 0.5% right-tail false alarm: 556.668245


That is, background noise alone produces a reading above 556.7 counts only 0.5% of the time.
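One way to sanity-check the threshold is to feed it back into the right-tail probability and confirm that the false-alarm rate is recovered. The sketch below also uses `norm.isf` (the inverse survival function) as an alternative route to the same threshold. This is an illustrative cross-check, not part of the original solution.

```python
from scipy.stats import norm

mu, sigma = 500.0, 22.0    # background mean and standard deviation (counts)
alpha_right = 0.005        # target right-tail false-alarm rate

T_ppf = norm.ppf(1.0 - alpha_right, loc=mu, scale=sigma)   # inverse CDF at 0.995
T_isf = norm.isf(alpha_right, loc=mu, scale=sigma)         # inverse survival function at 0.005
tail = norm.sf(T_ppf, loc=mu, scale=sigma)                 # round trip: should recover alpha_right

print(f"T via ppf = {T_ppf:.6f}")
print(f"T via isf = {T_isf:.6f}")
print(f"P(X >= T) = {tail:.6f}")
```

Both routes should give the same threshold, since `isf(q)` is by definition the inverse of `sf`, which equals `1 - cdf`.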

## Problem 4: Comparing two independent measurements (difference of normals; use `.cdf`)

### The radial velocities of two sources are normally distributed with means of 30 and 26 km s$^{-1}$ and standard deviations of 2 and 3 km s$^{-1}$. Assuming that the measurements are independent, what is the probability that the radial velocity of the first source is greater than that of the second?

**Solution**: Let $X_1, X_2$ be the measured radial velocities of the sources. The problem requires the probability $P(X_1 > X_2) = P(X_1 - X_2 > 0)$.

We use the fact that the difference of two independent normally-distributed variables is also normal, that its mean is equal to the difference in means, and that the standard deviations add in quadrature: $X_1 - X_2 \sim \mathscr{N}(\mu_1-\mu_2,\ \sigma_1^2+\sigma_2^2)$.

Therefore, the required probability can be computed using the `.cdf` method with the appropriate `loc` and `scale` arguments.

In [None]:
mu1, s1 = 30.0, 2.0
mu2, s2 = 26.0, 3.0
muD = mu1 - mu2
sD = sqrt(s1**2 + s2**2)
p = 1.0 - norm.cdf(0.0, loc=muD, scale=sD)
print(f"mu_D = {muD:.6f}, sigma_D = {sD:.6f}")
print(f"P(X1 - X2 >= 0) = {p:.6f}")

mu_D = 4.000000, sigma_D = 3.605551
P(X1 - X2 >= 0) = 0.866371
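The analytic result can be cross-checked with a simple Monte Carlo simulation. The sketch below draws independent samples for the two sources and counts how often the first exceeds the second. The seed and the number of draws are arbitrary choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, fixed only for reproducibility
n_draws = 1_000_000               # arbitrary sample size

x1 = rng.normal(loc=30.0, scale=2.0, size=n_draws)   # radial velocity of source 1
x2 = rng.normal(loc=26.0, scale=3.0, size=n_draws)   # radial velocity of source 2
d = x1 - x2                                          # difference of independent draws

frac = np.mean(d > 0)   # empirical estimate of P(X1 > X2)

print(f"sample mean of D  = {d.mean():.3f}")
print(f"sample sigma of D = {d.std(ddof=1):.3f}")
print(f"fraction with X1 > X2 = {frac:.4f}")
```

The sample mean and standard deviation of the difference should be close to `mu_D` and `sigma_D` above, and the fraction should agree with the analytic probability to within the Monte Carlo noise (a few parts in $10^4$ for this sample size).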


## Problem 5: Determine n to achieve a target precision (use `.ppf`)

### The photon counts from a source in a single, short exposure are distributed according to $\mathscr{N}(1000, 80^2)$. How many exposures $n$ are required so that a 95% confidence interval on the mean source counts has a total width of 20 counts?

**Solution**:
When averaging over $n$ exposures, the standard deviation on this sample mean will be $\sigma/\sqrt{n}$.
How many such standard deviations does a 95% confidence interval span (that is, how many standard deviations equals the half-width of a 95% confidence interval)?

In units of the standard deviation, the half-width of a 95% CI is $z = F_Z^{-1}(1-0.05/2) = $ `norm.ppf`$(1-0.05/2) \approx 1.96$, where $Z$ is the standard normal variable.

Multiply this by the standard deviation of the sample mean to get the half-width of the required interval: $W/2 = 1.96\ \displaystyle{\sigma\over \sqrt{n}}$.

Solve this for $n$, rounding up to the next integer.


In [None]:
sigma = 80.0
target_halfwidth = 10.0   # W=20 => half-width = 10
alpha = 0.05
z = norm.ppf(1 - alpha/2)
n_req = int(ceil((z * sigma / target_halfwidth)**2))
print(f"z_0.975 = {z:.6f}")
print(f"Required n (ceil): {n_req}")

z_0.975 = 1.959964
Required n (ceil): 246


We could also have obtained an approximate estimate of $n$ using the Empirical Rule.

The standard deviation for a short exposure is 80 counts.

For the full $n$ exposures: according to the Empirical Rule, a 95% CI spans approximately $\pm 2$ standard deviations about the mean. This means that the half-width of 10 counts corresponds to 2 standard deviations of the sample mean. Therefore, 1 standard deviation of the sample mean is 5 counts.

We therefore have $5 = 80 / \sqrt{n} \Longrightarrow n = 16^2 = 256$, slightly larger than the exact value of 246 because the Empirical Rule rounds 1.96 up to 2.
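To confirm that rounding up is needed, the sketch below evaluates the full width of the 95% CI for a few values of $n$. The helper `ci_width` is a hypothetical function defined here for illustration only.

```python
from math import sqrt
from scipy.stats import norm

sigma = 80.0             # standard deviation of a single exposure (counts)
z = norm.ppf(0.975)      # about 1.96 for a two-sided 95% interval

def ci_width(n):
    """Full width of the 95% CI on the mean of n exposures (hypothetical helper)."""
    return 2.0 * z * sigma / sqrt(n)

# 245 is one below the exact answer, 246 is the exact answer,
# and 256 is the Empirical Rule estimate.
for n in (245, 246, 256):
    print(f"n = {n}: width = {ci_width(n):.3f} counts")
```

The width should exceed 20 counts for $n=245$ and drop just below 20 for $n=246$. The Empirical Rule estimate of 256 is slightly conservative.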