<a href="https://colab.research.google.com/github/sundarjhu/Astrostatistics2025/blob/main/Lesson14.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Problem 3 (1-sample $t$ test)
### I have a dataset of 10 points with sample mean and standard deviation $-0.47$ and $0.94$ respectively. Are these data consistent with being drawn from a population with mean 1.0?

##### Our null hypothesis is $H_0: \mu_1 = \mu = 1$.<br>
The $t$ statistic for the problem is $t = \displaystyle{\overline{X_1}-\mu\over S/\sqrt{N}}$.<br>
Let us calculate the observed value:

In [3]:
import numpy as np
from scipy.stats import t

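N = 10                   # sample size
dof = N - 1              # degrees of freedom for the 1-sample t test
xbar, s = -0.47, 0.94    # sample mean and sample standard deviation
mu = 1.0                 # population mean under the null hypothesis

# Observed t statistic: (sample mean - hypothesised mean) / standard error of the mean
t_obs = (xbar - mu) / (s / np.sqrt(N))
print(f"The observed value of the t statistic is {np.round(t_obs, decimals=2)}.")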

The observed value of the t statistic is -4.95.


##### Why is the number of degrees of freedom for the $t$-test $N-1$?

##### The sample standard deviation uses the sample mean. We have thus placed one constraint on the data (the sample mean is defined such that the sum of deviations from this quantity is zero). Therefore, the number of independent data points is reduced by 1.
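
##### The constraint can be checked numerically. The sketch below uses synthetic data (an arbitrary seed and a standard normal sample, neither of which is part of the problem) to show that the deviations from the sample mean sum to zero, so the last deviation is fixed by the other $N-1$, and that the sample variance divides by $N-1$.

```python
import numpy as np

# Synthetic illustration only: seed and distribution are arbitrary assumptions.
rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=10)

# Deviations from the sample mean sum to zero (up to floating-point rounding).
dev = x - x.mean()
print(f"Sum of deviations: {dev.sum():.2e}")

# Consequently the last deviation is determined by the first N-1 deviations.
last_from_others = -dev[:-1].sum()
print(f"Last deviation recovered from the others: {np.isclose(last_from_others, dev[-1])}")

# The sample variance therefore divides by N-1, which is what ddof=1 does in NumPy.
s2_manual = (dev**2).sum() / (len(x) - 1)
print(f"Manual variance matches np.var(ddof=1): {np.isclose(s2_manual, x.var(ddof=1))}")
```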

##### The $p$-value is the probability that the $t$ statistic is more extreme than the value observed:<br>
$p = {\rm Prob}(|t| > |t_{\rm obs}|)$<br><br>
Using the fact that the $t$ distribution is symmetric,<br>
$p = 2\ {\rm Prob}(t < -|t_{\rm obs}|) = 2\times\texttt{scipy.stats.t.cdf}(-|t_{\rm obs}|, {\rm dof})$,<br>
with ${\rm dof}$ the number of degrees of freedom, which is $N-1=9$.

In [5]:
p_threshold = 0.05  # significance level
# Two-sided p-value: twice the lower-tail probability at -|t_obs|
p_obs = 2 * t.cdf(-np.abs(t_obs), dof)
print(f"The corresponding p-value is {np.round(p_obs, decimals=4)}.")
if p_obs < p_threshold:
  print("The observed p-value was lower than the threshold, so the null hypothesis is rejected!")
else:
  print("The observed p-value was not below the threshold. The null hypothesis CANNOT be rejected.")

The corresponding p-value is 0.0008.
The observed p-value was lower than the threshold, so the null hypothesis is rejected!
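
##### An equivalent way to reach the decision is to compare $|t_{\rm obs}|$ with the two-sided critical value of the $t$ distribution at the same threshold. The sketch below repeats the Problem 3 inputs so that it runs on its own; the names `alpha`, `t_crit` and `reject` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import t

N = 10
dof = N - 1
xbar, s, mu = -0.47, 0.94, 1.0
alpha = 0.05  # same value as p_threshold in the cell above

t_obs = (xbar - mu) / (s / np.sqrt(N))

# Two-sided critical value: the point that leaves alpha/2 probability in the upper tail.
t_crit = t.ppf(1 - alpha / 2, dof)

# Rejecting when |t_obs| exceeds the critical value is equivalent to p_obs < alpha.
reject = np.abs(t_obs) > t_crit
print(f"|t_obs| = {np.abs(t_obs):.2f}, critical value = {t_crit:.2f}, reject H0: {reject}")
```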


# Problem 4 (2-sample equal-sized $t$ test)
### Two 20-point datasets have sample means 82 and 96 with sample standard deviations 19.3 and 23. Are they drawn from distributions with the same mean?

##### Note that the version of the 2-sample $t$ test we are about to use is only valid if the **population standard deviations of the distributions from which the samples are drawn are equal**. This does not restrict the observed (i.e., sample) standard deviations to be identical.<br><br>
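
##### The equal-variance assumption is what allows the two sample variances to be combined into a single pooled estimate. The sketch below uses a hypothetical helper, `pooled_t`, which is not part of the original notebook. It evaluates the general pooled statistic for arbitrary sample sizes and checks that, for two samples of equal size $N$, it reduces to $\sqrt{N}\,(\overline{X_1}-\overline{X_2})/\sqrt{S_1^2+S_2^2}$.

```python
import numpy as np

def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Hypothetical helper: 2-sample t statistic with a pooled variance estimate."""
    # Pooled variance: the two sample variances weighted by (n - 1).
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means.
    se = np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return (xbar1 - xbar2) / se

# Summary statistics from the problem statement.
N = 20
xbar1, xbar2 = 82, 96
s1, s2 = 19.3, 23

t_general = pooled_t(xbar1, s1, N, xbar2, s2, N)
t_simplified = np.sqrt(N) * (xbar1 - xbar2) / np.sqrt(s1**2 + s2**2)
print(f"General and simplified forms agree: {np.isclose(t_general, t_simplified)}")
```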

##### Our null hypothesis is $H_0: \mu_1 = \mu_2 = \mu$ (say).<br>
The $t$ statistic for the problem is<br>
$t = \displaystyle{(\overline{X_1}-\mu_1)-(\overline{X_2}-\mu_2)\over\sqrt{S_1^2+S_2^2}/\sqrt{N}}=\sqrt{N} \displaystyle{\overline{X_1}-\overline{X_2}\over\sqrt{S_1^2+S_2^2}}$ under $H_0$.<br>
Let us calculate the observed value:

In [None]:
N = 20
xbar1, xbar2 = 82, 96
s1, s2 = 19.3, 23
p_threshold = 0.05
dof = 2 * N - 2  # pooled (equal-variance) 2-sample test: N1 + N2 - 2 degrees of freedom

t_obs = np.sqrt(N) * (xbar1 - xbar2) / np.sqrt(s1**2 + s2**2)
print(f"The observed value of the t statistic is {np.round(t_obs, decimals=2)}.")

##### The $p$-value is the probability that the $t$ statistic is more extreme than the value observed:<br>
$p = {\rm Prob}(|t| > |t_{\rm obs}|)$<br><br>
Using the fact that the $t$ distribution is symmetric,<br>
$p = 2\ {\rm Prob}(t < -|t_{\rm obs}|) = 2\times\texttt{scipy.stats.t.cdf}(-|t_{\rm obs}|, {\rm dof})$,<br>
with ${\rm dof}$ the number of degrees of freedom. Each sample contributes $N-1$ degrees of freedom to the pooled variance estimate, so ${\rm dof} = 2N-2 = 38$.

In [None]:
# Two-sided p-value: twice the lower-tail probability at -|t_obs|
p_obs = 2 * t.cdf(-np.abs(t_obs), dof)
print(f"The corresponding p-value is {np.round(p_obs, decimals=3)}.")
if p_obs < p_threshold:
  print("The observed p-value was lower than the threshold, so the null hypothesis is rejected!")
else:
  print("The observed p-value was not below the threshold. The null hypothesis CANNOT be rejected.")
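
##### The test can also be computed independently from the summary statistics with `scipy.stats.ttest_ind_from_stats`. In the sketch below, `equal_var=True` selects the pooled-variance version of the test, whose reference distribution has $N_1+N_2-2$ degrees of freedom. The standard deviations are assumed to be sample values (computed with `ddof=1`), which is what the function expects.

```python
import numpy as np
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the problem statement.
N = 20
xbar1, xbar2 = 82, 96
s1, s2 = 19.3, 23

# equal_var=True: pooled-variance (Student) 2-sample t test, two-sided by default.
res = ttest_ind_from_stats(mean1=xbar1, std1=s1, nobs1=N,
                           mean2=xbar2, std2=s2, nobs2=N,
                           equal_var=True)

# The statistic from the closed-form expression for equal sample sizes.
t_manual = np.sqrt(N) * (xbar1 - xbar2) / np.sqrt(s1**2 + s2**2)

print(f"scipy statistic = {res.statistic:.4f}, manual statistic = {t_manual:.4f}")
print(f"scipy two-sided p-value = {res.pvalue:.4f}")
```

##### Passing `equal_var=False` instead performs Welch's test, which does not assume equal population variances.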