# Applied Statistics Assessment - problems.ipynb

This notebook contains solutions to all assessment problems. It is structured and written for an informed computing professional.

**How to run**: Run all cells top-to-bottom. Set the random seed in the setup cell to support reproducibility.

In [None]:
# === Setup & imports ===
import math # for mathematical functions
import itertools # for combinatorial functions
import numpy as np # for numerical operations
import pandas as pd # for data manipulation
import matplotlib.pyplot as plt # for plotting
from scipy import stats # for statistical functions
import statsmodels.api as sm # for statistical models (e.g., regression)
import statsmodels.formula.api as smf # for formula interface

# Prefer the new Generator API
SEED = 42 # for reproducibility
rng = np.random.default_rng(SEED)  

# Plot defaults (figure size, grid on, top/right spines hidden)
plt.rcParams.update({
    'figure.figsize': (8, 4),
    'axes.grid': True,
    'axes.spines.top': False,
    'axes.spines.right': False,
})

print(f"Environment initialised. numpy={np.__version__}, pandas={pd.__version__}")


Environment initialised. numpy=1.26.4, pandas=2.3.2


## Problem 1 — Extending the Lady Tasting Tea

### Plan

**Goal.** Under the null hypothesis (no skill), find the probability that a participant correctly identifies *all* cups by chance in:
- the classic **8-cup** design (4 tea-first, 4 milk-first), and
- the **12-cup** extension (8 tea-first, 4 milk-first).

**Approach.**
1) The participant must choose exactly the `n_milk` “milk-first” indices; the rest are tea-first.  
2) Under random guessing with the correct class sizes, every subset of size `n_milk` is equally likely, so “all correct” happens only if the chosen subset equals the true subset.  
3) The exact probability is therefore:  
   \[
   \Pr(\text{all correct}) = \frac{1}{\binom{n_\text{total}}{n_\text{milk}}}.
   \]
4) We also verify by simulation.  
5) Finally, we briefly discuss how the extended design affects evidence thresholds (p-values) and options for rejection regions.
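
The plan above can be sketched in a few lines. This is a minimal illustration, not necessarily the implementation used in the cells that follow: the function names `p_all_correct`, `simulate_all_correct`, and `p_at_least` are hypothetical, and the simulation assumes the participant picks a uniformly random subset of `n_milk` cups. For reference, $\binom{8}{4} = 70$ and $\binom{12}{4} = 495$.

```python
import math
import numpy as np


def p_all_correct(n_total: int, n_milk: int) -> float:
    """Exact chance that a random guess equals the true milk-first subset."""
    return 1 / math.comb(n_total, n_milk)


def p_at_least(n_total: int, n_milk: int, k: int) -> float:
    """Hypergeometric tail: chance of at least k correct milk-first picks."""
    denom = math.comb(n_total, n_milk)
    numer = sum(
        math.comb(n_milk, j) * math.comb(n_total - n_milk, n_milk - j)
        for j in range(k, n_milk + 1)
    )
    return numer / denom


def simulate_all_correct(n_total: int, n_milk: int,
                         n_trials: int = 200_000, seed: int = 42) -> float:
    """Monte Carlo estimate of the all-correct probability (illustrative)."""
    rng = np.random.default_rng(seed)
    # Without loss of generality, the true milk-first cups are 0..n_milk-1.
    # The indices of the n_milk smallest random keys in each row form a
    # uniformly random subset of size n_milk.
    keys = rng.random((n_trials, n_total))
    guesses = np.argpartition(keys, n_milk - 1, axis=1)[:, :n_milk]
    hits = (guesses < n_milk).all(axis=1)
    return float(hits.mean())


for n_total, n_milk in [(8, 4), (12, 4)]:
    exact = p_all_correct(n_total, n_milk)
    approx = simulate_all_correct(n_total, n_milk)
    one_error = p_at_least(n_total, n_milk, n_milk - 1)
    print(f"{n_total} cups: exact={exact:.5f}, simulated={approx:.5f}, "
          f"P(at most one milk cup missed)={one_error:.5f}")
```

The tail probability from `p_at_least` is what matters for step 5: it shows whether a rejection region that tolerates one misidentified milk-first cup would still fall below a chosen significance level in each design.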