# Chapter 1: Introduction to Statistical Inference

**Core Goal:** Use sample data to draw conclusions about populations.

In [None]:
import numpy as np
import scipy.stats as stats

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns; sns.set_theme()

## 1.1 Population vs Sample

**Population:** All items of interest. **Sample:** Subset we actually observe.

In [None]:
np.random.seed(42)
population = stats.norm(loc=100, scale=15)

In [None]:
sample = population.rvs(size=30)
print(f"Sample: {sample[:5]}...")  # First 5

**Key insight:** We never see the full population, only samples from it.

## 1.2 Parameters vs Statistics

**Parameter (θ):** Population characteristic (unknown). **Statistic:** Sample characteristic (computed).

In [None]:
true_mu = 100  # Population mean (parameter)
true_sigma = 15  # Population std (parameter)

In [None]:
sample_mean = np.mean(sample)  # Statistic
sample_std = np.std(sample, ddof=1)  # Statistic

In [None]:
print(f"True μ={true_mu}, Est μ̂={sample_mean:.2f}")
print(f"True σ={true_sigma}, Est σ̂={sample_std:.2f}")

**Notation:** θ (theta) = parameter, θ̂ (theta-hat) = estimator

## 1.3 Random Variables

**Random Variable (RV):** Numerical outcome of a random process.

In [None]:
# Discrete RV: Binomial (n=10 trials, p=0.3)
X_discrete = stats.binom(n=10, p=0.3)

In [None]:
# Continuous RV: Normal (μ=0, σ=1)
X_continuous = stats.norm(loc=0, scale=1)

**Discrete:** Countable outcomes (0,1,2,...). **Continuous:** Any value in an interval of real numbers.
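The practical difference shows up in how probabilities are computed. The following is a minimal sketch, not part of the original notebook; the names `X_d` and `X_c` are illustrative. A discrete RV assigns positive probability to single outcomes, while a continuous RV assigns probability to intervals through its CDF.

```python
import numpy as np
import scipy.stats as stats

X_d = stats.binom(n=10, p=0.3)     # discrete RV (illustrative name)
X_c = stats.norm(loc=0, scale=1)   # continuous RV (illustrative name)

# Discrete: a single outcome carries positive probability
p_point = X_d.pmf(3)

# Discrete: the PMF summed over all possible outcomes equals 1
p_total = X_d.pmf(np.arange(0, 11)).sum()

# Continuous: probability is assigned to intervals via the CDF
p_interval = X_c.cdf(1.0) - X_c.cdf(-1.0)

print(f"P(X_d = 3)      = {p_point:.4f}")
print(f"Sum of PMF      = {p_total:.4f}")
print(f"P(-1 < X_c < 1) = {p_interval:.4f}")
```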

## 1.4 Probability Distributions

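**PMF:** Gives P(X = k) for each outcome of a discrete RV. **PDF:** Gives the density of a continuous RV at x; probabilities are areas under the curve, not values at single points.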

In [None]:
# PMF for discrete: P(X=k)
k_values = np.arange(0, 11)

In [None]:
pmf_values = X_discrete.pmf(k_values)
plt.bar(k_values, pmf_values); plt.title('Binomial PMF')

In [None]:
# PDF for continuous: density at x
x_values = np.linspace(-4, 4, 100)

In [None]:
pdf_values = X_continuous.pdf(x_values)
plt.plot(x_values, pdf_values); plt.title('Normal PDF')

**CDF:** Cumulative probability F(x) = P(X ≤ x)

In [None]:
cdf_values = X_continuous.cdf(x_values)
plt.plot(x_values, cdf_values); plt.title('Normal CDF')
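The CDF can also be inverted to find quantiles. As a small sketch (the name `Z_demo` is an assumption made for this example), `ppf` in `scipy.stats` is the inverse of `cdf`, so applying one after the other should recover the starting value.

```python
import scipy.stats as stats

Z_demo = stats.norm(loc=0, scale=1)  # illustrative standard normal

# Interval probability from the CDF: P(a < X <= b) = F(b) - F(a)
p_between = Z_demo.cdf(1.96) - Z_demo.cdf(-1.96)

# ppf is the inverse CDF (quantile function)
q_975 = Z_demo.ppf(0.975)

# Round trip: ppf(cdf(x)) should return x
x_back = Z_demo.ppf(Z_demo.cdf(1.0))

print(f"P(-1.96 < Z <= 1.96) = {p_between:.4f}")
print(f"97.5% quantile       = {q_975:.4f}")
print(f"ppf(cdf(1.0))        = {x_back:.4f}")
```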

## 1.5 Expected Value & Variance

**E[X]:** Center of distribution (mean). **Var(X):** Spread around mean.

In [None]:
mean_binom = X_discrete.mean()
var_binom = X_discrete.var()

In [None]:
print(f"E[X]={mean_binom:.2f}, Var(X)={var_binom:.2f}")
print(f"Formula: E[X]=np={10*0.3:.2f}, Var=np(1-p)={10*0.3*0.7:.2f}")

**Standard Deviation:** σ = √Var(X), same units as X.

In [None]:
std_binom = X_discrete.std()
print(f"σ = {std_binom:.2f}")
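The same quantities can be recovered directly from the definitions E[X] = Σ k·P(X=k) and Var(X) = Σ (k − E[X])²·P(X=k). This is a sketch under the same binomial setup; the variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

X_demo = stats.binom(n=10, p=0.3)
k = np.arange(0, 11)          # all possible outcomes
pmf = X_demo.pmf(k)

# Expected value: probability-weighted average of the outcomes
mean_def = np.sum(k * pmf)

# Variance: probability-weighted average squared deviation from the mean
var_def = np.sum((k - mean_def) ** 2 * pmf)

# Standard deviation: square root of the variance
std_def = np.sqrt(var_def)

print(f"E[X]={mean_def:.2f}, Var(X)={var_def:.2f}, σ={std_def:.2f}")
```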

## 1.6 Common Distributions

### Normal Distribution N(μ, σ²)

In [None]:
# Most important continuous distribution
normal = stats.norm(loc=100, scale=15)

In [None]:
# 68-95-99.7 rule
print(f"P(μ-σ < X < μ+σ) = {normal.cdf(115)-normal.cdf(85):.3f}")
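The cell above checks only the one-sigma band. A short sketch covering all three bands of the rule, with `normal_demo` and `coverage` as illustrative names:

```python
import scipy.stats as stats

mu, sigma = 100, 15
normal_demo = stats.norm(loc=mu, scale=sigma)

# Probability mass within k standard deviations of the mean, k = 1, 2, 3
coverage = {}
for k in (1, 2, 3):
    coverage[k] = normal_demo.cdf(mu + k * sigma) - normal_demo.cdf(mu - k * sigma)
    print(f"P(μ-{k}σ < X < μ+{k}σ) = {coverage[k]:.4f}")
```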

### Standard Normal Z ~ N(0,1)

In [None]:
Z = stats.norm(0, 1)
print(f"P(Z < 1.96) = {Z.cdf(1.96):.4f}")

**Standardization:** Z = (X - μ) / σ transforms any normal to standard normal.

In [None]:
x = 130  # Some value from N(100,15²)
z = (x - 100) / 15; print(f"Z-score: {z:.2f}")
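Standardization leaves probabilities unchanged: P(X < x) under N(μ, σ²) equals P(Z < z) under N(0,1). A minimal check, using illustrative variable names:

```python
import scipy.stats as stats

x_val, mu, sigma = 130, 100, 15
z_val = (x_val - mu) / sigma

# Same probability computed on the original and on the standardized scale
p_original = stats.norm(loc=mu, scale=sigma).cdf(x_val)
p_standard = stats.norm(loc=0, scale=1).cdf(z_val)

print(f"z = {z_val:.2f}")
print(f"P(X < {x_val}) = {p_original:.4f}")
print(f"P(Z < {z_val:.2f}) = {p_standard:.4f}")
```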

### t-distribution (Student's t)

In [None]:
# Used when σ is unknown and estimated by s; matters most when n is small
t_dist = stats.t(df=10)  # df = degrees of freedom

In [None]:
x = np.linspace(-4, 4, 100)
# Draw both curves in one cell so they share a single figure
plt.plot(x, Z.pdf(x), label='Normal')
plt.plot(x, t_dist.pdf(x), label='t(df=10)')
plt.legend(); plt.title('t vs Normal: heavier tails')

**Key:** As df → ∞, t → N(0,1)
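One way to see this convergence numerically is to compare two-sided 95% critical values. The sketch below uses an arbitrary, illustrative set of df values.

```python
import scipy.stats as stats

# Two-sided 95% critical value for the standard normal
z_crit = stats.norm.ppf(0.975)

# Matching t critical values for increasing degrees of freedom
t_crits = {df: stats.t.ppf(0.975, df) for df in (2, 5, 10, 30, 100, 1000)}

for df, t_crit in t_crits.items():
    print(f"df={df:>4}: t crit = {t_crit:.4f}")
print(f"normal : z crit = {z_crit:.4f}")
```

The heavier tails of the t-distribution show up as larger critical values, and the gap closes as df grows.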

### Chi-squared χ²

In [None]:
# Distribution of the sum of df independent squared standard normals
chi2_dist = stats.chi2(df=5)

In [None]:
x = np.linspace(0, 20, 100)
plt.plot(x, chi2_dist.pdf(x)); plt.title('χ²(df=5)')

**Use:** Variance estimation, goodness-of-fit tests.
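The definition can be checked by simulation. This sketch builds sums of squared standard normals and compares their moments with the known χ² values (mean df, variance 2·df); the seed and simulation size are arbitrary choices.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
df = 5

# Each row holds df independent standard normals; sum their squares
z = rng.standard_normal(size=(100_000, df))
q = (z ** 2).sum(axis=1)

chi2_demo = stats.chi2(df=df)
print(f"Mean: simulated {q.mean():.2f}, theory {chi2_demo.mean():.2f}")
print(f"Var:  simulated {q.var():.2f}, theory {chi2_demo.var():.2f}")
```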

### F-distribution

In [None]:
# Ratio of two independent chi-squared variables, each divided by its df
f_dist = stats.f(dfn=5, dfd=10)

In [None]:
x = np.linspace(0, 5, 100)
plt.plot(x, f_dist.pdf(x)); plt.title('F(5,10)')

**Use:** Comparing variances, ANOVA.
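As a sketch of where the F-distribution comes from, the simulation below forms (U/dfn)/(V/dfd) from independent chi-squared draws U and V and compares the result with `stats.f`. The seed, sample size, and variable names are assumptions made for illustration.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)  # arbitrary seed
dfn, dfd = 5, 10
size = 200_000

# Independent chi-squared draws for numerator and denominator
u = rng.chisquare(dfn, size=size)
v = rng.chisquare(dfd, size=size)

# Each chi-squared variable is scaled by its own degrees of freedom
f_sim = (u / dfn) / (v / dfd)

f_demo = stats.f(dfn=dfn, dfd=dfd)
p_sim = np.mean(f_sim <= 1.0)
print(f"Mean:     simulated {f_sim.mean():.3f}, theory {f_demo.mean():.3f}")
print(f"P(F <= 1): simulated {p_sim:.3f}, theory {f_demo.cdf(1.0):.3f}")
```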

## 1.7 Sampling Distributions

**Sampling distribution:** Distribution of a statistic across all possible samples.

In [None]:
# Simulate: Draw 1000 samples, compute mean of each
sample_means = [population.rvs(30).mean() for _ in range(1000)]

In [None]:
plt.hist(sample_means, bins=30, density=True, alpha=0.7)
plt.title('Sampling Distribution of X̄')

**Key result:** X̄ ~ N(μ, σ²/n) when sampling from N(μ,σ²)

In [None]:
# Theoretical vs empirical
print(f"E[X̄] theory: {true_mu}, empirical: {np.mean(sample_means):.2f}")

In [None]:
se_theory = true_sigma / np.sqrt(30)  # Standard error
print(f"SE theory: {se_theory:.2f}, empirical: {np.std(sample_means):.2f}")

## 1.8 Central Limit Theorem (CLT)

**CLT:** For large n, X̄ is *approximately* N(μ, σ²/n) *regardless* of the population's shape, provided the population has finite variance σ².

In [None]:
# Non-normal population: Exponential
pop_exp = stats.expon(scale=2)

In [None]:
# Sample means from exponential population
exp_means = [pop_exp.rvs(30).mean() for _ in range(1000)]

In [None]:
plt.hist(exp_means, bins=30, density=True)
plt.title('X̄ from Exponential: Approximately Normal')

**Implication:** Normal-theory methods work approximately for many non-normal populations once n is large enough.
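How large n must be depends on how skewed the population is. A rough sketch: for an exponential population the skewness of X̄ is 2/√n, so it should shrink toward 0 (the normal value) as n grows. The seed, sample sizes, and number of replications are illustrative choices.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(2)  # arbitrary seed
reps = 20_000

# Skewness of the sample mean for increasing sample sizes
skews = {}
for n in (2, 10, 30, 200):
    means_n = rng.exponential(scale=2, size=(reps, n)).mean(axis=1)
    skews[n] = stats.skew(means_n)
    print(f"n={n:>3}: skewness of X̄ = {skews[n]:.3f} (theory {2/np.sqrt(n):.3f})")
```

At n=30 the sample means are still noticeably right-skewed, which is why the normal approximation should be treated as approximate rather than exact.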

## 1.9 Three Types of Inference

### 1.9.1 Point Estimation

In [None]:
# Single "best guess" for parameter
data = stats.norm(100, 15).rvs(50)

In [None]:
mu_hat = np.mean(data)
print(f"Point estimate μ̂ = {mu_hat:.2f}")

### 1.9.2 Interval Estimation (Confidence Intervals)

In [None]:
# Range of plausible values
se = stats.sem(data)  # Standard error

In [None]:
ci = stats.t.interval(0.95, len(data)-1, loc=mu_hat, scale=se)
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")

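**Interpretation:** If the sampling were repeated many times, about 95% of the intervals built this way would contain the true μ. Any single interval either contains μ or it does not.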
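The repeated-sampling interpretation can be checked by simulation. This sketch draws many samples from a known N(100, 15²) population, builds a t-interval from each, and counts how often the interval covers μ. The seed and the number of replications are arbitrary.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(3)  # arbitrary seed
mu, sigma, n, reps = 100, 15, 50, 5000

# Critical value for a two-sided 95% t-interval
t_crit = stats.t.ppf(0.975, df=n - 1)

# One row per simulated sample
samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)

# An interval "covers" when the true mean lies between its endpoints
covered = (means - t_crit * ses <= mu) & (mu <= means + t_crit * ses)
coverage = covered.mean()
print(f"Empirical coverage: {coverage:.3f}")
```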

### 1.9.3 Hypothesis Testing

In [None]:
# Test H₀: μ = 105 vs H₁: μ ≠ 105
result = stats.ttest_1samp(data, 105)

In [None]:
print(f"t-statistic: {result.statistic:.3f}")
print(f"p-value: {result.pvalue:.3f}")

**Decision rule:** Reject H₀ if p-value < α (typically 0.05).

In [None]:
decision = "Reject H₀" if result.pvalue < 0.05 else "Fail to reject H₀"
print(f"Decision: {decision}")
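For reference, the one-sample t-statistic is t = (x̄ − μ₀) / (s/√n), and the two-sided p-value is the probability of a value at least as extreme under t(n−1). The sketch below computes both by hand on freshly simulated data (`data_demo`, an illustrative name) and compares them with `stats.ttest_1samp`.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(4)  # arbitrary seed
data_demo = rng.normal(100, 15, size=50)
mu0 = 105  # hypothesized mean under H0
n = len(data_demo)

# t-statistic: distance of the sample mean from mu0 in standard-error units
se_demo = data_demo.std(ddof=1) / np.sqrt(n)
t_manual = (data_demo.mean() - mu0) / se_demo

# Two-sided p-value from the t(n-1) survival function
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result_demo = stats.ttest_1samp(data_demo, mu0)
print(f"manual: t={t_manual:.3f}, p={p_manual:.4f}")
print(f"scipy:  t={result_demo.statistic:.3f}, p={result_demo.pvalue:.4f}")
```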

## 1.10 Standard Error vs Standard Deviation

**SD (σ):** Variability of *individual observations*. **SE:** Variability of *sample statistic*.

In [None]:
sd = np.std(data, ddof=1)  # Sample SD
print(f"SD of observations: {sd:.2f}")

In [None]:
se = sd / np.sqrt(len(data))  # SE of mean
print(f"SE of mean: {se:.2f}")

**Formula:** SE(X̄) = σ/√n, estimated by s/√n when σ is unknown (as in the cell above). Larger n → smaller SE → more precise estimate.
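Because n sits under a square root, precision improves slowly: quadrupling the sample size only halves the SE. A small illustration, assuming σ = 15 is known:

```python
import numpy as np

sigma = 15  # population SD, assumed known for this illustration

# SE of the mean for sample sizes that each quadruple the previous one
se_by_n = {n: sigma / np.sqrt(n) for n in (25, 100, 400)}

for n, se_n in se_by_n.items():
    print(f"n={n:>3}: SE = {se_n:.2f}")
```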

## 1.11 Bias and Variance of Estimators

**Bias:** E[θ̂] - θ. **Unbiased:** E[θ̂] = θ.

In [None]:
# Sample mean is unbiased for μ
estimates = [population.rvs(30).mean() for _ in range(5000)]

In [None]:
bias = np.mean(estimates) - true_mu
print(f"Bias of X̄: {bias:.4f} (≈ 0)")

**Variance of estimator:** Var(θ̂). Lower is better (more precise).

In [None]:
var_estimator = np.var(estimates)
print(f"Var(X̄) empirical: {var_estimator:.2f}")

In [None]:
var_theory = true_sigma**2 / 30
print(f"Var(X̄) theory: {var_theory:.2f}")
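For contrast, here is a sketch of a biased estimator. The sample variance with divisor n (`ddof=0`) has expectation (n−1)/n · σ², so it underestimates σ² on average, while the divisor n−1 (`ddof=1`) is unbiased. The seed and simulation size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed
mu, sigma, n, reps = 100, 15, 10, 50_000
sigma2 = sigma ** 2

samples = rng.normal(mu, sigma, size=(reps, n))

# Average of each estimator across many samples approximates its expectation
var_div_n = samples.var(axis=1, ddof=0).mean()    # divisor n (biased)
var_div_n1 = samples.var(axis=1, ddof=1).mean()   # divisor n-1 (unbiased)

print(f"True σ²: {sigma2}")
print(f"ddof=0: mean estimate {var_div_n:.1f}, bias {var_div_n - sigma2:.1f}")
print(f"ddof=1: mean estimate {var_div_n1:.1f}, bias {var_div_n1 - sigma2:.1f}")
```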

## 1.12 Mean Squared Error (MSE)

**MSE:** E[(θ̂ - θ)²] = Bias² + Variance. Measures the overall accuracy of an estimator.

In [None]:
mse = np.mean((np.array(estimates) - true_mu)**2)
print(f"MSE(X̄) = {mse:.2f}")

In [None]:
# Decomposition: MSE = Bias² + Variance (X̄ is unbiased, so MSE ≈ Variance)
print(f"MSE = Bias² + Var: {np.isclose(mse, bias**2 + var_estimator)}")
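The decomposition follows from adding and subtracting E[θ̂] inside the square. A short derivation, in the same notation as above:

$$
\begin{aligned}
\mathrm{MSE}(\hat\theta)
  &= E\big[(\hat\theta - \theta)^2\big] \\
  &= E\big[(\hat\theta - E[\hat\theta] + E[\hat\theta] - \theta)^2\big] \\
  &= E\big[(\hat\theta - E[\hat\theta])^2\big]
     + 2\,(E[\hat\theta] - \theta)\,E\big[\hat\theta - E[\hat\theta]\big]
     + (E[\hat\theta] - \theta)^2 \\
  &= \mathrm{Var}(\hat\theta) + \mathrm{Bias}(\hat\theta)^2
\end{aligned}
$$

The cross term vanishes because E[θ̂ − E[θ̂]] = 0, and E[θ̂] − θ is a constant, so it passes through the expectation.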

## 1.13 Law of Large Numbers (LLN)

**LLN:** As n → ∞, X̄ → μ (converges in probability).

In [None]:
# Demonstrate convergence
sample_sizes = [10, 50, 100, 500, 1000, 5000]

In [None]:
means = [population.rvs(n).mean() for n in sample_sizes]
# Plot the means and the reference line in one cell so they share a figure
plt.plot(sample_sizes, means, 'o-')
plt.axhline(true_mu, color='r', linestyle='--', label='True μ')
plt.xlabel('Sample size'); plt.ylabel('X̄'); plt.legend()
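An alternative view of the LLN follows the running mean of a single growing sample instead of drawing a fresh sample for each n. This is a sketch with an arbitrary seed and illustrative names.

```python
import numpy as np

rng = np.random.default_rng(6)  # arbitrary seed
mu, sigma = 100, 15

# One long sequence of draws; running_mean[i] is the mean of the first i+1 draws
draws = rng.normal(mu, sigma, size=100_000)
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n={n:>6}: X̄ = {running_mean[n - 1]:.3f}")
```

The printed values can be passed to `plt.plot` in the same way as the cell above if a figure is preferred.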

## Summary: The Inference Framework

1. **Population** with unknown parameter θ
2. Draw **random sample** X₁,...,Xₙ
3. Compute **statistic** θ̂ (estimator)
4. Use **sampling distribution** of θ̂ for inference
5. Three types: **point estimate**, **interval**, **hypothesis test**

## Key Takeaways

- **Randomness** is fundamental: samples vary, so statistics vary
- **Sampling distributions** quantify this variability
- **CLT** makes normal theory widely applicable
- **Good estimators**: low bias, low variance (low MSE)
- **SE quantifies precision** and shrinks in proportion to 1/√n