# Chapter 5: Confidence Intervals

**Core Goal:** Quantify uncertainty in parameter estimates through interval estimation.

**Motivation:** A point estimate alone tells an incomplete story. If we estimate μ̂ = 52.3, is the true value likely between 52.0 and 52.6, or between 40 and 65? Confidence intervals provide a range of plausible parameter values at a specified confidence level. They quantify estimation uncertainty systematically and enable more informed decision-making than point estimates alone. The confidence level (typically 95%) refers to long-run coverage: if we repeated the sampling process many times, approximately 95% of constructed intervals would contain the true parameter.

In [None]:
import numpy as np
import scipy.stats as stats

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns; sns.set_theme()

## 5.1 The Concept of Confidence Intervals

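**Confidence Interval:** A random interval $[L(X), U(X)]$ computed from sample data $X$ such that $P_\theta(\theta \in [L(X), U(X)]) = 1 - \alpha$ for all $\theta$, where the probability is taken over repeated samples $X$ with $\theta$ held fixed.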

**Confidence Level:** $1 - \alpha$, typically 0.95 (95%)

**Coverage Probability:** The long-run proportion of confidence intervals that contain the true parameter.

**Motivation:** Unlike point estimates that give a single number, confidence intervals acknowledge uncertainty by providing a range. The confidence level quantifies our long-run reliability: 95% confidence means that if we constructed intervals from infinitely many samples, 95% would contain the true parameter. This interpretation is subtle but crucial—we don't say the probability the true parameter is in a particular interval is 95%, but rather that our procedure produces intervals that cover the truth 95% of the time.

### Correct Interpretation

**Correct:** "If we repeated this sampling procedure many times and constructed confidence intervals each time, approximately 95% of those intervals would contain the true parameter value."

**WRONG:** "There is a 95% probability that the true parameter is in this particular interval." (The parameter is fixed, not random; the interval is random.)

**Mnemonic:** Confidence is about the **procedure**, not the **parameter**.

## 5.2 Confidence Interval for Normal Mean (Known Variance)

**Setup:** $X_1, ..., X_n \sim N(\mu, \sigma^2)$ with $\sigma^2$ known

**Sampling distribution:** $\bar{X} \sim N(\mu, \sigma^2/n)$

**Standardization:** $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$

**Confidence Interval:** $\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

where $z_{\alpha/2}$ is the $(1 - \alpha/2)$ quantile of $N(0,1)$.

**For 95% confidence:** $z_{0.025} = 1.96$

**Motivation:** When population variance is known, we can use the normal distribution directly to construct intervals. The margin of error $z_{\alpha/2}\sigma/\sqrt{n}$ captures our uncertainty, shrinking with larger sample size and smaller variance.
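
**Derivation sketch:** The interval follows from rearranging the probability statement for the standardized statistic $Z$. Using only the quantities defined above:

$$
\begin{aligned}
1 - \alpha &= P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) \\
&= P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)
\end{aligned}
$$

The probability is over the random $\bar{X}$; $\mu$ stays fixed throughout, which is why the endpoints are random and the parameter is not.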

In [None]:
np.random.seed(42)
true_mu, true_sigma = 100, 15

In [None]:
data = stats.norm(true_mu, true_sigma).rvs(50)
xbar = np.mean(data); n = len(data)

In [None]:
# z_{α/2} = 1.96: Critical value for 95% confidence (standard normal)
z_critical = stats.norm.ppf(0.975)

In [None]:
# Margin of error = z_{α/2} × σ/√n: Half-width of confidence interval
margin_of_error = z_critical * true_sigma / np.sqrt(n)

In [None]:
# CI = X̄ ± z_{α/2}σ/√n: Confidence interval for mean with known variance
ci_lower = xbar - margin_of_error; ci_upper = xbar + margin_of_error

In [None]:
print(f"Sample mean: X̄ = {xbar:.2f}")
print(f"95% Confidence Interval: [{ci_lower:.2f}, {ci_upper:.2f}]")
print(f"True μ = {true_mu} (contained: {ci_lower <= true_mu <= ci_upper})")

### Demonstrating Coverage Probability

**Simulation:** Construct many confidence intervals and check what fraction contain the true parameter.

In [None]:
# Generate 100 samples and their 95% confidence intervals
num_intervals = 100; np.random.seed(42)

In [None]:
intervals = []
for _ in range(num_intervals):
    sample = stats.norm(true_mu, true_sigma).rvs(50)
    xb = np.mean(sample); me = z_critical * true_sigma / np.sqrt(50)
    intervals.append((xb - me, xb + me))

In [None]:
# Count intervals that contain true parameter
coverage = sum(1 for (lower, upper) in intervals if lower <= true_mu <= upper)

In [None]:
print(f"Coverage: {coverage}/{num_intervals} = {coverage/num_intervals:.2%}")
print("Expected: 95%")

**Visualization:** Plot confidence intervals, highlighting those that miss the true parameter.

In [None]:
# Plot first 50 confidence intervals
fig, ax = plt.subplots(figsize=(10, 8))
for i, (lower, upper) in enumerate(intervals[:50]):
    color = 'blue' if lower <= true_mu <= upper else 'red'
    ax.plot([lower, upper], [i, i], color=color, linewidth=1.5)
    ax.plot((lower + upper)/2, i, 'o', color=color, markersize=3)
# Same cell as the loop above, so the reference line and labels appear on the rendered figure
ax.axvline(true_mu, color='green', linestyle='--', linewidth=2, label='True μ')
ax.set_xlabel('Parameter Value'); ax.set_ylabel('Sample Number')
ax.set_title('95% Confidence Intervals (Blue = Contains True μ, Red = Misses)'); ax.legend()

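**Observation:** In the long run about 95% of intervals contain the true parameter (blue) and about 5% miss it (red). With only 50 intervals plotted, the observed number of misses will vary from run to run.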
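
**Illustrative sketch:** With only 100 intervals, the observed coverage has a standard error of roughly 2 percentage points, so results such as 92% or 98% are unsurprising. The block below is a minimal vectorized version of the same simulation with more replications. The replication count `num_reps`, the seed, and the use of `np.random.default_rng` are assumptions made for this illustration, not part of the cells above.

```python
import numpy as np
from scipy import stats

# Assumed values, matching the running example: mu = 100, sigma = 15, n = 50
mu, sigma, n_obs = 100, 15, 50
num_reps = 20_000  # hypothetical replication count, larger than the 100 used above

rng = np.random.default_rng(0)
samples = rng.normal(mu, sigma, size=(num_reps, n_obs))  # one row per simulated study
xbars = samples.mean(axis=1)

z_crit = stats.norm.ppf(0.975)
half_width = z_crit * sigma / np.sqrt(n_obs)  # identical for every row because sigma is known

# An interval covers mu exactly when the sample mean is within one half-width of mu
covered = np.abs(xbars - mu) <= half_width
coverage_hat = covered.mean()

# Monte Carlo standard error of the estimated coverage (binomial proportion)
mc_se = np.sqrt(coverage_hat * (1 - coverage_hat) / num_reps)
print(f"Estimated coverage: {coverage_hat:.4f} (Monte Carlo SE {mc_se:.4f})")
```

The Monte Carlo standard error indicates how far the estimated coverage can plausibly sit from the nominal 95% purely by simulation noise.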

## 5.3 Confidence Interval for Normal Mean (Unknown Variance)

**Setup:** $X_1, ..., X_n \sim N(\mu, \sigma^2)$ with $\sigma^2$ unknown

**Problem:** Cannot use $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ because $\sigma$ is unknown.

**Solution:** Use sample standard deviation $S$ and t-distribution.

**t-statistic:** $T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$

**Confidence Interval:** $\bar{X} \pm t_{\alpha/2, n-1} \frac{S}{\sqrt{n}}$

**Motivation:** When $\sigma$ is unknown (the typical case), we estimate it with sample standard deviation $S$. This introduces additional uncertainty, accounted for by using the t-distribution instead of normal. The t-distribution has heavier tails, producing wider intervals that maintain correct coverage despite estimating $\sigma$. As $n$ increases, $t_{n-1}$ approaches $N(0,1)$.

In [None]:
# Estimate both mean and variance from data
sample_mean = np.mean(data); sample_std = np.std(data, ddof=1)

In [None]:
# t_{α/2, n-1}: Critical value from t-distribution with n-1 degrees of freedom
t_critical = stats.t.ppf(0.975, df=n-1)

In [None]:
# SE(X̄) = S/√n: Standard error using sample standard deviation
standard_error = sample_std / np.sqrt(n)

In [None]:
# CI = X̄ ± t_{α/2,n-1} × S/√n: t-based confidence interval
ci_t_lower = sample_mean - t_critical * standard_error
ci_t_upper = sample_mean + t_critical * standard_error

In [None]:
print(f"95% t-Confidence Interval: [{ci_t_lower:.2f}, {ci_t_upper:.2f}]")
print(f"t-critical value: {t_critical:.3f} (compare to z = 1.96)")

**Note:** The t-critical value is slightly larger than 1.96, which widens the interval to account for the uncertainty in estimating $\sigma$.
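
**Illustrative sketch:** The gap between the t and z critical values shrinks as the degrees of freedom grow. The block below tabulates this for a few degrees of freedom; the particular values in `dfs` are chosen only for illustration.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)
dfs = [2, 5, 10, 30, 100, 1000]  # hypothetical degrees of freedom for illustration
t_crits = [stats.t.ppf(0.975, df=df) for df in dfs]

for df, t_crit in zip(dfs, t_crits):
    # The ratio t/z is the factor by which the t-interval is wider than the z-interval
    print(f"df = {df:4d}: t = {t_crit:.3f}, ratio t/z = {t_crit / z_crit:.3f}")
```

For small samples the widening is substantial; for large samples the two critical values are nearly the same.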

### Using scipy.stats for Confidence Intervals

**Scipy provides convenient functions for common confidence intervals.**

In [None]:
# stats.t.interval: Built-in function for t-confidence interval
ci_scipy = stats.t.interval(0.95, df=n-1, loc=sample_mean, scale=standard_error)

In [None]:
print(f"Scipy 95% Confidence Interval: [{ci_scipy[0]:.2f}, {ci_scipy[1]:.2f}]")
print(f"Matches manual calculation: {np.allclose(ci_scipy, (ci_t_lower, ci_t_upper))}")

## 5.4 Factors Affecting Confidence Interval Width

**Width = $2 \times t_{\alpha/2, n-1} \times \frac{S}{\sqrt{n}}$**

**Factors:**
1. **Confidence level $(1-\alpha)$:** Higher confidence → wider interval
2. **Sample size $n$:** Larger $n$ → narrower interval (by factor $1/\sqrt{n}$)
3. **Variability:** Larger sample standard deviation $S$ (reflecting larger population variance $\sigma^2$) → wider interval

**Motivation:** Understanding these factors helps plan studies and interpret results. Halving the interval width requires roughly four times the sample size. Higher confidence requires wider intervals; there is no free lunch.

### Effect of Confidence Level

In [None]:
confidence_levels = [0.90, 0.95, 0.99]
for conf in confidence_levels:
    ci = stats.t.interval(conf, df=n-1, loc=sample_mean, scale=standard_error)
    width = ci[1] - ci[0]
    print(f"{int(conf*100)}% Confidence Interval: [{ci[0]:.2f}, {ci[1]:.2f}], Width: {width:.2f}")

**Result:** Higher confidence level produces wider interval.

### Effect of Sample Size

In [None]:
sample_sizes = [10, 30, 100, 300]
for ns in sample_sizes:
    samp = stats.norm(true_mu, true_sigma).rvs(ns)
    ci = stats.t.interval(0.95, df=ns-1, loc=np.mean(samp), scale=stats.sem(samp))
    width = ci[1] - ci[0]
    print(f"n = {ns:3d}: Width = {width:.2f}")

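**Result:** Interval width shrinks roughly in proportion to $1/\sqrt{n}$. The match is not exact because each sample has its own $S$ and the t-critical value also changes with $n$.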
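
**Illustrative sketch:** To see the $1/\sqrt{n}$ scaling without sampling noise, the block below uses the known-variance width $2 z_{\alpha/2}\sigma/\sqrt{n}$ from Section 5.2. The helper `z_width` and the sample sizes are hypothetical names and values introduced for this check.

```python
import numpy as np
from scipy import stats

sigma = 15  # assumed known here, so the width is not subject to sampling noise
z_crit = stats.norm.ppf(0.975)

def z_width(sample_size):
    """Full width of the known-variance 95% interval for a given sample size."""
    return 2 * z_crit * sigma / np.sqrt(sample_size)

for n_small in [25, 100, 400]:
    # Quadrupling the sample size should halve the width
    ratio = z_width(n_small) / z_width(4 * n_small)
    print(f"n = {n_small:3d} -> 4n = {4 * n_small:4d}: width ratio = {ratio:.3f}")
```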

## 5.5 Large-Sample Confidence Intervals

**Central Limit Theorem Application:** For large $n$, $\bar{X}$ is approximately normal even if the population is not normal (provided its variance is finite).

**Large-sample confidence interval:** $\bar{X} \pm z_{\alpha/2} \frac{S}{\sqrt{n}}$

**When to use:** $n \geq 30$ as a rule of thumb (smaller $n$ may suffice if the population is roughly symmetric; strongly skewed populations may need more)

**Motivation:** The Central Limit Theorem allows normal-based inference for non-normal populations when sample size is large. This is powerful because we need not know or verify the exact population distribution. Additionally, for large $n$, the t-distribution is nearly indistinguishable from normal, so using $z$ instead of $t$ makes little practical difference.

In [None]:
# Non-normal population: Exponential distribution (highly skewed)
exponential_data = stats.expon(scale=2).rvs(100)

In [None]:
# Central Limit Theorem: Sample mean approximately normal despite skewed population
exp_mean = np.mean(exponential_data); exp_se = stats.sem(exponential_data)

In [None]:
# Large-sample confidence interval using z-critical value
ci_exp = (exp_mean - 1.96*exp_se, exp_mean + 1.96*exp_se)

In [None]:
print(f"Large-sample 95% Confidence Interval: [{ci_exp[0]:.3f}, {ci_exp[1]:.3f}]")
print(f"True mean = 2.0")
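
**Illustrative sketch:** A single interval cannot show whether the large-sample approximation is adequate. The block below estimates the coverage of the z-interval for an exponential population at a small and a large sample size. The function name `z_coverage_exponential`, the replication count, and the seed are assumptions made for this illustration.

```python
import numpy as np
from scipy import stats

def z_coverage_exponential(sample_size, num_reps=5000, scale=2.0, seed=0):
    """Estimate coverage of the large-sample 95% z-interval for an exponential mean."""
    rng = np.random.default_rng(seed)
    samples = rng.exponential(scale, size=(num_reps, sample_size))
    means = samples.mean(axis=1)
    ses = samples.std(axis=1, ddof=1) / np.sqrt(sample_size)
    z_crit = stats.norm.ppf(0.975)
    # The true mean of an exponential distribution equals its scale parameter
    return float(np.mean(np.abs(means - scale) <= z_crit * ses))

cov_small = z_coverage_exponential(sample_size=10)
cov_large = z_coverage_exponential(sample_size=100)
print(f"n =  10: estimated coverage = {cov_small:.3f}")
print(f"n = 100: estimated coverage = {cov_large:.3f}")
```

For a strongly skewed population, coverage at small $n$ tends to fall short of the nominal level and moves toward it as $n$ grows.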

## 5.6 Confidence Interval for Proportion

**Setup:** $X_1, ..., X_n \sim \text{Bernoulli}(p)$

**Estimator:** $\hat{p} = \frac{1}{n}\sum_{i=1}^n X_i$ (sample proportion)

**Standard Error:** $SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

**Wald Confidence Interval:** $\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

**Motivation:** For binary data (success/failure), we estimate the proportion $p$. The Wald interval is the most common approach, based on the normal approximation. It works reasonably well when $n$ is large and $p$ is not too close to 0 or 1.

In [None]:
# Binary data: 45 successes out of 100 trials
n_trials = 100; n_successes = 45

In [None]:
# p̂ = X/n: Sample proportion
p_hat = n_successes / n_trials

In [None]:
# SE(p̂) = √[p̂(1-p̂)/n]: Standard error of sample proportion
se_p = np.sqrt(p_hat * (1 - p_hat) / n_trials)

In [None]:
# Wald CI = p̂ ± z_{α/2} × SE(p̂): Confidence interval for proportion
ci_prop = (p_hat - 1.96*se_p, p_hat + 1.96*se_p)

In [None]:
print(f"Sample proportion: p̂ = {p_hat:.3f}")
print(f"95% Confidence Interval for p: [{ci_prop[0]:.3f}, {ci_prop[1]:.3f}]")

### Wilson Score Interval (Better Coverage)

**Problem with Wald:** Coverage can fall well below the nominal level when $p$ is near 0 or 1, or when $n$ is small.

**Wilson Score Interval:** $\frac{\hat{p} + \frac{z^2}{2n} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}$

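**Motivation:** The Wilson interval has better coverage properties, especially for small samples or extreme proportions. It comes from inverting the score test, which uses the standard error $\sqrt{p(1-p)/n}$ under the hypothesized $p$ rather than the estimate $\hat{p}$. This pulls the center toward 1/2 and keeps the interval inside $[0, 1]$. The formula above does not include a continuity correction; that is a separate variant.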
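
**Derivation sketch:** The Wilson formula above is the set of $p$ values satisfying the score-test inequality. Writing $z = z_{\alpha/2}$:

$$
\begin{aligned}
\left|\hat{p} - p\right| \le z\sqrt{\frac{p(1-p)}{n}}
&\iff (\hat{p} - p)^2 \le \frac{z^2\,p(1-p)}{n} \\
&\iff \left(1 + \frac{z^2}{n}\right)p^2 - \left(2\hat{p} + \frac{z^2}{n}\right)p + \hat{p}^2 \le 0
\end{aligned}
$$

The quadratic in $p$ is non-positive between its two roots, which are

$$
p = \frac{\hat{p} + \frac{z^2}{2n} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}
$$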

In [None]:
# Wilson score interval (no continuity correction)
# Use n_trials and p_hat directly so the earlier sample size n is not overwritten
z = 1.96

In [None]:
denominator = 1 + z**2/n_trials
center = (p_hat + z**2/(2*n_trials)) / denominator
margin = z * np.sqrt(p_hat*(1-p_hat)/n_trials + z**2/(4*n_trials**2)) / denominator

In [None]:
ci_wilson = (center - margin, center + margin)
print(f"Wilson 95% Confidence Interval: [{ci_wilson[0]:.3f}, {ci_wilson[1]:.3f}]")
print(f"Wald 95% Confidence Interval: [{ci_prop[0]:.3f}, {ci_prop[1]:.3f}]")
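
**Illustrative sketch:** At $n = 100$ and $\hat{p} = 0.45$ the two intervals are close, so the difference in coverage is easier to see at a small sample size and an extreme proportion. The block below computes exact coverage by summing binomial probabilities over the outcomes whose interval contains $p$. The function names and the choice $n = 20$, $p = 0.05$ are hypothetical, introduced only for this comparison.

```python
import numpy as np
from scipy import stats

def wald_interval(k, n_obs, z):
    """Wald interval for k successes out of n_obs trials (k may be an array)."""
    p_hat = k / n_obs
    half = z * np.sqrt(p_hat * (1 - p_hat) / n_obs)
    return p_hat - half, p_hat + half

def wilson_interval(k, n_obs, z):
    """Wilson score interval (no continuity correction)."""
    p_hat = k / n_obs
    denom = 1 + z**2 / n_obs
    center = (p_hat + z**2 / (2 * n_obs)) / denom
    half = z * np.sqrt(p_hat * (1 - p_hat) / n_obs + z**2 / (4 * n_obs**2)) / denom
    return center - half, center + half

def exact_coverage(interval_fn, n_obs, p, conf=0.95):
    """Sum Binomial(n_obs, p) probabilities of outcomes whose interval contains p."""
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    k = np.arange(n_obs + 1)
    lower, upper = interval_fn(k, n_obs, z)
    covers = (lower <= p) & (p <= upper)
    return float(np.sum(stats.binom.pmf(k, n_obs, p)[covers]))

n_small, p_true = 20, 0.05  # hypothetical small-sample, extreme-proportion case
cov_wald = exact_coverage(wald_interval, n_small, p_true)
cov_wilson = exact_coverage(wilson_interval, n_small, p_true)
print(f"Wald   exact coverage: {cov_wald:.3f}")
print(f"Wilson exact coverage: {cov_wilson:.3f}")
```

Part of the Wald shortfall comes from the outcome of zero successes, where the estimated standard error is zero and the interval collapses to a single point.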

## 5.7 Confidence Interval for Difference in Means

**Setup:** Two independent samples from normal populations
- Sample 1: $X_1, ..., X_{n_1} \sim N(\mu_1, \sigma_1^2)$
- Sample 2: $Y_1, ..., Y_{n_2} \sim N(\mu_2, \sigma_2^2)$

**Parameter of interest:** $\mu_1 - \mu_2$

**Estimator:** $\bar{X} - \bar{Y}$

**Standard Error (general case):** $SE(\bar{X} - \bar{Y}) = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$

**Confidence Interval (Welch's):** $(\bar{X} - \bar{Y}) \pm t_{\alpha/2, \nu} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$

**Degrees of freedom (Welch-Satterthwaite):** $\nu = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1-1} + \frac{(S_2^2/n_2)^2}{n_2-1}}$

**Motivation:** Comparing two groups is central to scientific research. Welch's t-interval allows for unequal variances, making it more robust than the pooled t-interval, which assumes equal variances.

In [None]:
# Two independent samples from different populations
group1 = stats.norm(100, 15).rvs(50); group2 = stats.norm(110, 20).rvs(60)

In [None]:
mean1, mean2 = np.mean(group1), np.mean(group2)
var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
n1, n2 = len(group1), len(group2)

In [None]:
# SE(X̄₁ - X̄₂) = √(s₁²/n₁ + s₂²/n₂): Standard error of difference
se_diff = np.sqrt(var1/n1 + var2/n2)

In [None]:
# Welch-Satterthwaite degrees of freedom approximation
df_welch = (var1/n1 + var2/n2)**2 / ((var1/n1)**2/(n1-1) + (var2/n2)**2/(n2-1))

In [None]:
# CI for μ₁ - μ₂: Welch's t-interval for difference in means
ci_diff = stats.t.interval(0.95, df=df_welch, loc=mean1-mean2, scale=se_diff)

In [None]:
print(f"Mean difference: X̄₁ - X̄₂ = {mean1 - mean2:.2f}")
print(f"95% Confidence Interval for μ₁ - μ₂: [{ci_diff[0]:.2f}, {ci_diff[1]:.2f}]")
print(f"True difference: -10")
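
**Illustrative sketch:** For contrast with Welch's interval, the pooled interval replaces the two variances with one common estimate $S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$ and uses $n_1 + n_2 - 2$ degrees of freedom. The block below computes both side by side. The function name `diff_mean_intervals` and the toy data are assumptions made for this illustration.

```python
import numpy as np
from scipy import stats

def diff_mean_intervals(x, y, conf=0.95):
    """Return (welch_ci, pooled_ci) for mu_x - mu_y from two independent samples."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    v1, v2 = x.var(ddof=1), y.var(ddof=1)
    diff = x.mean() - y.mean()

    # Welch: separate variances, Welch-Satterthwaite degrees of freedom
    se_welch = np.sqrt(v1 / n1 + v2 / n2)
    df_welch = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    welch = stats.t.interval(conf, df=df_welch, loc=diff, scale=se_welch)

    # Pooled: one common variance estimate, n1 + n2 - 2 degrees of freedom
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_pooled = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    pooled = stats.t.interval(conf, df=n1 + n2 - 2, loc=diff, scale=se_pooled)
    return welch, pooled

# Hypothetical toy data with clearly unequal spread
x_toy = [1, 2, 3, 4, 5]
y_toy = [2, 4, 6, 8, 10, 12]
welch_ci, pooled_ci = diff_mean_intervals(x_toy, y_toy)
print(f"Welch  95% CI: [{welch_ci[0]:.2f}, {welch_ci[1]:.2f}]")
print(f"Pooled 95% CI: [{pooled_ci[0]:.2f}, {pooled_ci[1]:.2f}]")
```

Both intervals are centered at the same difference in sample means; they differ only in the standard error and the degrees of freedom.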

## 5.8 Bootstrap Confidence Intervals

**Bootstrap Principle:** Resample data with replacement to estimate sampling distribution of any statistic.

**Percentile Bootstrap Confidence Interval:**
1. Resample data $B$ times (typically $B = 1000$ or more)
2. Compute statistic $\hat{\theta}^*$ for each bootstrap sample
3. Find $\alpha/2$ and $1-\alpha/2$ quantiles of bootstrap distribution

**Advantages:**
- Works for any statistic (median, trimmed mean, correlation, etc.)
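- No closed-form variance formula or parametric distributional assumption required (observations are still assumed independent and identically distributed)
- Percentile interval can be asymmetric, reflecting skewness in the bootstrap distribution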

**Motivation:** Theoretical confidence intervals require distributional assumptions and variance formulas. Bootstrap provides a general computational alternative that works when theory is unavailable or assumptions are questionable.

In [None]:
# Bootstrap confidence interval for median
data_boot = stats.norm(50, 10).rvs(80)

In [None]:
B = 2000; np.random.seed(42)
boot_medians = [np.median(np.random.choice(data_boot, len(data_boot), replace=True)) for _ in range(B)]

In [None]:
# Percentile bootstrap CI: [q_{α/2}, q_{1-α/2}] of bootstrap distribution
ci_boot = np.percentile(boot_medians, [2.5, 97.5])

In [None]:
print(f"Sample median: {np.median(data_boot):.2f}")
print(f"Bootstrap 95% Confidence Interval: [{ci_boot[0]:.2f}, {ci_boot[1]:.2f}]")

In [None]:
# Visualize bootstrap distribution
plt.hist(boot_medians, bins=40, edgecolor='black', alpha=0.7)
# Same cell as the histogram, so the interval lines and labels appear on the rendered figure
plt.axvline(ci_boot[0], color='r', linestyle='--', label='95% Confidence Interval')
plt.axvline(ci_boot[1], color='r', linestyle='--')
plt.xlabel('Bootstrap Median'); plt.ylabel('Frequency')
plt.title('Bootstrap Distribution of Sample Median'); plt.legend()
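
**Illustrative sketch:** The three percentile-bootstrap steps apply unchanged to other statistics. The block below wraps them in a small helper and applies it to a 10% trimmed mean. The function `percentile_bootstrap_ci`, its parameter names, the seeds, and the simulated sample are all assumptions made for this illustration.

```python
import numpy as np
from scipy import stats

def percentile_bootstrap_ci(data, statistic, num_resamples=2000, conf=0.95, seed=0):
    """Percentile bootstrap interval for an arbitrary statistic of a 1-D sample."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    size = len(data)
    # Each row of `idx` selects one resample of the same size, drawn with replacement
    idx = rng.integers(0, size, size=(num_resamples, size))
    boot_stats = np.array([statistic(data[row]) for row in idx])
    alpha = 1 - conf
    lower, upper = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper

def trimmed_mean(values):
    """Mean after dropping the lowest and highest 10% of the values."""
    return stats.trim_mean(values, 0.1)

# Hypothetical simulated sample, same population as the median example
sample = np.random.default_rng(1).normal(50, 10, size=80)
ci_lo, ci_hi = percentile_bootstrap_ci(sample, trimmed_mean)
print(f"Sample 10% trimmed mean: {trimmed_mean(sample):.2f}")
print(f"Bootstrap 95% CI: [{ci_lo:.2f}, {ci_hi:.2f}]")
```

Passing the statistic as a function keeps the resampling logic separate from the quantity being estimated, so `np.median` or another one-sample statistic can be substituted.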

## 5.9 Interpreting Confidence Intervals

**What confidence intervals tell us:**
- Range of plausible parameter values
- Precision of our estimate (width indicates uncertainty)
- Values outside interval are less plausible (but not impossible)

**What confidence intervals do NOT tell us:**
- Probability that true parameter is in this specific interval (parameter is fixed, not random)
- That all values in the interval are equally plausible
- That values outside interval are impossible

**Confidence level meaning:**
- 95% refers to the procedure, not the parameter
- If we repeated sampling many times, 95% of resulting intervals would contain true parameter
- For any single interval, parameter is either in it or not (probability is 0 or 1)

## 5.10 Sample Size Determination

**Question:** How large a sample do we need to achieve desired precision?

**For normal mean:**
- Desired margin of error: $E$
- Required sample size: $n = \left(\frac{z_{\alpha/2} \sigma}{E}\right)^2$

**Example:** To estimate mean with margin of error $E = 2$ at 95% confidence when $\sigma = 10$:

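$n = \left(\frac{1.96 \times 10}{2}\right)^2 = 96.04$, which rounds up to $n = 97$ because the sample size must be an integer that meets or exceeds the requirement.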

**Motivation:** Planning studies requires determining adequate sample size. Too small a sample gives intervals too wide to be useful; too large a sample wastes resources. Sample size formulas balance precision requirements with cost.

In [None]:
# Calculate required sample size for desired margin of error
sigma_assumed = 10; margin_desired = 2; z = 1.96

In [None]:
# n = (z_{α/2}σ/E)²: Sample size for desired margin of error
n_required = (z * sigma_assumed / margin_desired)**2

In [None]:
print(f"Required sample size for margin of error ±{margin_desired}: n = {int(np.ceil(n_required))}")
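
**Illustrative sketch:** Rounding up matters: the block below checks that the rounded-up sample size meets the target margin while one observation fewer does not. The helper names `required_n` and `achieved_margin` are hypothetical, and $\sigma$ is treated as known, as in the formula above.

```python
import numpy as np
from scipy import stats

def required_n(sigma, margin, conf=0.95):
    """Smallest integer n with z * sigma / sqrt(n) <= margin."""
    z_crit = stats.norm.ppf(1 - (1 - conf) / 2)
    return int(np.ceil((z_crit * sigma / margin) ** 2))

def achieved_margin(sigma, sample_size, conf=0.95):
    """Margin of error of the known-variance interval at a given sample size."""
    z_crit = stats.norm.ppf(1 - (1 - conf) / 2)
    return z_crit * sigma / np.sqrt(sample_size)

n_req = required_n(sigma=10, margin=2)
print(f"Required n: {n_req}")
print(f"Margin at n = {n_req}:     {achieved_margin(10, n_req):.4f}")
print(f"Margin at n = {n_req - 1}: {achieved_margin(10, n_req - 1):.4f}")
```

In practice $\sigma$ is rarely known in advance, so the result is only as good as the assumed value.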

## Summary: Confidence Interval Construction

1. **Identify parameter** and appropriate confidence interval formula
2. **Compute point estimate** from sample data
3. **Calculate standard error** of the estimator
4. **Find critical value** from appropriate distribution (z or t)
5. **Construct interval:** Estimate ± (critical value × standard error)
6. **Interpret carefully:** Long-run coverage, not probability for this interval
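
**Illustrative sketch:** Steps 4 and 5 share the same shape across the z- and t-based intervals in this chapter. The block below is one minimal way to express that. The function `critical_value_interval` is a hypothetical helper, and the standard error 1.2 and degrees of freedom 9 are made-up inputs.

```python
from scipy import stats

def critical_value_interval(estimate, std_error, conf=0.95, df=None):
    """Estimate ± critical value × standard error.

    Uses a standard normal critical value when df is None,
    otherwise a t critical value with df degrees of freedom.
    """
    tail = 1 - (1 - conf) / 2
    crit = stats.norm.ppf(tail) if df is None else stats.t.ppf(tail, df=df)
    return estimate - crit * std_error, estimate + crit * std_error

# Hypothetical inputs: point estimate 52.3 with standard error 1.2
z_ci = critical_value_interval(52.3, 1.2)
t_ci = critical_value_interval(52.3, 1.2, df=9)
print(f"z-based 95% CI: [{z_ci[0]:.2f}, {z_ci[1]:.2f}]")
print(f"t-based 95% CI: [{t_ci[0]:.2f}, {t_ci[1]:.2f}]")
```

The Wilson and bootstrap intervals do not fit this template, since they are not of the form estimate ± margin.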

## Key Takeaways

- **Confidence intervals quantify uncertainty:** Point estimates are incomplete without uncertainty quantification. Intervals provide a range of plausible values.

- **Confidence level is about the procedure, not the parameter:** 95% confidence means 95% of intervals constructed this way contain the true parameter, not that probability is 95% for this specific interval.

- **Use t-distribution when variance is unknown:** With unknown $\sigma$, the t-distribution accounts for the additional uncertainty from estimating the variance. For large $n$, t and normal converge.

- **Width depends on confidence level, sample size, and variability:** Higher confidence requires wider intervals. Larger samples produce narrower intervals. More variability produces wider intervals.

- **Central Limit Theorem enables inference for non-normal populations:** With large samples, normal-based confidence intervals remain approximately valid even when the population is non-normal.

- **Bootstrap provides general-purpose confidence intervals:** When theory is unavailable or assumptions are questionable, the bootstrap gives a computational alternative for a wide range of statistics.

- **Sample size can be determined from precision requirements:** Work backward from desired margin of error to calculate required sample size.

- **Different procedures for different parameters:** Means, proportions, differences all have specific formulas. Choose appropriate method for parameter of interest.