<a href="https://colab.research.google.com/github/sokrypton/7.571/blob/main/L4/FDR.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multiple Hypothesis Testing

Previously, we saw that if we simply rerun an experiment about 1/α times, we expect to see roughly one "significant" result (p < α) even when the null hypothesis is true.

**How do we protect against excessive Type I errors when we want to compare many different values in a single experiment?**

→ Multiple hypothesis testing corrections
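A quick calculation shows how fast the problem grows. If m independent tests each use threshold α and every null is true, the chance of at least one false positive is 1 − (1 − α)^m. Here is a minimal sketch, assuming independent tests; the helper name `prob_any_false_positive` is ours:

```python
# Probability of at least one false positive among m independent tests,
# assuming every null hypothesis is true and each test uses threshold alpha.
def prob_any_false_positive(m, alpha=0.05):
    return 1 - (1 - alpha) ** m

for m in [1, 10, 20, 100, 1000]:
    print(f"m = {m:>4}: P(at least one false positive) = {prob_any_false_positive(m):.3f}")
```

At m = 1/α = 20 tests this is already about 64%, and with 1000 tests a false positive is essentially certain.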

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind

np.random.seed(0)  # fix the random seed so reruns give the same numbers

## 1 | The Multiple Testing Problem

**Scenario:** You measure expression of 1000 genes in control vs treatment.

To start, we'll assume that the treatment has **NO EFFECT** on any gene.

In [None]:
# Simulate 1000 genes with NO real effect
num_genes = 1000
n_samples = 10

p_values_null = []
for gene in range(num_genes):
    # Both groups come from the SAME distribution (no effect)
    control = np.random.normal(100, 20, n_samples)
    treatment = np.random.normal(100, 20, n_samples)
    _, p = ttest_ind(control, treatment)
    p_values_null.append(p)

p_values_null = np.array(p_values_null)

# How many "significant" results?
alpha = 0.05
false_positives = np.sum(p_values_null < alpha)

print(f"Tested {num_genes} genes (all with NO real effect)")
print(f"Found {false_positives} 'significant' results (p < {alpha})")
print(f"\nThese are ALL false positives!")
print(f"Expected: {num_genes * alpha:.0f} (α × m = {alpha} × {num_genes})")
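The loop above calls `ttest_ind` once per gene. `scipy.stats.ttest_ind` also accepts an `axis` argument, so the same simulation can be sketched as one call on a (genes × samples) array. The seeded `default_rng` generator is our choice, for reproducibility:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)  # seeded for reproducibility (our choice)
num_genes, n_samples, alpha = 1000, 10, 0.05

# One row per gene; both groups drawn from the same distribution (no effect)
control = rng.normal(100, 20, size=(num_genes, n_samples))
treatment = rng.normal(100, 20, size=(num_genes, n_samples))

# axis=1: one t-test per row, returning one p-value per gene
_, p_values = ttest_ind(control, treatment, axis=1)

print(f"{np.sum(p_values < alpha)} of {num_genes} null genes have p < {alpha}")
```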

---

## 2 | Now add some real effects

What if 100 genes actually ARE affected by the treatment?

In [None]:
# Simulate: 900 genes with NO effect, 100 genes with REAL effect
num_genes = 1000
num_real = 100
n_samples = 10

p_values_mixed = []

# First 900: NO effect (null hypothesis is TRUE)
for gene in range(num_genes - num_real):
    control = np.random.normal(100, 20, n_samples)
    treatment = np.random.normal(100, 20, n_samples)  # Same!
    _, p = ttest_ind(control, treatment)
    p_values_mixed.append(p)

# Last 100: REAL effect (null hypothesis is FALSE)
for gene in range(num_real):
    control = np.random.normal(100, 20, n_samples)
    treatment = np.random.normal(130, 20, n_samples)  # Different!
    _, p = ttest_ind(control, treatment)
    p_values_mixed.append(p)

p_values_mixed = np.array(p_values_mixed)

# Track which are truly null
is_null = np.array([True] * (num_genes - num_real) + [False] * num_real)

print(f"Simulated {num_genes} genes:")
print(f"  - {num_genes - num_real} with NO effect")
print(f"  - {num_real} with REAL effect")

### Distribution of p-values

The histogram reveals whether there are real effects:

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Left: ALL NULL
ax = axes[0]
ax.hist(p_values_null, bins=20, edgecolor='white', alpha=0.7, color='steelblue')
ax.axhline(num_genes/20, color='red', linestyle='--', linewidth=2, label='Expected if uniform')
ax.set_xlabel('p-value')
ax.set_ylabel('Count')
ax.set_title('ALL NULL\n→ p-values are UNIFORM (flat)')
ax.legend()

# Right: SOME REAL
ax = axes[1]
ax.hist(p_values_mixed, bins=20, edgecolor='white', alpha=0.7, color='steelblue')
ax.axhline(num_genes/20, color='red', linestyle='--', linewidth=2, label='Expected if uniform')
ax.set_xlabel('p-value')
ax.set_ylabel('Count')
ax.set_title(f'SOME REAL ({num_real} effects)\n→ SPIKE near 0')
ax.legend()

plt.tight_layout()
plt.show()

print("Left: All null → p-values uniformly distributed (flat histogram)")
print("Right: Some real → spike near 0 (real effects have small p-values)")
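The flat-versus-spike comparison can also be made quantitative. This minimal sketch uses a one-sample Kolmogorov-Smirnov test against Uniform(0, 1) as the check (our choice; the notebook itself only inspects the histograms). To stay self-contained, it regenerates p-values with the same settings as above. The helper `simulate_pvalues` is hypothetical:

```python
import numpy as np
from scipy.stats import kstest, ttest_ind

rng = np.random.default_rng(1)  # seeded for reproducibility (our choice)
num_genes, num_real, n_samples = 1000, 100, 10

def simulate_pvalues(shift):
    """One two-sample t-test p-value per gene; shift[i] is gene i's true effect."""
    control = rng.normal(100, 20, size=(len(shift), n_samples))
    treatment = rng.normal(100 + shift[:, None], 20, size=(len(shift), n_samples))
    return ttest_ind(control, treatment, axis=1).pvalue

p_all_null = simulate_pvalues(np.zeros(num_genes))
p_some_real = simulate_pvalues(
    np.r_[np.zeros(num_genes - num_real), np.full(num_real, 30.0)])

results = {}
for name, p in [("ALL NULL", p_all_null), ("SOME REAL", p_some_real)]:
    counts, _ = np.histogram(p, bins=20, range=(0, 1))
    ks_p = kstest(p, "uniform").pvalue  # H0: p-values ~ Uniform(0, 1)
    results[name] = (counts[0], ks_p)
    print(f"{name:>9}: first bin {counts[0]:>3} (uniform expects {num_genes // 20}), "
          f"KS p-value {ks_p:.2g}")
```

A large KS p-value is consistent with a flat histogram. A tiny one says the p-values are not uniform, which here reflects the excess of small values contributed by the real effects.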

---

## 3 | Benjamini-Hochberg Procedure

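**Key insight:** When all nulls are true, p-values are uniformly distributed on [0, 1]. After sorting, the j-th smallest has expected value j/(m+1) ≈ j/m. Plotted against rank, the sorted p-values therefore fall along a diagonal with slope ≈ 1/m.

The BH procedure draws a threshold line through the origin with slope α/m, which is the null diagonal scaled down by a factor of α. Real effects produce small p-values that dip **below** this line.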

**Algorithm:**
1. Sort p-values: P₁ ≤ P₂ ≤ ... ≤ Pₘ
2. Find the largest j where Pⱼ ≤ j × α/m
3. Reject all hypotheses with p-value ≤ Pⱼ
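A hand-sized example makes step 2 concrete. Take the five made-up p-values below with m = 5 and α = 0.05. P₂ = 0.025 misses its own threshold of 0.02, but P₃ = 0.027 ≤ 0.03. The largest qualifying j is therefore 3, and all three smallest p-values are rejected. This minimal sketch covers only that search:

```python
import numpy as np

alpha = 0.05
p_sorted = np.array([0.005, 0.025, 0.027, 0.6, 0.9])  # made-up, already sorted
m = len(p_sorted)
ranks = np.arange(1, m + 1)
thresholds = ranks * alpha / m  # j * alpha / m for j = 1..m

passes = p_sorted <= thresholds
j = ranks[passes].max() if passes.any() else 0  # largest passing rank (step-up)

print(f"thresholds: {np.round(thresholds, 3)}")
print(f"largest j = {j}, reject the {j} smallest p-values")
```

P₂ is rejected even though it failed its own comparison. That is what makes BH a step-up procedure: it looks for the last crossing below the line, not the first.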

In [None]:
# Sort both sets of p-values
sorted_null = np.sort(p_values_null)
sorted_mixed = np.sort(p_values_mixed)

m = num_genes
ranks = np.arange(1, m + 1)

# BH threshold line: j * alpha / m
alpha = 0.05
bh_line = ranks * alpha / m

# Expected p-values under null: ~j/m (diagonal)
expected_null = ranks / m

### Visualizing: All Null vs Some Real (full view)

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: ALL NULL
ax = axes[0]
ax.plot(ranks, sorted_null, 'b.', markersize=3, alpha=0.6, label='Sorted p-values')
ax.plot(ranks, bh_line, 'r-', linewidth=2, label='BH threshold (j × α/m)')
ax.plot(ranks, expected_null, 'k--', linewidth=1, alpha=0.5, label='Expected under null (j/m)')
ax.set_xlabel('Rank (j)')
ax.set_ylabel('p-value')
ax.set_title('ALL NULL (no real effects)\n→ p-values stay ABOVE BH line')
ax.legend(loc='upper left')
ax.set_xlim(0, m)
ax.set_ylim(0, 1)
ax.grid(True, alpha=0.3)

# Right: SOME REAL
ax = axes[1]
ax.plot(ranks, sorted_mixed, 'b.', markersize=3, alpha=0.6, label='Sorted p-values')
ax.plot(ranks, bh_line, 'r-', linewidth=2, label='BH threshold (j × α/m)')
ax.plot(ranks, expected_null, 'k--', linewidth=1, alpha=0.5, label='Expected under null (j/m)')
ax.set_xlabel('Rank (j)')
ax.set_ylabel('p-value')
ax.set_title(f'SOME REAL ({num_real} real effects)\n→ p-values dip BELOW BH line')
ax.legend(loc='upper left')
ax.set_xlim(0, m)
ax.set_ylim(0, 1)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### Zoomed in view

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
zoom = 150

# Left: ALL NULL (zoomed)
ax = axes[0]
ax.plot(ranks[:zoom], sorted_null[:zoom], 'bo-', markersize=4, alpha=0.7, label='Sorted p-values')
ax.plot(ranks[:zoom], bh_line[:zoom], 'r-', linewidth=2, label='BH threshold')
ax.plot(ranks[:zoom], expected_null[:zoom], 'k--', linewidth=1, alpha=0.5, label='Expected (j/m)')
ax.set_xlabel('Rank (j)')
ax.set_ylabel('p-value')
ax.set_title('ALL NULL (zoomed)\n→ p-values stay ABOVE red line')
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3)

# Right: SOME REAL (zoomed)
ax = axes[1]
ax.plot(ranks[:zoom], sorted_mixed[:zoom], 'bo-', markersize=4, alpha=0.7, label='Sorted p-values')
ax.plot(ranks[:zoom], bh_line[:zoom], 'r-', linewidth=2, label='BH threshold')
ax.plot(ranks[:zoom], expected_null[:zoom], 'k--', linewidth=1, alpha=0.5, label='Expected (j/m)')

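# Find and mark the cutoff: the largest rank j with P_j <= j * alpha / m
below_bh = sorted_mixed <= bh_line
j_cutoff = 0  # 0 means no discoveries (keeps the print below from failing)
if np.any(below_bh):
    j_cutoff = np.max(np.where(below_bh)[0]) + 1
    ax.axvline(j_cutoff, color='green', linestyle=':', linewidth=2, label=f'Cutoff (j={j_cutoff})')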

ax.set_xlabel('Rank (j)')
ax.set_ylabel('p-value')
ax.set_title('SOME REAL (zoomed)\n→ real effects dip BELOW red line')
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Left: All null - p-values follow the diagonal and (usually) stay above the BH line → no discoveries")
print(f"Right: Some real - p-values dip below the BH line → {j_cutoff} discoveries")

---

## 4 | Applying the BH Procedure

In [None]:
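def benjamini_hochberg(p_values, alpha=0.05):
    """Apply the BH procedure.

    Returns a boolean array of rejections (same order as p_values) and the
    cutoff P_j (0 if nothing is rejected)."""
    p_values = np.asarray(p_values)  # accept lists as well as arrays
    m = len(p_values)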

    # Step 1: Sort p-values (P_1 <= P_2 <= ... <= P_m)
    sorted_pvals = np.sort(p_values)

    # Step 2: Find largest j where P_j <= j * alpha / m
    ranks = np.arange(1, m + 1)
    thresholds = ranks * alpha / m
    below_threshold = sorted_pvals <= thresholds

    if not np.any(below_threshold):
        return np.zeros(m, dtype=bool), 0

    j = np.max(np.where(below_threshold)[0]) + 1
    P_j = sorted_pvals[j - 1]

    # Step 3: Reject all hypotheses with p-value <= P_j
    rejected = p_values <= P_j

    return rejected, P_j
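BH results are often reported as adjusted p-values instead. For the p-value at rank k, take the minimum of P_i · m / i over all ranks i ≥ k, capped at 1, and then reject wherever the adjusted value is ≤ α. This gives the same rejections as the step-up rule above for any α. The sketch below is self-contained; the function name `bh_adjust` is ours, and recent SciPy releases also provide `scipy.stats.false_discovery_control` for this computation:

```python
import numpy as np

def bh_adjust(p_values):
    """BH-adjusted p-values, returned in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    scaled = p[order] * m / ranks  # P_(k) * m / k for each rank k
    # Running minimum from the largest rank down enforces monotonicity
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

p = [0.9, 0.005, 0.6, 0.027, 0.025]  # made-up, deliberately unsorted
adj = bh_adjust(p)
print(np.round(adj, 4))  # adjusted p-values in the original order
print(adj <= 0.05)       # rejections at alpha = 0.05
```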

In [None]:
# Apply to ALL NULL case
rejected_null, cutoff_null = benjamini_hochberg(p_values_null, alpha=0.05)
print(f"ALL NULL case:")
print(f"  Discoveries: {np.sum(rejected_null)}")
print(f"  (These would all be false positives)")

print()

# Apply to SOME REAL case
rejected_mixed, cutoff_mixed = benjamini_hochberg(p_values_mixed, alpha=0.05)
tp = np.sum(rejected_mixed & ~is_null)
fp = np.sum(rejected_mixed & is_null)
total = np.sum(rejected_mixed)

print(f"SOME REAL case:")
print(f"  Discoveries: {total}")
print(f"  True positives: {tp}")
print(f"  False positives: {fp}")
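if total > 0:
    # Realized false discovery proportion for this run; the FDR is its average over repeated experiments
    print(f"  False discovery proportion = {fp}/{total} = {fp/total:.1%}")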
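The proportion of false discoveries in a single run is noisy. The FDR that BH controls is its average over repeated experiments. The sketch below repeats the section 2 simulation in vectorized form. The 200 seeded runs and the helper `bh_reject` are our choices. With independent tests, the average should land near (m₀/m)·α, here 0.9 × 0.05 = 0.045:

```python
import numpy as np
from scipy.stats import ttest_ind

def bh_reject(p, alpha=0.05):
    """Boolean mask of BH rejections (step-up), in the original order."""
    m = len(p)
    p_sorted = np.sort(p)
    passes = p_sorted <= np.arange(1, m + 1) * alpha / m
    if not passes.any():
        return np.zeros(m, dtype=bool)
    cutoff = p_sorted[np.nonzero(passes)[0].max()]
    return p <= cutoff

rng = np.random.default_rng(2)  # seeded for reproducibility (our choice)
num_genes, num_real, n_samples, alpha = 1000, 100, 10, 0.05
shift = np.r_[np.zeros(num_genes - num_real), np.full(num_real, 30.0)]
is_null = shift == 0

fdps = []
for _ in range(200):
    control = rng.normal(100, 20, size=(num_genes, n_samples))
    treatment = rng.normal(100 + shift[:, None], 20, size=(num_genes, n_samples))
    p = ttest_ind(control, treatment, axis=1).pvalue
    rejected = bh_reject(p, alpha)
    # V / max(R, 1): false discoveries over total discoveries (0 if none)
    fdps.append(np.sum(rejected & is_null) / max(np.sum(rejected), 1))

fdps = np.array(fdps)
print(f"single-run proportion ranges from {fdps.min():.1%} to {fdps.max():.1%}")
print(f"average over runs (estimated FDR) = {fdps.mean():.2%}")
```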

---

## 5 | Comparison with Bonferroni

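**Bonferroni correction:** Use the fixed threshold α/m. On the rank plot this is a flat horizontal line. It equals the BH line at rank j = 1 and lies below it at every larger rank, so Bonferroni never rejects more hypotheses than BH.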
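Why α/m? By the union bound, the chance of any false positive among m true-null tests is at most m · (α/m) = α. For independent tests it is exactly 1 − (1 − α/m)^m, which this short sketch tabulates. The helper name `bonferroni_fwer` is ours:

```python
alpha = 0.05

def bonferroni_fwer(m, alpha=0.05):
    """P(at least one false positive) for m independent true-null tests at threshold alpha/m."""
    return 1 - (1 - alpha / m) ** m

for m in [1, 10, 100, 1000]:
    print(f"m = {m:>4}: FWER = {bonferroni_fwer(m):.4f}  (bound: {alpha})")
```

Bonferroni keeps the family-wise error rate at or below α no matter how large m gets. The price is a per-test threshold that shrinks as 1/m.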

In [None]:
# Bonferroni threshold
bonf_threshold = alpha / num_genes

print(f"Comparison ({num_real} real effects out of {num_genes} genes):\n")
print(f"{'Method':<25} {'Threshold':<15} {'Discoveries':<12} {'True Pos':<10} {'False Pos':<10}")
print("-" * 72)

# No correction
disc = np.sum(p_values_mixed < alpha)
tp = np.sum((p_values_mixed < alpha) & ~is_null)
fp = np.sum((p_values_mixed < alpha) & is_null)
print(f"{'No correction':<25} {'p < 0.05':<15} {disc:<12} {tp:<10} {fp:<10}")

# Bonferroni
disc = np.sum(p_values_mixed < bonf_threshold)
tp = np.sum((p_values_mixed < bonf_threshold) & ~is_null)
fp = np.sum((p_values_mixed < bonf_threshold) & is_null)
print(f"{'Bonferroni':<25} {f'p < {bonf_threshold:.1e}':<15} {disc:<12} {tp:<10} {fp:<10}")

# BH
disc = np.sum(rejected_mixed)
tp = np.sum(rejected_mixed & ~is_null)
fp = np.sum(rejected_mixed & is_null)
print(f"{'Benjamini-Hochberg':<25} {f'p ≤ {cutoff_mixed:.1e}':<15} {disc:<12} {tp:<10} {fp:<10}")

print("\n" + "="*72)
print("No correction: many false positives (about α × number of true nulls)")
print("Bonferroni:    very conservative; guards against any false positive but misses most real effects")
print("BH:            controls the FDR at α while finding many more real effects than Bonferroni")
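The Bonferroni result can be anticipated analytically. In this simulation each real gene has a 30-unit shift, an SD of 20, and 10 samples per group. Its pooled t statistic (the `ttest_ind` default) therefore follows a noncentral t distribution with 18 degrees of freedom and noncentrality (30/20) · √(10/2). This sketch uses `scipy.stats.nct` to compute the per-test power, the chance of detecting one real effect, at the uncorrected and Bonferroni thresholds:

```python
import numpy as np
from scipy.stats import nct, t

n, effect, sd, m = 10, 30.0, 20.0, 1000
df = 2 * n - 2                        # pooled two-sample t-test
ncp = (effect / sd) * np.sqrt(n / 2)  # noncentrality for equal group sizes

def power(alpha):
    """Two-sided power of the pooled t-test at significance level alpha."""
    t_crit = t.ppf(1 - alpha / 2, df)
    return nct.sf(t_crit, df, ncp) + nct.cdf(-t_crit, df, ncp)

print(f"power at alpha = 0.05:        {power(0.05):.2f}")
print(f"power at Bonferroni 0.05/{m}: {power(0.05 / m):.2f}")
```

Under these settings the uncorrected test detects most real genes, but the Bonferroni threshold detects only a small fraction. This is why Bonferroni misses most of the 100 real effects.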

---

## Summary

### The Problem
- Testing many hypotheses → many false positives by chance
- With 1000 tests at α = 0.05 and every null true, expect ~50 false positives (α × m)

### Key Insight
- **Histogram:** Uniform (flat) under null, spike near 0 when real effects exist
- **Sorted p-values:** Follow diagonal (j/m) under null, dip below BH line when real effects exist

### Benjamini-Hochberg Procedure
1. Sort p-values: P₁ ≤ P₂ ≤ ... ≤ Pₘ
2. Find the largest j where Pⱼ ≤ j × α/m
3. Reject all hypotheses with p-value ≤ Pⱼ

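**FDR ≤ α** (FDR = expected fraction of discoveries that are false; guaranteed for independent or positively dependent tests)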
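For reference, let R be the number of rejections, V the number of those that are true nulls, and m₀ the number of true nulls. The FDR is defined below, together with the sharper bound Benjamini and Hochberg proved for independent tests:

$$
\mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right],
\qquad
\mathrm{FDR}_{\mathrm{BH}} \le \frac{m_0}{m}\,\alpha \le \alpha .
$$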