
# Binomial and Poisson Distributions

In this notebook, we'll explore:
1. The Binomial distribution (counting successes in T trials)
2. The Poisson distribution (limit of Binomial when T is large and f is small)
3. When to use each distribution

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from math import factorial

## 1 | Binomial Distribution: Neurotransmitter Example

**Scenario:** Neurotransmitters (NT) in the ER get randomly packaged into synaptic vesicles.

- **T** = total number of neurotransmitters in the ER
- **f** = probability that any single NT ends up in a particular vesicle
- **I** = number of NTs that end up in that vesicle (what we measure)

**Question:** What's the probability of getting exactly I neurotransmitters in a vesicle?

### Building intuition: Decision tree for 2 molecules

Let's start simple. If we have just 2 molecules, each can be IN or OUT of the vesicle.

In [None]:
f = 0.3  # probability of being IN the vesicle

# All possible outcomes for 2 molecules
P_0_in = (1-f) * (1-f)  # both OUT
P_1_in = f*(1-f) + (1-f)*f  # one IN, one OUT (2 ways)
P_2_in = f * f  # both IN

print(f"With f = {f}:")
print(f"  P(0 in vesicle) = (1-f)² = {P_0_in:.3f}")
print(f"  P(1 in vesicle) = 2f(1-f) = {P_1_in:.3f}")
print(f"  P(2 in vesicle) = f² = {P_2_in:.3f}")
print(f"  Sum = {P_0_in + P_1_in + P_2_in:.3f}")

### The general formula

For **T trials** with **I successes**:

$$P(I) = f^I \times (1-f)^{T-I} \times \binom{T}{I}$$

Where:
- $f^I$ = probability that I specific trials all succeed
- $(1-f)^{T-I}$ = probability that the remaining (T-I) trials all fail
- $\binom{T}{I} = \frac{T!}{I!(T-I)!}$ = number of ways to choose which I of the T trials are the successes ("T choose I")
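
The counting factor can be checked by brute force. The sketch below lists every IN/OUT outcome for a small T (T = 3 is an illustrative choice, not a value from the scenario), groups the outcomes by how many molecules are IN, and compares the result with the formula. It uses `math.comb` from the standard library rather than the `factorial` expression.

```python
from itertools import product
from math import comb

T = 3      # illustrative: small enough to list all 2**T outcomes
f = 0.3    # probability that one molecule is IN

ways = [0] * (T + 1)      # number of outcomes with I molecules IN
prob = [0.0] * (T + 1)    # total probability of those outcomes

# Each outcome is a tuple such as (1, 0, 1), where 1 = IN and 0 = OUT
for outcome in product([0, 1], repeat=T):
    I = sum(outcome)
    ways[I] += 1
    prob[I] += f**I * (1 - f)**(T - I)

for I in range(T + 1):
    formula = comb(T, I) * f**I * (1 - f)**(T - I)
    print(f"I={I}: {ways[I]} ways (T choose I = {comb(T, I)}), "
          f"enumerated P = {prob[I]:.3f}, formula P = {formula:.3f}")
```

Every outcome with the same I has the same probability, so the total for that I is just that probability times the number of such outcomes.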

In [None]:
def binomial_pmf(I, T, f):
    """Calculate P(I successes in T trials) with success probability f"""
    # T choose I
    choose = factorial(T) / (factorial(I) * factorial(T - I))
    # probability
    return (f ** I) * ((1 - f) ** (T - I)) * choose

# Verify with T=2 example
T = 2
f = 0.3
print(f"T = {T}, f = {f}")
for I in range(T + 1):
    print(f"  P({I} successes) = {binomial_pmf(I, T, f):.3f}")

### Visualizing the Binomial distribution

In [None]:
# Parameters
T = 20   # total neurotransmitters in ER
f = 0.3  # probability of ending up in vesicle

# Calculate probabilities for each possible outcome
I_values = np.arange(0, T + 1)
probabilities = [binomial_pmf(I, T, f) for I in I_values]

# Plot
plt.figure(figsize=(10, 5))
plt.bar(I_values, probabilities, color='steelblue', alpha=0.7, edgecolor='white')
plt.xlabel('Number of NTs in vesicle (I)')
plt.ylabel('Probability P(I)')
plt.title(f'Binomial Distribution (T = {T}, f = {f})')

# Mark the mean
mean = T * f
plt.axvline(mean, color='red', linestyle='--', linewidth=2, label=f'Mean = T×f = {mean}')
plt.legend()
plt.show()

print(f"Mean: T × f = {T} × {f} = {mean}")
print(f"Std Dev: √(T × f × (1-f)) = √({T} × {f} × {1-f}) = {np.sqrt(T * f * (1-f)):.2f}")

### Simulate the experiment

Let's simulate many vesicles and check whether the simulated counts match the Binomial formula.

In [None]:
# Simulate 10,000 vesicles
num_vesicles = 10000
T = 20
f = 0.3

# For each vesicle, count how many NTs ended up inside
NT_counts = []
for _ in range(num_vesicles):
    # Each of T neurotransmitters has probability f of being in this vesicle
    in_vesicle = np.random.random(T) < f
    NT_counts.append(np.sum(in_vesicle))

NT_counts = np.array(NT_counts)

# Plot simulation vs theory
plt.figure(figsize=(10, 5))

# Simulation (histogram)
plt.hist(NT_counts, bins=np.arange(-0.5, T + 1.5, 1), density=True,
         alpha=0.5, color='steelblue', label='Simulation')

# Theory (binomial PMF)
I_values = np.arange(0, T + 1)
theory = [binomial_pmf(I, T, f) for I in I_values]
plt.plot(I_values, theory, 'ro-', markersize=8, label='Binomial formula')

plt.xlabel('Number of NTs in vesicle')
plt.ylabel('Probability')
plt.title(f'Simulation vs Theory (T = {T}, f = {f}, n = {num_vesicles} vesicles)')
plt.legend()
plt.show()

print(f"Simulated mean: {NT_counts.mean():.2f} (theory: {T * f})")
print(f"Simulated std:  {NT_counts.std():.2f} (theory: {np.sqrt(T * f * (1-f)):.2f})")

---

## 2 | Poisson Distribution: When T is large and f is small

In many biological situations:
- **T is very large** (millions of molecules)
- **f is very small** (rare event)
- But **λ = T × f** is moderate (expected count)

In this limit, the Binomial becomes the **Poisson distribution**:

$$P(I) = \frac{\lambda^I \times e^{-\lambda}}{I!}$$

**Advantage:** We only need to know λ, not T and f separately!
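
A sketch of why the limit holds: substitute $f = \lambda / T$ into the Binomial formula and regroup the factors.

$$P(I) = \binom{T}{I} f^I (1-f)^{T-I} = \frac{\lambda^I}{I!} \times \frac{T(T-1)\cdots(T-I+1)}{T^I} \times \left(1-\frac{\lambda}{T}\right)^{T} \times \left(1-\frac{\lambda}{T}\right)^{-I}$$

With I and λ held fixed as T → ∞, the second factor tends to 1 (it is a product of I terms, each close to 1), the third tends to $e^{-\lambda}$, and the fourth tends to 1. What remains is

$$P(I) \to \frac{\lambda^I \times e^{-\lambda}}{I!}$$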

In [None]:
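def poisson_pmf(I, lam):
    """Calculate Poisson probability P(I events) with expected count λ"""
    # Use float ** int: a Python int raised to a NumPy integer (as produced by
    # np.arange) is computed in int64 and overflows, e.g. for 10 ** 19
    return (float(lam) ** int(I)) * np.exp(-lam) / factorial(I)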

# Test with small λ
lam = 4
print(f"Poisson with λ = {lam}:")
for I in range(10):
    print(f"  P({I}) = {poisson_pmf(I, lam):.4f}")

### Binomial converges to Poisson

Let's keep λ = T × f = 6 constant, but vary T and f:

In [None]:
# Compare Binomial and Poisson
lam = 6  # expected number of events

# Different ways to get λ = 6
scenarios = [
    (20, 0.30),    # T=20, f=0.30
    (60, 0.10),    # T=60, f=0.10
    (600, 0.01),   # T=600, f=0.01
]

plt.figure(figsize=(10, 5))
I_values = np.arange(0, 18)

# Plot Poisson (black line)
poisson_probs = [poisson_pmf(I, lam) for I in I_values]
plt.plot(I_values, poisson_probs, 'k-', linewidth=3, label=f'Poisson (λ={lam})')

# Plot Binomials (colored markers)
colors = ['red', 'orange', 'green']
for (T, f), color in zip(scenarios, colors):
    binom_probs = [binomial_pmf(I, T, f) for I in I_values]
    plt.plot(I_values, binom_probs, 'o', color=color, markersize=8, alpha=0.7,
             label=f'Binomial (T={T}, f={f})')

plt.xlabel('Number of successes (I)')
plt.ylabel('Probability')
plt.title(f'Binomial → Poisson as T↑ and f↓ (all have λ = T×f = {lam})')
plt.legend()
plt.show()

**Key observation:** As T gets larger and f gets smaller (keeping λ = T×f constant), the Binomial converges to the Poisson!

The green dots (T=600, f=0.01) are almost exactly on the black Poisson line.
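
To put a number on "almost exactly", one option is to measure the largest pointwise gap between the two PMFs for each scenario. The sketch below is self-contained and uses `math.comb`; the cutoff `I_max = 60` is an assumption, chosen so that the Poisson probability beyond it is negligible for λ = 6.

```python
from math import comb, exp, factorial

lam = 6.0      # expected count, as in the comparison above
I_max = 60     # assumed cutoff for the sum over I

gaps = []
for T in [20, 60, 600]:
    f = lam / T
    gap = 0.0
    for I in range(I_max + 1):
        # comb(T, I) is 0 when I > T, so the Binomial term vanishes there
        binom = comb(T, I) * f**I * (1 - f)**(T - I)
        poisson = lam**I * exp(-lam) / factorial(I)
        gap = max(gap, abs(binom - poisson))
    gaps.append(gap)
    print(f"T={T:3d}, f={f:.2f}: largest |Binomial - Poisson| = {gap:.4f}")
```

The gap shrinks with each step from T = 20 to T = 600, which is the convergence seen in the plot.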

### Visualizing Poisson for different λ values

In [None]:
plt.figure(figsize=(10, 5))

for lam in [1, 4, 10]:
    I_values = np.arange(0, 20)
    probs = [poisson_pmf(I, lam) for I in I_values]
    plt.plot(I_values, probs, 'o-', markersize=6, label=f'λ = {lam}')

plt.xlabel('Number of events (I)')
plt.ylabel('Probability')
plt.title('Poisson Distribution for different λ')
plt.legend()
plt.show()

print("Notice: Mean = λ, and Std Dev = √λ")
for lam in [1, 4, 10]:
    print(f"  λ={lam}: mean={lam}, std={np.sqrt(lam):.1f}")

---

## 3 | When to use each distribution?

| Distribution | Use when | Parameters | Examples |
|--------------|----------|------------|----------|
| **Binomial** | Fixed number of trials T, each with known success probability f | T, f | Coin flips, cells with mutation, SNP genotyping |
| **Poisson** | Rare events, T unknown or very large | λ only | Sequencing reads, mutations per genome, radioactive decay |

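**Rule of thumb:** Use Poisson when T is large (roughly T > 100) and f is small (roughly f < 0.01). The Binomial variance T×f×(1-f) = λ(1-f) differs from the Poisson variance λ only by the factor (1-f), so the smaller f is, the closer the two distributions are.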
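
When T reaches the millions, evaluating the Binomial through `factorial(T)` can become slow because the intermediate integers are enormous. A common workaround, sketched here, is to work with logarithms, using `math.lgamma(n + 1)` for log(n!). The function names and the values of T and f are illustrative assumptions, chosen so that T × f = 6.

```python
from math import exp, lgamma, log, log1p

def log_binomial_pmf(I, T, f):
    """log P(I) for the Binomial, using lgamma(n + 1) = log(n!)."""
    log_choose = lgamma(T + 1) - lgamma(I + 1) - lgamma(T - I + 1)
    return log_choose + I * log(f) + (T - I) * log1p(-f)

def log_poisson_pmf(I, lam):
    """log P(I) for the Poisson."""
    return I * log(lam) - lam - lgamma(I + 1)

T = 1_000_000    # illustrative: a million molecules
f = 6e-6         # illustrative: chosen so that T * f = 6
lam = T * f

for I in [0, 3, 6, 12]:
    b = exp(log_binomial_pmf(I, T, f))
    p = exp(log_poisson_pmf(I, lam))
    print(f"I={I:2d}: Binomial = {b:.6f}, Poisson = {p:.6f}")
```

In this regime the two columns should agree closely, so in practice only λ is needed.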

---

## Summary

### Binomial Distribution
- **Process:** Count successes in T independent trials
- **Parameters:** T (# trials), f (success probability)
- **Formula:** $P(I) = f^I (1-f)^{T-I} \binom{T}{I}$
- **Mean:** T × f
- **Std Dev:** $\sqrt{T \times f \times (1-f)}$

### Poisson Distribution  
- **Process:** Limit of Binomial when T → ∞ and f → 0 with T×f = λ held fixed
- **Parameter:** λ (expected count)
- **Formula:** $P(I) = \frac{\lambda^I e^{-\lambda}}{I!}$
- **Mean:** λ
- **Std Dev:** $\sqrt{\lambda}$
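
The means and standard deviations listed above can be checked numerically by summing over each PMF. This is a minimal sketch: `moments` is a hypothetical helper, and truncating the Poisson sum at I = 60 is an assumption that drops only a negligible tail for λ = 4.

```python
from math import comb, exp, factorial, sqrt

def moments(pmf_values):
    """Mean and standard deviation of a PMF given as a list indexed by I."""
    mean = sum(I * p for I, p in enumerate(pmf_values))
    var = sum((I - mean) ** 2 * p for I, p in enumerate(pmf_values))
    return mean, sqrt(var)

T, f = 20, 0.3
binom = [comb(T, I) * f**I * (1 - f)**(T - I) for I in range(T + 1)]

lam = 4.0
# The Poisson has no upper limit on I; the sum is truncated at I = 60
pois = [lam**I * exp(-lam) / factorial(I) for I in range(61)]

b_mean, b_std = moments(binom)
p_mean, p_std = moments(pois)
print(f"Binomial: mean = {b_mean:.3f} (T*f = {T * f:.3f}), "
      f"std = {b_std:.3f} (formula: {sqrt(T * f * (1 - f)):.3f})")
print(f"Poisson:  mean = {p_mean:.3f} (lam = {lam:.3f}), "
      f"std = {p_std:.3f} (formula: {sqrt(lam):.3f})")
```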