# Probability Distributions Tutorial

This notebook teaches fundamental probability concepts and distributions.

We will cover:
1. Bayes Theorem
2. Binomial Distribution
3. Poisson Distribution
4. Normal Distribution
5. Lognormal Distribution

In [1]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.special import comb
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Configure matplotlib (the 'seaborn-v0_8-*' style names require Matplotlib 3.6 or newer)
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (10, 6)

## 1. Bayes Theorem

### Conditional Probability

Conditional probability measures the chance of an event occurring given that another event has occurred.

**Notation:** P(A|B) means "probability of A given B"

**Definition:**
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

Where:
- P(A|B) is the probability of A given B
- P(A ∩ B) is the probability of both A and B
- P(B) is the probability of B
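
As a quick illustration of the definition, here is a minimal sketch that enumerates the outcomes of one fair six-sided die. The die and the two events (A = "roll is even", B = "roll is greater than 3") are assumptions chosen for this sketch; they are not used elsewhere in the notebook.

```python
from fractions import Fraction

# Sample space for one fair six-sided die (illustrative assumption)
outcomes = [1, 2, 3, 4, 5, 6]

# Hypothetical events chosen for this sketch
A = {x for x in outcomes if x % 2 == 0}  # roll is even: {2, 4, 6}
B = {x for x in outcomes if x > 3}       # roll is greater than 3: {4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_b = prob(B)                  # P(B)
p_a_and_b = prob(A & B)        # P(A ∩ B), outcomes {4, 6}
p_a_given_b = p_a_and_b / p_b  # definition of conditional probability

print(f"P(B)     = {p_b}")
print(f"P(A ∩ B) = {p_a_and_b}")
print(f"P(A|B)   = {p_a_given_b}")
```

Knowing that the roll is greater than 3 raises the probability of an even roll from 1/2 to 2/3, because two of the three remaining outcomes are even.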

### Bayes Theorem in Words

Bayes theorem lets us reverse conditional probabilities.

It tells us: if we know P(B|A), together with P(A) and P(B), we can find P(A|B).

This is useful when one direction is easy to measure but we want the other.

### Bayes Theorem Formula

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$

Where:
- P(A|B) is the **posterior** probability (what we want to find)
- P(B|A) is the **likelihood** (probability of seeing B if A is true)
- P(A) is the **prior** probability (initial belief about A)
- P(B) is the **marginal** probability (total probability of B)

### Derivation of Bayes Theorem

Let's derive Bayes theorem step by step.

**Step 1:** Start with the definition of conditional probability:
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

**Step 2:** Similarly, we can write:
$$P(B|A) = \frac{P(A \cap B)}{P(A)}$$

**Step 3:** Rearrange Step 2 to solve for P(A ∩ B):
$$P(A \cap B) = P(B|A) \cdot P(A)$$

**Step 4:** Substitute this into Step 1:
$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$

This is Bayes theorem!
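
The derivation can be checked numerically. The sketch below starts from a small joint probability table whose values are made up for illustration, computes P(A|B) directly from the definition (Step 1), and then again through Bayes theorem (Step 4). Exact fractions are used so the two results can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical joint probabilities for two events A and B (assumed values)
p_a_and_b = Fraction(3, 20)      # P(A ∩ B)
p_a_and_not_b = Fraction(5, 20)  # P(A ∩ B^c)
p_not_a_and_b = Fraction(9, 20)  # P(A^c ∩ B)
# The remaining 3/20 is P(A^c ∩ B^c)

# Marginal probabilities
p_a = p_a_and_b + p_a_and_not_b  # P(A)
p_b = p_a_and_b + p_not_a_and_b  # P(B)

# Steps 1 and 2: the two conditional probabilities from the definition
p_a_given_b_direct = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Step 4: Bayes theorem
p_a_given_b_bayes = p_b_given_a * p_a / p_b

print(f"Direct from definition: {p_a_given_b_direct}")
print(f"Via Bayes theorem:      {p_a_given_b_bayes}")
```

Both routes give the same value, as the derivation requires.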

### Example 1: Balls in a Box

**Problem:**
- Box 1 has 3 red balls and 2 blue balls
- Box 2 has 1 red ball and 4 blue balls
- We pick a box at random (50% chance each)
- We draw a blue ball
- What is the probability it came from Box 1?

**Solution:**
- Let A = "Ball from Box 1"
- Let B = "Blue ball drawn"
- We want P(A|B)

**Step 1:** Find P(A) = 0.5 (prior)

**Step 2:** Find P(B|A) = 2/5 = 0.4 (likelihood)

**Step 3:** Find P(B) using the law of total probability. Here $A^c$ means the ball came from Box 2, so $P(B|A^c) = 4/5 = 0.8$ and $P(A^c) = 0.5$:
$$P(B) = P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c)$$
$$P(B) = 0.4 \times 0.5 + 0.8 \times 0.5 = 0.6$$

**Step 4:** Apply Bayes theorem:
$$P(A|B) = \frac{0.4 \times 0.5}{0.6} = \frac{0.2}{0.6} = 0.333$$

In [2]:
# Example 1: Analytical calculation
print("Example 1: Balls in a Box")
print("=" * 40)

# Given probabilities
p_box1 = 0.5  # Prior: P(Box 1)
p_box2 = 0.5  # Prior: P(Box 2)

p_blue_given_box1 = 2/5  # Likelihood: P(Blue | Box 1)
p_blue_given_box2 = 4/5  # Likelihood: P(Blue | Box 2)

# Calculate P(Blue) - marginal probability
p_blue = p_blue_given_box1 * p_box1 + p_blue_given_box2 * p_box2
print(f"P(Blue) = {p_blue:.4f}")

# Apply Bayes theorem
p_box1_given_blue = (p_blue_given_box1 * p_box1) / p_blue
print(f"\nP(Box 1 | Blue) = {p_box1_given_blue:.4f}")
print(f"P(Box 1 | Blue) = {p_box1_given_blue:.1%}")

Example 1: Balls in a Box
P(Blue) = 0.6000

P(Box 1 | Blue) = 0.3333
P(Box 1 | Blue) = 33.3%


In [3]:
# Example 1: Simulation verification
print("\nSimulation Verification:")
print("=" * 40)

n_trials = 100000
box1_count = 0
blue_count = 0

for _ in range(n_trials):
    # Randomly choose a box
    box = np.random.choice([1, 2])
    
    # Draw a ball from the chosen box
    if box == 1:
        # Box 1: 3 red, 2 blue
        ball = np.random.choice(['red', 'blue'], p=[3/5, 2/5])
    else:
        # Box 2: 1 red, 4 blue
        ball = np.random.choice(['red', 'blue'], p=[1/5, 4/5])
    
    # Count blue balls and which box they came from
    if ball == 'blue':
        blue_count += 1
        if box == 1:
            box1_count += 1

# Calculate simulated probability
simulated_prob = box1_count / blue_count
print(f"Simulated P(Box 1 | Blue) = {simulated_prob:.4f}")
print(f"Theoretical P(Box 1 | Blue) = {p_box1_given_blue:.4f}")
print(f"Difference = {abs(simulated_prob - p_box1_given_blue):.4f}")


Simulation Verification:
Simulated P(Box 1 | Blue) = 0.3367
Theoretical P(Box 1 | Blue) = 0.3333
Difference = 0.0033


### Example 2: Medical Test Problem

**Problem:**
A disease affects 1% of the population.

A test for the disease has:
- **Sensitivity** = 95% (true positive rate)
- **Specificity** = 90% (true negative rate)

If you test positive, what is the probability you have the disease?

**Solution:**
- Let D = "Has disease"
- Let T = "Tests positive"
- We want P(D|T)

**Given:**
- P(D) = 0.01 (prevalence/prior)
- P(T|D) = 0.95 (sensitivity)
- P(T^c|D^c) = 0.90 (specificity)
- Therefore, P(T|D^c) = 0.10 (false positive rate)

**Calculate P(T):**
$$P(T) = P(T|D) \cdot P(D) + P(T|D^c) \cdot P(D^c)$$
$$P(T) = 0.95 \times 0.01 + 0.10 \times 0.99 = 0.0095 + 0.099 = 0.1085$$

**Apply Bayes theorem:**
$$P(D|T) = \frac{P(T|D) \cdot P(D)}{P(T)} = \frac{0.95 \times 0.01}{0.1085} = 0.0876$$

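**Result:** A positive test implies only an 8.76% chance of having the disease. The disease is rare, so false positives from the large healthy group outnumber the true positives.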
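
One way to see why the posterior is so low is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical cohort of 100,000 people (the cohort size is an illustration, not part of the problem) and applies the stated prevalence, sensitivity, and specificity to it.

```python
# Natural-frequency view: a hypothetical cohort of 100,000 people
population = 100_000

diseased = population * 1 // 100       # 1% prevalence
healthy = population - diseased

true_positives = diseased * 95 // 100  # 95% sensitivity
false_positives = healthy * 10 // 100  # 10% false positive rate (1 - specificity)

all_positives = true_positives + false_positives
share_diseased = true_positives / all_positives

print(f"People with the disease:    {diseased:,}")
print(f"True positives:             {true_positives:,}")
print(f"False positives:            {false_positives:,}")
print(f"P(Disease | Test+) = {true_positives:,} / {all_positives:,} = {share_diseased:.4f}")
```

In this cohort, roughly ten healthy people test positive for every diseased person who does, which is why a single positive result is weak evidence.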

In [4]:
# Example 2: Analytical calculation
print("Example 2: Medical Test")
print("=" * 40)

# Given probabilities
p_disease = 0.01  # Prior: prevalence
sensitivity = 0.95  # P(Test+ | Disease)
specificity = 0.90  # P(Test- | No Disease)

# Derived probabilities
p_no_disease = 1 - p_disease
p_test_pos_given_disease = sensitivity
p_test_pos_given_no_disease = 1 - specificity  # False positive rate

# Calculate P(Test+) - marginal probability
p_test_pos = (p_test_pos_given_disease * p_disease + 
              p_test_pos_given_no_disease * p_no_disease)

print(f"P(Test Positive) = {p_test_pos:.4f}")

# Apply Bayes theorem
p_disease_given_test_pos = (p_test_pos_given_disease * p_disease) / p_test_pos

print(f"\nP(Disease | Test+) = {p_disease_given_test_pos:.4f}")
print(f"P(Disease | Test+) = {p_disease_given_test_pos:.2%}")

print("\nKey insight: Even with a positive test, the probability")
print("of having the disease is less than 9% because the disease")
print("is rare (1% prevalence).")

Example 2: Medical Test
P(Test Positive) = 0.1085

P(Disease | Test+) = 0.0876
P(Disease | Test+) = 8.76%

Key insight: Even with a positive test, the probability
of having the disease is less than 9% because the disease
is rare (1% prevalence).


In [None]:
# Example 2: Simulation verification
print("\nSimulation Verification:")
print("=" * 40)

n_people = 100000

# Generate population
has_disease = np.random.random(n_people) < p_disease

# Generate test results
test_positive = np.zeros(n_people, dtype=bool)

# For people with disease: 95% test positive
test_positive[has_disease] = np.random.random(np.sum(has_disease)) < sensitivity

# For people without disease: 10% test positive (false positive)
test_positive[~has_disease] = np.random.random(np.sum(~has_disease)) < (1 - specificity)

# Calculate P(Disease | Test+)
disease_and_positive = np.sum(has_disease & test_positive)
total_positive = np.sum(test_positive)
simulated_prob = disease_and_positive / total_positive

print(f"Simulated P(Disease | Test+) = {simulated_prob:.4f}")
print(f"Theoretical P(Disease | Test+) = {p_disease_given_test_pos:.4f}")
print(f"Difference = {abs(simulated_prob - p_disease_given_test_pos):.4f}")

In [None]:
# Visualization: Prior vs Posterior
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Example 1: Balls in a Box
categories = ['Box 1', 'Box 2']
prior = [0.5, 0.5]
posterior = [p_box1_given_blue, 1 - p_box1_given_blue]

x = np.arange(len(categories))
width = 0.35

ax1.bar(x - width/2, prior, width, label='Prior', alpha=0.8, color='steelblue')
ax1.bar(x + width/2, posterior, width, label='Posterior (given Blue)', alpha=0.8, color='coral')
ax1.set_ylabel('Probability')
ax1.set_title('Example 1: Prior vs Posterior Probabilities')
ax1.set_xticks(x)
ax1.set_xticklabels(categories)
ax1.legend()
ax1.set_ylim(0, 0.8)
ax1.grid(axis='y', alpha=0.3)

# Add value labels on bars
for i, (p, post) in enumerate(zip(prior, posterior)):
    ax1.text(i - width/2, p + 0.02, f'{p:.2f}', ha='center', fontsize=10)
    ax1.text(i + width/2, post + 0.02, f'{post:.2f}', ha='center', fontsize=10)

# Example 2: Medical Test
categories2 = ['Disease', 'No Disease']
prior2 = [p_disease, p_no_disease]
posterior2 = [p_disease_given_test_pos, 1 - p_disease_given_test_pos]

x2 = np.arange(len(categories2))

ax2.bar(x2 - width/2, prior2, width, label='Prior', alpha=0.8, color='steelblue')
ax2.bar(x2 + width/2, posterior2, width, label='Posterior (given Test+)', alpha=0.8, color='coral')
ax2.set_ylabel('Probability')
ax2.set_title('Example 2: Prior vs Posterior Probabilities')
ax2.set_xticks(x2)
ax2.set_xticklabels(categories2)
ax2.legend()
ax2.set_ylim(0, 1.0)
ax2.grid(axis='y', alpha=0.3)

# Add value labels on bars
for i, (p, post) in enumerate(zip(prior2, posterior2)):
    ax2.text(i - width/2, p + 0.02, f'{p:.3f}', ha='center', fontsize=10)
    ax2.text(i + width/2, post + 0.02, f'{post:.3f}', ha='center', fontsize=10)

plt.tight_layout()
plt.show()

---

## 2. Binomial Distribution

### Setup and Assumptions

The binomial distribution models the number of successes in a fixed number of independent trials.

**Requirements:**
- Fixed number of trials (n)
- Each trial has only two outcomes (success or failure)
- Trials are independent
- Probability of success (p) is constant

**Examples:**
- Number of heads in 10 coin flips
- Number of defective items in a batch
- Number of students who pass an exam

### Probability Mass Function (PMF)

The probability of getting exactly k successes in n trials is:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

Where:
- n = number of trials
- k = number of successes
- p = probability of success on each trial
- $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ is the binomial coefficient

**Mean:** $\mu = np$

**Variance:** $\sigma^2 = np(1-p)$
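
The PMF, mean, and variance formulas can be checked against each other. This is a minimal sketch using only the standard library; the helper name `binomial_pmf` and the values n = 12, p = 0.25 are assumptions picked for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), straight from the formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_trials, p_success = 12, 0.25  # illustrative values

pmf = [binomial_pmf(k, n_trials, p_success) for k in range(n_trials + 1)]

# A valid PMF sums to 1 over its support
total = sum(pmf)

# Mean and variance computed from the PMF itself
mean_from_pmf = sum(k * pk for k, pk in enumerate(pmf))
var_from_pmf = sum((k - mean_from_pmf) ** 2 * pk for k, pk in enumerate(pmf))

print(f"Sum of PMF: {total:.6f}")
print(f"Mean:       {mean_from_pmf:.4f}  (formula np      = {n_trials * p_success:.4f})")
print(f"Variance:   {var_from_pmf:.4f}  (formula np(1-p) = {n_trials * p_success * (1 - p_success):.4f})")
```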

### Example 1: Coin Flips

**Problem:** Flip a fair coin 10 times. What is the probability of getting exactly 7 heads?

**Given:**
- n = 10 trials
- k = 7 successes
- p = 0.5 (fair coin)

**Solution:**
$$P(X = 7) = \binom{10}{7} (0.5)^7 (0.5)^3 = 120 \times 0.0078125 \times 0.125 = 0.117$$

In [None]:
# Example 1: Coin flips
print("Example 1: Coin Flips")
print("=" * 40)

n = 10  # Number of trials
k = 7   # Number of successes
p = 0.5 # Probability of success

# Calculate using formula
binom_coeff = comb(n, k, exact=True)
prob = binom_coeff * (p ** k) * ((1 - p) ** (n - k))

print(f"n = {n} flips")
print(f"k = {k} heads")
print(f"p = {p}")
print(f"\nBinomial coefficient C({n},{k}) = {binom_coeff}")
print(f"P(X = {k}) = {prob:.4f}")
print(f"P(X = {k}) = {prob:.2%}")

# Using scipy
prob_scipy = stats.binom.pmf(k, n, p)
print(f"\nUsing scipy: P(X = {k}) = {prob_scipy:.4f}")

### Example 2: Quality Control

**Problem:** A factory produces items with a 5% defect rate. If we inspect 20 items, what is the probability of finding exactly 2 defective items?

**Given:**
- n = 20 trials
- k = 2 defects
- p = 0.05 (defect rate)

In [None]:
# Example 2: Quality control
print("Example 2: Quality Control")
print("=" * 40)

n = 20   # Number of items
k = 2    # Number of defects
p = 0.05 # Defect rate

# Calculate probability
prob = stats.binom.pmf(k, n, p)

print(f"n = {n} items inspected")
print(f"k = {k} defects")
print(f"p = {p} (defect rate)")
print(f"\nP(X = {k}) = {prob:.4f}")
print(f"P(X = {k}) = {prob:.2%}")

# Calculate expected value and variance
mean = n * p
variance = n * p * (1 - p)
std = np.sqrt(variance)

print(f"\nExpected defects: {mean:.2f}")
print(f"Standard deviation: {std:.2f}")

In [None]:
# Plot PMF for different values of n and p
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Different parameter combinations
params = [
    (10, 0.5, 'n=10, p=0.5'),
    (20, 0.3, 'n=20, p=0.3'),
    (30, 0.7, 'n=30, p=0.7'),
    (50, 0.2, 'n=50, p=0.2')
]

for ax, (n, p, title) in zip(axes.flat, params):
    k_values = np.arange(0, n + 1)
    pmf_values = stats.binom.pmf(k_values, n, p)
    
    ax.bar(k_values, pmf_values, alpha=0.7, color='steelblue', edgecolor='black')
    ax.set_xlabel('Number of Successes (k)')
    ax.set_ylabel('Probability')
    ax.set_title(f'Binomial PMF: {title}')
    ax.grid(axis='y', alpha=0.3)
    
    # Add mean line
    mean = n * p
    ax.axvline(mean, color='red', linestyle='--', linewidth=2, 
               label=f'Mean = {mean:.1f}')
    ax.legend()

plt.tight_layout()
plt.show()

In [None]:
# Simulation to verify convergence to theoretical PMF
print("Simulation: Binomial Distribution Convergence")
print("=" * 40)

n = 20  # Number of trials
p = 0.3 # Probability of success
n_simulations = 10000

# Run simulations
simulated_outcomes = np.random.binomial(n, p, n_simulations)

# Calculate theoretical probabilities
k_values = np.arange(0, n + 1)
theoretical_probs = stats.binom.pmf(k_values, n, p)

# Calculate simulated probabilities
simulated_probs = np.array([np.sum(simulated_outcomes == k) / n_simulations 
                            for k in k_values])

# Plot comparison
fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(len(k_values))
width = 0.35

ax.bar(x - width/2, theoretical_probs, width, label='Theoretical', 
       alpha=0.8, color='steelblue')
ax.bar(x + width/2, simulated_probs, width, label='Simulated', 
       alpha=0.8, color='coral')

ax.set_xlabel('Number of Successes')
ax.set_ylabel('Probability')
ax.set_title(f'Binomial Distribution: Theoretical vs Simulated\n(n={n}, p={p}, {n_simulations} simulations)')
ax.set_xticks(x)
ax.set_xticklabels(k_values)
ax.legend()
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate mean absolute difference
mae = np.mean(np.abs(theoretical_probs - simulated_probs))
print(f"\nMean absolute error: {mae:.6f}")
print("The simulation matches the theoretical distribution very well!")

---

## 3. Poisson Distribution

### What is the Poisson Distribution?

The Poisson distribution models the number of events occurring in a fixed interval of time or space.

**Key assumptions:**
- Events occur independently
- The rate of occurrence is constant
- Two events cannot occur at exactly the same instant

**Rate parameter λ (lambda):**
- λ represents the average number of events in the interval
- λ must be positive
- λ is both the mean and variance

### Probability Mass Function (PMF)

The probability of observing exactly k events is:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Where:
- λ = average rate (mean number of events)
- k = actual number of events (0, 1, 2, ...)
- e ≈ 2.71828 (Euler's number)

**Mean:** $\mu = \lambda$

**Variance:** $\sigma^2 = \lambda$
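
To make the formula concrete, the sketch below evaluates the PMF by hand with the standard library and confirms that the mean and variance both come out as λ. The helper name `poisson_pmf`, the choice λ = 2, and the truncation of the infinite sum at k = 60 are assumptions for this illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), straight from the formula."""
    return lam**k * exp(-lam) / factorial(k)

lam = 2  # illustrative rate

p_three = poisson_pmf(3, lam)

# The support is infinite, but terms beyond k = 60 are negligible for lam = 2
ks = range(60)
total = sum(poisson_pmf(k, lam) for k in ks)
mean_from_pmf = sum(k * poisson_pmf(k, lam) for k in ks)
var_from_pmf = sum((k - mean_from_pmf) ** 2 * poisson_pmf(k, lam) for k in ks)

print(f"P(X = 3)   = {p_three:.4f}")
print(f"Sum of PMF = {total:.6f}")
print(f"Mean       = {mean_from_pmf:.4f}")
print(f"Variance   = {var_from_pmf:.4f}")
```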

### Example 1: Customer Arrivals

**Problem:** A store receives an average of 4 customers per hour. What is the probability of getting exactly 6 customers in the next hour?

**Given:**
- λ = 4 customers per hour
- k = 6 customers

In [None]:
# Example 1: Customer arrivals
print("Example 1: Customer Arrivals")
print("=" * 40)

lambda_rate = 4  # Average arrivals per hour
k = 6            # Actual arrivals

# Calculate probability
prob = stats.poisson.pmf(k, lambda_rate)

print(f"λ = {lambda_rate} customers/hour (average)")
print(f"k = {k} customers")
print(f"\nP(X = {k}) = {prob:.4f}")
print(f"P(X = {k}) = {prob:.2%}")

# Calculate probability of at least 6 customers
prob_at_least_6 = 1 - stats.poisson.cdf(5, lambda_rate)
print(f"\nP(X ≥ 6) = {prob_at_least_6:.4f}")
print(f"P(X ≥ 6) = {prob_at_least_6:.2%}")

### Example 2: Manufacturing Defects

**Problem:** A manufacturing process produces an average of 2.5 defects per 100 units. What is the probability of finding exactly 3 defects in the next 100 units?

**Given:**
- λ = 2.5 defects per 100 units
- k = 3 defects

In [None]:
# Example 2: Manufacturing defects
print("Example 2: Manufacturing Defects")
print("=" * 40)

lambda_rate = 2.5  # Average defects per 100 units
k = 3              # Actual defects

# Calculate probability
prob = stats.poisson.pmf(k, lambda_rate)

print(f"λ = {lambda_rate} defects/100 units (average)")
print(f"k = {k} defects")
print(f"\nP(X = {k}) = {prob:.4f}")
print(f"P(X = {k}) = {prob:.2%}")

# Calculate probability of 0 defects (quality control)
prob_zero = stats.poisson.pmf(0, lambda_rate)
print(f"\nP(X = 0) = {prob_zero:.4f}")
print(f"Probability of no defects: {prob_zero:.2%}")

In [None]:
# Plot PMF for different lambda values
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

lambda_values = [1, 3, 5, 10]
k_max = 20

for ax, lam in zip(axes.flat, lambda_values):
    k_values = np.arange(0, k_max + 1)
    pmf_values = stats.poisson.pmf(k_values, lam)
    
    ax.bar(k_values, pmf_values, alpha=0.7, color='green', edgecolor='black')
    ax.set_xlabel('Number of Events (k)')
    ax.set_ylabel('Probability')
    ax.set_title(f'Poisson PMF: λ = {lam}')
    ax.grid(axis='y', alpha=0.3)
    
    # Add mean line
    ax.axvline(lam, color='red', linestyle='--', linewidth=2, 
               label=f'Mean = {lam}')
    ax.legend()

plt.tight_layout()
plt.show()

In [None]:
# Simulation to verify Poisson distribution
print("Simulation: Poisson Distribution Convergence")
print("=" * 40)

lambda_rate = 5  # Average events
n_simulations = 10000

# Run simulations
simulated_outcomes = np.random.poisson(lambda_rate, n_simulations)

# Calculate theoretical probabilities
k_max = 15
k_values = np.arange(0, k_max + 1)
theoretical_probs = stats.poisson.pmf(k_values, lambda_rate)

# Calculate simulated probabilities
simulated_probs = np.array([np.sum(simulated_outcomes == k) / n_simulations 
                            for k in k_values])

# Plot comparison
fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(len(k_values))
width = 0.35

ax.bar(x - width/2, theoretical_probs, width, label='Theoretical', 
       alpha=0.8, color='green')
ax.bar(x + width/2, simulated_probs, width, label='Simulated', 
       alpha=0.8, color='orange')

ax.set_xlabel('Number of Events')
ax.set_ylabel('Probability')
ax.set_title(f'Poisson Distribution: Theoretical vs Simulated\n(λ={lambda_rate}, {n_simulations} simulations)')
ax.set_xticks(x)
ax.set_xticklabels(k_values)
ax.legend()
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate statistics
print(f"\nSimulated mean: {np.mean(simulated_outcomes):.3f}")
print(f"Theoretical mean: {lambda_rate}")
print(f"\nSimulated variance: {np.var(simulated_outcomes, ddof=1):.3f}")
print(f"Theoretical variance: {lambda_rate}")

---

## 4. Normal Distribution

### What is the Normal Distribution?

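The normal distribution is the most widely used continuous probability distribution, in part because sums and averages of many independent effects tend toward it (Central Limit Theorem).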

It is also called the Gaussian distribution or bell curve.

**Key properties:**
- Symmetric around the mean
- Bell-shaped curve
- Mean, median, and mode are equal
- Defined by two parameters: mean (μ) and standard deviation (σ)

### Probability Density Function (PDF)

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Where:
- μ (mu) = mean (center of distribution)
- σ (sigma) = standard deviation (spread of distribution)
- x = value for which we calculate probability density

**Mean:** $\mu$

**Variance:** $\sigma^2$
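
A density is not a probability: probabilities come from the area under the curve. The sketch below codes the PDF from the formula, approximates the total area with a midpoint sum, and obtains interval probabilities from the CDF. The CDF is written with the error function, using the standard identity Φ(z) = ½(1 + erf(z/√2)), which is not stated above. The helper names and the grid settings are assumptions for this illustration.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma^2) at x, straight from the formula."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

def normal_cdf(x, mu, sigma):
    """P(X <= x), via the identity Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 0.0, 1.0  # standard normal, for illustration

# Height of the curve at the mean: 1 / (sigma * sqrt(2 * pi))
peak = normal_pdf(mu, mu, sigma)

# Midpoint sum over [-8, 8] approximates the total area under the PDF
n_steps = 16_000
step = 16 / n_steps
area = sum(normal_pdf(-8 + (i + 0.5) * step, mu, sigma) * step for i in range(n_steps))

# Probability of landing within one standard deviation of the mean
within_one_sd = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)

print(f"Peak density      = {peak:.4f}")
print(f"Area under curve  = {area:.6f}")
print(f"P(|X - μ| < σ)    = {within_one_sd:.4f}")
```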

### Example 1: Test Scores

**Problem:** Test scores are normally distributed with mean 75 and standard deviation 10. What is the probability a student scores between 70 and 80?

**Given:**
- μ = 75
- σ = 10
- We want P(70 < X < 80)

In [None]:
# Example 1: Test scores
print("Example 1: Test Scores")
print("=" * 40)

mu = 75     # Mean
sigma = 10  # Standard deviation

# Calculate probability between 70 and 80
prob = stats.norm.cdf(80, mu, sigma) - stats.norm.cdf(70, mu, sigma)

print(f"Mean: μ = {mu}")
print(f"Standard deviation: σ = {sigma}")
print(f"\nP(70 < X < 80) = {prob:.4f}")
print(f"P(70 < X < 80) = {prob:.2%}")

# Calculate probability of scoring above 90
prob_above_90 = 1 - stats.norm.cdf(90, mu, sigma)
print(f"\nP(X > 90) = {prob_above_90:.4f}")
print(f"P(X > 90) = {prob_above_90:.2%}")

### Example 2: Heights

**Problem:** Adult male heights are normally distributed with mean 70 inches and standard deviation 3 inches. What height corresponds to the 95th percentile?

**Given:**
- μ = 70 inches
- σ = 3 inches
- We want the value x where P(X < x) = 0.95

In [None]:
# Example 2: Heights
print("Example 2: Heights")
print("=" * 40)

mu = 70    # Mean height
sigma = 3  # Standard deviation

# Find 95th percentile
height_95 = stats.norm.ppf(0.95, mu, sigma)

print(f"Mean: μ = {mu} inches")
print(f"Standard deviation: σ = {sigma} inches")
print(f"\n95th percentile: {height_95:.2f} inches")
print(f"\nThis means 95% of men are shorter than {height_95:.2f} inches")

# Verify
prob = stats.norm.cdf(height_95, mu, sigma)
print(f"\nVerification: P(X < {height_95:.2f}) = {prob:.4f}")

In [None]:
# Plot PDF for different parameters
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

x = np.linspace(-10, 10, 1000)

# Different means, same variance
for mu in [-2, 0, 2]:
    pdf = stats.norm.pdf(x, mu, 1)
    ax1.plot(x, pdf, linewidth=2, label=f'μ={mu}, σ=1')

ax1.set_xlabel('x')
ax1.set_ylabel('Probability Density')
ax1.set_title('Normal Distribution: Different Means')
ax1.legend()
ax1.grid(alpha=0.3)

# Same mean, different variances
x2 = np.linspace(-15, 15, 1000)
for sigma in [1, 2, 3]:
    pdf = stats.norm.pdf(x2, 0, sigma)
    ax2.plot(x2, pdf, linewidth=2, label=f'μ=0, σ={sigma}')

ax2.set_xlabel('x')
ax2.set_ylabel('Probability Density')
ax2.set_title('Normal Distribution: Different Standard Deviations')
ax2.legend()
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Simulation: Generate normal distribution samples
print("Simulation: Normal Distribution")
print("=" * 40)

mu = 100
sigma = 15
n_samples = 10000

# Generate samples
samples = np.random.normal(mu, sigma, n_samples)

# Plot histogram and theoretical PDF
fig, ax = plt.subplots(figsize=(10, 6))

# Histogram of samples
ax.hist(samples, bins=50, density=True, alpha=0.7, color='skyblue', 
        edgecolor='black', label='Simulated data')

# Theoretical PDF
x_range = np.linspace(mu - 4*sigma, mu + 4*sigma, 1000)
pdf_theoretical = stats.norm.pdf(x_range, mu, sigma)
ax.plot(x_range, pdf_theoretical, 'r-', linewidth=2, 
        label='Theoretical PDF')

ax.set_xlabel('Value')
ax.set_ylabel('Probability Density')
ax.set_title(f'Normal Distribution: Simulated vs Theoretical\n(μ={mu}, σ={sigma}, n={n_samples})')
ax.legend()
ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Compare statistics
print(f"Simulated mean: {np.mean(samples):.2f}")
print(f"Theoretical mean: {mu}")
print(f"\nSimulated std: {np.std(samples, ddof=1):.2f}")
print(f"Theoretical std: {sigma}")

### The Empirical Rule (68-95-99.7 Rule)

For normal distributions:
- About 68% of data falls within 1 standard deviation of the mean
- About 95% of data falls within 2 standard deviations of the mean
- About 99.7% of data falls within 3 standard deviations of the mean

In [None]:
# Demonstrate empirical rule
print("Empirical Rule Demonstration")
print("=" * 40)

mu = 0
sigma = 1

# Calculate probabilities
prob_1sd = stats.norm.cdf(mu + sigma, mu, sigma) - stats.norm.cdf(mu - sigma, mu, sigma)
prob_2sd = stats.norm.cdf(mu + 2*sigma, mu, sigma) - stats.norm.cdf(mu - 2*sigma, mu, sigma)
prob_3sd = stats.norm.cdf(mu + 3*sigma, mu, sigma) - stats.norm.cdf(mu - 3*sigma, mu, sigma)

print(f"P(μ - σ < X < μ + σ) = {prob_1sd:.4f} ≈ 68%")
print(f"P(μ - 2σ < X < μ + 2σ) = {prob_2sd:.4f} ≈ 95%")
print(f"P(μ - 3σ < X < μ + 3σ) = {prob_3sd:.4f} ≈ 99.7%")

# Visualize empirical rule
fig, ax = plt.subplots(figsize=(12, 6))

x = np.linspace(-4, 4, 1000)
pdf = stats.norm.pdf(x, mu, sigma)

ax.plot(x, pdf, 'k-', linewidth=2)

# Shade regions
x_1sd = x[(x >= -1) & (x <= 1)]
ax.fill_between(x_1sd, stats.norm.pdf(x_1sd, mu, sigma), alpha=0.3, 
                color='blue', label='±1σ (68%)')

x_2sd = x[(x >= -2) & (x <= -1)]
ax.fill_between(x_2sd, stats.norm.pdf(x_2sd, mu, sigma), alpha=0.3, color='green')
x_2sd = x[(x >= 1) & (x <= 2)]
ax.fill_between(x_2sd, stats.norm.pdf(x_2sd, mu, sigma), alpha=0.3, 
                color='green', label='±2σ (95%)')

x_3sd = x[(x >= -3) & (x <= -2)]
ax.fill_between(x_3sd, stats.norm.pdf(x_3sd, mu, sigma), alpha=0.3, color='red')
x_3sd = x[(x >= 2) & (x <= 3)]
ax.fill_between(x_3sd, stats.norm.pdf(x_3sd, mu, sigma), alpha=0.3, 
                color='red', label='±3σ (99.7%)')

# Add vertical lines
for i in range(-3, 4):
    ax.axvline(i, color='gray', linestyle='--', alpha=0.5)
    if i != 0:
        ax.text(i, -0.02, f'{i}σ', ha='center', fontsize=10)

ax.set_xlabel('Standard Deviations from Mean')
ax.set_ylabel('Probability Density')
ax.set_title('Empirical Rule for Normal Distribution')
ax.legend()
ax.grid(alpha=0.3)
ax.set_ylim(bottom=-0.05)

plt.tight_layout()
plt.show()

---

## 5. Lognormal Distribution

### What is the Lognormal Distribution?

A random variable X is lognormally distributed if Y = ln(X) is normally distributed.

In other words:
- If Y ~ Normal(μ, σ²)
- Then X = exp(Y) ~ Lognormal(μ, σ²)

**Key properties:**
- X is always positive (X > 0)
- Distribution is right-skewed
- Useful for modeling quantities that are products of many independent positive factors
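
The link between products and the lognormal shape can be seen in a small simulation. In this sketch each sample is the product of 50 independent factors drawn uniformly from [0.9, 1.1]; the factor range, the counts, the seed, and the `skewness` helper are all assumptions chosen for illustration. Taking logs turns the product into a sum, so the logs should look roughly symmetric while the raw products stay right-skewed.

```python
import math
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable

n_samples = 5000
n_factors = 50

# Each sample is a product of many small positive random factors
samples = []
for _ in range(n_samples):
    product = 1.0
    for _factor in range(n_factors):
        product *= random.uniform(0.9, 1.1)
    samples.append(product)

logs = [math.log(x) for x in samples]

def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

print(f"Mean of products:     {statistics.fmean(samples):.4f}")
print(f"Median of products:   {statistics.median(samples):.4f}")
print(f"Skewness of products: {skewness(samples):.3f}")
print(f"Skewness of logs:     {skewness(logs):.3f}")
```

A mean above the median and a clearly positive skewness for the products, together with near-zero skewness for their logs, is the pattern expected of approximately lognormal data.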

### Parameters and PDF

The lognormal distribution has two parameters:
- μ (mu): mean of the underlying normal distribution
- σ (sigma): standard deviation of the underlying normal distribution

**Probability Density Function:**

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}$$

For x > 0

**Mean of X:** $E[X] = e^{\mu + \sigma^2/2}$

**Variance of X:** $\text{Var}(X) = (e^{\sigma^2} - 1)e^{2\mu + \sigma^2}$
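
The mean and variance formulas can be checked by integrating the PDF numerically. This is a rough sketch with a plain midpoint rule; the parameters μ = 0, σ = 0.5, the upper limit of 30, and the step count are assumptions chosen so that the neglected tail is negligible.

```python
from math import exp, log, pi, sqrt

def lognormal_pdf(x, mu, sigma):
    """Density of Lognormal(mu, sigma^2) at x > 0, straight from the formula."""
    return exp(-((log(x) - mu) ** 2) / (2 * sigma**2)) / (x * sigma * sqrt(2 * pi))

mu, sigma = 0.0, 0.5  # illustrative parameters of the underlying normal

# Midpoint grid on (0, 30]; the probability mass beyond 30 is negligible here
upper = 30.0
n_steps = 30_000
step = upper / n_steps
grid = [(i + 0.5) * step for i in range(n_steps)]
density = [lognormal_pdf(x, mu, sigma) for x in grid]

area = sum(d * step for d in density)
mean_numeric = sum(x * d * step for x, d in zip(grid, density))
var_numeric = sum((x - mean_numeric) ** 2 * d * step for x, d in zip(grid, density))

mean_formula = exp(mu + sigma**2 / 2)
var_formula = (exp(sigma**2) - 1) * exp(2 * mu + sigma**2)

print(f"Area under PDF: {area:.6f}")
print(f"Mean:     numeric {mean_numeric:.4f}, formula {mean_formula:.4f}")
print(f"Variance: numeric {var_numeric:.4f}, formula {var_formula:.4f}")
```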

### Example: Income Distribution

**Problem:** Income in a population follows a lognormal distribution with μ = 10.5 and σ = 0.5 (these are parameters of the underlying normal). What is the median income?

**Note:** For lognormal, the median equals exp(μ)

In [None]:
# Example: Income distribution
print("Example: Income Distribution")
print("=" * 40)

mu = 10.5   # Mean of underlying normal
sigma = 0.5 # Std of underlying normal

# Calculate median (= exp(mu) for lognormal)
median = np.exp(mu)
print(f"Parameters of underlying normal: μ = {mu}, σ = {sigma}")
print(f"\nMedian income: ${median:,.2f}")

# Calculate mean of lognormal
mean = np.exp(mu + sigma**2 / 2)
print(f"Mean income: ${mean:,.2f}")

# Calculate variance
variance = (np.exp(sigma**2) - 1) * np.exp(2*mu + sigma**2)
std = np.sqrt(variance)
print(f"\nStandard deviation: ${std:,.2f}")

# Note: Mean > Median (right-skewed)
print(f"\nNote: Mean (${mean:,.2f}) > Median (${median:,.2f})")
print("This shows the distribution is right-skewed.")

In [None]:
# Plot lognormal PDF for different parameters
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

x = np.linspace(0.01, 10, 1000)

# Different means, same sigma
for mu in [0, 0.5, 1.0]:
    pdf = stats.lognorm.pdf(x, s=0.5, scale=np.exp(mu))
    ax1.plot(x, pdf, linewidth=2, label=f'μ={mu}, σ=0.5')

ax1.set_xlabel('x')
ax1.set_ylabel('Probability Density')
ax1.set_title('Lognormal Distribution: Different μ')
ax1.legend()
ax1.grid(alpha=0.3)

# Same mean, different sigmas
x2 = np.linspace(0.01, 15, 1000)
for sigma in [0.25, 0.5, 1.0]:
    pdf = stats.lognorm.pdf(x2, s=sigma, scale=np.exp(1))
    ax2.plot(x2, pdf, linewidth=2, label=f'μ=1, σ={sigma}')

ax2.set_xlabel('x')
ax2.set_ylabel('Probability Density')
ax2.set_title('Lognormal Distribution: Different σ')
ax2.legend()
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Simulation: Generate lognormal data and compare with theory
print("Simulation: Lognormal Distribution")
print("=" * 40)

mu = 1.0
sigma = 0.5
n_samples = 10000

# Method 1: Generate from normal and exponentiate
normal_samples = np.random.normal(mu, sigma, n_samples)
lognormal_samples = np.exp(normal_samples)

# Method 2: Use numpy's lognormal function (same distribution, independent draws; shown for reference)
lognormal_samples_direct = np.random.lognormal(mu, sigma, n_samples)

print(f"Parameters: μ = {mu}, σ = {sigma}")
print(f"Number of samples: {n_samples}")

# Theoretical values
theoretical_median = np.exp(mu)
theoretical_mean = np.exp(mu + sigma**2 / 2)

print(f"\nTheoretical median: {theoretical_median:.4f}")
print(f"Simulated median: {np.median(lognormal_samples):.4f}")
print(f"\nTheoretical mean: {theoretical_mean:.4f}")
print(f"Simulated mean: {np.mean(lognormal_samples):.4f}")

# Plot histogram with theoretical curve
fig, ax = plt.subplots(figsize=(10, 6))

# Histogram
ax.hist(lognormal_samples, bins=50, density=True, alpha=0.7, 
        color='purple', edgecolor='black', label='Simulated data')

# Theoretical PDF
x_range = np.linspace(0.01, 10, 1000)
pdf_theoretical = stats.lognorm.pdf(x_range, s=sigma, scale=np.exp(mu))
ax.plot(x_range, pdf_theoretical, 'r-', linewidth=2, 
        label='Theoretical PDF')

# Add mean and median lines
ax.axvline(theoretical_mean, color='blue', linestyle='--', linewidth=2, 
           label=f'Mean = {theoretical_mean:.2f}')
ax.axvline(theoretical_median, color='green', linestyle='--', linewidth=2, 
           label=f'Median = {theoretical_median:.2f}')

ax.set_xlabel('Value')
ax.set_ylabel('Probability Density')
ax.set_title(f'Lognormal Distribution: Simulated vs Theoretical\n(μ={mu}, σ={sigma}, n={n_samples})')
ax.legend()
ax.grid(alpha=0.3)
ax.set_xlim(0, 8)

plt.tight_layout()
plt.show()

In [None]:
# Demonstrate the relationship: log(Lognormal) = Normal
print("Relationship between Lognormal and Normal")
print("=" * 40)

mu = 0
sigma = 1
n_samples = 5000

# Generate normal samples
normal_samples = np.random.normal(mu, sigma, n_samples)

# Transform to lognormal
lognormal_samples = np.exp(normal_samples)

# Create subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Plot normal distribution
ax1.hist(normal_samples, bins=40, density=True, alpha=0.7, 
         color='skyblue', edgecolor='black')
x_normal = np.linspace(-4, 4, 1000)
ax1.plot(x_normal, stats.norm.pdf(x_normal, mu, sigma), 'r-', linewidth=2)
ax1.set_xlabel('Y')
ax1.set_ylabel('Probability Density')
ax1.set_title(f'Y ~ Normal(μ={mu}, σ={sigma})')
ax1.grid(alpha=0.3)

# Plot lognormal distribution
ax2.hist(lognormal_samples, bins=40, density=True, alpha=0.7, 
         color='purple', edgecolor='black')
x_lognormal = np.linspace(0.01, 10, 1000)
ax2.plot(x_lognormal, stats.lognorm.pdf(x_lognormal, s=sigma, scale=np.exp(mu)), 
         'r-', linewidth=2)
ax2.set_xlabel('X = exp(Y)')
ax2.set_ylabel('Probability Density')
ax2.set_title(f'X ~ Lognormal(μ={mu}, σ={sigma})')
ax2.grid(alpha=0.3)
ax2.set_xlim(0, 8)

plt.tight_layout()
plt.show()

print("\nKey insight: If Y is normal, then X = exp(Y) is lognormal.")
print("Equivalently: If X is lognormal, then Y = ln(X) is normal.")

---

## Summary

### When to Use Each Distribution

**Binomial Distribution:**
- Fixed number of independent trials
- Each trial has two outcomes (success/failure)
- Constant probability of success
- Examples: coin flips, quality control, survey responses

**Poisson Distribution:**
- Count of events in a fixed interval
- Events occur independently at a constant rate
- Examples: customer arrivals, defects per unit, calls per hour

**Normal Distribution:**
- Continuous data that clusters around a mean
- Symmetric, bell-shaped distribution
- Sums and averages of many independent effects are approximately normal (Central Limit Theorem)
- Examples: heights, test scores, measurement errors

**Lognormal Distribution:**
- Positive continuous data that is right-skewed
- Result of multiplying many small random effects
- Log of the data is normally distributed
- Examples: income, stock prices, file sizes, city populations

### Quick Reference Table

| Distribution | Type | Parameters | Mean | Variance | Support |
|--------------|------|------------|------|----------|----------|
| Binomial | Discrete | n, p | np | np(1-p) | 0, 1, ..., n |
| Poisson | Discrete | λ | λ | λ | 0, 1, 2, ... |
| Normal | Continuous | μ, σ | μ | σ² | (-∞, ∞) |
| Lognormal | Continuous | μ, σ | e^(μ+σ²/2) | (e^(σ²)-1)e^(2μ+σ²) | (0, ∞) |

---

## Practice Problems

### Problem 1: Bayes Theorem
A bag contains 5 red marbles and 3 blue marbles. Another bag contains 2 red marbles and 6 blue marbles. You pick a bag at random and draw a marble. If the marble is red, what is the probability it came from the first bag?

**Answer:** 0.714 (or 71.4%)

### Problem 2: Binomial
A basketball player makes 70% of free throws. What is the probability of making exactly 8 out of 10 free throws?

**Answer:** 0.233 (or 23.3%)

### Problem 3: Poisson
A website receives an average of 6 visitors per minute. What is the probability of receiving exactly 4 visitors in the next minute?

**Answer:** 0.134 (or 13.4%)

### Problem 4: Normal
IQ scores are normally distributed with mean 100 and standard deviation 15. What percentage of people have IQ above 130?

**Answer:** 2.28%

### Problem 5: Lognormal
If ln(X) ~ Normal(2, 0.25²), what is the median of X?

**Answer:** e² ≈ 7.39

In [None]:
# Solutions to practice problems
print("PRACTICE PROBLEM SOLUTIONS")
print("=" * 50)

# Problem 1
print("\nProblem 1: Bayes Theorem")
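p_bag1 = 0.5  # Prior: P(Bag 1)
p_bag2 = 0.5  # Prior: P(Bag 2)
p_red_given_bag1 = 5/8
p_red_given_bag2 = 2/8
# Law of total probability: weight each bag's likelihood by its own prior
p_red = p_red_given_bag1 * p_bag1 + p_red_given_bag2 * p_bag2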
p_bag1_given_red = (p_red_given_bag1 * p_bag1) / p_red
print(f"Answer: {p_bag1_given_red:.3f} or {p_bag1_given_red:.1%}")

# Problem 2
print("\nProblem 2: Binomial")
prob = stats.binom.pmf(8, 10, 0.7)
print(f"Answer: {prob:.3f} or {prob:.1%}")

# Problem 3
print("\nProblem 3: Poisson")
prob = stats.poisson.pmf(4, 6)
print(f"Answer: {prob:.3f} or {prob:.1%}")

# Problem 4
print("\nProblem 4: Normal")
prob = 1 - stats.norm.cdf(130, 100, 15)
print(f"Answer: {prob:.4f} or {prob:.2%}")

# Problem 5
print("\nProblem 5: Lognormal")
median = np.exp(2)
print(f"Answer: {median:.2f}")

---

## Conclusion

Congratulations! You have learned about five fundamental probability concepts:

1. **Bayes Theorem** - updating beliefs with new evidence
2. **Binomial Distribution** - counting successes in fixed trials
3. **Poisson Distribution** - counting events in time/space
4. **Normal Distribution** - the bell curve for continuous data
5. **Lognormal Distribution** - right-skewed positive data

These distributions form the foundation of statistics and data science. Practice applying them to real-world problems to build intuition!