# 02: Probability & Statistics
Welcome to the world of uncertainty! Let’s build the probabilistic foundation used by ML models to reason under noise and variation.

## 🎯 Objectives
- Understand basic probability and conditional probability concepts
- Simulate and visualize common distributions (Normal, Binomial)
- Explore Bayes' theorem with real-world intuition and applications
- Build skills in descriptive statistics and random variable simulation
- Connect probability concepts to machine learning applications

> 💡 **Companion Reading**: This notebook pairs with [02_probability_statistics.md](02_probability_statistics.md) for deeper mathematical insights, analogies, and tutor guidance.

## 🎲 Basic Probability Concepts

Probability measures uncertainty. It ranges from 0 (impossible) to 1 (certain).

Key concepts:
- **Random Variable**: A numerical quantity whose value depends on the outcome of a random process (e.g., the number shown on a rolled die)
- **Sample Space**: The set of all possible outcomes
- **Event**: A subset of the sample space
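Here is a minimal sketch of the three terms for one roll of a fair six-sided die. It assumes all outcomes are equally likely, so P(event) = |event| / |sample space|. The names `sample_space`, `is_even`, `prob`, and `X` are illustrative only and do not come from any library.

```python
from fractions import Fraction

# Sample space: every possible outcome of one roll of a fair die
sample_space = {1, 2, 3, 4, 5, 6}

# Event: a subset of the sample space (here, "the roll is even")
is_even = {x for x in sample_space if x % 2 == 0}

def prob(event, space=sample_space):
    """P(event) under equally likely outcomes: favorable / total."""
    return Fraction(len(event & space), len(space))

# Random variable: a rule that maps each outcome to a number.
# Here X(outcome) = 1 if the roll is even, else 0 (an indicator variable).
X = {outcome: int(outcome in is_even) for outcome in sample_space}

print("P(even) =", prob(is_even))           # 1/2
print("P(impossible) =", prob(set()))       # 0
print("P(certain) =", prob(sample_space))   # 1
print("X =", X)
```

The empty event and the full sample space give the two endpoints of the 0-to-1 scale.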


```python
import numpy as np

# Simulate 10 flips of a fair coin
np.random.seed(0)
flips = np.random.choice(['H', 'T'], size=10)
print("First 10 coin flips:", flips)

# Watch the observed frequency of heads settle down as the number of flips grows
flip_counts = [10, 100, 1000, 10000]
for n in flip_counts:
    flips = np.random.choice(['H', 'T'], size=n)
    heads_freq = np.mean(flips == 'H')
    print(f"With {n:5d} flips: fraction of heads = {heads_freq:.3f}")

print("\nAs the number of flips grows, the fraction of heads approaches the true P(Heads) = 0.5 (Law of Large Numbers)")

# Conditional probability example
# P(A|B) = P(A and B) / P(B)
print("\n--- Conditional Probability Example ---")
print("Rolling a fair die: what is P(even | greater than 3)?")
print("Sample space: {1, 2, 3, 4, 5, 6}")
print("Event B (greater than 3): {4, 5, 6}")
print("Event A (even): {2, 4, 6}")
print("A and B: {4, 6}")
print("P(even | greater than 3) = P(A and B) / P(B) = (2/6) / (3/6) = 2/3 ≈ 0.667")
```
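The die example above works the answer out by hand. The sketch below derives it instead. It counts outcomes to apply P(A|B) = P(A and B) / P(B), then checks the result by simulation. It assumes a fair die. The seed and the number of rolls are arbitrary choices.

```python
from fractions import Fraction
import numpy as np

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}          # event A: the roll is even
B = {4, 5, 6}          # event B: the roll is greater than 3

def p(event):
    # Equally likely outcomes: favorable outcomes over total outcomes
    return Fraction(len(event), len(sample_space))

# Exact answer from the definition of conditional probability
p_a_given_b = p(A & B) / p(B)
print("Exact P(even | > 3) =", p_a_given_b)   # 2/3

# Simulation check: keep only rolls where B happened, then measure how often A happened
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)       # integers from 1 to 6 inclusive
given_b = rolls[rolls > 3]
estimate = np.mean(given_b % 2 == 0)
print(f"Simulated P(even | > 3) ≈ {estimate:.3f}")
```

Filtering to the rolls where B occurred is what "conditioning on B" means in a simulation: B becomes the new sample space.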

## 📊 Visualizing Distributions

Distributions describe how values are spread. Two key types:
- **Discrete**: Countable outcomes (e.g., coin flips, dice rolls)
- **Continuous**: Infinite possible values (e.g., height, temperature)
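The discrete/continuous split changes how probabilities are stored. A discrete distribution has a probability mass function (PMF), whose values are probabilities and sum to 1. A continuous distribution has a probability density function (PDF), which integrates to 1. Its values are densities, not probabilities, so they can exceed 1. This sketch checks both facts with `scipy.stats` and `scipy.integrate.quad`. The parameters (n=10, p=0.3, σ=0.1) are arbitrary illustrations.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import binom, norm

# Discrete: PMF values are probabilities of individual outcomes and sum to 1
k = np.arange(0, 11)                       # every possible success count for n = 10
pmf_total = binom.pmf(k, 10, 0.3).sum()
print(f"Sum of Binomial(10, 0.3) PMF: {pmf_total:.6f}")

# Continuous: the PDF integrates to 1 (±1 is ten standard deviations, so the
# integral over [-1, 1] captures essentially all of the probability mass)
narrow = norm(loc=0, scale=0.1)
area, _ = quad(narrow.pdf, -1, 1)
print(f"Integral of Normal(μ=0, σ=0.1) PDF over [-1, 1]: {area:.6f}")

# A single density value is not a probability and can exceed 1
print(f"Density at 0: {narrow.pdf(0):.3f}")     # about 3.989

# Probabilities for continuous variables are areas over intervals
p_interval = narrow.cdf(0.1) - narrow.cdf(-0.1)
print(f"P(-0.1 < X < 0.1) = {p_interval:.3f}")  # about 0.683
```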


```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, binom

# Normal distribution (continuous): standard normal with mean 0, std 1
x = np.linspace(-4, 4, 100)
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(x, norm.pdf(x), 'b-', label='PDF')
plt.fill_between(x, norm.pdf(x), alpha=0.3)
plt.title("Normal Distribution (Continuous)")
plt.xlabel("Value")
plt.ylabel("Probability Density")
plt.grid(True)
plt.legend()

# Binomial distribution (discrete): number of successes in n independent trials
n_trials = 10
p_success = 0.3
x_binom = np.arange(0, n_trials + 1)
plt.subplot(1, 2, 2)
plt.bar(x_binom, binom.pmf(x_binom, n_trials, p_success), alpha=0.7, color='red')
plt.title(f"Binomial Distribution (n={n_trials}, p={p_success})")
plt.xlabel("Number of Successes")
plt.ylabel("Probability")
plt.grid(True)

plt.tight_layout()
plt.show()

# Key differences
print("Key Differences:")
print("Normal Distribution:")
print("- Continuous (any real value; probabilities come from areas under the curve)")
print("- Bell-shaped, symmetric about the mean")
print("- Described by mean (μ) and standard deviation (σ)")
print("- Total area under the curve = 1")

print("\nBinomial Distribution:")
print("- Discrete (outcomes 0, 1, ..., n)")
print("- Models the number of successes in n independent trials with the same success probability")
print("- Described by n (trials) and p (success probability)")
print("- Sum of all probabilities = 1")

# Expected values: E[X] = μ for the Normal, E[X] = n * p for the Binomial
print(f"\nExpected value of Normal(0,1): {norm.mean()}")
print(f"Expected value of Binomial({n_trials},{p_success}): {binom.mean(n_trials, p_success)}")
```
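You can recover the values printed by `norm.mean()` and `binom.mean()` from the definition of expected value. For a discrete variable, E[X] = Σ k·P(X = k), and for the Binomial this equals n·p. The sketch below reuses n=10 and p=0.3 and adds a simulated average as a Law of Large Numbers check. The seed and sample size are arbitrary.

```python
import numpy as np
from scipy.stats import binom

n_trials, p_success = 10, 0.3

# Definition of expectation for a discrete variable: sum of value × probability
k = np.arange(0, n_trials + 1)
expected_from_pmf = np.sum(k * binom.pmf(k, n_trials, p_success))

# Closed form for the Binomial: E[X] = n * p
expected_closed_form = n_trials * p_success

# Law of Large Numbers check: the average of many draws approaches E[X]
rng = np.random.default_rng(0)
draws = rng.binomial(n_trials, p_success, size=100_000)

print(f"Sum of k * P(X=k): {expected_from_pmf:.4f}")
print(f"n * p:             {expected_closed_form:.4f}")
print(f"Sample mean:       {draws.mean():.4f}")
```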

## 📐 Bayes’ Theorem
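Bayes' theorem turns the likelihood of the evidence given a hypothesis into the posterior probability of the hypothesis given the evidence. For the screening example in the cell below, let D mean "has the disease" and + mean "tests positive". The denominator comes from the law of total probability:

$$
P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+)}, \qquad P(+) = P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)
$$

With the numbers used in that cell:

$$
P(+) = 0.95 \times 0.01 + 0.05 \times 0.99 = 0.0095 + 0.0495 = 0.059, \qquad P(D \mid +) = \frac{0.0095}{0.059} \approx 0.161
$$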

```python
# Bayes' Theorem example: a diagnostic test for a rare disease
# Question: P(Disease | Positive Test) = ?

# Prior: 1% of the population has the disease
P_disease = 0.01
P_no_disease = 1 - P_disease

# Likelihoods: sensitivity (true-positive rate) and false-positive rate
P_pos_given_disease = 0.95
P_pos_given_no_disease = 0.05

# Total probability of a positive test (law of total probability)
P_pos = P_disease * P_pos_given_disease + P_no_disease * P_pos_given_no_disease

# Posterior via Bayes' theorem
P_disease_given_pos = (P_disease * P_pos_given_disease) / P_pos
print(f"P(Disease | Positive Test) = {P_disease_given_pos:.3f}")
```
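The posterior is only about 16% even though the test catches 95% of cases. A simulation of a hypothetical population shows why: most positives come from the much larger healthy group. This sketch assumes each person's disease status and test result are drawn independently with the probabilities above. The population size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
population = 1_000_000

# Prior: about 1% of people have the disease
has_disease = rng.random(population) < 0.01

# Likelihoods: 95% true-positive rate if sick, 5% false-positive rate if healthy
p_positive = np.where(has_disease, 0.95, 0.05)
tests_positive = rng.random(population) < p_positive

true_pos = np.sum(has_disease & tests_positive)
false_pos = np.sum(~has_disease & tests_positive)
simulated_posterior = true_pos / (true_pos + false_pos)

print(f"True positives:  {true_pos}")
print(f"False positives: {false_pos}")
print(f"Simulated P(Disease | Positive) ≈ {simulated_posterior:.3f}")
```

There are roughly five false positives for every true positive. The 1% prior outweighs the accuracy of the test.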

## ✅ Summary Quiz

### Enhanced Quiz Questions with Answers
1. **What does the area under a probability density function represent?**
   > The total area under a PDF equals 1. The area over a specific interval is the probability that the random variable falls in that interval. The height of the curve at a single point is a density, not a probability.

2. **How is a Binomial distribution different from a Normal distribution?**
   > Binomial is discrete: it counts the successes in a fixed number n of independent success/failure trials, so it takes only the values 0, 1, ..., n. Normal is continuous: it can take any real value and models bell-shaped quantities such as measurement noise.

3. **Why does Bayes' Theorem require a denominator P(B)?**
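   > P(B) normalizes the result so that the posterior probabilities over all hypotheses sum to 1. Without it, P(B|A)·P(A) is only the joint probability P(A and B), not a conditional probability. P(B) is usually computed with the law of total probability, as in the disease example: P(+) = P(+|D)P(D) + P(+|¬D)P(¬D).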

4. **What's the expected value of a fair six-sided die?**
   > (1+2+3+4+5+6)/6 = 3.5
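Question 4 applies the definition E[X] = Σ x·P(X = x), with P(X = x) = 1/6 for each face. The sketch below computes it exactly with `fractions.Fraction` and compares it with the average of simulated rolls. The seed and roll count are arbitrary.

```python
from fractions import Fraction
import numpy as np

faces = range(1, 7)

# Exact expectation: each face contributes x * (1/6)
expected = sum(Fraction(x, 6) for x in faces)
print("Exact E[X] =", expected, "=", float(expected))   # Exact E[X] = 7/2 = 3.5

# Simulated average of many fair rolls
rng = np.random.default_rng(1)
rolls = rng.integers(1, 7, size=100_000)
print(f"Average of 100,000 rolls ≈ {rolls.mean():.3f}")
```

The expected value 3.5 is not itself a possible roll. It is the long-run average.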

### Self-Assessment Checklist
Check off each item as you master it:

- [ ] I understand how to simulate a random process
- [ ] I can differentiate between discrete and continuous distributions
- [ ] I can calculate and interpret conditional probability
- [ ] I understand how Bayes' Theorem updates belief with evidence
- [ ] I can read and interpret probability distribution plots
- [ ] I understand the Law of Large Numbers
- [ ] I can explain why rare diseases have low posterior probabilities even with accurate tests

### 🔗 Next Steps
- Review the [companion theory file](02_probability_statistics.md) for deeper mathematical insights
- Practice with different probability scenarios
- Think about how uncertainty quantification applies to machine learning

### 💡 Key Takeaways
- **Probability**: Measures uncertainty (0 to 1)
- **Distributions**: Describe how values are spread
- **Bayes' Theorem**: Updates beliefs with new evidence
- **Conditional Probability**: P(A|B) ≠ P(A) when B carries information about A (the events are dependent)
- **ML Connection**: Uncertainty is everywhere in machine learning!
