# 🎲 Probability & Distributions: The Foundation of Uncertainty in AI

> *"Probability theory is nothing but common sense reduced to calculation."* - Pierre-Simon Laplace

Welcome to the world of **Probability and Statistics**! This is the mathematical framework that allows AI to reason about uncertainty, make predictions from noisy data, and learn patterns in a world that is not deterministic.

## 🎯 What You'll Master

- **Fundamentals of Probability**: Sample spaces, events, and axioms.
- **Random Variables**: Discrete and continuous variables with their PMFs, PDFs, and CDFs.
- **Key Probability Distributions**: Visualizing and understanding Uniform, Binomial, and the all-important Normal distribution.
- **AI Connections**: How probability underpins machine learning models and algorithms.

## 📚 Import Essential Libraries

Let's load the libraries we'll need for our statistical exploration.

In [None]:
# Core libraries for data manipulation and mathematics
import numpy as np
import pandas as pd
import scipy.stats as stats

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Interactive widgets
import ipywidgets as widgets
from ipywidgets import interact

# Set up plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("colorblind")
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 7)
plt.rcParams['font.size'] = 12

print("🎲 Libraries loaded successfully!")

---

# 🎲 Chapter 1: The Basics of Probability

**Probability** is a measure of the likelihood that an event will occur. It's a number between 0 and 1, where:
- **0** indicates impossibility.
- **1** indicates certainty.

### Key Concepts
- **Experiment**: An action or process with an uncertain outcome (e.g., rolling a die).
- **Sample Space (S)**: The set of all possible outcomes of an experiment (e.g., {1, 2, 3, 4, 5, 6}).
- **Event (E)**: A subset of the sample space (e.g., rolling an even number: {2, 4, 6}).

When every outcome in the sample space is equally likely (as with a fair die), the probability of an event E is:
$$ P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes in the sample space}} $$
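
The counting formula can be checked by brute-force enumeration. The following is a minimal sketch, assuming fair dice so that all outcomes are equally likely; the helper name `probability` is hypothetical and exists only for this illustration. It uses exact fractions so the results are not blurred by floating-point rounding.

```python
from fractions import Fraction
from itertools import product

def probability(event, sample_space):
    """P(E) for equally likely outcomes: favorable outcomes / total outcomes."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

# Single fair die
die = set(range(1, 7))
even = {2, 4, 6}
p_even = probability(even, die)

# Two fair dice: the sample space is all 36 ordered pairs
two_dice = set(product(range(1, 7), repeat=2))
sum_is_seven = {pair for pair in two_dice if sum(pair) == 7}
p_seven = probability(sum_is_seven, two_dice)

# Complement rule: P(not E) = 1 - P(E)
p_not_even = probability(die - even, die)

print(p_even, p_seven, p_not_even)  # prints: 1/2 1/6 1/2
```

The same helper works for any finite sample space, but only while the equal-likelihood assumption holds; a loaded die would need explicit per-outcome probabilities instead of counts.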

In [None]:
def visualize_dice_roll():
    """
    Visualize the sample space and an event for a single die roll.
    """
    sample_space = list(range(1, 7))
    event_even = [x for x in sample_space if x % 2 == 0]
    
    prob_even = len(event_even) / len(sample_space)
    
    fig, ax = plt.subplots(figsize=(10, 5))
    
    # Plot sample space
    ax.bar(sample_space, [1]*len(sample_space), color='lightblue', label='Sample Space (S)')
    
    # Highlight the event
    ax.bar(event_even, [1]*len(event_even), color='salmon', label='Event (E): Roll is Even')
    
    ax.set_yticks([])
    ax.set_xticks(sample_space)
    ax.set_xlabel('Die Roll Outcome', fontsize=12)
    ax.set_title('Visualizing a Simple Probability Experiment', fontsize=14, weight='bold')
    ax.legend()
    
    # Add text annotation
    ax.text(3.5, 0.5, f'P(E) = {len(event_even)} / {len(sample_space)} = {prob_even:.2f}',
            fontsize=14, ha='center', va='center', 
            bbox=dict(boxstyle="round", facecolor='wheat', alpha=0.8))
    
    plt.show()

    print(f"Sample Space (S): {sample_space}")
    print(f"Event (E) - Rolling an even number: {event_even}")
    print(f"Probability of E: {prob_even:.2f}")

visualize_dice_roll()

---

# 🎰 Chapter 2: Random Variables & Probability Distributions

A **Random Variable** assigns a numerical value to each outcome of a random phenomenon.

- **Discrete Random Variable**: Has a countable number of possible values (e.g., number of heads in 3 coin flips).
- **Continuous Random Variable**: Can take any value in a given range (e.g., height of a person).

A **Probability Distribution** describes how the probabilities are distributed over the values of the random variable.
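
To make these definitions concrete, here is a small sketch built on the "number of heads in 3 coin flips" example. It assumes a fair coin, enumerates the sample space, and derives the PMF and CDF by counting. The variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 8 equally likely sequences of 3 fair coin flips
outcomes = list(product("HT", repeat=3))

# The random variable X maps each outcome to a number: the count of heads
counts = Counter(seq.count("H") for seq in outcomes)

# PMF: P(X = k) for each value k that X can take
pmf = {k: Fraction(counts[k], len(outcomes)) for k in sorted(counts)}

# CDF: P(X <= k), the running total of the PMF
cdf = {}
running = Fraction(0)
for k, p_k in pmf.items():
    running += p_k
    cdf[k] = running

for k in pmf:
    print(f"k={k}: P(X=k)={pmf[k]}, P(X<=k)={cdf[k]}")
```

The PMF values sum to 1 and the CDF ends at 1, which is a quick sanity check for any discrete distribution.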

### 1. Binomial Distribution (Discrete)

Models the number of successes in a fixed number of independent trials, each with the same probability of success.
- **Parameters**: `n` (number of trials), `p` (probability of success in one trial).
- **Use Case**: A/B testing conversion rates, number of defective items in a batch.
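
For reference, the Binomial PMF has the closed form

$$ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} $$

The sketch below evaluates it with the standard library alone and checks the well-known results that the mean is `n*p` and the variance is `n*p*(1-p)`. The function name `binomial_pmf` and the values `n=10`, `p=0.3` are arbitrary choices for illustration.

```python
from math import comb, isclose

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative values
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the definition of expectation
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(f"sum of PMF = {sum(pmf):.6f}")
print(f"mean       = {mean:.4f}  (n*p = {n * p:.4f})")
print(f"variance   = {variance:.4f}  (n*p*(1-p) = {n * p * (1 - p):.4f})")
```

In practice `scipy.stats.binom.pmf` is the better choice; writing the formula out by hand is only meant to show what that call computes.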

In [None]:
def plot_binomial_distribution(n=10, p=0.5):
    """
    Plot the Probability Mass Function (PMF) and Cumulative Distribution Function (CDF)
    for a Binomial distribution.
    """
    # x values (number of successes)
    k = np.arange(0, n + 1)
    
    # PMF and CDF
    pmf = stats.binom.pmf(k, n, p)
    cdf = stats.binom.cdf(k, n, p)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # Plot PMF
    ax1.bar(k, pmf, color='skyblue', edgecolor='black')
    ax1.set_title(f'Binomial PMF (n={n}, p={p})', fontsize=14, weight='bold')
    ax1.set_xlabel('Number of Successes (k)')
    ax1.set_ylabel('Probability')
    ax1.grid(True, alpha=0.3)
    
    # Plot CDF
    ax2.step(k, cdf, where='post', color='salmon', linewidth=2)
    ax2.set_title(f'Binomial CDF (n={n}, p={p})', fontsize=14, weight='bold')
    ax2.set_xlabel('Number of Successes (k)')
    ax2.set_ylabel('Cumulative Probability')
    ax2.grid(True, alpha=0.3)
    ax2.set_ylim(0, 1.1)
    
    plt.tight_layout()
    plt.show()
    
    mean, var = stats.binom.stats(n, p, moments='mv')
    print(f"Binomial Distribution (n={n}, p={p}):")
    print(f"  - Expected Value (Mean): {mean:.2f}")
    print(f"  - Variance: {var:.2f}")

# Interactive widget for Binomial distribution
interact(
    plot_binomial_distribution,
    n=widgets.IntSlider(value=20, min=1, max=100, step=1, description='n (trials):'),
    p=widgets.FloatSlider(value=0.5, min=0.0, max=1.0, step=0.01, description='p (prob):')
);

### 2. Normal (Gaussian) Distribution (Continuous)

The most famous distribution, characterized by its bell shape. It's central to statistics due to the **Central Limit Theorem**.
- **Parameters**: `μ` (mean), `σ` (standard deviation).
- **Use Case**: Modeling natural phenomena (heights, weights), errors in measurements, initializing neural network weights.
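
The Normal density is

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$

A minimal sketch of the PDF and CDF using only the standard library follows. The function names and the values `mu=100`, `sigma=15` are assumptions chosen for illustration. It also highlights a common point of confusion: a density value is not a probability and can exceed 1, while probabilities come from differences of the CDF.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), written with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 100.0, 15.0  # illustrative values

# Probability of an interval = difference of CDF values
within_1_sigma = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)

# A density is not a probability: with a small sigma it can exceed 1
peak_density_narrow = normal_pdf(0.0, mu=0.0, sigma=0.1)

print(f"P(mu - sigma < X < mu + sigma) = {within_1_sigma:.4f}")
print(f"Peak density for sigma = 0.1: {peak_density_narrow:.3f}")
```

The one-sigma probability does not depend on the particular `mu` and `sigma`, which is why the 68-95-99.7 rule applies to every Normal distribution.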

In [None]:
def plot_normal_distribution(mu=0, sigma=1):
    """
    Plot the Probability Density Function (PDF) and CDF for a Normal distribution.
    """
    # x values
    x = np.linspace(mu - 4*sigma, mu + 4*sigma, 1000)
    
    # PDF and CDF
    pdf = stats.norm.pdf(x, loc=mu, scale=sigma)
    cdf = stats.norm.cdf(x, loc=mu, scale=sigma)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # Plot PDF
    ax1.plot(x, pdf, color='limegreen', linewidth=2)
    ax1.fill_between(x, pdf, color='limegreen', alpha=0.2)
    ax1.set_title(f'Normal PDF (μ={mu}, σ={sigma})', fontsize=14, weight='bold')
    ax1.set_xlabel('Value (x)')
    ax1.set_ylabel('Probability Density')
    ax1.grid(True, alpha=0.3)
    
    # Add standard deviation lines
    for i in range(1, 4):
        ax1.axvline(mu + i*sigma, color='red', linestyle='--', alpha=0.5, label='±1σ, ±2σ, ±3σ' if i==1 else None)
        ax1.axvline(mu - i*sigma, color='red', linestyle='--', alpha=0.5)
    ax1.legend()

    # Plot CDF
    ax2.plot(x, cdf, color='darkorange', linewidth=2)
    ax2.set_title(f'Normal CDF (μ={mu}, σ={sigma})', fontsize=14, weight='bold')
    ax2.set_xlabel('Value (x)')
    ax2.set_ylabel('Cumulative Probability')
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Empirical Rule (68-95-99.7)
    p_1_sigma = stats.norm.cdf(mu + sigma, mu, sigma) - stats.norm.cdf(mu - sigma, mu, sigma)
    p_2_sigma = stats.norm.cdf(mu + 2*sigma, mu, sigma) - stats.norm.cdf(mu - 2*sigma, mu, sigma)
    p_3_sigma = stats.norm.cdf(mu + 3*sigma, mu, sigma) - stats.norm.cdf(mu - 3*sigma, mu, sigma)
    
    print(f"Normal Distribution (μ={mu}, σ={sigma}):")
    print(f"  - P(μ-σ < X < μ+σ) ≈ {p_1_sigma:.3f} (68% rule)")
    print(f"  - P(μ-2σ < X < μ+2σ) ≈ {p_2_sigma:.3f} (95% rule)")
    print(f"  - P(μ-3σ < X < μ+3σ) ≈ {p_3_sigma:.3f} (99.7% rule)")

# Interactive widget for Normal distribution
interact(
    plot_normal_distribution,
    mu=widgets.FloatSlider(value=0, min=-5, max=5, step=0.1, description='μ (mean):'),
    sigma=widgets.FloatSlider(value=1, min=0.1, max=5, step=0.1, description='σ (std dev):')
);

---

# 🧠 Chapter 3: The Central Limit Theorem (CLT)

The **Central Limit Theorem** is a cornerstone of statistics. It states that the distribution of the **sum (or average) of a large number of independent, identically distributed random variables** with finite variance is approximately normal, regardless of the shape of the underlying distribution.

**Why is this so important for AI?**
- It justifies the use of normal distributions to model aggregate effects.
- It explains why many real-world phenomena appear to be normally distributed.
- It's the foundation for many statistical tests and confidence intervals.
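
As a numeric check that needs no plotting, the sketch below draws from a strongly skewed Exponential distribution (an assumption made here for illustration) and compares the spread of the sample means against the CLT prediction of σ/√n. The `skewness` helper is hypothetical and written out by hand to keep the example dependent on NumPy only.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_samples, sample_size = 20_000, 50
scale = 2.0  # Exponential(scale): mean = scale, std = scale

# Each row is one sample; average across columns to get one mean per sample
samples = rng.exponential(scale=scale, size=(n_samples, sample_size))
sample_means = samples.mean(axis=1)

predicted_std = scale / np.sqrt(sample_size)  # sigma / sqrt(n)

def skewness(values):
    """Third standardized moment; 0 for a symmetric distribution."""
    centered = values - values.mean()
    return (centered**3).mean() / values.std() ** 3

print(f"mean of sample means: {sample_means.mean():.3f} (expected {scale})")
print(f"std of sample means:  {sample_means.std():.3f} (predicted {predicted_std:.3f})")
print(f"skewness of raw draws:    {skewness(samples.ravel()):.2f}")
print(f"skewness of sample means: {skewness(sample_means):.2f}")
```

The raw draws are heavily right-skewed, while the sample means are much closer to symmetric. Increasing `sample_size` should shrink the remaining skewness further, which is the CLT at work.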

In [None]:
def visualize_central_limit_theorem(n_samples=1000, sample_size=30):
    """
    Demonstrate the Central Limit Theorem using a Uniform distribution.
    """
    # We'll sample from a Uniform distribution, which is decidedly not normal.
    underlying_dist = stats.uniform(loc=0, scale=10)
    
    # Generate many sample means (one row per sample, averaged across columns)
    sample_means = underlying_dist.rvs(size=(n_samples, sample_size)).mean(axis=1)
        
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # Plot the underlying distribution
    x = np.linspace(-2, 12, 200)
    ax1.plot(x, underlying_dist.pdf(x), 'b-', lw=2, label='Uniform PDF')
    ax1.set_title('Underlying Distribution (Uniform)', fontsize=14, weight='bold')
    ax1.set_xlabel('Value')
    ax1.set_ylabel('Density')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Plot the distribution of sample means
    sns.histplot(sample_means, kde=True, ax=ax2, color='purple', stat='density')
    
    # Overlay the theoretical normal distribution
    mu = underlying_dist.mean()
    sigma = underlying_dist.std() / np.sqrt(sample_size)
    x_norm = np.linspace(min(sample_means), max(sample_means), 100)
    ax2.plot(x_norm, stats.norm.pdf(x_norm, mu, sigma), 'r--', lw=2, label='Theoretical Normal PDF')
    
    ax2.set_title(f'Distribution of Sample Means (n={sample_size})', fontsize=14, weight='bold')
    ax2.set_xlabel('Sample Mean')
    ax2.set_ylabel('Density')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.suptitle('Central Limit Theorem in Action', fontsize=16, weight='bold')
    plt.tight_layout()
    plt.show()
    
    print(f"Demonstrating CLT with {n_samples} samples of size {sample_size}:")
    print(f"  - Underlying Mean: {underlying_dist.mean():.2f}")
    print(f"  - Mean of Sample Means: {np.mean(sample_means):.2f}")
    print(f"  - Underlying Std Dev: {underlying_dist.std():.2f}")
    print(f"  - Std Dev of Sample Means (Theoretical): {sigma:.2f}")
    print(f"  - Std Dev of Sample Means (Observed): {np.std(sample_means):.2f}")

# Interactive widget for CLT
interact(
    visualize_central_limit_theorem,
    n_samples=widgets.IntSlider(value=1000, min=100, max=10000, step=100, description='Num Samples:'),
    sample_size=widgets.IntSlider(value=30, min=1, max=500, step=1, description='Sample Size:')
);

---

# 🎯 Key Takeaways

## 🎲 Probability Basics
- **Language of Uncertainty**: Provides a formal way to reason about randomness.
- **Foundation**: Built on sample spaces, events, and axioms.

## 🎰 Probability Distributions
- **Models for Data**: Describe the likelihood of different outcomes for a random variable.
- **Discrete vs. Continuous**: Binomial for counts, Normal for measurements.
- **Key Functions**: PMF for the probability of each discrete value, PDF for density over continuous values, CDF for cumulative probability.

## 🔔 Central Limit Theorem
- **The Star of Statistics**: The distribution of sample means approaches a normal distribution as the sample size grows, even if the source isn't normal.
- **Practical Implications**: Justifies using normal-based models and tests in many real-world scenarios.

## 🧠 AI Connections
- **Generative Models**: Learn and sample from complex data distributions (e.g., GANs, VAEs).
- **Probabilistic Models**: Explicitly model uncertainty (e.g., Bayesian networks, Gaussian Mixture Models).
- **Optimization**: Stochastic Gradient Descent relies on random sampling of data.

---

# 🚀 What's Next?

In the next notebook, we'll move from theory to practice with **Descriptive Statistics**. We'll learn how to summarize, visualize, and interpret datasets using key statistical measures.

- **Measures of Central Tendency**: Mean, Median, Mode
- **Measures of Dispersion**: Variance, Standard Deviation, Range
- **Correlation and Covariance**: Understanding relationships between variables

**Ready to describe your data? Let's go! 📊**