# 🔎 Inferential Statistics: Making Conclusions from Data

> *"Essentially, all models are wrong, but some are useful."* - George Box

Welcome to **Inferential Statistics**! While descriptive statistics summarizes a given dataset, inferential statistics uses that data to make predictions, decisions, and generalizations about a larger population. This is where we move from *what the data is* to *what the data means*.

## 🎯 What You'll Master

- **Hypothesis Testing**: The formal framework for making decisions from data (e.g., is this new drug effective?).
- **p-values**: Understanding and correctly interpreting this often-misunderstood metric.
- **Confidence Intervals**: Estimating a population parameter with a range of plausible values.
- **t-tests**: A practical tool for comparing the means of two groups.

## 📚 Import Essential Libraries

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import scipy.stats as stats

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Set up plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("muted")
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 7)
plt.rcParams['font.size'] = 13

print("🔎 Libraries loaded for inferential statistics!")

---

# 🔬 Chapter 1: Hypothesis Testing

**Hypothesis testing** is a formal procedure for checking if a hypothesis is supported by the data. It's like a mathematical version of the legal principle "innocent until proven guilty."

### The Two Hypotheses
1. **Null Hypothesis (H₀)**: The default assumption, a statement of no effect or no difference. (The defendant is innocent.)
2. **Alternative Hypothesis (H₁ or Ha)**: The claim we want to test. It's what we'll accept if we have enough evidence to reject H₀. (The defendant is guilty.)

### The Process
1. State H₀ and H₁.
2. Choose a significance level (α), usually 0.05. This is the probability of rejecting H₀ when it's actually true (a "false positive").
3. Collect data and calculate a test statistic.
4. Calculate the **p-value**: The probability of observing data as extreme as, or more extreme than, what you collected, *assuming the null hypothesis is true*.
5. Make a decision:
   - If **p-value < α**, we **reject the null hypothesis**. We have statistically significant evidence for the alternative.
   - If **p-value ≥ α**, we **fail to reject the null hypothesis**. We do not have enough evidence to support the alternative.
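
The five steps above can be sketched in a few lines. This is only an illustration: the measurements and the hypothesized mean of 100 are made up, and a one-sample t-test (`scipy.stats.ttest_1samp`) stands in for whichever test fits your data.

```python
import numpy as np
from scipy import stats

# Step 1: H0: population mean = 100, H1: population mean != 100 (hypothetical values)
hypothesized_mean = 100

# Step 2: choose the significance level
alpha = 0.05

# Step 3: collect data (made-up measurements) and compute the test statistic
sample = np.array([102.1, 98.4, 105.3, 101.7, 99.8, 104.2, 103.5, 100.9])
t_statistic, p_value = stats.ttest_1samp(sample, popmean=hypothesized_mean)

# Step 4: SciPy returns the two-sided p-value alongside the statistic
print(f"t = {t_statistic:.3f}, p-value = {p_value:.4f}")

# Step 5: compare the p-value with alpha and decide
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"Decision at alpha = {alpha}: {decision}")
```

Note that α is fixed in step 2, before looking at the data; choosing it after seeing the p-value would defeat its purpose as a false-positive rate.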

### Example: A/B Testing a Website

Let's simulate an A/B test. A company wants to know if changing the color of a "Buy Now" button from blue (Group A) to green (Group B) increases the click-through rate (CTR).

- **H₀**: The CTR of the green button is the same as the blue button.
- **H₁**: The CTR of the green button is different from the blue button.

In [None]:
def simulate_ab_test():
    """
    Simulate and analyze a simple A/B test for website button clicks.
    """
    np.random.seed(42)
    
    # Parameters
    n_A = 1000  # Visitors for blue button
    n_B = 1000  # Visitors for green button
    ctr_A_true = 0.10  # True CTR for blue
    ctr_B_true = 0.12  # True CTR for green (a 2 percentage point improvement)
    
    # Simulate clicks (as a binomial process)
    clicks_A = np.random.binomial(n_A, ctr_A_true)
    clicks_B = np.random.binomial(n_B, ctr_B_true)
    
    # Observed CTRs
    ctr_A_obs = clicks_A / n_A
    ctr_B_obs = clicks_B / n_B
    
    # Perform the hypothesis test (chi-squared test for proportions)
    contingency_table = np.array([
        [clicks_A, n_A - clicks_A], # Clicks, No-clicks for A
        [clicks_B, n_B - clicks_B]  # Clicks, No-clicks for B
    ])
    
    chi2, p_value, _, _ = stats.chi2_contingency(contingency_table, correction=False)
    
    # Visualization
    fig, ax = plt.subplots(figsize=(10, 6))
    # Plain Matplotlib bars avoid seaborn's deprecation warning for `palette` without `hue`
    ax.bar(['Blue Button (A)', 'Green Button (B)'], [ctr_A_obs, ctr_B_obs], color=['cornflowerblue', 'mediumseagreen'])
    ax.set_ylabel('Observed Click-Through Rate (CTR)')
    ax.set_title('A/B Test Results', fontsize=16, weight='bold')
    ax.set_ylim(0, max(ctr_A_obs, ctr_B_obs) * 1.2)
    for i, v in enumerate([ctr_A_obs, ctr_B_obs]):
        ax.text(i, v + 0.005, f'{v:.3f}', ha='center', fontsize=12)
    
    plt.show()
    
    # Print results
    print("--- A/B Test Analysis ---")
    print(f"Group A (Blue): {clicks_A} clicks out of {n_A} visitors -> CTR = {ctr_A_obs:.3f}")
    print(f"Group B (Green): {clicks_B} clicks out of {n_B} visitors -> CTR = {ctr_B_obs:.3f}")
    print(f"\nChi-Squared Test Statistic: {chi2:.4f}")
    print(f"p-value: {p_value:.4f}")
    
    # Conclusion
    alpha = 0.05
    print(f"\nSignificance Level (α): {alpha}")
    if p_value < alpha:
        print(f"✅ Decision: Reject the null hypothesis (p-value < {alpha}).")
        print("   Conclusion: There is a statistically significant difference in CTR between the blue and green buttons.")
    else:
        print(f"❌ Decision: Fail to reject the null hypothesis (p-value ≥ {alpha}).")
        print("   Conclusion: We do not have enough evidence to say there's a difference in CTR.")

simulate_ab_test()
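
For a 2×2 table, the chi-squared test without continuity correction gives the same p-value as a two-sided, pooled two-proportion z-test, and the chi-squared statistic equals z². The sketch below checks this numerically. The click counts are hypothetical round numbers chosen for illustration, not the values produced by the simulation.

```python
import numpy as np
from scipy import stats

# Hypothetical counts, for illustration only
clicks_A, n_A = 100, 1000
clicks_B, n_B = 125, 1000

p_A, p_B = clicks_A / n_A, clicks_B / n_B

# Pooled click rate under H0 (both buttons share one CTR)
p_pool = (clicks_A + clicks_B) / (n_A + n_B)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_A + 1 / n_B))

z = (p_B - p_A) / se
p_value_z = 2 * stats.norm.sf(abs(z))  # two-tailed p-value

# Same data through the chi-squared test without continuity correction
table = np.array([[clicks_A, n_A - clicks_A],
                  [clicks_B, n_B - clicks_B]])
chi2, p_value_chi2, _, _ = stats.chi2_contingency(table, correction=False)

print(f"z = {z:.4f}, z^2 = {z**2:.4f}, chi2 = {chi2:.4f}")
print(f"p-value (z-test) = {p_value_z:.4f}, p-value (chi2) = {p_value_chi2:.4f}")
```

The z-test form has one practical advantage: the sign of `z` shows the direction of the difference, which the chi-squared statistic discards.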

---

# 📊 Chapter 2: p-values and Confidence Intervals

### Understanding the p-value

The **p-value** is one of the most misinterpreted concepts in statistics. 

**What it IS**: The probability of getting your observed result (or something more extreme) if the null hypothesis were true.

**What it is NOT**:
- It is NOT the probability that the null hypothesis is true.
- It is NOT the probability that the alternative hypothesis is false.
- A small p-value does NOT prove your alternative hypothesis is true; it only provides evidence against the null.

Let's visualize what a p-value represents.

In [None]:
def visualize_p_value():
    """
    Visualize the concept of a p-value under a null distribution.
    """
    # Assume H0 is true: mean = 0. We are using a standard normal distribution.
    mu_h0 = 0
    sigma_h0 = 1
    
    # Let's say we observed a test statistic of 1.96 (e.g., from a z-test)
    observed_statistic = 1.96
    
    # Calculate the p-value for a two-tailed test
    p_value = 2 * (1 - stats.norm.cdf(observed_statistic, loc=mu_h0, scale=sigma_h0))
    
    x = np.linspace(-4, 4, 1000)
    y = stats.norm.pdf(x, loc=mu_h0, scale=sigma_h0)
    
    plt.figure(figsize=(12, 7))
    plt.plot(x, y, 'b-', label='Null Hypothesis Distribution (H₀)')
    
    # Shade the area for the p-value
    x_fill = np.linspace(observed_statistic, 4, 100)
    plt.fill_between(x_fill, stats.norm.pdf(x_fill, mu_h0, sigma_h0), color='red', alpha=0.5, label='p-value area (one tail)')
    x_fill_neg = np.linspace(-4, -observed_statistic, 100)
    plt.fill_between(x_fill_neg, stats.norm.pdf(x_fill_neg, mu_h0, sigma_h0), color='red', alpha=0.5, label='p-value area (other tail)')
    
    plt.axvline(observed_statistic, color='red', linestyle='--', lw=2, label=f'Observed Statistic = {observed_statistic}')
    plt.axvline(-observed_statistic, color='red', linestyle='--', lw=2)
    
    plt.title('Visualizing a p-value for a Two-Tailed Test', fontsize=16, weight='bold')
    plt.xlabel('Test Statistic Value')
    plt.ylabel('Probability Density')
    plt.legend()
    plt.text(0, 0.1, f'p-value = {p_value:.3f}', ha='center', fontsize=14, bbox=dict(facecolor='white', alpha=0.8))
    plt.show()
    
    print("💡 The p-value is the total shaded red area.")
    print("It represents the probability of seeing a result as extreme as (or more extreme than) our observed statistic, assuming H₀ is true.")

visualize_p_value()
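
A simulation can make the "assuming H₀ is true" part concrete. In this sketch (the seed, group size, and number of experiments are arbitrary choices), both groups are always drawn from the same distribution, so every "significant" result is a false positive. The share of p-values below α should land close to α itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments = 2000

# Both groups come from the SAME distribution, so H0 is true in every experiment
group_1 = rng.normal(loc=0, scale=1, size=(n_experiments, 30))
group_2 = rng.normal(loc=0, scale=1, size=(n_experiments, 30))

# One independent t-test per row (axis=1 runs all experiments at once)
_, p_values = stats.ttest_ind(group_1, group_2, axis=1)

false_positive_rate = np.mean(p_values < alpha)
print(f"Share of experiments with p < {alpha}: {false_positive_rate:.3f}")
```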

### Confidence Intervals

A **Confidence Interval (CI)** provides a range of plausible values for an unknown population parameter (like the mean or proportion). It complements a point estimate by showing how uncertain that estimate is.

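A **95% confidence interval** means that if we were to repeat our sampling process many times, about 95% of the calculated confidence intervals would contain the true population parameter. The 95% describes the long-run behavior of the procedure, not the probability that one particular interval contains the parameter.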

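**Formula for a mean**:  `CI = sample_mean ± margin_of_error`, where `margin_of_error = t_critical × (sample_std / √n)` when the population standard deviation is unknown.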
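
The formula can be worked through by hand on a small sample. The eight values below are made up for illustration. The margin of error is taken as the critical t value times the standard error of the mean, and the result is cross-checked against `stats.t.interval`.

```python
import numpy as np
from scipy import stats

# Made-up sample, for illustration only
sample = np.array([48.2, 52.7, 49.9, 55.1, 47.3, 51.0, 53.4, 50.6])
confidence_level = 0.95

n = len(sample)
sample_mean = sample.mean()
standard_error = sample.std(ddof=1) / np.sqrt(n)  # s / sqrt(n)

# Critical t value for a two-sided interval with n - 1 degrees of freedom
t_critical = stats.t.ppf(1 - (1 - confidence_level) / 2, df=n - 1)
margin_of_error = t_critical * standard_error

ci_low, ci_high = sample_mean - margin_of_error, sample_mean + margin_of_error
print(f"{confidence_level:.0%} CI: ({ci_low:.2f}, {ci_high:.2f})")

# Cross-check against SciPy's built-in helper
ci_check = stats.t.interval(confidence_level, df=n - 1, loc=sample_mean, scale=standard_error)
print(f"SciPy check: ({ci_check[0]:.2f}, {ci_check[1]:.2f})")
```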

In [None]:
def visualize_confidence_intervals():
    """
    Demonstrate the meaning of a 95% confidence interval through simulation.
    """
    np.random.seed(101)
    
    # Population parameters (unknown in real life)
    true_mean = 50
    true_std = 10
    
    n_simulations = 100
    sample_size = 30
    confidence_level = 0.95
    
    plt.figure(figsize=(12, 8))
    
    n_captures = 0
    for i in range(n_simulations):
        # Take a sample from the population
        sample = np.random.normal(loc=true_mean, scale=true_std, size=sample_size)
        sample_mean = np.mean(sample)
        
        # Calculate the confidence interval
        # Use the t distribution because the population std is estimated from the sample
        ci_low, ci_high = stats.t.interval(confidence_level, df=sample_size-1, 
                                           loc=sample_mean, 
                                           scale=stats.sem(sample))
        
        # Check if the interval captures the true mean
        captures_mean = ci_low <= true_mean <= ci_high
        if captures_mean:
            n_captures += 1
        
        color = 'blue' if captures_mean else 'red'
        plt.plot([ci_low, ci_high], [i, i], color=color, linewidth=2)
        plt.plot(sample_mean, i, 'o', color='black', markersize=3)

    plt.axvline(true_mean, color='green', linestyle='--', lw=2, label=f'True Population Mean = {true_mean}')
    plt.title(f'{n_simulations} Simulated 95% Confidence Intervals', fontsize=16, weight='bold')
    plt.xlabel('Value')
    plt.ylabel('Simulation Number')
    plt.yticks([])
    plt.legend()
    plt.show()
    
    capture_rate = n_captures / n_simulations
    print("--- Confidence Interval Simulation ---")
    print(f"Number of simulations: {n_simulations}")
    print(f"Number of intervals that captured the true mean: {n_captures}")
    print(f"Observed Capture Rate: {capture_rate:.2f} (Expected: {confidence_level:.2f})")
    print("\n💡 Each blue line is a 95% CI from one sample that successfully 'captured' the true mean.")
    print("   Red lines are the ~5% of CIs that 'missed' the true mean by chance.")

visualize_confidence_intervals()

---

# ⚔️ Chapter 3: The t-test

A **t-test** is a common type of hypothesis test used to determine if there is a significant difference between the means of two groups.

### Example: Drug Efficacy
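Let's test whether a new drug changes blood pressure differently from a placebo. The test is two-sided, so a difference in either direction counts as evidence against H₀.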

- **Group A**: Placebo
- **Group B**: New Drug
- **Measurement**: Change in blood pressure after 1 month.

- **H₀**: The mean change in blood pressure is the same for the drug and placebo groups.
- **H₁**: The mean change in blood pressure is different for the two groups.

In [None]:
def perform_t_test():
    """
    Simulate data for a drug trial and perform an independent t-test.
    """
    np.random.seed(7)
    
    # Simulate data
    # Placebo group: mean change of -2 (small placebo effect), std of 5
    placebo_group = np.random.normal(loc=-2, scale=5, size=50)
    # Drug group: mean change of -8 (stronger effect), std of 5
    drug_group = np.random.normal(loc=-8, scale=5, size=50)
    
    # Perform the independent t-test (SciPy's default assumes equal variances;
    # pass equal_var=False for Welch's t-test)
    t_statistic, p_value = stats.ttest_ind(placebo_group, drug_group)
    
    # Visualization
    plt.figure(figsize=(12, 7))
    sns.kdeplot(placebo_group, fill=True, label='Placebo Group')
    sns.kdeplot(drug_group, fill=True, label='Drug Group')
    plt.axvline(np.mean(placebo_group), color=sns.color_palette()[0], linestyle='--', lw=2)
    plt.axvline(np.mean(drug_group), color=sns.color_palette()[1], linestyle='--', lw=2)
    plt.title('Distribution of Blood Pressure Change', fontsize=16, weight='bold')
    plt.xlabel('Change in Blood Pressure (mmHg)')
    plt.ylabel('Density')
    plt.legend()
    plt.show()
    
    # Print results
    print("--- Independent t-test Results ---")
    print(f"Placebo Group Mean: {np.mean(placebo_group):.2f}")
    print(f"Drug Group Mean: {np.mean(drug_group):.2f}")
    print(f"\nt-statistic: {t_statistic:.4f}")
    print(f"p-value: {p_value:.4f}")
    
    # Conclusion
    alpha = 0.05
    print(f"\nSignificance Level (α): {alpha}")
    if p_value < alpha:
        print(f"✅ Decision: Reject the null hypothesis (p-value < {alpha}).")
        print("   Conclusion: There is a statistically significant difference between the drug and placebo groups.")
    else:
        print(f"❌ Decision: Fail to reject the null hypothesis (p-value ≥ {alpha}).")
        print("   Conclusion: We do not have enough evidence to say the drug had a different effect than the placebo.")

perform_t_test()
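
To see what `stats.ttest_ind` computes, the pooled-variance t-statistic can be rebuilt by hand. This sketch uses freshly simulated groups with the same parameters as the example (a different random generator, so the numbers will not match the output above) and also runs Welch's variant, which does not assume equal variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical groups with the same parameters as the drug-trial simulation
placebo = rng.normal(loc=-2, scale=5, size=50)
drug = rng.normal(loc=-8, scale=5, size=50)

n1, n2 = len(placebo), len(drug)
var1, var2 = placebo.var(ddof=1), drug.var(ddof=1)

# Student's t: pool the two sample variances (assumes equal population variances)
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t_manual = (placebo.mean() - drug.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)  # two-sided

t_scipy, p_scipy = stats.ttest_ind(placebo, drug)

# Welch's t-test drops the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(placebo, drug, equal_var=False)

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.2e}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.2e}")
print(f"Welch:  t = {t_welch:.4f}, p = {p_welch:.2e}")
```

With equal group sizes, as here, the Student and Welch statistics coincide and only the degrees of freedom differ. The two tests can diverge noticeably when group sizes and variances are both unequal.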

---

# 🎯 Key Takeaways

## 🔬 The Scientific Method for Data
- **Hypothesis Testing**: Provides a structured way to ask and answer questions using data, forming the backbone of data-driven decision making.
- **Null Hypothesis (H₀)**: The default assumption of 'no effect'. We need strong evidence to overturn it.

## 📊 Interpreting the Results
- **p-value**: A measure of surprise. A small p-value means our data is surprising *if the null hypothesis is true*, leading us to question the null.
- **Confidence Interval**: A range of plausible values for a population parameter, giving us a sense of estimation uncertainty.
- **Statistical Significance**: When p < α, the result is called 'statistically significant'. This doesn't necessarily mean it's practically important, just that data this extreme would be unlikely if the null hypothesis were true.
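
The gap between statistical and practical significance can be illustrated with a sketch. Everything here is an assumption made for the example: the group size, the tiny true difference of 0.02 standard deviations, and the use of Cohen's d (difference in means divided by the pooled standard deviation) as an effect-size measure, which this notebook does not otherwise cover.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200_000  # very large groups (hypothetical)

# True difference of 0.02 standard deviations: real, but tiny
group_a = rng.normal(loc=0.00, scale=1, size=n)
group_b = rng.normal(loc=0.02, scale=1, size=n)

_, p_value = stats.ttest_ind(group_a, group_b)

# Cohen's d: difference in means divided by the pooled standard deviation
pooled_std = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_std

print(f"p-value = {p_value:.2e}, Cohen's d = {cohens_d:.3f}")
```

With groups this large, the test has enough power to flag the difference as significant even though the effect size stays near 0.02. Whether such a difference matters is a domain question, not a statistical one.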

## 🧠 AI Connections
- **A/B Testing**: The core of evaluating changes in products, from websites to recommendation algorithms.
- **Model Comparison**: Hypothesis tests can be used to determine if one machine learning model is significantly better than another.
- **Feature Selection**: Inferential tests can help decide if a feature has a statistically significant relationship with the target variable.
- **Causal Inference**: While this notebook focused on basic tests, the field of causal inference builds on these ideas to try to determine cause-and-effect relationships from data, a major frontier in AI.

---

# 🚀 What's Next?

This concludes our core journey through Probability and Statistics! The next section of this course will dive into **Optimization**, the engine that drives machine learning, allowing models to 'learn' from data by minimizing a loss function. We'll see how concepts like gradients (from Calculus) and random sampling (from Statistics) come together.

**Ready to find the minimum? Let's move on to Optimization! 📉**