# 🎲 Week 4, Day 3: Probability & Statistics for AI

**🎯 Goal:** Master probability and statistics - the language of uncertainty in AI

**⏱️ Time:** 60-90 minutes

**🌟 Why This Matters for AI:**
- **AI is probabilistic** - Models output probabilities, not certainties
- **Bayesian reasoning** - Update beliefs with new evidence
- **Statistical inference** - Make decisions from data
- **Uncertainty quantification** - Know when your model is confident
- **A/B testing** - Prove your AI improvements work

---

## 🔥 2024-2025 AI Trend Alert!

**Large Language Models are Probability Machines**:
- GPT-4 predicts: "Next token has 60% chance of being 'the'"
- **Temperature = controlling randomness/creativity!**
- Sampling strategies: Top-p, Top-k = probability filtering
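
To make temperature and top-k / top-p filtering concrete, here is a minimal sketch that applies them to a next-token distribution. The five-token vocabulary and the logits are made up for illustration and do not come from any real model; the helper names (`softmax`, `top_k_filter`, `top_p_filter`) are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature rescales the scores first."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()               # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize."""
    keep = np.argsort(probs)[-k:]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = np.argsort(probs)[::-1]              # most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # number of tokens to keep
    out = np.zeros_like(probs)
    out[order[:cutoff]] = probs[order[:cutoff]]
    return out / out.sum()

# Hypothetical vocabulary and logits, for illustration only
vocab = ["the", "a", "cat", "dog", "pizza"]
logits = np.array([2.0, 1.5, 0.5, 0.3, -1.0])

for t in (0.5, 1.0, 2.0):
    print(f"T={t}: {np.round(softmax(logits, t), 3)}")

probs = softmax(logits)
print("top-k (k=2):  ", np.round(top_k_filter(probs, 2), 3))
print("top-p (p=0.9):", np.round(top_p_filter(probs, 0.9), 3))
```

Low temperature concentrates probability on the top token, high temperature flattens the distribution, and the two filters simply zero out the unlikely tail before sampling.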

**Uncertainty in AI** is now CRITICAL:
- Healthcare AI: "90% confidence this is cancer"
- Self-driving: "Probability of obstacle ahead"
- **Knowing WHEN to trust AI = understanding probability!**

**Bayesian Deep Learning**:
- Neural networks with uncertainty estimates
- **Used in drug discovery, medical diagnosis**

**You'll learn the math behind AI decision-making!** 🚀

---

## 🎲 What is Probability?

**Probability** = Math of uncertainty and randomness

Think of it as:
- Certainty: "The sun will rise tomorrow" (100%) ☀️
- Probability: "It will rain tomorrow" (30%) 🌧️
- Impossibility: "I'll grow wings" (0%) 🦅

**In AI:**
- "This email is spam with 92% probability"
- "Customer will buy with 65% probability"
- "Next word is 'the' with 40% probability"

Let's explore! 🎯

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Set style
sns.set_theme(style="whitegrid")
np.random.seed(42)

print("NumPy version:", np.__version__)
print("✅ Ready to explore probability!")

## 📊 Basic Probability Concepts

### Key Rules:
1. **Probability range**: 0 ≤ P(A) ≤ 1
2. **Sum rule**: P(A or B) = P(A) + P(B) - P(A and B)
3. **Product rule**: P(A and B) = P(A) × P(B|A)
4. **Complement**: P(not A) = 1 - P(A)
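
One way to convince yourself of these rules is to check them exactly on a small sample space. The sketch below enumerates all outcomes of two fair dice; the events `A` and `B` are arbitrary choices made for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event (a predicate on outcomes)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == 6           # first die shows 6
B = lambda o: o[0] + o[1] >= 10   # total is at least 10

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(lambda o: A(o) and B(o))
p_a_or_b = prob(lambda o: A(o) or B(o))

# P(B|A) counted directly: restrict attention to outcomes where A happened
a_outcomes = [o for o in outcomes if A(o)]
p_b_given_a = Fraction(sum(1 for o in a_outcomes if B(o)), len(a_outcomes))

print("P(A) =", p_a, " P(B) =", p_b, " P(B|A) =", p_b_given_a)
print("Sum rule holds:    ", p_a_or_b == p_a + p_b - p_a_and_b)
print("Product rule holds:", p_a_and_b == p_a * p_b_given_a)
print("Complement holds:  ", prob(lambda o: not A(o)) == 1 - p_a)
```

Using `Fraction` keeps every probability exact, so the equality checks are not affected by floating-point rounding.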

In [None]:
# Simulate coin flips
n_flips = 10000
flips = np.random.choice(['Heads', 'Tails'], size=n_flips)

# Count outcomes
heads_count = np.sum(flips == 'Heads')
tails_count = np.sum(flips == 'Tails')

# Calculate probabilities
p_heads = heads_count / n_flips
p_tails = tails_count / n_flips

print(f"🎲 Coin Flip Simulation ({n_flips:,} flips)\n")
print("=" * 50)
print(f"Heads: {heads_count:,} ({p_heads:.3f})")
print(f"Tails: {tails_count:,} ({p_tails:.3f})")
print(f"\n✅ P(Heads) + P(Tails) = {p_heads + p_tails:.3f} (should be 1.0)")

# Visualize
plt.figure(figsize=(8, 6))
plt.bar(['Heads', 'Tails'], [heads_count, tails_count], color=['blue', 'orange'])
plt.ylabel('Count')
plt.title(f'Coin Flip Results ({n_flips:,} flips)', fontweight='bold')
plt.axhline(y=n_flips/2, color='red', linestyle='--', label='Expected (50%)')
plt.legend()
plt.show()

print("\n🧠 Law of Large Numbers:")
print("   More trials → Closer to theoretical probability (0.5)")
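
The Law of Large Numbers is easier to see if you track the estimate as the flips accumulate. A small sketch, using an arbitrary seed and NumPy's `default_rng` generator rather than the global seed used elsewhere in this notebook:

```python
import numpy as np

rng = np.random.default_rng(0)             # arbitrary seed for reproducibility
flips = rng.integers(0, 2, size=100_000)   # 1 = heads, 0 = tails

# Running proportion of heads after each flip
running_p = np.cumsum(flips) / np.arange(1, flips.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    estimate = running_p[n - 1]
    print(f"After {n:>7,} flips: P(heads) ≈ {estimate:.4f} "
          f"(|error| = {abs(estimate - 0.5):.4f})")
```

The error does not shrink on every single step, but it trends toward zero as the number of flips grows.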

## 🎯 Conditional Probability - The Heart of AI

**P(A|B)** = Probability of A given B has occurred

**Formula:** P(A|B) = P(A and B) / P(B)

**AI Examples:**
- P(spam | contains "lottery") - Email filtering
- P(click | shown ad) - Click-through rate
- P(churn | low engagement) - Customer retention

In [None]:
# Email spam classifier example
print("📧 EMAIL SPAM CLASSIFICATION\n")
print("=" * 60)

# Dataset statistics (simulated)
total_emails = 10000
spam_emails = 3000
ham_emails = 7000

# Word "free" statistics
free_in_spam = 2400  # 80% of spam contains "free"
free_in_ham = 700    # 10% of ham contains "free"

# Calculate probabilities
P_spam = spam_emails / total_emails
P_ham = ham_emails / total_emails
P_free_given_spam = free_in_spam / spam_emails
P_free_given_ham = free_in_ham / ham_emails

print(f"Dataset: {total_emails:,} emails")
print(f"  Spam: {spam_emails:,} ({P_spam:.1%})")
print(f"  Ham:  {ham_emails:,} ({P_ham:.1%})")
print()
print(f"Word 'free' statistics:")
print(f"  P('free' | spam) = {P_free_given_spam:.1%}")
print(f"  P('free' | ham)  = {P_free_given_ham:.1%}")

# Bayes' Theorem: P(spam | "free")
P_free = (free_in_spam + free_in_ham) / total_emails
P_spam_given_free = (P_free_given_spam * P_spam) / P_free

print(f"\n🎯 If email contains 'free':")
print(f"  P(spam | 'free') = {P_spam_given_free:.1%}")
print(f"  P(ham  | 'free') = {1-P_spam_given_free:.1%}")

print("\n✨ This is the foundation of Naive Bayes classifiers!")
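
Bayes' theorem, P(A|B) = P(B|A) × P(A) / P(B), follows directly from the definition of conditional probability. As a sanity check, the sketch below reuses the counts from the spam example and computes P(spam | "free") two ways: through Bayes' theorem and by counting only the emails that contain "free". The variable names `via_bayes` and `via_counts` are just illustrative.

```python
# Counts from the spam example
spam_emails, ham_emails = 3000, 7000
free_in_spam, free_in_ham = 2400, 700
total_emails = spam_emails + ham_emails

# Route 1: Bayes' theorem, P(spam | free) = P(free | spam) * P(spam) / P(free)
p_spam = spam_emails / total_emails
p_free_given_spam = free_in_spam / spam_emails
p_free = (free_in_spam + free_in_ham) / total_emails
via_bayes = p_free_given_spam * p_spam / p_free

# Route 2: definition of conditional probability, restricted to emails containing "free"
via_counts = free_in_spam / (free_in_spam + free_in_ham)

print(f"Bayes' theorem: {via_bayes:.4f}")
print(f"Direct count:   {via_counts:.4f}")
```

Both routes give the same number because Bayes' theorem is just the counting argument rewritten in terms of probabilities.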

## 🔔 Probability Distributions

### 1️⃣ Uniform Distribution - All outcomes equally likely

In [None]:
# Dice roll - uniform distribution
dice_rolls = np.random.randint(1, 7, size=10000)

plt.figure(figsize=(10, 6))
plt.hist(dice_rolls, bins=np.arange(0.5, 7.5, 1), 
        edgecolor='black', alpha=0.7, density=True)
plt.xlabel('Dice Value', fontsize=12)
plt.ylabel('Probability', fontsize=12)
plt.title('Uniform Distribution (Fair Dice)', fontsize=14, fontweight='bold')
plt.xticks(range(1, 7))
plt.axhline(y=1/6, color='red', linestyle='--', label='Expected (1/6)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print("🎲 Each outcome has equal probability: 1/6 ≈ 16.67%")

### 2️⃣ Bernoulli Distribution - Binary outcomes (success/failure)

In [None]:
# Click-through rate (CTR) simulation
p_click = 0.03  # 3% CTR
n_impressions = 10000

clicks = np.random.binomial(1, p_click, n_impressions)
total_clicks = np.sum(clicks)

print(f"📱 Ad Click Simulation\n")
print("=" * 50)
print(f"Impressions: {n_impressions:,}")
print(f"Clicks: {total_clicks} ({total_clicks/n_impressions:.2%})")
print(f"Expected: {p_click:.1%}")

plt.figure(figsize=(8, 6))
plt.bar(['No Click', 'Click'], 
       [n_impressions - total_clicks, total_clicks],
       color=['lightcoral', 'lightgreen'])
plt.ylabel('Count')
plt.title(f'Bernoulli Distribution (CTR = {p_click:.1%})', fontweight='bold')
plt.show()

print("\n🧠 AI Applications:")
print("  - Binary classification (spam/ham, fraud/legit)")
print("  - Conversion prediction")
print("  - A/B testing")
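
A Bernoulli(p) variable has mean p and variance p(1 - p), which also tells you how precisely a click-through rate can be estimated from n impressions. The sketch below checks this on simulated clicks; the seed is arbitrary and the interval uses the usual normal approximation, so treat it as a rough guide.

```python
import numpy as np

p_click = 0.03            # same CTR as in the simulation above
n_impressions = 10_000

rng = np.random.default_rng(1)    # arbitrary seed
clicks = rng.binomial(1, p_click, n_impressions)

# Theory for a Bernoulli(p) variable
theory_mean = p_click
theory_var = p_click * (1 - p_click)

# Standard error of the estimated CTR: sqrt(p(1-p)/n), with p replaced by its estimate
ctr_hat = clicks.mean()
std_error = np.sqrt(ctr_hat * (1 - ctr_hat) / n_impressions)

print(f"Sample mean {ctr_hat:.4f} vs theory {theory_mean:.4f}")
print(f"Sample var  {clicks.var():.4f} vs theory {theory_var:.4f}")
print(f"Estimated CTR: {ctr_hat:.2%} ± {1.96 * std_error:.2%} (approx. 95% interval)")
```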

### 3️⃣ Normal (Gaussian) Distribution - THE MOST IMPORTANT! 🔔

In [None]:
# Height distribution (classic example)
mean_height = 170  # cm
std_height = 10    # cm
heights = np.random.normal(mean_height, std_height, 10000)

plt.figure(figsize=(12, 6))

# Histogram of the sampled heights
plt.hist(heights, bins=50, density=True, alpha=0.6, 
        edgecolor='black', label='Data')

# Theoretical normal distribution
x = np.linspace(130, 210, 100)
pdf = stats.norm.pdf(x, mean_height, std_height)
plt.plot(x, pdf, 'r-', linewidth=2, label='Normal PDF')

# Mark mean and standard deviations
plt.axvline(mean_height, color='green', linestyle='--', linewidth=2, label='Mean')
plt.axvline(mean_height - std_height, color='orange', linestyle=':', linewidth=2, label='±1 std')
plt.axvline(mean_height + std_height, color='orange', linestyle=':', linewidth=2)
plt.axvline(mean_height - 2*std_height, color='purple', linestyle=':', linewidth=2, label='±2 std')
plt.axvline(mean_height + 2*std_height, color='purple', linestyle=':', linewidth=2)

plt.xlabel('Height (cm)', fontsize=12)
plt.ylabel('Probability Density', fontsize=12)
plt.title(f'Normal Distribution (μ={mean_height}, σ={std_height})', 
         fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print("📊 68-95-99.7 Rule (Empirical Rule):\n")
within_1std = np.sum((heights >= mean_height - std_height) & 
                    (heights <= mean_height + std_height)) / len(heights)
within_2std = np.sum((heights >= mean_height - 2*std_height) & 
                    (heights <= mean_height + 2*std_height)) / len(heights)
within_3std = np.sum((heights >= mean_height - 3*std_height) & 
                    (heights <= mean_height + 3*std_height)) / len(heights)

print(f"  Within ±1σ: {within_1std:.1%} (theory: 68%)")
print(f"  Within ±2σ: {within_2std:.1%} (theory: 95%)")
print(f"  Within ±3σ: {within_3std:.1%} (theory: 99.7%)")

print("\n🧠 Why it matters for AI:")
print("  - Many natural phenomena are normally distributed")
print("  - Central Limit Theorem: Averages → Normal")
print("  - Neural network weight initialization")
print("  - Noise in data")
print("  - Bayesian inference")
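
The 68-95-99.7 percentages are rounded values of the normal CDF, so you can also get them without simulating anything. A short sketch using `scipy.stats.norm`; the 190 cm threshold is an arbitrary example value.

```python
from scipy import stats

# P(mu - k*sigma <= X <= mu + k*sigma) for a normal variable, k = 1, 2, 3
for k in (1, 2, 3):
    coverage = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"Within ±{k}σ: {coverage:.4%}")

# Same idea in the height example: share of people taller than 190 cm
mean_height, std_height = 170, 10
z = (190 - mean_height) / std_height    # z-score: distance from the mean in std units
p_taller = stats.norm.sf(190, loc=mean_height, scale=std_height)  # sf = 1 - cdf
print(f"z = {z:.1f}, P(height > 190 cm) = {p_taller:.4f}")
```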

## 📊 Descriptive Statistics - Understanding Data

### Measures of Central Tendency

In [None]:
# AI model prediction errors (residuals)
np.random.seed(42)
errors = np.concatenate([
    np.random.normal(0, 1, 950),  # Most errors are small
    np.random.normal(0, 5, 50)    # Some large outliers
])

# Calculate statistics
mean = np.mean(errors)
median = np.median(errors)
mode = stats.mode(errors.round(1), keepdims=True)[0][0]
std = np.std(errors)
variance = np.var(errors)

print("📊 MODEL PREDICTION ERRORS ANALYSIS\n")
print("=" * 60)
print(f"Mean:     {mean:8.3f}  (average error)")
print(f"Median:   {median:8.3f}  (middle value, robust to outliers)")
print(f"Mode:     {mode:8.3f}  (most frequent value after rounding to 0.1)")
print(f"Std Dev:  {std:8.3f}  (spread of errors)")
print(f"Variance: {variance:8.3f}  (squared std dev)")
print(f"Min:      {errors.min():8.3f}")
print(f"Max:      {errors.max():8.3f}")

# Visualize
plt.figure(figsize=(14, 5))

# Histogram
plt.subplot(1, 2, 1)
plt.hist(errors, bins=50, edgecolor='black', alpha=0.7)
plt.axvline(mean, color='red', linestyle='--', linewidth=2, label=f'Mean: {mean:.2f}')
plt.axvline(median, color='green', linestyle='--', linewidth=2, label=f'Median: {median:.2f}')
plt.xlabel('Error')
plt.ylabel('Frequency')
plt.title('Prediction Error Distribution', fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)

# Box plot
plt.subplot(1, 2, 2)
plt.boxplot(errors, vert=True)
plt.ylabel('Error')
plt.title('Box Plot (shows quartiles & outliers)', fontweight='bold')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n🎯 Key insights:")
print("  - Most errors are small (good!)")
print("  - Some large outliers exist")
print("  - Outliers inflate the std dev; the median barely moves")
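
Because the standard deviation is sensitive to outliers, it can help to report a robust measure of spread next to it. The sketch below compares the std with the interquartile range (IQR) and the median absolute deviation (MAD) on data shaped like the errors above; the seed and the helper name `spread_summary` are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)           # arbitrary seed
clean = rng.normal(0, 1, 950)
errors = np.concatenate([clean, rng.normal(0, 5, 50)])   # 5% heavy outliers

def spread_summary(x):
    """Return the std alongside two outlier-resistant measures of spread."""
    q1, q3 = np.percentile(x, [25, 75])
    return {
        "std": np.std(x),
        "IQR": q3 - q1,
        # scale="normal" rescales the MAD so it estimates sigma for Gaussian data
        "MAD": stats.median_abs_deviation(x, scale="normal"),
    }

for name, data in [("clean only", clean), ("with outliers", errors)]:
    s = spread_summary(data)
    print(f"{name:14s} std={s['std']:.2f}  IQR={s['IQR']:.2f}  MAD={s['MAD']:.2f}")
```

Adding the 50 outliers should move the std noticeably while the IQR and MAD change much less.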

## 🔗 Correlation - Relationship Between Variables

In [None]:
# Generate correlated data
np.random.seed(42)
n = 100

# Positive correlation
study_hours = np.random.uniform(0, 10, n)
test_score = 40 + 5 * study_hours + np.random.normal(0, 5, n)

# Negative correlation
social_media_hours = np.random.uniform(0, 8, n)
productivity = 100 - 8 * social_media_hours + np.random.normal(0, 10, n)

# Calculate correlations
corr_positive = np.corrcoef(study_hours, test_score)[0, 1]
corr_negative = np.corrcoef(social_media_hours, productivity)[0, 1]

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Positive correlation
axes[0].scatter(study_hours, test_score, alpha=0.6)
axes[0].plot(np.unique(study_hours), 
            np.poly1d(np.polyfit(study_hours, test_score, 1))(np.unique(study_hours)),
            color='red', linewidth=2, label='Trend line')
axes[0].set_xlabel('Study Hours')
axes[0].set_ylabel('Test Score')
axes[0].set_title(f'Positive Correlation (r = {corr_positive:.2f})', fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Negative correlation
axes[1].scatter(social_media_hours, productivity, alpha=0.6, color='orange')
axes[1].plot(np.unique(social_media_hours), 
            np.poly1d(np.polyfit(social_media_hours, productivity, 1))(np.unique(social_media_hours)),
            color='red', linewidth=2, label='Trend line')
axes[1].set_xlabel('Social Media Hours')
axes[1].set_ylabel('Productivity')
axes[1].set_title(f'Negative Correlation (r = {corr_negative:.2f})', fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("📊 Correlation Coefficient (r):\n")
print("  r = +1: Perfect positive correlation")
print("  r = 0:  No linear correlation")
print("  r = -1: Perfect negative correlation")
print("\n🧠 AI Applications:")
print("  - Feature selection (drop redundant, highly correlated features)")
print("  - Understanding relationships in data")
print("  - Multicollinearity detection")
print("\n⚠️ Warning: Correlation ≠ Causation!")
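
One more caveat worth seeing in code: Pearson's r only measures linear association, so r near 0 does not mean the variables are unrelated. In this small sketch `y` is completely determined by `x`, yet the correlation is essentially zero. The manual formula is included to show what `np.corrcoef` computes.

```python
import numpy as np

x = np.linspace(-3, 3, 201)   # symmetric around zero
y = x ** 2                    # y is fully determined by x, but not linearly

r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r between x and x^2: {r:.3f}")

# Pearson r by hand: covariance divided by the product of standard deviations
r_manual = np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())
print(f"Manual calculation:          {r_manual:.3f}")
```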

## 🎯 Central Limit Theorem - Why Normal Distribution is Everywhere!

In [None]:
# Demonstrate CLT with dice rolls
sample_sizes = [1, 2, 5, 30]
n_samples = 10000

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

for idx, n in enumerate(sample_sizes):
    # Roll dice and take average
    samples = np.random.randint(1, 7, size=(n_samples, n))
    sample_means = samples.mean(axis=1)
    
    # Plot histogram
    axes[idx].hist(sample_means, bins=30, density=True, 
                  edgecolor='black', alpha=0.7)
    
    # Overlay normal distribution
    x = np.linspace(sample_means.min(), sample_means.max(), 100)
    pdf = stats.norm.pdf(x, sample_means.mean(), sample_means.std())
    axes[idx].plot(x, pdf, 'r-', linewidth=2, label='Normal fit')
    
    axes[idx].set_title(f'Average of {n} dice roll(s)', fontweight='bold')
    axes[idx].set_xlabel('Sample Mean')
    axes[idx].set_ylabel('Density')
    axes[idx].legend()
    axes[idx].grid(True, alpha=0.3)

plt.suptitle('Central Limit Theorem in Action', fontsize=16, fontweight='bold', y=1.00)
plt.tight_layout()
plt.show()

print("🔔 Central Limit Theorem:\n")
print("  The average of many independent samples tends toward a normal distribution,")
print("  whatever the original distribution (as long as its variance is finite)!")
print("\n🧠 Why it matters:")
print("  - Explains why normal distribution is everywhere")
print("  - Foundation of statistical inference")
print("  - Enables hypothesis testing")
print("  - Used in confidence intervals")
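
The CLT also says how fast the averages tighten: the standard deviation of a mean of n samples is σ/√n. Here is a quick check with dice, assuming a fair die (variance 35/12) and an arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(7)      # arbitrary seed
sigma_die = np.sqrt(35 / 12)        # std of one fair die roll (variance = 35/12)

for n in (1, 2, 5, 30):
    # 10,000 experiments, each averaging n dice
    sample_means = rng.integers(1, 7, size=(10_000, n)).mean(axis=1)
    print(f"n={n:>2}: observed std of mean = {sample_means.std():.3f}, "
          f"theory sigma/sqrt(n) = {sigma_die / np.sqrt(n):.3f}")
```

This σ/√n quantity is the standard error, which is what confidence intervals and t-tests are built on.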

## 🧪 Hypothesis Testing - Making Decisions with Data

In [None]:
# A/B test: Did the new AI model improve performance?
print("🧪 A/B TEST: MODEL IMPROVEMENT\n")
print("=" * 60)

# Model A (old)
np.random.seed(42)
model_a_accuracy = np.random.normal(0.85, 0.03, 1000)  # Mean: 85%

# Model B (new - supposedly better)
model_b_accuracy = np.random.normal(0.87, 0.03, 1000)  # Mean: 87%

# Statistics
mean_a = model_a_accuracy.mean()
mean_b = model_b_accuracy.mean()
std_a = model_a_accuracy.std()
std_b = model_b_accuracy.std()

print(f"Model A (Baseline):")
print(f"  Mean accuracy: {mean_a:.3f} ± {std_a:.3f}")
print(f"\nModel B (New):")
print(f"  Mean accuracy: {mean_b:.3f} ± {std_b:.3f}")
print(f"\nDifference: {mean_b - mean_a:.3f} ({(mean_b-mean_a)/mean_a*100:+.1f}%)")

# Perform t-test
t_stat, p_value = stats.ttest_ind(model_b_accuracy, model_a_accuracy)

print(f"\n📊 Statistical Test (t-test):")
print(f"  t-statistic: {t_stat:.3f}")
print(f"  p-value: {p_value:.6f}")

alpha = 0.05
if p_value < alpha:
    print(f"\n✅ Result: SIGNIFICANT (p < {alpha})")
    print("   Model B is statistically better than Model A!")
else:
    print(f"\n❌ Result: NOT SIGNIFICANT (p >= {alpha})")
    print("   Cannot conclude Model B is better.")

# Visualize
plt.figure(figsize=(12, 6))
plt.hist(model_a_accuracy, bins=30, alpha=0.6, label='Model A', density=True)
plt.hist(model_b_accuracy, bins=30, alpha=0.6, label='Model B', density=True)
plt.axvline(mean_a, color='blue', linestyle='--', linewidth=2, label=f'Mean A: {mean_a:.3f}')
plt.axvline(mean_b, color='orange', linestyle='--', linewidth=2, label=f'Mean B: {mean_b:.3f}')
plt.xlabel('Accuracy', fontsize=12)
plt.ylabel('Density', fontsize=12)
plt.title('A/B Test: Model Performance Comparison', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

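print("\n🎯 Interpretation:")
print(f"  p-value = {p_value:.6f}")
print("  If the two models truly had equal mean accuracy, a difference at least")
print("  this large would show up with probability equal to the p-value.")
print(f"  A p-value below alpha = {alpha} is evidence against 'no difference';")
print("  it is not the probability that the improvement is real.")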
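
If the assumptions behind the t-test feel opaque, a permutation test asks the same question with fewer of them: how often would randomly relabelled runs produce a gap this large? A minimal sketch on freshly simulated accuracies; the sample size of 200, the seed, and the number of permutations are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)               # arbitrary seed
model_a = rng.normal(0.85, 0.03, 200)         # simulated accuracies, old model
model_b = rng.normal(0.87, 0.03, 200)         # simulated accuracies, new model

observed = model_b.mean() - model_a.mean()
pooled = np.concatenate([model_a, model_b])

n_permutations = 5_000
count = 0
for _ in range(n_permutations):
    shuffled = rng.permutation(pooled)        # relabel the runs at random
    diff = shuffled[200:].mean() - shuffled[:200].mean()
    if abs(diff) >= abs(observed):            # two-sided comparison
        count += 1

# Add 1 to numerator and denominator so the estimate is never exactly zero
p_value = (count + 1) / (n_permutations + 1)
print(f"Observed difference: {observed:.4f}")
print(f"Permutation p-value: {p_value:.4f}")
```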

## 🎯 Real AI Example: Confidence Intervals in Predictions

In [None]:
# Simulate AI model predictions with uncertainty
np.random.seed(42)

# True relationship: y = 2x + 1 + noise
X = np.linspace(0, 10, 50)
y_true = 2 * X + 1
y_observed = y_true + np.random.normal(0, 2, len(X))

# Model predictions (with uncertainty)
y_pred = 2 * X + 1
std_pred = 2  # Prediction uncertainty

# 95% interval: ±1.96 standard deviations around the prediction (assumes Gaussian noise)
ci_lower = y_pred - 1.96 * std_pred
ci_upper = y_pred + 1.96 * std_pred

# Visualize
plt.figure(figsize=(12, 6))

# Data points
plt.scatter(X, y_observed, alpha=0.6, s=50, label='Observed data')

# Model prediction
plt.plot(X, y_pred, 'r-', linewidth=2, label='Model prediction')

# Confidence interval
plt.fill_between(X, ci_lower, ci_upper, alpha=0.2, label='95% Confidence Interval')

plt.xlabel('Input (X)', fontsize=12)
plt.ylabel('Output (y)', fontsize=12)
plt.title('AI Predictions with Uncertainty (Confidence Intervals)', 
         fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Calculate coverage
in_ci = np.sum((y_observed >= ci_lower) & (y_observed <= ci_upper))
coverage = in_ci / len(y_observed)

print("📊 Confidence Interval Analysis:\n")
print(f"  95% CI coverage: {coverage:.1%} ({in_ci}/{len(y_observed)} points)")
print(f"  Expected: ~95%")
print("\n🧠 This tells us:")
print("  - About 95% of observations should fall inside the shaded band")
print("    (strictly a prediction interval, since it is built from the noise std)")
print("  - Wider intervals = more uncertainty")
print("  - Critical for high-stakes AI (healthcare, finance)")
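
The shaded band describes where individual observations fall. A confidence interval for an estimated quantity, such as a model's mean accuracy, is built from the standard error instead. The sketch below shows one common recipe using the t distribution; the 25 evaluation runs are simulated and purely hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)                # arbitrary seed
accuracies = rng.normal(0.87, 0.03, 25)       # 25 hypothetical evaluation runs

n = accuracies.size
mean = accuracies.mean()
sem = accuracies.std(ddof=1) / np.sqrt(n)     # standard error of the mean

# 95% confidence interval for the mean, using the t distribution with n-1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)
ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem

print(f"Mean accuracy: {mean:.4f}")
print(f"95% CI for the mean: [{ci_low:.4f}, {ci_high:.4f}]")
```

With more runs the standard error shrinks, so the interval for the mean narrows even though individual runs stay just as noisy.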

## 🎯 MINI CHALLENGE: Bayesian Spam Filter

In [None]:
# Challenge: build a Naive Bayes spam classifier (a worked solution follows)

print("📧 NAIVE BAYES SPAM FILTER\n")
print("=" * 60)

# Training data statistics
total_emails = 10000
spam_count = 3000
ham_count = 7000

# Word frequencies in spam vs ham
word_stats = {
    'free': {'spam': 2400/3000, 'ham': 700/7000},
    'winner': {'spam': 1800/3000, 'ham': 140/7000},
    'meeting': {'spam': 150/3000, 'ham': 3500/7000},
    'money': {'spam': 2100/3000, 'ham': 350/7000}
}

# Prior probabilities
P_spam = spam_count / total_emails
P_ham = ham_count / total_emails

print(f"Training data: {total_emails:,} emails")
print(f"  P(spam) = {P_spam:.2f}")
print(f"  P(ham)  = {P_ham:.2f}")

# Test email
test_email = ['free', 'winner', 'money']  # Looks spammy!

print(f"\n📨 Test email contains: {test_email}")

# Calculate P(spam|email) using Bayes' theorem
# Naive assumption: words are independent given the class

# Joint score for spam: P(spam) × product of P(word|spam)
joint_spam = P_spam
for word in test_email:
    joint_spam *= word_stats[word]['spam']

# Joint score for ham: P(ham) × product of P(word|ham)
joint_ham = P_ham
for word in test_email:
    joint_ham *= word_stats[word]['ham']

# Normalize (Bayes' theorem): divide by P(words) = joint_spam + joint_ham
total_prob = joint_spam + joint_ham
P_spam_given_words = joint_spam / total_prob
P_ham_given_words = joint_ham / total_prob

print("\n📊 Probability Calculations:\n")
for word in test_email:
    print(f"  P('{word}' | spam) = {word_stats[word]['spam']:.2%}")
    print(f"  P('{word}' | ham)  = {word_stats[word]['ham']:.2%}")
    print()

print("🎯 Final Classification:\n")
print(f"  P(spam | email) = {P_spam_given_words:.4f} ({P_spam_given_words*100:.2f}%)")
print(f"  P(ham  | email) = {P_ham_given_words:.4f} ({P_ham_given_words*100:.2f}%)")

if P_spam_given_words > 0.5:
    print(f"\n✅ Classification: SPAM (confidence: {P_spam_given_words*100:.1f}%)")
else:
    print(f"\n✅ Classification: HAM (confidence: {P_ham_given_words*100:.1f}%)")

# Visualize
plt.figure(figsize=(10, 6))
plt.bar(['SPAM', 'HAM'], [P_spam_given_words, P_ham_given_words], 
       color=['red', 'green'], alpha=0.7)
plt.ylabel('Probability', fontsize=12)
plt.title('Email Classification Results', fontsize=14, fontweight='bold')
plt.ylim(0, 1)
for i, (label, prob) in enumerate([('SPAM', P_spam_given_words), 
                                   ('HAM', P_ham_given_words)]):
    plt.text(i, prob + 0.02, f'{prob*100:.1f}%', 
            ha='center', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, axis='y')
plt.show()

print("\n✨ This is the foundation of spam filters, sentiment analysis, and more!")

## 🎉 Congratulations!

**You just learned:**
- ✅ Probability fundamentals (rules, conditional probability)
- ✅ Bayes' Theorem (the heart of probabilistic AI)
- ✅ Key distributions (Uniform, Bernoulli, Normal)
- ✅ Descriptive statistics (mean, median, std, correlation)
- ✅ Central Limit Theorem (why normal is everywhere)
- ✅ Hypothesis testing (A/B tests, p-values)
- ✅ Confidence intervals (quantifying uncertainty)
- ✅ Built a Naive Bayes spam filter!

**🎯 Probability & Statistics Cheat Sheet:**
```python
# Probability rules (math notation, not executable code)
# P(A and B) = P(A) × P(B|A)              Product rule
# P(A or B)  = P(A) + P(B) - P(A and B)   Sum rule
# P(A|B)     = P(B|A) × P(A) / P(B)       Bayes' theorem

# Distributions (NumPy samplers)
np.random.uniform(0, 1, size=5)      # Uniform
np.random.binomial(1, 0.5, size=5)   # Bernoulli (n=1) / Binomial
np.random.normal(0, 1, size=5)       # Gaussian/Normal
np.random.poisson(3, size=5)         # Poisson

# Statistics (x, y, a, b are placeholder arrays)
np.mean(x), np.median(x), np.std(x)  # Center and spread
np.corrcoef(x, y)                    # Correlation
stats.ttest_ind(a, b)                # Hypothesis test
```

**🧠 Key Insights:**
- AI is fundamentally probabilistic
- Uncertainty quantification is critical
- Normal distribution appears everywhere (CLT)
- Bayes' theorem = updating beliefs with evidence
- Always test if improvements are statistically significant!

**🎯 Practice Exercise:**

Build a complete Bayesian classifier:
1. Collect word frequencies from real emails
2. Implement smoothing (handle unseen words)
3. Calculate accuracy on test set
4. Create confusion matrix
5. Compare with different threshold values
6. Visualize decision boundaries
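
As a starting point for step 2, here is a minimal sketch of Laplace (add-one) smoothing combined with log-probabilities. It reuses the word counts from the mini challenge; the function names, the smoothing constant `alpha`, and the unseen word `'crypto'` are illustrative assumptions, not a reference implementation.

```python
import math

# Number of emails containing each word (same counts as the mini challenge)
spam_counts = {"free": 2400, "winner": 1800, "meeting": 150, "money": 2100}
ham_counts = {"free": 700, "winner": 140, "meeting": 3500, "money": 350}
n_spam, n_ham = 3000, 7000

def smoothed_likelihood(word, counts, n_docs, alpha=1.0):
    """P(word | class) with Laplace smoothing; unseen words get a small nonzero value."""
    return (counts.get(word, 0) + alpha) / (n_docs + 2 * alpha)

def log_posteriors(words):
    """Unnormalized log P(class | words) under the naive independence assumption."""
    log_spam = math.log(n_spam / (n_spam + n_ham))
    log_ham = math.log(n_ham / (n_spam + n_ham))
    for w in words:
        log_spam += math.log(smoothed_likelihood(w, spam_counts, n_spam))
        log_ham += math.log(smoothed_likelihood(w, ham_counts, n_ham))
    return log_spam, log_ham

def p_spam(words):
    """Normalized P(spam | words), computed in log space to avoid underflow."""
    log_spam, log_ham = log_posteriors(words)
    m = max(log_spam, log_ham)
    e_spam, e_ham = math.exp(log_spam - m), math.exp(log_ham - m)
    return e_spam / (e_spam + e_ham)

print(f"P(spam | free, winner, money) = {p_spam(['free', 'winner', 'money']):.4f}")
print(f"P(spam | meeting, crypto)     = {p_spam(['meeting', 'crypto']):.4f}")  # 'crypto' is unseen
```

Without smoothing, a single unseen word would multiply the whole score by zero; working in log space keeps long emails from underflowing to 0.0.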

---

**🎊 WEEK 4 COMPLETE!**

You've mastered the math foundations of AI:
- Day 1: Linear Algebra (neural network operations)
- Day 2: Calculus (how AI learns)
- Day 3: Probability (handling uncertainty)

**📚 Next Steps:**
- Week 5: Introduction to Machine Learning
- Week 6: Neural Networks from Scratch
- Week 7: Deep Learning with PyTorch

**💡 Fun Facts:**
- Bayes' theorem dates back to the 1700s, and Naive Bayes is still a common baseline for spam filtering
- LLMs are probabilistic: They predict P(next_token | context)
- Temperature parameter = controlling output randomness
- Much of AI: Optimize probabilities to make better predictions!

---

**🏆 ACHIEVEMENT UNLOCKED: Math Master!**

*You now understand the mathematical foundations of ALL modern AI!* 🚀

---

**Phase 1 Foundations COMPLETE!** 🎉

You're ready to build real AI systems!