[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jkitchin/s26-06642/blob/main/dsmles/participation/participation-12-uncertainty-quantification.ipynb)

# Module 12: Uncertainty Quantification - Participation Exercises

## Exercise Types

| Type | Icon | Description | Time |
|------|------|-------------|------|
| **Reflection** | :thinking: | Personal reflection on concepts and connections | 3-5 min |
| **Mini-Exercise** | :wrench: | Hands-on coding or problem solving | 5-10 min |
| **Discussion** | :speech_balloon: | Pair or group discussion with neighbors | 5-7 min |
| **Prediction** | :crystal_ball: | Make a prediction before seeing results | 2-3 min |
| **Critique** | :mag: | Analyze code, results, or approaches | 5-7 min |

## Exercise 12.1: Reflection - Communicating Uncertainty

**Type:** :thinking: Reflection (3 min)

Your model predicts a reactor yield of 75% with a 95% confidence interval of [65%, 85%].

**Reflect:**
1. How would you explain this to a plant manager who asks "So what's the yield going to be?"
2. Why might the manager find uncertainty uncomfortable?
3. Why is communicating uncertainty important for decision-making?

*Your reflection:*



## Exercise 12.2: Mini-Exercise - Bootstrap Confidence Intervals

**Type:** :wrench: Mini-Exercise (8 min)

Estimate a 95% bootstrap confidence interval for the slope of a linear regression by filling in the `???` placeholders in the cell below.

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear data
np.random.seed(42)
n = 50
X = np.random.uniform(0, 10, n).reshape(-1, 1)
y = 2.5 * X.ravel() + 5 + np.random.randn(n) * 2  # True slope = 2.5

# Single model fit
model = LinearRegression().fit(X, y)
print(f"Point estimate for slope: {model.coef_[0]:.3f}")

# TASK: Implement bootstrap to get confidence interval
n_bootstrap = 1000
bootstrap_slopes = []

for i in range(n_bootstrap):
    # 1. Sample with replacement from the data
    # indices = ???
    
    # 2. Fit model on bootstrap sample
    # X_boot = ???
    # y_boot = ???
    # model_boot = ???
    
    # 3. Store the coefficient
    # bootstrap_slopes.append(???)
    pass

# 4. Calculate 95% CI
# ci_lower = np.percentile(bootstrap_slopes, ???)
# ci_upper = np.percentile(bootstrap_slopes, ???)
# print(f"95% CI for slope: [{ci_lower:.3f}, {ci_upper:.3f}]")
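
The resampling pattern used in this exercise can be seen on a simpler statistic. The sketch below applies a percentile bootstrap to a sample mean rather than a regression slope. `bootstrap_ci` is a hypothetical helper name, and the synthetic sample (mean 10, standard deviation 2, 50 points) is an assumption made only for illustration.

```python
import numpy as np


def bootstrap_ci(data, statistic, n_bootstrap=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(data) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    estimates = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        # Draw n indices with replacement, so some points repeat and others drop out
        indices = rng.integers(0, n, size=n)
        # Recompute the statistic on the resampled data
        estimates[i] = statistic(data[indices])
    # The interval is read off the tails of the bootstrap distribution
    lower = np.percentile(estimates, 100 * alpha / 2)
    upper = np.percentile(estimates, 100 * (1 - alpha / 2))
    return lower, upper


# Synthetic data, assumed for illustration only
rng = np.random.default_rng(42)
sample = rng.normal(loc=10.0, scale=2.0, size=50)

ci_lower, ci_upper = bootstrap_ci(sample, np.mean)
print(f"Sample mean: {sample.mean():.3f}")
print(f"95% bootstrap CI for the mean: [{ci_lower:.3f}, {ci_upper:.3f}]")
```

For the regression case the structure is the same, except that each resample must keep every `X` row paired with its `y` value and the statistic is the fitted slope.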

## Exercise 12.3: Discussion - Sources of Uncertainty

**Type:** :speech_balloon: Discussion (5 min)

A machine learning model has multiple sources of uncertainty.

**Categorize these sources as "aleatory" (irreducible) or "epistemic" (reducible):**

1. Measurement noise in sensors
2. Model doesn't include an important variable
3. Inherent randomness in a chemical process
4. Not enough training data
5. Wrong model architecture

**Which can we reduce by collecting more data?**

*Classification:*

| Source | Aleatory or Epistemic? | Can more data help? |
|--------|----------------------|--------------------|
| 1. Measurement noise | | |
| 2. Missing variable | | |
| 3. Process randomness | | |
| 4. Limited training data | | |
| 5. Wrong architecture | | |
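
After filling in the table, a small simulation can serve as a sanity check on the "Can more data help?" column. The sketch below is a minimal illustration, assuming a correctly specified linear model with the same slope, intercept, and noise level as the data in Exercise 12.2. It compares the standard error of the fitted slope (uncertainty about the parameter) with the residual standard deviation (scatter around the fitted line) as the sample size grows. `fit_and_summarize` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed ground truth, matching the synthetic data in Exercise 12.2
true_slope, true_intercept, noise_sd = 2.5, 5.0, 2.0


def fit_and_summarize(n):
    """Fit a line to n noisy points; return (slope standard error, residual SD)."""
    x = rng.uniform(0, 10, n)
    y = true_slope * x + true_intercept + rng.normal(0, noise_sd, n)
    # Degree-1 least-squares fit; cov=True also returns the parameter covariance
    coeffs, cov = np.polyfit(x, y, 1, cov=True)
    slope_se = np.sqrt(cov[0, 0])
    # Residual scatter, using n - 2 degrees of freedom for the two fitted parameters
    residual_sd = np.std(y - np.polyval(coeffs, x), ddof=2)
    return slope_se, residual_sd


results = {n: fit_and_summarize(n) for n in (20, 200, 2000)}
for n, (slope_se, residual_sd) in results.items():
    print(f"n={n:5d}  slope SE={slope_se:.4f}  residual SD={residual_sd:.3f}")
```

Comparing how the two columns change with `n` is one way to test your classification. The sketch does not cover a missing variable or a wrong architecture, since the fitted model here matches the process that generated the data.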