# 015: Quantile Regression

## 🎯 Learning Objectives

By the end of this notebook, you will:
1. Understand **quantile regression** and how it differs from mean regression
2. Master **conditional quantiles** for predicting distribution bounds (10th, 50th, 90th percentiles)
3. Implement the **quantile (check) loss** from scratch
4. Apply **sklearn's QuantileRegressor** for production use
5. Build **prediction intervals** and uncertainty quantification
6. Deploy quantile regression for **post-silicon validation** (process capability bounds, worst-case yield, guard-band optimization)
7. Apply to **general AI/ML** scenarios (risk assessment, extreme value prediction, tail modeling)

## 📊 Concept Overview

**Quantile Regression** predicts conditional quantiles instead of conditional means:
- **OLS/Ridge/Lasso**: Predict E[Y|X] (mean/expected value)
- **Quantile Regression**: Predict Q_τ[Y|X] (τ-th quantile)

**Key Advantages:**
- **Distribution modeling**: Get full picture (not just mean)
- **Robust to outliers**: Median regression (τ=0.5) is robust to outliers in the response
- **Asymmetric loss**: Different penalties for over/under-prediction
- **Prediction intervals**: Natural uncertainty quantification
- **Extreme value analysis**: Model tails directly (τ=0.01, τ=0.99)

**Why Quantile Regression Matters:**
- **Post-Silicon**: Predict worst-case performance (τ=0.01) for guard-banding
- **General AI/ML**: Risk management needs tail predictions, not averages
- **Business value**: A "95th percentile delivery time" is often more useful than the average delivery time

## 🗺️ Quantile Regression Workflow

```mermaid
graph TD
    A[Input Data] --> B[Choose Quantiles]
    B --> C[τ = 0.5: Median]
    B --> D[τ = 0.1, 0.9: Intervals]
    B --> E[τ = 0.01, 0.99: Extremes]
    C --> F[Check Loss Function]
    D --> F
    E --> F
    F --> G[Asymmetric Penalty]
    G --> H[Optimize via LP or SGD]
    H --> I[Multiple Quantile Models]
    I --> J[Prediction Bands]
    J --> K[Uncertainty Quantification]
```

## 🧮 Mathematical Foundation

### Quantile Loss (Check Loss)

For quantile τ ∈ (0,1), the **check loss** is:

$$
\rho_{\tau}(e) = \begin{cases}
\tau \cdot e & \text{if } e \geq 0 \\
(\tau - 1) \cdot e & \text{if } e < 0
\end{cases}
$$

Where e = y - ŷ (residual).

**Intuition:**
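- τ = 0.5 (median): Symmetric loss equal to half the absolute error, so it is as robust as MAE
- τ = 0.9: Penalizes under-prediction (ŷ < y) 9× more than over-prediction (0.9 vs. 0.1 per unit of error)
- τ = 0.1: Penalizes over-prediction (ŷ > y) 9× more than under-prediction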
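A minimal NumPy sketch of the check loss, written to mirror the two cases above (the helper name `check_loss` is ours). It confirms the 9× asymmetry at τ = 0.9 and shows the property that makes this loss useful: the constant that minimizes the average check loss is the τ-th sample quantile, not the mean.

```python
import numpy as np

def check_loss(e, tau):
    """Quantile (check) loss rho_tau(e) for residuals e = y - y_hat."""
    e = np.asarray(e, dtype=float)
    return np.where(e >= 0, tau * e, (tau - 1) * e)

tau = 0.9
under = float(check_loss(1.0, tau))   # y_hat one unit BELOW y (under-prediction)
over = float(check_loss(-1.0, tau))   # y_hat one unit ABOVE y (over-prediction)
print(f"under={under:.2f}, over={over:.2f}, ratio={under / over:.1f}")

# The constant c minimizing the mean check loss is the tau-th sample quantile
rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=501)   # skewed sample, so mean and quantiles differ
candidates = np.sort(y)                     # the piecewise-linear optimum sits at a data point
losses = [check_loss(y - c, tau).mean() for c in candidates]
c_star = candidates[int(np.argmin(losses))]
print(f"argmin c = {c_star:.4f}, np.quantile = {np.quantile(y, tau):.4f}, mean = {y.mean():.4f}")
```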

### Optimization Problem

$$
\min_{\beta} \sum_{i=1}^{n} \rho_{\tau}(y_i - x_i^T \beta)
$$

**Properties:**
- Convex but non-differentiable at e=0
- Solved via linear programming or subgradient methods
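To make the linear-programming route concrete: split each residual into non-negative parts, y_i − x_iᵀβ = u_i − v_i with u_i, v_i ≥ 0. The objective becomes Σ τ·u_i + (1−τ)·v_i, which is linear, subject to the linear constraint Xβ + u − v = y. The sketch below hands this LP to `scipy.optimize.linprog` with the HiGHS backend; it assumes SciPy ≥ 1.6 (for `method="highs"`), and `quantile_regression_lp` is a hypothetical helper, not a library function.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression_lp(X, y, tau):
    """Linear quantile regression (with intercept) solved as an LP.

    Decision variables: [beta (p, free), u (n, >= 0), v (n, >= 0)], where
    y - X_aug @ beta = u - v, so u / v are the positive / negative residual parts.
    """
    n = X.shape[0]
    X_aug = np.column_stack([np.ones(n), X])   # prepend an intercept column
    p = X_aug.shape[1]
    # Objective: 0 * beta + tau * sum(u) + (1 - tau) * sum(v)
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    # Equality constraints: X_aug @ beta + u - v = y
    A_eq = np.hstack([X_aug, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]                            # [intercept, slope(s)...]

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X[:, 0] + 5 + rng.normal(0, 1, 200)
beta_med = quantile_regression_lp(X, y, tau=0.5)
print("median fit [intercept, slope]:", np.round(beta_med, 3))
```

The free β and the non-negative u, v are the standard way to turn an absolute-value-style objective into an LP; scikit-learn's `QuantileRegressor` follows the same idea internally.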

## 📦 Setup and Imports

### 📝 What's Happening in This Code?

**Purpose:** Import libraries for quantile regression implementation.

**Key Points:**
- **sklearn.linear_model.QuantileRegressor**: Production quantile regression
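- **scipy.optimize**: Not imported directly; `QuantileRegressor` calls SciPy's `linprog` (linear programming) internally
- **Matplotlib/Seaborn**: Visualize prediction intervals
- **Check loss**: Non-differentiable at e=0, so it needs subgradient methods or linear programming rather than standard gradient descent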

**Why This Matters:** Quantile regression needs different optimization than MSE-based methods.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import QuantileRegressor, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

print("✅ Libraries imported")
print(f"NumPy: {np.__version__}")

---

## 🔧 Part 1: From Scratch Implementation

### 📝 What's Happening in This Code?

**Purpose:** Implement quantile regression using subgradient descent on check loss.

**Key Points:**
- **Check loss**: Asymmetric penalty ρ_τ(e)
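- **Subgradient**: ∂ρ_τ/∂e = τ if e>0, τ-1 if e<0 (any value in [τ-1, τ] is valid at e=0; the code uses τ)
- **Multiple quantiles**: Train separate model for each τ
- **Convex objective**: Every local minimum is global, but fixed-step subgradient descent only approaches it and keeps oscillating nearby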

**Why This Matters:** Understanding check loss reveals why quantile regression is robust and models distribution tails.

In [None]:
class QuantileRegressionScratch:
    """
    Quantile Regression from scratch using subgradient descent.
    """
    
    def __init__(self, quantile=0.5, learning_rate=0.01, n_iterations=1000):
        self.quantile = quantile
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.w = None
        self.b = None
        self.loss_history = []
        
    def _check_loss(self, y, y_pred):
        """Compute quantile (check) loss."""
        errors = y - y_pred
        loss = np.where(
            errors >= 0,
            self.quantile * errors,
            (self.quantile - 1) * errors
        )
        return np.mean(loss)
    
    def _compute_gradient(self, X, y, y_pred):
        """Compute subgradient of check loss."""
        n_samples = X.shape[0]
        errors = y - y_pred
        
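        # Subgradient of ρ_τ w.r.t. e is τ (e≥0) or τ-1 (e<0); since e = y - y_pred,
        # the subgradient w.r.t. y_pred is its negative: -τ or 1-τ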
        subgrad = np.where(errors >= 0, -self.quantile, -(self.quantile - 1))
        
        grad_w = (1/n_samples) * X.T @ subgrad
        grad_b = (1/n_samples) * np.sum(subgrad)
        
        return grad_w, grad_b
    
    def fit(self, X, y):
        """Train using subgradient descent."""
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.loss_history = []  # reset so repeated fit() calls don't accumulate history
        
        for iteration in range(self.n_iterations):
            y_pred = X @ self.w + self.b
            grad_w, grad_b = self._compute_gradient(X, y, y_pred)
            self.w -= self.learning_rate * grad_w
            self.b -= self.learning_rate * grad_b
            
            if iteration % 100 == 0:
                loss = self._check_loss(y, y_pred)
                self.loss_history.append(loss)
        
        return self
    
    def predict(self, X):
        return X @ self.w + self.b

print("✅ QuantileRegressionScratch defined")
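
A quick way to gain confidence in `_compute_gradient` is to compare it with a finite-difference estimate. Away from e = 0 the check loss is linear in each residual, so for a step small enough that no residual changes sign the two should agree up to rounding error. The sketch below re-implements the same formulas standalone (it does not call the class), using random data; the helper names are illustrative.

```python
import numpy as np

def mean_check_loss(w, b, X, y, tau):
    e = y - (X @ w + b)
    return np.mean(np.where(e >= 0, tau * e, (tau - 1) * e))

def subgradient(w, b, X, y, tau):
    # Same formula as QuantileRegressionScratch._compute_gradient:
    # d(loss)/d(y_pred) is -tau where e >= 0 and (1 - tau) where e < 0
    e = y - (X @ w + b)
    g = np.where(e >= 0, -tau, 1 - tau)
    return X.T @ g / len(y), np.mean(g)

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
w, b, tau = rng.normal(size=3), 0.1, 0.8

grad_w, grad_b = subgradient(w, b, X, y, tau)
h = 1e-7   # tiny step, so (for this random data) no residual crosses zero
base = mean_check_loss(w, b, X, y, tau)
fd_w = np.array([(mean_check_loss(w + h * np.eye(3)[j], b, X, y, tau) - base) / h for j in range(3)])
fd_b = (mean_check_loss(w, b + h, X, y, tau) - base) / h
err = max(np.max(np.abs(grad_w - fd_w)), abs(grad_b - fd_b))
print(f"max |analytic - finite difference| = {err:.2e}")
```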

### Test From-Scratch Implementation

### 📝 What's Happening in This Code?

**Purpose:** Validate from-scratch quantile regression on heteroscedastic data.

**Key Points:**
- **Heteroscedastic data**: Variance increases with X (realistic scenario)
- **Three quantiles**: τ=0.1, 0.5, 0.9 create prediction band
- **Outliers added**: Tests robustness of median (τ=0.5)
- **Different slopes**: Each quantile has its own regression line
- **80% interval**: 10th-90th percentile captures middle 80% of distribution

**Why This Matters:** Real data often has non-constant variance. Quantile regression handles this naturally.

In [None]:
# Generate heteroscedastic data (variance increases with X)
np.random.seed(42)
n_samples = 200
X_hetero = np.linspace(0, 10, n_samples).reshape(-1, 1)

# True relationship with increasing variance
y_hetero = 2 * X_hetero.ravel() + 5 + np.random.normal(
    0, 0.5 + 0.3*X_hetero.ravel(), n_samples
)

# Add extreme outliers
outlier_idx = np.random.choice(n_samples, size=10, replace=False)
y_hetero[outlier_idx] += np.random.choice([-1, 1], 10) * np.random.uniform(5, 10, 10)

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X_hetero, y_hetero, test_size=0.2, random_state=42
)

# Train quantile models for τ = 0.1, 0.5, 0.9
quantiles = [0.1, 0.5, 0.9]
models_scratch = {}

for tau in quantiles:
    model = QuantileRegressionScratch(
        quantile=tau, learning_rate=0.01, n_iterations=1000
    )
    model.fit(X_train, y_train)
    models_scratch[tau] = model
    
print("\n🔧 From-Scratch Quantile Regression Trained:")
for tau, model in models_scratch.items():
    y_pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, y_pred)
    print(f"τ={tau:.1f}: MAE={mae:.4f}, w={model.w[0]:.4f}, b={model.b:.4f}")

print("\n✅ Different quantiles → different slopes (models the distribution)")
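
Because the simulated noise is N(0, (0.5 + 0.3x)²), the true conditional quantiles of this data are themselves straight lines: Q_τ(y|x) = (5 + 0.5·z_τ) + (2 + 0.3·z_τ)·x, where z_τ is the standard normal τ-quantile. The sketch below computes these reference slopes and intercepts (it ignores the 10 injected outliers, which shift the empirical quantiles slightly) so the learned `w` and `b` above can be compared against them; it assumes SciPy for `norm.ppf`.

```python
from scipy.stats import norm

# Ground truth for the simulated process (outliers excluded):
# y = 2x + 5 + eps,  eps ~ N(0, (0.5 + 0.3x)^2)
true_lines = {}
for tau in [0.1, 0.5, 0.9]:
    z = norm.ppf(tau)                                  # standard normal tau-quantile
    true_lines[tau] = (2 + 0.3 * z, 5 + 0.5 * z)        # (slope, intercept)
    print(f"τ={tau:.1f}: true slope={true_lines[tau][0]:.4f}, "
          f"true intercept={true_lines[tau][1]:.4f}")
```

The slopes fan out symmetrically around 2 because the noise standard deviation grows linearly with x.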

### Visualize Quantile Bands

### 📝 What's Happening in This Code?

**Purpose:** Visualize prediction intervals from multiple quantile models.

**Key Points:**
- **Shaded band**: Region between 10th and 90th percentile predictions
- **Median line (red)**: τ=0.5, robust central tendency
- **Widening band**: Captures increasing variance as X increases
- **Coverage**: Should contain ~80% of data points
- **Points outside band**: Expected; roughly 10% below and 10% above

**Why This Matters:** Visual validation that quantile models capture the data distribution, not just the mean.

In [None]:
# Create prediction grid
X_plot = np.linspace(X_train.min(), X_train.max(), 300).reshape(-1, 1)

# Get predictions for all quantiles
predictions = {}
for tau, model in models_scratch.items():
    predictions[tau] = model.predict(X_plot)

# Plot
plt.figure(figsize=(12, 6))

# Training data
plt.scatter(X_train, y_train, alpha=0.5, s=30, label='Training Data')

# Quantile lines
plt.plot(X_plot, predictions[0.5], 'r-', linewidth=2, label='Median (τ=0.5)')
plt.plot(X_plot, predictions[0.1], 'b--', linewidth=1.5, label='10th Percentile')
plt.plot(X_plot, predictions[0.9], 'g--', linewidth=1.5, label='90th Percentile')

# Shaded band (10th-90th percentile)
plt.fill_between(
    X_plot.ravel(), 
    predictions[0.1], 
    predictions[0.9],
    alpha=0.2, 
    color='gray', 
    label='80% Prediction Interval'
)

plt.xlabel('Feature', fontsize=12)
plt.ylabel('Target', fontsize=12)
plt.title('Quantile Regression: Prediction Intervals', fontsize=14, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\n📊 Interpretation:")
print("• Red line: Median prediction (robust to outliers)")
print("• Gray band: 80% prediction interval (10th-90th percentile)")
print("• Band widens with X: Captures heteroscedasticity")
print("• ~80% of points should fall within the gray band")
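
The "~80% of points" claim can be checked numerically rather than by eye. A minimal sketch with a helper of our own naming, `empirical_coverage`: it returns the fractions of observations below, inside, and above [lower, upper]; for a well-calibrated 10th-90th band the outer two should each be near 0.10. The self-check uses standard normal data with its exact 10th/90th quantiles (±1.2816); the notebook call is shown as a comment.

```python
import numpy as np

def empirical_coverage(y, lower, upper):
    """Fractions of y below, inside, and above the band [lower, upper]."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    below = np.mean(y < lower)
    above = np.mean(y > upper)
    return below, 1.0 - below - above, above

# Self-check on data with a known distribution
rng = np.random.default_rng(3)
y_sim = rng.normal(size=100_000)
below, inside, above = empirical_coverage(y_sim, -1.2816, 1.2816)
print(f"below={below:.3f}, inside={inside:.3f}, above={above:.3f}")

# Notebook usage, after the cells above have run:
# empirical_coverage(y_test, models_scratch[0.1].predict(X_test), models_scratch[0.9].predict(X_test))
```

Checking coverage on `X_test` rather than the training data gives a more honest estimate of how the band will behave on new devices or samples.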

### Loss Convergence

### 📝 What's Happening in This Code?

**Purpose:** Verify that check loss converges during training.

**Key Points:**
- **Different loss scales**: Each quantile has different check loss magnitude
- **Convergence pattern**: Should decrease and stabilize
- **Non-smooth**: Expected due to non-differentiability at e=0
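- **Convex objective**: The minimum is global, but with a fixed learning rate subgradient descent only approaches it, so expect the curve to flatten rather than reach an exact minimum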

**Why This Matters:** Loss convergence validates correct implementation of subgradient descent on check loss.

In [None]:
# Plot loss convergence for all quantiles
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for idx, (tau, model) in enumerate(models_scratch.items()):
    axes[idx].plot(model.loss_history, linewidth=2)
    axes[idx].set_xlabel('Iteration (x100)', fontsize=11)
    axes[idx].set_ylabel('Check Loss', fontsize=11)
    axes[idx].set_title(f'τ={tau} Loss Convergence', fontsize=12, fontweight='bold')
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📉 Loss Analysis:")
for tau, model in models_scratch.items():
    print(f"τ={tau}: Initial loss={model.loss_history[0]:.4f}, Final loss={model.loss_history[-1]:.4f}")
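
Two caveats explain why the from-scratch fits can differ from the exact solution. Each update moves the intercept by at most `learning_rate × max(τ, 1−τ)`, and while most residuals are positive the τ=0.1 model's intercept rises by only 0.01 × 0.1 = 0.001 per step, so 1,000 iterations can lift it by about one unit at most. Also, with a fixed step the iterates keep oscillating around the optimum instead of settling. A common remedy, not used in `QuantileRegressionScratch`, is a diminishing step size η_t = η₀/√(t+1). The sketch below (the helper `quantile_intercept_sgd` is hypothetical) applies it to the simplest case, an intercept-only model, whose optimum is the sample τ-quantile.

```python
import numpy as np

def quantile_intercept_sgd(y, tau, lr0=1.0, n_iter=20_000):
    """Subgradient descent on mean check loss for a constant model y_hat = b,
    with diminishing step size lr0 / sqrt(t + 1)."""
    b = 0.0
    for t in range(n_iter):
        e = y - b
        # d(mean loss)/db: -tau where e >= 0, (1 - tau) where e < 0
        g = np.mean(np.where(e >= 0, -tau, 1.0 - tau))
        b -= lr0 / np.sqrt(t + 1) * g
    return b

y = np.linspace(0, 10, 1001)
b_hat = quantile_intercept_sgd(y, tau=0.9)
print(f"SGD estimate={b_hat:.4f}, np.quantile={np.quantile(y, 0.9):.4f}")
```

Large early steps cover the distance to the optimum quickly, and shrinking later steps damp the oscillation; the same schedule could be dropped into `fit()` in place of the constant `learning_rate`.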

---

## 🚀 Part 2: Production Quantile Regression with Sklearn

Scikit-learn's `QuantileRegressor` solves the check-loss problem exactly as a linear program (via SciPy's `linprog`) instead of iterating with a fixed step.

### 📝 What's Happening in This Code?

**Purpose:** Compare sklearn's optimized QuantileRegressor to from-scratch implementation.

**Key Points:**
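- **solver='highs'**: SciPy's HiGHS linear-programming solver, which chooses between dual simplex and interior point automatically and solves the LP exactly instead of taking thousands of small subgradient steps
- **alpha parameter**: L1 (Lasso-style) penalty strength; it defaults to 1.0, so pass `alpha=0` for unpenalized quantile regression, as done here
- **Scales to moderate data**: The LP grows with the number of samples, so very large datasets can be slow
- **Same API**: Familiar fit/predict interface
- **Validation**: Should be close to the from-scratch results; large gaps (especially in the intercept) usually mean subgradient descent has not converged

**Why This Matters:** Production code needs reliable solvers. Linear programming returns the exact optimum of the check-loss objective, typically much faster than fixed-step subgradient descent.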

In [None]:
# Train sklearn QuantileRegressor for same quantiles
models_sklearn = {}

for tau in quantiles:
    model = QuantileRegressor(quantile=tau, alpha=0, solver='highs')
    model.fit(X_train, y_train)
    models_sklearn[tau] = model

print("\n🚀 Sklearn QuantileRegressor Trained:")
for tau, model in models_sklearn.items():
    y_pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, y_pred)
    print(f"τ={tau:.1f}: MAE={mae:.4f}, coef={model.coef_[0]:.4f}, intercept={model.intercept_:.4f}")

# Compare to from-scratch
print("\n📊 Comparison (Sklearn vs From-Scratch):")
for tau in quantiles:
    mae_sklearn = mean_absolute_error(y_test, models_sklearn[tau].predict(X_test))
    mae_scratch = mean_absolute_error(y_test, models_scratch[tau].predict(X_test))
    print(f"τ={tau:.1f}: Sklearn MAE={mae_sklearn:.4f}, Scratch MAE={mae_scratch:.4f}")
    
print("\n✅ Results should be close: sklearn solves the LP exactly, the scratch model is approximate")

### Visual Comparison: Sklearn vs From-Scratch

In [None]:
# Side-by-side comparison
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

for idx, (models, title) in enumerate([
    (models_scratch, 'From-Scratch'), 
    (models_sklearn, 'Sklearn')
]):
    axes[idx].scatter(X_train, y_train, alpha=0.4, s=20, label='Data')
    
    for tau in quantiles:
        y_plot = models[tau].predict(X_plot)
        label = f'τ={tau}'
        linestyle = '-' if tau == 0.5 else '--'
        axes[idx].plot(X_plot, y_plot, linestyle, linewidth=2, label=label)
    
    axes[idx].set_xlabel('Feature', fontsize=11)
    axes[idx].set_ylabel('Target', fontsize=11)
    axes[idx].set_title(f'{title} Quantile Regression', fontsize=12, fontweight='bold')
    axes[idx].legend(fontsize=9)
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Both implementations produce similar quantile bands")

---

## 🏭 Real-World Application: Post-Silicon Validation

**Scenario:** Predict process capability bounds for parametric test yield

### 📝 What's Happening in This Code?

**Purpose:** Simulate post-silicon STDF data for process capability analysis.

**Key Points:**
- **Process capability**: Need 1st percentile (worst-case) and 99th percentile (best-case)
- **Guard-banding**: Use τ=0.01 to set conservative test limits
- **Yield prediction**: τ=0.10 and τ=0.90 bracket 80% of population
- **Business value**: Quantile predictions enable specification optimization
- **Simulated data**: Non-linear V-F relationship (f ∝ Vdd²) with temperature derating, ~10% multiplicative process variation, and measurement noise

**Why This Matters:** Post-silicon validation needs worst-case/best-case predictions, not averages. Quantile regression provides these directly.

In [None]:
# Generate post-silicon parametric test data
np.random.seed(42)
n_devices = 400

# Test conditions
Voltage = np.random.uniform(0.95, 1.05, n_devices)  # Vdd (V)
Temperature = np.random.uniform(25, 85, n_devices)  # Temp (°C)

# Frequency: Non-linear relationship with voltage
# f ∝ Vdd² (1.8 GHz at the nominal 1.0 V), derated with temperature, plus process variation
Frequency_base = 1.8 * Voltage**2
Frequency_temp_effect = -0.003 * (Temperature - 25)    # -3 MHz per °C above 25°C
Frequency_noise = np.random.normal(0, 0.05, n_devices)  # measurement noise (GHz)

# Multiplicative device-to-device process variation (~10% standard deviation)
process_variation = np.random.normal(1, 0.1, n_devices)
Frequency = (Frequency_base + Frequency_temp_effect) * process_variation + Frequency_noise

# Create DataFrame
df_capability = pd.DataFrame({
    'Voltage_V': Voltage,
    'Temperature_C': Temperature,
    'Frequency_GHz': Frequency
})

print("\n🔬 Post-Silicon Process Capability Dataset:")
print(df_capability.head(10))
print("\nFrequency Statistics:")
print(df_capability['Frequency_GHz'].describe())
print(f"\n1st percentile: {df_capability['Frequency_GHz'].quantile(0.01):.4f} GHz")
print(f"99th percentile: {df_capability['Frequency_GHz'].quantile(0.99):.4f} GHz")

### Train Quantile Models for Process Capability

### 📝 What's Happening in This Code?

**Purpose:** Train multiple quantile models for complete distribution characterization.

**Key Points:**
- **7 quantiles**: 1st, 10th, 25th, 50th, 75th, 90th, 99th percentiles
- **Features**: Voltage and Temperature (common test conditions)
- **Capability bounds**: 1st and 99th percentiles define process capability
- **Median baseline**: 50th percentile as central tendency
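- **StandardScaler**: Puts Voltage (~1 V) and Temperature (25-85°C) on comparable scales; with alpha=0 the predictions do not depend on scaling, but it keeps coefficients comparable and matters once an L1 penalty (alpha > 0) is used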

**Why This Matters:** Complete distribution model enables specification setting, guard-banding, and yield optimization.

In [None]:
# Prepare data
X_capability = df_capability[['Voltage_V', 'Temperature_C']].values
y_capability = df_capability['Frequency_GHz'].values

X_train_cap, X_test_cap, y_train_cap, y_test_cap = train_test_split(
    X_capability, y_capability, test_size=0.2, random_state=42
)

# Scale features
scaler_cap = StandardScaler()
X_train_cap_scaled = scaler_cap.fit_transform(X_train_cap)
X_test_cap_scaled = scaler_cap.transform(X_test_cap)

# Train multiple quantile models
quantiles_capability = [0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99]
models_capability = {}

for tau in quantiles_capability:
    model = QuantileRegressor(quantile=tau, alpha=0, solver='highs')
    model.fit(X_train_cap_scaled, y_train_cap)
    models_capability[tau] = model

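# Pinball (check) loss is the matching test metric for each quantile level;
# MAE is only the natural score for the median model
from sklearn.metrics import mean_pinball_loss

print("\n🔧 Process Capability Models Trained:")
print(f"\n{'Quantile':<12} {'Pinball loss':<14} {'Pred (V=1.0, T=25)':<20}")
print("-" * 50)

# Example prediction at nominal conditions
X_nominal = scaler_cap.transform([[1.0, 25]])
for tau, model in models_capability.items():
    y_pred = model.predict(X_test_cap_scaled)
    loss = mean_pinball_loss(y_test_cap, y_pred, alpha=tau)  # alpha = quantile level here
    pred_nominal = model.predict(X_nominal)[0]
    print(f"{'τ=' + format(tau, '.2f'):<12} {loss:<14.4f} {pred_nominal:<20.4f}")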

print("\n✅ Quantile predictions span the capability range")

### Visualize Process Capability Distribution

In [None]:
# Create grid for visualization (vary voltage at fixed temperature)
V_grid = np.linspace(0.95, 1.05, 100)
T_fixed = 25  # Room temperature
X_viz = np.column_stack([V_grid, np.full(100, T_fixed)])
X_viz_scaled = scaler_cap.transform(X_viz)

# Get predictions for all quantiles
preds_capability = {}
for tau, model in models_capability.items():
    preds_capability[tau] = model.predict(X_viz_scaled)

# Plot
plt.figure(figsize=(14, 7))

# Data points
plt.scatter(X_train_cap[:, 0], y_train_cap, alpha=0.3, s=20, label='Training Data (all temperatures)')

# Quantile lines
colors = ['darkred', 'red', 'orange', 'green', 'orange', 'blue', 'darkblue']
for (tau, preds), color in zip(preds_capability.items(), colors):
    linestyle = '-' if tau == 0.50 else '--'
    linewidth = 2.5 if tau in [0.01, 0.99] else 1.5
    # round() avoids float truncation (e.g. int(0.29 * 100) == 28) and "1th"-style labels
    plt.plot(V_grid, preds, linestyle, linewidth=linewidth, color=color,
             label=f'P{round(tau * 100)} (τ={tau:.2f})')

# Capability bands
plt.fill_between(V_grid, preds_capability[0.01], preds_capability[0.99],
                 alpha=0.15, color='gray', label='Process Capability (1st-99th)')
plt.fill_between(V_grid, preds_capability[0.25], preds_capability[0.75],
                 alpha=0.2, color='blue', label='IQR (25th-75th)')

plt.xlabel('Voltage (V)', fontsize=12)
plt.ylabel('Frequency (GHz)', fontsize=12)
plt.title('Process Capability Analysis: Quantile Regression\n(Temperature = 25°C)', 
          fontsize=14, fontweight='bold')
plt.legend(fontsize=9, loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\n📊 Interpretation:")
print("• Dark red/blue lines: 1st and 99th percentiles (capability bounds)")
print("• Green line: Median (50th percentile)")
print("• Blue band: Interquartile range (middle 50% of devices)")
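print("• Gray band: 1st-99th percentile range (central 98% of devices at 25°C)")
print("• Scatter includes all temperatures (25-85°C); the lines are evaluated at 25°C, so hotter devices tend to sit lower")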
print("\n💡 Business Impact: Use 1st percentile for guard-banding (conservative test limits)")

---

## 🎯 Real-World Project Ideas

### Post-Silicon Validation Projects (4)

#### 1. **Process Capability Bounds Prediction**
**Objective:** Predict 1st and 99th percentile performance from early test data.

**Business Value:** Set specification limits that maximize yield while meeting reliability requirements. 1% guard-band optimization can increase revenue by $2-5M per product.

**Key Features:** Early parametric tests (first 20% of test flow), wafer-level spatial data, process corner indicators

**Implementation:** Train τ=0.01 and τ=0.99 models, compare to actual extremes, deploy for real-time limit adjustment

**Success Metric:** 95% accuracy predicting true 1st/99th percentiles, enable dynamic guard-banding

---

#### 2. **Worst-Case Yield Prediction**
**Objective:** Predict lower-bound yield (τ=0.05) from wafer test for final test planning.

**Business Value:** Conservative yield forecasts prevent overscheduling final test capacity. Reduces idle time and rush costs.

**Key Features:** Wafer test yields by bin, spatial patterns, lot-level metrics

**Implementation:** Quantile regression on historical wafer→final yield, focus on τ=0.05-0.10 for conservative estimates

**Success Metric:** 90% of actual yields exceed predicted 10th percentile

---

#### 3. **Guard-Band Optimization**
**Objective:** Minimize guard-bands using quantile regression on test-retest data.

**Business Value:** Tighter guard-bands can increase yield by 2-5% without sacrificing quality. Direct revenue impact.

**Key Features:** Test-retest correlation data, measurement uncertainty, device performance distributions

**Implementation:** Model τ=0.01 performance vs. test limits, optimize limits to maximize yield while ensuring 99.9% quality

**Success Metric:** Increase yield 3% while maintaining <0.1% field failure rate

---

#### 4. **Specification Limit Setting**
**Objective:** Set optimal spec limits using quantile regression on capability data.

**Business Value:** Data-driven spec limits balance yield vs. market requirements. Prevents over-specifying (lost yield) or under-specifying (quality issues).

**Key Features:** Parametric test distributions, customer requirements, historical field data

**Implementation:** Model multiple quantiles (1st-99th percentile), align specs with market needs and process capability

**Success Metric:** 95% yield at target specs, <0.5% returns

---

### General AI/ML Projects (4)

#### 5. **Financial Risk Assessment**
**Objective:** Predict Value-at-Risk (VaR) using quantile regression on portfolio returns.

**Business Value:** Regulatory requirement (Basel III). Quantile regression provides 1st and 5th percentile loss predictions directly.

**Key Features:** Historical returns, volatility, market indicators, macro factors

**Success Metric:** VaR predictions pass regulatory backtesting (breach rate consistent with the VaR level, e.g. about 1% of days for 99% VaR)

---

#### 6. **Extreme Weather Prediction**
**Objective:** Predict 95th percentile rainfall/temperature for infrastructure planning.

**Business Value:** Design standards require extreme value predictions, not averages. Prevents under-designed infrastructure.

**Key Features:** Historical weather, climate models, geographical features

**Success Metric:** Observed exceedance rate of the predicted 95th percentile stays close to 5% over a 10-year backtest

---

#### 7. **Healthcare Cost Tail Modeling**
**Objective:** Predict 90th percentile patient costs for budgeting.

**Business Value:** Average costs misleading due to expensive outliers. Quantile regression models high-cost patients.

**Key Features:** Demographics, diagnosis codes, comorbidities, treatment history

**Success Metric:** Budget within 10% of actual costs for high-cost patients

---

#### 8. **Supply Chain Delivery Time Prediction**
**Objective:** Predict 90th percentile delivery times for SLA setting.

**Business Value:** SLAs based on 90th percentile more realistic than averages. Reduces penalty costs from missed deliveries.

**Key Features:** Historical delivery times, carrier, distance, weather, season

**Success Metric:** 90% of deliveries meet predicted 90th percentile SLA

---

## ✅ Key Takeaways

### When to Use Quantile Regression

| **Use Case** | **Quantile Regression** | **OLS Regression** | **SVR** |
|-------------|------------------------|-------------------|--------|
| **Predict mean** | ❌ τ=0.5 gives the median, not the mean | ✅ Optimal | ✅ Can work |
| **Predict tails** | ✅ Direct modeling | ❌ Extrapolation risky | ❌ Not designed for this |
| **Heteroscedastic data** | ✅ Models varying spread | ❌ Standard intervals assume constant variance | ❌ Fits one function; no spread estimate |
| **Outlier robustness** | ✅ Median (τ=0.5) robust | ❌ Sensitive | ✅ Epsilon-insensitive loss |
| **Prediction intervals** | ✅ Direct quantile predictions | ❌ Assume normal, constant-variance errors | ❌ Single-point predictions |
| **Risk assessment** | ✅ VaR directly; CVaR by averaging tail quantiles | ❌ Indirect | ❌ Not applicable |
| **Guard-banding** | ✅ 1st/99th percentile | ❌ Mean ± k*σ assumes normality | ❌ Not distributional |

### Best Practices

1. **Choose quantiles strategically:**
   - **τ=0.01, 0.05:** Lower tail; worst case when low values are bad, e.g. frequency or returns (guard-banding, VaR)
   - **τ=0.50:** Robust central tendency (median regression)
   - **τ=0.95, 0.99:** Upper tail; best case for performance metrics, worst case for delays, costs, or losses
   - **Multiple quantiles:** Full distribution (IQR, capability bands)

2. **Hyperparameter tuning:**
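   - **alpha:** L1 penalty strength. Scikit-learn's default is 1.0, so set `alpha=0` explicitly for unpenalized fits; for high-dimensional data, tune alpha by cross-validation on pinball loss
   - **solver:** Use `'highs'` (the default since scikit-learn 1.4); `'highs-ds'` and `'highs-ipm'` force dual simplex or interior point. The older `'interior-point'` option is unavailable with SciPy ≥ 1.11
   - **Quantile crossing:** If τ=0.9 predictions fall below τ=0.5 predictions, sort each sample's predictions across τ (rearrangement), fit the quantiles jointly with non-crossing constraints, or regularize more strongly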

3. **Validation:**
   - Check quantile loss for each τ
   - Verify empirical coverage: about 100·τ% of actuals should fall below the τ-quantile predictions (e.g. ~90% below the τ=0.9 line)
   - Plot all quantiles together to check for crossing issues

4. **Production deployment:**
   - Train separate model for each quantile of interest
   - Store all models if predicting multiple quantiles
   - For real-time applications, cache predictions for common input ranges
   - Monitor empirical coverage over time (model drift detection)
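
Quantile crossing can also be repaired after prediction, without refitting: for each input, sort the predicted values across quantile levels so they increase with τ (often called rearrangement). A minimal sketch, assuming predictions are stacked into an (n_samples, n_quantiles) array with columns in increasing τ order; `rearrange_quantiles` is a hypothetical helper name.

```python
import numpy as np

def rearrange_quantiles(preds):
    """Sort each row so predictions are non-decreasing across quantile levels.

    preds: array of shape (n_samples, n_quantiles), columns ordered by increasing tau.
    """
    return np.sort(np.asarray(preds), axis=1)

# Row 2 crosses: its tau=0.5 prediction (3.2) exceeds its tau=0.9 prediction (2.9)
preds = np.array([
    [1.0, 2.0, 3.0],
    [1.5, 3.2, 2.9],
])
n_crossing = int(np.sum(np.any(np.diff(preds, axis=1) < 0, axis=1)))
fixed = rearrange_quantiles(preds)
print("rows with crossing before:", n_crossing)
print(fixed)

# Notebook usage (columns in increasing tau order):
# P = np.column_stack([models_capability[t].predict(X_test_cap_scaled) for t in sorted(models_capability)])
# P_fixed = rearrange_quantiles(P)
```

Sorting never changes predictions that were already ordered, so it is safe to apply routinely in deployment.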

### Limitations

- **Computational cost:** Training K quantiles requires K separate models
- **Quantile crossing:** Low quantile predictions can exceed high quantile predictions without constraints
- **Linear relationships:** Standard quantile regression assumes linear effects (use kernel methods or trees for non-linearity)
- **Large datasets:** Can be slow compared to OLS (linear programming solver)

### Next Steps

- **016_Decision_Trees.ipynb:** Non-linear models, feature interactions, automatic variable selection
- **051_Deep_Learning_Intro.ipynb:** Neural networks for complex non-linear quantile regression
- **Advanced quantile methods:** Quantile random forests, quantile gradient boosting, conformalized quantile regression

---

## 📚 References & Further Reading

**Foundational Papers:**
- Koenker & Bassett (1978): "Regression Quantiles" - Original quantile regression paper
- Koenker (2005): "Quantile Regression" (textbook) - Comprehensive treatment

**Applications:**
- Post-silicon: "Statistical Yield Analysis using Quantile Regression" (IEEE TCAD 2019)
- Finance: "Regression Quantiles in Risk Management" (Journal of Risk 2002)

**Sklearn Documentation:**
- `sklearn.linear_model.QuantileRegressor`: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.QuantileRegressor.html
- User guide: https://scikit-learn.org/stable/modules/linear_model.html#quantile-regression

**Advanced Topics:**
- Quantile random forests (Meinshausen 2006)
- Conformalized quantile regression (Romano et al. 2019)
- Quantile gradient boosting (LightGBM, XGBoost)

---

**Notebook Complete!** 🎉

You now understand:
- ✅ Quantile regression theory (check loss, conditional quantiles)
- ✅ From-scratch implementation (subgradient descent)
- ✅ Production sklearn usage (QuantileRegressor)
- ✅ Post-silicon applications (guard-banding, capability bounds)
- ✅ General AI/ML applications (risk assessment, tail prediction)
- ✅ 8 real-world projects to practice

**Next:** `016_Decision_Trees.ipynb` for non-linear modeling and feature interactions.