# 011: Polynomial Regression

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score, validation_curve
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.pipeline import Pipeline
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

print('✅ Libraries imported successfully')
print(f'NumPy version: {np.__version__}')
print(f'Pandas version: {pd.__version__}')

### 📝 What's Happening in This Code?

**Purpose:** Import essential libraries and configure the environment for polynomial regression

**Key Points:**
- **PolynomialFeatures**: Scikit-learn's transformer for creating polynomial and interaction features
- **Pipeline**: Chains preprocessing (polynomial transform) and modeling steps for cleaner code
- **validation_curve**: Tool for analyzing how model performance changes with polynomial degree
- **Configuration**: Reproducibility and visualization setup matching workspace standards

**Why This Matters:**
- Pipeline prevents data leakage by fitting each preprocessing step on the training data only, then applying the fitted transforms to test data
- validation_curve is critical for degree selection to avoid overfitting
- Consistent setup allows fair comparison with linear regression baseline
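
To make the leakage point concrete, here is a minimal sketch. The data and the names `X_demo`, `y_demo`, and `pipe` are illustrative assumptions, not part of this notebook's dataset. It checks that a scaler inside a `Pipeline` learns its statistics from the training rows only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Illustrative data (assumption): one feature, quadratic target
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(100, 1))
y_demo = 2.0 * X_demo[:, 0] ** 2 + rng.normal(0, 1, 100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

pipe = Pipeline([
    ('poly_features', PolynomialFeatures(degree=2, include_bias=False)),
    ('scaler', StandardScaler()),
    ('regressor', LinearRegression()),
])
pipe.fit(X_tr, y_tr)  # every step is fitted on the training rows only

# The scaler's stored means equal the means of the training polynomial features
train_poly = pipe.named_steps['poly_features'].transform(X_tr)
scaler_means = pipe.named_steps['scaler'].mean_
means_match_train = np.allclose(scaler_means, train_poly.mean(axis=0))
print(f'Scaler statistics come from training data: {means_match_train}')
```

Calling `pipe.predict(X_te)` afterwards reuses those training statistics, so no information from the test rows enters the fitted transforms.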

### 2.1 Generate Non-Linear Synthetic Dataset

Create data with **known polynomial relationships** to validate our implementation.

### 📝 What's Happening in This Code?

**Purpose:** Generate synthetic data with quadratic and cubic relationships for educational validation

**Key Points:**
- **Temperature-Performance**: Quadratic relationship mimics thermal effects (peak performance at optimal temp)
- **Voltage-Frequency**: Cubic relationship models complex V-F curves from semiconductor physics
- **Controlled Noise**: Added to simulate real measurement variance while maintaining ground truth
- **Domain Realism**: Coefficients chosen to reflect realistic semiconductor parameter ranges

**Why This Approach:**
- Knowing true polynomial degree lets us validate model selection methods
- Mimics real STDF patterns where temperature and voltage affect device performance non-linearly
- Allows comparison between different polynomial degrees to demonstrate overfitting

In [None]:
def generate_polynomial_data(n_samples=200, noise_level=5.0):
    """
    Generate synthetic data with polynomial relationships
    Simulates post-silicon validation scenarios
    
    Returns:
        X: Features (temperature, voltage)
        y: Target (device performance score)
    """
    # Temperature feature (Celsius): 20-100°C
    temperature = np.random.uniform(20, 100, n_samples)
    
    # Voltage feature (Volts): 0.8-1.2V
    voltage = np.random.uniform(0.8, 1.2, n_samples)
    
    # Ground truth: Quadratic relationship with temperature (optimal at 60°C)
    # Performance degrades at extreme temperatures
    temp_effect = 100 - 0.02 * (temperature - 60)**2
    
    # Ground truth: Cubic relationship with voltage (complex V-F curve)
    voltage_effect = 50 * voltage**3 - 30 * voltage**2 + 20 * voltage
    
    # Combined performance score with noise
    y = temp_effect + voltage_effect + np.random.normal(0, noise_level, n_samples)
    
    # Create feature matrix
    X = np.column_stack([temperature, voltage])
    
    return X, y

# Generate dataset
X, y = generate_polynomial_data(n_samples=200, noise_level=3.0)

# Create DataFrame for better visualization
df = pd.DataFrame(X, columns=['Temperature_C', 'Voltage_V'])
df['Performance_Score'] = y

print('✅ Synthetic polynomial dataset generated')
print(f'Dataset shape: {df.shape}')
print('\nFirst 5 samples:')
print(df.head())
print('\nDataset statistics:')
print(df.describe())

### 2.2 Exploratory Data Analysis

### 📝 What's Happening in This Code?

**Purpose:** Visualize relationships to identify non-linearity before modeling

**Key Points:**
- **Scatter plots**: Reveal curved patterns that linear regression cannot capture
- **Temperature plot**: Shows inverted U-shape (quadratic) with peak performance at mid-range
- **Voltage plot**: Shows a rising, monotonic trend (the cubic's slope stays positive over 0.8-1.2 V), blurred by the temperature effect and noise
- **Visual evidence**: Justifies using polynomial regression over linear baseline

**Why This Matters:**
- Visual inspection is first step in determining polynomial degree
- Helps set expectations for model performance
- Documents data characteristics for stakeholders (e.g., "performance peaks at 60°C")
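
The shape of the voltage term can be checked directly from the ground-truth formula in `generate_polynomial_data`. The sketch below restates that formula and its analytic derivative; the helper names `voltage_effect` and `voltage_slope` are introduced here for illustration only.

```python
import numpy as np

def voltage_effect(v):
    """Ground-truth voltage term from generate_polynomial_data."""
    return 50 * v**3 - 30 * v**2 + 20 * v

def voltage_slope(v):
    """Analytic derivative of the voltage term: 150 v^2 - 60 v + 20."""
    return 150 * v**2 - 60 * v + 20

# Evaluate the slope across the simulated voltage range
v_grid = np.linspace(0.8, 1.2, 401)
slopes = voltage_slope(v_grid)

print(f'Minimum slope on [0.8, 1.2] V: {slopes.min():.1f}')
print(f'Effect at 0.8 V: {voltage_effect(0.8):.1f}')
print(f'Effect at 1.2 V: {voltage_effect(1.2):.1f}')
```

A slope that stays positive across the grid means the curve never turns over in this range, so the voltage scatter plot should rise from left to right.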

In [None]:
# Visualize relationships
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Temperature vs Performance
axes[0].scatter(df['Temperature_C'], df['Performance_Score'], alpha=0.6, edgecolor='k')
axes[0].set_xlabel('Temperature (°C)', fontsize=12)
axes[0].set_ylabel('Performance Score', fontsize=12)
axes[0].set_title('Temperature vs Performance\n(Non-linear Quadratic Relationship)', fontsize=13, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Voltage vs Performance
axes[1].scatter(df['Voltage_V'], df['Performance_Score'], alpha=0.6, edgecolor='k', color='coral')
axes[1].set_xlabel('Voltage (V)', fontsize=12)
axes[1].set_ylabel('Performance Score', fontsize=12)
axes[1].set_title('Voltage vs Performance\n(Non-linear Cubic Relationship)', fontsize=13, fontweight='bold')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print('📊 Visual inspection confirms non-linear relationships')
print('   → Temperature shows inverted U-shape (quadratic)')
print('   → Voltage shows rising trend with mild curvature (cubic)')

### 2.3 Train-Test Split

### 📝 What's Happening in This Code?

**Purpose:** Split data for unbiased model evaluation

**Key Points:**
- **80-20 split**: Standard ratio balancing training data quantity and test reliability
- **Random state**: Ensures reproducible splits for consistent experimentation
- **Stratification not needed**: Continuous target variable (unlike classification)
- **Validation importance**: Critical for detecting overfitting in polynomial models

**Why This Matters:**
- Polynomial models are prone to overfitting - test set reveals this
- Without proper validation, high-degree polynomials appear artificially good
- Simulates production scenario where model faces unseen data

In [None]:
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print('✅ Data split completed')
print(f'Training samples: {X_train.shape[0]}')
print(f'Test samples: {X_test.shape[0]}')
print(f'Feature count: {X_train.shape[1]}')

---

## 3. Mathematical Foundation

### 3.1 Polynomial Feature Transformation

Given features $x_1, x_2$, polynomial features of degree $d=2$ include:

$$\text{Original: } [x_1, x_2]$$

$$\text{Degree 2: } [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]$$

For $n$ features and degree $d$, the number of polynomial features (including the constant bias term) is:

$$\text{Number of features} = \binom{n+d}{d} = \frac{(n+d)!}{n! \cdot d!}$$

**Example:** 2 features, degree 3 → $\binom{2+3}{3} = \binom{5}{3} = 10$ features
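
The count can be verified numerically. This is a small sketch; `n_poly_terms` is a helper name introduced here, and `X_tiny` is a placeholder array whose values do not matter.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def n_poly_terms(n_features, degree):
    """Number of monomials of total degree <= degree, bias included."""
    return comb(n_features + degree, degree)

X_tiny = np.zeros((1, 2))  # placeholder row; only the column count matters

print('degree  formula  sklearn')
for d in range(1, 6):
    # include_bias defaults to True, matching the formula
    n_sklearn = PolynomialFeatures(degree=d).fit(X_tiny).n_output_features_
    print(f'{d:>6}  {n_poly_terms(2, d):>7}  {n_sklearn:>7}')
```

With `include_bias=False`, as used in the pipelines in this notebook, scikit-learn produces one column fewer because the constant term is dropped.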

### 3.2 Model Fitting

After transformation, we fit standard linear regression:

$$\hat{y} = \mathbf{X}_{\text{poly}} \boldsymbol{\beta}$$

Where $\mathbf{X}_{\text{poly}}$ contains all polynomial terms.
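
As a worked example of this step, the sketch below builds $\mathbf{X}_{\text{poly}}$ by hand for a single feature and recovers known coefficients with least squares. The target function and variable names are illustrative assumptions.

```python
import numpy as np

# Illustrative 1-D example (assumption): y = 3 - 2x + 0.5x^2, no noise
x = np.linspace(-2, 2, 21)
y_true = 3.0 - 2.0 * x + 0.5 * x**2

# Design matrix with columns [1, x, x^2]
X_poly_demo = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares solution of X_poly @ beta = y
beta_hat, *_ = np.linalg.lstsq(X_poly_demo, y_true, rcond=None)
print('Recovered coefficients:', np.round(beta_hat, 6))
```

The model is quadratic in $x$ but linear in $\boldsymbol{\beta}$, which is why an ordinary linear solver is enough.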

### 3.3 The Bias-Variance Tradeoff

```mermaid
graph LR
    A[Low Degree<br/>High Bias<br/>Underfitting] --> B[Optimal Degree<br/>Balanced]
    B --> C[High Degree<br/>High Variance<br/>Overfitting]
    style B fill:#4CAF50,stroke:#333,stroke-width:2px,color:#fff
    style A fill:#FF9800,stroke:#333,stroke-width:2px,color:#fff
    style C fill:#f44336,stroke:#333,stroke-width:2px,color:#fff
```

- **Underfitting (degree too low)**: Model too simple, misses true relationship
- **Optimal**: Captures true pattern without memorizing noise
- **Overfitting (degree too high)**: Model memorizes training noise, poor generalization
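
The three regimes can be reproduced on a toy problem. The sketch below is illustrative only: the cubic target, sample sizes, noise level, and the choice of degrees 1, 3, and 12 are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

def cubic(x):
    """Illustrative ground truth (assumption): a cubic curve."""
    return x**3 - 2 * x

# Small training set, larger test set, same noise level
x_train = rng.uniform(-2, 2, 30).reshape(-1, 1)
x_test = rng.uniform(-2, 2, 200).reshape(-1, 1)
y_train_demo = cubic(x_train[:, 0]) + rng.normal(0, 0.5, 30)
y_test_demo = cubic(x_test[:, 0]) + rng.normal(0, 0.5, 200)

errors = {}
for degree in (1, 3, 12):
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    )
    model.fit(x_train, y_train_demo)
    errors[degree] = (
        mean_squared_error(y_train_demo, model.predict(x_train)),
        mean_squared_error(y_test_demo, model.predict(x_test)),
    )
    print(f'degree {degree:>2}: train MSE {errors[degree][0]:.3f}, '
          f'test MSE {errors[degree][1]:.3f}')
```

Training MSE cannot increase as the degree grows because each model contains the previous one. Test MSE is the number to watch: degree 1 underfits the cubic, and a very high degree will typically fit noise in the 30 training points.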

---

## 4. Implementation from Scratch

Build polynomial regression manually to understand the mechanics.

### 📝 What's Happening in This Code?

**Purpose:** Implement polynomial regression from scratch for educational understanding

**Key Points:**
- **Manual feature generation**: Creates polynomial terms using nested loops over features and degrees
- **Normal equation**: Computes optimal coefficients $\boldsymbol{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$
- **Interaction terms**: Includes cross-products like $x_1 x_2$ for feature interactions
- **Validation**: Compare against sklearn to ensure correctness

**Why This Matters:**
- Demystifies polynomial regression as "linear regression on transformed features"
- Shows how feature explosion happens with high degrees
- Understanding internals helps debug production issues
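
The validation step listed above can be sketched in isolation: solve the normal equation on a polynomial design matrix and compare the coefficients with scikit-learn's least-squares fit. The data and names below (`X_small`, `Phi`, `beta_normal`) are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Illustrative, well-conditioned data (assumption): two features in [-1, 1]
rng = np.random.default_rng(0)
X_small = rng.uniform(-1, 1, size=(60, 2))
y_small = (1.0 + 2.0 * X_small[:, 0] - X_small[:, 1]
           + 0.5 * X_small[:, 0] * X_small[:, 1] + X_small[:, 1] ** 2
           + rng.normal(0, 0.1, 60))

# Design matrix with bias column: [1, x0, x1, x0^2, x0*x1, x1^2]
Phi = PolynomialFeatures(degree=2, include_bias=True).fit_transform(X_small)

# Normal equation, solved without forming the explicit inverse
beta_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ y_small)

# Reference fit on the same matrix; the bias column supplies the intercept
reference = LinearRegression(fit_intercept=False).fit(Phi, y_small)

max_diff = np.max(np.abs(beta_normal - reference.coef_))
print(f'Largest coefficient difference: {max_diff:.2e}')
```

On well-scaled inputs like these the two solutions agree to rounding error. On raw features with very different magnitudes, the normal equation can lose precision, which is one reason library solvers avoid it.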

In [None]:
from itertools import combinations_with_replacement

class PolynomialRegressionScratch:
    """
    Polynomial Regression implemented from scratch
    """
    def __init__(self, degree=2):
        self.degree = degree
        self.coefficients = None
        self.intercept = None
        
    def _create_polynomial_features(self, X):
        """
        Transform features to polynomial features:
        all monomials of total degree <= self.degree, bias column first
        """
        n_samples, n_features = X.shape
        
        # Start with bias term (column of ones)
        X_poly = [np.ones(n_samples)]
        
        # For each total degree d, add every product of d features
        # (repeats allowed), e.g. d=3 gives x0^3, x0^2*x1, x0*x1^2, x1^3
        for d in range(1, self.degree + 1):
            for idx in combinations_with_replacement(range(n_features), d):
                X_poly.append(np.prod(X[:, list(idx)], axis=1))
        
        return np.column_stack(X_poly)
    
    def fit(self, X, y):
        """
        Fit polynomial regression using normal equation
        """
        # Transform to polynomial features
        X_poly = self._create_polynomial_features(X)
        
        # Normal equation: β = (X^T X)^(-1) X^T y
        X_transpose = X_poly.T
        
        try:
            # Compute coefficients by solving (X^T X) β = X^T y
            # (more stable than forming the explicit inverse)
            coefficients = np.linalg.solve(X_transpose @ X_poly, X_transpose @ y)
            
            self.intercept = coefficients[0]
            self.coefficients = coefficients[1:]
            
            return self
            
        except np.linalg.LinAlgError:
            raise ValueError("Matrix is singular. Try regularization or reduce degree.")
    
    def predict(self, X):
        """
        Make predictions
        """
        X_poly = self._create_polynomial_features(X)
        return X_poly @ np.concatenate([[self.intercept], self.coefficients])
    
    def score(self, X, y):
        """
        Calculate R² score
        """
        y_pred = self.predict(X)
        ss_res = np.sum((y - y_pred) ** 2)
        ss_tot = np.sum((y - np.mean(y)) ** 2)
        return 1 - (ss_res / ss_tot)

# Train from-scratch model (degree 2)
model_scratch = PolynomialRegressionScratch(degree=2)
model_scratch.fit(X_train, y_train)

# Evaluate
train_r2_scratch = model_scratch.score(X_train, y_train)
test_r2_scratch = model_scratch.score(X_test, y_test)
y_pred_scratch = model_scratch.predict(X_test)
rmse_scratch = np.sqrt(mean_squared_error(y_test, y_pred_scratch))

print('✅ From-Scratch Polynomial Regression (Degree 2)')
print(f'Training R²: {train_r2_scratch:.4f}')
print(f'Test R²: {test_r2_scratch:.4f}')
print(f'Test RMSE: {rmse_scratch:.4f}')
print(f'\nNumber of coefficients: {len(model_scratch.coefficients) + 1}')

---

## 5. Production Implementation with Scikit-learn

### 📝 What's Happening in This Code?

**Purpose:** Use production-ready sklearn pipeline for polynomial regression

**Key Points:**
- **Pipeline**: Chains PolynomialFeatures → StandardScaler → LinearRegression in one object
- **StandardScaler**: Puts polynomial terms on a common scale, which improves numerical conditioning (raw temperature³ reaches ~10⁶ while voltage stays near 1)
- **include_bias=False**: Pipeline's LinearRegression adds intercept, avoid duplication
- **Degree sweep**: Test degrees 1-5 to find optimal complexity

**Why This Matters:**
- Pipeline ensures consistent preprocessing in train/test/production
- StandardScaler keeps large high-degree terms from dominating the design matrix numerically
- Systematic degree evaluation prevents arbitrary choices
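
The effect of scaling on conditioning can be measured directly. The sketch below regenerates features in the same ranges as this notebook (20-100 °C, 0.8-1.2 V) with its own random generator, so the exact numbers will differ from the notebook's dataset; the variable names are assumptions.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Features in the same ranges as the notebook's synthetic data (assumption:
# a fresh generator, so values differ from the notebook's X)
rng = np.random.default_rng(42)
X_raw = np.column_stack([
    rng.uniform(20, 100, 200),   # temperature in Celsius
    rng.uniform(0.8, 1.2, 200),  # voltage in volts
])

# Degree-3 expansion, then standardization of every polynomial column
X_poly_raw = PolynomialFeatures(degree=3, include_bias=False).fit_transform(X_raw)
X_poly_scaled = StandardScaler().fit_transform(X_poly_raw)

# Condition number: ratio of largest to smallest singular value
cond_raw = np.linalg.cond(X_poly_raw)
cond_scaled = np.linalg.cond(X_poly_scaled)

print(f'Largest raw feature value:  {X_poly_raw.max():.3g}')
print(f'Condition number (raw):     {cond_raw:.3g}')
print(f'Condition number (scaled):  {cond_scaled:.3g}')
```

Scaling lowers the condition number but does not remove the correlation between powers of the same variable, so high-degree fits can still be sensitive to noise.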

In [None]:
# Compare multiple polynomial degrees
degrees = [1, 2, 3, 4, 5]
results = []

for degree in degrees:
    # Create pipeline
    pipeline = Pipeline([
        ('poly_features', PolynomialFeatures(degree=degree, include_bias=False)),
        ('scaler', StandardScaler()),
        ('regressor', LinearRegression())
    ])
    
    # Train
    pipeline.fit(X_train, y_train)
    
    # Evaluate
    train_r2 = pipeline.score(X_train, y_train)
    test_r2 = pipeline.score(X_test, y_test)
    y_pred = pipeline.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    mae = mean_absolute_error(y_test, y_pred)
    
    # Cross-validation score
    cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5, 
                                 scoring='r2')
    cv_mean = cv_scores.mean()
    
    results.append({
        'Degree': degree,
        'Train_R2': train_r2,
        'Test_R2': test_r2,
        'CV_R2_Mean': cv_mean,
        'Test_RMSE': rmse,
        'Test_MAE': mae,
        'Overfitting_Gap': train_r2 - test_r2
    })

# Create results DataFrame
results_df = pd.DataFrame(results)

print('📊 Polynomial Regression Results (Multiple Degrees)\n')
print(results_df.to_string(index=False))

# Identify best degree
best_degree = results_df.loc[results_df['Test_R2'].idxmax(), 'Degree']
print(f'\n🎯 Best polynomial degree: {int(best_degree)} (highest Test R²)')

### 5.1 Visualize Degree Selection

### 📝 What's Happening in This Code?

**Purpose:** Visualize bias-variance tradeoff to select optimal polynomial degree

**Key Points:**
- **Training R² trend**: Monotonically increases with degree (models memorize training data)
- **Test R² trend**: Increases then decreases (overfitting after optimal point)
- **Overfitting gap**: Widening gap between train/test indicates overfitting
- **Sweet spot**: Degree where test R² peaks before declining

**Why This Matters:**
- Visual proof of overfitting - crucial for explaining to stakeholders
- Objective criterion for degree selection (not guesswork)
- Documents model selection process for regulatory compliance (e.g., FDA, ISO)
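
An alternative that keeps the test set out of the selection step is `validation_curve`, which scores each degree by cross-validation on the training data only. The sketch below is self-contained: it regenerates data with the same ground-truth formula as this notebook but a separate random generator, so its numbers are not the notebook's results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, validation_curve
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same ground truth as generate_polynomial_data, fresh generator (assumption)
rng = np.random.default_rng(42)
n = 200
temperature = rng.uniform(20, 100, n)
voltage = rng.uniform(0.8, 1.2, n)
y_demo = (100 - 0.02 * (temperature - 60) ** 2
          + 50 * voltage**3 - 30 * voltage**2 + 20 * voltage
          + rng.normal(0, 3.0, n))
X_demo = np.column_stack([temperature, voltage])
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

pipe = Pipeline([
    ('poly_features', PolynomialFeatures(include_bias=False)),
    ('scaler', StandardScaler()),
    ('regressor', LinearRegression()),
])

# Cross-validated R² for each degree, using training data only
degree_range = [1, 2, 3, 4, 5]
train_scores, val_scores = validation_curve(
    pipe, X_tr, y_tr,
    param_name='poly_features__degree',
    param_range=degree_range,
    cv=5, scoring='r2',
)
mean_val = val_scores.mean(axis=1)
cv_best_degree = degree_range[int(np.argmax(mean_val))]

# The test set is touched once, after the degree has been chosen
pipe.set_params(poly_features__degree=cv_best_degree).fit(X_tr, y_tr)
test_r2_once = pipe.score(X_te, y_te)

for d, score in zip(degree_range, mean_val):
    print(f'degree {d}: mean CV R² = {score:.4f}')
print(f'Selected degree: {cv_best_degree}, test R²: {test_r2_once:.4f}')
```

Because the degree is chosen before the test set is scored, the final test R² is an unbiased estimate of performance on new data.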

In [None]:
# Visualize performance vs degree
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# R² scores
axes[0].plot(results_df['Degree'], results_df['Train_R2'], marker='o', 
             label='Training R²', linewidth=2)
axes[0].plot(results_df['Degree'], results_df['Test_R2'], marker='s', 
             label='Test R²', linewidth=2)
axes[0].plot(results_df['Degree'], results_df['CV_R2_Mean'], marker='^', 
             label='CV R² (mean)', linewidth=2, linestyle='--')
axes[0].axvline(best_degree, color='red', linestyle='--', alpha=0.7, 
                label=f'Best Degree ({int(best_degree)})')
axes[0].set_xlabel('Polynomial Degree', fontsize=12)
axes[0].set_ylabel('R² Score', fontsize=12)
axes[0].set_title('Model Performance vs Polynomial Degree', fontsize=13, fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
axes[0].set_xticks(degrees)

# Overfitting gap
axes[1].bar(results_df['Degree'], results_df['Overfitting_Gap'], 
            color=['green' if d == best_degree else 'coral' for d in results_df['Degree']],
            edgecolor='black', alpha=0.7)
axes[1].set_xlabel('Polynomial Degree', fontsize=12)
axes[1].set_ylabel('Train R² - Test R² (Overfitting Gap)', fontsize=12)
axes[1].set_title('Overfitting Detection', fontsize=13, fontweight='bold')
axes[1].grid(True, alpha=0.3, axis='y')
axes[1].set_xticks(degrees)

plt.tight_layout()
plt.show()

print('📈 Interpretation:')
print(f'   → Degree {int(best_degree)} gives the highest test R² in this sweep')
print('   → A widening train-test gap at higher degrees signals overfitting')
print('   → Training R² cannot decrease with degree; check whether test R² does')

### 5.2 Train Final Model with Optimal Degree

### 📝 What's Happening in This Code?

**Purpose:** Train production model with validated optimal degree

**Key Points:**
- **Best degree selection**: Based on empirical test R² performance, not assumptions; because the test set was used for selection, the reported test R² is slightly optimistic
- **Final retraining**: Use optimal hyperparameter for production deployment
- **Feature count**: Track polynomial feature explosion for memory planning
- **Model persistence**: Production model ready for serialization (pickle/joblib)

**Why This Matters:**
- Systematic hyperparameter selection builds confidence in model
- Knowing feature count helps capacity planning for production inference
- Clear validation path supports model governance and auditing
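
Serialization with joblib can be sketched as a save-and-reload round trip. The small model, the temporary directory, and the file name `poly_model.joblib` are hypothetical choices for this example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Illustrative data and model (assumption)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(50, 2))
y_demo = X_demo[:, 0] ** 2 + X_demo[:, 1] + rng.normal(0, 0.05, 50)

model = Pipeline([
    ('poly_features', PolynomialFeatures(degree=2, include_bias=False)),
    ('scaler', StandardScaler()),
    ('regressor', LinearRegression()),
]).fit(X_demo, y_demo)

# Save the whole pipeline (preprocessing + regressor) and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'poly_model.joblib')  # hypothetical name
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

predictions_match = np.allclose(model.predict(X_demo), restored.predict(X_demo))
n_features_out = restored.named_steps['poly_features'].n_output_features_
print(f'Round-trip predictions match: {predictions_match}')
print(f'Polynomial features in restored model: {n_features_out}')
```

Saving the full pipeline rather than the regressor alone keeps the polynomial expansion and scaling statistics bundled with the coefficients. A saved file should be loaded with the same scikit-learn version that produced it.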

In [None]:
# Train final model with best degree
final_model = Pipeline([
    ('poly_features', PolynomialFeatures(degree=int(best_degree), include_bias=False)),
    ('scaler', StandardScaler()),
    ('regressor', LinearRegression())
])

final_model.fit(X_train, y_train)

# Predictions
y_train_pred = final_model.predict(X_train)
y_test_pred = final_model.predict(X_test)

# Metrics
print(f'\n🎯 Final Model (Polynomial Degree {int(best_degree)}) Performance:')
print('='*60)
print(f'Training R²:   {r2_score(y_train, y_train_pred):.4f}')
print(f'Test R²:       {r2_score(y_test, y_test_pred):.4f}')
print(f'Test RMSE:     {np.sqrt(mean_squared_error(y_test, y_test_pred)):.4f}')
print(f'Test MAE:      {mean_absolute_error(y_test, y_test_pred):.4f}')

# Feature count
poly_features = final_model.named_steps['poly_features']
n_poly_features = poly_features.transform(X_train[:1]).shape[1]
print(f'\nPolynomial features generated: {n_poly_features}')

---

## 6. Model Diagnostics and Validation

### 📝 What's Happening in This Code?

**Purpose:** Validate assumptions and detect potential issues through residual analysis

**Key Points:**
- **Predicted vs Actual**: Scatter around 45° line indicates good fit
- **Residual plot**: Random scatter (no patterns) confirms model captures relationship
- **Residual distribution**: Should be approximately normal (Gaussian) with mean ≈ 0
- **QQ plot**: Points on diagonal line confirm normality assumption

**Why This Matters:**
- Patterns in residuals indicate model misspecification (need higher degree or different approach)
- Strongly non-normal residuals make the usual confidence intervals and p-values unreliable, especially with small samples
- Stakeholders need visual proof that model is trustworthy
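
A formal test can complement the visual normality check. The sketch below applies the Shapiro-Wilk test from SciPy to stand-in residuals; `residuals_demo` is simulated here and is not the notebook's `residuals` array.

```python
import numpy as np
from scipy import stats

# Stand-in for y_test - y_test_pred (assumption): 40 Gaussian residuals
rng = np.random.default_rng(0)
residuals_demo = rng.normal(0, 3.0, 40)

# Null hypothesis: the sample comes from a normal distribution
shapiro_stat, shapiro_p = stats.shapiro(residuals_demo)

print(f'Shapiro-Wilk statistic: {shapiro_stat:.4f}')
print(f'p-value: {shapiro_p:.4f}')
print('A small p-value (e.g. < 0.05) is evidence against normality')
```

With only a few dozen test points the test has limited power, so it is best read alongside the histogram and Q-Q plot rather than in place of them.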

In [None]:
# Diagnostic plots
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Predicted vs Actual
axes[0, 0].scatter(y_test, y_test_pred, alpha=0.6, edgecolor='k')
axes[0, 0].plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 
                'r--', lw=2, label='Perfect Prediction')
axes[0, 0].set_xlabel('Actual Performance Score', fontsize=12)
axes[0, 0].set_ylabel('Predicted Performance Score', fontsize=12)
axes[0, 0].set_title('Predicted vs Actual (Test Set)', fontsize=13, fontweight='bold')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# 2. Residual Plot
residuals = y_test - y_test_pred
axes[0, 1].scatter(y_test_pred, residuals, alpha=0.6, edgecolor='k')
axes[0, 1].axhline(0, color='r', linestyle='--', lw=2)
axes[0, 1].set_xlabel('Predicted Performance Score', fontsize=12)
axes[0, 1].set_ylabel('Residuals', fontsize=12)
axes[0, 1].set_title('Residual Plot (Test Set)', fontsize=13, fontweight='bold')
axes[0, 1].grid(True, alpha=0.3)

# 3. Residual Distribution
axes[1, 0].hist(residuals, bins=20, edgecolor='black', alpha=0.7)
axes[1, 0].axvline(residuals.mean(), color='r', linestyle='--', lw=2, 
                   label=f'Mean: {residuals.mean():.2f}')
axes[1, 0].set_xlabel('Residuals', fontsize=12)
axes[1, 0].set_ylabel('Frequency', fontsize=12)
axes[1, 0].set_title('Residual Distribution', fontsize=13, fontweight='bold')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3, axis='y')

# 4. Q-Q Plot
from scipy import stats
stats.probplot(residuals, dist="norm", plot=axes[1, 1])
axes[1, 1].set_title('Q-Q Plot (Normality Check)', fontsize=13, fontweight='bold')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print('🔍 Diagnostic Analysis:')
print(f'   → Residual mean: {residuals.mean():.4f} (should be ≈ 0)')
print(f'   → Residual std: {residuals.std():.4f}')
print(f'   → Min residual: {residuals.min():.4f}')
print(f'   → Max residual: {residuals.max():.4f}')

### 6.1 Feature Importance (Polynomial Coefficients)

### 📝 What's Happening in This Code?

**Purpose:** Interpret which polynomial terms contribute most to predictions

**Key Points:**
- **Coefficient magnitudes**: Larger absolute values indicate stronger feature influence
- **Feature names**: Generated automatically by PolynomialFeatures (e.g., 'x0^2', 'x0 x1')
- **Scaling context**: StandardScaler normalizes features, so coefficients are comparable
- **Interaction terms**: Identifies important feature interactions (e.g., temp × voltage)

**Why This Matters:**
- Explains model behavior to domain experts (e.g., "quadratic temperature term dominates")
- Suggests candidate physical relationships (e.g., a dominant voltage-squared term), though highly correlated polynomial terms make individual coefficients unstable
- Supports feature engineering decisions for future models
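
One caution when reading coefficients: powers of the same variable are strongly correlated over a narrow range, so weight can shift between them without changing predictions much. The sketch below measures this for voltage values drawn from the same 0.8-1.2 V range, using its own random generator (an assumption, so the values differ from the notebook's data).

```python
import numpy as np

# Voltage samples in the notebook's range (assumption: fresh generator)
rng = np.random.default_rng(42)
voltage = rng.uniform(0.8, 1.2, 200)

# Pairwise correlations between v, v^2 and v^3
corr_matrix = np.corrcoef([voltage, voltage**2, voltage**3])

print('Correlation between polynomial voltage terms:')
print(np.round(corr_matrix, 4))
```

Correlations this close to 1 mean the ranking of `Voltage`, `Voltage^2`, and `Voltage^3` by coefficient size can change from one data sample to the next.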

In [None]:
# Extract coefficients
regressor = final_model.named_steps['regressor']
poly_features = final_model.named_steps['poly_features']

# Get feature names
feature_names = poly_features.get_feature_names_out(['Temperature', 'Voltage'])
coefficients = regressor.coef_

# Create DataFrame
coef_df = pd.DataFrame({
    'Feature': feature_names,
    'Coefficient': coefficients,
    'Abs_Coefficient': np.abs(coefficients)
}).sort_values('Abs_Coefficient', ascending=False)

print('📊 Top 10 Most Important Polynomial Features:\n')
print(coef_df.head(10).to_string(index=False))

# Visualize top features
top_features = coef_df.head(10)

plt.figure(figsize=(12, 6))
colors = ['green' if c >= 0 else 'red' for c in top_features['Coefficient']]
plt.barh(top_features['Feature'], top_features['Coefficient'], color=colors, 
         edgecolor='black', alpha=0.7)
plt.xlabel('Coefficient Value', fontsize=12)
plt.ylabel('Polynomial Feature', fontsize=12)
plt.title('Top 10 Most Important Features (After Scaling)', fontsize=13, fontweight='bold')
plt.axvline(0, color='black', linewidth=0.8)
plt.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

---

## 7. Comparison: Linear vs Polynomial Regression

### 📝 What's Happening in This Code?

**Purpose:** Quantitatively demonstrate value of polynomial regression over linear baseline

**Key Points:**
- **Linear baseline**: Simple model (degree 1) for comparison
- **Performance gain**: Polynomial model's improvement over linear in R² and RMSE
- **Overfitting check**: Ensure polynomial doesn't sacrifice test performance for training fit
- **Visualization**: Side-by-side predictions show polynomial captures curvature

**Why This Matters:**
- Justifies model complexity to stakeholders ("25% R² improvement")
- Documents modeling decision for audit trails
- Validates domain intuition about non-linearity

In [None]:
# Train baseline linear model
linear_model = Pipeline([
    ('scaler', StandardScaler()),
    ('regressor', LinearRegression())
])
linear_model.fit(X_train, y_train)

y_test_pred_linear = linear_model.predict(X_test)

# Compare metrics
comparison = pd.DataFrame({
    'Model': ['Linear Regression', f'Polynomial (Degree {int(best_degree)})'],
    'Test_R2': [
        r2_score(y_test, y_test_pred_linear),
        r2_score(y_test, y_test_pred)
    ],
    'Test_RMSE': [
        np.sqrt(mean_squared_error(y_test, y_test_pred_linear)),
        np.sqrt(mean_squared_error(y_test, y_test_pred))
    ],
    'Test_MAE': [
        mean_absolute_error(y_test, y_test_pred_linear),
        mean_absolute_error(y_test, y_test_pred)
    ]
})

print('📊 Model Comparison: Linear vs Polynomial\n')
print(comparison.to_string(index=False))

# Calculate improvements
r2_improvement = ((comparison.loc[1, 'Test_R2'] - comparison.loc[0, 'Test_R2']) / 
                  comparison.loc[0, 'Test_R2'] * 100)
rmse_improvement = ((comparison.loc[0, 'Test_RMSE'] - comparison.loc[1, 'Test_RMSE']) / 
                    comparison.loc[0, 'Test_RMSE'] * 100)

print(f'\n🎯 Improvements:')
print(f'   → R² improved by {r2_improvement:.1f}%')
print(f'   → RMSE reduced by {rmse_improvement:.1f}%')

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Linear predictions
axes[0].scatter(y_test, y_test_pred_linear, alpha=0.6, edgecolor='k')
axes[0].plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 
             'r--', lw=2)
axes[0].set_xlabel('Actual', fontsize=12)
axes[0].set_ylabel('Predicted', fontsize=12)
axes[0].set_title(f'Linear Regression\nTest R²: {comparison.loc[0, "Test_R2"]:.4f}', 
                  fontsize=13, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Polynomial predictions
axes[1].scatter(y_test, y_test_pred, alpha=0.6, edgecolor='k', color='coral')
axes[1].plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 
             'r--', lw=2)
axes[1].set_xlabel('Actual', fontsize=12)
axes[1].set_ylabel('Predicted', fontsize=12)
axes[1].set_title(f'Polynomial Regression (Degree {int(best_degree)})\nTest R²: {comparison.loc[1, "Test_R2"]:.4f}', 
                  fontsize=13, fontweight='bold')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

## 8. Real-World Projects

Apply polynomial regression to practical scenarios with implementation guidance.

---

### 🔬 Post-Silicon Validation Projects

#### **Project 1: Temperature-Performance Curve Characterization**

**Objective:** Model device performance across temperature range to identify optimal operating conditions and thermal limits.

**Business Value:**
- Optimize thermal design of products
- Set temperature specifications for datasheets
- Predict field performance under varying conditions
- Reduce thermal testing time by 40%

**Dataset Features:**
- Temperature (°C): Junction temperature measurements
- Performance metrics: Speed, power, error rate
- Environmental: Ambient temp, cooling method
- Device ID: Track individual unit variations

**Implementation Tips:**
- Start with degree 2 (quadratic) - most thermal curves are parabolic
- Include interaction term: temperature × voltage
- Validate at extreme temperatures (corner cases)
- Use physically meaningful features (e.g., absolute temperature in Kelvin for thermally activated, Arrhenius-type effects)

**Expected Outcomes:**
- Identify optimal temperature for peak performance
- Quantify performance degradation at extremes
- R² > 0.85 indicates reliable characterization

---

#### **Project 2: Voltage-Frequency (V-F) Curve Modeling**

**Objective:** Predict maximum operating frequency at different voltage levels for power-performance optimization.

**Business Value:**
- Enable dynamic voltage-frequency scaling (DVFS)
- Optimize power vs performance tradeoffs
- Support low-power product variants
- Improve battery life by 25%

**Dataset Features:**
- Supply voltage (V): Core voltage levels
- Max frequency (MHz): Measured at each voltage
- Process corner: Fast, typical, slow
- Temperature: Operating temperature during test

**Implementation Tips:**
- Polynomial degree 2-3 typical for V-F curves
- Physical constraint: Frequency must increase with voltage
- Consider separate models per process corner
- Validate against silicon measurements at 5+ voltage points
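
The monotonicity constraint in the tips above can be checked after fitting by inspecting the derivative of the fitted curve. The measurements below are hypothetical values invented for this sketch, not silicon data.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical V-F measurements (assumption): six voltage points
volts = np.array([0.70, 0.80, 0.90, 1.00, 1.10, 1.20])
fmax_mhz = np.array([800.0, 1150.0, 1450.0, 1700.0, 1900.0, 2050.0])

# Quadratic fit; Polynomial.fit rescales the inputs internally for stability
vf_curve = Polynomial.fit(volts, fmax_mhz, deg=2)

# Derivative dF/dV evaluated across the measured voltage range
slope = vf_curve.deriv()
v_grid = np.linspace(volts.min(), volts.max(), 101)
is_monotonic = bool(np.all(slope(v_grid) > 0))

print(f'Predicted Fmax at 1.00 V: {vf_curve(1.0):.1f} MHz')
print(f'Frequency increases with voltage over the range: {is_monotonic}')
```

If the check fails, options include lowering the degree, fitting per process corner, or restricting the model to the voltage range where the data supports it.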

**Expected Outcomes:**
- Accurate frequency prediction within 5%
- Identify voltage scaling limits
- Support DVFS algorithms in firmware

---

#### **Project 3: Device Aging and Degradation Modeling**

**Objective:** Predict long-term parameter drift due to aging effects (NBTI, HCI, EM) for reliability engineering.

**Business Value:**
- Set product lifetime warranties
- Design guardbands for end-of-life performance
- Predict field failure rates
- Reduce qualification testing by 30%

**Dataset Features:**
- Time (hours): Stress test duration
- Parameter drift: Threshold voltage shift, speed degradation
- Stress conditions: Voltage, temperature, duty cycle
- Initial value: Baseline parameter measurement

**Implementation Tips:**
- Aging often follows power-law: $\Delta V_{th} \propto t^n$ (n ≈ 0.25-0.5)
- Transform time feature: $\log(t)$ or $t^{0.5}$
- Polynomial degree 2 after transformation
- Include temperature acceleration factor
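
The power-law tip can be illustrated with a log-log fit: taking logs turns $\Delta V_{th} = A t^n$ into a straight line with slope $n$. The stress times, prefactor, exponent, and noise level below are hypothetical values chosen for this sketch.

```python
import numpy as np

# Hypothetical stress data (assumption): drift = A * t^n with small
# multiplicative noise, sampled from 1 h to 1000 h
rng = np.random.default_rng(0)
t_hours = np.logspace(0, 3, 30)
A_true, n_true = 2.0, 0.3
drift_mv = A_true * t_hours**n_true * np.exp(rng.normal(0, 0.02, 30))

# log(drift) = log(A) + n * log(t), so a degree-1 fit recovers n and A
slope, intercept = np.polyfit(np.log(t_hours), np.log(drift_mv), 1)
n_hat, A_hat = slope, np.exp(intercept)

print(f'Estimated exponent n: {n_hat:.3f} (true {n_true})')
print(f'Estimated prefactor A: {A_hat:.3f} (true {A_true})')
```

Fitting in log space keeps the model linear and tends to extrapolate more sensibly than a raw polynomial in time, though any projection far beyond the stress duration should still be checked against a physical aging model.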

**Expected Outcomes:**
- Predict 10-year drift from 1000-hour stress test
- R² > 0.90 needed for reliability predictions
- Validate against industry models (e.g., Black's equation)

---

#### **Project 4: Non-Linear Parametric Test Correlation**

**Objective:** Find non-linear relationships between parametric test results to optimize test flow and reduce test time.

**Business Value:**
- Eliminate redundant tests
- Reduce ATE test time by 20%
- Predict expensive tests from cheap ones
- Lower cost per device tested

**Dataset Features:**
- Fast tests: Digital patterns, basic DC tests
- Slow tests: Detailed AC characterization, high-precision measurements
- Correlations: Non-linear relationships between parameters
- Device binning: Pass/fail categories

**Implementation Tips:**
- Use polynomial degree 2-3 for test correlations
- Create interaction terms between related tests
- Cross-validate on different lots to avoid overfitting
- Threshold predictions for pass/fail decisions

**Expected Outcomes:**
- Predict slow test results with 95% accuracy
- Skip 15-20% of expensive tests
- Maintain quality (yield/defect level)

---

### 📊 General AI/ML Projects

#### **Project 5: Marketing Response Curve Optimization**

**Objective:** Model diminishing returns in marketing spend to optimize budget allocation across channels.

**Business Value:**
- Maximize ROI on marketing spend
- Identify saturation points per channel
- Optimize budget allocation
- Improve efficiency by 30%

**Dataset Features:**
- Marketing spend ($): Investment per channel
- Conversions: Sales, leads, sign-ups
- Channel: Email, social, paid search
- Time: Campaign duration

**Implementation Tips:**
- Degree 2 polynomial captures diminishing returns
- Separate models per channel
- Include interaction: spend × time
- Constrain: Response must increase with spend (enforce monotonicity)

---

#### **Project 6: Growth Trajectory Forecasting (S-Curves)**

**Objective:** Predict user/revenue growth following S-curve patterns (slow start, rapid growth, saturation).

**Business Value:**
- Forecast revenue for planning
- Identify growth stage (early, rapid, mature)
- Set realistic targets
- Support investor presentations

**Dataset Features:**
- Time: Days/months since launch
- Growth metric: Users, revenue, engagement
- Marketing events: Campaigns, launches
- External factors: Seasonality, competition

**Implementation Tips:**
- Logit transformation: $\log\left(\frac{y}{K-y}\right)$, where $K$ is the saturation level (requires $0 < y < K$)
- Polynomial degree 2-3 on transformed target
- Validate saturation point estimate with domain experts
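
The transformation works because a logistic curve $y = K / (1 + e^{-r(t - t_0)})$ satisfies $\log\frac{y}{K-y} = r(t - t_0)$, which is linear in $t$. The sketch below uses noise-free data with hypothetical values for $K$, $r$, and $t_0$, and assumes $K$ is known.

```python
import numpy as np

# Hypothetical growth parameters (assumption); K is treated as known
K = 10000.0
r_true, t0_true = 0.5, 12.0

# Noise-free logistic growth over 25 periods
t = np.arange(0, 25)
users = K / (1 + np.exp(-r_true * (t - t0_true)))

# Logit transform of the target: equals r * (t - t0)
z = np.log(users / (K - users))

# A straight-line fit on the transformed target recovers r and t0
r_hat, intercept = np.polyfit(t, z, 1)
t0_hat = -intercept / r_hat

print(f'Estimated growth rate r: {r_hat:.3f}')
print(f'Estimated midpoint t0: {t0_hat:.3f}')
```

With real data, $K$ has to be estimated or supplied by domain experts, and observations at or above $K$ must be handled before taking the log.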

---

#### **Project 7: Price Elasticity Modeling (Non-Linear Demand)**

**Objective:** Model non-linear relationship between price and demand to optimize pricing strategy.

**Business Value:**
- Maximize revenue (price × volume)
- Identify optimal price point
- Understand customer price sensitivity
- Increase profit margins by 15%

**Dataset Features:**
- Price: Product price points tested
- Demand: Units sold
- Competitor prices: Market context
- Customer segment: Premium, budget

**Implementation Tips:**
- Degree 2-3 polynomial captures elasticity curves
- Log-transform features for multiplicative effects
- Include interaction: price × competitor_price
- Validate against economic theory (downward sloping demand)

---

#### **Project 8: Environmental Trend Analysis (Climate Data)**

**Objective:** Model non-linear environmental trends (temperature, pollution) for policy and forecasting.

**Business Value:**
- Long-term climate predictions
- Policy impact assessment
- Risk management
- Support sustainability goals

**Dataset Features:**
- Time: Years/decades
- Environmental metric: Temperature, CO2, air quality
- Location: Geographic coordinates
- Human activity: Emissions, industrial output

**Implementation Tips:**
- High-degree polynomials (3-5) for long-term trends
- Careful validation - avoid extrapolation beyond data range
- Include seasonal components (Fourier terms)
- Cross-validate across geographic regions

---

## 9. Key Takeaways

### ✅ When to Use Polynomial Regression

1. **Visual evidence** of curvature in scatter plots
2. **Domain knowledge** suggests polynomial relationships
3. **Residual patterns** from linear regression show systematic errors
4. **Moderate data size** (risk of overfitting with small datasets)

### ⚠️ Limitations and Alternatives

**Limitations:**
- **Feature explosion**: The number of features grows as $O(n^d)$ for $n$ inputs and degree $d$
- **Extrapolation risk**: Poor predictions outside training range
- **Overfitting prone**: Requires careful validation
- **Interpretability loss**: High-degree coefficients hard to explain
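
The extrapolation risk is easy to demonstrate. In the sketch below, a degree-5 polynomial is fitted to an illustrative target (one period of a sine wave, an assumption for this example) on $[0, 1]$ and then evaluated at $x = 2$.

```python
import numpy as np
from numpy.polynomial import Polynomial

def target(x):
    """Illustrative target (assumption): one period of a sine wave."""
    return np.sin(2 * np.pi * x)

# Fit a degree-5 polynomial on the training range [0, 1]
x_fit = np.linspace(0.0, 1.0, 50)
poly5 = Polynomial.fit(x_fit, target(x_fit), deg=5)

# Worst-case error inside the training range
x_inside = np.linspace(0.0, 1.0, 201)
max_err_inside = np.max(np.abs(poly5(x_inside) - target(x_inside)))

# Error one full range-width beyond the data
x_outside = 2.0
err_outside = abs(poly5(x_outside) - target(x_outside))

print(f'Max error inside [0, 1]: {max_err_inside:.4f}')
print(f'Error at x = {x_outside}: {err_outside:.1f}')
```

The fit is close inside the range and diverges outside it, because the highest-degree term dominates once $x$ leaves the region the data constrained.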

**Better Alternatives:**
- **Splines/GAMs**: For complex curves with many local changes
- **Tree-based models**: For arbitrary non-linearity with minimal tuning
- **Neural networks**: For very complex multi-dimensional non-linearity
- **Domain transformations**: Log, sqrt, exp transformations may linearize relationship

### 🎯 Best Practices

1. **Start simple**: Try linear first, add complexity if justified
2. **Cross-validate**: Select the degree on validation folds, not on the final test set
3. **Visualize**: Plot training and test performance vs degree
4. **Scale features**: Apply StandardScaler after the polynomial expansion to keep terms on comparable scales
5. **Monitor overfitting**: Watch gap between train and test R²
6. **Use pipelines**: Prevent data leakage in preprocessing
7. **Physical constraints**: Enforce monotonicity or bounds if domain requires
8. **Regularization**: Consider Ridge/Lasso for high-degree polynomials
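
As a brief illustration of the regularization item, the sketch below fits the same degree-10 expansion with ordinary least squares and with Ridge, then compares coefficient norms. The data, the degree, and `alpha=1.0` are assumptions chosen for this example, not tuned values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Illustrative data (assumption): noisy sine-like curve, 40 points
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40).reshape(-1, 1)
y = np.sin(3 * x[:, 0]) + rng.normal(0, 0.2, 40)

def fit_poly(regressor, degree=10):
    """Fit a scaled polynomial pipeline with the given final regressor."""
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False),
        StandardScaler(),
        regressor,
    )
    return model.fit(x, y)

ols = fit_poly(LinearRegression())
ridge = fit_poly(Ridge(alpha=1.0))

# The last pipeline step holds the fitted coefficients
ols_norm = np.linalg.norm(ols.steps[-1][1].coef_)
ridge_norm = np.linalg.norm(ridge.steps[-1][1].coef_)

print(f'OLS coefficient norm:   {ols_norm:.2f}')
print(f'Ridge coefficient norm: {ridge_norm:.2f}')
```

The penalty shrinks the coefficients, which damps the large, offsetting weights that high-degree fits tend to produce. In practice `alpha` would be chosen by cross-validation.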

### 📚 Next Learning Steps

After mastering polynomial regression, explore:

1. **`012_Ridge_Lasso_ElasticNet.ipynb`** - Regularization for polynomial models
2. **`016_Decision_Trees.ipynb`** - Non-parametric non-linearity
3. **Splines** - Piecewise polynomials for smoother curves
4. **Kernel methods** - Implicit high-dimensional feature maps (polynomial and RBF kernels, e.g., in SVMs)

### 🔑 Core Concepts Mastered

✅ Polynomial feature transformation  
✅ Bias-variance tradeoff in practice  
✅ Degree selection via validation curves  
✅ Overfitting detection and prevention  
✅ Pipeline usage for clean ML workflows  
✅ Domain-specific applications (post-silicon + general)  

---

**Congratulations!** You now understand how to capture non-linear relationships while avoiding overfitting. This is a critical skill for real-world ML where relationships are rarely perfectly linear.

Continue to **012_Ridge_Lasso_ElasticNet** to learn regularization techniques that stabilize polynomial models.

---

## Appendix: Mathematical Derivation

### Feature Expansion Example

For input $\mathbf{x} = [x_1, x_2]$ and degree $d=2$:

$$\phi(\mathbf{x}) = [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]$$

Model becomes:

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_1 x_2 + \beta_5 x_2^2$$

This is **still linear in the parameters** $\boldsymbol{\beta}$, so the normal equation applies:

$$\boldsymbol{\beta} = (\mathbf{X}_{\text{poly}}^T \mathbf{X}_{\text{poly}})^{-1} \mathbf{X}_{\text{poly}}^T \mathbf{y}$$

### Overfitting Illustration

As degree increases:

- **Degree 1**: Underfits (high bias, low variance)
- **Degree 2-3**: Optimal (balanced bias-variance)
- **Degree 10+**: Overfits (low bias, high variance)

Training error never increases with degree, but test error is typically U-shaped. The expected test error decomposes as:

$$\text{Test Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}$$

Optimal degree minimizes total test error.

---

**End of Notebook**