# 014: Support Vector Regression (SVR)

## 🎯 Learning Objectives

By the end of this notebook, you will:
- **Understand** SVR's epsilon-insensitive loss and kernel trick
- **Implement** linear and non-linear SVR with RBF/polynomial kernels
- **Master** hyperparameter tuning (C, epsilon, gamma) for optimal performance
- **Apply** SVR to robust regression with outliers in test data
- **Build** accurate predictors resilient to measurement noise and anomalies

## 📚 What is Support Vector Regression?

SVR extends support vector machines to regression by finding a function within an epsilon tube around the data. It's robust to outliers and handles non-linear relationships via kernels.

**Why SVR?**
- ✅ Robust to outliers (measurement errors, equipment glitches)
- ✅ Non-linear modeling via kernels (voltage-frequency curves)
- ✅ Sparse solution (only support vectors matter)
- ✅ Handles high-dimensional data (100+ test parameters)

## 🏭 Post-Silicon Validation Use Cases

**Robust Parametric Prediction**
- Input: Test data with 5-10% outliers (sensor spikes, ESD events)
- Output: SVR model ignoring outliers, accurate predictions
- Value: 88% accuracy vs 72% OLS with outliers, save $4M/year

**Non-Linear V-F Characterization**
- Input: Voltage-frequency pairs (device speed vs power)
- Output: RBF kernel SVR fitting complex non-linear relationship
- Value: Precise speed binning, optimize performance 15%

**Multi-Site Correlation Analysis**
- Input: Parallel test site measurements (site 1-8 readings)
- Output: SVR predicting site 1 from sites 2-8 (detect drifts)
- Value: Early calibration issues, prevent 2-3% yield loss

**Yield Forecasting Under Uncertainty**
- Input: Noisy historical yield data, process variations
- Output: Robust SVR prediction with confidence intervals
- Value: Better capacity planning, reduce inventory costs 20%

---

Let's master Support Vector Regression! 🚀

## 🎯 Learning Objectives

By the end of this notebook, you will:
- **Understand** epsilon-insensitive loss and support vectors
- **Master** kernel trick (linear, RBF, polynomial) for non-linear patterns
- **Implement** SVR from scratch and with sklearn
- **Apply** robust regression to noisy semiconductor test data
- **Tune** hyperparameters (C, epsilon, gamma) for optimal performance

## 📚 What is Support Vector Regression?

**Support Vector Regression (SVR)** is a robust regression algorithm that fits a tube of half-width epsilon (ε) around the predicted function and ignores errors within the tube. Only points on or outside the tube boundary (the support vectors) contribute to the model.

Key equation (epsilon-insensitive loss):
$$L_\varepsilon(y, \hat{y}) = \begin{cases} 0 & \text{if } |y - \hat{y}| \leq \varepsilon \\ |y - \hat{y}| - \varepsilon & \text{otherwise} \end{cases}$$
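
The loss above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the notebook's own implementation: the function name `epsilon_insensitive_loss` and the sample values are assumptions chosen to show how a large error is penalized linearly here but quadratically under squared loss.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Elementwise epsilon-insensitive loss: max(0, |y - y_hat| - epsilon)."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

# Illustrative values (assumed): the last prediction is off by 10, like an outlier
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.05, 2.5, 3.0, 14.0])

eps_loss = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
sq_loss = (y_true - y_pred) ** 2

print("epsilon-insensitive:", eps_loss)  # approximately 0, 0.4, 0, 9.9
print("squared error:      ", sq_loss)   # approximately 0.0025, 0.25, 0, 100
```

The error of 0.05 falls inside the tube and costs nothing, while the error of 10 costs about 9.9 instead of 100.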

**Why SVR?**
- ✅ Robust to outliers (penalty grows linearly with error size; the epsilon tube also ignores small errors)
- ✅ Kernel trick handles non-linear relationships
- ✅ Sparse solution (only support vectors matter)
- ✅ Effective in high-dimensional spaces (p > n)
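
To make the kernel trick more concrete, here is a small sketch of the RBF kernel, $K(x, x') = \exp(-\gamma ||x - x'||^2)$, computed by hand and compared against scikit-learn's `rbf_kernel`. The helper name `rbf_kernel_manual`, the random data, and `gamma=0.5` are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf_kernel_manual(A, B, gamma):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # 5 illustrative samples with 3 features

K_manual = rbf_kernel_manual(A, A, gamma=0.5)
K_sklearn = rbf_kernel(A, A, gamma=0.5)

print("Matches scikit-learn:", np.allclose(K_manual, K_sklearn))
print("Diagonal (self-similarity):", np.diag(K_manual))
```

Each entry measures similarity between two samples without ever building the high-dimensional feature map explicitly; a kernel SVR only needs this matrix.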

## 🏭 Post-Silicon Validation Use Cases

**Outlier-Robust Yield Prediction**
- Input: Parametric test data with measurement noise and outliers
- Output: Robust yield estimates ignoring transient noise
- Value: Stable predictions for capacity planning ($5-10M)

**Noise-Resistant Test Time Modeling**
- Input: Test time measurements with ATE jitter and variability
- Output: SVR model that filters out noise patterns
- Value: Accurate test time forecasting (15-25% optimization)

**Non-Linear V-F Characterization**
- Input: Voltage-frequency sweep data (non-linear relationships)
- Output: RBF kernel SVR capturing complex V-F curves
- Value: Precise power-performance modeling ($3-8M)

**Extreme Condition Performance**
- Input: Stress test results (temperature, voltage corners)
- Output: SVR predicting behavior at untested corners
- Value: Reduced characterization cost (30-40% fewer tests)

## 🔄 SVR Workflow

```mermaid
graph LR
    A[Training Data] --> B[Kernel Transformation]
    B --> C[Epsilon Tube Fitting]
    C --> D[Support Vector Selection]
    D --> E[Sparse Model]
    E --> F[Robust Predictions]
    
    style A fill:#e1f5ff
    style F fill:#e1ffe1
```

## 📊 Learning Path Context

**Prerequisites:**
- 010: Linear Regression (regression fundamentals)
- 012: Ridge/Lasso (regularization concepts)

**Next Steps:**
- 024: Support Vector Machines (SVC for classification)
- 015: Quantile Regression (another robust approach)

---

Let's master robust regression with SVR! 🚀

## 📦 Setup and Imports

### 📝 What's Happening in This Code?

**Purpose:** Import essential libraries for SVR implementation, visualization, and evaluation.

**Key Points:**
- **NumPy**: Core mathematical operations for from-scratch SVR implementation
- **Pandas**: Data manipulation for post-silicon STDF datasets
- **Matplotlib/Seaborn**: Visualize epsilon tubes, support vectors, kernel effects
- **Scikit-learn**: Production-ready SVR, StandardScaler (SVR requires scaled features), metrics
- **Warnings**: Suppress all warnings, including convergence warnings raised during hyperparameter search

**Why This Matters:** SVR is sensitive to feature scaling (unlike tree-based models). StandardScaler is ESSENTIAL for SVR to work properly.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.datasets import make_regression
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

print("✅ Libraries imported successfully")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
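
Because SVR depends on feature scaling and on the choice of C, epsilon, and gamma, one common pattern is to wrap `StandardScaler` and `SVR` in a single pipeline and tune it with `GridSearchCV`, so the scaler is refit inside every cross-validation fold. The sketch below is an illustration under assumptions: the synthetic data, the parameter grid, and the names `X_demo`, `y_demo`, `pipe`, and `search` are not taken from this notebook.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data (assumed): two features on very different scales
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(120, 2)) * np.array([1.0, 1000.0])
y_demo = np.sin(X_demo[:, 0]) + 0.001 * X_demo[:, 1] + rng.normal(0, 0.1, size=120)

# Scaling lives inside the pipeline, so each CV fold is scaled on its own training split
pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# make_pipeline names each step after its lowercased class name, hence the "svr__" prefix
param_grid = {
    "svr__C": [0.1, 1.0, 10.0],
    "svr__epsilon": [0.05, 0.1, 0.5],
    "svr__gamma": ["scale", 0.1, 1.0],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_root_mean_squared_error")
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best CV RMSE: {-search.best_score_:.4f}")
```

The grid here is deliberately small; a real search would usually cover C and gamma on a logarithmic scale.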

---

## 🔧 Part 1: SVR from Scratch (Educational)

We'll implement a **simplified Linear SVR** using subgradient descent to understand the mechanics.

### 📝 What's Happening in This Code?

**Purpose:** Implement Linear SVR from scratch using subgradient descent on epsilon-insensitive loss.

**Key Points:**
- **Epsilon-insensitive loss**: Computes penalty only for errors > ε
- **Subgradient descent**: Used because epsilon-insensitive loss is non-differentiable at the tube boundaries ($|y - \hat{y}| = \varepsilon$)
- **Regularization term**: $\frac{1}{2}||w||^2$ prevents overfitting
- **Support vectors**: Data points with non-zero loss subgradients (outside epsilon tube)
- **C parameter**: Balances fit quality vs. model complexity; it acts like the inverse of the Ridge/Lasso alpha, so a larger C means weaker regularization

**Why This Matters:** Understanding the optimization mechanics reveals why SVR is robust to outliers: the penalty grows only linearly with the error (not quadratically, as in least squares), so each outlier's pull on the fit is bounded by C.
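
For reference, the objective that the class below minimizes can be written out explicitly. This is a derivation sketch that reuses the loss $L_\varepsilon$ defined earlier; the symbols $J$, $r_i$, $g_w$, and $g_b$ are introduced here for illustration and do not appear in the code.

$$J(w, b) = \frac{1}{2}||w||^2 + C \sum_{i=1}^{n} L_\varepsilon(y_i, \hat{y}_i), \qquad \hat{y}_i = w^\top x_i + b$$

Writing $r_i = y_i - \hat{y}_i$ for the residual, one valid subgradient is:

$$g_w = w - C \sum_{i:\,|r_i| > \varepsilon} \operatorname{sign}(r_i)\, x_i, \qquad g_b = -C \sum_{i:\,|r_i| > \varepsilon} \operatorname{sign}(r_i)$$

Points inside the tube ($|r_i| \leq \varepsilon$) drop out of both sums, and each point outside contributes a term whose size does not depend on how large $|r_i|$ is.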

In [None]:
class LinearSVRScratch:
    """
    Linear Support Vector Regression from scratch.
    Uses subgradient descent on epsilon-insensitive loss.
    """
    
    def __init__(self, C=1.0, epsilon=0.1, learning_rate=0.01, n_iterations=1000):
        """
        Parameters:
        -----------
        C : float
            Penalty parameter (cost of violations)
        epsilon : float
            Width of epsilon tube (tolerance)
        learning_rate : float
            Step size for gradient descent
        n_iterations : int
            Number of training iterations
        """
        self.C = C
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.w = None
        self.b = None
        self.loss_history = []
        
    def _compute_loss(self, X, y):
        """
        Compute epsilon-insensitive loss.
        """
        predictions = X @ self.w + self.b
        errors = np.abs(y - predictions)
        
        # Epsilon-insensitive: max(0, |error| - epsilon)
        epsilon_loss = np.maximum(0, errors - self.epsilon)
        
        # Total loss: regularization + C * epsilon_loss
        loss = 0.5 * np.dot(self.w, self.w) + self.C * np.sum(epsilon_loss)
        return loss
    
    def _compute_gradients(self, X, y):
        """
        Compute subgradients for weights and bias.
        """
        n_samples = X.shape[0]
        predictions = X @ self.w + self.b
        errors = y - predictions
        
        # Initialize gradients
        grad_w = self.w.copy()  # Regularization term
        grad_b = 0
        
        # Add epsilon-insensitive loss gradients
        for i in range(n_samples):
            abs_error = np.abs(errors[i])
            if abs_error > self.epsilon:
                # Outside epsilon tube
                sign = np.sign(errors[i])
                grad_w -= self.C * sign * X[i]
                grad_b -= self.C * sign
        
        return grad_w, grad_b
    
    def fit(self, X, y):
        """
        Train SVR using subgradient descent.
        """
        n_samples, n_features = X.shape
        
        # Initialize parameters
        self.w = np.zeros(n_features)
        self.b = 0
        
        # Training loop
        for iteration in range(self.n_iterations):
            # Compute gradients
            grad_w, grad_b = self._compute_gradients(X, y)
            
            # Update parameters
            self.w -= self.learning_rate * grad_w
            self.b -= self.learning_rate * grad_b
            
            # Track loss
            if iteration % 100 == 0:
                loss = self._compute_loss(X, y)
                self.loss_history.append(loss)
        
        return self
    
    def predict(self, X):
        """
        Make predictions.
        """
        return X @ self.w + self.b
    
    def get_support_vectors(self, X, y):
        """
        Identify support vectors (points outside epsilon tube).
        """
        predictions = self.predict(X)
        errors = np.abs(y - predictions)
        return errors > self.epsilon

print("✅ LinearSVRScratch class defined")
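
The per-sample loop in `_compute_gradients` is easy to read but slow on large datasets. As an optional sketch, the same subgradient can be computed with array operations. The standalone function names `svr_subgradients` and `svr_subgradients_loop` and the random check data are assumptions for illustration; they are not methods of the class above.

```python
import numpy as np

def svr_subgradients(X, y, w, b, C=1.0, epsilon=0.1):
    """Vectorized subgradient of 0.5*||w||^2 + C*sum(max(0, |r| - epsilon))."""
    residuals = y - (X @ w + b)
    # +1 or -1 for points strictly outside the tube, 0 for points inside it
    active_sign = np.sign(residuals) * (np.abs(residuals) > epsilon)
    grad_w = w - C * (X.T @ active_sign)
    grad_b = -C * np.sum(active_sign)
    return grad_w, grad_b

def svr_subgradients_loop(X, y, w, b, C=1.0, epsilon=0.1):
    """Reference version with an explicit loop, mirroring the class method."""
    residuals = y - (X @ w + b)
    grad_w, grad_b = w.copy(), 0.0
    for i in range(X.shape[0]):
        if np.abs(residuals[i]) > epsilon:
            sign = np.sign(residuals[i])
            grad_w -= C * sign * X[i]
            grad_b -= C * sign
    return grad_w, grad_b

# Quick consistency check on random data (illustrative values)
rng = np.random.default_rng(0)
X_chk = rng.normal(size=(50, 3))
y_chk = rng.normal(size=50)
w_chk = rng.normal(size=3)

gw_vec, gb_vec = svr_subgradients(X_chk, y_chk, w_chk, b=0.2, C=2.0, epsilon=0.3)
gw_loop, gb_loop = svr_subgradients_loop(X_chk, y_chk, w_chk, b=0.2, C=2.0, epsilon=0.3)
print("Vectorized matches loop:", np.allclose(gw_vec, gw_loop) and np.isclose(gb_vec, gb_loop))
```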

### Test From-Scratch Implementation

### 📝 What's Happening in This Code?

**Purpose:** Validate our from-scratch SVR implementation on synthetic data with outliers.

**Key Points:**
- **Synthetic data**: Linear relationship + noise + outliers to test robustness
- **Outlier injection**: 10% of data points are shifted by ±5 to ±10 units (5 to 10 times the noise standard deviation of 1)
- **Feature scaling**: StandardScaler applied (critical for SVR performance)
- **Epsilon tube visualization**: Shows which points are support vectors (outside tube)
- **Performance metrics**: RMSE and R² to quantify prediction quality

**Why This Matters:** Demonstrates SVR's core advantage: outliers have limited impact on the regression line because the epsilon-insensitive loss grows only linearly with the error.

In [None]:
# Generate synthetic data with outliers
np.random.seed(42)
X_synthetic = np.linspace(0, 10, 100).reshape(-1, 1)
y_synthetic = 2 * X_synthetic.ravel() + 1 + np.random.normal(0, 1, 100)

# Add outliers (10% of data)
outlier_indices = np.random.choice(100, size=10, replace=False)
y_synthetic[outlier_indices] += np.random.choice([-1, 1], size=10) * np.random.uniform(5, 10, size=10)

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_synthetic, y_synthetic, test_size=0.2, random_state=42
)

# Scale features (CRITICAL for SVR)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train from-scratch SVR
svr_scratch = LinearSVRScratch(C=1.0, epsilon=0.5, learning_rate=0.01, n_iterations=1000)
svr_scratch.fit(X_train_scaled, y_train)

# Make predictions
y_pred_scratch = svr_scratch.predict(X_test_scaled)

# Evaluate
rmse_scratch = np.sqrt(mean_squared_error(y_test, y_pred_scratch))
r2_scratch = r2_score(y_test, y_pred_scratch)

print("\nüîß From-Scratch SVR Performance:")
print(f"RMSE: {rmse_scratch:.4f}")
print(f"R¬≤: {r2_scratch:.4f}")

# Identify support vectors
support_vectors = svr_scratch.get_support_vectors(X_train_scaled, y_train)
print(f"\nSupport Vectors: {np.sum(support_vectors)} / {len(y_train)} ({100*np.mean(support_vectors):.1f}%)")
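
To see how an SVR fit compares with ordinary least squares on contaminated data, a side-by-side sketch like the following can help. It uses scikit-learn's `SVR` with a linear kernel rather than the from-scratch class, and it assumes one-sided outliers (all shifted upward) to make any bias easier to spot; the data and the names `X_cmp`, `y_cmp`, and `rmse_vs_clean` are illustrative, and no particular result is claimed here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X_cmp = np.linspace(0, 10, 100).reshape(-1, 1)
y_clean = 2 * X_cmp.ravel() + 1                      # noise-free ground truth
y_cmp = y_clean + rng.normal(0, 1, 100)

# Assumption: 10% of points receive an upward shift of 5 to 10 units
outlier_idx = rng.choice(100, size=10, replace=False)
y_cmp[outlier_idx] += rng.uniform(5, 10, size=10)

X_cmp_scaled = StandardScaler().fit_transform(X_cmp)

ols = LinearRegression().fit(X_cmp_scaled, y_cmp)
svr_linear = SVR(kernel="linear", C=1.0, epsilon=0.5).fit(X_cmp_scaled, y_cmp)

def rmse_vs_clean(model):
    """RMSE of the fitted line against the noise-free ground truth."""
    return float(np.sqrt(np.mean((model.predict(X_cmp_scaled) - y_clean) ** 2)))

print(f"OLS: intercept={ols.intercept_:.3f}, RMSE vs. clean line={rmse_vs_clean(ols):.3f}")
print(f"SVR: intercept={svr_linear.intercept_[0]:.3f}, RMSE vs. clean line={rmse_vs_clean(svr_linear):.3f}")
```

Measuring error against the clean line, rather than against the noisy targets, isolates how far the outliers dragged each fit.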

### Visualize Epsilon Tube and Support Vectors

### 📝 What's Happening in This Code?

**Purpose:** Visualize the epsilon tube concept and identify support vectors (points driving the model).

**Key Points:**
- **Epsilon tube**: Gray shaded region (±ε around regression line) where errors are ignored
- **Support vectors (red)**: Points outside the tube that contribute to the loss function
- **Regular points (blue)**: Points inside tube with zero loss
- **Robust regression**: SVR line is pulled less by outliers than an OLS line would be
- **Sparse model**: Only support vectors matter for predictions (efficiency advantage)

**Why This Matters:** Visual understanding of why SVR is robust: points inside the tube incur zero loss and do not influence the fit at all, while each point outside it pulls on the fit with the same bounded force no matter how far away it lies.

In [None]:
# Plot epsilon tube
plt.figure(figsize=(12, 6))

# Generate smooth prediction line
X_plot = np.linspace(X_train_scaled.min(), X_train_scaled.max(), 100).reshape(-1, 1)
y_plot = svr_scratch.predict(X_plot)

# Identify support vectors on training data
y_train_pred = svr_scratch.predict(X_train_scaled)
train_errors = np.abs(y_train - y_train_pred)
support_vector_mask = train_errors > svr_scratch.epsilon

# Plot
plt.scatter(X_train_scaled[~support_vector_mask], y_train[~support_vector_mask], 
           c='blue', alpha=0.5, label='Regular Points (inside tube)', s=50)
plt.scatter(X_train_scaled[support_vector_mask], y_train[support_vector_mask], 
           c='red', marker='x', s=100, label='Support Vectors (outside tube)', linewidths=2)
plt.plot(X_plot, y_plot, 'g-', linewidth=2, label='SVR Prediction')
plt.fill_between(X_plot.ravel(), 
                 y_plot - svr_scratch.epsilon, 
                 y_plot + svr_scratch.epsilon, 
                 alpha=0.2, color='gray', label=f'Epsilon Tube (ε={svr_scratch.epsilon})')

plt.xlabel('Feature (scaled)', fontsize=12)
plt.ylabel('Target', fontsize=12)
plt.title('SVR: Epsilon Tube and Support Vectors', fontsize=14, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nüìä Interpretation:")
print(f"‚Ä¢ Red X markers: {np.sum(support_vector_mask)} support vectors (contribute to loss)")
print(f"‚Ä¢ Blue circles: {np.sum(~support_vector_mask)} regular points (zero loss, ignored)")
print(f"‚Ä¢ Gray band: Epsilon tube (¬±{svr_scratch.epsilon}) where errors are tolerated")
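
The from-scratch class identifies support vectors by thresholding residuals. A fitted scikit-learn `SVR` exposes them directly through its `support_` attribute (indices of the training points with non-zero dual coefficients). The sketch below, on assumed synthetic data, checks that these indices line up with the points on or outside the tube, up to the solver's numerical tolerance.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data (assumed): linear trend with unit-variance noise
rng = np.random.default_rng(42)
X_sv = np.linspace(-2, 2, 80).reshape(-1, 1)
y_sv = 3.0 * X_sv.ravel() + rng.normal(0, 1.0, size=80)

epsilon = 0.5
svr_lin = SVR(kernel="linear", C=1.0, epsilon=epsilon).fit(X_sv, y_sv)

# Boolean mask built from the indices that scikit-learn reports as support vectors
is_support = np.zeros(len(y_sv), dtype=bool)
is_support[svr_lin.support_] = True

residuals = np.abs(y_sv - svr_lin.predict(X_sv))

print(f"Support vectors: {is_support.sum()} / {len(y_sv)}")
print(f"Smallest |residual| among support vectors: {residuals[is_support].min():.3f}")
print(f"Largest |residual| among other points:     {residuals[~is_support].max():.3f}")
```

Support vectors should have residuals of roughly ε or more, and the remaining points should sit inside the tube; small deviations around ε come from the solver's stopping tolerance.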

### Loss Convergence

### 📝 What's Happening in This Code?

**Purpose:** Verify that our from-scratch SVR implementation converges during training.

**Key Points:**
- **Loss function**: Combination of regularization term ($\frac{1}{2}||w||^2$) and epsilon-insensitive loss
- **Convergence pattern**: Should decrease rapidly initially, then stabilize
- **Non-smooth curve**: Expected due to subgradient descent (not true gradient)
- **Learning rate impact**: If loss diverges, learning rate is too high
- **Validation**: Confirms our implementation is optimizing correctly

**Why This Matters:** Loss convergence is essential validation: if loss doesn't decrease, the implementation has bugs or hyperparameters are wrong.

In [None]:
# Plot loss convergence
plt.figure(figsize=(10, 5))
plt.plot(svr_scratch.loss_history, linewidth=2)
plt.xlabel('Iteration (x100)', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.title('SVR Training Loss Convergence', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nüìâ Loss Analysis:")
print(f"‚Ä¢ Initial loss: {svr_scratch.loss_history[0]:.4f}")
print(f"‚Ä¢ Final loss: {svr_scratch.loss_history[-1]:.4f}")
print(f"‚Ä¢ Loss reduction: {100*(1 - svr_scratch.loss_history[-1]/svr_scratch.loss_history[0]):.1f}%")