# 037: One-Class SVM

### 1. Primal Optimization Problem

**Objective**: Find the hyperplane that separates normal data from the origin with maximum margin

$$\min_{w, \rho, \xi} \frac{1}{2} \|w\|^2 - \rho + \frac{1}{\nu n} \sum_{i=1}^{n} \xi_i$$

**Subject to**:

$$w \cdot \phi(x_i) \geq \rho - \xi_i, \quad \xi_i \geq 0, \quad \forall i$$

Where:
- $w$: Normal vector to hyperplane in feature space
- $\rho$: Offset from origin (larger = tighter boundary)
- $\xi_i$: Slack variables (allow some training points outside boundary)
- $\nu \in (0,1]$: Controls trade-off between boundary tightness and outlier tolerance
- $\phi(x)$: Feature mapping to high-dimensional space

### 2. Dual Formulation (Kernel Trick)

**Dual Optimization** (solved in practice):

$$\min_{\alpha} \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)$$

**Subject to**:

$$0 \leq \alpha_i \leq \frac{1}{\nu n}, \quad \sum_{i=1}^{n} \alpha_i = 1$$

Where:
- $\alpha_i$: Dual variables (Lagrange multipliers)
- $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$: Kernel function (avoids explicit $\phi$ computation)

**Support Vectors**: $x_i$ with $\alpha_i > 0$

### 3. Decision Function

**Predict new sample** $x$:

$$f(x) = \sum_{i \in SV} \alpha_i K(x, x_i) - \rho$$

Where $\rho$ is computed from support vectors on the boundary ($0 < \alpha_i < \frac{1}{\nu n}$):

$$\rho = \sum_{j \in SV} \alpha_j K(x_i, x_j) \quad \text{(for any margin support vector } x_i\text{)}$$

**Classification**:
- $f(x) \geq 0$: Normal (inside boundary)
- $f(x) < 0$: Anomaly (outside boundary)

### 4. Common Kernel Functions

**RBF (Radial Basis Function)** - Most popular:

$$K(x, x') = \exp\left(-\gamma \|x - x'\|^2\right)$$

- $\gamma$: Inverse of bandwidth (larger = tighter fit)
- Good for non-linear boundaries

**Polynomial**:

$$K(x, x') = (\gamma x \cdot x' + r)^d$$

- $d$: Degree (2 = quadratic, 3 = cubic)
- Good for polynomial relationships

**Linear**:

$$K(x, x') = x \cdot x'$$

- No kernel trick (hyperplane in original space)
- Fast but limited to linear boundaries

**Sigmoid**:

$$K(x, x') = \tanh(\gamma x \cdot x' + r)$$

- Similar to neural network activation
- Not always positive semi-definite (less common)

### 5. Interpretation of ν (nu)

**Two key properties**:
1. **Upper bound on training errors**: At most a fraction $\nu$ of training points lie outside the boundary
2. **Lower bound on support vectors**: At least a fraction $\nu$ of training points are support vectors

**Typical values**:
- $\nu = 0.1$: At most 10% of training points fall outside the boundary
- $\nu = 0.5$: Up to half of the training points may fall outside, so the boundary shrinks around the densest region
- **Rule of thumb**: Set $\nu \approx$ expected contamination rate
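
The two ν properties above can be checked empirically. The following is a minimal sketch, assuming scikit-learn's `OneClassSVM` and synthetic Gaussian data; the names `X_demo`, `nu_demo`, and `model_demo` are illustrative only. Because the solver stops at a finite tolerance, the fraction of rejected training points can land slightly above ν in practice.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))   # synthetic "normal" data (assumption)
nu_demo = 0.1

model_demo = OneClassSVM(kernel="rbf", gamma=0.5, nu=nu_demo).fit(X_demo)

# Fraction of training points outside the boundary, i.e. f(x) < 0
outlier_frac = float((model_demo.decision_function(X_demo) < 0).mean())
# Fraction of training points that are support vectors
sv_frac = len(model_demo.support_) / len(X_demo)

print(f"nu = {nu_demo}")
print(f"fraction outside boundary: {outlier_frac:.3f} (roughly at most nu)")
print(f"fraction support vectors:  {sv_frac:.3f} (at least nu)")
```

Repeating this with several values of ν should show both fractions tracking ν, which is why ν is usually set near the expected contamination rate.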

## 💻 Implementation from Scratch

### 📝 What's Happening in This Code?

**Purpose:** Build One-Class SVM from scratch using quadratic programming (CVXOPT)

**Key Points:**
- **Kernel computation**: Implement RBF, polynomial, linear kernels
- **Dual QP formulation**: Minimize $\frac{1}{2}\alpha^T K \alpha$ subject to box constraints
- **CVXOPT solver**: Convex optimization library with a dedicated QP routine (`cvxopt.solvers.qp`)
- **Support vector identification**: $\alpha_i > 10^{-5}$ threshold for numerical stability
- **ρ computation**: Average over margin support vectors ($0 < \alpha_i < \frac{1}{\nu n}$)
- **Decision function**: $f(x) = \sum \alpha_i K(x, x_i) - \rho$

**Why This Matters:** 
- Understanding QP formulation clarifies support vector role
- Kernel functions enable non-linear boundaries without explicit feature engineering
- Implementation shows computational complexity (a dense QP solve is roughly O(n³) time and needs the full n×n kernel matrix, expensive for large n)
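
If CVXOPT is not installed, the same dual problem can be solved for small n with SciPy. The following is a rough sketch, not the notebook's solver: `solve_ocsvm_dual_slsqp` is a hypothetical helper, and SLSQP is a general-purpose method that is only practical here for a few hundred points.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist


def solve_ocsvm_dual_slsqp(K, nu):
    """Hypothetical helper: min 0.5 a^T K a  s.t.  0 <= a_i <= 1/(nu*n), sum(a) = 1."""
    n = K.shape[0]
    upper = 1.0 / (nu * n)
    result = minimize(
        fun=lambda a: 0.5 * a @ K @ a,
        x0=np.full(n, 1.0 / n),               # uniform start is feasible
        jac=lambda a: K @ a,                  # gradient, valid because K is symmetric
        bounds=[(0.0, upper)] * n,            # box constraints on each alpha_i
        constraints=[{"type": "eq",
                      "fun": lambda a: a.sum() - 1.0,
                      "jac": lambda a: np.ones(n)}],
        method="SLSQP",
    )
    return result.x


rng = np.random.default_rng(0)
X_small = rng.normal(size=(40, 2))
K_small = np.exp(-0.5 * cdist(X_small, X_small, "sqeuclidean"))  # RBF, gamma=0.5
nu_small = 0.2

alpha_small = solve_ocsvm_dual_slsqp(K_small, nu_small)
alpha_uniform = np.full(40, 1.0 / 40)
obj_opt = 0.5 * alpha_small @ K_small @ alpha_small
obj_uniform = 0.5 * alpha_uniform @ K_small @ alpha_uniform

print(f"sum(alpha) = {alpha_small.sum():.6f}")
print(f"objective: optimized = {obj_opt:.5f}, uniform = {obj_uniform:.5f}")
```

The optimized objective should not exceed the uniform-weights objective, which gives a quick sanity check that a solver actually improved on the trivial feasible point.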

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
import seaborn as sns
from scipy.spatial.distance import cdist

# Set style
sns.set_style('whitegrid')
np.random.seed(42)

class OneClassSVM:
    """One-Class SVM for anomaly detection (from scratch)."""
    
    def __init__(self, nu=0.1, kernel='rbf', gamma=0.1, degree=3, coef0=0.0):
        self.nu = nu
        self.kernel_type = kernel
        self.gamma = gamma
        self.degree = degree
        self.coef0 = coef0
        self.alpha = None
        self.rho = None
        self.support_vectors = None
        self.support_vector_indices = None
        self.X_train = None
        
    def _kernel(self, X1, X2):
        """Compute kernel matrix K(X1, X2)."""
        if self.kernel_type == 'linear':
            return X1 @ X2.T
        elif self.kernel_type == 'rbf':
            # RBF: exp(-gamma * ||x - x'||^2)
            sq_dists = cdist(X1, X2, 'sqeuclidean')
            return np.exp(-self.gamma * sq_dists)
        elif self.kernel_type == 'poly':
            # Polynomial: (gamma * x.x' + coef0)^degree
            return (self.gamma * (X1 @ X2.T) + self.coef0) ** self.degree
        else:
            raise ValueError(f"Unknown kernel: {self.kernel_type}")
    
    def fit(self, X):
        """Train One-Class SVM on normal data."""
        n_samples = len(X)
        self.X_train = X
        
        # Compute kernel matrix
        K = self._kernel(X, X)
        
        # Solve dual QP: min 0.5 * alpha^T K alpha
        # Subject to: 0 <= alpha_i <= 1/(nu*n), sum(alpha_i) = 1
        try:
            from cvxopt import matrix, solvers
            solvers.options['show_progress'] = False
            
            # QP formulation: min 0.5 x^T P x + q^T x
            P = matrix(K)
            q = matrix(np.zeros(n_samples))
            
            # Inequality constraints: -alpha <= 0, alpha <= 1/(nu*n)
            G = matrix(np.vstack([-np.eye(n_samples), np.eye(n_samples)]))
            h = matrix(np.hstack([np.zeros(n_samples), np.ones(n_samples) / (self.nu * n_samples)]))
            
            # Equality constraint: sum(alpha) = 1
            A = matrix(np.ones((1, n_samples)))
            b = matrix([1.0])
            
            # Solve
            solution = solvers.qp(P, q, G, h, A, b)
            self.alpha = np.array(solution['x']).flatten()
            


        except ImportError:
            # Fallback (demo only): no optimization is performed here.
            # Uniform weights alpha_i = 1/n are feasible (they sum to 1 and
            # satisfy 0 <= alpha_i <= 1/(nu*n)) but are generally not optimal.
            print("⚠️  CVXOPT not available, using uniform alpha (install cvxopt for better results)")
            self.alpha = np.ones(n_samples) / n_samples
        
        # Identify support vectors (alpha > threshold)
        sv_threshold = 1e-5
        self.support_vector_indices = np.where(self.alpha > sv_threshold)[0]
        self.support_vectors = X[self.support_vector_indices]
        
        # Compute rho (offset) from margin support vectors
        # Margin SVs: 0 < alpha < 1/(nu*n); they lie on the boundary, so f(x_i) = 0
        upper_bound = 1.0 / (self.nu * n_samples)
        margin_sv_mask = (self.alpha > sv_threshold) & (self.alpha < upper_bound - sv_threshold)
        
        if margin_sv_mask.any():
            rho_indices = np.where(margin_sv_mask)[0]
        else:
            # Fallback if no margin SVs: average over all support vectors
            rho_indices = self.support_vector_indices
        # rho = sum_j alpha_j * K(x_i, x_j), averaged over the selected x_i for stability
        self.rho = float(np.mean(K[rho_indices] @ self.alpha))
        
        return self
    
    def decision_function(self, X):
        """Compute decision values f(x) = sum(alpha_i * K(x, x_i)) - rho."""
        K = self._kernel(X, self.X_train)
        return np.dot(K, self.alpha) - self.rho
    
    def predict(self, X):
        """Predict labels: 1 (normal) or -1 (anomaly)."""
        scores = self.decision_function(X)
        return np.where(scores >= 0, 1, -1)


print("✅ One-Class SVM implementation complete!")
print(f"   - Kernel functions: RBF, polynomial, linear")
print(f"   - Dual QP solver: CVXOPT (or fallback)")
print(f"   - Decision function: f(x) = Σ α_i K(x, x_i) - ρ")
print(f"   - Note: Install cvxopt for optimal results (pip install cvxopt)")

## 🧪 Test on Synthetic Data

### 📝 What's Happening in This Code?

**Purpose:** Validate from-scratch implementation with RBF kernel on 2D data

**Key Points:**
- **Data**: Single Gaussian cluster (normal) + scattered outliers
- **RBF kernel with γ=0.1**: Smooth, non-linear boundary around cluster
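- **nu=0.0625**: Equals the outlier share of the combined dataset (20/320 = 6.25%); because training uses only the clean cluster, ν here caps the fraction of normal training points left outside the boundary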
- **Decision boundary**: Contour plot shows f(x) = 0 (positive = normal, negative = anomaly)
- **Support vectors**: Highlighted points that define the boundary
- **Margin interpretation**: Margin support vectors ($0 < \alpha_i < \frac{1}{\nu n}$) lie on f(x) = 0; support vectors at the upper bound lie outside it

**Why This Matters:** 
- Visualizes non-linear boundary capability (contrast with linear classifier)
- Shows support vector sparsity (only a subset of training points, at least a fraction ν of them, carry nonzero α)
- Confirms boundary wraps tightly around normal cluster

In [None]:
# Generate synthetic data
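# Fix the cluster center at the origin; with centers=1 the center is drawn at random
# from [-10, 10]^2 and can land outside the plotting grids used below
X_normal, _ = make_blobs(n_samples=300, centers=[[0.0, 0.0]], cluster_std=1.0, random_state=42)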
X_anomalies = np.random.uniform(low=-6, high=6, size=(20, 2))
X = np.vstack([X_normal, X_anomalies])
y_true = np.array([1]*300 + [-1]*20)

# Fit One-Class SVM (from scratch)
ocsvm = OneClassSVM(nu=0.0625, kernel='rbf', gamma=0.1)
ocsvm.fit(X_normal)  # Train only on normal data

# Predict on all data (including anomalies)
y_pred = ocsvm.predict(X)
scores = ocsvm.decision_function(X)

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Decision boundary
ax = axes[0]
xx, yy = np.meshgrid(np.linspace(-8, 8, 200), np.linspace(-8, 8, 200))
X_grid = np.c_[xx.ravel(), yy.ravel()]
Z = ocsvm.decision_function(X_grid).reshape(xx.shape)

# Contour plot
contour = ax.contourf(xx, yy, Z, levels=20, cmap='RdYlBu', alpha=0.6)
ax.contour(xx, yy, Z, levels=[0], linewidths=3, colors='black')  # f(x) = 0 decision boundary
plt.colorbar(contour, ax=ax, label='Decision Function f(x)')

# Data points
ax.scatter(X_normal[:, 0], X_normal[:, 1], c='blue', s=30, alpha=0.6, label='Normal (training)')
ax.scatter(X_anomalies[:, 0], X_anomalies[:, 1], c='red', s=30, alpha=0.6, label='Anomalies (test)')

# Support vectors
if ocsvm.support_vectors is not None:
    ax.scatter(ocsvm.support_vectors[:, 0], ocsvm.support_vectors[:, 1],
               s=150, facecolors='none', edgecolors='green', linewidths=2,
               label=f'Support Vectors ({len(ocsvm.support_vectors)})')

ax.set_xlabel('Feature 1')
ax.set_ylabel('Feature 2')
ax.set_title('One-Class SVM: Decision Boundary\n(RBF kernel, γ=0.1, ν=0.0625)')
ax.legend()
ax.grid(True, alpha=0.3)

# Plot 2: Score distribution
ax = axes[1]
ax.hist(scores[y_true == 1], bins=30, alpha=0.6, label='Normal', color='blue')
ax.hist(scores[y_true == -1], bins=30, alpha=0.6, label='Anomalies', color='red')
ax.axvline(0, color='green', linestyle='--', linewidth=2, label='Decision Threshold (f(x)=0)')
ax.set_xlabel('Decision Function f(x)')
ax.set_ylabel('Frequency')
ax.set_title('Decision Score Distribution\n(f(x) ≥ 0 → Normal, f(x) < 0 → Anomaly)')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Performance
from sklearn.metrics import classification_report, confusion_matrix
print("\n📊 Classification Report:")
# Labels are sorted as [-1, 1], so the anomaly class name must come first
print(classification_report(y_true, y_pred, target_names=['Anomaly', 'Normal']))
print("\nConfusion Matrix:")
print(confusion_matrix(y_true, y_pred))
print(f"\n✅ Support vectors: {len(ocsvm.support_vector_indices)} / {len(X_normal)} "
      f"({100*len(ocsvm.support_vector_indices)/len(X_normal):.1f}%)")
print(f"   Rho (offset): {ocsvm.rho:.4f}")

## 🏭 Post-Silicon Application: Novelty Detection in Test Data

### 📝 What's Happening in This Code?

**Purpose:** Train on known-good devices, detect novel defect patterns at test

**Key Points:**
- **Training data**: Only normal devices (Vdd, Idd, Frequency from qualified lots)
- **Test data**: Mix of normal + novel defects (high leakage, frequency outliers)
- **RBF kernel**: Captures non-linear correlations (e.g., Idd vs Frequency coupling)
- **nu=0.05**: Lets at most about 5% of training devices fall outside the boundary, so roughly that share of good devices may be flagged at test
- **Novelty interpretation**: Device behaviors outside trained normal region
- **Business value**: Flag unexpected defects for engineering investigation

**Why This Matters:** 
- Real-world test: Training data = baseline lot, test data = new production lots
- Catches process changes not covered by specification limits
- Enables proactive defect discovery vs reactive customer returns
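
One point worth checking before fitting an RBF model on parametric data like this is feature scale. The sketch below is illustrative only and assumes synthetic Vdd/Idd/Frequency values with the same means and spreads as the notebook's data; the variable names are not part of the notebook. On raw features the squared distances are tiny, so the kernel matrix is nearly constant, while standardizing spreads the kernel values out.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for Vdd (V), Idd (A), Frequency (GHz)
X_raw = np.column_stack([
    rng.normal(1.8, 0.05, 200),
    rng.normal(0.5, 0.08, 200),
    rng.normal(3.2, 0.15, 200),
])
gamma_check = 0.5

# RBF kernel on raw features: distances are small, so K is close to 1 everywhere
K_raw = np.exp(-gamma_check * cdist(X_raw, X_raw, "sqeuclidean"))

# RBF kernel on standardized features: each feature has unit variance
X_std = StandardScaler().fit_transform(X_raw)
K_std = np.exp(-gamma_check * cdist(X_std, X_std, "sqeuclidean"))

print(f"raw features:          mean K = {K_raw.mean():.3f}, min K = {K_raw.min():.3f}")
print(f"standardized features: mean K = {K_std.mean():.3f}, min K = {K_std.min():.3f}")
```

A nearly constant kernel matrix leaves little contrast between devices, so either standardize first or choose γ relative to the raw feature variances.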

In [None]:
# Generate semiconductor test data
np.random.seed(42)

# Training: Known-good devices (qualified lot)
n_train = 500
vdd_train = np.random.normal(1.8, 0.05, n_train)
idd_train = np.random.normal(0.5, 0.08, n_train)
freq_train = np.random.normal(3.2, 0.15, n_train)
X_train_psv = np.column_stack([vdd_train, idd_train, freq_train])

# Test: Normal devices + novel defects
n_test_normal = 200
vdd_test_normal = np.random.normal(1.8, 0.05, n_test_normal)
idd_test_normal = np.random.normal(0.5, 0.08, n_test_normal)
freq_test_normal = np.random.normal(3.2, 0.15, n_test_normal)

n_test_novel = 20  # Novel defect signatures
vdd_test_novel = np.random.uniform(1.6, 2.1, n_test_novel)
idd_test_novel = np.random.uniform(0.7, 1.2, n_test_novel)  # High leakage
freq_test_novel = np.random.uniform(2.5, 2.9, n_test_novel)  # Low frequency

X_test_psv = np.vstack([
    np.column_stack([vdd_test_normal, idd_test_normal, freq_test_normal]),
    np.column_stack([vdd_test_novel, idd_test_novel, freq_test_novel])
])
y_true_psv = np.array([1]*n_test_normal + [-1]*n_test_novel)
device_ids_psv = np.arange(1, len(X_test_psv) + 1)

# Fit One-Class SVM (train on normal only)
ocsvm_psv = OneClassSVM(nu=0.05, kernel='rbf', gamma=0.5)
ocsvm_psv.fit(X_train_psv)

# Predict novelties
y_pred_psv = ocsvm_psv.predict(X_test_psv)
scores_psv = ocsvm_psv.decision_function(X_test_psv)

# 3D Visualization
fig = plt.figure(figsize=(14, 6))

# Plot 1: 3D scatter with decision boundary indication
ax = fig.add_subplot(121, projection='3d')
normal_mask = y_pred_psv == 1
novel_mask = y_pred_psv == -1

ax.scatter(X_train_psv[:, 0], X_train_psv[:, 1], X_train_psv[:, 2],
           c='lightblue', s=20, alpha=0.3, label='Training (normal)')
ax.scatter(X_test_psv[normal_mask, 0], X_test_psv[normal_mask, 1], X_test_psv[normal_mask, 2],
           c='blue', s=50, alpha=0.7, label='Test: Predicted Normal')
ax.scatter(X_test_psv[novel_mask, 0], X_test_psv[novel_mask, 1], X_test_psv[novel_mask, 2],
           c='red', marker='x', s=200, linewidths=3, label='Test: Predicted Novelty')

# Support vectors from training
if ocsvm_psv.support_vectors is not None:
    ax.scatter(ocsvm_psv.support_vectors[:, 0], 
               ocsvm_psv.support_vectors[:, 1], 
               ocsvm_psv.support_vectors[:, 2],
               s=100, facecolors='none', edgecolors='green', linewidths=2,
               label=f'Support Vectors ({len(ocsvm_psv.support_vectors)})')



ax.set_xlabel('Vdd (V)')
ax.set_ylabel('Idd (A)')
ax.set_zlabel('Frequency (GHz)')
ax.set_title('Novelty Detection: Parametric Test\n(Trained on Known-Good Devices)')
ax.legend(fontsize=8)

# Plot 2: Decision score distribution
ax2 = fig.add_subplot(122)
ax2.hist(scores_psv[y_true_psv == 1], bins=30, alpha=0.6, label='True Normal', color='blue')
ax2.hist(scores_psv[y_true_psv == -1], bins=30, alpha=0.6, label='True Novelty', color='red')
ax2.axvline(0, color='green', linestyle='--', linewidth=2, label='Threshold (f(x)=0)')
ax2.set_xlabel('Decision Function f(x)')
ax2.set_ylabel('Number of Devices')
ax2.set_title('Novelty Score Distribution\n(Negative = Novel/Anomalous)')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Report novel devices
novel_indices = np.where(y_pred_psv == -1)[0]
print("\n⚠️  Novel Devices Detected (Outside Normal Boundary):")
print(f"   Total flagged: {len(novel_indices)} / {len(X_test_psv)} ({100*len(novel_indices)/len(X_test_psv):.1f}%)")
print("\n   Top 5 Most Novel Devices (most negative f(x)):")
top5_novel = np.argsort(scores_psv)[:5]
for idx in top5_novel:
    print(f"   Device {device_ids_psv[idx]:04d}: Vdd={X_test_psv[idx,0]:.3f}V, "
          f"Idd={X_test_psv[idx,1]:.3f}A, Freq={X_test_psv[idx,2]:.3f}GHz, f(x)={scores_psv[idx]:.3f}")

print("\n📊 Novelty Detection Performance:")
# Labels are sorted as [-1, 1], so the novelty class name must come first
print(classification_report(y_true_psv, y_pred_psv, target_names=['Novelty', 'Normal']))

## 🔧 Production Implementation with Scikit-Learn

### 📝 What's Happening in This Code?

**Purpose:** Use sklearn.svm.OneClassSVM for optimized production deployment

**Key Points:**
- **sklearn advantages**:
  - LIBSVM backend (C++ implementation, typically far faster than a dense QP solve in Python)
  - Automatic kernel caching for repeated evaluations
  - Sparse matrix support for high-dimensional data
  - Standardized API with other sklearn models
- **Hyperparameter tuning**: GridSearchCV on nu and gamma
- **gamma='scale'**: Automatic γ = 1/(n_features × var(X))
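- **Comparison with from-scratch**: Sanity check only; the two models differ in preprocessing (scaling), γ, and score scale (LIBSVM constrains Σα = νn rather than 1), so expect similar rankings rather than identical values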

**Why This Matters:** 
- Production requires speed (process thousands of devices per minute)
- Sklearn integrates with pipelines, cross-validation, model persistence
- Same mathematical foundation, enterprise-grade implementation
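
The tuning step over nu and gamma can be sketched as follows. This is a simplified stand-in, not the notebook's procedure: it uses a plain loop instead of `GridSearchCV` to keep the scoring explicit, and it assumes a small labeled validation set with known anomalies is available. The data and names (`X_fit`, `X_val`, `y_val`) are synthetic and illustrative.

```python
import itertools

import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_fit = rng.normal(size=(300, 3))                      # normal-only training data
X_val = np.vstack([
    rng.normal(size=(100, 3)),                         # normal validation samples
    rng.normal(loc=4.0, scale=0.5, size=(20, 3)),      # known anomalies (assumption)
])
y_val = np.array([0] * 100 + [1] * 20)                 # 1 = anomaly

results = []
for nu, gamma in itertools.product([0.01, 0.05, 0.1], ["scale", 0.1, 1.0]):
    model = make_pipeline(StandardScaler(),
                          OneClassSVM(kernel="rbf", nu=nu, gamma=gamma))
    model.fit(X_fit)
    # Negate so that higher score = more anomalous, as roc_auc_score expects
    auc = roc_auc_score(y_val, -model.decision_function(X_val))
    results.append((auc, nu, gamma))

best_auc, best_nu, best_gamma = max(results, key=lambda r: r[0])
print(f"best AUC = {best_auc:.3f} at nu = {best_nu}, gamma = {best_gamma}")
```

ROC AUC depends only on the ranking of scores, so it mainly discriminates between gamma values; choosing nu also requires a threshold-dependent metric such as precision or recall at f(x) = 0.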

In [None]:
from sklearn.svm import OneClassSVM as SklearnOneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_auc_score, roc_curve

# Build pipeline with scaling (important for SVM)
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('ocsvm', SklearnOneClassSVM(nu=0.05, kernel='rbf', gamma='scale'))
])

# Fit on training data
pipeline.fit(X_train_psv)

# Predict on test data
y_pred_sklearn = pipeline.predict(X_test_psv)
scores_sklearn = pipeline.decision_function(X_test_psv)

# Compare implementations
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Score comparison
ax = axes[0]
ax.scatter(scores_psv, scores_sklearn, alpha=0.6, s=40)
ax.plot([scores_psv.min(), scores_psv.max()], 
        [scores_psv.min(), scores_psv.max()], 
        'r--', linewidth=2, label='y = x (reference only)')
ax.set_xlabel('From-Scratch Decision Score')
ax.set_ylabel('Sklearn Decision Score')
ax.set_title('Score Comparison: From-Scratch vs Sklearn\n(Scales differ; look for monotone agreement)')
ax.legend()
ax.grid(True, alpha=0.3)

# Plot 2: ROC curves
ax = axes[1]
y_true_binary = (y_true_psv == -1).astype(int)  # 1=novel, 0=normal for ROC

# From-scratch ROC
fpr1, tpr1, _ = roc_curve(y_true_binary, -scores_psv)  # Negate for "higher = anomalous"
auc1 = roc_auc_score(y_true_binary, -scores_psv)
ax.plot(fpr1, tpr1, linewidth=2, label=f'From-Scratch (AUC={auc1:.3f})')

# Sklearn ROC
fpr2, tpr2, _ = roc_curve(y_true_binary, -scores_sklearn)
auc2 = roc_auc_score(y_true_binary, -scores_sklearn)
ax.plot(fpr2, tpr2, linewidth=2, label=f'Sklearn (AUC={auc2:.3f})')

ax.plot([0, 1], [0, 1], 'k--', linewidth=1, label='Random')
ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_title('ROC Curve: Novelty Detection Performance')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n🔬 Implementation Comparison:")
print(f"   From-Scratch: {(y_pred_psv == -1).sum()} novelties detected, AUC={auc1:.3f}")
print(f"   Sklearn:      {(y_pred_sklearn == -1).sum()} novelties detected, AUC={auc2:.3f}")
print(f"   Agreement:    {(y_pred_psv == y_pred_sklearn).mean()*100:.1f}% predictions match")

# Extract sklearn model info
sklearn_model = pipeline.named_steps['ocsvm']
print(f"\n📊 Sklearn Model Details:")
print(f"   Support vectors: {len(sklearn_model.support_)} / {len(X_train_psv)} "
      f"({100*len(sklearn_model.support_)/len(X_train_psv):.1f}%)")
print(f"   Rho (= -intercept_): {-sklearn_model.intercept_[0]:.4f}")
print(f"   Dual coefficients: {sklearn_model.dual_coef_.shape}")
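
The fitted attributes printed above map directly onto the decision function $f(x) = \sum_i \alpha_i K(x, x_i) - \rho$. The following sketch rebuilds the scores by hand from `support_vectors_`, `dual_coef_`, and `intercept_` on a small synthetic problem; the data and variable names are illustrative, and the RBF γ is fixed explicitly so the kernel can be recomputed.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_ref = rng.normal(size=(200, 2))               # training data (synthetic)
X_query = rng.normal(scale=2.0, size=(10, 2))   # points to score
gamma_ref = 0.5

fitted = OneClassSVM(kernel="rbf", gamma=gamma_ref, nu=0.1).fit(X_ref)

# K(x, x_i) between query points and support vectors only
K_query = np.exp(-gamma_ref * cdist(X_query, fitted.support_vectors_, "sqeuclidean"))

# f(x) = sum_i coef_i * K(x, x_i) + intercept_, where intercept_ = -rho
manual_scores = K_query @ fitted.dual_coef_.ravel() + fitted.intercept_[0]
max_abs_diff = float(np.max(np.abs(manual_scores - fitted.decision_function(X_query))))

print(f"max |manual - sklearn| = {max_abs_diff:.2e}")
print(f"sum of dual coefficients = {fitted.dual_coef_.sum():.3f}")
```

Only the support vectors enter the sum, which is why inference cost scales with the number of support vectors rather than with the training set size.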

## 📊 Kernel Comparison: Linear vs RBF vs Polynomial

### 📝 What's Happening in This Code?

**Purpose:** Compare different kernels on same dataset to understand boundary shapes

**Key Points:**
- **Linear kernel**: Hyperplane boundary (fast, interpretable, limited to linear separability)
- **RBF kernel**: Smooth, circular/elliptical boundaries (most flexible, default choice)
- **Polynomial kernel**: Angular boundaries (good for polynomial relationships, degree sensitive)
- **Visualization**: Side-by-side decision boundaries show kernel impact
- **Performance**: RBF typically best for non-linear data, linear for simple distributions

**Why This Matters:** 
- Kernel choice dramatically affects boundary shape
- No universal best kernel (depends on data geometry)
- RBF safest default (smooth, flexible), tune γ carefully

In [None]:
from sklearn.svm import OneClassSVM as SklearnOneClassSVM

# Use 2D data for visualization
X_kernel_compare = X_normal  # Train on normal cluster only

# Different kernels
kernels = [
    ('Linear', {'kernel': 'linear', 'nu': 0.05}),
    ('RBF (γ=0.1)', {'kernel': 'rbf', 'gamma': 0.1, 'nu': 0.05}),
    ('RBF (γ=0.5)', {'kernel': 'rbf', 'gamma': 0.5, 'nu': 0.05}),
    ('Polynomial (d=2)', {'kernel': 'poly', 'degree': 2, 'gamma': 'scale', 'nu': 0.05}),
    ('Polynomial (d=3)', {'kernel': 'poly', 'degree': 3, 'gamma': 'scale', 'nu': 0.05}),
]

# Train and visualize
fig, axes = plt.subplots(2, 3, figsize=(18, 11))
axes = axes.flatten()

for idx, (name, params) in enumerate(kernels):
    ax = axes[idx]
    
    # Fit model
    model = SklearnOneClassSVM(**params)
    model.fit(X_kernel_compare)
    
    # Create decision boundary grid
    xx, yy = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))
    X_grid = np.c_[xx.ravel(), yy.ravel()]
    Z = model.decision_function(X_grid).reshape(xx.shape)
    
    # Plot
    contour = ax.contourf(xx, yy, Z, levels=20, cmap='RdYlBu', alpha=0.6)
    ax.contour(xx, yy, Z, levels=[0], linewidths=3, colors='black')
    ax.scatter(X_kernel_compare[:, 0], X_kernel_compare[:, 1], 
               c='blue', s=30, alpha=0.6, label='Training Data')
    
    # Support vectors
    sv = X_kernel_compare[model.support_]
    ax.scatter(sv[:, 0], sv[:, 1], s=150, facecolors='none', 
               edgecolors='green', linewidths=2, label=f'SVs ({len(sv)})')
    
    # Test anomalies
    y_pred_test = model.predict(X_anomalies)
    detected = (y_pred_test == -1).sum()
    ax.scatter(X_anomalies[:, 0], X_anomalies[:, 1], 
               c='red', marker='x', s=100, linewidths=2, 
               label=f'Anomalies ({detected}/{len(X_anomalies)} detected)')
    
    ax.set_title(f'{name}\nSupport Vectors: {len(sv)}')
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')
    ax.legend(fontsize=8)
    ax.grid(True, alpha=0.3)
    plt.colorbar(contour, ax=ax, label='f(x)')

# Hide unused subplot
axes[-1].axis('off')

plt.tight_layout()
plt.show()

print("\n🔍 Kernel Comparison Insights:")
print("   - Linear: Fast, interpretable, but rigid (hyperplane only)")
print("   - RBF (γ=0.1): Smooth, gentle boundary (good for scattered data)")
print("   - RBF (γ=0.5): Tighter boundary (risk of overfitting to training cluster)")
print("   - Polynomial (d=2): Quadratic boundary (good for elliptical clusters)")
print("   - Polynomial (d=3): More complex boundary (risk of overfitting)")
print("\n💡 Recommendation: Start with RBF (γ='scale' or 0.1-0.5), tune via cross-validation")

## 🎯 Real-World Project Ideas

### Post-Silicon Validation Projects

1. **Process Baseline Drift Monitor** 💰 $10M+ Yield Protection
   - **Objective**: Train OCSVM on qualified baseline lot, detect process drift in new lots
   - **Features**: 15+ parametric tests (Vdd, Idd, freq, leakage, timing margins)
   - **Success Metric**: Detect 2σ process shifts within 1 wafer (before yield impact)
   - **Implementation**: Per-product OCSVM models, retrain monthly, alert on novelty clusters
   - **Business Value**: Early warning for fab process excursions (etch, litho, implant drift)

2. **Multi-Product Test Coverage Analyzer** 💰 $5M+ Quality Improvement
   - **Objective**: Identify test coverage gaps by flagging novel device behaviors
   - **Features**: All parametric + functional test results (50+ features)
   - **Success Metric**: Discover 5+ uncovered defect modes per product generation
   - **Implementation**: Train on high-volume baseline, analyze novelties for new test development
   - **Business Value**: Improve test escape rate, reduce DPPM (defects per million)

3. **Counterfeit Component Detector** 💰 $20M+ Brand Protection
   - **Objective**: Detect counterfeit/remarked chips at incoming inspection
   - **Features**: Electrical signature (Vdd-Idd curve, frequency vs voltage, leakage profile)
   - **Success Metric**: 98% counterfeit detection, <1% false reject of genuine parts
   - **Implementation**: Train on authentic components, flag novelties for destructive analysis
   - **Business Value**: Protect supply chain, prevent customer failures from fakes

4. **Equipment Health Anomaly System** 💰 $8M+ Unscheduled Downtime Prevention
   - **Objective**: Detect tester/handler/prober degradation via device test signature changes
   - **Features**: Per-equipment test time, parametric trends, spatial patterns on wafer
   - **Success Metric**: Predict equipment failure 48 hours in advance, 85% accuracy
   - **Implementation**: Per-tool OCSVM, train on healthy baseline, alert on drift
   - **Business Value**: Proactive PM scheduling, avoid wafer scraps from bad tooling

### General AI/ML Projects

5. **Cybersecurity Intrusion Detection** 💰 $50M+ Breach Prevention
   - **Objective**: Detect zero-day attacks by modeling normal network behavior
   - **Features**: Packet size, protocol, port, session duration, payload entropy, connection patterns
   - **Success Metric**: 95% novel attack detection, <0.5% false positive rate
   - **Implementation**: OCSVM per network segment, train on verified clean traffic, retrain weekly
   - **Business Value**: Detect unknown threats, reduce mean-time-to-detect (MTTD)

6. **Medical Diagnosis Rare Disease Screening** 💰 $100M+ Healthcare Savings
   - **Objective**: Flag patients with rare disease biomarkers for specialist referral
   - **Features**: Lab test panel (blood count, metabolic panel, genetic markers, imaging features)
   - **Success Metric**: 90% rare disease detection, 80% specificity
   - **Implementation**: Train on healthy population, detect novelties for follow-up testing
   - **Business Value**: Early diagnosis of rare diseases (e.g., cancer, genetic disorders)

7. **Manufacturing Quality Control** 💰 $15M+ Defect Reduction
   - **Objective**: Detect novel defect patterns in product assembly (automotive, aerospace)
   - **Features**: Vision system features (shape, texture, color), dimensional measurements, torque sensors
   - **Success Metric**: 99% defect detection, <2% false alarm rate
   - **Implementation**: OCSVM on known-good units, flag novelties for manual inspection
   - **Business Value**: Reduce warranty claims, improve six-sigma quality

8. **Financial Transaction Fraud Novelty Detection** 💰 $200M+ Fraud Loss Prevention
   - **Objective**: Detect novel fraud patterns not covered by rule-based systems
   - **Features**: Transaction amount, merchant category, location, time, user device fingerprint
   - **Success Metric**: 75% novel fraud detection (complement existing rules), 0.5% FPR
   - **Implementation**: OCSVM per user profile, train on verified legitimate transactions
   - **Business Value**: Catch emerging fraud tactics before rules updated

## 🔍 Key Takeaways

### ✅ When to Use One-Class SVM
- **Novelty detection**: Training data = only normal class, need to detect unseen anomalies
- **Non-linear boundaries**: Complex data distributions requiring flexible decision boundary
- **Small to medium datasets** (<100K samples): QP solver scales O(n²) to O(n³)
- **Need interpretable boundary**: Support vectors show which training points define normal region
- **Robust to outliers in training**: nu parameter allows some contamination

### ❌ Limitations
- **Slow training**: Quadratic programming expensive for large n (use SGDOneClassSVM for >100K samples)
- **Kernel/nu selection**: Requires careful hyperparameter tuning (more sensitive than Isolation Forest)
- **Memory intensive**: Kernel matrix K(n×n) for training, support vectors for inference
- **Assumes single normal cluster**: Struggles with multi-modal normal distributions (use multiple OCSVMs)
- **No probabilistic output**: Decision function f(x) not calibrated probability

### 🔧 Hyperparameter Tuning Guidelines
1. **nu** (ν, most critical):
   - Interpretation: Upper bound on training errors, lower bound on support vectors
   - Start with expected outlier rate (e.g., 0.05 for 5% contamination)
   - Too small: Loose boundary that encloses nearly all training points, including any contaminants (misses anomalies)
   - Too large: Tight boundary that rejects many normal samples (false alarms)
   - Typical range: 0.01 to 0.2

2. **kernel**:
   - **RBF**: Default choice (smooth, flexible, works for most data)
   - **Linear**: Fast, interpretable (if data linearly separable)
   - **Polynomial**: Good for polynomial feature relationships (tune degree)
   - Try RBF first, fallback to linear if too slow

3. **gamma** (γ, for RBF/poly kernels):
   - Controls kernel width (inverse of bandwidth)
   - Larger γ: Tighter fit to training (risk overfitting)
   - Smaller γ: Smoother boundary (risk underfitting)
   - Use 'scale' (automatic: 1/(n_features × var(X))) as starting point
   - Tune on a labeled validation set that includes known anomalies (without labels there is nothing to score against)

4. **Preprocessing**:
   - **Always standardize** (StandardScaler) for SVM (distance-based)
   - Consider PCA for high dimensions (>50 features) to reduce kernel computation
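
The `'scale'` heuristic for γ can be reproduced by hand. A minimal sketch on synthetic data (names are illustrative): the variance is taken over all entries of X, not per feature, so features with very different spreads are better standardized first.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_g = rng.normal(scale=[1.0, 3.0, 0.5], size=(300, 3))   # features with different spreads

# gamma='scale' corresponds to 1 / (n_features * X.var()), with var over all entries
gamma_manual = 1.0 / (X_g.shape[1] * X_g.var())

m_scale = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_g)
m_manual = OneClassSVM(kernel="rbf", nu=0.05, gamma=gamma_manual).fit(X_g)

same_scores = bool(np.allclose(m_scale.decision_function(X_g),
                               m_manual.decision_function(X_g)))
print(f"gamma_manual = {gamma_manual:.4f}, identical scores: {same_scores}")
```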

### 🔬 One-Class SVM vs Isolation Forest

| Aspect | One-Class SVM | Isolation Forest |
|--------|---------------|------------------|
| **Approach** | Boundary-based (hyperplane) | Isolation-based (path length) |
| **Speed** | O(n²) to O(n³) training (slow for large n) | O(n log n) (much faster) |
| **Boundary** | Smooth, kernel-defined | Irregular, tree-based |
| **Hyperparameters** | nu, kernel, gamma (sensitive) | contamination, n_estimators (robust) |
| **High dimensions** | Kernel trick helps, but memory costly | No distance metric, scales well |
| **Novelty detection** | ✅ Excellent (train on normal only) | ⚠️ Assumes some anomalies in training |
| **Interpretability** | Support vectors show boundary | Path lengths show isolation ease |
| **Best for** | Novelty, non-linear, interpretable | Large data, high-D, speed-critical |
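
The comparison in the table can be tried on a toy problem. The following is a small sketch on synthetic 2D data with well-separated anomalies, so both methods are expected to do well; it illustrates the two APIs side by side and is not a benchmark. All names are illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(500, 2))                       # normal-only training data
X_te = np.vstack([
    rng.normal(size=(200, 2)),                         # normal test points
    rng.uniform(4, 6, size=(20, 2)),                   # clearly separated anomalies
])
y_te = np.array([0] * 200 + [1] * 20)                  # 1 = anomaly

ocsvm_cmp = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_tr)
iforest_cmp = IsolationForest(n_estimators=100, random_state=0).fit(X_tr)

# Both score functions return higher values for more normal points, so negate for AUC
auc_ocsvm = roc_auc_score(y_te, -ocsvm_cmp.decision_function(X_te))
auc_iforest = roc_auc_score(y_te, -iforest_cmp.score_samples(X_te))

print(f"One-Class SVM AUC:    {auc_ocsvm:.3f}")
print(f"Isolation Forest AUC: {auc_iforest:.3f}")
```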

### 🚀 Next Steps
1. **Notebook 038**: AutoEncoders for deep learning-based anomaly detection
2. **Advanced topic**: Ensemble (OCSVM + Isolation Forest) for robust detection
3. **Kernel engineering**: Custom kernels for domain-specific similarity (e.g., graph kernels)
4. **Online learning**: Incremental OCSVM for streaming data (approximate kernel methods)

### 📚 Further Reading
- Schölkopf et al. (2001): "Estimating the Support of a High-Dimensional Distribution" - Original OCSVM paper
- Tax & Duin (2004): "Support Vector Data Description" (SVDD) - Alternative formulation
- Kernel methods for anomaly detection in time series
- SGDOneClassSVM for large-scale data (linear kernel only)