# 038: Autoencoders for Anomaly Detection

### Architecture

**Encoder**: $z = f_{enc}(x; \theta_{enc})$ maps input $x \in \mathbb{R}^d$ to latent $z \in \mathbb{R}^k$ ($k \ll d$)

**Decoder**: $\hat{x} = f_{dec}(z; \theta_{dec})$ reconstructs from latent space

**Training Loss** (MSE):
$$L = \frac{1}{n}\sum_{i=1}^{n} ||x_i - \hat{x}_i||^2$$

**Anomaly Score**:
$$s(x) = ||x - f_{dec}(f_{enc}(x))||^2$$

Normal points: low reconstruction error  
Anomalies: high reconstruction error, because the model never saw such patterns during training
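
The score above can be illustrated without a neural network. The sketch below is an assumption-laden stand-in, not the notebook's model: it uses a linear projection onto principal directions in place of $f_{enc}$ and $f_{dec}$, and the toy data, `encode`, `decode`, and `anomaly_score` are hypothetical names introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: normal points lie near a 2-D plane embedded in 5-D space.
basis = rng.normal(size=(2, 5))
X_normal = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 5))
X_anomaly = rng.normal(scale=3.0, size=(20, 5))  # points well off that plane

# Linear stand-in for f_enc / f_dec: project onto the top-k principal directions.
mean = X_normal.mean(axis=0)
_, _, vt = np.linalg.svd(X_normal - mean, full_matrices=False)
components = vt[:2]  # k = 2 latent dimensions


def encode(x):
    """Map rows of x (n, d) to latent codes (n, k)."""
    return (x - mean) @ components.T


def decode(z):
    """Map latent codes (n, k) back to input space (n, d)."""
    return z @ components + mean


def anomaly_score(x):
    """s(x) = ||x - f_dec(f_enc(x))||^2, computed row-wise."""
    return np.sum((x - decode(encode(x))) ** 2, axis=1)


normal_scores = anomaly_score(X_normal)
anomaly_scores = anomaly_score(X_anomaly)
print(f"median normal score:  {np.median(normal_scores):.4f}")
print(f"median anomaly score: {np.median(anomaly_scores):.4f}")
```

Points that follow the structure seen during fitting reconstruct almost exactly, while points off that structure leave a large residual. A deep autoencoder applies the same idea with non-linear maps.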

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, roc_auc_score, roc_curve

sns.set_style('whitegrid')
np.random.seed(42)
torch.manual_seed(42)

class AutoEncoder(nn.Module):
    def __init__(self, input_dim, latent_dim=8):
        super(AutoEncoder, self).__init__()
        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, latent_dim)
        )
        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16),
            nn.ReLU(),
            nn.Linear(16, 32),
            nn.ReLU(),
            nn.Linear(32, input_dim)
        )
    
    def forward(self, x):
        z = self.encoder(x)
        x_recon = self.decoder(z)
        return x_recon

print("✅ AutoEncoder architecture defined")

In [None]:
# Generate data
X_normal, _ = make_blobs(n_samples=1000, centers=1, n_features=10, random_state=42)
X_anomalies = np.random.uniform(low=-8, high=8, size=(100, 10))

# Standardize
scaler = StandardScaler()
X_normal_scaled = scaler.fit_transform(X_normal)
X_anomalies_scaled = scaler.transform(X_anomalies)

# Train autoencoder on normal data only
X_train_tensor = torch.FloatTensor(X_normal_scaled)
model = AutoEncoder(input_dim=10, latent_dim=3)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop (full-batch: one gradient step per epoch)
epochs = 50
losses = []
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    recon = model(X_train_tensor)
    loss = criterion(recon, X_train_tensor)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

plt.figure(figsize=(8, 4))
plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.title('Training Loss')
plt.grid(True, alpha=0.3)
plt.show()

print(f"✅ Training complete. Final loss: {losses[-1]:.4f}")

In [None]:
# Compute reconstruction errors
model.eval()
with torch.no_grad():
    X_test = np.vstack([X_normal_scaled, X_anomalies_scaled])
    X_test_tensor = torch.FloatTensor(X_test)
    recon_test = model(X_test_tensor).numpy()
    errors = np.mean((X_test - recon_test)**2, axis=1)

y_true = np.array([1]*len(X_normal_scaled) + [-1]*len(X_anomalies_scaled))

# Set threshold (95th percentile of normal errors)
threshold = np.percentile(errors[:len(X_normal_scaled)], 95)
y_pred = np.where(errors > threshold, -1, 1)

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

ax = axes[0]
ax.hist(errors[:len(X_normal_scaled)], bins=30, alpha=0.6, label='Normal', color='blue')
ax.hist(errors[len(X_normal_scaled):], bins=30, alpha=0.6, label='Anomalies', color='red')
ax.axvline(threshold, color='green', linestyle='--', linewidth=2, label=f'Threshold={threshold:.3f}')
ax.set_xlabel('Reconstruction Error')
ax.set_ylabel('Frequency')
ax.set_title('AutoEncoder: Reconstruction Error Distribution')
ax.legend()
ax.grid(True, alpha=0.3)

ax = axes[1]
y_true_binary = (y_true == -1).astype(int)
fpr, tpr, _ = roc_curve(y_true_binary, errors)
auc = roc_auc_score(y_true_binary, errors)
ax.plot(fpr, tpr, linewidth=2, label=f'AUC={auc:.3f}')
ax.plot([0, 1], [0, 1], 'k--', linewidth=1)
ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_title('ROC Curve')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Performance:")
# classification_report sorts labels ascending (-1, then 1), so 'Anomaly' must come first
print(classification_report(y_true, y_pred, target_names=['Anomaly', 'Normal']))
print(f"\nAUC: {auc:.3f}")

## 🏭 Semiconductor Application

### 📝 High-Dimensional Parametric Test Anomaly Detection

In [None]:
# Generate realistic semiconductor test data (20 parameters)
np.random.seed(42)
n_params = 20
n_normal = 2000
n_anomaly = 100

# Normal devices: independent parameters (diagonal covariance, no cross-correlation)
mean_normal = np.random.uniform(1.5, 2.5, n_params)
cov_normal = np.eye(n_params) * 0.05
X_psv_normal = np.random.multivariate_normal(mean_normal, cov_normal, n_normal)

# Anomalous devices: parameter drift
X_psv_anomaly = X_psv_normal[:n_anomaly].copy()
X_psv_anomaly[:, :5] += np.random.uniform(0.5, 1.5, (n_anomaly, 5))  # Drift in first 5 params

# Scale
scaler_psv = StandardScaler()
X_psv_normal_scaled = scaler_psv.fit_transform(X_psv_normal)
X_psv_anomaly_scaled = scaler_psv.transform(X_psv_anomaly)

# Train autoencoder
X_train_psv = torch.FloatTensor(X_psv_normal_scaled)
model_psv = AutoEncoder(input_dim=n_params, latent_dim=5)
criterion = nn.MSELoss()
optimizer = optim.Adam(model_psv.parameters(), lr=0.001)

for epoch in range(100):
    model_psv.train()
    optimizer.zero_grad()
    recon = model_psv(X_train_psv)
    loss = criterion(recon, X_train_psv)
    loss.backward()
    optimizer.step()

# Test
model_psv.eval()
with torch.no_grad():
    X_test_psv = np.vstack([X_psv_normal_scaled[:500], X_psv_anomaly_scaled])
    X_test_psv_tensor = torch.FloatTensor(X_test_psv)
    recon_psv = model_psv(X_test_psv_tensor).numpy()
    errors_psv = np.mean((X_test_psv - recon_psv)**2, axis=1)

y_true_psv = np.array([1]*500 + [-1]*n_anomaly)
threshold_psv = np.percentile(errors_psv[:500], 95)
y_pred_psv = np.where(errors_psv > threshold_psv, -1, 1)

# Visualize
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.hist(errors_psv[:500], bins=30, alpha=0.6, label='Normal Devices', color='blue')
plt.hist(errors_psv[500:], bins=30, alpha=0.6, label='Anomalous Devices', color='red')
plt.axvline(threshold_psv, color='green', linestyle='--', linewidth=2, label=f'Threshold={threshold_psv:.3f}')
plt.xlabel('Reconstruction Error')
plt.ylabel('Number of Devices')
plt.title('Parametric Test Anomaly Detection (20 Parameters)')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
# Feature importance (average reconstruction error per feature)
feature_errors = np.mean((X_test_psv[500:] - recon_psv[500:])**2, axis=0)
plt.bar(range(n_params), feature_errors, color='coral')
plt.xlabel('Parameter Index')
plt.ylabel('Avg Reconstruction Error')
plt.title('Parameter-wise Anomaly Contribution')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n⚠️ Anomalous Devices Detected:")
print(f"   Total: {(y_pred_psv == -1).sum()} / {len(X_test_psv)}")
print("\n📊 Performance:")
# classification_report sorts labels ascending (-1, then 1), so 'Anomaly' must come first
print(classification_report(y_true_psv, y_pred_psv, target_names=['Anomaly', 'Normal']))

## 🎯 Project Ideas

### Post-Silicon Projects

1. **Wafer Map Anomaly Detector** 💰 $8M+ Yield Improvement
   - Train on normal wafer spatial patterns, detect systematic defects
   - Features: 2D die coordinates + parametric test values
   - Business: Early fab process issue detection

2. **Multi-Site Test Correlation Monitor** 💰 $12M+ Quality
   - 50+ parametric tests, detect novel failure modes
   - AutoEncoder learns normal parameter correlations
   - Business: Improve test coverage, reduce escapes

3. **Time-Series Waveform Anomalies** 💰 $5M+ Debug Time
   - LSTM AutoEncoder on test waveforms
   - Detect subtle signal integrity issues
   - Business: Faster failure analysis

4. **Cross-Product Defect Discovery** 💰 $15M+ Portfolio
   - Train per-product autoencoders
   - Transfer learning for new products
   - Business: Accelerate new product ramp

### General Projects

5. **Network Intrusion Detection** 💰 $30M+ Security
6. **Medical Image Anomalies** 💰 $100M+ Healthcare
7. **Industrial Sensor Monitoring** 💰 $20M+ Downtime
8. **Financial Transaction Fraud** 💰 $150M+ Fraud Prevention

## 🔍 Key Takeaways

### ✅ When to Use AutoEncoders
- **High-dimensional data** (>50 features): Learns compressed representations
- **Complex patterns**: Captures non-linear correlations via deep networks
- **No anomaly labels**: Trains on normal data only, so labeled anomalies are not required
- **Feature learning**: Automatic feature extraction (no manual engineering)

### ❌ Limitations
- **Training time**: Slower to train than classical detectors; a GPU helps for large datasets
- **Hyperparameters**: Architecture, latent dim, learning rate tuning needed
- **Overfitting risk**: May memorize training data (use regularization)
- **Black box**: Less interpretable than tree-based methods

### 🔧 Best Practices
1. **Always standardize** inputs (zero mean, unit variance), fitting the scaler on normal training data
2. **Latent dimension**: Start with d/4 to d/2 (compression ratio 2-4x)
3. **Threshold**: 95th-99th percentile of training errors
4. **Validation**: Tune the threshold on a labeled validation set that contains known anomalies
5. **Regularization**: Dropout, L2 weight decay to prevent overfitting
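
Practices 3 and 4 can be sketched together as follows. This is a minimal illustration under stated assumptions: the reconstruction errors are synthetic draws rather than output from a trained model, and `percentile_threshold` and `tune_threshold` are hypothetical helper names, not part of any library.

```python
import numpy as np


def percentile_threshold(train_errors, q=95.0):
    """Threshold at the q-th percentile of errors on normal training data."""
    return float(np.percentile(train_errors, q))


def tune_threshold(val_errors, val_labels, candidates):
    """Pick the candidate threshold with the best F1 on a labeled validation set.

    val_labels uses 1 for anomaly and 0 for normal.
    """
    best_t, best_f1 = None, -1.0
    for t in candidates:
        pred = val_errors > t
        tp = np.sum(pred & (val_labels == 1))
        fp = np.sum(pred & (val_labels == 0))
        fn = np.sum(~pred & (val_labels == 1))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1


# Hypothetical reconstruction errors, for illustration only.
rng = np.random.default_rng(0)
train_errors = rng.gamma(2.0, 0.05, 1000)                    # normal only
val_errors = np.concatenate([rng.gamma(2.0, 0.05, 200),      # normal
                             rng.gamma(2.0, 0.05, 20) + 1.0])  # anomalies
val_labels = np.concatenate([np.zeros(200, dtype=int), np.ones(20, dtype=int)])

# Candidate thresholds come from training-error percentiles (practice 3);
# the labeled validation set decides between them (practice 4).
candidates = [percentile_threshold(train_errors, q) for q in (90, 95, 99, 99.5)]
threshold, f1 = tune_threshold(val_errors, val_labels, candidates)
print(f"chosen threshold: {threshold:.4f} (validation F1 = {f1:.3f})")
```

Restricting the candidates to training-error percentiles keeps the search small and ties each threshold to an expected false-alarm rate on normal data.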

### 📊 Comparison

| Method | Speed | High-D | Interpretability | Best For |
|--------|-------|--------|------------------|----------|
| **AutoEncoder** | Slow train, fast inference | ✅ Excellent | ❌ Low | Complex patterns, images |
| **Isolation Forest** | ✅ Fast | ✅ Good | ⚠️ Medium | Large data, speed |
| **One-Class SVM** | ❌ Slow | ⚠️ Medium | ✅ Good | Novelty, small data |

### 🚀 Next Steps
- Variational AutoEncoders (VAE) for probabilistic anomaly scores
- LSTM AutoEncoders for time-series anomalies
- Convolutional AutoEncoders for image/wafer map anomalies

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc, precision_recall_curve, confusion_matrix
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Generate synthetic semiconductor test data with anomalies
np.random.seed(42)

# Normal data (95% of dataset)
n_normal = 1900
n_features = 15
X_normal = np.random.randn(n_normal, n_features)

# Anomalous data (5% of dataset) - shifted distribution
n_anomalies = 100
X_anomalies = np.random.randn(n_anomalies, n_features) + 3.0  # Shifted mean

# Combine and create labels
X = np.vstack([X_normal, X_anomalies])
y = np.concatenate([np.zeros(n_normal), np.ones(n_anomalies)])

# Shuffle
shuffle_idx = np.random.permutation(len(X))
X, y = X[shuffle_idx], y[shuffle_idx]

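# Split: train only on normal data
train_idx = int(0.8 * n_normal)

# Normalize: fit the scaler on the normal training samples only, so that
# anomalies and held-out data do not leak into the scaling statistics
scaler = StandardScaler()
scaler.fit(X[y == 0][:train_idx])
X_scaled = scaler.transform(X)

X_train = X_scaled[y == 0][:train_idx]  # Only normal data
X_test = X_scaled  # Mixed set; note that it also contains the training samples
y_test = y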

print("🎯 Advanced Anomaly Detection with Autoencoders")
print("=" * 80)
print(f"Normal samples: {n_normal} ({n_normal/(n_normal+n_anomalies)*100:.1f}%)")
print(f"Anomalous samples: {n_anomalies} ({n_anomalies/(n_normal+n_anomalies)*100:.1f}%)")
print(f"Training on: {len(X_train)} normal samples only")
print(f"Testing on: {len(X_test)} mixed samples\n")

# Build autoencoder for anomaly detection
input_dim = n_features
encoding_dim = 6

autoencoder = keras.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(encoding_dim, activation='relu', name='bottleneck'),
    layers.Dense(16, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(input_dim, activation='linear')
], name='anomaly_detector')

autoencoder.compile(optimizer='adam', loss='mse')

# Train on normal data only
history = autoencoder.fit(
    X_train, X_train,
    epochs=100,
    batch_size=32,
    validation_split=0.2,
    verbose=0
)

print("✅ Model trained successfully")
print(f"   Final training loss: {history.history['loss'][-1]:.6f}")
print(f"   Final validation loss: {history.history['val_loss'][-1]:.6f}")

# Compute reconstruction errors
X_test_pred = autoencoder.predict(X_test, verbose=0)
reconstruction_errors = np.mean((X_test - X_test_pred) ** 2, axis=1)

print(f"\n📊 Reconstruction Error Statistics:")
normal_errors = reconstruction_errors[y_test == 0]
anomaly_errors = reconstruction_errors[y_test == 1]

print(f"   Normal - Mean: {np.mean(normal_errors):.6f}, Std: {np.std(normal_errors):.6f}")
print(f"   Anomaly - Mean: {np.mean(anomaly_errors):.6f}, Std: {np.std(anomaly_errors):.6f}")
print(f"   Separation: {np.mean(anomaly_errors) / np.mean(normal_errors):.2f}x")

# Threshold tuning using percentile method
percentiles = [90, 95, 99, 99.5, 99.9]
print(f"\n🎚️ Threshold Tuning (using normal data percentiles):")
print(f"{'Percentile':<12} {'Threshold':<12} {'Precision':<12} {'Recall':<12} {'F1':<12}")
print("-" * 60)

best_f1 = 0
best_threshold = 0

for p in percentiles:
    threshold = np.percentile(normal_errors, p)
    predictions = (reconstruction_errors > threshold).astype(int)
    
    tp = np.sum((predictions == 1) & (y_test == 1))
    fp = np.sum((predictions == 1) & (y_test == 0))
    fn = np.sum((predictions == 0) & (y_test == 1))
    
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
    
    print(f"{p:<12.1f} {threshold:<12.6f} {precision:<12.3f} {recall:<12.3f} {f1:<12.3f}")
    
    if f1 > best_f1:
        best_f1 = f1
        best_threshold = threshold

print(f"\n✅ Best threshold: {best_threshold:.6f} (F1: {best_f1:.3f})")

# ROC curve analysis
fpr, tpr, thresholds_roc = roc_curve(y_test, reconstruction_errors)
roc_auc = auc(fpr, tpr)

print(f"\n📈 ROC Analysis:")
print(f"   AUC: {roc_auc:.4f}")
print(f"   Performance: {'Excellent' if roc_auc > 0.95 else 'Good' if roc_auc > 0.85 else 'Fair'}")

# Precision-Recall curve
precision, recall, thresholds_pr = precision_recall_curve(y_test, reconstruction_errors)
pr_auc = auc(recall, precision)

print(f"\n📊 Precision-Recall Analysis:")
print(f"   AUC: {pr_auc:.4f}")

# Final predictions with best threshold
final_predictions = (reconstruction_errors > best_threshold).astype(int)
cm = confusion_matrix(y_test, final_predictions)

print(f"\n📋 Confusion Matrix:")
print(f"   TN: {cm[0,0]:<6} FP: {cm[0,1]:<6}")
print(f"   FN: {cm[1,0]:<6} TP: {cm[1,1]:<6}")

accuracy = (cm[0,0] + cm[1,1]) / np.sum(cm)
print(f"\n✅ Final Performance:")
print(f"   Accuracy: {accuracy:.3f}")
print(f"   Precision: {cm[1,1]/(cm[1,1]+cm[0,1]):.3f}")
print(f"   Recall: {cm[1,1]/(cm[1,1]+cm[1,0]):.3f}")
print(f"   F1 Score: {best_f1:.3f}")

print(f"\n🏭 Post-Silicon Validation Application:")
print(f"   Detected {cm[1,1]} out of {np.sum(y_test==1)} anomalous wafers")
print(f"   False alarms: {cm[0,1]} out of {np.sum(y_test==0)} normal wafers")
print(f"   Catch rate: {cm[1,1]/np.sum(y_test==1)*100:.1f}%")

In [None]:
from collections import deque
import time

class RealTimeAnomalyDetector:
    """
    Real-time anomaly detection system for streaming semiconductor test data.
    
    Features:
    - Sliding window detection
    - Adaptive threshold updating
    - Concept drift handling
    - Streaming pipeline
    """
    
    def __init__(self, autoencoder, initial_threshold, window_size=100):
        self.autoencoder = autoencoder
        self.threshold = initial_threshold
        self.window = deque(maxlen=window_size)
        self.error_history = deque(maxlen=1000)
        self.drift_detected = False
        
    def update_threshold(self, alpha=0.1):
        """Exponential moving average threshold update"""
        if len(self.error_history) > 50:
            recent_errors = list(self.error_history)[-50:]
            new_threshold = np.percentile(recent_errors, 95)
            self.threshold = alpha * new_threshold + (1 - alpha) * self.threshold
            
    def detect_drift(self):
        """Detect concept drift using error distribution changes"""
        if len(self.error_history) < 200:
            return False
        
        recent = list(self.error_history)[-100:]
        historical = list(self.error_history)[-200:-100]
        
        # Compare distributions using mean and std
        recent_mean = np.mean(recent)
        hist_mean = np.mean(historical)
        hist_std = np.std(historical)
        
        # Drift if recent mean shifts significantly
        drift = abs(recent_mean - hist_mean) > 2 * hist_std
        
        if drift:
            self.drift_detected = True
            print(f"⚠️ Concept drift detected! Recent mean: {recent_mean:.6f}, Historical: {hist_mean:.6f}")
            
        return drift
        
    def process_sample(self, sample):
        """Process a single incoming sample"""
        # Normalize (relies on the module-level `scaler` fitted in the previous cell)
        sample_scaled = scaler.transform(sample.reshape(1, -1))
        
        # Predict and compute error
        reconstruction = self.autoencoder.predict(sample_scaled, verbose=0)
        error = np.mean((sample_scaled - reconstruction) ** 2)
        
        # Update history
        self.error_history.append(error)
        self.window.append(error)
        
        # Detect anomaly
        is_anomaly = error > self.threshold
        
        # Periodically update threshold and check drift
        if len(self.error_history) % 50 == 0:
            self.update_threshold()
            self.detect_drift()
            
        return {
            'is_anomaly': is_anomaly,
            'error': error,
            'threshold': self.threshold,
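            # Heuristic confidence: 0 at the threshold, approaching 1 far from it
            # (the previous ratio was always >= 1, so min(1.0, ...) always returned 1.0)
            'confidence': 1.0 - min(error, self.threshold) / max(error, self.threshold, 1e-12)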
        }

# Initialize real-time detector
rt_detector = RealTimeAnomalyDetector(autoencoder, best_threshold)

print("🚀 Real-Time Anomaly Detection System")
print("=" * 80)

# Simulate streaming data
print("\n📊 Processing streaming test data...")
print(f"{'Sample #':<10} {'Error':<15} {'Threshold':<15} {'Status':<10} {'Confidence':<12}")
print("-" * 70)

stream_results = []
anomaly_count = 0

# Simulate 50 samples
for i in range(50):
    # Randomly choose normal or anomaly
    if np.random.random() < 0.9:
        sample = np.random.randn(n_features)  # Normal
        true_label = 0
    else:
        sample = np.random.randn(n_features) + 3.0  # Anomaly
        true_label = 1
    
    result = rt_detector.process_sample(sample)
    stream_results.append((result['is_anomaly'], true_label))
    
    if result['is_anomaly']:
        anomaly_count += 1
        
    # Print every 10th sample
    if i % 10 == 9:
        status = "🚨 ANOMALY" if result['is_anomaly'] else "✅ Normal"
        print(f"{i+1:<10} {result['error']:<15.6f} {result['threshold']:<15.6f} {status:<10} {result['confidence']:<12.3f}")

# Performance metrics
stream_preds = [r[0] for r in stream_results]
stream_labels = [r[1] for r in stream_results]
stream_accuracy = np.mean([p == l for p, l in zip(stream_preds, stream_labels)])

print(f"\n✅ Real-Time Detection Performance:")
print(f"   Samples processed: {len(stream_results)}")
print(f"   Anomalies detected: {anomaly_count}")
print(f"   Accuracy: {stream_accuracy:.3f}")
print(f"   Final threshold: {rt_detector.threshold:.6f}")
print(f"   Threshold adjusted: {abs(rt_detector.threshold - best_threshold) > 0.01}")

print(f"\n🏭 Production Deployment Considerations (illustrative targets, not measured here):")
print(f"   Latency: <10ms per sample (single prediction)")
print(f"   Memory: ~{len(rt_detector.error_history) * 8 / 1024:.1f}KB (error history)")
print(f"   Throughput: ~100 samples/sec (single thread)")
print(f"   Adaptation: Threshold updates every 50 samples")
print(f"   Drift detection: Checked every 50 samples")

In [None]:
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from scipy.spatial.distance import mahalanobis

print("🌐 Multivariate Anomaly Detection Comparison")
print("=" * 80)

# Method 1: Autoencoder (already computed)
ae_predictions = (reconstruction_errors > best_threshold).astype(int)
ae_accuracy = np.mean(ae_predictions == y_test)

print("\n1️⃣ Autoencoder Method:")
print(f"   Accuracy: {ae_accuracy:.3f}")
print(f"   Advantages: Learns complex nonlinear patterns, good for high-dim data")
print(f"   Disadvantages: Needs training, hyperparameter tuning")

# Method 2: Mahalanobis Distance
print("\n2️⃣ Mahalanobis Distance:")
X_train_normal = X_scaled[y == 0][:train_idx]
mean = np.mean(X_train_normal, axis=0)
cov = np.cov(X_train_normal, rowvar=False)
cov_inv = np.linalg.pinv(cov)  # Pseudo-inverse for stability

mahal_distances = np.array([
    mahalanobis(x, mean, cov_inv) for x in X_test
])

# Threshold: empirical 99th percentile of the distances for normal samples
mahal_threshold = np.percentile(mahal_distances[y_test == 0], 99)
mahal_predictions = (mahal_distances > mahal_threshold).astype(int)
mahal_accuracy = np.mean(mahal_predictions == y_test)

print(f"   Accuracy: {mahal_accuracy:.3f}")
print(f"   Threshold: {mahal_threshold:.2f}")
print(f"   Advantages: Statistical foundation, interpretable, fast")
print(f"   Disadvantages: Assumes Gaussian, sensitive to outliers in training")

# Method 3: Isolation Forest
print("\n3️⃣ Isolation Forest:")
iso_forest = IsolationForest(
    contamination=0.05,  # Expected proportion of anomalies
    random_state=42,
    n_estimators=100
)
iso_forest.fit(X_train_normal)
iso_predictions = iso_forest.predict(X_test)
iso_predictions = (iso_predictions == -1).astype(int)  # -1 = anomaly
iso_accuracy = np.mean(iso_predictions == y_test)

print(f"   Accuracy: {iso_accuracy:.3f}")
print(f"   Contamination: 5%")
print(f"   Advantages: No assumptions, handles outliers, fast")
print(f"   Disadvantages: Less interpretable, needs contamination estimate")

# Method 4: Local Outlier Factor
print("\n4️⃣ Local Outlier Factor (LOF):")
lof = LocalOutlierFactor(
    n_neighbors=20,
    contamination=0.05,
    novelty=True  # For use on test data
)
lof.fit(X_train_normal)
lof_predictions = lof.predict(X_test)
lof_predictions = (lof_predictions == -1).astype(int)
lof_accuracy = np.mean(lof_predictions == y_test)

print(f"   Accuracy: {lof_accuracy:.3f}")
print(f"   Neighbors: 20")
print(f"   Advantages: Finds local density anomalies, no global assumptions")
print(f"   Disadvantages: Computationally expensive, needs n_neighbors tuning")

# Comparison summary
print("\n📊 Method Comparison Summary:")
print(f"{'Method':<25} {'Accuracy':<12} {'Speed':<15} {'Best For':<30}")
print("-" * 85)
print(f"{'Autoencoder':<25} {ae_accuracy:<12.3f} {'Medium':<15} {'Complex patterns, high-dim':<30}")
print(f"{'Mahalanobis Distance':<25} {mahal_accuracy:<12.3f} {'Fast':<15} {'Gaussian data, real-time':<30}")
print(f"{'Isolation Forest':<25} {iso_accuracy:<12.3f} {'Fast':<15} {'Mixed distributions':<30}")
print(f"{'Local Outlier Factor':<25} {lof_accuracy:<12.3f} {'Slow':<15} {'Local density anomalies':<30}")

# Ensemble approach
print("\n🔀 Ensemble Approach (Voting):")
ensemble_predictions = (
    ae_predictions + 
    mahal_predictions + 
    iso_predictions + 
    lof_predictions
) >= 2  # At least 2 methods agree

ensemble_accuracy = np.mean(ensemble_predictions == y_test)
print(f"   Accuracy: {ensemble_accuracy:.3f}")
print(f"   Strategy: Voting (flag when ≥2 of 4 methods agree)")
print(f"   Advantage: Less sensitive to the failure modes of any single method")

# Feature correlation analysis
print("\n🔗 Feature Correlation Impact:")
corr_matrix = np.corrcoef(X_train_normal.T)
avg_corr = np.mean(np.abs(corr_matrix[np.triu_indices_from(corr_matrix, k=1)]))
print(f"   Average feature correlation: {avg_corr:.3f}")
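# Count each pair once via the strict upper triangle (excludes the diagonal of 1s)
n_above_avg = np.sum(np.abs(corr_matrix[np.triu_indices_from(corr_matrix, k=1)]) > avg_corr)
print(f"   Above-average correlation (>{avg_corr:.2f}): {n_above_avg} pairs")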
print(f"   Impact: {'High' if avg_corr > 0.5 else 'Moderate' if avg_corr > 0.3 else 'Low'} - "
      f"{'Autoencoders excel' if avg_corr > 0.5 else 'All methods viable'}")

print(f"\n🏭 Post-Silicon Application Guidance:")
print(f"   ✅ Use Autoencoder when: High-dimensional parametric test data (>20 params)")
print(f"   ✅ Use Mahalanobis when: Real-time detection needed, data ~Gaussian")
print(f"   ✅ Use Isolation Forest when: Unknown anomaly patterns, mixed distributions")
print(f"   ✅ Use LOF when: Detecting wafer map spatial clusters")
print(f"   ✅ Use Ensemble when: Critical decisions (false positives costly)")

In [None]:
from tensorflow.keras.models import Model

print("⏱️ Time-Series Anomaly Detection with LSTM Autoencoder")
print("=" * 80)

# Generate synthetic time-series data
np.random.seed(42)
n_timesteps = 1000
n_features_ts = 5

# Create normal pattern (sine wave with noise)
t = np.linspace(0, 100, n_timesteps)
normal_pattern = np.column_stack([
    np.sin(t / 5 + i) + np.random.randn(n_timesteps) * 0.1
    for i in range(n_features_ts)
])

# Inject anomalies (sudden spikes)
anomaly_indices = [200, 450, 750]
for idx in anomaly_indices:
    normal_pattern[idx:idx+10] += 5.0  # Spike anomaly

# Normalize
ts_scaler = StandardScaler()
ts_data = ts_scaler.fit_transform(normal_pattern)

# Create sequences (sliding window)
sequence_length = 20
X_sequences = []
y_labels = []

for i in range(len(ts_data) - sequence_length):
    X_sequences.append(ts_data[i:i+sequence_length])
    # Label as anomaly if the window [i, i + sequence_length) overlaps any spike [idx, idx + 10)
    is_anomaly = any(i < idx + 10 and i + sequence_length > idx for idx in anomaly_indices)
    y_labels.append(1 if is_anomaly else 0)

X_sequences = np.array(X_sequences)
y_labels = np.array(y_labels)

print(f"📊 Time-Series Dataset:")
print(f"   Total timesteps: {n_timesteps}")
print(f"   Features per timestep: {n_features_ts}")
print(f"   Sequence length: {sequence_length}")
print(f"   Number of sequences: {len(X_sequences)}")
print(f"   Anomalous sequences: {np.sum(y_labels)} ({np.sum(y_labels)/len(y_labels)*100:.1f}%)")

# Split data
train_size = int(0.7 * len(X_sequences))
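# Train on normal sequences only, so the model does not learn to reconstruct the spikes
X_train_ts = X_sequences[:train_size][y_labels[:train_size] == 0]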
X_test_ts = X_sequences[train_size:]
y_test_ts = y_labels[train_size:]

# Build LSTM Autoencoder
latent_dim = 10

# Encoder
encoder_inputs = layers.Input(shape=(sequence_length, n_features_ts))
x = layers.LSTM(64, activation='relu', return_sequences=True)(encoder_inputs)
x = layers.LSTM(32, activation='relu', return_sequences=False)(x)
latent = layers.Dense(latent_dim, activation='relu', name='latent')(x)

# Decoder
x = layers.RepeatVector(sequence_length)(latent)
x = layers.LSTM(32, activation='relu', return_sequences=True)(x)
x = layers.LSTM(64, activation='relu', return_sequences=True)(x)
decoder_outputs = layers.TimeDistributed(layers.Dense(n_features_ts))(x)

# Full autoencoder
lstm_autoencoder = Model(encoder_inputs, decoder_outputs, name='lstm_autoencoder')
lstm_autoencoder.compile(optimizer='adam', loss='mse')

print(f"\n🏗️ LSTM Autoencoder Architecture:")
print(f"   Encoder: Input({sequence_length}, {n_features_ts}) → LSTM(64) → LSTM(32) → Dense({latent_dim})")
print(f"   Decoder: RepeatVector({sequence_length}) → LSTM(32) → LSTM(64) → TimeDistributed(Dense({n_features_ts}))")
print(f"   Parameters: {lstm_autoencoder.count_params():,}")

# Train
history_ts = lstm_autoencoder.fit(
    X_train_ts, X_train_ts,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    verbose=0
)

print(f"\n✅ Training complete:")
print(f"   Final loss: {history_ts.history['loss'][-1]:.6f}")
print(f"   Final val_loss: {history_ts.history['val_loss'][-1]:.6f}")

# Predict and compute sequence-level reconstruction errors
X_test_pred_ts = lstm_autoencoder.predict(X_test_ts, verbose=0)
sequence_errors = np.mean((X_test_ts - X_test_pred_ts) ** 2, axis=(1, 2))

# Threshold determination
normal_seq_errors = sequence_errors[y_test_ts == 0]
ts_threshold = np.percentile(normal_seq_errors, 95)

# Predictions
ts_predictions = (sequence_errors > ts_threshold).astype(int)
ts_accuracy = np.mean(ts_predictions == y_test_ts)

print(f"\n📊 Time-Series Detection Performance:")
print(f"   Threshold: {ts_threshold:.6f}")
print(f"   Accuracy: {ts_accuracy:.3f}")

# Temporal pattern analysis
print(f"\n🔍 Temporal Pattern Analysis:")
print(f"   Normal sequence error - Mean: {np.mean(normal_seq_errors):.6f}, Std: {np.std(normal_seq_errors):.6f}")
print(f"   Anomaly sequence error - Mean: {np.mean(sequence_errors[y_test_ts==1]):.6f}, Std: {np.std(sequence_errors[y_test_ts==1]):.6f}")
print(f"   Separation ratio: {np.mean(sequence_errors[y_test_ts==1]) / np.mean(normal_seq_errors):.2f}x")

# Change point detection
print(f"\n📍 Change Point Detection:")
window_size = 50
change_points = []

for i in range(len(sequence_errors) - window_size):
    window = sequence_errors[i:i+window_size]
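    # Flag windows whose mean error exceeds the normal baseline by 3 standard deviations
    if np.mean(window) > np.mean(normal_seq_errors) + 3 * np.std(normal_seq_errors):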
        change_points.append(i + train_size)
        
print(f"   Detected {len(change_points)} change points")
if change_points:
    print(f"   First change point at sequence {change_points[0]} (timestep ~{change_points[0] + sequence_length})")

print(f"\n🏭 Post-Silicon Time-Series Applications:")
print(f"   ✅ Equipment drift monitoring: Detect gradual parameter shifts")
print(f"   ✅ Test station anomalies: Identify sudden calibration issues")
print(f"   ✅ Yield trend analysis: Flag unexpected yield drops")
print(f"   ✅ Thermal cycling tests: Detect abnormal temperature patterns")
print(f"   ✅ Burn-in failures: Early prediction from power consumption trends")

print(f"\n💡 Key Insights:")
print(f"   • LSTM captures temporal dependencies (sequence context)")
print(f"   • Sequence-level errors smoother than point-wise")
print(f"   • Good for: Gradual drifts, periodic patterns, multi-step anomalies")
print(f"   • Latency: ~{sequence_length} timesteps (need full sequence)")
print(f"   • Trade-off: Longer sequences = better context, but higher latency")

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
plt.style.use('default')
sns.set_palette("husl")

fig = plt.figure(figsize=(16, 12))
gs = fig.add_gridspec(3, 2, hspace=0.3, wspace=0.3)

# Plot 1: Reconstruction Error Distribution with Threshold
ax1 = fig.add_subplot(gs[0, 0])
bins = np.linspace(0, max(reconstruction_errors), 50)
ax1.hist(reconstruction_errors[y_test==0], bins=bins, alpha=0.6, label='Normal', color='#2ecc71', edgecolor='black')
ax1.hist(reconstruction_errors[y_test==1], bins=bins, alpha=0.6, label='Anomaly', color='#e74c3c', edgecolor='black')
ax1.axvline(best_threshold, color='#f39c12', linestyle='--', linewidth=2, label=f'Threshold = {best_threshold:.4f}')
ax1.set_xlabel('Reconstruction Error', fontsize=11, fontweight='bold')
ax1.set_ylabel('Frequency', fontsize=11, fontweight='bold')
ax1.set_title('Reconstruction Error Distribution', fontsize=13, fontweight='bold', pad=15)
ax1.legend(fontsize=10)
ax1.grid(alpha=0.3, linestyle='--')

# Add statistics text
normal_median = np.median(reconstruction_errors[y_test==0])
anomaly_median = np.median(reconstruction_errors[y_test==1])
ax1.text(0.98, 0.97, f'Normal median: {normal_median:.4f}\nAnomaly median: {anomaly_median:.4f}\nSeparation: {anomaly_median/normal_median:.1f}x',
         transform=ax1.transAxes, ha='right', va='top', fontsize=9,
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

# Plot 2: ROC and Precision-Recall Curves
ax2 = fig.add_subplot(gs[0, 1])
ax2.plot(fpr, tpr, color='#3498db', linewidth=2.5, label=f'ROC (AUC = {roc_auc:.3f})')
ax2.plot([0, 1], [0, 1], 'k--', linewidth=1.5, alpha=0.5)
ax2_twin = ax2.twinx()
ax2_twin.plot(recall, precision, color='#9b59b6', linewidth=2.5, linestyle='--', label=f'PR (AUC = {pr_auc:.3f})')
ax2.set_xlabel('False Positive Rate / Recall', fontsize=11, fontweight='bold')
ax2.set_ylabel('True Positive Rate', fontsize=11, fontweight='bold', color='#3498db')
ax2_twin.set_ylabel('Precision', fontsize=11, fontweight='bold', color='#9b59b6')
ax2.set_title('ROC & Precision-Recall Curves', fontsize=13, fontweight='bold', pad=15)
ax2.legend(loc='lower right', fontsize=10)
ax2_twin.legend(loc='upper right', fontsize=10)
ax2.grid(alpha=0.3, linestyle='--')
ax2.tick_params(axis='y', labelcolor='#3498db')
ax2_twin.tick_params(axis='y', labelcolor='#9b59b6')

# Plot 3: Time-Series Detection Timeline
ax3 = fig.add_subplot(gs[1, :])
timeline = np.arange(len(sequence_errors)) + train_size
ax3.plot(timeline, sequence_errors, color='#34495e', linewidth=1, alpha=0.7, label='Reconstruction Error')
ax3.axhline(ts_threshold, color='#e74c3c', linestyle='--', linewidth=2, label=f'Threshold = {ts_threshold:.4f}')
anomaly_mask = y_test_ts == 1
ax3.scatter(timeline[anomaly_mask], sequence_errors[anomaly_mask], 
           color='#e74c3c', s=50, zorder=5, label='True Anomalies', marker='X')
detected_mask = ts_predictions == 1
ax3.scatter(timeline[detected_mask], sequence_errors[detected_mask],
           facecolors='none', edgecolors='#f39c12', s=100, linewidths=2, 
           zorder=4, label='Detected', marker='o')
ax3.set_xlabel('Sequence Index', fontsize=11, fontweight='bold')
ax3.set_ylabel('Reconstruction Error', fontsize=11, fontweight='bold')
ax3.set_title('Time-Series Anomaly Detection Timeline', fontsize=13, fontweight='bold', pad=15)
ax3.legend(fontsize=10, loc='upper left')
ax3.grid(alpha=0.3, linestyle='--')

# Highlight change points
for cp in change_points[:3]:  # Show first 3
    ax3.axvline(cp, color='#95a5a6', linestyle=':', linewidth=1.5, alpha=0.6)
    ax3.text(cp, ax3.get_ylim()[1]*0.95, 'Change', rotation=90, fontsize=8, ha='right', va='top')

# Plot 4: Method Comparison (Accuracy per Method)
ax4 = fig.add_subplot(gs[2, 0])
methods = ['Autoencoder', 'Mahalanobis', 'Iso Forest', 'LOF']
accuracies = [ae_accuracy, mahal_accuracy, iso_accuracy, lof_accuracy]
colors_bar = ['#3498db', '#2ecc71', '#f39c12', '#9b59b6']
bars = ax4.barh(methods, accuracies, color=colors_bar, edgecolor='black', linewidth=1.5)
ax4.set_xlabel('Accuracy', fontsize=11, fontweight='bold')
ax4.set_title('Method Comparison', fontsize=13, fontweight='bold', pad=15)
ax4.set_xlim(0, 1)
ax4.grid(axis='x', alpha=0.3, linestyle='--')

# Add value labels
for i, (bar, acc) in enumerate(zip(bars, accuracies)):
    ax4.text(acc + 0.02, i, f'{acc:.3f}', va='center', fontsize=10, fontweight='bold')

# Add best indicator
best_idx = np.argmax(accuracies)
bars[best_idx].set_edgecolor('#e74c3c')
bars[best_idx].set_linewidth(3)

# Plot 5: Ensemble Confusion Matrix
ax5 = fig.add_subplot(gs[2, 1])
cm_ensemble = confusion_matrix(y_test, ensemble_predictions)
sns.heatmap(cm_ensemble, annot=True, fmt='d', cmap='Blues', cbar=True, 
            xticklabels=['Normal', 'Anomaly'], yticklabels=['Normal', 'Anomaly'],
            ax=ax5, linewidths=2, linecolor='black', annot_kws={'fontsize': 14, 'fontweight': 'bold'})
ax5.set_xlabel('Predicted', fontsize=11, fontweight='bold')
ax5.set_ylabel('Actual', fontsize=11, fontweight='bold')
ax5.set_title(f'Ensemble Confusion Matrix (Acc: {ensemble_accuracy:.3f})', fontsize=13, fontweight='bold', pad=15)

# Add performance metrics text
tn, fp, fn, tp = cm_ensemble.ravel()
precision_ens = tp / (tp + fp)
recall_ens = tp / (tp + fn)
f1_ens = 2 * precision_ens * recall_ens / (precision_ens + recall_ens)

metrics_text = f'Precision: {precision_ens:.3f}\nRecall: {recall_ens:.3f}\nF1: {f1_ens:.3f}\nFPR: {fp/(fp+tn):.3f}'
ax5.text(1.35, 0.5, metrics_text, transform=ax5.transAxes, fontsize=10,
        bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.8),
        verticalalignment='center')

plt.suptitle('🔍 Autoencoder Anomaly Detection - Comprehensive Analysis', 
            fontsize=16, fontweight='bold', y=0.995)

plt.savefig('autoencoder_anomaly_detection_analysis.png', dpi=150, bbox_inches='tight')
plt.show()

print("✅ Visualization saved as 'autoencoder_anomaly_detection_analysis.png'")
print("\n📊 Analysis Summary:")
print(f"   Best individual method: {methods[best_idx]} ({accuracies[best_idx]:.3f})")
print(f"   Ensemble accuracy: {ensemble_accuracy:.3f}")
print(f"   Improvement: {(ensemble_accuracy - accuracies[best_idx])*100:+.1f} percentage points")
print(f"   False positive rate: {fp/(fp+tn):.3f} (critical for production)")
print(f"   Time-series accuracy: {ts_accuracy:.3f}")
print(f"   ROC AUC: {roc_auc:.3f} - {'Excellent' if roc_auc > 0.95 else 'Good' if roc_auc > 0.85 else 'Fair'}")

## 🚀 Real-World Projects

### Project 1: Equipment Drift Detection System 🏭
**Objective:** Real-time monitoring of test equipment drift to prevent false failures  
**Business Value:** $5M annual savings from reduced unnecessary equipment maintenance and false rejects

**Architecture:**
```
STDF Stream → Feature Extraction → LSTM Autoencoder → Drift Score → Alert Dashboard
                      ↓
              Historical Baseline
```

**Key Features:**
- Multi-parameter monitoring (Vdd, Idd, frequency, temperature)
- Adaptive threshold with seasonal adjustment
- Equipment-specific baseline models
- Automated calibration recommendations
- ROI: Detect drift 3-5 days before failures, 85% reduction in unplanned downtime

**Implementation Tips:**
- Sequence length: 100-200 test cycles (~1-2 hours)
- Update baseline weekly with concept drift detection
- Use ensemble (Autoencoder + Mahalanobis) for robustness
- Alert severity levels: Warning (95th%), Critical (99th%)
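
The two alert tiers in the last tip can be derived from reconstruction errors measured on known-good data. The following is a minimal sketch, assuming those errors are available as a NumPy array. The helper names `severity_thresholds` and `classify_severity` are hypothetical, and the gamma-distributed errors are a synthetic stand-in, not real equipment data.

```python
import numpy as np

def severity_thresholds(baseline_errors, warning_pct=95.0, critical_pct=99.0):
    """Return (warning, critical) thresholds from baseline reconstruction errors."""
    warning, critical = np.percentile(baseline_errors, [warning_pct, critical_pct])
    return float(warning), float(critical)

def classify_severity(errors, warning, critical):
    """Map each error to 0 (ok), 1 (warning), or 2 (critical)."""
    errors = np.asarray(errors, dtype=float)
    return (errors > warning).astype(int) + (errors > critical).astype(int)

# Synthetic stand-in for reconstruction errors on healthy equipment (assumption)
rng = np.random.default_rng(0)
baseline_errors = rng.gamma(shape=2.0, scale=0.05, size=5000)

warning, critical = severity_thresholds(baseline_errors)
levels = classify_severity([0.0, warning + 1e-6, critical + 1.0], warning, critical)
print(levels)
```

Deriving both tiers from the same baseline keeps them ordered by construction, so a Critical alert always implies a Warning.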

---

### Project 2: Wafer-Level Outlier Detection 🔬
**Objective:** Identify anomalous wafers before costly packaging  
**Business Value:** 99.2% accuracy, prevent $2M/year in packaging costs for defective wafers

**Architecture:**
```
Parametric Test Data → Spatial Feature Engineering → Autoencoder → Outlier Score → Bin Decision
         ↓                        ↓
   Die-level stats          Wafer map patterns
```

**Key Features:**
- 50+ parametric test features per die
- Spatial correlation features (neighbor statistics)
- Multi-level detection (die, wafer, lot)
- Integration with MES system for auto-binning
- ROI: Catch 98% of problem wafers pre-packaging

**Implementation Tips:**
- Train separate models per product family
- Feature engineering: mean, std, spatial gradients, edge effects
- Ensemble with Isolation Forest for robustness
- Retrain monthly with production data feedback
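
The spatial correlation features (neighbor statistics) listed above can be computed directly on a wafer map stored as a 2D array. This is a minimal sketch, assuming one parametric value per die and NaN for positions outside the wafer. The helper name `neighbor_mean` and the toy 3x3 map are illustrative assumptions.

```python
import numpy as np

def neighbor_mean(wafer_map):
    """Mean of the up-to-8 surrounding dies for every position (NaNs ignored)."""
    wafer_map = np.asarray(wafer_map, dtype=float)
    rows, cols = wafer_map.shape
    # Pad with NaN so edge dies simply have fewer valid neighbors
    padded = np.pad(wafer_map, 1, constant_values=np.nan)
    shifted = [
        padded[1 + dy : 1 + dy + rows, 1 + dx : 1 + dx + cols]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    ]
    return np.nanmean(np.stack(shifted), axis=0)

wafer_map = np.arange(9, dtype=float).reshape(3, 3)  # toy 3x3 "wafer"
nbr_mean = neighbor_mean(wafer_map)
residual = wafer_map - nbr_mean  # large |residual| means a die differs from its neighbors
print(nbr_mean)
```

The residual between a die and its neighborhood mean is one candidate input feature for the autoencoder. Standard deviations or gradients over the same neighborhood can be built the same way.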

---

### Project 3: Sensor Fault Detection in Test Stations 📡
**Objective:** Detect faulty sensors causing test accuracy degradation  
**Business Value:** 30% reduction in false test escapes, improved product quality

**Architecture:**
```
Multi-Sensor Stream → Correlation Analysis → VAE → Sensor Health Score → Maintenance Queue
                              ↓
                    Cross-sensor validation
```

**Key Features:**
- Monitors 20+ sensors per test station (voltage, current, temp, pressure)
- Cross-correlation anomaly detection
- Predictive maintenance scheduling
- Sensor-specific degradation curves
- ROI: Proactive replacement before critical failures

**Implementation Tips:**
- Use VAE for probabilistic outlier scoring
- Compare sensor readings across multiple test stations
- Time-series analysis for gradual drift
- Alert when 2+ sensors show correlated anomalies
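
The "2+ sensors" rule in the last tip can be sketched as a simple count over per-sensor flags. This example assumes that "correlated" means "anomalous at the same time step" and that each sensor already has its own anomaly score and threshold. The function name `correlated_alerts` and the example values are hypothetical.

```python
import numpy as np

def correlated_alerts(sensor_scores, thresholds, min_sensors=2):
    """Flag time steps where at least `min_sensors` sensors exceed their threshold."""
    # sensor_scores has shape (time, sensors); thresholds has shape (sensors,)
    flags = np.asarray(sensor_scores) > np.asarray(thresholds)
    return flags.sum(axis=1) >= min_sensors

sensor_scores = np.array([
    [0.1, 0.2, 0.1],  # all sensors healthy
    [0.9, 0.2, 0.1],  # one sensor high: no alert
    [0.9, 0.8, 0.1],  # two sensors high: alert
])
thresholds = np.array([0.5, 0.5, 0.5])
alerts = correlated_alerts(sensor_scores, thresholds)
print(alerts)
```

Requiring agreement between sensors trades some sensitivity for fewer alerts caused by a single noisy channel.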

---

### Project 4: Production Line Anomaly Alerting System 🚨
**Objective:** Real-time detection of production line issues with automated escalation  
**Business Value:** 40% faster incident response, $3M annual yield improvement

**Architecture:**
```
Edge Devices → Streaming Pipeline (Kafka) → Real-Time AE → Anomaly DB → Dashboard + Alerts
                                                   ↓
                                          Context-aware rules
```

**Key Features:**
- Sub-second detection latency (<500ms)
- Context-aware alerting (shift, product, line)
- Automated escalation workflow
- Root cause analysis suggestions
- ROI: Reduce yield loss by catching issues within 15 minutes

**Implementation Tips:**
- Deploy model on edge for ultra-low latency
- Use lightweight autoencoder (quantized, pruned)
- Batch inference for throughput (1000 samples/sec)
- Integrate with existing SCADA/MES systems
- Multi-tier alerting: Email → SMS → Pager for severity levels

## 🎯 Key Takeaways & Best Practices

### 📋 Threshold Selection Decision Matrix

| **Scenario** | **Method** | **Rationale** | **Typical Value** |
|-------------|-----------|--------------|------------------|
| High imbalance (>99% normal) | Percentile (99-99.9%) | Robust to extreme outliers | 99th percentile of normal |
| Real-time, low latency | MAD (Median Absolute Deviation) | Fast, robust | Median + 3×MAD |
| Time-series with drift | Adaptive (EMA) | Tracks distribution changes | α=0.1, recompute every 50 samples |
| Critical applications | Ensemble voting | Reduces false positives | ≥2/4 methods agree |
| Known contamination rate | Contamination-based | Matches expected anomaly % | Set contamination=0.05 for 5% |
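
Three of the rules in the table can be sketched in a few lines of NumPy. The function names (`percentile_threshold`, `mad_threshold`, `majority_vote`) and the toy inputs are assumptions for illustration, not code from earlier cells.

```python
import numpy as np

def percentile_threshold(normal_errors, pct=99.0):
    """Threshold at a high percentile of errors observed on normal data."""
    return float(np.percentile(normal_errors, pct))

def mad_threshold(normal_errors, k=3.0):
    """Median + k * MAD, which is robust to a few extreme values."""
    normal_errors = np.asarray(normal_errors, dtype=float)
    median = np.median(normal_errors)
    mad = np.median(np.abs(normal_errors - median))
    return float(median + k * mad)

def majority_vote(predictions, min_votes=2):
    """predictions: shape (n_methods, n_samples) with 0/1 labels per method."""
    return (np.asarray(predictions).sum(axis=0) >= min_votes).astype(int)

normal_errors = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # toy errors with one extreme value
pct_thr = percentile_threshold(normal_errors)
mad_thr = mad_threshold(normal_errors)
votes = majority_vote([[1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 0, 0]])
print(pct_thr, mad_thr, votes)
```

In the toy data the single extreme value pulls the percentile threshold far upward while the MAD threshold stays near the bulk of the errors, which is the robustness property the table refers to.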

---

### 🏗️ Architecture Design Principles

**1. Compression Ratio Selection:**
- **Rule of thumb:** 5-10x compression for anomaly detection
- **Example:** 50 features → 5-10 latent dimensions
- **Too aggressive (>15x):** Loss of discriminative information
- **Too conservative (<3x):** Model approaches an identity mapping and reconstructs anomalies as well as normal data

**2. Training Strategy:**
- ✅ **DO:** Train only on normal data (clean baseline)
- ✅ **DO:** Use validation set to tune bottleneck size
- ❌ **DON'T:** Include anomalies in training (contaminates baseline)
- ❌ **DON'T:** Overtrain (leads to memorization)

**3. Network Depth:**
- **Simple patterns:** 2-3 hidden layers on each side of the bottleneck (Input → 64 → 32 → 8 → 32 → 64 → Output)
- **Complex patterns:** 4-5 layers with skip connections
- **Time-series:** LSTM autoencoder with 2-3 LSTM layers

---

### ⚙️ Training Best Practices

**Data Preparation:**
```python
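from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.callbacks import EarlyStopping  # Keras callback

# 1. Train/val split (normal data only)
X_train, X_val = train_test_split(X_normal, test_size=0.2, random_state=42)

# 2. Normalize features (critical for autoencoders); fit on the training split only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)  # reuse training statistics to avoid leakage

# 3. Early stopping to prevent overfitting
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)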
```

**Hyperparameters:**
- **Epochs:** 50-100 (with early stopping)
- **Batch size:** 32-128 (larger for stable gradients)
- **Learning rate:** 0.001 (Adam optimizer)
- **Activation:** ReLU for hidden layers, linear for output
- **Loss:** MSE for reconstruction
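
The "with early stopping" qualifier on the epoch count can be implemented without a framework callback, which is convenient for the PyTorch training loops used earlier in this notebook. This is a minimal, framework-agnostic sketch. The class name `EarlyStopper` and the validation losses are hypothetical.

```python
class EarlyStopper:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0  # in a real loop, checkpoint the model weights here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Illustrative validation losses: improve, then plateau
val_losses = [0.90, 0.50, 0.30, 0.31, 0.32, 0.30, 0.33]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_epoch, stopper.best_loss)
```

In a real training loop, `step` would be called once per epoch with the measured validation loss, and the weights saved at `best_epoch` would be restored after stopping.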

**Monitoring:**
- Track validation loss (should plateau, not increase)
- Visualize reconstruction quality on normal samples
- Check latent space distribution (should be compact)

---

### ⚠️ Common Pitfalls & Solutions

**Pitfall 1: Training on contaminated data**
- **Symptom:** Low recall, model learns anomalies as normal
- **Solution:** Carefully clean training data, use domain knowledge to filter outliers
- **Code:**
```python
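import numpy as np

# Remove outliers from training using the IQR rule (applied per feature)
Q1 = np.percentile(X_train, 25, axis=0)
Q3 = np.percentile(X_train, 75, axis=0)
IQR = Q3 - Q1
# Inclusive bounds so constant features (IQR = 0) do not discard every row
mask = np.all((X_train >= Q1 - 1.5*IQR) & (X_train <= Q3 + 1.5*IQR), axis=1)
X_train_clean = X_train[mask]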
```

**Pitfall 2: Fixed threshold in production**
- **Symptom:** Increasing false positives over time (concept drift)
- **Solution:** Implement adaptive thresholding with periodic retraining
- **Code:**
```python
# Exponential moving average threshold
threshold_ema = alpha * new_threshold + (1 - alpha) * threshold_ema
# Retrain trigger: if drift detected or monthly schedule
```
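
The EMA update above can be turned into a runnable loop. This sketch assumes the values from the threshold table (α = 0.1, recompute every 50 samples) and uses a percentile of each window as the new threshold. The function name `ema_thresholds` and the step-change data are illustrative assumptions.

```python
import numpy as np

def ema_thresholds(errors, window=50, alpha=0.1, pct=99.0):
    """Recompute a percentile threshold per window and smooth it with an EMA."""
    errors = np.asarray(errors, dtype=float)
    threshold_ema = None
    history = []
    for start in range(0, len(errors) - window + 1, window):
        new_threshold = np.percentile(errors[start:start + window], pct)
        if threshold_ema is None:
            threshold_ema = new_threshold  # initialise from the first window
        else:
            threshold_ema = alpha * new_threshold + (1 - alpha) * threshold_ema
        history.append(float(threshold_ema))
    return history

# Toy stream: error level steps from 1.0 to 2.0 after the first window
errors = np.concatenate([np.full(50, 1.0), np.full(50, 2.0)])
history = ema_thresholds(errors)
print(history)
```

A small α makes the threshold follow slow drift while reacting only weakly to a single unusual window. In practice it may be worth excluding samples already flagged as anomalous from each window so the threshold does not chase the anomalies it is meant to catch.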

**Pitfall 3: Ignoring imbalanced evaluation**
- **Symptom:** High accuracy but poor recall (missing anomalies)
- **Solution:** Use F1, ROC-AUC, and PR-AUC instead of accuracy
- **Code:**
```python
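from sklearn.metrics import roc_auc_score, average_precision_score, f1_score

# Comprehensive evaluation
roc_auc = roc_auc_score(y_true, anomaly_scores)           # threshold-free ranking quality
pr_auc = average_precision_score(y_true, anomaly_scores)  # more informative under imbalance
f1 = f1_score(y_true, predictions)                        # depends on the chosen threshold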
```

**Pitfall 4: Overfitting to training data**
- **Symptom:** Low training error, high validation error
- **Solution:** Regularization, dropout, early stopping
- **Code:**
```python
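from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

model = Sequential([
    Dense(64, activation='relu', kernel_regularizer=l2(1e-4)),  # L2 weight penalty
    Dropout(0.2),                                               # drop 20% of activations
    Dense(32, activation='relu', kernel_regularizer=l2(1e-4)),
    # ...
])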
```

**Pitfall 5: Not handling time-series dependencies**
- **Symptom:** Poor performance on temporal anomalies
- **Solution:** Use LSTM/GRU autoencoder instead of feedforward
- **Code:**
```python
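# LSTM autoencoder for time-series (Keras functional API)
from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
inputs = Input(shape=(sequence_length, n_features))  # n_features = channels per time step
encoded = LSTM(64, return_sequences=True)(inputs)
latent = LSTM(32, return_sequences=False)(encoded)
decoded = RepeatVector(sequence_length)(latent)
decoded = LSTM(64, return_sequences=True)(decoded)
outputs = TimeDistributed(Dense(n_features))(decoded)  # project back to the input size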
```

---

### 🏭 Post-Silicon Validation Use Cases

**1. Parametric Test Outlier Detection:**
- **Data:** Vdd, Idd, frequency, power measurements
- **Approach:** Dense autoencoder with 50+ features → 5D latent
- **Threshold:** 99.5th percentile (high yield products)
- **Impact:** Identify marginal devices, improve guardbands

**2. Wafer Map Spatial Anomalies:**
- **Data:** Die-level pass/fail + parametric data
- **Approach:** CNN autoencoder for spatial patterns + dense for parametrics
- **Threshold:** Per-wafer adaptive (accounts for process variation)
- **Impact:** Early lot disposition, yield learning

**3. Test Time Anomalies:**
- **Data:** Test execution times per test
- **Approach:** Time-series LSTM autoencoder
- **Threshold:** 95th percentile (test time less critical)
- **Impact:** Detect equipment slowdowns, optimize test flow

**4. Multi-Site Correlation Anomalies:**
- **Data:** Same DUT tested on multiple test stations
- **Approach:** Variational autoencoder for cross-site patterns
- **Threshold:** Mahalanobis distance on latent space
- **Impact:** Identify rogue test stations, improve test repeatability

**5. Burn-In Failure Prediction:**
- **Data:** Power, temperature, voltage over 48-168 hours
- **Approach:** LSTM autoencoder with 1-hour windows
- **Threshold:** Adaptive (updated every 24 hours)
- **Impact:** Early termination of failing devices, reduce burn-in cost
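
For use case 4, thresholding on Mahalanobis distance in latent space can be sketched as follows. The sketch assumes the encoder outputs for normal data are available as a 2D array. The random vectors stand in for real latent codes, and the function names are hypothetical.

```python
import numpy as np

def fit_mahalanobis(latent_normal):
    """Estimate mean and inverse covariance from latent vectors of normal data."""
    mean = latent_normal.mean(axis=0)
    cov = np.cov(latent_normal, rowvar=False)
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates near-singular covariance
    return mean, inv_cov

def mahalanobis_distance(latent, mean, inv_cov):
    """Distance of each row of `latent` from the normal-data distribution."""
    diff = latent - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))

rng = np.random.default_rng(0)
latent_normal = rng.normal(size=(500, 5))  # stand-in for encoder outputs (assumption)
mean, inv_cov = fit_mahalanobis(latent_normal)

test_points = np.vstack([np.zeros(5), np.full(5, 6.0)])  # typical point vs. far-away point
distances = mahalanobis_distance(test_points, mean, inv_cov)
print(distances)
```

A threshold on these distances could then be chosen with any of the rules from the decision matrix, for example a high percentile of the distances measured on normal data.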

---

### 🚀 Performance Optimization

**Inference Speed:**
```python
import tensorflow as tf
import tf2onnx

# 1. Model quantization (up to ~4x speedup; the actual gain is hardware-dependent)
converter = tf.lite.TFLiteConverter.from_keras_model(autoencoder)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# 2. Batch inference (up to ~10x throughput versus per-sample calls; measure on your setup)
batch_size = 256
predictions = autoencoder.predict(X_test, batch_size=batch_size)

# 3. ONNX export for production (from_keras returns the model proto plus external tensor storage)
onnx_model, _ = tf2onnx.convert.from_keras(autoencoder)
```

**Memory Footprint:**
```python
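# Pruning (remove 80% of weights; verify the accuracy impact on validation data)
import tensorflow_model_optimization as tfmot
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude
# Schedule assumed here: hold 80% sparsity from step 0
pruning_schedule = tfmot.sparsity.keras.ConstantSparsity(target_sparsity=0.8, begin_step=0)
pruned_model = prune_low_magnitude(autoencoder, pruning_schedule=pruning_schedule)
# Fine-tune afterwards with tfmot.sparsity.keras.UpdatePruningStep() in the callbacks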
```

**Monitoring in Production:**
```python
# Track key metrics
metrics = {
    'latency_p50': np.percentile(latencies, 50),
    'latency_p99': np.percentile(latencies, 99),
    'anomaly_rate': np.mean(predictions == 1),
    'false_positive_rate': fp / (fp + tn),
    'threshold_drift': abs(current_threshold - baseline_threshold) / baseline_threshold  # relative change
}
# Alert if: latency_p99 > 100ms, anomaly_rate > 10%, threshold_drift > 20%
```

---

### 💡 When to Use Autoencoders vs Alternatives

**Use Autoencoders when:**
- ✅ High-dimensional data (>20 features)
- ✅ Complex nonlinear patterns
- ✅ Unlabeled normal data available
- ✅ Need to capture feature interactions
- ✅ Deep learning infrastructure available

**Use alternatives when:**
- ❌ **Isolation Forest:** Unknown distributions, fast inference needed
- ❌ **LOF:** Local density anomalies, small datasets (<1000 samples)
- ❌ **One-Class SVM:** Need interpretability, kernel methods suitable
- ❌ **Statistical methods (Z-score, IQR):** Simple univariate cases, real-time constraints
- ❌ **Ensemble:** Critical decisions, need robustness

---

**🔗 Next Steps:**
- Notebook 039: Gaussian Mixture Models for soft clustering
- Notebook 040: DBSCAN for density-based clustering
- Notebook 065: Deep Reinforcement Learning (extends to anomaly detection in control systems)

## 📊 Comprehensive Visualization & Analysis

## ⏱️ Time-Series Anomaly Detection

## 🌐 Multivariate Anomaly Detection Methods

## 🔬 Real-Time Anomaly Detection Pipeline

## 🎯 Part 3: Advanced Anomaly Detection Techniques