# Wafer Defect Classification Tutorial

This tutorial demonstrates how to build and deploy a production-ready wafer defect classification system using classical machine learning approaches.

## Business Context

In semiconductor manufacturing, wafer defect detection is critical for:
- **Quality Control**: Early detection prevents defective dies from reaching customers
- **Cost Reduction**: Identifying process issues before they impact entire lots
- **Process Optimization**: Understanding defect patterns to improve manufacturing

## Learning Objectives

By the end of this tutorial, you will:
1. Understand semiconductor defect classification challenges
2. Build and compare multiple ML models for defect detection
3. Apply manufacturing-specific metrics (PWS, Estimated Loss)
4. Deploy models using a standardized command-line interface (CLI)
5. Optimize model thresholds for precision/recall constraints

## Setup and Imports

In [None]:
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Import our wafer defect pipeline
from wafer_defect_pipeline import (
    WaferDefectPipeline, 
    generate_synthetic_wafer_defects,
    load_dataset
)

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (10, 6)

## 1. Data Generation and Exploration

Let's start by generating synthetic wafer defect data to understand the problem.

### 🎯 Exercise 1.1: Generate Synthetic Data (★ Beginner)

**Task**: Generate 1000 wafer samples with 20% defect rate and explore the dataset.

**Requirements**:
1. Use `generate_synthetic_wafer_defects()` to create the dataset
2. Calculate and print the class distribution (how many defective vs non-defective)
3. Calculate the imbalance ratio (e.g., 4:1 means 4 good wafers per 1 defective)

**Hints**:
- Use `n_samples=1000` and `defect_rate=0.20`
- Use `.value_counts()` on the target column to get class distribution
- Imbalance ratio = count of class 0 / count of class 1
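
Here is a minimal sketch of the hinted calculation, run on a tiny placeholder column rather than the generator's output. `value_counts()` orders classes by frequency, not by label, so look counts up by label (`.get(0)`, `.get(1)`) instead of by position:

```python
import pandas as pd

# Placeholder target column: 8 good wafers (0) and 2 defective wafers (1)
toy = pd.DataFrame({"defect": [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]})

toy_counts = toy["defect"].value_counts()   # Series indexed by class label
toy_good = int(toy_counts.get(0, 0))        # look up by label, not position
toy_defect = int(toy_counts.get(1, 0))
toy_ratio = toy_good / toy_defect

print(toy_counts.sort_index())
print(f"Imbalance ratio: {toy_ratio:.2f}:1")
```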

**TODO**: Complete the code below 👇

In [None]:
# TODO: Generate synthetic wafer defect data with 1000 samples and 20% defect rate
# Your code here:
# df = generate_synthetic_wafer_defects(...)

# Print dataset shape
# print(f"Dataset shape: {df.shape}")

# TODO: Calculate class distribution
# Your code here:
# class_counts = ...
# print("\n=== Class Distribution ===")
# print(f"Non-defective wafers (0): ...")
# print(f"Defective wafers (1): ...")

# TODO: Calculate imbalance ratio
# imbalance_ratio = ...
# print(f"\nImbalance ratio: {imbalance_ratio:.2f}:1")

# ✅ Self-check: 
# - Do you have exactly 1000 samples?
# - Is the defect rate approximately 20%?
# - Is the imbalance ratio approximately 4:1?

### 🔍 Self-Check Exercise 1.1

Before moving on, verify:
- [ ] Dataset has 1000 rows
- [ ] Approximately 200 defective wafers (180-220 is normal due to randomness)
- [ ] Imbalance ratio is around 4:1
- [ ] You understand what defect_rate parameter controls
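
Why is 180 to 220 "normal"? Suppose the generator marks each wafer as defective independently with probability `defect_rate`. This is an assumption; the generator might instead fix the count exactly. Under that assumption, the number of defective wafers follows a Binomial(1000, 0.2) distribution, with mean 200 and standard deviation √160 ≈ 12.6. This sketch checks the band by simulation:

```python
import numpy as np

n_samples, defect_rate = 1000, 0.20

# Analytic mean and standard deviation of a Binomial(n, p) defect count
expected_mean = n_samples * defect_rate
expected_std = np.sqrt(n_samples * defect_rate * (1 - defect_rate))
print(f"Expected defective wafers: {expected_mean:.0f} +/- {expected_std:.1f}")

# Monte Carlo check: simulate many 1000-wafer datasets and count how often
# the number of defective wafers lands in the 180-220 band
rng = np.random.default_rng(42)
defect_counts = rng.binomial(n_samples, defect_rate, size=20_000)
in_band = np.mean((defect_counts >= 180) & (defect_counts <= 220))
print(f"Share of simulated datasets with 180-220 defects: {in_band:.1%}")
```

Roughly 9 in 10 simulated datasets fall inside the band. A count just outside it is therefore not, by itself, a sign of a bug.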

💡 **Stuck?** Check the solution notebook for the complete implementation.

### 🎯 Exercise 1.2: Visualize Feature Distributions (★★ Intermediate)

**Task**: Create visualizations to understand which features distinguish defective wafers.

**Requirements**:
1. Create violin plots comparing defective vs non-defective wafers for top 3 features
2. Calculate correlation of each feature with the target
3. Identify the 3 most discriminative features

**Hints**:
- Use `.corrwith()` to calculate correlations with target
- Take absolute value of correlations (both positive and negative correlations are important)
- Sort by correlation in descending order
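
You can try the correlation hints on a small hypothetical DataFrame first. `DataFrame.corrwith(Series)` computes the Pearson correlation of each column with the target. For a 0/1 target, this is the point-biserial correlation. Taking `.abs()` before sorting keeps strongly negative features (like `signal_neg` below) near the top. The column names are placeholders:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'signal_pos' rises with defects, 'signal_neg' falls
# with defects, and 'noise' is unrelated to the target
rng = np.random.default_rng(0)
defect = rng.binomial(1, 0.2, size=500)
toy = pd.DataFrame({
    "signal_pos": defect * 1.0 + rng.normal(0, 0.5, size=500),
    "signal_neg": -defect * 1.0 + rng.normal(0, 0.5, size=500),
    "noise": rng.normal(0, 1.0, size=500),
    "defect": defect,
})

toy_features = [c for c in toy.columns if c != "defect"]
# Pearson correlation of each feature column with the 0/1 target
toy_corr = toy[toy_features].corrwith(toy["defect"])
# Rank by magnitude so negative correlations count as discriminative too
toy_ranked = toy_corr.abs().sort_values(ascending=False)
print(toy_corr.round(3))
print(toy_ranked.round(3))
```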

**TODO**: Complete the code below 👇

In [None]:
# TODO: Get feature columns (exclude 'defect' and 'wafer_id')
# feature_cols = [col for col in df.columns if col not in ['defect', 'wafer_id']]

# TODO: Calculate correlations with target
# correlations = ...

# TODO: Get top 3 features
# top_3 = correlations.head(3)
# print("Top 3 Most Discriminative Features:")
# print(top_3)

# TODO: Create violin plots for top 3 features
# fig, axes = plt.subplots(1, 3, figsize=(15, 5))
# for idx, feature in enumerate(top_3.index):
#     ax = axes[idx]
#     # Create violin plot comparing defective vs non-defective
#     # Your plotting code here...

# plt.tight_layout()
# plt.show()

# ✅ Self-check:
# - Do the top features show clear separation between classes?
# - Are the top absolute correlations clearly larger than those of the other features?
# - Do violin plots show different distributions for defective vs good wafers?

### 🔍 Self-Check Exercise 1.2

Before moving on, verify:
- [ ] You identified 3 features with highest absolute correlation
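- [ ] Absolute correlation values make sense (between 0 and 1; raw correlations range from -1 to 1)
- [ ] Violin plots show visual differences between classes
- [ ] You understand why a higher absolute correlation suggests a more discriminative feature (correlation only captures linear association, so a low value does not rule a feature out)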

💡 **Manufacturing Insight**: In real semiconductor fabs, these correlations would guide which process parameters to monitor most closely.

## 2. Model Training and Comparison

Now let's train and compare different ML models for wafer defect classification.

### 🎯 Exercise 2.1: Prepare Data and Train Models (★★ Intermediate)

**Task**: Split the data and train 5 different classification models.

**Requirements**:
1. Create stratified 80/20 train/test split (preserve class distribution)
2. Train these 5 models:
   - Logistic Regression
   - Linear SVM
   - Decision Tree
   - Random Forest
   - Gradient Boosting
3. Evaluate each with ROC-AUC, PR-AUC, F1, and PWS metrics
4. Store results in a DataFrame for comparison

**Hints**:
- Use `train_test_split(stratify=y)` to maintain class balance
- Instantiate each pipeline with default parameters
- Use `.fit()` to train and `.evaluate()` to get metrics
- Store results in a list of dictionaries
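
This quick sketch shows what `stratify` does, using placeholder labels with exactly 20% defects. With `stratify=y_toy`, both splits keep the 20% rate. Without it, the test-set rate depends on the random draw:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder labels with exactly 20% defects and a dummy single feature
y_toy = np.array([1] * 200 + [0] * 800)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=42
)
print(f"Train: {len(y_tr)} samples, defect rate {y_tr.mean():.1%}")
print(f"Test:  {len(y_te)} samples, defect rate {y_te.mean():.1%}")

# For contrast: an unstratified split of the same labels
_, _, _, y_te_plain = train_test_split(X_toy, y_toy, test_size=0.2, random_state=7)
print(f"Unstratified test defect rate: {y_te_plain.mean():.1%}")
```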

**TODO**: Complete the code below 👇

In [None]:
# TODO: Prepare features and target
# X = df.drop(['defect', 'wafer_id'], axis=1)
# y = df['defect']

# TODO: Create stratified train/test split (80/20)
# from sklearn.model_selection import train_test_split
# X_train, X_test, y_train, y_test = train_test_split(
#     X, y, test_size=0.2, stratify=..., random_state=42
# )

# print(f"Training set: {X_train.shape[0]} samples")
# print(f"Test set: {X_test.shape[0]} samples")
# print(f"Training defect rate: {y_train.mean():.1%}")
# print(f"Test defect rate: {y_test.mean():.1%}")

# TODO: Define model configurations
# model_configs = {
#     'logistic': 'logistic_regression',
#     'linear_svm': 'linear_svm',
#     'tree': 'decision_tree',
#     'rf': 'random_forest',
#     'gb': 'gradient_boosting'
# }

# TODO: Train and evaluate each model
# results = []
# for name, model_type in model_configs.items():
#     print(f"\nTraining {name}...")
#     # Instantiate pipeline
#     pipeline = WaferDefectPipeline(model_type=model_type)
#     
#     # Train
#     pipeline.fit(X_train, y_train)
#     
#     # Evaluate
#     metrics = pipeline.evaluate(X_test, y_test)
#     
#     # Store results
#     results.append({
#         'model': name,
#         'roc_auc': metrics['roc_auc'],
#         'pr_auc': metrics['pr_auc'],
#         'f1': metrics['f1'],
#         'pws': metrics['pws']
#     })

# TODO: Create results DataFrame
# results_df = pd.DataFrame(results)
# print("\n=== Model Comparison ===")
# print(results_df.to_string(index=False))

# ✅ Self-check:
# - Do train and test sets have similar defect rates?
# - Did all 5 models train successfully?
# - Are ROC-AUC scores above 0.5 (better than random)?
# - Do ensemble models (rf, gb) perform better than linear models?

### 🔍 Self-Check Exercise 2.1

Before moving on, verify:
- [ ] Stratified split maintains class distribution (~20% in both train and test)
- [ ] All 5 models trained without errors
- [ ] Results DataFrame has 5 rows (one per model)
- [ ] ROC-AUC scores are between 0.5 and 1.0
- [ ] You understand why stratification is critical for imbalanced data

💡 **Common Mistake**: Forgetting `stratify=y` can leave train and test with noticeably different defect rates. This makes the test metrics less reliable, and the effect is larger on small datasets.

### 🎯 Exercise 2.2: Visualize Model Performance (★★ Intermediate)

**Task**: Create visualizations to compare model performance across all metrics.

**Requirements**:
1. Create 4 bar charts (one for each metric: ROC-AUC, PR-AUC, F1, PWS)
2. Identify the best-performing model
3. Explain why certain models perform better for this task

**Hints**:
- Use `plt.subplots(2, 2)` to create a 2x2 grid
- Sort models by metric for easier comparison
- Color the best model differently
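
One way to implement the "color the best model differently" hint is to build a color list that highlights the top bar. The scores below are placeholders for illustration, not results. In the exercise, you would pass the columns of your own `results_df`:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def highlight_colors(values, best="tab:red", other="lightgray"):
    """Return one bar color per value, using `best` for the largest value."""
    best_pos = int(np.argmax(np.asarray(values)))
    return [best if i == best_pos else other for i in range(len(values))]


# Placeholder scores for illustration only (not real results)
toy_results = pd.DataFrame({
    "model": ["logistic", "linear_svm", "tree", "rf", "gb"],
    "roc_auc": [0.81, 0.80, 0.74, 0.88, 0.86],
})
toy_sorted = toy_results.sort_values("roc_auc", ascending=False)

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(toy_sorted["model"], toy_sorted["roc_auc"],
       color=highlight_colors(toy_sorted["roc_auc"]))
ax.set_title("ROC-AUC Score")
ax.set_ylabel("Score")
ax.set_xlabel("Model")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
```

In the notebook, the inline backend displays the figure at the end of the cell. Call `plt.show()` if you run it as a script.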

**TODO**: Complete the code below 👇

In [None]:
# TODO: Create 2x2 grid of bar charts
# fig, axes = plt.subplots(2, 2, figsize=(15, 10))
# metrics_to_plot = ['roc_auc', 'pr_auc', 'f1', 'pws']
# titles = ['ROC-AUC Score', 'PR-AUC Score', 'F1 Score', 'Prediction Within Spec (PWS)']

# for idx, (metric, title) in enumerate(zip(metrics_to_plot, titles)):
#     ax = axes[idx // 2, idx % 2]
#     # TODO: Sort results by current metric
#     sorted_results = results_df.sort_values(metric, ascending=False)
#     
#     # TODO: Create bar chart
#     # ax.bar(sorted_results['model'], sorted_results[metric])
#     # ax.set_title(title)
#     # ax.set_ylabel('Score')
#     # ax.set_xlabel('Model')
#     # ax.tick_params(axis='x', rotation=45)

# plt.tight_layout()
# plt.show()

# TODO: Identify best model
# best_model = results_df.loc[results_df['roc_auc'].idxmax(), 'model']
# print(f"\n🏆 Best Model: {best_model}")
# print(f"ROC-AUC: {results_df.loc[results_df['roc_auc'].idxmax(), 'roc_auc']:.3f}")

# ✅ Self-check:
# - Do all 4 metric charts rank the models the same way? If not, which metrics disagree?
# - Is the best model an ensemble method?
# - Can you explain why ensemble methods often outperform single models?

### 🔍 Self-Check Exercise 2.2

Before moving on, verify:
- [ ] All 4 metric charts display correctly
- [ ] You checked whether model rankings agree across metrics (they can differ, e.g. between ROC-AUC and PR-AUC on imbalanced data)
- [ ] You can explain why the best model performs well
- [ ] You understand the trade-offs (complexity vs performance)
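
When comparing rankings across metrics, it helps to know each metric's no-skill baseline. This sketch scores random predictions on labels with about 20% defects. It uses scikit-learn's `average_precision_score` as a stand-in for PR-AUC; the pipeline's own PR-AUC computation may differ slightly. ROC-AUC stays near 0.5 regardless of class balance, while average precision sits near the defect rate. This difference in baselines is one reason the two metrics can rank models differently:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
toy_labels = rng.binomial(1, 0.2, size=20_000)   # ~20% defective
random_scores = rng.random(size=toy_labels.size)  # a classifier with no skill

roc_random = roc_auc_score(toy_labels, random_scores)
ap_random = average_precision_score(toy_labels, random_scores)
print(f"Random ROC-AUC: {roc_random:.3f} (no-skill baseline ~0.5)")
print(f"Random average precision: {ap_random:.3f} "
      f"(no-skill baseline ~ defect rate {toy_labels.mean():.3f})")
```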

💡 **Manufacturing Insight**: While Random Forest and Gradient Boosting often perform best, simpler models like Logistic Regression may be preferred in production if the performance difference is small. They are easier to explain, faster at inference, and easier to debug.

📚 **See Solution**: Check `wafer_defect_solution.ipynb` Exercise 2 for complete implementation and discussion of model selection trade-offs.

In [None]:
# Visualize model comparison
# Expects `results_df` from Exercise 2.1 (columns: model, roc_auc, pr_auc, f1, pws)
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# (metric column, panel title, y-axis label, bar color)
panels = [
    ('roc_auc', 'ROC-AUC Comparison', 'ROC-AUC Score', 'skyblue'),
    ('pr_auc', 'PR-AUC Comparison', 'PR-AUC Score', 'lightgreen'),
    ('pws', 'PWS (Prediction Within Spec) Comparison', 'PWS Score', 'salmon'),
    ('f1', 'F1 Score Comparison', 'F1 Score', 'gold'),
]
for ax, (metric, title, ylabel, color) in zip(axes.flat, panels):
    results_df.set_index('model')[metric].plot(kind='bar', ax=ax, color=color)
    ax.set_title(title)
    ax.set_ylabel(ylabel)
    ax.tick_params(axis='x', rotation=45)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Find best model by ROC-AUC
best_model_idx = results_df['roc_auc'].idxmax()
best_model = results_df.loc[best_model_idx]
print("\n=== Best Performing Model ===")
print(f"Model: {best_model['model']}")
print(f"ROC-AUC: {best_model['roc_auc']:.3f}")
print(f"PWS: {best_model['pws']:.1%}")

## 3. Manufacturing-Specific Metrics Deep Dive

Let's explore metrics that matter in semiconductor manufacturing: PWS and cost-based optimization.

### 🎯 Exercise 3.1: Calculate Manufacturing Costs (★★★ Advanced)

**Task**: Calculate the financial impact of classification errors using real manufacturing costs.

**Requirements**:
1. Train the best model from Exercise 2 (likely Random Forest or Gradient Boosting)
2. Get predictions on the test set
3. Calculate confusion matrix (TP, TN, FP, FN)
4. Assign realistic costs:
   - False Positive (unnecessary inspection): $50
   - False Negative (missed defect): $200
5. Calculate total estimated loss and PWS

**Hints**:
- Use the best model from your earlier comparison
- Get predictions with `pipeline.predict(X_test)`
- Use `confusion_matrix(y_test, predictions)` from sklearn.metrics
- Total cost = (FP × $50) + (FN × $200)
- PWS = (TP + TN) / total_samples
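
Here is a minimal sketch of the cost and PWS arithmetic on five placeholder wafers. Passing `labels=[0, 1]` to `confusion_matrix` fixes the 2×2 layout, so `ravel()` always unpacks as `tn, fp, fn, tp`. This holds even for a batch where one class appears in neither `y_true` nor `y_pred`:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

FP_COST, FN_COST = 50, 200   # the exercise's assumed costs in dollars

# Placeholder labels and predictions
toy_true = np.array([0, 0, 0, 1, 1])
toy_pred = np.array([0, 1, 0, 1, 0])

# labels=[0, 1] guarantees a 2x2 matrix in the order [[tn, fp], [fn, tp]]
tn, fp, fn, tp = confusion_matrix(toy_true, toy_pred, labels=[0, 1]).ravel()
total_cost = int(fp * FP_COST + fn * FN_COST)
toy_pws = (tp + tn) / len(toy_true)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Total cost: ${total_cost:,}  PWS: {toy_pws:.1%}")
```

In this example, one false alarm ($50) and one missed defect ($200) give a total of $250. Three of the five predictions are correct, so PWS is 60%.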

**TODO**: Complete the code below 👇

In [None]:
# TODO: Train the best model (use model_type from Exercise 2)
# best_model_type = 'random_forest'  # or 'gradient_boosting'
# print(f"Training best model: {best_model_type}")
# pipeline = WaferDefectPipeline(model_type=best_model_type)
# pipeline.fit(X_train, y_train)

# TODO: Get predictions on test set
# predictions = pipeline.predict(X_test)

# TODO: Calculate confusion matrix
# from sklearn.metrics import confusion_matrix
# cm = confusion_matrix(y_test, predictions)
# tn, fp, fn, tp = cm.ravel()

# print("\n=== Confusion Matrix ===")
# print(f"True Negatives:  {tn}")
# print(f"False Positives: {fp}")
# print(f"False Negatives: {fn}")
# print(f"True Positives:  {tp}")

# TODO: Assign manufacturing costs
# FP_COST = 50   # Cost of unnecessary inspection
# FN_COST = 200  # Cost of missed defect (much higher!)

# TODO: Calculate total cost
# total_cost = (fp * FP_COST) + (fn * FN_COST)
# print(f"\n=== Financial Impact ===")
# print(f"False Positive Cost: ${fp * FP_COST:,}")
# print(f"False Negative Cost: ${fn * FN_COST:,}")
# print(f"Total Estimated Loss: ${total_cost:,}")

# TODO: Calculate PWS
# pws = (tp + tn) / len(y_test)
# print(f"\nPrediction Within Spec (PWS): {pws:.1%}")

# ✅ Self-check:
# - Is FN cost higher than FP cost? (It should be - missing defects is worse!)
# - Is PWS clearly above 80%, the score of a model that predicts every wafer as GOOD?
# - Can you explain why FN costs more in manufacturing?

### 🔍 Self-Check Exercise 3.1

Before moving on, verify:
- [ ] Confusion matrix calculated correctly (all 4 values: TN, FP, FN, TP)
- [ ] FN cost is 4x FP cost ($200 vs $50, the ratio assumed in this exercise)
- [ ] Total cost calculated includes both FP and FN
- [ ] PWS metric makes sense (percentage of correct predictions)
- [ ] You understand why missing a defect (FN) costs more than a false alarm (FP)

💡 **Manufacturing Reality**: 
- **False Positive**: Wafer goes to unnecessary inspection ($50 in time/resources)
- **False Negative**: Defective wafer reaches the customer → RMA and reputation damage ($200 in this exercise's cost model; the real impact of an escaped defect can be much larger)

This cost asymmetry drives threshold optimization!

### 🎯 Exercise 3.2: Optimize Decision Threshold (★★★ Advanced)

**Task**: Find the optimal classification threshold that minimizes total manufacturing cost.

**Requirements**:
1. Get probability predictions (not just binary 0/1)
2. Sweep thresholds from 0.1 to 0.9
3. For each threshold:
   - Convert probabilities to binary predictions
   - Calculate confusion matrix
   - Calculate total cost (FP × $50 + FN × $200)
   - Calculate precision, recall, F1, PWS
4. Find threshold that minimizes total cost
5. Compare to default 0.5 threshold

**Hints**:
- Use `pipeline.model.predict_proba(X_test)[:, 1]` to get probabilities  
- `(probabilities >= threshold).astype(int)` converts probs to binary
- Store results in a list of dictionaries for easy DataFrame creation
- Use `.idxmin()` to find row with minimum cost
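
As an alternative to the loop, you can vectorize the whole sweep with NumPy broadcasting, where each row of `preds` holds the predictions at one threshold. The labels and probabilities below are placeholders. The grid is built with `np.linspace` and rounded, so 0.50 is an exact grid value. NumPy's documentation recommends `linspace` over `arange` for non-integer steps, because float steps can make `arange` results inconsistent and an `== 0.5` lookup fragile:

```python
import numpy as np

FP_COST, FN_COST = 50, 200

# Placeholder labels and predicted defect probabilities
toy_y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])
toy_proba = np.array([0.05, 0.13, 0.22, 0.36, 0.42, 0.57, 0.32, 0.62, 0.87, 0.08])

# 17 evenly spaced thresholds, rounded so each equals its 2-decimal value
toy_thresholds = np.round(np.linspace(0.1, 0.9, 17), 2)   # 0.10, 0.15, ..., 0.90

# Broadcasting: rows = thresholds, columns = wafers
preds = (toy_proba[None, :] >= toy_thresholds[:, None]).astype(int)
fp_per_t = ((preds == 1) & (toy_y == 0)).sum(axis=1)
fn_per_t = ((preds == 0) & (toy_y == 1)).sum(axis=1)
toy_costs = fp_per_t * FP_COST + fn_per_t * FN_COST

best_pos = int(np.argmin(toy_costs))   # first minimum = lowest tied threshold
default_pos = int(np.flatnonzero(np.isclose(toy_thresholds, 0.5))[0])
print(f"Optimal threshold: {toy_thresholds[best_pos]:.2f}, cost ${toy_costs[best_pos]}")
print(f"Default threshold 0.50, cost ${toy_costs[default_pos]}")
```

`np.argmin` returns the first minimum, so ties resolve to the lowest threshold, which is the higher-recall option.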

**TODO**: Complete the code below 👇

In [None]:
# TODO: Get probability predictions
# probabilities = pipeline.model.predict_proba(X_test)[:, 1]

# TODO: Define threshold range
# thresholds = np.round(np.linspace(0.1, 0.9, 17), 2)  # 0.10, 0.15, ..., 0.90 (exact grid values)

# TODO: Sweep thresholds and calculate metrics
# threshold_results = []
# FP_COST = 50
# FN_COST = 200

# for threshold in thresholds:
#     # TODO: Apply threshold to probabilities
#     pred_at_threshold = (probabilities >= threshold).astype(int)
#     
#     # TODO: Calculate confusion matrix
#     cm = confusion_matrix(y_test, pred_at_threshold)
#     tn, fp, fn, tp = cm.ravel()
#     
#     # TODO: Calculate costs
#     total_cost = (fp * FP_COST) + (fn * FN_COST)
#     
#     # TODO: Calculate metrics
#     precision = tp / (tp + fp) if (tp + fp) > 0 else 0
#     recall = tp / (tp + fn) if (tp + fn) > 0 else 0
#     f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
#     pws = (tp + tn) / len(y_test)
#     
#     # Store results
#     threshold_results.append({
#         'threshold': threshold,
#         'precision': precision,
#         'recall': recall,
#         'f1': f1,
#         'pws': pws,
#         'fp': fp,
#         'fn': fn,
#         'total_cost': total_cost
#     })

# TODO: Create DataFrame
# threshold_df = pd.DataFrame(threshold_results)

# TODO: Find optimal threshold (minimum cost)
# optimal_idx = threshold_df['total_cost'].idxmin()
# optimal_threshold = threshold_df.loc[optimal_idx, 'threshold']
# optimal_cost = threshold_df.loc[optimal_idx, 'total_cost']

# print("\n=== Threshold Optimization Results ===")
# print(f"Optimal Threshold: {optimal_threshold:.2f}")
# print(f"Optimal Cost: ${optimal_cost:,}")

# TODO: Compare to default threshold (0.5)
# default_idx = threshold_df.index[np.isclose(threshold_df['threshold'], 0.5)][0]  # float-safe lookup
# default_cost = threshold_df.loc[default_idx, 'total_cost']
# cost_savings = default_cost - optimal_cost

# print(f"\nDefault Threshold (0.5) Cost: ${default_cost:,}")
# print(f"Cost Savings: ${cost_savings:,} ({cost_savings/default_cost:.1%})")

# ✅ Self-check:
# - Did you sweep at least 10 different thresholds?
# - Is optimal threshold different from 0.5?
# - Does optimizing save money compared to default?
# - Do you understand the precision/recall trade-off?

### 🔍 Self-Check Exercise 3.2

Before moving on, verify:
- [ ] You swept multiple thresholds (at least 10-20)
- [ ] Found threshold with minimum cost
- [ ] Optimal threshold is likely lower than 0.5 (favoring recall over precision)
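- [ ] Cost savings vs. the 0.5 threshold are zero or positive (0.5 is one of the swept thresholds, so the minimum cannot be worse); their size depends on the model and data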
- [ ] You understand why lower threshold = higher recall = fewer missed defects

💡 **Key Insight**: Because FN costs 4x more than FP, we want HIGH RECALL (catch as many defects as possible), even if it means MORE FALSE POSITIVES. This usually shifts the optimal threshold below 0.5!
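
You can make the direction of this shift precise. Assume the predicted probabilities are perfectly calibrated and correct decisions cost nothing. Flagging a wafer with defect probability p then costs (1 − p) × $50 in expectation, while passing it costs p × $200. Flagging is cheaper when p > C_FP / (C_FP + C_FN) = 50 / 250 = 0.2. Random forest probabilities are often not well calibrated, so treat this value as a reference point for the empirical sweep rather than a replacement for it:

```python
def cost_optimal_threshold(cost_fp: float, cost_fn: float) -> float:
    """Threshold minimizing expected cost, assuming calibrated probabilities.

    Flag a wafer when p * cost_fn > (1 - p) * cost_fp,
    which rearranges to p > cost_fp / (cost_fp + cost_fn).
    Assumes correct decisions (TP, TN) cost nothing.
    """
    return cost_fp / (cost_fp + cost_fn)


def expected_costs(p: float, cost_fp: float, cost_fn: float) -> tuple[float, float]:
    """Expected cost of (flagging, passing) a wafer with defect probability p."""
    return (1 - p) * cost_fp, p * cost_fn


t_star = cost_optimal_threshold(50, 200)
print(f"Theoretical threshold for $50 FP / $200 FN: {t_star:.2f}")
for p in (0.1, 0.2, 0.3):
    flag, pass_ = expected_costs(p, 50, 200)
    print(f"p={p:.1f}: flagging costs ${flag:.0f}, passing costs ${pass_:.0f}")
```

Below p = 0.2, passing is cheaper; above it, flagging is cheaper; at 0.2, the two options cost the same.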

📊 **Trade-off**: 
- **Higher threshold** (e.g., 0.7) → Higher precision, lower recall → Miss more defects (expensive!)
- **Lower threshold** (e.g., 0.3) → Lower precision, higher recall → More false alarms (each one cheaper than a missed defect!)

📚 **See Solution**: Check `wafer_defect_solution.ipynb` Exercise 3 for complete threshold optimization with visualizations.

In [None]:
# SOLUTION CODE - Try the exercises above first!
# This is reference code showing one way to solve the exercises

# Train the best performing model on the training split only
best_pipeline = WaferDefectPipeline(
    model_name='rf',  # Usually performs well
    handle_imbalance='class_weight'
)
best_pipeline.fit(X_train, y_train)

# Score held-out wafers: evaluating on the training data would overstate
# performance and bias the "optimal" threshold toward the training set
y_pred = best_pipeline.predict(X_test)
y_proba = best_pipeline.predict_proba(X_test)[:, 1]  # Probability of defect

# Analyze manufacturing costs at different thresholds
thresholds = np.round(np.linspace(0.1, 0.9, 17), 2)  # 0.10, 0.15, ..., 0.90
threshold_analysis = []

print("=== Threshold Analysis for Manufacturing Optimization ===")
for threshold in thresholds:
    # Apply threshold
    y_pred_thresh = (y_proba >= threshold).astype(int)

    # Calculate metrics with the cost assumptions from Exercise 3.1
    metrics = WaferDefectPipeline.compute_metrics(
        y_test, y_pred_thresh, y_proba,
        cost_false_positive=50.0,   # Unnecessary inspection of a good wafer
        cost_false_negative=200.0,  # Defective wafer shipped to the customer
        tolerance=0.05
    )

    threshold_analysis.append({
        'Threshold': threshold,
        'Precision': metrics['Precision'],
        'Recall': metrics['Recall'],
        'F1': metrics['F1'],
        'PWS': metrics['PWS'],
        'Estimated_Loss': metrics['Estimated_Loss']
    })

threshold_df = pd.DataFrame(threshold_analysis)
print(threshold_df.head(10).round(3))

In [None]:
# Visualize threshold optimization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Precision vs Recall vs Threshold
ax1 = axes[0, 0]
ax1.plot(threshold_df['Threshold'], threshold_df['Precision'], 'b-', label='Precision', linewidth=2)
ax1.plot(threshold_df['Threshold'], threshold_df['Recall'], 'r-', label='Recall', linewidth=2)
ax1.plot(threshold_df['Threshold'], threshold_df['F1'], 'g--', label='F1', linewidth=2)
ax1.set_xlabel('Decision Threshold')
ax1.set_ylabel('Score')
ax1.set_title('Precision-Recall vs Threshold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# PWS vs Threshold
ax2 = axes[0, 1]
ax2.plot(threshold_df['Threshold'], threshold_df['PWS'], 'purple', linewidth=2)
ax2.set_xlabel('Decision Threshold')
ax2.set_ylabel('PWS (Prediction Within Spec)')
ax2.set_title('Manufacturing PWS vs Threshold')
ax2.grid(True, alpha=0.3)

# Estimated Loss vs Threshold
ax3 = axes[1, 0]
ax3.plot(threshold_df['Threshold'], threshold_df['Estimated_Loss'], 'orange', linewidth=2)
ax3.set_xlabel('Decision Threshold')
ax3.set_ylabel('Estimated Loss ($)')
ax3.set_title('Manufacturing Cost vs Threshold')
ax3.grid(True, alpha=0.3)

# Find optimal threshold (minimum loss)
optimal_idx = threshold_df['Estimated_Loss'].idxmin()
optimal_threshold = threshold_df.loc[optimal_idx, 'Threshold']
optimal_loss = threshold_df.loc[optimal_idx, 'Estimated_Loss']

ax3.axvline(x=optimal_threshold, color='red', linestyle='--', alpha=0.7)
ax3.text(optimal_threshold + 0.05, optimal_loss,
         f'Optimal: {optimal_threshold:.2f}',
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

# ROC Curve (on the same held-out wafers used for the threshold sweep)
from sklearn.metrics import roc_curve
fpr, tpr, _ = roc_curve(y_test, y_proba)
ax4 = axes[1, 1]
ax4.plot(fpr, tpr, 'b-', linewidth=2, label=f'ROC Curve')
ax4.plot([0, 1], [0, 1], 'k--', alpha=0.5, label='Random Classifier')
ax4.set_xlabel('False Positive Rate')
ax4.set_ylabel('True Positive Rate')
ax4.set_title('ROC Curve')
ax4.legend()
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\n=== Optimal Operating Point ===")
print(f"Optimal Threshold: {optimal_threshold:.3f}")
print(f"At this threshold:")
print(f"  Precision: {threshold_df.loc[optimal_idx, 'Precision']:.3f}")
print(f"  Recall: {threshold_df.loc[optimal_idx, 'Recall']:.3f}")
print(f"  PWS: {threshold_df.loc[optimal_idx, 'PWS']:.1%}")
print(f"  Estimated Loss: ${threshold_df.loc[optimal_idx, 'Estimated_Loss']:.2f}")

## 4. Model Deployment and CLI Usage

Now let's deploy your model for production use with the standardized CLI interface.

### 🎯 Exercise 4.1: Save Production Model (★★ Intermediate)

**Task**: Save your optimized model with metadata for production deployment.

**Requirements**:
1. Create a WaferDefectPipeline with optimal threshold from Exercise 3.2
2. Train it on the full training set
3. Save the model to disk using `.save()` method
4. Include metadata (model type, threshold, training date, performance metrics)

**Hints**:
- Use the optimal threshold you found in Exercise 3.2
- Use `pipeline.save(path)` to persist the model
- Include timestamp and key metrics in filename for tracking
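
A filename can only carry a few fields. A common complement is a JSON "sidecar" file stored next to the model. `write_model_metadata` below is a hypothetical helper, not part of `WaferDefectPipeline`. The metric values are placeholders; replace them with your `final_metrics`:

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def write_model_metadata(model_path: Path, metadata: dict) -> Path:
    """Hypothetical helper: save `metadata` as JSON beside the model file.

    model.joblib -> model.json, so the two files can be versioned together.
    """
    meta_path = model_path.with_suffix(".json")
    meta_path.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return meta_path


# Placeholder values: replace with your model type, optimal threshold, and final_metrics
metadata = {
    "model_type": "random_forest",
    "threshold": 0.35,
    "trained_on": date.today().isoformat(),
    "metrics": {"roc_auc": 0.90, "pws": 0.93},
}

with tempfile.TemporaryDirectory() as tmp:
    meta_file = write_model_metadata(Path(tmp) / "wafer_defect_model.joblib", metadata)
    restored = json.loads(meta_file.read_text())

print(f"{meta_file.name}: round-trip OK = {restored == metadata}")
```

`Path.with_suffix` replaces only the final suffix. A filename with extra dots, such as one containing an `roc0.xx` field, therefore keeps those fields and just swaps `.joblib` for `.json`.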

**TODO**: Complete the code below 👇

In [None]:
# TODO: Create pipeline with optimal threshold
# optimal_threshold = 0.35  # Use your value from Exercise 3.2
# pipeline = WaferDefectPipeline(
#     model_type='random_forest',  # or your best model
#     threshold=optimal_threshold
# )

# TODO: Train on full training set
# pipeline.fit(X_train, y_train)

# TODO: Evaluate on test set to get final metrics
# final_metrics = pipeline.evaluate(X_test, y_test)
# print("\n=== Final Model Performance ===")
# print(f"ROC-AUC: {final_metrics['roc_auc']:.3f}")
# print(f"PWS: {final_metrics['pws']:.1%}")
# print(f"Optimal Threshold: {optimal_threshold:.2f}")

# TODO: Save model with descriptive filename
# from datetime import datetime
# timestamp = datetime.now().strftime("%Y%m%d")
# model_filename = f"wafer_defect_random_forest_{timestamp}_roc{final_metrics['roc_auc']:.2f}_thr{optimal_threshold:.2f}.joblib"
# pipeline.save(Path(model_filename))
# print(f"\n✅ Model saved to: {model_filename}")
# print(f"File size: {Path(model_filename).stat().st_size / 1024:.1f} KB")

# ✅ Self-check:
# - Did the model save successfully?
# - Is the filename descriptive (includes date and performance)?
# - Can you locate the saved file on disk?

### 🔍 Self-Check Exercise 4.1

Before moving on, verify:
- [ ] Model saved successfully (check file exists on disk)
- [ ] Filename includes date and performance metrics
- [ ] You can explain why including metadata in filename helps production tracking
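- [ ] File size is reasonable for the model type (tree ensembles grow with the number and depth of trees, so a random forest is usually much larger than a logistic regression)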

💡 **Production Tip**: Always include version info (date, metrics, threshold) in model filenames. This enables:
- Easy rollback if new model underperforms
- A/B testing between model versions
- Audit trail for regulatory compliance

### 🎯 Exercise 4.2: Test Model Loading and CLI Usage (★★ Intermediate)

**Task**: Verify model persistence by loading and testing your saved model, then use the CLI.

**Requirements**:
1. Load the saved model using `.load()` static method
2. Verify predictions match the original model
3. Test CLI commands:
   - Train a new model
   - Evaluate saved model
   - Make predictions on new wafer
4. Generate example CLI commands for your production team

**Hints**:
- Use `WaferDefectPipeline.load(path)` to load
- Compare loaded model predictions to original
- CLI uses subcommands: `train`, `evaluate`, `predict`
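
The round-trip check in this exercise follows a general pattern. This sketch applies it to a plain scikit-learn model saved with `joblib`. It assumes, based only on the `.joblib` extension, that `WaferDefectPipeline.save()` uses joblib internally. For the pipeline, also check that the decision threshold survives the round trip, because the threshold changes the hard predictions:

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data standing in for wafer features (placeholders, not tutorial data)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 4))
y_toy = (X_toy[:, 0] + 0.5 * rng.normal(size=200) > 0.8).astype(int)

toy_model = LogisticRegression().fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "toy_model.joblib"
    joblib.dump(toy_model, str(model_file))   # persist to disk
    reloaded = joblib.load(str(model_file))   # read it back

# Hard labels should match exactly; compare probabilities with a tolerance
labels_match = np.array_equal(toy_model.predict(X_toy), reloaded.predict(X_toy))
proba_match = np.allclose(toy_model.predict_proba(X_toy), reloaded.predict_proba(X_toy))
print(f"Labels match: {labels_match}, probabilities match: {proba_match}")
```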

**TODO**: Complete the code below 👇

In [None]:
# TODO: Load the saved model
# loaded_pipeline = WaferDefectPipeline.load(Path(model_filename))
# print(f"✅ Model loaded from: {model_filename}")

# TODO: Verify predictions match original
# original_preds = pipeline.predict(X_test[:5])
# loaded_preds = loaded_pipeline.predict(X_test[:5])

# print("\n=== Round-Trip Verification ===")
# print(f"Original predictions: {original_preds}")
# print(f"Loaded predictions:   {loaded_preds}")
# print(f"Match: {np.array_equal(original_preds, loaded_preds)}")

# TODO: Test prediction on a single wafer
# sample_wafer = X_test.iloc[0:1]  # Get first wafer as DataFrame
# prediction = loaded_pipeline.predict(sample_wafer)[0]
# probability = loaded_pipeline.model.predict_proba(sample_wafer)[0, 1]

# print(f"\n=== Sample Prediction ===")
# print(f"Prediction: {'DEFECTIVE' if prediction == 1 else 'GOOD'}")
# print(f"Defect Probability: {probability:.1%}")
# print(f"Decision Threshold: {optimal_threshold:.2f}")

# ✅ Self-check:
# - Do loaded predictions match original?
# - Can you predict on new samples?
# - Do you understand the probability vs binary prediction?

### 🔍 Self-Check Exercise 4.2

Before moving on, verify:
- [ ] Model loads successfully from disk
- [ ] Predictions from loaded model match original (round-trip test passes)
- [ ] Can predict on individual wafers
- [ ] Understand difference between probability and binary prediction

💡 **CLI Commands for Production**:

```bash
# Train a new model with SMOTE for class imbalance
python wafer_defect_pipeline.py train --data wafer_data.csv --model-type random_forest --use-smote --output-model model.joblib

# Evaluate saved model on test data
python wafer_defect_pipeline.py evaluate --model model.joblib --data test_data.csv --output-json metrics.json

# Make predictions on new wafers
python wafer_defect_pipeline.py predict --model model.joblib --data new_wafers.csv --output predictions.json

# Train with high-recall constraint (minimize false negatives)
python wafer_defect_pipeline.py train --data wafer_data.csv --model-type gradient_boosting --min-recall 0.95 --output-model high_recall_model.joblib
```

📚 **See Solution**: Check `wafer_defect_solution.ipynb` Exercise 4 for complete deployment workflow and production checklist.
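
The prediction command in the CLI usage cell later in this section passes wafer features through `--input-json`. That argument must be valid JSON and safely quoted for the shell. A Python dict printed with `str()` uses single quotes, which is not valid JSON and also collides with shell single quotes. This sketch builds the argument with `json.dumps` and `shlex.quote`. The feature names and the `--model-path`/`--input-json` flags are copied from that cell and are not verified here:

```python
import json
import shlex

# Feature dict copied from the CLI usage cell; names may not match your columns
features_example = {
    "center_density": 0.12,
    "edge_density": 0.05,
    "pattern_uniformity": 0.85,
    "thickness_variation": 0.03,
}

payload = json.dumps(features_example)   # valid JSON: double-quoted keys
cmd = (
    "python wafer_defect_pipeline.py predict --model-path model.joblib "
    f"--input-json {shlex.quote(payload)}"
)
print(cmd)

# The shell would pass the JSON as a single argument that parses back cleanly
parsed_back = json.loads(shlex.split(cmd)[-1])
print(parsed_back == features_example)
```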

In [None]:
# SOLUTION CODE - Try the exercises above first!
# This is reference code showing how to deploy the model

# Save the optimized model
model_path = Path('best_wafer_defect_model.joblib')

# Create and train the production pipeline on the training split
production_pipeline = WaferDefectPipeline(
    model_name='rf',
    handle_imbalance='class_weight'
)
production_pipeline.fit(X_train, y_train)

# Tune the threshold on held-out data, not the data the model was fit on
# (ideally a separate validation split, so X_test stays untouched for final reporting)
production_pipeline.optimize_threshold(
    X_test, y_test,
    min_precision=0.8,  # Require at least 80% precision
    cost_false_positive=50.0,
    cost_false_negative=200.0
)

# Save the model
production_pipeline.save(model_path)
print(f"Model saved to: {model_path}")
print(f"Optimized threshold: {production_pipeline.threshold:.3f}")

In [None]:
# Demonstrate CLI usage (these would be run from command line)
import json

print("=== CLI Usage Examples ===")
print("\nTo train a new model:")
print("python wafer_defect_pipeline.py train --dataset synthetic_wafer --model rf --min-precision 0.8 --save model.joblib")

print("\nTo evaluate an existing model:")
print("python wafer_defect_pipeline.py evaluate --model-path model.joblib --dataset synthetic_wafer")

print("\nTo make predictions:")
prediction_example = {
    "center_density": 0.12,
    "edge_density": 0.05,
    "pattern_uniformity": 0.85,
    "thickness_variation": 0.03
}
# json.dumps produces valid JSON (double-quoted keys); a raw dict repr would not
print(f"python wafer_defect_pipeline.py predict --model-path model.joblib --input-json '{json.dumps(prediction_example)}'")

# Simulate a prediction on a held-out wafer
print("\n=== Live Prediction Example ===")
sample_wafer = X_test.iloc[0:1]  # Take first test wafer
prediction = production_pipeline.predict(sample_wafer)
probability = production_pipeline.predict_proba(sample_wafer)[0, 1]

print(f"Sample wafer features: {sample_wafer.iloc[0].to_dict()}")
print(f"Prediction: {'DEFECTIVE' if prediction[0] == 1 else 'GOOD'}")
print(f"Defect probability: {probability:.3f}")
print(f"Actual label: {'DEFECTIVE' if y_test.iloc[0] == 1 else 'GOOD'}")

## 5. Key Takeaways

🎉 **Congratulations!** You've completed the wafer defect classification tutorial!

### Manufacturing Insights
1. **Threshold Optimization**: The optimal decision threshold balances false positive costs (unnecessary inspection, or scrapping a good wafer) against false negative costs (shipping a defective wafer)
   - This tutorial assumes FN costs 4x FP ($200 vs $50); the real ratio depends on the product, customer, and inspection flow
   - Optimal thresholds are often < 0.5 to favor recall (catch more defects)
   
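2. **PWS Metric**: Prediction Within Spec measures the share of predictions that are within spec; for the binary labels in this tutorial it is (TP + TN) / N, the same quantity as accuracy
   - Essential for quality control systems
   - Complements standard ML metrics (ROC-AUC, F1) but should not be read alone on imbalanced data: at a 20% defect rate, a model that calls every wafer GOOD already scores 80% PWS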
   
3. **Cost-Aware ML**: Manufacturing decisions should consider economic impact, not just accuracy
   - $50 FP cost vs $200 FN cost drives different optimization strategies
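   - Threshold optimization can noticeably cut costs relative to the default 0.5; the size of the saving depends on the model and data, so confirm it on held-out wafers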
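
Here is a minimal sketch of PWS. It assumes PWS means "the fraction of predictions within `tolerance` of the true value", as suggested by the `tolerance` argument of `compute_metrics`; the pipeline's exact definition may differ. For 0/1 labels, this definition reduces to accuracy. That is why a baseline that never flags a defect still scores 80% PWS at a 20% defect rate:

```python
import numpy as np


def pws_score(y_true, y_pred, tolerance=0.05):
    """Fraction of predictions within `tolerance` of the true value.

    Assumed definition for illustration; the pipeline's PWS may differ.
    For hard 0/1 labels and tolerance < 1 this equals plain accuracy.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) <= tolerance))


toy_true = np.array([0] * 8 + [1] * 2)      # 20% defective
toy_always_good = np.zeros_like(toy_true)   # a "model" that never flags a defect
print(f"PWS of always-GOOD baseline: {pws_score(toy_true, toy_always_good):.0%}")
```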

### Technical Insights
1. **Model Selection**: Random Forest and Gradient Boosting typically perform well for semiconductor defect detection
   - Handle non-linear relationships between features
   - Robust to feature scaling differences
   - Provide feature importance for root cause analysis
2. **Imbalance Handling**: Class weights and SMOTE help address the natural imbalance in defect rates (this tutorial uses a 20% defect rate; real rates vary by process and product)
   - SMOTE can improve recall but may introduce synthetic noise
   - Class weights are simpler and often sufficient
   
3. **Feature Engineering**: Process parameters with high correlation to defects are most valuable
   - Focus on features with statistical significance (p < 0.05)
   - Domain knowledge guides feature selection
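
For reference, scikit-learn's `class_weight='balanced'` setting weights each class by n_samples / (n_classes × count). Whether the pipeline's `handle_imbalance='class_weight'` uses exactly this setting is an assumption. For 800 good and 200 defective wafers, it gives:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# 800 good / 200 defective wafers, i.e. the tutorial's 20% defect rate
y_toy_imb = np.array([0] * 800 + [1] * 200)
toy_weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y_toy_imb
)

# Same numbers by hand: n_samples / (n_classes * count_per_class)
manual_weights = len(y_toy_imb) / (2 * np.bincount(y_toy_imb))

print(f"Weight for good wafers:      {toy_weights[0]:.3f}")
print(f"Weight for defective wafers: {toy_weights[1]:.3f}")
```

Each defective wafer then counts 4x as much as a good one in the training loss (2.5 / 0.625 = 4), which offsets the 4:1 imbalance.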

### Production Deployment
1. **Standardized CLI**: Consistent interface across all semiconductor ML projects
   - Subcommands: train, evaluate, predict
   - JSON output for system integration
   
2. **Model Persistence**: Save/load functionality preserves optimal thresholds and preprocessing
   - Include metadata (date, threshold, performance) in filenames
   - Enable model versioning and rollback
   
3. **Manufacturing Integration**: JSON output enables integration with MES/ERP systems
   - Real-time predictions via API
   - Batch processing for historical analysis

## 📚 Additional Resources

### Solution Notebook
**`wafer_defect_solution.ipynb`** contains complete solutions for all exercises:
- Exercise 1: Data generation with detailed exploration
- Exercise 2: 5-model comparison with ROC curve visualization
- Exercise 3: Cost optimization with threshold analysis
- Exercise 4: Production deployment with 40+ item checklist

### Grading Script
Run `evaluate_submission.py` to check your work:
```bash
python evaluate_submission.py --notebook-path wafer_defect_tutorial.ipynb
```

Scoring rubric (100 points total):
- Exercise 1 (Data Exploration): 20 points
- Exercise 2 (Model Training): 30 points
- Exercise 3 (Manufacturing Metrics): 25 points
- Exercise 4 (CLI Usage): 15 points
- Documentation/Comments: 10 points

## Next Steps

To extend this baseline classifier:
1. **Real Data Integration**: Connect to actual wafer map datasets (WM-811K dataset in `datasets/wm811k/`)
2. **Deep Learning**: Implement CNN models for spatial pattern recognition (see Module 6.2)
3. **Feature Engineering**: Add domain-specific features (spatial statistics, pattern descriptors)
4. **Model Ensemble**: Combine multiple model predictions for improved robustness
5. **Real-time Deployment**: Create API wrapper for production manufacturing lines (FastAPI template in `infrastructure/api/`)
6. **Drift Monitoring**: Track model performance degradation over time (see Module 5.2)

### Related Projects
- **Yield Regression** (`yield_regression/`): Predict wafer yield from process parameters
- **Equipment Drift Monitor** (`equipment_drift_monitor/`): Time series anomaly detection
- **Die Defect Segmentation** (`die_defect_segmentation/`): Computer vision for defect localization

In [None]:
# Clean up temporary files
if model_path.exists():
    model_path.unlink()
    print("✅ Cleaned up temporary model file")

print("\n" + "="*60)
print("🎉 TUTORIAL COMPLETED SUCCESSFULLY!")
print("="*60)
print("\n📊 What You've Learned:")
print("  ✓ Generate and explore semiconductor manufacturing data")
print("  ✓ Train and compare 5 classification models")
print("  ✓ Calculate manufacturing-specific metrics (PWS, costs)")
print("  ✓ Optimize decision thresholds for cost reduction")
print("  ✓ Deploy models with CLI interface")
print("\n📚 Next: Check wafer_defect_solution.ipynb for complete solutions")
print("🧪 Grade: Run evaluate_submission.py to check your work")
print("🚀 Ready: Build production wafer defect classifiers!")