# 105: AutoML and Neural Architecture Search

## 🎯 Learning Objectives

By the end of this notebook, you will:
- **Understand** AutoML components: hyperparameter optimization, model selection, feature engineering automation
- **Implement** Bayesian optimization for efficient hyperparameter search using Optuna
- **Build** automated machine learning pipelines with TPOT and tune models with scikit-optimize's `BayesSearchCV`
- **Apply** Neural Architecture Search (NAS) to find optimal network structures
- **Evaluate** AutoML trade-offs: accuracy vs computational cost vs interpretability

## 📚 What is AutoML?

Automated Machine Learning (AutoML) systematizes the process of applying machine learning to real-world problems. Instead of manually trying dozens of algorithms, hundreds of hyperparameter combinations, and countless feature engineering strategies, AutoML automates these decisions using intelligent search algorithms. It democratizes ML by enabling non-experts to build production-quality models while freeing experts to focus on domain-specific challenges.

AutoML spans multiple automation levels: **hyperparameter optimization** (finding best learning rate, regularization), **model selection** (choosing between XGBoost vs Random Forest), **feature engineering** (automated interaction discovery), and **Neural Architecture Search** (designing optimal network topologies). Modern AutoML frameworks like Google's AutoML Tables, H2O.ai, and open-source tools like TPOT can achieve expert-level performance in hours rather than weeks.

In semiconductor manufacturing, AutoML is particularly valuable because test engineers understand device physics but may lack deep ML expertise. AutoML enables them to build yield prediction models, optimize test programs, and detect anomalies without becoming data scientists, while still producing models that can outperform hand-tuned solutions.

**Why AutoML?**
- ✅ **Democratization**: Non-ML experts can build state-of-the-art models (test engineers → ML practitioners)
- ✅ **Speed**: Automated search finds better models in hours vs weeks of manual tuning
- ✅ **Consistency**: Eliminates human bias in model selection, ensures reproducible pipelines
- ✅ **Discovery**: Often finds non-obvious model/hyperparameter combinations experts wouldn't try
- ✅ **Scalability**: Same AutoML pipeline works across 100+ products without manual retuning

## 🏭 Post-Silicon Validation Use Cases

**Use Case 1: Automated Yield Model Development**
- **Input**: STDF wafer test data (200+ parametric tests, 50K+ devices per lot)
- **AutoML Task**: Find best model + features + hyperparameters for yield prediction
- **Process**: TPOT searches model space (RF, XGBoost, SVM, etc.) + feature engineering for 4 hours
- **Output**: Optimized pipeline achieving 94% R² vs 88% from manual baseline
- **Value**: Reduce model development from 3 weeks to 1 day, deploy faster to production

**Use Case 2: Per-Product Test Time Optimization**
- **Input**: Test time data for 50 different product families, each with unique characteristics
- **AutoML Task**: Build custom prediction model for each product automatically
- **Process**: Optuna hyperparameter optimization for gradient boosting models per product
- **Output**: 50 optimized models, each tuned to specific product's test patterns
- **Value**: $5M-$10M annual ATE savings, 30% average test time reduction across portfolio

**Use Case 3: Neural Architecture Search for Wafer Map Classification**
- **Input**: 100K wafer map images (spatial yield patterns indicating defect types)
- **AutoML Task**: Find optimal CNN architecture for classifying defect signatures
- **Process**: NAS searches network depth, width, activation functions, skip connections
- **Output**: Custom architecture achieving 97% classification accuracy (vs 92% from ResNet-18)
- **Value**: Faster root cause identification, automated defect classification replaces manual inspection

**Use Case 4: Automated Binning Algorithm Generation**
- **Input**: Final test data with current manual binning rules (BIN1=premium, BIN2=standard, etc.)
- **AutoML Task**: Learn optimal binning boundaries from historical data
- **Process**: Auto-sklearn multi-class classification with automated feature engineering
- **Output**: Data-driven binning rules increasing BIN1 yield 8% without escapes
- **Value**: $2M-$5M revenue increase per quarter from maximizing premium bin allocation

## 🔄 AutoML Workflow

```mermaid
graph TB
    A[Raw Data] --> B[Define Task & Metric]
    B --> C{AutoML Strategy?}
    
    C -->|Hyperparameter Only| D[Bayesian Optimization]
    C -->|Model Selection| E[TPOT / Auto-sklearn]
    C -->|Neural Architecture| F[NAS]
    
    D --> G[Search Space Definition]
    E --> H[Pipeline Search Space]
    F --> I[Architecture Search Space]
    
    G --> J[Optuna/Hyperopt]
    H --> K[Genetic Programming]
    I --> L[Evolution/RL/Gradient]
    
    J --> M[Evaluate on Validation]
    K --> M
    L --> M
    
    M --> N{Budget Exhausted?}
    N -->|No| O[Sample Next Config]
    O --> J
    O --> K
    O --> L
    
    N -->|Yes| P[Select Best Model]
    P --> Q[Retrain on Full Data]
    Q --> R[Test on Holdout]
    R --> S{Performance OK?}
    
    S -->|No| T[Expand Search Space]
    T --> C
    S -->|Yes| U[Deploy Pipeline]
    
    style A fill:#e1f5ff
    style U fill:#e1ffe1
    style T fill:#ffe1e1
```
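
The loop in the diagram (sample a configuration, evaluate it, stop when the budget runs out) can be sketched in a few lines. This is a minimal random-search illustration, not the strategy any particular library uses; `sample_config`, `evaluate`, and the toy score are hypothetical stand-ins for training and validating a real model.

```python
import random

def sample_config(rng):
    """Draw one configuration from a small hypothetical search space."""
    return {
        "max_depth": rng.randint(2, 12),
        "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
    }

def evaluate(config):
    """Toy validation score (higher is better); stands in for real training."""
    depth_term = -((config["max_depth"] - 6) ** 2) / 36.0
    lr_term = -abs(config["learning_rate"] - 0.1)
    return depth_term + lr_term

def budgeted_search(n_trials, seed=0):
    """Keep sampling and evaluating until the trial budget is spent."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    history = []
    for _ in range(n_trials):          # "Budget Exhausted?" check
        config = sample_config(rng)    # "Sample Next Config"
        score = evaluate(config)       # "Evaluate on Validation"
        history.append(score)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score, history

best_config, best_score, history = budgeted_search(n_trials=50)
print(best_config, round(best_score, 4))
```

Bayesian optimization, genetic programming, and NAS controllers all fit this skeleton; they differ mainly in how the next configuration is chosen.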

## 📊 Learning Path Context

**Prerequisites:**
- **010-025**: ML Algorithms - Understanding model types AutoML will search
- **041**: Model Evaluation - Metrics AutoML optimizes (RMSE, accuracy, AUC)
- **103**: Feature Engineering - What AutoML automates
- **104**: Interpretability - Validating AutoML-generated models

**This Notebook (105):**
- Bayesian optimization with Optuna
- Genetic programming with TPOT
- Automated pipeline generation
- Neural Architecture Search basics
- AutoML evaluation strategies

**Next Steps:**
- **106**: A/B Testing - Validating AutoML models in production
- **107**: Model Monitoring - Tracking AutoML model performance over time
- **131**: Cloud AutoML - Google AutoML, AWS SageMaker Autopilot, Azure AutoML

---

Let's automate the art of machine learning! 🤖

## 1. Setup and Imports

In [None]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
import warnings
warnings.filterwarnings('ignore')

# AutoML libraries
# Note: Install with: pip install optuna tpot scikit-optimize
try:
    import optuna
    print("✅ Optuna available")
except ImportError:
    print("⚠️  Install Optuna: pip install optuna")

try:
    from tpot import TPOTRegressor
    print("✅ TPOT available")
except ImportError:
    print("⚠️  Install TPOT: pip install tpot")

try:
    from skopt import BayesSearchCV
    from skopt.space import Real, Integer
    print("✅ Scikit-optimize available")
except ImportError:
    print("⚠️  Install scikit-optimize: pip install scikit-optimize")

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 8)

# Random seed
np.random.seed(42)

print("\n✅ Environment ready for AutoML exploration!")

## 2. Generate Semiconductor Test Data

**Purpose:** Create a synthetic, STDF-like dataset complex enough that AutoML has structure to discover.

**Key Points:**
- **Non-linear relationships**: Yield depends on complex interactions
- **Multiple feature types**: Numerical, categorical, spatial
- **High dimensionality**: 15 features requiring automated feature selection
- **Why this matters**: Manual tuning would take weeks; AutoML finds optimal solution in hours

In [None]:
# Generate 3000 devices from multiple lots
n_devices = 3000

# Parametric measurements (15 features)
vdd = np.random.normal(1.2, 0.1, n_devices)
idd = np.random.normal(50, 10, n_devices)
freq = np.random.normal(2000, 200, n_devices)
temp = np.random.normal(85, 15, n_devices)
vth = np.random.normal(0.4, 0.04, n_devices)
leakage = np.random.lognormal(0, 0.5, n_devices)  # Log-normal distribution
rise_time = np.random.gamma(2, 0.5, n_devices)  # Gamma distribution
fall_time = np.random.gamma(2, 0.5, n_devices)
noise_margin = np.random.normal(0.3, 0.05, n_devices)
skew = np.random.normal(0, 0.1, n_devices)

# Spatial features
die_x = np.random.randint(0, 30, n_devices)
die_y = np.random.randint(0, 30, n_devices)
wafer_id = np.random.choice(['W001', 'W002', 'W003', 'W004', 'W005'], n_devices)

# Categorical features
lot_id = np.random.choice(['LOT_A', 'LOT_B', 'LOT_C', 'LOT_D'], n_devices)
test_program = np.random.choice(['TP_V1', 'TP_V2', 'TP_V3'], n_devices)

# Complex target function (requires AutoML to discover)
power = vdd * idd
thermal_stress = temp * freq / 1000
timing_quality = (rise_time + fall_time) / 2
radius = np.sqrt((die_x - 15)**2 + (die_y - 15)**2)

# Yield with complex non-linear relationships
yield_score = (
    100
    - 0.3 * power
    - 0.02 * thermal_stress
    + 15 * vth
    - 0.5 * radius
    - 20 * leakage
    + 5 * noise_margin
    - 3 * np.abs(skew)
    - 2 * timing_quality
    # Interaction effects
    - 8 * (vdd > 1.3) * (vth < 0.38)
    - 5 * (temp > 95) * (freq > 2100)
    # Categorical effects
    - 3 * (lot_id == 'LOT_D')
    + 2 * (test_program == 'TP_V3')
    + np.random.normal(0, 3, n_devices)
)
yield_score = np.clip(yield_score, 60, 100)

# Create DataFrame
df = pd.DataFrame({
    'vdd': vdd, 'idd': idd, 'freq': freq, 'temp': temp, 'vth': vth,
    'leakage': leakage, 'rise_time': rise_time, 'fall_time': fall_time,
    'noise_margin': noise_margin, 'skew': skew,
    'die_x': die_x, 'die_y': die_y, 'wafer_id': wafer_id,
    'lot_id': lot_id, 'test_program': test_program,
    'yield': yield_score
})

print(f"Dataset: {df.shape[0]} devices, {df.shape[1]} columns")
print(f"\nYield statistics:")
print(df['yield'].describe())
print(f"\nFeature types:")
print(df.dtypes)

## 3. Baseline Model (Manual Tuning)

**Purpose:** Establish performance baseline before AutoML.

**Key Points:**
- **Simple preprocessing**: Basic encoding and scaling
- **Default hyperparameters**: RandomForest with sklearn defaults
- **No feature engineering**: Raw features only
- **Why this matters**: AutoML should significantly outperform this baseline

In [None]:
# Prepare data for baseline
df_baseline = df.copy()

# Simple one-hot encoding for categoricals
df_baseline = pd.get_dummies(df_baseline, columns=['wafer_id', 'lot_id', 'test_program'])

# Split
X = df_baseline.drop('yield', axis=1)
y = df_baseline['yield']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train baseline Random Forest (default parameters)
baseline_model = RandomForestRegressor(random_state=42, n_jobs=-1)
baseline_model.fit(X_train_scaled, y_train)

# Evaluate
y_pred_baseline = baseline_model.predict(X_test_scaled)
baseline_rmse = np.sqrt(mean_squared_error(y_test, y_pred_baseline))
baseline_r2 = r2_score(y_test, y_pred_baseline)

print("Baseline Model (Default Random Forest):")
print(f"  RMSE: {baseline_rmse:.3f}%")
print(f"  R²: {baseline_r2:.4f}")
print(f"\n  Hyperparameters: {baseline_model.get_params()}")
print(f"\n🎯 AutoML Goal: Beat R² = {baseline_r2:.4f}")

## 4. Method 1: Bayesian Optimization with Optuna

**Concept:** Use Bayesian optimization to efficiently search hyperparameter space.

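**Mathematics (Acquisition Function):** Gaussian-process-based Bayesian optimization often picks the next configuration by maximizing an upper confidence bound (UCB):
$$a(x) = \mu(x) + \kappa \sigma(x)$$

Where:
- $\mu(x)$ = surrogate model's predicted performance at configuration $x$
- $\sigma(x)$ = surrogate model's uncertainty at $x$
- $\kappa$ = exploration-exploitation trade-off (larger values favor exploration)

Optuna's default `TPESampler`, used below, follows a related idea with a different surrogate: it models the densities of good and bad configurations and samples where their ratio is most favorable, rather than evaluating a UCB directly.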

**Advantages:** Much faster than grid search, intelligent exploration
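
As a worked illustration of the acquisition function above, the snippet below scores four candidate configurations. The means and uncertainties are made-up numbers standing in for a surrogate model's predictions, and `kappa = 2.0` is an arbitrary choice.

```python
import numpy as np

def ucb(mu, sigma, kappa=2.0):
    """Upper confidence bound: a(x) = mu(x) + kappa * sigma(x)."""
    return np.asarray(mu, dtype=float) + kappa * np.asarray(sigma, dtype=float)

# Hypothetical surrogate predictions for four candidate configurations
mu = np.array([0.80, 0.78, 0.70, 0.60])      # expected R²
sigma = np.array([0.01, 0.05, 0.10, 0.02])   # predictive uncertainty

scores = ucb(mu, sigma, kappa=2.0)
next_idx = int(np.argmax(scores))            # configuration to evaluate next
greedy_idx = int(np.argmax(ucb(mu, sigma, kappa=0.0)))

print(next_idx)    # 2: lower mean, but the largest uncertainty bonus
print(greedy_idx)  # 0: with kappa = 0 the search only exploits
```

With `kappa = 2.0` the third candidate wins despite its lower mean, which is the exploration behaviour the formula is meant to encode.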

In [None]:
# Define objective function for Optuna
def objective(trial):
    # Suggest hyperparameters
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 300),
        'max_depth': trial.suggest_int('max_depth', 5, 30),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2', None]),
        'random_state': 42,
        'n_jobs': -1
    }
    
    # Train model
    model = RandomForestRegressor(**params)
    
    # Cross-validation score
    scores = cross_val_score(
        model, X_train_scaled, y_train,
        cv=3, scoring='r2', n_jobs=-1
    )
    
    return scores.mean()

# Create Optuna study
print("Starting Bayesian Optimization with Optuna...")
study = optuna.create_study(
    direction='maximize',
    sampler=optuna.samplers.TPESampler(seed=42)
)

# Optimize (50 trials)
study.optimize(objective, n_trials=50, show_progress_bar=True)

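print(f"\n✅ Optimization complete!")
print(f"\nBest trial:")
print(f"  Value (mean CV R²): {study.best_trial.value:.4f}")
# Note: this compares a cross-validation score with the baseline's test-set score,
# so treat it as a rough indication only.
print(f"  CV R² minus baseline test R²: {(study.best_trial.value - baseline_r2):.4f}")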
print(f"\nBest hyperparameters:")
for key, value in study.best_params.items():
    print(f"  {key}: {value}")

## 5. Train Best Model from Optuna

**Purpose:** Retrain with optimal hyperparameters on full training set.

**Key Points:**
- **Full training data**: Use all training samples (not just CV folds)
- **Test set evaluation**: Measure generalization performance
- **Compare to baseline**: Quantify AutoML improvement

In [None]:
# Train model with best hyperparameters
best_params = study.best_params.copy()
best_params.update({'random_state': 42, 'n_jobs': -1})

optuna_model = RandomForestRegressor(**best_params)
optuna_model.fit(X_train_scaled, y_train)

# Evaluate on test set
y_pred_optuna = optuna_model.predict(X_test_scaled)
optuna_rmse = np.sqrt(mean_squared_error(y_test, y_pred_optuna))
optuna_r2 = r2_score(y_test, y_pred_optuna)

print("Optuna-Optimized Model:")
print(f"  RMSE: {optuna_rmse:.3f}% (baseline: {baseline_rmse:.3f}%)")
print(f"  R²: {optuna_r2:.4f} (baseline: {baseline_r2:.4f})")
print(f"\n  Improvement:")
print(f"    RMSE reduction: {((baseline_rmse - optuna_rmse) / baseline_rmse * 100):.1f}%")
print(f"    R² increase: {(optuna_r2 - baseline_r2):.4f}")

## 6. Visualize Optuna Optimization

**Purpose:** Understand optimization process and parameter importance.

**Key Points:**
- **Optimization history**: How R² improved over trials
- **Parameter importance**: Which hyperparameters matter most
- **Parallel coordinate plot**: Visualize high-dimensional search

In [None]:
# Optimization history
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Optimization history
trial_numbers = [trial.number for trial in study.trials]
trial_values = [trial.value for trial in study.trials]
best_values = np.maximum.accumulate(trial_values)

axes[0].plot(trial_numbers, trial_values, 'o', alpha=0.5, label='Trial R²')
axes[0].plot(trial_numbers, best_values, 'r-', linewidth=2, label='Best R²')
axes[0].axhline(baseline_r2, color='gray', linestyle='--', label='Baseline')
axes[0].set_xlabel('Trial Number')
axes[0].set_ylabel('R² Score')
axes[0].set_title('Optuna Optimization History')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot 2: Parameter importance
importance = optuna.importance.get_param_importances(study)
params = list(importance.keys())
values = list(importance.values())

axes[1].barh(range(len(params)), values)
axes[1].set_yticks(range(len(params)))
axes[1].set_yticklabels(params)
axes[1].set_xlabel('Importance')
axes[1].set_title('Hyperparameter Importance')
axes[1].invert_yaxis()
axes[1].grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\n📊 Interpretation:")
print(f"  Most important hyperparameter: {params[0]}")
print(f"  Best value first reached at trial {trial_numbers[np.argmax(best_values)]}")
print(f"  Final improvement: {(max(trial_values) - baseline_r2):.4f} R² gain")

## 7. Method 2: TPOT (Genetic Programming)

**Concept:** Use genetic algorithms to evolve optimal ML pipelines.

**Process:**
1. Generate random pipelines (model + preprocessing + feature engineering)
2. Evaluate fitness (cross-validation score)
3. Select best pipelines
4. Mutate and crossover to create new generation
5. Repeat until convergence

**Advantage:** Discovers entire pipelines, not just hyperparameters
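
The five steps above can be sketched as a toy genetic algorithm. This is a simplified illustration under stated assumptions, not TPOT's implementation: TPOT evolves whole pipeline trees, whereas here each individual is just a hypothetical `(n_estimators, max_depth)` pair and `fitness` is a made-up function standing in for a cross-validation score.

```python
import random

def fitness(ind):
    """Toy fitness (higher is better); stands in for a cross-validation score."""
    n_estimators, max_depth = ind
    return -((n_estimators - 200) / 100) ** 2 - ((max_depth - 10) / 5) ** 2

def random_individual(rng):
    return (rng.randint(50, 300), rng.randint(2, 30))

def crossover(a, b, rng):
    """Take each gene from one of the two parents at random."""
    return tuple(rng.choice(pair) for pair in zip(a, b))

def mutate(ind, rng, rate=0.3):
    """With probability `rate`, nudge each gene and clamp it to its range."""
    n_estimators, max_depth = ind
    if rng.random() < rate:
        n_estimators = min(300, max(50, n_estimators + rng.randint(-30, 30)))
    if rng.random() < rate:
        max_depth = min(30, max(2, max_depth + rng.randint(-3, 3)))
    return (n_estimators, max_depth)

def evolve(generations=10, population_size=20, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population
    population = [random_individual(rng) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):                      # Step 5: repeat
        # Step 2: evaluate fitness and rank
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        # Step 3: keep the better half as parents
        parents = ranked[: population_size // 2]
        # Step 4: crossover and mutation create the rest of the next generation
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), best_per_generation

best, best_per_generation = evolve()
print(best, round(fitness(best), 4))
```

Because the parents survive unchanged into the next generation, the best fitness never decreases from one generation to the next.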

In [None]:
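# Prepare data for TPOT: unscaled, one-hot-encoded features (TPOT searches its own preprocessing).
# Cast to float so boolean dummy columns (pandas >= 2.0) do not produce an object-dtype array.
X_train_tpot = X_train.astype(float)
X_test_tpot = X_test.astype(float)
y_train_tpot = y_train.copy()
y_test_tpot = y_test.copy()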

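# Create TPOT regressor
# Note: these arguments (e.g. verbosity, config_dict) follow the classic TPOT API (releases before 1.0);
# newer releases changed the constructor, so check the installed version.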
print("Starting TPOT genetic programming search...")
print("This will search for optimal pipeline (model + preprocessing + features)\n")

tpot = TPOTRegressor(
    generations=5,  # Number of evolutionary iterations
    population_size=20,  # Number of pipelines per generation
    cv=3,
    scoring='r2',
    random_state=42,
    verbosity=2,
    n_jobs=-1,
    config_dict='TPOT light'  # Faster search space
)

# Fit TPOT (this will take a few minutes)
tpot.fit(X_train_tpot, y_train_tpot)

# Evaluate on test set
y_pred_tpot = tpot.predict(X_test_tpot)
tpot_rmse = np.sqrt(mean_squared_error(y_test_tpot, y_pred_tpot))
tpot_r2 = r2_score(y_test_tpot, y_pred_tpot)

print(f"\n✅ TPOT search complete!")
print(f"\nTPOT-Generated Pipeline:")
print(f"  RMSE: {tpot_rmse:.3f}%")
print(f"  R²: {tpot_r2:.4f}")
print(f"\n  Improvement over baseline: {(tpot_r2 - baseline_r2):.4f}")
print(f"\nOptimal pipeline found:")
print(tpot.fitted_pipeline_)

## 8. Export TPOT Pipeline Code

**Purpose:** TPOT can export optimal pipeline as Python code.

**Key Points:**
- **Reproducible**: Generated code runs independently
- **Transparent**: See exactly what TPOT discovered
- **Modifiable**: Engineers can tweak auto-generated code

In [None]:
# Export TPOT pipeline to Python script
tpot.export('tpot_yield_prediction_pipeline.py')

print("✅ TPOT pipeline exported to: tpot_yield_prediction_pipeline.py")
print("\nYou can now:")
print("  1. Review the auto-generated code")
print("  2. Integrate it into production systems")
print("  3. Modify it based on domain knowledge")
print("  4. Version control it like any other code")

# Show what was discovered
print("\n📋 TPOT discovered:")
print(f"  Algorithm: {type(tpot.fitted_pipeline_.steps[-1][1]).__name__}")
print(f"  Preprocessing steps: {len(tpot.fitted_pipeline_.steps) - 1}")
print(f"  Full pipeline: {tpot.fitted_pipeline_}")

## 9. Method 3: Scikit-Optimize BayesSearchCV

**Concept:** Bayesian optimization integrated with sklearn's CV interface.

**Advantage:** Familiar sklearn API + efficient Bayesian search

In [None]:
# Define search space for Gradient Boosting
search_spaces = {
    'n_estimators': Integer(50, 300),
    'max_depth': Integer(3, 15),
    'learning_rate': Real(0.01, 0.3, prior='log-uniform'),
    'min_samples_split': Integer(2, 20),
    'min_samples_leaf': Integer(1, 10),
    'subsample': Real(0.6, 1.0),
}

# BayesSearchCV
print("Starting BayesSearchCV for Gradient Boosting...\n")

bayes_search = BayesSearchCV(
    GradientBoostingRegressor(random_state=42),
    search_spaces,
    n_iter=30,
    cv=3,
    scoring='r2',
    n_jobs=-1,
    random_state=42,
    verbose=1
)

bayes_search.fit(X_train_scaled, y_train)

# Evaluate
y_pred_bayes = bayes_search.predict(X_test_scaled)
bayes_rmse = np.sqrt(mean_squared_error(y_test, y_pred_bayes))
bayes_r2 = r2_score(y_test, y_pred_bayes)

print(f"\n✅ BayesSearchCV complete!")
print(f"\nGradient Boosting (Optimized):")
print(f"  RMSE: {bayes_rmse:.3f}%")
print(f"  R²: {bayes_r2:.4f}")
print(f"\n  Improvement over baseline: {(bayes_r2 - baseline_r2):.4f}")
print(f"\nBest hyperparameters:")
for key, value in bayes_search.best_params_.items():
    print(f"  {key}: {value}")

## 10. Compare All AutoML Methods

In [None]:
# Create comparison DataFrame
results = pd.DataFrame({
    'Method': ['Baseline (Manual)', 'Optuna (RF)', 'TPOT (Auto)', 'BayesSearchCV (GB)'],
    'RMSE': [baseline_rmse, optuna_rmse, tpot_rmse, bayes_rmse],
    'R²': [baseline_r2, optuna_r2, tpot_r2, bayes_r2],
    # Rough wall-clock estimates, not measured in this run
    'Approx. Time': ['1 min', '~5 min', '~10 min', '~8 min'],
    'Automation': ['None', 'Hyperparams', 'Full Pipeline', 'Hyperparams']
})

results['R² Improvement'] = results['R²'] - baseline_r2
results['RMSE Reduction %'] = ((baseline_rmse - results['RMSE']) / baseline_rmse * 100).round(1)

print("AutoML Method Comparison:")
print(results.to_string(index=False))

# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: R² comparison
colors = ['gray', 'blue', 'green', 'orange']
axes[0].bar(range(len(results)), results['R²'], color=colors, alpha=0.7)
axes[0].set_xticks(range(len(results)))
axes[0].set_xticklabels(results['Method'], rotation=45, ha='right')
axes[0].set_ylabel('R² Score')
axes[0].set_title('Model Performance Comparison')
axes[0].axhline(baseline_r2, color='red', linestyle='--', linewidth=2, label='Baseline')
axes[0].legend()
axes[0].grid(axis='y', alpha=0.3)

# Plot 2: Improvement over baseline
axes[1].bar(range(1, len(results)), results['R² Improvement'][1:], color=colors[1:], alpha=0.7)
axes[1].set_xticks(range(1, len(results)))
axes[1].set_xticklabels(results['Method'][1:], rotation=45, ha='right')
axes[1].set_ylabel('R² Improvement')
axes[1].set_title('AutoML Improvement over Baseline')
axes[1].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Summary:")
best_method_idx = results['R²'].idxmax()
print(f"  Best method: {results.loc[best_method_idx, 'Method']}")
print(f"  Best R²: {results.loc[best_method_idx, 'R²']:.4f}")
print(f"  Total improvement: {results.loc[best_method_idx, 'R² Improvement']:.4f}")
print(f"  RMSE reduction: {results.loc[best_method_idx, 'RMSE Reduction %']:.1f}%")

## 11. Neural Architecture Search (Conceptual)

**Concept:** Automate neural network design (depth, width, connections).

**NAS Approaches:**
1. **Reinforcement Learning**: Train RL agent to design architectures
2. **Evolutionary Algorithms**: Evolve network topologies
3. **Gradient-Based**: DARTS (Differentiable Architecture Search)

**Note:** Full NAS requires significant compute (GPUs). Here we show conceptual framework.

In [None]:
# Conceptual NAS workflow (pseudo-code style)
print("Neural Architecture Search Workflow:")
print("\n1. Define Search Space:")
print("   - Number of layers: [2, 3, 4, 5]")
print("   - Hidden units per layer: [32, 64, 128, 256]")
print("   - Activation functions: ['relu', 'tanh', 'elu']")
print("   - Dropout rates: [0.0, 0.2, 0.4]")
print("   - Skip connections: [True, False]")
print("\n2. Search Strategy:")
print("   - Random search: Sample 50 architectures")
print("   - Evolutionary: Evolve over 20 generations")
print("   - RL: Train controller for 100 episodes")
print("\n3. Performance Estimation:")
print("   - Train each architecture for 10 epochs")
print("   - Evaluate on validation set")
print("   - Record validation loss")
print("\n4. Select Best Architecture:")
print("   - Rank by validation performance")
print("   - Retrain top-3 from scratch")
print("   - Choose best on test set")

# Example architecture discovered by NAS (hypothetical)
print("\n🏗️ Example NAS-Discovered Architecture:")
print("  Input (15 features)")
print("  → Dense(128, relu) + Dropout(0.2)")
print("  → Dense(64, relu) + Dropout(0.2)")
print("  → Dense(32, elu)")
print("  → Skip connection from input")
print("  → Dense(1, linear) [output]")
print("\n  Training: Adam optimizer, lr=0.001, batch_size=32")
print("  Performance: R² = 0.92 (hypothetical)")

print("\n💡 For production NAS:")
print("  - Starting points: AutoKeras (library), NAS-Bench-101 (benchmark), ENAS (algorithm)")
print("  - Requires: GPU cluster, 8-24 hours compute time")
print("  - Best for: Image/text tasks where architecture matters most")
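
The workflow printed above can be tried at toy scale with random search over small multilayer perceptrons. This is a minimal sketch under stated assumptions, not a production NAS method: it uses scikit-learn's `MLPRegressor` (which has no dropout or skip connections, so those dimensions are left out), a small synthetic dataset, and a short training run as the cheap performance estimate. The helper names `sample_architecture` and `estimate_performance` are hypothetical.

```python
import random
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Short training runs are intentional here, so silence convergence warnings
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Small synthetic regression problem (illustrative only)
rng_np = np.random.default_rng(0)
X = rng_np.normal(size=(400, 8))
y = X[:, 0] * X[:, 1] + np.sin(X[:, 2]) + 0.1 * rng_np.normal(size=400)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

def sample_architecture(rng):
    """1. Search space: depth, width per layer, activation."""
    n_layers = rng.choice([2, 3, 4])
    return {
        "hidden_layer_sizes": tuple(rng.choice([32, 64, 128]) for _ in range(n_layers)),
        "activation": rng.choice(["relu", "tanh"]),
    }

def estimate_performance(arch):
    """3. Performance estimation: short training run, then validation R²."""
    model = MLPRegressor(max_iter=50, random_state=0, **arch)
    model.fit(X_tr, y_tr)
    return model.score(X_val, y_val)

# 2. Search strategy: plain random search over a handful of architectures
rng = random.Random(0)
trials = []
for _ in range(5):
    arch = sample_architecture(rng)
    trials.append((estimate_performance(arch), arch))

# 4. Select the best architecture by validation score
best_score, best_arch = max(trials, key=lambda t: t[0])
print(best_arch, round(best_score, 3))
```

In a real study the selected architecture would then be retrained from scratch with a full training budget and checked once on a held-out test set.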

## 12. AutoML Best Practices & Pitfalls

In [None]:
print("AutoML Best Practices:")
print("\n✅ DO:")
print("  1. Start with strong baseline - know what 'good' performance looks like")
print("  2. Set realistic time budgets - diminishing returns after certain point")
print("  3. Use proper CV - prevent overfitting during search")
print("  4. Validate on holdout - AutoML can overfit to validation set")
print("  5. Inspect results - don't blindly trust AutoML output")
print("  6. Check interpretability - ensure model makes domain sense")
print("  7. Monitor in production - AutoML models drift like any other")

print("\n❌ DON'T:")
print("  1. Use AutoML as black box - understand what it's optimizing")
print("  2. Ignore compute cost - some methods very expensive")
print("  3. Skip feature engineering - AutoML works better with good features")
print("  4. Forget domain knowledge - AutoML finds correlations, not causation")
print("  5. Optimize wrong metric - choose metric aligned with business goal")
print("  6. Trust first result - run multiple seeds, ensemble top models")
print("  7. Overfit search space - too many options = overfitting")

print("\n⚠️ Common Pitfalls:")
print("  • Data leakage: AutoML can exploit leaks you didn't notice")
print("  • Overfitting: Optimizing too long on same validation set")
print("  • Computational waste: Search space too large, inefficient")
print("  • Unstable models: High variance across different runs")
print("  • Poor generalization: Train/test distribution mismatch")

print("\n🎯 Semiconductor-Specific Tips:")
print("  • Lot-grouped CV: Keep each lot entirely in train or validation (e.g. GroupKFold)")
print("  • Physics constraints: Validate AutoML features make engineering sense")
print("  • Test coverage: Don't remove tests just because AutoML says they're unimportant")
print("  • Interpretability: Fab engineers must understand model decisions")
print("  • Stability: Production models must be stable across lots/weeks")
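
The tip about preserving lot structure in cross-validation can be made concrete with scikit-learn's `GroupKFold`, which keeps every sample from a given group in the same fold. The data below is synthetic and the lot labels are hypothetical; this is a sketch of the splitting step only.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n_devices = 200
X = rng.normal(size=(n_devices, 3))
y = rng.normal(size=n_devices)
lots = rng.choice(["LOT_A", "LOT_B", "LOT_C", "LOT_D"], size=n_devices)

cv = GroupKFold(n_splits=4)
for fold, (train_idx, val_idx) in enumerate(cv.split(X, y, groups=lots)):
    train_lots = set(lots[train_idx])
    val_lots = set(lots[val_idx])
    # No lot ever appears on both sides of a split
    assert train_lots.isdisjoint(val_lots)
    print(fold, sorted(val_lots))
```

The same splitter can be handed to a search, for example `cross_val_score(model, X, y, cv=cv, groups=lots)`, so that the score being optimized reflects performance on unseen lots.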

## 13. Project Templates

### Project 1: End-to-End AutoML Yield Prediction System
**Objective:** Build production AutoML pipeline for multi-product yield prediction
- Collect STDF data for 10 different product families
- Use TPOT to generate custom pipeline per product
- Export each pipeline as deployable Python script
- Create monitoring dashboard tracking AutoML model performance
- **Success Metric:** Deploy 10 models in 1 week (vs 10 weeks manual), maintain >90% R² across all products

### Project 2: Hyperparameter Optimization as a Service
**Objective:** Create internal tool for test engineers to optimize their models
- Build API accepting dataset + model type + time budget
- Use Optuna backend for Bayesian optimization
- Return best hyperparameters + performance report + optimization plots
- Track all optimizations in database for knowledge sharing
- **Success Metric:** 20+ engineers using tool monthly, average 15% R² improvement per optimization

### Project 3: NAS for Wafer Map Defect Classification
**Objective:** Find optimal CNN architecture for spatial pattern recognition
- Dataset: 50K wafer maps labeled with defect types (scratch, ring, edge)
- Search space: Layer depth [3-8], filters [16-128], kernel sizes [3,5,7]
- Use evolutionary algorithm with 100 generations
- Compare NAS result to standard architectures (ResNet, VGG, MobileNet)
- **Success Metric:** >95% classification accuracy, <10ms inference time, deployable to edge devices

### Project 4: Multi-Objective AutoML for Test Time vs Accuracy
**Objective:** Pareto-optimal models balancing prediction quality and feature cost
- Define cost per test parameter (ATE time in ms)
- Use NSGA-II multi-objective optimization
- Optimize: maximize R², minimize total test time
- Generate Pareto front of models (accuracy vs cost trade-offs)
- **Success Metric:** 10+ Pareto-optimal models, management chooses based on cost constraints
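
To make the Pareto front in Project 4 concrete, the sketch below filters a set of candidate models down to the non-dominated ones. It only performs the filtering step; NSGA-II is a search algorithm that generates candidates and approximates this front, which is not shown here. The R² values and test times are hypothetical.

```python
import numpy as np

def pareto_front(r2, test_time_ms):
    """Return indices of models not dominated by any other model.

    Model j dominates model i if it is at least as good on both objectives
    (higher R², lower test time) and strictly better on at least one.
    """
    r2 = np.asarray(r2, dtype=float)
    cost = np.asarray(test_time_ms, dtype=float)
    keep = []
    for i in range(len(r2)):
        at_least_as_good = (r2 >= r2[i]) & (cost <= cost[i])
        strictly_better = (r2 > r2[i]) | (cost < cost[i])
        if not np.any(at_least_as_good & strictly_better):
            keep.append(i)
    return keep

# Hypothetical candidate models: R² and total test time in ms
r2 = [0.95, 0.93, 0.90, 0.89, 0.85]
test_time_ms = [120, 80, 85, 40, 60]

front = pareto_front(r2, test_time_ms)
print(front)  # [0, 1, 3]
```

Model 2 is dropped because model 1 is both more accurate and faster, and model 4 is dropped because model 3 is; the three survivors are the accuracy vs cost trade-offs a decision-maker would choose among.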

### Project 5: AutoML Model Ensemble
**Objective:** Combine multiple AutoML methods for robust predictions
- Run Optuna, TPOT, BayesSearchCV independently
- Ensemble top-5 models from each method (15 models total)
- Use stacking or weighted averaging
- Compare ensemble to individual best model
- **Success Metric:** Ensemble outperforms any single model by ≥2% R², lower variance across lots

### Project 6: Feature Engineering AutoML
**Objective:** Automate discovery of optimal feature transformations
- Search space: Polynomial degrees [1-3], log transforms, interactions, binning
- Use genetic programming to evolve feature engineering pipelines
- Evaluate: Feature importance + model performance + interpretability score
- Generate human-readable feature engineering code
- **Success Metric:** Discover 5+ non-obvious features improving R² >5%, validated by engineers

### Project 7: AutoML with Interpretability Constraints
**Objective:** Optimize for performance AND explainability
- Define interpretability metric: max tree depth, number of features, coefficient sparsity
- Multi-objective optimization: R² vs interpretability
- Reject models failing SHAP sanity checks (unphysical feature importance)
- Generate model cards with auto-computed explanations
- **Success Metric:** Models within 3% of best R² but 10x more interpretable

### Project 8: Continuous AutoML for Production
**Objective:** AutoML that adapts to data drift automatically
- Monitor model performance weekly on new production data
- Trigger AutoML re-optimization when R² drops >5%
- A/B test new AutoML model vs current production model
- Auto-deploy if new model wins A/B test
- **Success Metric:** Zero manual model retraining, automated adaptation to process changes
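
The re-optimization trigger in Project 8 could be as simple as the check below. This is a sketch under one stated assumption: "drops >5%" is read as a relative drop against the R² recorded at deployment (an absolute drop of 0.05 would be an equally valid reading). The function name and the weekly numbers are hypothetical.

```python
def should_reoptimize(baseline_r2, recent_r2, max_relative_drop=0.05):
    """Return True when recent R² has fallen by more than the allowed fraction."""
    if baseline_r2 <= 0:
        raise ValueError("baseline_r2 must be positive")
    relative_drop = (baseline_r2 - recent_r2) / baseline_r2
    return relative_drop > max_relative_drop

# Hypothetical weekly R² values after deploying a model with R² = 0.92
deployed_r2 = 0.92
weekly_r2 = [0.91, 0.90, 0.88, 0.84]

flags = [should_reoptimize(deployed_r2, r2) for r2 in weekly_r2]
print(flags)  # [False, False, False, True]
```

In practice the trigger would start a new search and hand the result to the A/B test rather than deploying it directly.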

## 🎓 Key Takeaways

**When to Use AutoML:**
- ✅ **Time-constrained projects**: Need good model fast (hours, not weeks)
- ✅ **Multiple similar problems**: Same AutoML pipeline works across products
- ✅ **Non-expert users**: Enable domain experts to build ML models
- ✅ **Baseline establishment**: Quickly find strong starting point for manual tuning
- ✅ **Exploration**: Discover non-obvious model/hyperparameter combinations

**When NOT to Use AutoML:**
- ❌ **Novel problems**: Highly specialized tasks needing custom architectures
- ❌ **Interpretability critical**: Regulatory requirements for full transparency
- ❌ **Limited compute**: AutoML search expensive (GPU hours, cloud costs)
- ❌ **Small datasets**: AutoML overfits easily with <1000 samples
- ❌ **Production constraints**: Strict latency/memory limits AutoML may violate

**Method Selection Guide:**
- **Optuna**: Best for hyperparameter-only optimization, fastest, sklearn-friendly
- **TPOT**: Best for discovering full pipelines, more exploration, takes longer
- **BayesSearchCV**: Best for sklearn users wanting easy Bayesian optimization
- **NAS**: Best for deep learning, image/text tasks, requires GPU cluster
- **Cloud AutoML**: Best for production scale, enterprise features, managed service

**Limitations:**
- ⚠️ **No free lunch**: AutoML can't compensate for bad data quality
- ⚠️ **Computational cost**: A search trains tens to hundreds of candidate models, so it can cost 10-100x the compute of fitting one hand-picked configuration
- ⚠️ **Overfitting risk**: Optimizing too long on validation set causes overfitting
- ⚠️ **Black box danger**: Must inspect and validate AutoML outputs
- ⚠️ **Reproducibility**: Different runs may find different "optimal" models

**Best Practices:**
1. **Start simple**: Baseline → Optuna → TPOT (progressive complexity)
2. **Set time budgets**: Diminishing returns after initial exploration phase
3. **Proper validation**: Holdout test set never touched during AutoML search
4. **Ensemble top models**: Average top-3 often better than single best
5. **Domain validation**: Check if AutoML features make engineering sense
6. **Monitor in production**: AutoML models drift just like manual models
7. **Version control**: Export and save all AutoML-generated code
8. **Document search space**: Record what was optimized for reproducibility

**Semiconductor Production Checklist:**
- [ ] Lot-grouped cross-validation (never split a lot across train and validation folds)
- [ ] Physics-based feature validation (SHAP values make sense)
- [ ] Interpretability requirements met (engineers can explain decisions)
- [ ] Inference time acceptable (<100ms for real-time, <1s for batch)
- [ ] Model stability across lots/weeks (low variance)
- [ ] A/B testing plan (validate AutoML vs current production model)
- [ ] Monitoring dashboard (track performance drift)
- [ ] Rollback plan (revert if AutoML model fails)

**Next Steps:**
- Study **106: A/B Testing** to validate AutoML models in production
- Explore **107: Model Monitoring** for continuous performance tracking
- Experiment with cloud AutoML platforms (Google, AWS, Azure)
- Read *Automated Machine Learning: Methods, Systems, Challenges* (Hutter, Kotthoff, Vanschoren; open-access book)
- Explore advanced NAS resources: AutoKeras (library), NAS-Bench-101 (benchmark), Once-for-All Networks
- Build internal AutoML platform for your organization