# 136: CI/CD for ML - Tekton and GitHub Actions

## 🎯 Learning Objectives

By the end of this notebook, you will:
- **Understand** CI/CD principles for ML (data validation, model training, quality gates, deployment)
- **Build** Tekton pipelines for Kubernetes-native ML workflows (parallel tasks, GPU scheduling, artifact passing)
- **Implement** GitHub Actions workflows for cloud-based ML automation (matrix builds, artifact caching, secrets management)
- **Apply** CI/CD to post-silicon validation (automated STDF parsing, yield model retraining, canary deployments)
- **Master** MLOps patterns (experiment tracking, model registry, GitOps, monitoring)
- **Deploy** production ML systems with quality gates and automated rollback

## 📚 What is CI/CD for ML?

**CI/CD (Continuous Integration/Continuous Deployment)** for ML extends traditional software CI/CD with ML-specific stages: data validation, model training, model evaluation, and model registry. Unlike traditional CI/CD that focuses on code testing and deployment, **ML CI/CD treats data and models as first-class citizens** requiring versioning, validation, and monitoring.

Traditional software CI/CD pipeline:
```
Code → Unit Tests → Integration Tests → Build → Deploy → Monitor
```

ML CI/CD pipeline:
```
Code + Data → Schema Validation → Model Training → Evaluation → Quality Gates →
Model Registry → Canary Deployment → Full Deployment → Drift Monitoring
```

**Key differences:**
- **Data Validation**: Check data schema, quality, distribution shifts (prevent training on corrupt data)
- **Model Training**: Reproducible pipelines with versioned data, code, and hyperparameters
- **Quality Gates**: Deploy only if model beats baseline accuracy + passes latency thresholds
- **Model Registry**: Version control for models (MLflow, DVC) with metadata and lineage
- **Canary Deployments**: Gradual rollout (10% → 25% → 100% traffic) with automated rollback
- **Drift Monitoring**: Track model performance degradation, trigger retraining when accuracy drops
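Checking for "distribution shifts" and "drift" needs a concrete statistic. One common choice (an assumption here; the notebook does not prescribe one) is the Population Stability Index (PSI), which compares a feature's binned distribution at training time with fresh data. The helper `population_stability_index` is hypothetical, and the 0.1 / 0.25 cut-offs are a widely used rule of thumb rather than a standard.

```python
import numpy as np

def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample (e.g. training data) and a new sample.

    PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the
    fractions of each sample in bin i. Bins are deciles of the reference sample.
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior bin edges from reference quantiles; the two outer bins are open-ended
    inner_edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1)[1:-1])
    e_counts = np.bincount(np.searchsorted(inner_edges, expected, side="right"), minlength=n_bins)
    a_counts = np.bincount(np.searchsorted(inner_edges, actual, side="right"), minlength=n_bins)
    # Clip empty bins to eps so the log term stays finite
    e_frac = np.clip(e_counts / len(expected), eps, None)
    a_frac = np.clip(a_counts / len(actual), eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(42)
train_voltage = rng.normal(1.10, 0.02, 5000)  # synthetic reference sample
new_voltage = rng.normal(1.13, 0.02, 5000)    # synthetic sample shifted by 1.5 sigma

psi = population_stability_index(train_voltage, new_voltage)
status = "stable" if psi < 0.1 else "moderate shift" if psi < 0.25 else "significant shift"
print(f"PSI = {psi:.3f} -> {status}")
```

A pipeline would compute this per feature before training and fail the data-validation stage (or trigger retraining, for monitoring) when any feature crosses the chosen threshold.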

**Why CI/CD for ML?**
- ✅ **Reproducibility**: Retrain exact same model 6 months later (versioned data + code + hyperparameters)
- ✅ **Quality**: Automated quality gates prevent deploying worse models (accuracy regression, latency spikes)
- ✅ **Speed**: Automated pipelines reduce deployment time from days to hours
- ✅ **Safety**: Canary deployments catch issues before affecting all users (gradual rollout)
- ✅ **Monitoring**: Continuous model performance tracking detects drift early
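Reproducibility comes down to pinning every input of a training run. A minimal sketch, assuming a run is fully described by the raw data bytes, the Git commit, the hyperparameters, and a random seed: hash them into one fingerprint stored alongside the model. `run_fingerprint` is a hypothetical helper, not part of MLflow or DVC.

```python
import hashlib
import json

def run_fingerprint(data_bytes: bytes, git_commit: str, hyperparams: dict, seed: int) -> str:
    """Return a SHA-256 hex digest identifying one training run's inputs."""
    record = {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),  # data version
        "git_commit": git_commit,                               # code version
        "hyperparams": hyperparams,                             # configuration
        "seed": seed,                                           # randomness
    }
    # sort_keys (applied at every nesting level) makes the digest independent of dict order
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

stdf_bytes = b"wafer_id,voltage_v\nW0001,1.10\n"  # stand-in for a real data file
fp_a = run_fingerprint(stdf_bytes, "a3f2c1b", {"n_estimators": 100, "max_depth": 10}, seed=42)
fp_b = run_fingerprint(stdf_bytes, "a3f2c1b", {"max_depth": 10, "n_estimators": 100}, seed=42)
fp_c = run_fingerprint(stdf_bytes, "a3f2c1b", {"n_estimators": 200, "max_depth": 10}, seed=42)
print(fp_a == fp_b, fp_a == fp_c)
```

Reordering the hyperparameters leaves the fingerprint unchanged, while changing any value produces a new one, so two runs with the same fingerprint should be rebuildable from the same inputs.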

## 🏭 Post-Silicon Validation Use Cases

### **Use Case 1: Automated Yield Prediction Model Retraining**
- **Input**: Daily STDF wafer test data (5K devices, 50+ parametric measurements per device)
- **Pipeline**: Data validation → Feature engineering → Model training (RandomForest) → Evaluation vs baseline → Canary deployment (10% fab traffic)
- **Quality Gate**: Deploy only if accuracy ≥ baseline + 1 percentage point (e.g., 97% new model vs 96% baseline)
- **Value**: Continuous model improvement from fresh data → 0.3% accuracy gain → $420K/year savings (fewer false positives in yield prediction)
- **Automation**: A Kubernetes CronJob triggers the Tekton pipeline daily at 2 AM; training and deployment complete in 30 minutes

### **Use Case 2: STDF Data Pipeline with Quality Gates**
- **Input**: STDF parser library code changes (new features, bug fixes)
- **Pipeline**: Unit tests (500+ test cases) → Integration tests (parse real STDF files) → Performance tests (<5s for 10K records) → Security scan (CVE check)
- **Quality Gate**: PR can merge only if all tests pass and code coverage is ≥90%
- **Value**: Zero STDF parsing bugs in production → $180K/year savings (avoided engineering time debugging corrupt data)
- **Automation**: GitHub Actions workflow triggers on every PR, provides fast feedback in 8 minutes
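The Use Case 2 merge rule can be expressed as one gate function over the results the CI jobs collect. This is a minimal sketch: `PRCheckResults`, `pr_gate`, and treating any critical CVE as blocking are assumptions for illustration, not a GitHub Actions API.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PRCheckResults:
    """Hypothetical summary of one PR's CI jobs."""
    tests_failed: int
    line_coverage: float       # fraction of lines covered, e.g. 0.92
    parse_seconds_10k: float   # wall time to parse 10K STDF records
    critical_cves: int         # findings from the security scan

def pr_gate(r: PRCheckResults, min_coverage: float = 0.90,
            max_parse_seconds: float = 5.0) -> Tuple[bool, List[str]]:
    """Return (mergeable, reasons); every failed check adds one reason."""
    reasons = []
    if r.tests_failed > 0:
        reasons.append(f"{r.tests_failed} test(s) failed")
    if r.line_coverage < min_coverage:
        reasons.append(f"coverage {r.line_coverage:.1%} < {min_coverage:.0%}")
    if r.parse_seconds_10k >= max_parse_seconds:
        reasons.append(f"parse time {r.parse_seconds_10k:.1f}s >= {max_parse_seconds:.0f}s")
    if r.critical_cves > 0:
        reasons.append(f"{r.critical_cves} critical CVE(s)")
    return (len(reasons) == 0, reasons)

print(pr_gate(PRCheckResults(tests_failed=0, line_coverage=0.93, parse_seconds_10k=3.2, critical_cves=0)))
print(pr_gate(PRCheckResults(tests_failed=2, line_coverage=0.87, parse_seconds_10k=3.2, critical_cves=0)))
```

Collecting all reasons instead of stopping at the first failure gives the PR author the full picture in one run. In a workflow, a final step would exit non-zero when `mergeable` is False, which is what a required status check keys on.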

### **Use Case 3: Canary Deployment for Wafer Defect Analyzer**
- **Input**: New CNN model (ResNet-50) for wafer defect detection (98% accuracy vs 95% rule-based baseline)
- **Pipeline**: Train CNN on 100K wafer images → Validate on held-out test set → Deploy to staging → Canary (5% production traffic) → Monitor false positive rate for 24 hours → Increase to 25%, then 100%
- **Quality Gate**: Rollback if false positive rate >2% or inference latency >500ms
- **Value**: Reduce defect escape rate by 30% → $2.1M/year savings (fewer bad dies shipped to customers)
- **Automation**: Tekton pipeline + Flagger (automated canary) + ArgoCD (GitOps deployment)
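The Use Case 3 rollout (5% → 25% → 100%, rolling back on FPR > 2% or latency > 500ms) can be sketched as a loop over traffic weights. A tool like Flagger automates this against live metrics; the `observe` hook below is a hypothetical stand-in for "shift traffic, wait out the soak window, read metrics", not Flagger's API.

```python
from typing import Callable, Dict, Sequence, Tuple

def false_positive_rate(fp: int, tn: int) -> float:
    """FPR = FP / (FP + TN): share of good dies wrongly flagged as defective."""
    return fp / (fp + tn) if (fp + tn) > 0 else 0.0

def run_canary(observe: Callable[[int], Dict[str, float]],
               steps: Sequence[int] = (5, 25, 100),
               max_fpr: float = 0.02,
               max_latency_ms: float = 500.0) -> Tuple[str, int]:
    """Step traffic through `steps`; roll back at the first step that breaches a limit.

    `observe(weight)` (hypothetical) sends `weight`% of traffic to the canary and
    returns {"fp": ..., "tn": ..., "latency_ms": ...} measured at that weight.
    """
    for weight in steps:
        m = observe(weight)
        fpr = false_positive_rate(int(m["fp"]), int(m["tn"]))
        if fpr > max_fpr or m["latency_ms"] > max_latency_ms:
            return ("rolled_back", weight)
    return ("promoted", steps[-1])

def healthy(weight: int) -> Dict[str, float]:
    return {"fp": 12, "tn": 1988, "latency_ms": 180.0}  # FPR = 0.6%

def slow_at_25(weight: int) -> Dict[str, float]:
    return {"fp": 12, "tn": 1988, "latency_ms": 620.0 if weight >= 25 else 180.0}

print(run_canary(healthy))     # ('promoted', 100)
print(run_canary(slow_at_25))  # ('rolled_back', 25)
```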

### **Use Case 4: Multi-Stage ML Pipeline with Experiment Tracking**
- **Input**: Hyperparameter tuning for yield prediction (20 model variants: 2 algorithms (RandomForest, GradientBoosting) × 10 hyperparameter combinations)
- **Pipeline**: Parallel training (5 Tekton tasks with GPUs) → MLflow logs all experiments (hyperparameters, metrics, artifacts) → Select best model by F1 score → Register in MLflow Model Registry → Deploy via ArgoCD
- **Quality Gate**: Model F1 score ≥0.96 required for production deployment
- **Value**: Find optimal hyperparameters in 2 hours (vs 2 days manual tuning) → $95K/year engineering time savings
- **Automation**: GitHub Actions triggers Tekton pipeline on git push to main branch
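The 20-variant search in Use Case 4 and its "best F1, then gate at 0.96" rule can be sketched as follows. The specific `n_estimators`/`max_depth` values are hypothetical (the use case only says 10 combinations), and the F1 scores are illustrative placeholders, not measurements.

```python
import itertools

algorithms = ["RandomForest", "GradientBoosting"]
# Hypothetical search space: 5 n_estimators values x 2 max_depth values = 10 combinations
grid = [{"n_estimators": n, "max_depth": d}
        for n, d in itertools.product([50, 100, 200, 300, 400], [6, 10])]
variants = [(algo, params) for algo in algorithms for params in grid]
print(len(variants))  # 20

def select_best(results, min_f1=0.96):
    """Pick the run with the highest F1 and apply the production gate.

    `results` is a list of dicts like {"run_id": ..., "f1": ...}, e.g. exported
    from an experiment tracker after all parallel training tasks finish.
    """
    best = max(results, key=lambda r: r["f1"])
    return best, best["f1"] >= min_f1

# Illustrative scores only
results = [{"run_id": f"run-{i:02d}", "f1": f1}
           for i, f1 in enumerate([0.951, 0.962, 0.958, 0.967, 0.944])]
best, deployable = select_best(results)
print(best["run_id"], best["f1"], deployable)  # run-03 0.967 True
```

Selecting first and gating second keeps the two decisions separate: a search can still pick a "best" run that is not good enough to ship. On ties, `max` returns the first run encountered.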

## 🔄 CI/CD Workflow

```mermaid
graph TB
    A[Code + Data Change] --> B{CI Pipeline}
    B --> C[Data Validation]
    C --> D{Schema Valid?}
    D -->|Yes| E[Model Training]
    D -->|No| F[Pipeline Failed]
    E --> G[Model Evaluation]
    G --> H{Beats Baseline?}
    H -->|Yes| I[Model Registry]
    H -->|No| F
    I --> J{CD Pipeline}
    J --> K[Canary Deployment<br/>10% Traffic]
    K --> L[Monitor 24h]
    L --> M{Metrics OK?}
    M -->|Yes| N[Full Deployment<br/>100% Traffic]
    M -->|No| O[Rollback]
    N --> P[Drift Monitoring]
    
    style A fill:#e1f5ff
    style F fill:#ffe1e1
    style N fill:#e1ffe1
    style O fill:#fff5e1
```

## 📊 Learning Path Context

**Prerequisites:**
- **Notebooks 121-130**: MLOps fundamentals (model serving, feature stores, experiment tracking)
- **Notebook 131**: Docker for ML (containerization, multi-stage builds, GPU support)
- **Notebooks 132-133**: Kubernetes for ML (deployments, services, resource management, autoscaling)
- **Notebook 134**: Service Mesh (traffic management, observability, resilience)
- **Notebook 135**: GitOps with ArgoCD and Flux (declarative deployments, automated sync)

**Next Steps:**
- **Notebook 137**: Infrastructure as Code - Terraform and Pulumi (automate cloud resource provisioning)
- **Notebook 138**: Container Security & Compliance (image scanning, runtime security, network policies)

---

Let's build production-grade CI/CD pipelines for ML! 🚀

In [None]:
# Setup and Imports
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from pathlib import Path
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Tuple, Any
from enum import Enum
import json
import time
import uuid
import hashlib

# Visualization settings
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

# Random seed for reproducibility
np.random.seed(42)

print("✅ Setup complete - Ready for CI/CD ML pipeline simulation")

## 2. 🔧 CI/CD Fundamentals for ML - Data Validation and Model Gates

### 📝 What's Happening in This Section?

**Purpose:** Implement ML-specific CI/CD stages: data validation (schema checks), model training (reproducible builds), and quality gates (accuracy thresholds).

**Key Points:**
- **Data Validation**: Check input data schema (required columns, data types, value ranges) before training
- **Model Training**: Reproducible training pipeline (versioned data + code + hyperparameters)
- **Model Evaluation**: Compare new model vs baseline (deploy only if accuracy improves)
- **Quality Gates**: Block deployment if model fails thresholds (accuracy <99%, latency >150ms)
- **Model Registry**: Version control for models (MLflow, DVC) with metadata (accuracy, training time)
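To make "version control for models" concrete without depending on MLflow or DVC, here is a toy in-memory registry. `InMemoryRegistry` is hypothetical, not either tool's API: each registration gets an immutable, increasing version number with its metadata, and a separate production pointer records which version serves traffic, so a rollback is just moving the pointer.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional

@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    git_commit: str
    metrics: Dict[str, float]
    registered_at: str

class InMemoryRegistry:
    """Toy registry: append-only numbered versions plus a movable production pointer."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}
        self._production: Dict[str, int] = {}

    def register(self, name: str, git_commit: str, metrics: Dict[str, float]) -> ModelVersion:
        versions = self._versions.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, git_commit, dict(metrics),
                          datetime.now(timezone.utc).isoformat())
        versions.append(mv)
        return mv

    def promote(self, name: str, version: int) -> None:
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise KeyError(f"{name} v{version} is not registered")
        self._production[name] = version

    def production(self, name: str) -> Optional[ModelVersion]:
        v = self._production.get(name)
        return None if v is None else self._versions[name][v - 1]

registry = InMemoryRegistry()
registry.register("yield-predictor", "a3f2c1b", {"accuracy": 0.965})
v2 = registry.register("yield-predictor", "e4d5c6b", {"accuracy": 0.978})  # placeholder commit
registry.promote("yield-predictor", v2.version)
print(registry.production("yield-predictor").version)  # 2
registry.promote("yield-predictor", 1)                  # rollback
print(registry.production("yield-predictor").version)  # 1
```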

**Why This Matters:**
- **Prevent Bad Deployments**: A data schema violation fails the pipeline before any GPU training time is spent
- **Reproducibility**: Retrain the same model 6 months later and get the same results (versioned data, code, hyperparameters, and random seeds)
- **Automated Decision**: Deploy without manual approval when the new model clears the gate (e.g., with the default 1-point minimum improvement, a 96.5% baseline requires at least 97.5%)
- **Compliance**: Audit trail for semiconductor validation (model version, training data, approval criteria)

**Post-Silicon Application:** The STDF pipeline validates 10K wafer files/day → checks schema compliance → retrains the yield model → deploys only if accuracy improves (preventing regressions).

In [None]:
# CI/CD Fundamentals - Data Validation, Model Training, Quality Gates

class PipelineStatus(Enum):
    """CI/CD pipeline execution status"""
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    SKIPPED = "Skipped"

class DataValidationResult(Enum):
    """Data validation outcome"""
    VALID = "Valid"
    SCHEMA_VIOLATION = "SchemaViolation"
    QUALITY_ISSUE = "QualityIssue"
    INSUFFICIENT_DATA = "InsufficientData"

@dataclass
class DataSchema:
    """Expected data schema for validation"""
    required_columns: List[str]
    column_types: Dict[str, str]  # column_name -> dtype
    value_ranges: Dict[str, Tuple[float, float]]  # column_name -> (min, max)
    min_rows: int = 1000
    
    def validate(self, df: pd.DataFrame) -> Tuple[DataValidationResult, str]:
        """Validate DataFrame against schema"""
        # Check required columns
        missing_cols = set(self.required_columns) - set(df.columns)
        if missing_cols:
            return DataValidationResult.SCHEMA_VIOLATION, f"Missing columns: {missing_cols}"
        
        # Check column types
        for col, expected_type in self.column_types.items():
            if col in df.columns:
                actual_type = str(df[col].dtype)
                if expected_type not in actual_type:
                    return DataValidationResult.SCHEMA_VIOLATION, f"Column {col}: expected {expected_type}, got {actual_type}"
        
        # Check value ranges
        for col, (min_val, max_val) in self.value_ranges.items():
            if col in df.columns:
                if df[col].min() < min_val or df[col].max() > max_val:
                    return DataValidationResult.QUALITY_ISSUE, f"Column {col}: values outside range [{min_val}, {max_val}]"
        
        # Check minimum rows
        if len(df) < self.min_rows:
            return DataValidationResult.INSUFFICIENT_DATA, f"Only {len(df)} rows, need {self.min_rows}"
        
        return DataValidationResult.VALID, "Data validation passed"

@dataclass
class ModelMetrics:
    """Model evaluation metrics"""
    accuracy: float
    precision: float
    recall: float
    f1_score: float
    training_time_sec: float
    inference_latency_ms: float
    
    def beats_baseline(self, baseline: 'ModelMetrics', min_improvement: float = 0.01) -> bool:
        """Check if this model beats baseline by minimum improvement"""
        return self.accuracy >= (baseline.accuracy + min_improvement)
    
    def meets_thresholds(self, min_accuracy: float = 0.99, max_latency_ms: float = 150.0) -> bool:
        """Check if model meets production thresholds"""
        return self.accuracy >= min_accuracy and self.inference_latency_ms <= max_latency_ms

@dataclass
class MLPipelineStage:
    """Single stage in ML CI/CD pipeline"""
    name: str
    status: PipelineStatus = PipelineStatus.PENDING
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    error_message: Optional[str] = None
    artifacts: Dict[str, Any] = field(default_factory=dict)
    
    def duration_seconds(self) -> float:
        """Get stage duration in seconds"""
        if self.start_time and self.end_time:
            return (self.end_time - self.start_time).total_seconds()
        return 0.0

class MLCIPipeline:
    """ML Continuous Integration Pipeline"""
    
    def __init__(self, pipeline_id: str, git_commit: str):
        self.pipeline_id = pipeline_id
        self.git_commit = git_commit
        self.stages: List[MLPipelineStage] = []
        self.overall_status = PipelineStatus.PENDING
        self.start_time = datetime.now()
        self.end_time: Optional[datetime] = None
    
    def add_stage(self, stage: MLPipelineStage):
        """Add pipeline stage"""
        self.stages.append(stage)
    
    def run_data_validation(self, data: pd.DataFrame, schema: DataSchema) -> MLPipelineStage:
        """Stage 1: Validate input data"""
        stage = MLPipelineStage(name="Data Validation")
        stage.start_time = datetime.now()
        stage.status = PipelineStatus.RUNNING
        
        print(f"\n🔍 Stage 1: Data Validation")
        print(f"   Validating {len(data)} rows against schema...")
        
        result, message = schema.validate(data)
        
        if result == DataValidationResult.VALID:
            stage.status = PipelineStatus.SUCCEEDED
            stage.artifacts['validation_result'] = result.value
            print(f"   ✅ {message}")
        else:
            stage.status = PipelineStatus.FAILED
            stage.error_message = message
            stage.artifacts['validation_result'] = result.value
            print(f"   ❌ Validation failed: {message}")
        
        stage.end_time = datetime.now()
        self.add_stage(stage)
        return stage
    
    def run_model_training(self, X_train: np.ndarray, y_train: np.ndarray, 
                          model_class, hyperparams: Dict) -> MLPipelineStage:
        """Stage 2: Train ML model"""
        stage = MLPipelineStage(name="Model Training")
        stage.start_time = datetime.now()
        stage.status = PipelineStatus.RUNNING
        
        print(f"\n🏋️ Stage 2: Model Training")
        print(f"   Training {model_class.__name__} with {len(X_train)} samples...")
        
        try:
            model = model_class(**hyperparams)
            model.fit(X_train, y_train)
            
            training_time = (datetime.now() - stage.start_time).total_seconds()
            
            stage.status = PipelineStatus.SUCCEEDED
            stage.artifacts['model'] = model
            stage.artifacts['training_time_sec'] = training_time
            stage.artifacts['hyperparams'] = hyperparams
            
            print(f"   ✅ Training completed in {training_time:.2f}s")
        except Exception as e:
            stage.status = PipelineStatus.FAILED
            stage.error_message = str(e)
            print(f"   ❌ Training failed: {e}")
        
        stage.end_time = datetime.now()
        self.add_stage(stage)
        return stage
    
    def run_model_evaluation(self, model, X_test: np.ndarray, y_test: np.ndarray,
                            baseline_metrics: Optional[ModelMetrics] = None) -> MLPipelineStage:
        """Stage 3: Evaluate model and compare to baseline"""
        stage = MLPipelineStage(name="Model Evaluation")
        stage.start_time = datetime.now()
        stage.status = PipelineStatus.RUNNING
        
        print(f"\n📊 Stage 3: Model Evaluation")
        
        try:
            # Predictions
            y_pred = model.predict(X_test)
            
            # Calculate metrics
            metrics = ModelMetrics(
                accuracy=accuracy_score(y_test, y_pred),
                precision=precision_score(y_test, y_pred, average='weighted', zero_division=0),
                recall=recall_score(y_test, y_pred, average='weighted', zero_division=0),
                f1_score=f1_score(y_test, y_pred, average='weighted', zero_division=0),
                training_time_sec=self.stages[-1].artifacts.get('training_time_sec', 0.0),
                inference_latency_ms=np.random.uniform(50, 120)  # Simulated latency
            )
            
            print(f"   Accuracy:  {metrics.accuracy:.4f}")
            print(f"   Precision: {metrics.precision:.4f}")
            print(f"   Recall:    {metrics.recall:.4f}")
            print(f"   F1 Score:  {metrics.f1_score:.4f}")
            print(f"   Latency:   {metrics.inference_latency_ms:.1f}ms")
            
            stage.artifacts['metrics'] = metrics
            
            # Compare to baseline
            if baseline_metrics:
                print(f"\n   📈 Baseline Comparison:")
                print(f"      Baseline Accuracy: {baseline_metrics.accuracy:.4f}")
                print(f"      New Model Accuracy: {metrics.accuracy:.4f}")
                # Difference in accuracy, expressed in percentage points
                print(f"      Improvement: {(metrics.accuracy - baseline_metrics.accuracy)*100:+.2f} percentage points")
                
                if metrics.beats_baseline(baseline_metrics):
                    print(f"   ✅ New model beats baseline!")
                    stage.status = PipelineStatus.SUCCEEDED
                else:
                    print(f"   ❌ New model does not beat baseline (failed quality gate)")
                    stage.status = PipelineStatus.FAILED
                    stage.error_message = "Model accuracy below baseline threshold"
            else:
                # No baseline - check absolute thresholds
                if metrics.meets_thresholds():
                    print(f"   ✅ Model meets production thresholds")
                    stage.status = PipelineStatus.SUCCEEDED
                else:
                    print(f"   ❌ Model fails production thresholds")
                    stage.status = PipelineStatus.FAILED
                    stage.error_message = "Model below production quality thresholds"
        
        except Exception as e:
            stage.status = PipelineStatus.FAILED
            stage.error_message = str(e)
            print(f"   ❌ Evaluation failed: {e}")
        
        stage.end_time = datetime.now()
        self.add_stage(stage)
        return stage
    
    def run_model_registration(self, model, metrics: ModelMetrics, model_name: str, version: str) -> MLPipelineStage:
        """Stage 4: Register model in model registry (MLflow/DVC simulation)"""
        stage = MLPipelineStage(name="Model Registration")
        stage.start_time = datetime.now()
        stage.status = PipelineStatus.RUNNING
        
        print(f"\n📦 Stage 4: Model Registration")
        
        try:
            model_metadata = {
                'name': model_name,
                'version': version,
                'git_commit': self.git_commit,
                'accuracy': metrics.accuracy,
                'precision': metrics.precision,
                'recall': metrics.recall,
                'f1_score': metrics.f1_score,
                'training_time_sec': metrics.training_time_sec,
                'inference_latency_ms': metrics.inference_latency_ms,
                'registered_at': datetime.now().isoformat()
            }
            
            stage.artifacts['model'] = model
            stage.artifacts['metadata'] = model_metadata
            stage.status = PipelineStatus.SUCCEEDED
            
            print(f"   ✅ Model registered: {model_name} v{version}")
            print(f"      Git Commit: {self.git_commit}")
            print(f"      Accuracy: {metrics.accuracy:.4f}")
        
        except Exception as e:
            stage.status = PipelineStatus.FAILED
            stage.error_message = str(e)
            print(f"   ❌ Registration failed: {e}")
        
        stage.end_time = datetime.now()
        self.add_stage(stage)
        return stage
    
    def finalize(self):
        """Finalize pipeline execution"""
        self.end_time = datetime.now()
        
        # Determine overall status
        if any(stage.status == PipelineStatus.FAILED for stage in self.stages):
            self.overall_status = PipelineStatus.FAILED
        elif all(stage.status == PipelineStatus.SUCCEEDED for stage in self.stages):
            self.overall_status = PipelineStatus.SUCCEEDED
        else:
            self.overall_status = PipelineStatus.FAILED
    
    def get_summary(self) -> Dict:
        """Get pipeline execution summary"""
        total_duration = (self.end_time - self.start_time).total_seconds() if self.end_time else 0.0
        
        return {
            'pipeline_id': self.pipeline_id,
            'git_commit': self.git_commit,
            'overall_status': self.overall_status.value,
            'total_duration_sec': total_duration,
            'stages': [
                {
                    'name': stage.name,
                    'status': stage.status.value,
                    'duration_sec': stage.duration_seconds(),
                    'error': stage.error_message
                }
                for stage in self.stages
            ]
        }

# Example 1: Successful ML CI Pipeline (Data Valid, Model Beats Baseline)
print("=" * 70)
print("Example 1: Successful ML CI Pipeline")
print("=" * 70)

# Generate synthetic STDF data (wafer test results)
np.random.seed(42)
n_samples = 5000

data = pd.DataFrame({
    'wafer_id': [f'W{i:04d}' for i in range(n_samples)],
    'die_x': np.random.randint(0, 50, n_samples),
    'die_y': np.random.randint(0, 50, n_samples),
    'voltage_v': np.random.uniform(1.0, 1.2, n_samples),
    'current_ma': np.random.uniform(50, 150, n_samples),
    'frequency_mhz': np.random.uniform(2400, 2600, n_samples),
    'temperature_c': np.random.uniform(20, 30, n_samples),
    'yield_class': np.random.choice([0, 1], n_samples, p=[0.05, 0.95])  # 95% yield
})

# Define data schema
schema = DataSchema(
    required_columns=['wafer_id', 'voltage_v', 'current_ma', 'frequency_mhz', 'yield_class'],
    column_types={
        'voltage_v': 'float',
        'current_ma': 'float',
        'frequency_mhz': 'float',
        'yield_class': 'int'
    },
    value_ranges={
        'voltage_v': (0.8, 1.5),
        'current_ma': (10, 200),
        'frequency_mhz': (2000, 3000)
    },
    min_rows=1000
)

# Create ML CI pipeline
pipeline = MLCIPipeline(
    pipeline_id=f"pipeline-{uuid.uuid4().hex[:8]}",
    git_commit="a3f2c1b"
)

# Stage 1: Data Validation
stage1 = pipeline.run_data_validation(data, schema)

if stage1.status == PipelineStatus.SUCCEEDED:
    # Prepare training data
    X = data[['die_x', 'die_y', 'voltage_v', 'current_ma', 'frequency_mhz', 'temperature_c']].values
    y = data['yield_class'].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Stage 2: Model Training
    stage2 = pipeline.run_model_training(
        X_train, y_train,
        model_class=RandomForestClassifier,
        hyperparams={'n_estimators': 100, 'max_depth': 10, 'random_state': 42}
    )
    
    if stage2.status == PipelineStatus.SUCCEEDED:
        model = stage2.artifacts['model']
        
        # Baseline metrics (previous model v1.0)
        baseline = ModelMetrics(
            accuracy=0.965,
            precision=0.960,
            recall=0.965,
            f1_score=0.962,
            training_time_sec=12.5,
            inference_latency_ms=85.0
        )
        
        # Stage 3: Model Evaluation
        stage3 = pipeline.run_model_evaluation(model, X_test, y_test, baseline_metrics=baseline)
        
        if stage3.status == PipelineStatus.SUCCEEDED:
            metrics = stage3.artifacts['metrics']
            
            # Stage 4: Model Registration
            stage4 = pipeline.run_model_registration(
                model, metrics,
                model_name="yield-predictor",
                version="v1.1"
            )

pipeline.finalize()

# Print pipeline summary
print("\n" + "=" * 70)
print("Pipeline Execution Summary")
print("=" * 70)
summary = pipeline.get_summary()
print(json.dumps(summary, indent=2))

# Example 2: Failed Pipeline - Data Validation Failure
print("\n" + "=" * 70)
print("Example 2: Failed Pipeline - Data Schema Violation")
print("=" * 70)

# Create bad data (missing required column)
bad_data = pd.DataFrame({
    'wafer_id': [f'W{i:04d}' for i in range(2000)],
    'voltage_v': np.random.uniform(1.0, 1.2, 2000),
    # Missing 'current_ma', 'frequency_mhz', 'yield_class'
})

pipeline2 = MLCIPipeline(
    pipeline_id=f"pipeline-{uuid.uuid4().hex[:8]}",
    git_commit="b7e4d2c"
)

stage1_bad = pipeline2.run_data_validation(bad_data, schema)
pipeline2.finalize()

print(f"\n❌ Pipeline Status: {pipeline2.overall_status.value}")
print(f"   Error: {stage1_bad.error_message}")

# Example 3: Failed Pipeline - Model Below Baseline
print("\n" + "=" * 70)
print("Example 3: Failed Pipeline - Model Accuracy Below Baseline")
print("=" * 70)

# Train weaker model (fewer trees)
pipeline3 = MLCIPipeline(
    pipeline_id=f"pipeline-{uuid.uuid4().hex[:8]}",
    git_commit="c9f5e3d"
)

stage1_v3 = pipeline3.run_data_validation(data, schema)

if stage1_v3.status == PipelineStatus.SUCCEEDED:
    stage2_v3 = pipeline3.run_model_training(
        X_train, y_train,
        model_class=RandomForestClassifier,
        hyperparams={'n_estimators': 10, 'max_depth': 3, 'random_state': 42}  # Weak model
    )
    
    if stage2_v3.status == PipelineStatus.SUCCEEDED:
        model_weak = stage2_v3.artifacts['model']
        stage3_v3 = pipeline3.run_model_evaluation(model_weak, X_test, y_test, baseline_metrics=baseline)

pipeline3.finalize()

print(f"\n❌ Pipeline Status: {pipeline3.overall_status.value}")
if pipeline3.stages[-1].error_message:
    print(f"   Error: {pipeline3.stages[-1].error_message}")

print(f"\n✅ CI/CD fundamentals demonstrated: Data validation, model gates, quality checks!")


## 3. 🎯 Tekton Pipelines - Kubernetes-Native CI/CD for ML

### 📝 What's Happening in This Section?

**Purpose:** Implement Tekton pipelines (Kubernetes CRDs) for ML workflows with parallel task execution and artifact passing.

**Key Points:**
- **Task CRD**: Single unit of work (data validation, model training, evaluation) running in a pod
- **Pipeline CRD**: DAG of tasks (define dependencies: training runs after validation)
- **PipelineRun**: Execution instance of pipeline (like workflow run in GitHub Actions)
- **Workspaces**: Shared storage between tasks (pass training data from validation → training → evaluation)
- **Parameters**: Runtime inputs (model hyperparameters, data version, Git commit)
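To connect these CRDs to real manifests, here is a sketch that builds a `tekton.dev/v1` Pipeline as a Python dict (Kubernetes accepts JSON as well as YAML). It shows `runAfter` ordering, a shared workspace, a pipeline parameter reference, and a task-result reference. The pipeline name, parameter names, and the referenced Tasks (`validate-stdf`, `train-yield-model`, `evaluate-model`) are hypothetical and would need to exist in the cluster.

```python
import json

def task_workspaces():
    # Each task mounts the pipeline's "shared-data" workspace under the name "data"
    return [{"name": "data", "workspace": "shared-data"}]

def build_pipeline_manifest(name: str = "yield-model-ci") -> dict:
    """Tekton v1 Pipeline: validate -> train -> evaluate, sharing one workspace."""
    return {
        "apiVersion": "tekton.dev/v1",
        "kind": "Pipeline",
        "metadata": {"name": name},
        "spec": {
            "params": [
                {"name": "git-commit", "type": "string"},
                {"name": "n-estimators", "type": "string", "default": "100"},
            ],
            "workspaces": [{"name": "shared-data"}],
            "tasks": [
                {
                    "name": "validate-data",
                    "taskRef": {"name": "validate-stdf"},
                    "params": [{"name": "commit", "value": "$(params.git-commit)"}],
                    "workspaces": task_workspaces(),
                },
                {
                    "name": "train-model",
                    "runAfter": ["validate-data"],
                    "taskRef": {"name": "train-yield-model"},
                    "params": [{"name": "n-estimators", "value": "$(params.n-estimators)"}],
                    "workspaces": task_workspaces(),
                },
                {
                    "name": "evaluate-model",
                    "runAfter": ["train-model"],
                    "taskRef": {"name": "evaluate-model"},
                    # Consume a result emitted by the training Task
                    "params": [{"name": "model-path",
                                "value": "$(tasks.train-model.results.model-path)"}],
                    "workspaces": task_workspaces(),
                },
            ],
        },
    }

manifest = build_pipeline_manifest()
print([t["name"] for t in manifest["spec"]["tasks"]])
pipeline_json = json.dumps(manifest, indent=2)  # could be saved and applied with kubectl
```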

**Why This Matters:**
- **Kubernetes-Native**: Runs on same cluster as ML models (no external CI/CD server needed)
- **Parallel Execution**: Train 5 model variants in parallel (A/B test hyperparameters)
- **Resource Control**: Training task gets 4 GPUs, evaluation task gets 1 CPU (Kubernetes resource limits)
- **Artifact Passing**: Validation outputs a data summary → training uses it for a stratified split
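Parallel execution follows from the `runAfter` graph: any task whose dependencies have all finished can start. The simulated `TektonPipelineRun` in the next cell runs tasks one after another, so here is a sketch of what the DAG permits, grouping tasks into "waves" with Kahn's algorithm. The task names are hypothetical, and a real Tekton controller starts each task as soon as its own dependencies finish rather than waiting for a whole wave.

```python
from typing import Dict, List

def execution_waves(run_after: Dict[str, List[str]]) -> List[List[str]]:
    """Group tasks into waves: all tasks in a wave can run in parallel once
    every earlier wave has finished (Kahn's algorithm, level by level)."""
    remaining = {task: set(deps) for task, deps in run_after.items()}
    done: set = set()
    waves: List[List[str]] = []
    while remaining:
        ready = sorted(t for t, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"cycle or unknown dependency among {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves

# Hypothetical DAG: five hyperparameter variants train in parallel after validation
dag = {"validate": []}
dag.update({f"train-v{i}": ["validate"] for i in range(1, 6)})
dag["evaluate"] = [f"train-v{i}" for i in range(1, 6)]
dag["register"] = ["evaluate"]

for n, wave in enumerate(execution_waves(dag), start=1):
    print(n, wave)
```

With five training tasks in the second wave, the critical path is four tasks long instead of eight, which is where the wall-clock savings of parallel training come from.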

**Post-Silicon Application:** The Tekton pipeline validates STDF data → trains the yield model on a GPU cluster → evaluates accuracy → registers the model in MLflow → triggers an ArgoCD deployment.
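Each execution of such a pipeline is a PipelineRun. A minimal sketch, assuming a Pipeline named `yield-model-ci` that declares a `git-commit` parameter and a `shared-data` workspace (hypothetical names): the run binds the commit and asks Tekton to provision a fresh PersistentVolumeClaim for the workspace via `volumeClaimTemplate`.

```python
import json

def build_pipeline_run(git_commit: str, pipeline_name: str = "yield-model-ci",
                       storage: str = "10Gi") -> dict:
    """Tekton v1 PipelineRun that executes one commit through the pipeline.

    generateName gives each run a unique suffix, so the manifest is created with
    `kubectl create -f` rather than `kubectl apply -f`.
    """
    return {
        "apiVersion": "tekton.dev/v1",
        "kind": "PipelineRun",
        "metadata": {"generateName": f"{pipeline_name}-run-"},
        "spec": {
            "pipelineRef": {"name": pipeline_name},
            "params": [{"name": "git-commit", "value": git_commit}],
            "workspaces": [{
                "name": "shared-data",
                # A new PVC per run, so concurrent runs never share data
                "volumeClaimTemplate": {
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    }
                },
            }],
        },
    }

run = build_pipeline_run("a3f2c1b")
print(json.dumps(run, indent=2))
```

A CI trigger (a GitHub Actions job or a Tekton Triggers EventListener) would typically generate one of these per commit, which is what the simulated `TektonPipelineRun` class below models.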

In [None]:
# Tekton Pipelines - Kubernetes-Native ML CI/CD

class TaskStatus(Enum):
    """Tekton Task status"""
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    SKIPPED = "Skipped"

@dataclass
class TektonTask:
    """Tekton Task CRD - single unit of work"""
    name: str
    image: str  # Container image (e.g., python:3.12, pytorch/pytorch:2.0)
    script: str  # Shell script to execute
    params: Dict[str, Any] = field(default_factory=dict)
    resources: Dict[str, str] = field(default_factory=dict)  # CPU, memory, GPU
    workspaces: List[str] = field(default_factory=list)  # Shared volumes
    
    # Execution state
    status: TaskStatus = TaskStatus.PENDING
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    outputs: Dict[str, Any] = field(default_factory=dict)
    logs: List[str] = field(default_factory=list)
    
    def run(self, workspace_data: Dict[str, Any]) -> TaskStatus:
        """Execute task (simulated)"""
        self.status = TaskStatus.RUNNING
        self.start_time = datetime.now()
        self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Task started: {self.name}")
        self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Using image: {self.image}")
        
        try:
            # Simulate task execution
            if "data-validation" in self.name:
                self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Validating data schema...")
                time.sleep(0.5)
                self.outputs['validation_result'] = 'VALID'
                self.outputs['row_count'] = workspace_data.get('data_rows', 5000)
                self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] ✅ Validation passed")
            
            elif "model-training" in self.name:
                self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Training model...")
                self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Hyperparameters: {self.params}")
                time.sleep(1.0)
                self.outputs['model_accuracy'] = np.random.uniform(0.96, 0.98)
                self.outputs['model_path'] = f"/models/yield-predictor-{uuid.uuid4().hex[:8]}.pkl"
                self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] ✅ Training completed")
            
            elif "model-evaluation" in self.name:
                self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Evaluating model...")
                time.sleep(0.3)
                baseline_acc = self.params.get('baseline_accuracy', 0.96)
                model_acc = workspace_data.get('model_accuracy', 0.97)
                self.outputs['accuracy'] = model_acc
                self.outputs['beats_baseline'] = model_acc > baseline_acc
                self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Model: {model_acc:.4f}, Baseline: {baseline_acc:.4f}")
                if model_acc > baseline_acc:
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] ✅ Model beats baseline")
                else:
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] ❌ Model below baseline")
                    # Quality gate failed: record end_time before the early return so duration stays defined
                    self.status = TaskStatus.FAILED
                    self.end_time = datetime.now()
                    return TaskStatus.FAILED
            
            elif "model-registry" in self.name:
                self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Registering model in MLflow...")
                time.sleep(0.2)
                self.outputs['model_version'] = f"v{np.random.randint(10, 50)}.{np.random.randint(0, 10)}"
                self.outputs['registry_url'] = "mlflow://models/yield-predictor"
                self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] ✅ Model registered")
            
            self.status = TaskStatus.SUCCEEDED
        
        except Exception as e:
            self.status = TaskStatus.FAILED
            self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] ❌ Task failed: {e}")
        
        self.end_time = datetime.now()
        duration = (self.end_time - self.start_time).total_seconds()
        self.logs.append(f"[{self.end_time.strftime('%H:%M:%S')}] Task completed in {duration:.2f}s")
        
        return self.status

@dataclass
class TektonPipeline:
    """Tekton Pipeline CRD - DAG of tasks"""
    name: str
    tasks: List[TektonTask]
    params: Dict[str, Any] = field(default_factory=dict)
    
    def get_task_dependencies(self) -> Dict[str, List[str]]:
        """Get task dependencies (DAG)"""
        # Simple dependency inference: tasks run sequentially unless specified
        dependencies = {}
        for i, task in enumerate(self.tasks):
            if i == 0:
                dependencies[task.name] = []
            else:
                dependencies[task.name] = [self.tasks[i-1].name]
        return dependencies

@dataclass
class TektonPipelineRun:
    """Tekton PipelineRun - execution instance of pipeline"""
    pipeline: TektonPipeline
    run_id: str
    git_commit: str
    params: Dict[str, Any] = field(default_factory=dict)
    
    # Execution state
    status: PipelineStatus = PipelineStatus.PENDING
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    workspace_data: Dict[str, Any] = field(default_factory=dict)
    
    def execute(self) -> PipelineStatus:
        """Execute all pipeline tasks"""
        self.status = PipelineStatus.RUNNING
        self.start_time = datetime.now()
        
        print(f"\n🚀 PipelineRun Started: {self.run_id}")
        print(f"   Pipeline: {self.pipeline.name}")
        print(f"   Git Commit: {self.git_commit}")
        print(f"   Parameters: {self.params}")
        print("=" * 70)
        
        # Execute tasks in list order (real Tekton runs independent tasks in parallel, ordered only by runAfter/result dependencies)
        for task in self.pipeline.tasks:
            print(f"\n📦 Executing Task: {task.name}")
            print(f"   Image: {task.image}")
            print(f"   Resources: {task.resources}")
            
            # Merge pipeline params with task params
            task.params.update(self.params)
            
            # Run task
            task_status = task.run(self.workspace_data)
            
            # Update workspace with task outputs
            self.workspace_data.update(task.outputs)
            
            # Print task logs
            for log in task.logs:
                print(f"   {log}")
            
            # Check task status
            if task_status == TaskStatus.FAILED:
                print(f"\n❌ Task failed: {task.name}")
                self.status = PipelineStatus.FAILED
                self.end_time = datetime.now()
                return self.status
        
        self.status = PipelineStatus.SUCCEEDED
        self.end_time = datetime.now()
        
        total_duration = (self.end_time - self.start_time).total_seconds()
        print(f"\n{'=' * 70}")
        print(f"✅ PipelineRun Completed: {self.run_id}")
        print(f"   Status: {self.status.value}")
        print(f"   Duration: {total_duration:.2f}s")
        
        return self.status
    
    def get_summary(self) -> Dict:
        """Get pipeline run summary"""
        return {
            'run_id': self.run_id,
            'pipeline': self.pipeline.name,
            'git_commit': self.git_commit,
            'status': self.status.value,
            'duration_sec': (self.end_time - self.start_time).total_seconds() if self.end_time else 0,
            'tasks': [
                {
                    'name': task.name,
                    'status': task.status.value,
                    'duration_sec': (task.end_time - task.start_time).total_seconds() if task.end_time and task.start_time else 0,
                    'outputs': task.outputs
                }
                for task in self.pipeline.tasks
            ]
        }

# Example 1: Tekton ML Pipeline - Data Validation → Training → Evaluation → Registry
print("=" * 70)
print("Example 1: Tekton ML Training Pipeline")
print("=" * 70)

# Define pipeline tasks
task1_validate = TektonTask(
    name="data-validation",
    image="python:3.12-slim",
    script="python validate_data.py --schema schema.yaml --input data.csv",
    resources={'cpu': '1', 'memory': '2Gi'},
    workspaces=['data', 'config']
)

task2_train = TektonTask(
    name="model-training",
    image="pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime",
    script="python train_model.py --data data.csv --model-type rf --output model.pkl",
    params={'n_estimators': 100, 'max_depth': 10},
    resources={'cpu': '4', 'memory': '16Gi', 'nvidia.com/gpu': '1'},
    workspaces=['data', 'models']
)

task3_evaluate = TektonTask(
    name="model-evaluation",
    image="python:3.12-slim",
    script="python evaluate_model.py --model model.pkl --test-data test.csv",
    params={'baseline_accuracy': 0.965},
    resources={'cpu': '2', 'memory': '4Gi'},
    workspaces=['models', 'metrics']
)

task4_register = TektonTask(
    name="model-registry",
    image="python:3.12-slim",
    script="python register_model.py --model model.pkl --registry mlflow",
    resources={'cpu': '1', 'memory': '2Gi'},
    workspaces=['models']
)

# Create pipeline
ml_pipeline = TektonPipeline(
    name="ml-training-pipeline",
    tasks=[task1_validate, task2_train, task3_evaluate, task4_register]
)

# Execute pipeline run
pipeline_run = TektonPipelineRun(
    pipeline=ml_pipeline,
    run_id=f"run-{uuid.uuid4().hex[:8]}",
    git_commit="a3f2c1b",
    params={'data_version': 'v2024.12.10', 'model_name': 'yield-predictor'}
)

# Initialize workspace with data
pipeline_run.workspace_data['data_rows'] = 5000
pipeline_run.workspace_data['model_accuracy'] = 0.972  # Simulated training result

status = pipeline_run.execute()

# Print summary
print("\n" + "=" * 70)
print("Pipeline Run Summary")
print("=" * 70)
summary = pipeline_run.get_summary()
print(json.dumps(summary, indent=2))

# Example 2: Parallel Task Execution (Hyperparameter Tuning)
print("\n" + "=" * 70)
print("Example 2: Parallel Hyperparameter Tuning with Tekton")
print("=" * 70)

# Create 3 parallel training tasks with different hyperparameters
training_tasks = []
for i, (n_est, depth) in enumerate([(50, 5), (100, 10), (200, 15)]):
    task = TektonTask(
        name=f"model-training-{i+1}",
        image="pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime",
        script=f"python train_model.py --n-estimators {n_est} --max-depth {depth}",
        params={'n_estimators': n_est, 'max_depth': depth},
        resources={'cpu': '4', 'memory': '16Gi', 'nvidia.com/gpu': '1'},
    )
    training_tasks.append(task)

print(f"🔀 Executing {len(training_tasks)} training tasks in parallel...")
print("   (In real Tekton, tasks with no runAfter ordering run in parallel, each in its own pod)")

# Simulate parallel execution (run one after another here)
results = []
for task in training_tasks:
    task.run({})
    results.append({
        'hyperparams': task.params,
        'accuracy': task.outputs.get('model_accuracy', 0.0),
        'model_path': task.outputs.get('model_path', '')
    })
    print(f"\n   Task: {task.name}")
    print(f"      Hyperparams: n_estimators={task.params['n_estimators']}, max_depth={task.params['max_depth']}")
    print(f"      Accuracy: {task.outputs.get('model_accuracy', 0.0):.4f}")

# Select best model
best_result = max(results, key=lambda x: x['accuracy'])
print(f"\n✅ Best Model Selected:")
print(f"   Hyperparameters: {best_result['hyperparams']}")
print(f"   Accuracy: {best_result['accuracy']:.4f}")
print(f"   Model Path: {best_result['model_path']}")

print(f"\n✅ Tekton pipelines demonstrated: Sequential tasks, parallel execution, resource control!")
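The simulated `TektonPipelineRun` executes tasks strictly in list order. Real Tekton builds a DAG from `runAfter` and from task results consumed downstream, then starts every task whose dependencies have finished. A minimal sketch of that scheduling, assuming dependencies are supplied as a plain `{task: [upstream tasks]}` dict (a hypothetical input format, not a Tekton API), groups tasks into waves that could run concurrently using the standard-library `graphlib` module:

```python
from graphlib import TopologicalSorter


def execution_waves(run_after: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave have no ordering between them.

    `run_after` maps each task name to the names it must wait for,
    playing the role of `runAfter` in a Tekton Pipeline spec.
    """
    sorter = TopologicalSorter(run_after)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


# Fan-out/fan-in shape like Example 2, with hypothetical validation and selection tasks
dag = {
    "data-validation": [],
    "model-training-1": ["data-validation"],
    "model-training-2": ["data-validation"],
    "model-training-3": ["data-validation"],
    "model-selection": ["model-training-1", "model-training-2", "model-training-3"],
}
for i, wave in enumerate(execution_waves(dag), start=1):
    print(f"wave {i}: {wave}")
```

The three training tasks land in the same wave, which is what lets a real cluster schedule them on separate GPU nodes at the same time.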


## 4. 🔧 GitHub Actions - Cloud-Based CI/CD for ML

**Purpose:** Implement CI/CD workflows using GitHub Actions for automated ML pipeline execution.

**Key Points:**
- **Workflows**: YAML-defined automation (triggered by git push, PR, schedule)
- **Jobs & Steps**: Jobs run in parallel by default (order them with `needs`); steps execute sequentially within a job
- **Matrix Builds**: Test multiple Python versions, OS, or hyperparameters in parallel
- **Secrets Management**: Store API keys, cloud credentials, model registry tokens securely
- **Caching & Artifacts**: Cache dependencies with `actions/cache`; pass datasets and models between jobs as artifacts (`actions/upload-artifact` / `actions/download-artifact`)
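Matrix builds are the least obvious of these mechanics: GitHub Actions runs one job per combination of the matrix axes, minus any combination matched by an `exclude` entry (a partial match is enough to exclude). A minimal sketch of that expansion; the `expand_matrix` helper is hypothetical and deliberately leaves out `include` handling:

```python
from itertools import product


def expand_matrix(matrix: dict) -> list[dict]:
    """Expand a GitHub Actions-style `strategy.matrix` into one dict per job.

    Sketch only: handles matrix axes and `exclude`, not `include`.
    """
    axes = {k: v for k, v in matrix.items() if k not in ("include", "exclude")}
    keys = list(axes)
    combos = [dict(zip(keys, values)) for values in product(*axes.values())]

    def is_excluded(combo: dict) -> bool:
        # An exclude entry removes every combination that matches all of its keys
        return any(all(combo.get(k) == v for k, v in ex.items())
                   for ex in matrix.get("exclude", []))

    return [c for c in combos if not is_excluded(c)]


jobs = expand_matrix({
    "python-version": ["3.10", "3.11", "3.12"],
    "os": ["ubuntu-latest", "macos-latest"],
    "exclude": [{"python-version": "3.10", "os": "macos-latest"}],
})
print(len(jobs))  # 5: six combinations minus the excluded one
for job in jobs:
    print(job)
```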

**Why This Matters:**
- **Cloud-Native**: No infrastructure setup (GitHub-hosted runners, automatic scaling)
- **Git Integration**: Tight coupling with code changes (trigger on push, PR review, merge)
- **Cost-Effective**: Free for public repos; private repos get a monthly allowance of free minutes, then pay per minute
- **Ecosystem**: Thousands of pre-built actions (Docker build, cloud deploy, Slack notify)

**Post-Silicon Application:**
A GitHub Actions workflow triggers when STDF parser code is pushed to the main branch:
1. **Step 1**: Validate STDF schema (check required fields, data types)
2. **Step 2**: Run unit tests on parser logic (edge cases, malformed data)
3. **Step 3**: Train yield prediction model on latest STDF data (RandomForest)
4. **Step 4**: Deploy model to staging (Kubernetes cluster via kubectl)
5. **Step 5**: Run integration tests (API health check, prediction accuracy)
6. **Step 6**: Promote to production if tests pass (update Kubernetes deployment)

This ensures every code change is validated before production deployment, reducing STDF parser bugs by 80%.
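Step 1 of that workflow (schema validation) can be a plain Python check that fails the job when any parsed record is malformed. A minimal sketch, assuming the parser emits Part Result Records (PRR) as dicts; the field subset and types below are illustrative, and the range check only enforces the unsigned 16-bit width of the bin fields:

```python
from typing import Any

# Illustrative subset of STDF V4 Part Result Record (PRR) fields after parsing
PRR_SCHEMA: dict[str, type] = {
    "PART_ID": str,
    "HARD_BIN": int,
    "SOFT_BIN": int,
    "X_COORD": int,
    "Y_COORD": int,
}


def validate_prr(records: list[dict[str, Any]]) -> list[str]:
    """Return human-readable schema errors; an empty list means the batch passes."""
    errors = []
    for i, rec in enumerate(records):
        for field, expected in PRR_SCHEMA.items():
            if field not in rec:
                errors.append(f"record {i}: missing {field}")
            elif not isinstance(rec[field], expected):
                errors.append(f"record {i}: {field} should be {expected.__name__}, "
                              f"got {type(rec[field]).__name__}")
        for bin_field in ("HARD_BIN", "SOFT_BIN"):
            value = rec.get(bin_field)
            if isinstance(value, int) and not 0 <= value <= 65535:  # U*2 range
                errors.append(f"record {i}: {bin_field}={value} outside 0..65535")
    return errors


batch = [
    {"PART_ID": "1", "HARD_BIN": 1, "SOFT_BIN": 1, "X_COORD": 3, "Y_COORD": 7},
    {"PART_ID": "2", "HARD_BIN": "8", "SOFT_BIN": 8, "X_COORD": 4},
]
problems = validate_prr(batch)
print(problems)
# In CI, a non-empty list should fail the step, e.g. with `raise SystemExit(1)`
```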

In [None]:
# GitHub Actions - Cloud-Based ML CI/CD Workflows

@dataclass
class WorkflowStep:
    """GitHub Actions step"""
    name: str
    run: str = ""  # Shell command (empty when the step uses a pre-built action)
    uses: Optional[str] = None  # Action (e.g., actions/checkout@v4)
    with_params: Dict[str, Any] = field(default_factory=dict)
    env: Dict[str, str] = field(default_factory=dict)
    
    # Execution state
    status: TaskStatus = TaskStatus.PENDING
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    outputs: Dict[str, Any] = field(default_factory=dict)
    logs: List[str] = field(default_factory=list)
    
    def execute(self, context: Dict[str, Any]) -> TaskStatus:
        """Execute step"""
        self.status = TaskStatus.RUNNING
        self.start_time = datetime.now()
        self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Step: {self.name}")
        
        try:
            if self.uses:
                # Using pre-built action
                self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Using action: {self.uses}")
                
                if "checkout" in self.uses:
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] ✅ Repository checked out")
                    self.outputs['repo_path'] = "/github/workspace"
                
                elif "setup-python" in self.uses:
                    python_version = self.with_params.get('python-version', '3.12')
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] ✅ Python {python_version} installed")
                    self.outputs['python_version'] = python_version
                
                elif "upload-artifact" in self.uses:
                    artifact_name = self.with_params.get('name', 'artifact')
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] ✅ Artifact '{artifact_name}' uploaded")
                    self.outputs['artifact_url'] = f"https://github.com/actions/runs/artifacts/{artifact_name}"
            
            elif self.run:
                # Running shell command
                self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Running: {self.run}")
                time.sleep(0.2)
                
                if "pytest" in self.run:
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] ============================= test session starts ==============================")
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] collected 45 items")
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] tests/test_data_validation.py::test_schema_validation PASSED [  2%]")
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] tests/test_model_training.py::test_rf_training PASSED [ 11%]")
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] tests/test_model_evaluation.py::test_baseline_comparison PASSED [ 24%]")
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] ============================== 45 passed in 2.34s ===============================")
                    self.outputs['tests_passed'] = 45
                    self.outputs['tests_failed'] = 0
                
                elif "train" in self.run:
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Training model...")
                    time.sleep(0.5)
                    accuracy = np.random.uniform(0.96, 0.98)
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Model accuracy: {accuracy:.4f}")
                    self.outputs['model_accuracy'] = accuracy
                    self.outputs['model_path'] = "models/yield-predictor.pkl"
                
                elif "kubectl" in self.run:
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] Deploying to Kubernetes...")
                    time.sleep(0.3)
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] deployment.apps/yield-predictor-api configured")
                    self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] service/yield-predictor-api unchanged")
                    self.outputs['deployment_status'] = 'success'
                
                self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] ✅ Command executed successfully")
            
            self.status = TaskStatus.SUCCEEDED
        
        except Exception as e:
            self.status = TaskStatus.FAILED
            self.logs.append(f"[{self.start_time.strftime('%H:%M:%S')}] ❌ Step failed: {e}")
        
        self.end_time = datetime.now()
        return self.status

@dataclass
class WorkflowJob:
    """GitHub Actions job"""
    name: str
    runs_on: str  # ubuntu-latest, macos-latest, self-hosted
    steps: List[WorkflowStep]
    needs: List[str] = field(default_factory=list)  # Job dependencies
    strategy: Optional[Dict] = None  # Matrix builds
    
    # Execution state
    status: TaskStatus = TaskStatus.PENDING
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    
    def execute(self, context: Dict[str, Any]) -> TaskStatus:
        """Execute all steps in job"""
        self.status = TaskStatus.RUNNING
        self.start_time = datetime.now()
        
        print(f"\n{'=' * 70}")
        print(f"🔧 Job: {self.name}")
        print(f"   Runs on: {self.runs_on}")
        print(f"{'=' * 70}")
        
        for step in self.steps:
            step_status = step.execute(context)
            
            # Print step logs
            for log in step.logs:
                print(f"   {log}")
            
            # Update context with step outputs
            context.update(step.outputs)
            
            if step_status == TaskStatus.FAILED:
                print(f"\n❌ Job failed at step: {step.name}")
                self.status = TaskStatus.FAILED
                self.end_time = datetime.now()
                return self.status
        
        self.status = TaskStatus.SUCCEEDED
        self.end_time = datetime.now()
        
        duration = (self.end_time - self.start_time).total_seconds()
        print(f"\n✅ Job completed in {duration:.2f}s")
        
        return self.status

@dataclass
class GitHubWorkflow:
    """GitHub Actions workflow"""
    name: str
    on: List[str]  # push, pull_request, schedule, workflow_dispatch
    jobs: List[WorkflowJob]
    env: Dict[str, str] = field(default_factory=dict)
    
    # Execution state
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    status: PipelineStatus = PipelineStatus.PENDING
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    
    def execute(self) -> PipelineStatus:
        """Execute workflow"""
        self.status = PipelineStatus.RUNNING
        self.start_time = datetime.now()
        
        print(f"\n{'=' * 70}")
        print(f"🚀 GitHub Actions Workflow: {self.name}")
        print(f"   Run ID: {self.run_id}")
        print(f"   Triggered by: {', '.join(self.on)}")
        print(f"   Environment: {self.env}")
        print(f"{'=' * 70}")
        
        context = {}
        
        # Execute jobs in list order (simplified: real GitHub Actions runs jobs in parallel unless ordered with `needs`)
        for job in self.jobs:
            job_status = job.execute(context)
            
            if job_status == TaskStatus.FAILED:
                print(f"\n❌ Workflow failed at job: {job.name}")
                self.status = PipelineStatus.FAILED
                self.end_time = datetime.now()
                return self.status
        
        self.status = PipelineStatus.SUCCEEDED
        self.end_time = datetime.now()
        
        total_duration = (self.end_time - self.start_time).total_seconds()
        print(f"\n{'=' * 70}")
        print(f"✅ Workflow Completed: {self.name}")
        print(f"   Status: {self.status.value}")
        print(f"   Duration: {total_duration:.2f}s")
        print(f"{'=' * 70}")
        
        return self.status

# Example 1: Complete ML CI/CD Workflow with GitHub Actions
print("=" * 70)
print("Example 1: Complete ML CI/CD Workflow")
print("=" * 70)

# Job 1: Test
test_job = WorkflowJob(
    name="test",
    runs_on="ubuntu-latest",
    steps=[
        WorkflowStep(name="Checkout code", uses="actions/checkout@v4"),
        WorkflowStep(name="Set up Python", uses="actions/setup-python@v5", with_params={'python-version': '3.12'}),
        WorkflowStep(name="Install dependencies", run="pip install -r requirements.txt"),
        WorkflowStep(name="Run tests", run="pytest tests/ --cov=src --cov-report=xml"),
    ]
)

# Job 2: Train (depends on test passing)
train_job = WorkflowJob(
    name="train",
    runs_on="ubuntu-latest",
    needs=["test"],
    steps=[
        WorkflowStep(name="Checkout code", uses="actions/checkout@v4"),
        WorkflowStep(name="Set up Python", uses="actions/setup-python@v5", with_params={'python-version': '3.12'}),
        WorkflowStep(name="Train model", run="python train_model.py --data data/stdf_wafer_test.csv --output models/"),
        WorkflowStep(name="Upload model artifact", uses="actions/upload-artifact@v4", with_params={'name': 'trained-model', 'path': 'models/'}),
    ]
)

# Job 3: Deploy (depends on train passing)
deploy_job = WorkflowJob(
    name="deploy",
    runs_on="ubuntu-latest",
    needs=["train"],
    steps=[
        WorkflowStep(name="Checkout code", uses="actions/checkout@v4"),
        WorkflowStep(name="Download model artifact", uses="actions/download-artifact@v4", with_params={'name': 'trained-model'}),
        WorkflowStep(name="Deploy to Kubernetes", run="kubectl apply -f k8s/deployment.yaml"),
        WorkflowStep(name="Verify deployment", run="kubectl rollout status deployment/yield-predictor-api"),
    ]
)

# Create workflow
ml_workflow = GitHubWorkflow(
    name="ML Training and Deployment",
    on=["push", "pull_request"],
    jobs=[test_job, train_job, deploy_job],
    env={'PYTHONPATH': '/github/workspace', 'MLFLOW_TRACKING_URI': 'https://mlflow.example.com'}
)

# Execute workflow
status = ml_workflow.execute()

# Example 2: Matrix Build - Test Multiple Python Versions
print("\n" + "=" * 70)
print("Example 2: Matrix Build - Test Multiple Python Versions")
print("=" * 70)

# Matrix strategy: Test Python 3.10, 3.11, 3.12
for python_version in ['3.10', '3.11', '3.12']:
    print(f"\n🔄 Testing with Python {python_version}")
    
    test_job_matrix = WorkflowJob(
        name=f"test-py{python_version}",
        runs_on="ubuntu-latest",
        steps=[
            WorkflowStep(name="Checkout code", uses="actions/checkout@v4"),
            WorkflowStep(name="Set up Python", uses="actions/setup-python@v5", with_params={'python-version': python_version}),
            WorkflowStep(name="Install dependencies", run="pip install -r requirements.txt"),
            WorkflowStep(name="Run tests", run="pytest tests/"),
        ],
        strategy={'matrix': {'python-version': python_version}}
    )
    
    context = {}
    job_status = test_job_matrix.execute(context)
    
    if job_status == TaskStatus.SUCCEEDED:
        print(f"   ✅ Tests passed with Python {python_version}")
    else:
        print(f"   ❌ Tests failed with Python {python_version}")

print(f"\n✅ GitHub Actions workflows demonstrated: Multi-job CI/CD, matrix builds, artifact management!")


## 5. 🏭 Real-World Projects: CI/CD for ML in Production

### Project 1: Automated Yield Prediction Retraining Pipeline 🎯

**Objective:** Build CI/CD pipeline that automatically retrains yield prediction model when new STDF data arrives.

**Business Value:** Continuous model improvement with latest wafer test data → 0.5% accuracy gains → $600K/year savings (fewer false positives in yield prediction).

**Implementation Plan:**
1. **Data Ingestion**: Schedule daily STDF file collection from test equipment (cron: 0 2 * * *)
2. **Data Validation**: Tekton Task validates schema (required columns, data types, value ranges)
3. **Model Retraining**: Train RandomForest on last 30 days of data (5000+ devices)
4. **Baseline Comparison**: Deploy only if new model accuracy ≥ baseline + 1%
5. **Canary Deployment**: Route 10% traffic to new model, monitor for 24 hours
6. **Full Rollout**: Promote to 100% traffic if canary metrics acceptable
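Steps 3 and 4 boil down to a small gate that runs before any canary starts. A minimal sketch, assuming "baseline + 1%" means one absolute percentage point of accuracy (0.01) and reusing the 5000-device figure from step 3 as a minimum data volume; `retraining_gate` and `GateDecision` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class GateDecision:
    deploy: bool
    reason: str


def retraining_gate(n_devices: int, new_acc: float, baseline_acc: float,
                    min_devices: int = 5000, min_gain: float = 0.01) -> GateDecision:
    """Decide whether a retrained model may enter canary deployment.

    Assumes "baseline + 1%" means +0.01 absolute accuracy.
    """
    if n_devices < min_devices:
        return GateDecision(False, f"only {n_devices} devices in window (< {min_devices})")
    required = baseline_acc + min_gain
    if new_acc < required:
        return GateDecision(False, f"accuracy {new_acc:.4f} < required {required:.4f}")
    return GateDecision(True, f"accuracy {new_acc:.4f} >= required {required:.4f}")


print(retraining_gate(5000, new_acc=0.972, baseline_acc=0.965))
print(retraining_gate(5200, new_acc=0.978, baseline_acc=0.965))
```

With the Tekton Example 1 numbers (0.972 against a 0.965 baseline), this stricter gate would block deployment, whereas the simulated `model-evaluation` task only requires the model to beat the baseline.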

**Tekton Pipeline Structure:**
- Task 1: `stdf-data-ingestion` (runs daily at 2 AM)
- Task 2: `data-validation` (schema checks, quality gates)
- Task 3: `model-training` (RandomForest with Grid Search)
- Task 4: `model-evaluation` (accuracy, precision, recall vs baseline)
- Task 5: `canary-deployment` (10% traffic to new model)
- Task 6: `full-deployment` (100% traffic after 24h monitoring)

**Key Technologies:** Tekton, Kubernetes, MLflow, Prometheus (metrics), ArgoCD (GitOps)

**Success Metrics:**
- ✅ Model retraining frequency: Daily (down from weekly manual process)
- ✅ Accuracy improvement rate: 0.5% per quarter (continuous learning)
- ✅ Deployment time: 15 minutes automated (down from 4 hours manual)
- ✅ Failed deployments: <2% (quality gates prevent regression)

---

### Project 2: STDF Data Pipeline with Quality Gates 🔍

**Objective:** CI/CD pipeline for STDF parser library ensuring zero data corruption bugs.

**Business Value:** Prevent STDF parsing errors that cause incorrect yield reports → $180K/year savings (avoided engineering time debugging bad data).

**Implementation Plan:**
1. **GitHub Actions Workflow**: Triggers on PR to main branch
2. **Unit Tests**: Test STDF parser on 500+ edge cases (malformed headers, missing fields, corrupt data)
3. **Integration Tests**: Parse real STDF files (wafer test, final test) and validate output schema
4. **Performance Tests**: Ensure parser handles 10K+ device records in <5 seconds
5. **Security Scan**: Check dependencies for known vulnerabilities (CVEs) with Snyk or Dependabot
6. **Merge Gate**: PR can be merged only if all tests pass and code coverage ≥ 90%
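The performance requirement in step 4 can be written as an ordinary pytest test, so a slow parser fails the CI job like any other regression. A minimal sketch: the real STDF parser is not shown in this notebook, so `parse_records` and its fixed 8-byte record layout are hypothetical stand-ins, not the actual STDF binary encoding:

```python
import struct
import time

# Hypothetical fixed-size record layout (NOT the real STDF encoding), little-endian:
# hard bin (uint16), soft bin (uint16), x coordinate (int16), y coordinate (int16)
RECORD = struct.Struct("<HHhh")


def parse_records(blob: bytes) -> list[tuple[int, int, int, int]]:
    """Stand-in parser: unpack every fixed-size record in the buffer."""
    return list(RECORD.iter_unpack(blob))


def test_parser_handles_10k_records_under_5s():
    blob = b"".join(RECORD.pack(1, 1, i % 100, i // 100) for i in range(10_000))
    start = time.perf_counter()
    parsed = parse_records(blob)
    elapsed = time.perf_counter() - start
    assert len(parsed) == 10_000
    assert parsed[123] == (1, 1, 23, 1)  # record 123 -> x = 123 % 100, y = 123 // 100
    assert elapsed < 5.0, f"parser took {elapsed:.2f}s for 10K records"
```

Timing assertions on shared CI runners can be noisy, so a generous budget like the 5-second target helps keep false failures rare.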

**GitHub Actions Workflow Structure:**
```yaml
name: STDF Parser CI
on: [pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: pytest tests/ --cov=src --cov-report=xml --cov-fail-under=90
      - run: python benchmark_parser.py --stdf-file data/wafer_test.stdf
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: snyk/actions/python@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          command: test
```
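pytest-cov's `--cov-report=xml` writes a Cobertura-format `coverage.xml` whose root `<coverage>` element carries the overall `line-rate` (0.0 to 1.0). A minimal sketch of a merge-gate check that enforces the 90% threshold from step 6 by reading that attribute (pytest-cov's own `--cov-fail-under` option achieves the same inside the test step); the function names are hypothetical:

```python
import xml.etree.ElementTree as ET


def coverage_percent(xml_text: str) -> float:
    """Overall line coverage (%) from a Cobertura-style report such as coverage.xml."""
    root = ET.fromstring(xml_text)
    return float(root.attrib["line-rate"]) * 100


def passes_coverage_gate(xml_text: str, threshold: float = 90.0) -> bool:
    """Merge-gate check: True when line coverage meets the threshold."""
    return coverage_percent(xml_text) >= threshold


def gate_exit_code(path: str = "coverage.xml", threshold: float = 90.0) -> int:
    """Exit code for a CI step: 0 lets the PR proceed, 1 fails the check."""
    with open(path, encoding="utf-8") as fh:
        return 0 if passes_coverage_gate(fh.read(), threshold) else 1


# Trimmed root element for illustration; real reports also contain <sources> and <packages>
sample = '<coverage line-rate="0.9512" branch-rate="0" version="7.4"></coverage>'
print(f"{coverage_percent(sample):.2f}% covered, gate passed: {passes_coverage_gate(sample)}")
```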

**Key Technologies:** GitHub Actions, pytest, pytest-cov, Snyk, Codecov

**Success Metrics:**
- ✅ Code coverage: 95%+ (comprehensive edge case testing)
- ✅ Parser bugs in production: 0 (quality gates prevent bad code)
- ✅ PR merge time: 8 minutes (automated testing)
- ✅ False positive test failures: <1% (stable CI pipeline)

---

### Project 3: Canary Deployment for Wafer Defect Analyzer 🛡️

**Objective:** Safely deploy new CNN model for wafer defect detection using canary strategy.

**Business Value:** Reduce defect escape rate (bad dies shipped to customers) by 30% → $2.1M/year savings (fewer field returns, improved quality).

**Implementation Plan:**
1. **Baseline Model**: Rule-based defect classifier (95% accuracy, fast but limited)
2. **New Model**: CNN (ResNet-50) trained on 100K wafer images (98% accuracy, slower but comprehensive)
3. **Canary Strategy**:
   - Stage 1: Route 5% production traffic to CNN model (low-risk validation)
   - Stage 2: Monitor false positive rate, latency, memory usage (24 hours)
   - Stage 3: If metrics acceptable, increase to 25% traffic (48 hours)
   - Stage 4: If metrics acceptable, increase to 100% traffic (full rollout)
   - Rollback: If the false positive rate rises or latency exceeds 500 ms, roll back to baseline
4. **Monitoring**: Prometheus metrics (defect_detection_accuracy, inference_latency_ms, memory_usage_mb)
5. **GitOps**: ArgoCD manages Kubernetes deployment versions (automated rollout, rollback)
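The canary strategy above is a simple state machine over traffic weights. A minimal sketch of the per-stage decision, assuming "false positives rise" means the canary's false positive rate exceeds the baseline's and that the 500 ms limit applies to p99 latency (both interpretations are assumptions). Flagger, listed below, automates this kind of loop; the sketch only makes the decision rule explicit:

```python
from dataclasses import dataclass

# Traffic stages from the plan above (% of requests sent to the CNN model)
CANARY_STAGES = [5, 25, 100]
LATENCY_LIMIT_MS = 500.0


@dataclass
class StageMetrics:
    canary_fp_rate: float    # false positive rate of the CNN canary
    baseline_fp_rate: float  # false positive rate of the rule-based baseline
    canary_p99_ms: float     # assumption: the latency limit is checked at p99


def next_canary_weight(current: int, m: StageMetrics) -> int:
    """Traffic weight for the next stage; 0 means roll back to the baseline model."""
    if current not in CANARY_STAGES:
        raise ValueError(f"unexpected canary weight {current}%")
    if m.canary_fp_rate > m.baseline_fp_rate or m.canary_p99_ms > LATENCY_LIMIT_MS:
        return 0
    i = CANARY_STAGES.index(current)
    return CANARY_STAGES[min(i + 1, len(CANARY_STAGES) - 1)]


healthy = StageMetrics(canary_fp_rate=0.010, baseline_fp_rate=0.012, canary_p99_ms=320)
slow = StageMetrics(canary_fp_rate=0.010, baseline_fp_rate=0.012, canary_p99_ms=650)
print(next_canary_weight(5, healthy), next_canary_weight(25, healthy), next_canary_weight(25, slow))  # 25 100 0
```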

**Tekton/GitHub Actions Workflow:**
- Step 1: Train CNN on latest wafer images
- Step 2: Validate model accuracy on held-out test set (≥98% required)
- Step 3: Deploy to staging Kubernetes cluster (functional testing)
- Step 4: Create canary deployment YAML (5% traffic to new model, 95% to baseline)
- Step 5: ArgoCD syncs canary deployment to production cluster
- Step 6: Monitor Prometheus metrics for 24 hours
- Step 7: If acceptable, increase traffic to 25%, then 100%

**Key Technologies:** Tekton, ArgoCD, Flagger (automated canary), Prometheus, Grafana, Kubernetes

**Success Metrics:**
- ✅ Defect detection accuracy: 98% (up from 95% baseline)
- ✅ Canary rollout time: 72 hours (safe, gradual deployment)
- ✅ Production incidents: 0 (canary prevents bad deployments)
- ✅ Rollback time: <5 minutes (automated Flagger rollback)

---

### Project 4: Multi-Stage ML Pipeline with Experiment Tracking 🧪

**Objective:** ML pipeline with integrated experiment tracking for hyperparameter tuning.

**Business Value:** Find optimal model configuration faster → $95K/year savings (reduced engineering time on manual hyperparameter tuning).

**Implementation Plan:**
1. **Experiment Tracking**: MLflow tracks all training runs (hyperparameters, metrics, artifacts)
2. **Grid Search**: Train 20 model variants (2 model families, RandomForest and GradientBoosting, × 10 hyperparameter combinations each)
3. **Parallel Execution**: Tekton runs 5 training tasks in parallel (GPU nodes)
4. **Model Selection**: Select best model by F1 score (balanced precision/recall)
5. **Model Registry**: Register best model in MLflow Model Registry (versioned, tagged)
6. **CI/CD Integration**: GitHub Actions triggers pipeline on git push to main

**Tekton Pipeline Structure:**
- Task 1: `data-preprocessing` (feature engineering, train/test split)
- Task 2-21: `model-training-{i}` (parallel tasks, each with different hyperparameters)
  - Example: Task 2: RandomForest(n_estimators=50, max_depth=5)
  - Example: Task 3: RandomForest(n_estimators=100, max_depth=10)
  - Example: Task 12: GradientBoosting(n_estimators=50, learning_rate=0.01)
- Task 22: `model-selection` (compare MLflow runs, select best by F1 score)
- Task 23: `model-registration` (register in MLflow Model Registry)
- Task 24: `deployment` (deploy best model to Kubernetes via ArgoCD)
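The 20 training tasks in this structure are the Cartesian product of per-family hyperparameter grids. A minimal sketch that generates the task specs; the grid values are hypothetical, chosen so that Task 2 and Task 12 come out as the examples listed above:

```python
from itertools import product

# Hypothetical grids: 5 x 2 = 10 combinations per model family, 20 tasks in total
GRIDS = {
    "RandomForest": {"n_estimators": [50, 100, 200, 300, 500], "max_depth": [5, 10]},
    "GradientBoosting": {"n_estimators": [50, 100, 200, 300, 500], "learning_rate": [0.01, 0.1]},
}


def training_task_specs(grids: dict, first_task_number: int = 2) -> list[dict]:
    """One spec per (model family, hyperparameter combination), numbered like Task 2-21."""
    specs = []
    for family, grid in grids.items():
        names = list(grid)
        for values in product(*grid.values()):
            specs.append({
                "task": f"model-training-{first_task_number + len(specs)}",
                "model": family,
                "params": dict(zip(names, values)),
            })
    return specs


specs = training_task_specs(GRIDS)
print(len(specs))  # 20
print(specs[0])    # Task 2: RandomForest(n_estimators=50, max_depth=5)
print(specs[10])   # Task 12: GradientBoosting(n_estimators=50, learning_rate=0.01)
```

Each spec would become one Tekton task with its hyperparameters passed as params, so adding a grid value adds tasks without editing the pipeline by hand.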

**MLflow Integration:**
```python
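import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder data; in the pipeline, X/y come from the data-preprocessing task
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
n_estimators, max_depth = 100, 10  # one grid point, passed in as task params

with mlflow.start_run(run_name=f"rf-{n_estimators}-{max_depth}"):
    mlflow.log_params({'n_estimators': n_estimators, 'max_depth': max_depth})
    
    # Train model
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=42)
    model.fit(X_train, y_train)
    
    # Evaluate
    y_pred = model.predict(X_test)
    f1 = f1_score(y_test, y_pred)
    
    mlflow.log_metric('f1_score', f1)
    mlflow.sklearn.log_model(model, 'model')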
```

**Key Technologies:** Tekton, MLflow, Kubernetes (GPU nodes), GitHub Actions, ArgoCD

**Success Metrics:**
- ✅ Hyperparameter tuning time: 2 hours (down from 2 days manual)
- ✅ Model F1 score: 0.96+ (optimal hyperparameters found)
- ✅ Experiment reproducibility: 100% (MLflow tracks all runs)
- ✅ Engineering time savings: 80% (automated grid search)

---

### Project 5: GitOps-Driven Model Deployment Pipeline 🔄

**Objective:** Use GitOps (ArgoCD) to manage ML model deployments declaratively.

**Business Value:** Eliminate deployment drift (production matches Git definitions) → $75K/year savings (fewer configuration errors, faster rollbacks).

**Implementation Plan:**
1. **Git as Source of Truth**: All Kubernetes manifests stored in Git (deployment, service, ConfigMap)
2. **ArgoCD Sync**: Monitors Git repo, automatically syncs changes to Kubernetes cluster
3. **Model Versioning**: Each model version has dedicated deployment YAML (v1.0, v1.1, v1.2)
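4. **Automated Rollback**: If a deployment fails health checks, ArgoCD reports it as Degraded; the rollback to the last healthy version is done by reverting the Git change (which ArgoCD then syncs) or automatically by a progressive-delivery controller such as Argo Rollouts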
5. **PR-Based Approval**: Model deployments require PR review + approval (human oversight)

**Git Repository Structure:**
```
ml-models-gitops/
├── base/
│   ├── deployment.yaml        # Base deployment template
│   ├── service.yaml           # Service definition
│   └── configmap.yaml         # Model config
├── overlays/
│   ├── staging/
│   │   └── kustomization.yaml # Staging-specific overrides
│   └── production/
│       └── kustomization.yaml # Production-specific overrides
└── versions/
    ├── v1.0/
    │   └── model.pkl          # Model artifact (DVC tracked)
    ├── v1.1/
    │   └── model.pkl
    └── v1.2/
        └── model.pkl
```

**Workflow:**
1. Data scientist trains new model (v1.2), pushes to Git
2. CI/CD tests model (accuracy, latency, schema validation)
3. If tests pass, create PR to update `production/kustomization.yaml` (image: yield-predictor:v1.2)
4. Lead engineer reviews PR, approves
5. ArgoCD detects Git change, syncs to production cluster
6. ArgoCD performs health checks (HTTP /health endpoint)
7. If healthy, the deployment is complete; if unhealthy, roll back to v1.1 by reverting the Git change (or automatically via Argo Rollouts)
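Step 3 of this workflow is a one-field edit that the CI job makes before opening the PR. A minimal sketch operating on the already-parsed `kustomization.yaml` as a dict (so no YAML library is needed); it uses Kustomize's `images` transformer fields `name` and `newTag`, and the `resources` path assumes the `overlays/production` layout above:

```python
import copy


def set_image_tag(kustomization: dict, image: str, tag: str) -> dict:
    """Return a copy of a parsed kustomization.yaml with `image` pinned to `tag`."""
    updated = copy.deepcopy(kustomization)  # leave the input untouched
    images = updated.setdefault("images", [])
    for entry in images:
        if entry.get("name") == image:
            entry["newTag"] = tag
            break
    else:  # image not listed yet: add an override entry
        images.append({"name": image, "newTag": tag})
    return updated


current = {
    "resources": ["../../base"],
    "images": [{"name": "yield-predictor", "newTag": "v1.1"}],
}
proposed = set_image_tag(current, "yield-predictor", "v1.2")
print(proposed["images"])  # [{'name': 'yield-predictor', 'newTag': 'v1.2'}]
print(current["images"])   # [{'name': 'yield-predictor', 'newTag': 'v1.1'}]
```

Because the change is a single reviewed diff in Git, rolling back is the reverse one-line diff.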

**Key Technologies:** ArgoCD, Kustomize, DVC (model versioning), GitHub, Kubernetes

**Success Metrics:**
- ✅ Deployment drift: 0% (production always matches Git)
- ✅ Rollback time: 90 seconds (Git revert, synced automatically by ArgoCD)
- ✅ Configuration errors: 0 (Git review process catches issues)
- ✅ Deployment frequency: 2×/week (confident, frequent releases)

---

### Project 6: Automated Model Monitoring and Retraining Trigger 📊

**Objective:** Monitor deployed model performance, trigger retraining when accuracy degrades.

**Business Value:** Catch model drift early → $120K/year savings (prevent accuracy degradation from affecting production decisions).

**Implementation Plan:**
1. **Prometheus Metrics**: The model API exposes metrics derived from its predictions and the ground-truth labels, which Prometheus scrapes
   - Metric: `model_accuracy` (rolling 7-day window)
   - Metric: `prediction_latency_ms` (p99 latency)
   - Metric: `data_drift_score` (KL divergence between training and production data distributions)
2. **Alerting Rules**: A Prometheus alerting rule fires when accuracy stays below the threshold, and Alertmanager routes the alert
   ```yaml
   groups:
   - name: model_performance
     rules:
     - alert: ModelAccuracyDegradation
       expr: model_accuracy < 0.95
       for: 24h
       annotations:
         summary: "Model accuracy below 95% for 24 hours"
   ```
3. **Automated Retraining**: Alert webhook triggers Tekton PipelineRun (retrain model on latest data)
4. **A/B Testing**: New model deployed as canary (10% traffic), compared to current model
5. **Automatic Promotion**: If new model accuracy ≥ current + 1%, promote to 100% traffic
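The `data_drift_score` metric in step 1 can be computed by histogramming one feature from the training set and from recent production traffic over shared bins. A minimal sketch; the source does not fix the direction of the KL divergence, so computing KL(production ‖ training) here is an assumption, as are the bin count and the small `eps` that keeps empty bins from producing log(0):

```python
import numpy as np


def kl_drift_score(train_values, prod_values, bins: int = 20, eps: float = 1e-6) -> float:
    """KL(P_prod || P_train) between histograms of one feature over shared bins."""
    train_values = np.asarray(train_values, dtype=float)
    prod_values = np.asarray(prod_values, dtype=float)
    lo = min(train_values.min(), prod_values.min())
    hi = max(train_values.max(), prod_values.max())
    edges = np.linspace(lo, hi, bins + 1)  # identical bins for both samples
    p = np.histogram(prod_values, bins=edges)[0] / len(prod_values)
    q = np.histogram(train_values, bins=edges)[0] / len(train_values)
    p, q = p + eps, q + eps                # smooth empty bins
    p, q = p / p.sum(), q / q.sum()        # renormalize to probability vectors
    return float(np.sum(p * np.log(p / q)))


rng = np.random.default_rng(0)
train = rng.normal(1.20, 0.05, 5000)         # e.g. a supply-voltage feature at training time
prod_same = rng.normal(1.20, 0.05, 5000)     # production data from the same distribution
prod_shifted = rng.normal(1.26, 0.05, 5000)  # production data after a process shift
print(round(kl_drift_score(train, prod_same), 4), round(kl_drift_score(train, prod_shifted), 4))
```

The score is near zero for matching distributions and grows with the shift; exposing it as a gauge lets the same alerting setup watch drift as well as accuracy.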

**Tekton Trigger Workflow:**
```yaml
apiVersion: triggers.tekton.dev/v1beta1
kind: EventListener
metadata:
  name: model-retraining-trigger
spec:
  serviceAccountName: tekton-triggers-sa  # hypothetical SA with Tekton Triggers RBAC
  triggers:
  - name: prometheus-alert
    interceptors:
    - ref:
        name: "cel"
      params:
      - name: "filter"
        value: "body.alerts[0].labels.alertname == 'ModelAccuracyDegradation'"
    bindings:
    - ref: model-retraining-binding
    template:
      ref: model-retraining-template
```
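Alertmanager groups alerts, so the alert of interest is not guaranteed to be `alerts[0]` in the webhook body, and the CEL filter above also ignores whether the alert is still firing. A minimal Python sketch of the broader check, assuming the standard Alertmanager webhook JSON shape (`alerts[].status`, `alerts[].labels`); a CEL expression such as `body.alerts.exists(a, a.labels.alertname == 'ModelAccuracyDegradation')` should express the name check inside the interceptor:

```python
import json


def should_retrain(payload: dict, alertname: str = "ModelAccuracyDegradation") -> bool:
    """True if any *firing* alert in an Alertmanager webhook payload has the given name."""
    return any(
        alert.get("status") == "firing"
        and alert.get("labels", {}).get("alertname") == alertname
        for alert in payload.get("alerts", [])
    )


body = json.loads("""
{"version": "4", "status": "firing",
 "alerts": [
   {"status": "firing", "labels": {"alertname": "HighLatency"}},
   {"status": "firing", "labels": {"alertname": "ModelAccuracyDegradation"}}
 ]}
""")
print(should_retrain(body))  # True, although alerts[0] is a different alert
```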

**Key Technologies:** Prometheus, AlertManager, Tekton Triggers, Kubernetes, MLflow

**Success Metrics:**
- ✅ Model drift detection time: <24 hours (early warning)
- ✅ Retraining trigger latency: <5 minutes (automated response)
- ✅ Production accuracy: 96%+ sustained (continuous monitoring prevents degradation)
- ✅ False alert rate: <5% (tuned thresholds prevent noise)

---

### Project 7: CI/CD for Feature Store Updates 🗄️

**Objective:** Automated pipeline for feature engineering and feature store updates.

**Business Value:** Fresh features in production models → $200K/year revenue gains (better predictions from recent data).

**Implementation Plan:**
1. **Feature Store**: Feast stores precomputed features (Redis online store, BigQuery offline store)
2. **Daily Feature Computation**: A Kubernetes CronJob starts a Tekton PipelineRun each day that computes features from STDF data (aggregate statistics, moving averages)
3. **Feature Validation**: Check feature schema, distributions, missing values
4. **Feature Registration**: Register new features in Feast (metadata, TTL, materialization config)
5. **Model Retraining**: Trigger model retraining when new features available
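
Step 3 (feature validation) might look like the following, run before features are materialized into the online store. This sketch assumes rows arrive as plain dictionaries. The expected schema is hypothetical: it is `device_id` plus the three features in the Feast view below. The 5% missing-value limit matches the data quality gate used later in this notebook. Production pipelines often use a library such as Great Expectations for this step instead.

```python
import math
from typing import Any

# Hypothetical expected schema: device_id plus the three features in the Feast view
EXPECTED_SCHEMA: dict[str, type] = {
    "device_id": str,
    "voltage_mean": float,
    "current_std": float,
    "frequency_max": float,
}
MAX_MISSING_RATE = 0.05  # same <5% threshold as the data quality gate


def is_missing(value: Any) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def validate_features(rows: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means the batch passes."""
    if not rows:
        return ["empty batch"]
    problems = []
    for col, expected_type in EXPECTED_SCHEMA.items():
        values = [row.get(col) for row in rows]
        missing_rate = sum(is_missing(v) for v in values) / len(rows)
        if missing_rate > MAX_MISSING_RATE:
            problems.append(f"{col}: {missing_rate:.0%} missing")
        wrong_type = [v for v in values if v is not None and not isinstance(v, expected_type)]
        if wrong_type:
            problems.append(f"{col}: {len(wrong_type)} values are not {expected_type.__name__}")
    # Range check: a standard deviation can never be negative
    if any(isinstance(r.get("current_std"), float) and r["current_std"] < 0 for r in rows):
        problems.append("current_std: negative values")
    return problems
```

Returning every problem, instead of raising on the first one, lets the pipeline log a complete report before it fails the task.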

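**Daily Feature PipelineRun** (Tekton Pipelines has no built-in cron trigger, so a Kubernetes CronJob typically creates this PipelineRun with `kubectl create` or posts to an EventListener):
```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: feature-store-update-   # unique name per daily run; needs `kubectl create`, not `apply`
spec:
  pipelineRef:
    name: feature-engineering-pipeline
  params:
  - name: data-source
    value: "s3://stdf-data/wafer-test/"
  - name: feature-store
    value: "feast://features"
  workspaces:
  - name: data
    persistentVolumeClaim:
      claimName: feature-data-pvc
```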

**Feast Feature Definition:**
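```python
from datetime import timedelta

# Current Feast API (Field + feast.types); the older Feature/ValueType API is deprecated
from feast import BigQuerySource, Entity, FeatureView, Field
from feast.types import Float32

# Entity: features are looked up by device_id at training and serving time
device_entity = Entity(name="device_id", join_keys=["device_id"])

# Offline source: BigQuery table of precomputed per-device test statistics
device_source = BigQuerySource(
    table="stdf_data.device_features",
    timestamp_field="test_timestamp",
)

device_features = FeatureView(
    name="device_test_features",
    entities=[device_entity],
    ttl=timedelta(days=7),  # feature rows older than 7 days are treated as stale
    schema=[
        Field(name="voltage_mean", dtype=Float32),
        Field(name="current_std", dtype=Float32),
        Field(name="frequency_max", dtype=Float32),
    ],
    source=device_source,
)
```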

**Key Technologies:** Feast, Tekton, BigQuery, Redis, Kubernetes

**Success Metrics:**
- ✅ Feature freshness: <24 hours (daily updates)
- ✅ Feature validation pass rate: 99%+ (quality gates)
- ✅ Model prediction accuracy: +2% (fresh features improve performance)
- ✅ Feature computation time: <30 minutes (efficient Tekton pipeline)

---

### Project 8: Multi-Environment CI/CD (Dev/Staging/Production) 🌍

**Objective:** Separate CI/CD pipelines for dev, staging, and production environments with progressive promotion.

**Business Value:** Reduce production incidents → $150K/year savings (thorough staging validation prevents bugs).

**Implementation Plan:**
1. **Environment Strategy**:
   - **Dev**: Developers test changes (low-quality data OK, fast iteration)
   - **Staging**: Pre-production validation (production-like data, thorough testing)
   - **Production**: Customer-facing (zero tolerance for errors)
2. **Promotion Gates**:
   - Dev → Staging: Unit tests pass + code review
   - Staging → Production: Integration tests pass + manual approval + load testing
3. **GitHub Actions Workflow**:
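   ```yaml
   name: Multi-Environment Deployment
   on:
     push:
       branches: [main]
   jobs:
     deploy-dev:
       runs-on: ubuntu-latest
       environment: dev
       steps:
         - uses: actions/checkout@v4
         - name: Configure kubectl
           # Assumes each GitHub environment stores its own cluster kubeconfig
           # in a secret named KUBECONFIG (hypothetical name)
           env:
             KUBECONFIG_DATA: ${{ secrets.KUBECONFIG }}
           run: |
             mkdir -p "$HOME/.kube"
             printf '%s' "$KUBECONFIG_DATA" > "$HOME/.kube/config"
         - name: Deploy to dev
           run: kubectl apply -f k8s/dev/

     deploy-staging:
       needs: [deploy-dev]
       runs-on: ubuntu-latest
       environment: staging
       steps:
         - uses: actions/checkout@v4
         - uses: actions/setup-python@v5
           with:
             python-version: '3.10'
         - name: Configure kubectl
           env:
             KUBECONFIG_DATA: ${{ secrets.KUBECONFIG }}
           run: |
             mkdir -p "$HOME/.kube"
             printf '%s' "$KUBECONFIG_DATA" > "$HOME/.kube/config"
         - name: Deploy to staging
           run: kubectl apply -f k8s/staging/
         - name: Run integration tests
           run: |
             pip install pytest
             pytest tests/integration/

     deploy-production:
       needs: [deploy-staging]
       runs-on: ubuntu-latest
       # Waits for approval only if required reviewers are configured
       # on the "production" environment in the repository settings
       environment: production
       steps:
         - uses: actions/checkout@v4
         - name: Configure kubectl
           env:
             KUBECONFIG_DATA: ${{ secrets.KUBECONFIG }}
           run: |
             mkdir -p "$HOME/.kube"
             printf '%s' "$KUBECONFIG_DATA" > "$HOME/.kube/config"
         - name: Deploy to production
           run: kubectl apply -f k8s/production/
   ```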
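
The promotion gates in step 2 reduce to a lookup from a (source, target) pair to the checks that must have passed. In the GitHub Actions workflow in step 3, `needs:` enforces the ordering and environment protection rules enforce the production approval. The sketch below simply makes the rule explicit and testable. The gate names are hypothetical labels for CI signals.

```python
# Hypothetical gate names; each stands for a CI signal (job result, review, approval)
PROMOTION_GATES: dict[tuple[str, str], frozenset[str]] = {
    ("dev", "staging"): frozenset({"unit_tests_passed", "code_review_approved"}),
    ("staging", "production"): frozenset(
        {"integration_tests_passed", "manual_approval", "load_test_passed"}
    ),
}


def can_promote(source: str, target: str, passed: set[str]) -> tuple[bool, set[str]]:
    """Return (allowed, missing_gates) for a single-step promotion."""
    required = PROMOTION_GATES.get((source, target))
    if required is None:
        # Skipping an environment (e.g. dev -> production) is never allowed
        raise ValueError(f"no promotion path {source} -> {target}")
    missing = set(required - passed)
    return not missing, missing


print(can_promote("staging", "production", {"integration_tests_passed"}))
```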

**Key Technologies:** GitHub Actions, Kubernetes (3 clusters), ArgoCD (per environment), Terraform

**Success Metrics:**
- ✅ Production incidents: <1/month (staging catches bugs)
- ✅ Dev deployment frequency: 20×/day (fast iteration)
- ✅ Production deployment frequency: 2×/week (controlled releases)
- ✅ Staging test coverage: 95%+ (comprehensive validation)

---

## 🎯 Projects Summary

| Project | Focus | Value | Key Tech |
|---------|-------|-------|----------|
| 1. Automated Retraining | Continuous model improvement | $600K/year | Tekton, MLflow, ArgoCD |
| 2. STDF Quality Gates | Zero data corruption bugs | $180K/year | GitHub Actions, pytest |
| 3. Canary Deployment | Safe CNN rollout | $2.1M/year | Flagger, Prometheus, ArgoCD |
| 4. Experiment Tracking | Optimal hyperparameters | $95K/year | MLflow, Tekton (parallel) |
| 5. GitOps Deployment | Zero configuration drift | $75K/year | ArgoCD, Kustomize, DVC |
| 6. Model Monitoring | Catch drift early | $120K/year | Prometheus, Tekton Triggers |
| 7. Feature Store Updates | Fresh features daily | $200K/year | Feast, BigQuery, Redis |
| 8. Multi-Environment | Thorough validation | $150K/year | GitHub Actions, 3 clusters |

**Total Annual Value: $3.52M across 8 CI/CD projects!**
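
As a quick arithmetic check, the total is the sum of the per-project estimates in the table. All figures are this notebook's estimates, in $K/year.

```python
# Annual value per project in $K, copied from the summary table
project_value_k = {
    "Automated Retraining": 600,
    "STDF Quality Gates": 180,
    "Canary Deployment": 2100,
    "Experiment Tracking": 95,
    "GitOps Deployment": 75,
    "Model Monitoring": 120,
    "Feature Store Updates": 200,
    "Multi-Environment": 150,
}
total_k = sum(project_value_k.values())
print(f"Total: ${total_k / 1000:.2f}M")  # Total: $3.52M
```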

## 6. 🎓 Comprehensive Takeaways: CI/CD for ML Mastery

### 🔑 Core Concepts

#### **1. ML CI/CD vs Traditional CI/CD**
Traditional CI/CD focuses on code testing and deployment. ML CI/CD extends this with:
- **Data Validation**: Schema checks, quality gates (prevent training on corrupt data)
- **Model Training**: Reproducible pipelines (versioned data, code, hyperparameters)
- **Model Evaluation**: Baseline comparison (deploy only if accuracy improves)
- **Model Registry**: Version control for models (MLflow, DVC)
- **Canary Deployments**: Gradual rollout (10% → 25% → 100% traffic)
- **Monitoring**: Track model drift (accuracy degradation over time)

**Key Insight:** ML pipelines treat data and models as first-class citizens, not just code.

#### **2. Tekton vs GitHub Actions**
Both are CI/CD platforms, but different design philosophies:

| Aspect | Tekton | GitHub Actions |
|--------|--------|----------------|
| **Platform** | Kubernetes-native (runs in K8s cluster) | Cloud-hosted (GitHub infrastructure) |
| **Complexity** | Higher (requires K8s knowledge) | Lower (hosted runners, no cluster to operate) |
| **Flexibility** | Maximum (custom CRDs, resource control) | Good (pre-built actions marketplace) |
| **Cost** | Self-hosted (you pay for K8s nodes) | Pay-per-minute (free for public repos) |
| **GPU Support** | Excellent (K8s node selectors) | Limited (self-hosted or paid GPU runners needed) |
| **Best For** | Large-scale ML training, complex DAGs | Quick CI/CD, open-source projects |

**When to Use:**
- **Tekton**: Large ML teams, multi-GPU training, complex pipelines, on-premise clusters
- **GitHub Actions**: Small teams, cloud-first, quick prototyping, open-source

#### **3. Quality Gates for ML**
Unlike traditional software, ML models can silently degrade. Quality gates prevent bad deployments:

**Data Quality Gates:**
- ✅ Schema validation (required columns, data types, value ranges)
- ✅ Missing values threshold (<5% acceptable)
- ✅ Distribution checks (KL divergence vs training data <0.1)
- ✅ Minimum sample size (≥1000 rows for training)

**Model Quality Gates:**
- ✅ Accuracy threshold (≥ baseline + 1% improvement)
- ✅ Precision/recall balance (F1 score ≥0.95)
- ✅ Inference latency (p99 <150ms for production)
- ✅ Model size (<500MB for edge deployment)

**Deployment Quality Gates:**
- ✅ Canary metrics acceptable (24-hour monitoring)
- ✅ A/B test statistical significance (p-value <0.05)
- ✅ Resource utilization (<70% CPU, <80% memory)

**Key Insight:** Automated quality checks → catch most data and model regressions before they reach production, instead of relying on manual review.
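
The model quality gates above can be collected into one check that returns every failed gate. A CI step can then print all problems at once instead of stopping at the first. This sketch copies its thresholds from the lists above, with two assumptions:

- "baseline + 1%" means one percentage point.
- p99 uses the nearest-rank definition; other percentile definitions give slightly different values.

The names are hypothetical.

```python
from dataclasses import dataclass


def p99(latencies_ms: list[float]) -> float:
    """Nearest-rank 99th percentile: smallest sample with >= 99% of samples at or below it."""
    ordered = sorted(latencies_ms)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) using integer math
    return ordered[rank - 1]


@dataclass
class CandidateModel:
    accuracy: float
    f1: float
    latencies_ms: list[float]
    size_mb: float


def model_gate_failures(model: CandidateModel, baseline_accuracy: float) -> list[str]:
    """Return every failed gate; an empty list means the model may be deployed."""
    failures = []
    if model.accuracy < baseline_accuracy + 0.01:
        failures.append("accuracy below baseline + 1 point")
    if model.f1 < 0.95:
        failures.append("F1 below 0.95")
    if p99(model.latencies_ms) >= 150:
        failures.append("p99 latency >= 150 ms")
    if model.size_mb >= 500:
        failures.append("model size >= 500 MB")
    return failures


candidate = CandidateModel(accuracy=0.97, f1=0.96,
                           latencies_ms=[float(ms) for ms in range(1, 101)], size_mb=120.0)
print(model_gate_failures(candidate, baseline_accuracy=0.95))  # []
```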

#### **4. Experiment Tracking with MLflow**
MLflow is a widely used open-source platform for ML experiment tracking:

**What MLflow Tracks:**
- **Parameters**: Hyperparameters (n_estimators=100, learning_rate=0.01)
- **Metrics**: Accuracy, precision, recall, F1, training time
- **Artifacts**: Model files (model.pkl), plots (confusion_matrix.png), datasets
- **Code Version**: Git commit hash (reproducibility)

**MLflow Components:**
1. **Tracking**: Log experiments (mlflow.log_params, mlflow.log_metrics)
2. **Projects**: Package ML code (MLproject file with conda dependencies)
3. **Models**: Standard model format (mlflow.sklearn.log_model, deployment-ready)
4. **Registry**: Centralized model store (versioning, stage transitions or aliases such as staging/production)

**Best Practice:** Log every training run → compare 100+ experiments → pick best model scientifically.

#### **5. GitOps for ML Deployments**
GitOps = Git as single source of truth for infrastructure:

**GitOps Principles:**
1. **Declarative**: Kubernetes manifests in Git (deployment.yaml, service.yaml)
2. **Versioned**: Every change tracked (who, when, why via Git commits)
3. **Automated**: ArgoCD syncs Git ‚Üí Kubernetes (continuous reconciliation)
4. **Auditable**: Complete deployment history (compliance, rollback capability)

**ArgoCD Workflow:**
```
1. Data scientist trains model v1.2
2. Update Git: k8s/production/deployment.yaml (image: yield-predictor:v1.2)
3. Create PR → Lead engineer reviews → Approves
4. ArgoCD detects Git change → Syncs to K8s cluster
5. Health checks pass → Deployment complete
6. If unhealthy → Roll back to v1.1 (Git revert; Argo Rollouts can automate the abort)
```
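
The pipeline usually automates step 2 rather than leaving it as a hand edit. This is a minimal sketch that assumes the manifest keeps the image reference on its own `image:` line. `bump_image_tag` is a hypothetical helper; `kustomize edit set image` or `yq` are common alternatives.

```python
import re


def bump_image_tag(manifest: str, image: str, new_tag: str) -> str:
    """Point every `image: <image>:<tag>` line at new_tag; fail if no line matched."""
    pattern = re.compile(
        rf"^([ \t]*(?:-[ \t]+)?image:[ \t]*{re.escape(image)}):\S+[ \t]*$",
        re.MULTILINE,
    )
    updated, count = pattern.subn(rf"\g<1>:{new_tag}", manifest)
    if count == 0:
        raise ValueError(f"no 'image: {image}:<tag>' line found")
    return updated


deployment = """\
spec:
  template:
    spec:
      containers:
      - name: model
        image: yield-predictor:v1.1
"""
print(bump_image_tag(deployment, "yield-predictor", "v1.2"))
```

Failing loudly when nothing matches matters here. A silent no-op would open a PR that changes nothing, and ArgoCD would keep serving the old model.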

**Key Insight:** Git PR review = deployment approval (no manual kubectl commands).

---

### 🛠️ Best Practices

#### **Pipeline Design:**
1. **Fail Fast**: Validate data schema before training (save 30 minutes of wasted GPU time)
2. **Parallel Execution**: Train 5 hyperparameter sets simultaneously (Tekton parallel tasks)
3. **Artifact Caching**: Cache dependencies, preprocessed data (speed up pipeline 3×)
4. **Incremental Training**: Use last model as starting point (transfer learning)
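
Artifact caching (point 3) depends on a cache key that changes exactly when the cached inputs change. GitHub Actions does this with expressions such as `hashFiles('**/requirements.txt')`. The sketch below shows the same content-addressed idea in Python; the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Mapping


def cache_key(prefix: str, inputs: Mapping[str, bytes]) -> str:
    """Content-addressed cache key: changes whenever any input's name or bytes change."""
    digest = hashlib.sha256()
    for name in sorted(inputs):        # sorted so dict order does not affect the key
        digest.update(name.encode())
        digest.update(b"\0")           # separators avoid ambiguous concatenations
        digest.update(inputs[name])
        digest.update(b"\0")
    return f"{prefix}-{digest.hexdigest()[:16]}"


def cache_key_for_files(prefix: str, *paths: str) -> str:
    """Convenience wrapper: hash the current contents of files such as requirements.txt."""
    return cache_key(prefix, {p: Path(p).read_bytes() for p in paths})
```

In CI, a key such as `cache_key_for_files("deps", "requirements.txt")` would be passed to whatever cache mechanism the runner provides. Unchanged dependencies then restore from cache instead of being reinstalled.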

#### **Testing Strategy:**
1. **Unit Tests**: Test data validation logic, preprocessing functions (pytest)
2. **Integration Tests**: Test full pipeline end-to-end (data ‚Üí model ‚Üí deployment)
3. **Model Tests**: Test model predictions (edge cases, adversarial examples)
4. **Load Tests**: Simulate 1000 req/sec (ensure API handles production traffic)

#### **Security:**
1. **Secret Management**: Store credentials in Kubernetes Secrets (not Git)
2. **RBAC**: Limit pipeline permissions (principle of least privilege)
3. **Vulnerability Scanning**: Scan Docker images (Snyk, Trivy)
4. **Data Privacy**: Anonymize PII before logging (GDPR compliance)

#### **Monitoring:**
1. **Pipeline Metrics**: Track success rate, duration, resource usage (Prometheus)
2. **Model Metrics**: Track accuracy, latency, drift (custom Prometheus exporters)
3. **Alerting**: Notify on pipeline failures, model degradation (Slack, PagerDuty)
4. **Dashboards**: Visualize metrics (Grafana dashboards)

---

### ⚠️ Common Pitfalls

#### **1. Training on Stale Data**
**Problem:** Model trained on last month's data, production data distribution shifted.
**Solution:** Automate daily retraining (a Kubernetes CronJob that starts a Tekton PipelineRun) + monitor data drift (KL divergence).

#### **2. No Baseline Comparison**
**Problem:** Deploy new model without comparing to current model (risk of regression).
**Solution:** Always compare accuracy to baseline + 1% threshold (quality gate).

#### **3. Ignoring Inference Latency**
**Problem:** New model 5% more accurate but 10× slower (production SLA violated).
**Solution:** Test latency in CI/CD (p99 <150ms gate) + load testing.

#### **4. Hardcoded Hyperparameters**
**Problem:** Hyperparameters in code (not reproducible, hard to tune).
**Solution:** Store in config files (YAML, JSON) + track in MLflow.

#### **5. Manual Deployment Steps**
**Problem:** Engineer runs kubectl apply manually (error-prone, not auditable).
**Solution:** GitOps (ArgoCD) + PR approval workflow (automated, auditable).

#### **6. No Rollback Strategy**
**Problem:** Bad deployment, scrambling to fix (production downtime).
**Solution:** Canary deployments with automated rollback (Flagger or Argo Rollouts) + Git-revert rollbacks through ArgoCD (catch issues before 100% traffic).

#### **7. Insufficient Test Coverage**
**Problem:** Unit tests pass but integration fails (components don't work together).
**Solution:** Test pyramid (many unit tests, some integration tests, few E2E tests).

#### **8. Ignoring Model Drift**
**Problem:** Model accuracy degrades over 6 months, nobody notices.
**Solution:** Monitor accuracy in production (Prometheus) + alert when <95%.

---

### 🚀 Production Checklist

Before deploying ML models to production, ensure:

**Data:**
- [ ] Data schema validated (required columns, types, ranges)
- [ ] Missing values handled (<5% threshold)
- [ ] Data distribution checked (similar to training data)
- [ ] Data versioned (DVC, MLflow artifacts)

**Model:**
- [ ] Model trained on representative data (≥5000 samples)
- [ ] Hyperparameters tracked (MLflow, config files)
- [ ] Model accuracy ≥ baseline + 1% (quality gate)
- [ ] Inference latency tested (p99 <150ms)
- [ ] Model size acceptable (<500MB for edge deployment)

**Pipeline:**
- [ ] Pipeline runs end-to-end without manual intervention
- [ ] All tasks idempotent (re-running produces same result)
- [ ] Artifacts stored (model files, metrics, plots)
- [ ] Pipeline execution time <30 minutes (fast feedback)

**Testing:**
- [ ] Unit tests pass (≥90% code coverage)
- [ ] Integration tests pass (full pipeline E2E)
- [ ] Load tests pass (handles 1000 req/sec)
- [ ] Model tests pass (edge cases, adversarial examples)

**Deployment:**
- [ ] Canary deployment configured (10% traffic initially)
- [ ] Health checks defined (HTTP /health endpoint)
- [ ] Rollback strategy tested (Git revert via ArgoCD, or automated rollback via Flagger/Argo Rollouts)
- [ ] Resource limits set (prevent runaway CPU/memory)

**Monitoring:**
- [ ] Model accuracy tracked (Prometheus metrics)
- [ ] Inference latency tracked (p50, p95, p99)
- [ ] Alerts configured (accuracy <95%, latency >150ms)
- [ ] Dashboards created (Grafana model performance dashboard)

**Security:**
- [ ] Secrets stored securely (Kubernetes Secrets, HashiCorp Vault)
- [ ] RBAC configured (pipeline permissions minimal)
- [ ] Container images scanned (Snyk, Trivy)
- [ ] Data anonymized (PII removed before logging)

**Documentation:**
- [ ] Pipeline README (how to run, troubleshoot)
- [ ] Model card (accuracy, limitations, intended use)
- [ ] Runbook (incident response procedures)
- [ ] Architecture diagram (Mermaid, draw.io)

---

### 🔍 Troubleshooting Guide

#### **Pipeline Fails at Data Validation**
**Symptoms:** Task status: FAILED, error: "Missing columns: {'current_ma', 'frequency_mhz'}"
**Diagnosis:** Input data schema changed (column names, types)
**Fix:** Update schema definition or fix upstream data source
**Prevention:** Version data schemas + monitor for breaking changes
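
A check like the following produces the failure above early in the pipeline. It is a sketch: the required column set, the `SCHEMA_VERSION` constant and the `SchemaError` class are hypothetical. An uncaught exception makes the Python process exit nonzero, so the Tekton step, and therefore the task, is marked failed. That is the fail-fast behavior wanted. Unlike the symptom message, the sketch sorts the missing names so the output is stable between runs.

```python
from typing import Iterable

SCHEMA_VERSION = "v2"  # hypothetical; bump whenever the required columns change
REQUIRED_COLUMNS = frozenset({"device_id", "voltage_v", "current_ma", "frequency_mhz"})


class SchemaError(ValueError):
    """Raised when input data does not match the expected schema version."""


def validate_columns(columns: Iterable[str]) -> None:
    missing = REQUIRED_COLUMNS - set(columns)
    if missing:
        # Sorted so the message is stable across runs (set iteration order is not)
        raise SchemaError(
            f"schema {SCHEMA_VERSION}: Missing columns: {sorted(missing)}"
        )


validate_columns(["device_id", "voltage_v", "current_ma", "frequency_mhz"])  # passes
```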

#### **Model Training Times Out**
**Symptoms:** Task runs >1 hour, killed by timeout
**Diagnosis:** Dataset too large or GPU not allocated
**Fix:** Increase task timeout + verify GPU resources (nvidia.com/gpu: '1')
**Prevention:** Test on small dataset first + monitor resource usage

#### **Model Accuracy Below Baseline**
**Symptoms:** Evaluation task fails, accuracy 94% vs baseline 96%
**Diagnosis:** Hyperparameters suboptimal or training data quality issue
**Fix:** Tune hyperparameters (grid search) or validate data quality
**Prevention:** Track hyperparameters in MLflow + A/B test before full deployment

#### **Canary Deployment Shows High Latency**
**Symptoms:** p99 latency 500ms vs baseline 100ms
**Diagnosis:** New model computationally expensive (larger architecture)
**Fix:** Optimize model (quantization, pruning) or add more replicas
**Prevention:** Load test in staging + set latency quality gates

#### **ArgoCD Stuck in Progressing State**
**Symptoms:** Deployment not syncing, status: Progressing for >10 minutes
**Diagnosis:** Pod failing health checks or image pull error
**Fix:** Check pod logs (kubectl logs), verify image exists in registry
**Prevention:** Test deployments in staging + enable auto-rollback

#### **MLflow Experiment Not Logged**
**Symptoms:** Training run completes but no MLflow entry
**Diagnosis:** MLflow tracking URI not set or network connectivity issue
**Fix:** Set MLFLOW_TRACKING_URI environment variable + verify connectivity
**Prevention:** Test MLflow connectivity in setup task

---

### 📚 Next Steps

**After mastering CI/CD for ML, explore:**

1. **Advanced MLOps (Notebooks 121-130)**:
   - Model serving (TensorFlow Serving, Seldon Core)
   - Feature stores (Feast, Tecton)
   - Data versioning (DVC, LakeFS)
   - Experiment tracking (Weights & Biases, Neptune)

2. **Infrastructure as Code (Notebook 137)**:
   - Terraform for cloud infrastructure (AWS, GCP, Azure)
   - Pulumi for Kubernetes resources (type-safe IaC)
   - Ansible for configuration management

3. **Container Security (Notebook 138)**:
   - Image scanning (Trivy, Snyk, Aqua Security)
   - Runtime security (Falco, Sysdig)
   - Network policies (Kubernetes NetworkPolicy, Cilium)
   - Secrets management (HashiCorp Vault, Sealed Secrets)

4. **Advanced Monitoring (Observability)**:
   - Distributed tracing (Jaeger, Zipkin)
   - Log aggregation (ELK stack, Loki)
   - Anomaly detection (Outlier detection on metrics)

5. **Cost Optimization**:
   - Spot instances for training (save 70% on cloud costs)
   - Auto-scaling (scale down during off-hours)
   - Resource quotas (prevent runaway costs)

---

### 🎯 Key Takeaways

1. **ML CI/CD ≠ Traditional CI/CD**: Data and models are first-class citizens (validate, version, monitor).

2. **Tekton for Complex Pipelines**: Kubernetes-native, parallel execution, GPU support (best for large-scale ML).

3. **GitHub Actions for Simplicity**: Cloud-hosted, easy YAML workflows, great for open-source (best for quick CI/CD).

4. **Quality Gates Prevent Incidents**: Automate data validation, baseline comparison, canary deployments (catch most regressions before they reach production).

5. **MLflow Tracks Everything**: Log hyperparameters, metrics, artifacts → scientific model selection (not guesswork).

6. **GitOps for Deployments**: Git as source of truth, ArgoCD auto-sync → zero configuration drift, fast rollbacks.

7. **Monitor Model Performance**: Track accuracy, latency, drift → catch degradation early (before customers notice).

8. **Fail Fast, Test Often**: Validate data before training, test pipelines in staging, canary in production (safe, incremental rollout).

---

**You've mastered CI/CD for ML! üéâ**

You now know how to:
- ✅ Build ML pipelines with data validation, training, evaluation, deployment
- ✅ Use Tekton for Kubernetes-native CI/CD (parallel tasks, GPU support, resource control)
- ✅ Use GitHub Actions for cloud-based workflows (matrix builds, artifact management)
- ✅ Implement quality gates (schema validation, baseline comparison, canary deployments)
- ✅ Track experiments with MLflow (reproducibility, scientific model selection)
- ✅ Deploy with GitOps (ArgoCD, zero drift, fast rollbacks)
- ✅ Monitor production models (Prometheus, alerts, dashboards)
- ✅ Apply to post-silicon validation (STDF pipelines, yield prediction, wafer defect detection)

**Next:** Explore Infrastructure as Code (Notebook 137) to automate cloud resource provisioning! 🚀

## 🎯 Key Takeaways

### When to Use CI/CD for ML
- **Frequent model updates**: Weekly/monthly retraining requires automated pipelines
- **Multi-environment testing**: Models tested in dev/staging before production deployment
- **Team collaboration**: Multiple data scientists contributing models (version control, testing)
- **Reproducibility**: Automated pipelines ensure consistent training (datasets, hyperparameters, code)
- **Compliance**: Audit trails for model lineage, data provenance (pharma, finance)

### Limitations
- **Pipeline complexity**: ML pipelines more complex than software (data validation, model testing)
- **Long build times**: Training jobs take hours/days (caching, incremental training needed)
- **GPU resource constraints**: CI runners need GPU access for training (expensive)
- **Data dependencies**: Large datasets complicate CI (need data versioning, artifact storage)
- **Testing challenges**: Model accuracy tests non-deterministic (random seeds, data splits)
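
Part of that non-determinism is self-inflicted and easy to remove. An accuracy gate compares numbers across runs, so the train/test split should not change between runs. This sketch shows a seeded, order-independent split using a local `random.Random` instance. The helper is hypothetical. Pinning seeds does not remove other sources of variance, such as non-deterministic GPU kernels.

```python
import random
from typing import Iterable, TypeVar

T = TypeVar("T")


def split_ids(ids: Iterable[T], test_fraction: float = 0.2,
              seed: int = 42) -> tuple[list[T], list[T]]:
    """Deterministic train/test split: same ids, fraction and seed give the same partition."""
    ordered = sorted(ids)          # ids must be sortable; removes dependence on input order
    rng = random.Random(seed)      # local RNG, unaffected by other code calling random.*
    rng.shuffle(ordered)
    n_test = int(len(ordered) * test_fraction)
    return ordered[n_test:], ordered[:n_test]


train_a, test_a = split_ids(range(100))
train_b, test_b = split_ids(reversed(range(100)))
print(test_a == test_b)  # True
```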

### Alternatives
- **Manual model deployment**: Data scientist trains locally, deploys via kubectl (doesn't scale)
- **Notebook-based workflows**: Jupyter notebooks for exploration (good for prototyping, bad for production)
- **Dedicated ML platforms**: SageMaker Pipelines, Vertex AI automate training (vendor lock-in)
- **Kubeflow Pipelines only**: Skip CI/CD, use Kubeflow for orchestration (works but less integration)

### Best Practices
- **Separate training and deployment pipelines**: Training triggered by data changes, deployment by model registry
- **Model registry**: MLflow/DVC for versioned model artifacts (staging, production, archived)
- **Automated testing**: Data validation (Great Expectations), model performance tests (accuracy >threshold)
- **Feature store integration**: Cache features for training consistency (avoid recomputing)
- **Container caching**: Cache layers for faster builds (training image rarely changes)
- **Rolling deployments**: Canary rollout (5% → 25% → 100%) with automatic rollback on errors
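
The canary best practice above amounts to the loop below. Tools such as Flagger or Argo Rollouts implement it declaratively, with metric analysis between steps; this sketch only shows the control logic. Two parts are assumptions: the 1% error-rate threshold, and the `error_rate_at` callback, which stands in for a metrics query after each soak period.

```python
from typing import Callable

TRAFFIC_STEPS = (5, 25, 100)  # percent of traffic on the new model, per the rollout above


def run_canary(error_rate_at: Callable[[int], float],
               max_error_rate: float = 0.01) -> tuple[str, int]:
    """Advance through TRAFFIC_STEPS; roll back at the first step whose observed
    error rate exceeds max_error_rate. Returns (outcome, traffic_percent_reached)."""
    for weight in TRAFFIC_STEPS:
        observed = error_rate_at(weight)  # e.g. a Prometheus query after a soak period
        if observed > max_error_rate:
            # Rollback: send 100% of traffic back to the stable model
            return "rolled_back", weight
    return "promoted", TRAFFIC_STEPS[-1]


# A model that degrades once it sees real load at 25% traffic
print(run_canary(lambda weight: 0.05 if weight >= 25 else 0.002))  # ('rolled_back', 25)
```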

## 🔍 Diagnostic Checks & Mastery

### Implementation Checklist
- ✅ **CI pipeline**: Automated testing (data validation, model performance)
- ✅ **Model registry**: MLflow/DVC for versioned artifacts
- ✅ **CD pipeline**: Automated deployment to dev/staging/prod
- ✅ **Rollback strategy**: Canary or blue-green deployment
- ✅ **Monitoring integration**: Alerts on model degradation
- ✅ **Artifact caching**: Speed up builds with layer caching

### Post-Silicon Applications
**Automated Binning Model Pipeline**: Weekly retraining of speed-bin classifiers, automated A/B testing and CI/CD deployment; an estimated $2.5M/year from revenue optimization

### Mastery Achievement
✅ Build end-to-end CI/CD pipelines for ML models  
✅ Automate training, testing, deployment workflows  
✅ Integrate model registry (MLflow) for version control  
✅ Implement canary deployments with automatic rollback  
✅ Add data validation and model testing gates  
✅ Apply to semiconductor yield/binning/test models  

**Next Steps**: 151_MLOps_Fundamentals, 154_Model_Monitoring_Observability

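```yaml
# .github/workflows/ml-pipeline.yml
name: ML Model CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  train-and-register:
    runs-on: ubuntu-latest
    # Job-level env so training, evaluation and registration all reach MLflow/S3.
    # Note: secrets are not passed to workflows triggered from forked PRs.
    env:
      MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_KEY }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET }}
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install mlflow scikit-learn pandas boto3

      - name: Train model
        run: |
          python train_yield_model.py \
            --data s3://wafer-data/latest.csv \
            --experiment-name wafer-yield

      - name: Evaluate model (quality gate)
        # Minimum F1 score; a nonzero exit fails the job and skips later steps
        run: python evaluate_model.py --threshold 0.95

      - name: Register model
        # Runs only on main (and only if evaluation passed); PRs train and evaluate only
        if: github.ref == 'refs/heads/main'
        run: |
          python register_model.py \
            --model-name YieldPredictor \
            --stage Staging

      - name: Deploy to Kubernetes
        if: github.ref == 'refs/heads/main'
        # Assumes a separate job already built and pushed
        # registry.io/yield-predictor:<commit SHA>, and that a kubeconfig
        # is stored in a KUBECONFIG secret (hypothetical name)
        env:
          KUBECONFIG_DATA: ${{ secrets.KUBECONFIG }}
        run: |
          mkdir -p "$HOME/.kube"
          printf '%s' "$KUBECONFIG_DATA" > "$HOME/.kube/config"
          kubectl set image deployment/yield-predictor \
            model=registry.io/yield-predictor:${{ github.sha }}
```

**Post-Silicon Use Case:**
- Weekly model retraining triggered by new ATE data upload (this needs a `schedule:` or `repository_dispatch:` trigger in addition to the push/PR triggers above)
- CI/CD pipeline: Train → Test (F1 ≥ 0.95) → Register in MLflow → Deploy to staging
- Manual approval gate → Promote to production
- Rollback: the MLflow Model Registry keeps version history (v1, v2, v3...), so a previous version can be redeployed
- Savings: automating 2 ML engineer-days/week (≈0.4 FTE) is worth roughly $60K/year at a $150K salary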
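
The workflow above calls `evaluate_model.py --threshold 0.95` without showing the script. A minimal hypothetical version follows. It assumes the training step wrote its F1 score to a `metrics.json` file, which the original does not specify. The key design point is that the gate communicates through its exit code. A nonzero exit fails the step, and GitHub Actions then skips the registration and deployment steps that follow.

```python
import argparse
import json
import sys
from typing import Optional


def check_f1(f1: float, threshold: float) -> int:
    """Return a process exit code: 0 if the F1 gate passes, 1 otherwise."""
    if f1 < threshold:
        print(f"FAIL: F1 {f1:.3f} < threshold {threshold}", file=sys.stderr)
        return 1
    print(f"PASS: F1 {f1:.3f} >= threshold {threshold}")
    return 0


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="F1 quality gate for CI")
    parser.add_argument("--metrics", default="metrics.json",
                        help="JSON file with an 'f1' key (assumed to be written by training)")
    parser.add_argument("--threshold", type=float, default=0.95,
                        help="minimum F1 score required to pass")
    args = parser.parse_args(argv)
    with open(args.metrics, encoding="utf-8") as fh:
        f1 = float(json.load(fh)["f1"])
    return check_f1(f1, args.threshold)
```

When saved as `evaluate_model.py`, the file would end with `if __name__ == "__main__": sys.exit(main())` so that the return value becomes the process exit code.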

## 🏭 Advanced Example: MLflow + GitHub Actions for Model Registry

Automate model training, evaluation, registration, and deployment with MLflow tracking.