# MLOps Best Practices Guide: Post-Deployment Workflows

## 📚 Overview

This notebook provides a comprehensive guide to MLOps best practices after your initial model has been trained and deployed. It covers:

1. **Creating Challenger Models**: How to iteratively improve your model
2. **Version Management**: When to create new versions vs. new models
3. **A/B Testing**: Comparing model performance in production
4. **Continuous Improvement**: Monitoring, retraining, and automated workflows
5. **Model Governance**: Promotion strategies and approval workflows
6. **Production Best Practices**: Scaling, monitoring, and maintenance

## Prerequisites

- An existing model registered in Unity Catalog (we'll use the wine_classifier_model from the previous notebook)
- MLflow 3.x
- Access to Unity Catalog
- Databricks Runtime ML 13.0 or higher

---


In [None]:
# Initial Setup
import mlflow
from mlflow import MlflowClient
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Set up Unity Catalog
mlflow.set_registry_uri("databricks-uc")

# Initialize MLflow client
client = MlflowClient()

# Configuration
CATALOG = "jpg_ws_us_3"
SCHEMA = "default"
MODEL_NAME = "wine_classifier_model"
FULL_MODEL_NAME = f"{CATALOG}.{SCHEMA}.{MODEL_NAME}"

# Get current user
current_user = spark.sql("SELECT current_user() as user").collect()[0]['user']

print(f"MLflow Version: {mlflow.__version__}")
print(f"Current User: {current_user}")
print(f"Working with Model: {FULL_MODEL_NAME}")


## 1. Understanding Model Versioning Strategy

### 🔑 Key Concepts

**When to create a NEW VERSION of the SAME model:**
- Same algorithm/approach with different hyperparameters
- Retrained on updated data (same features)
- Minor code optimizations
- Bug fixes that don't change the model architecture

**When to create a NEW MODEL:**
- Different algorithm (e.g., switching from Random Forest to XGBoost)
- Significant feature engineering changes
- Different problem formulation
- Major architectural changes

### Model Naming Convention

```
catalog.schema.model_name

Examples:
- jpg_ws_us_3.default.wine_classifier_rf    # Random Forest
- jpg_ws_us_3.default.wine_classifier_xgb   # XGBoost
- jpg_ws_us_3.default.wine_classifier_ensemble  # Ensemble
```
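
The three-level name can be assembled and checked in one place so that every notebook cell uses the same string. The sketch below is illustrative only: `uc_model_name` and `alias_uri` are hypothetical helpers, and the lowercase/digit/underscore rule is an assumed team convention rather than a Unity Catalog requirement.

```python
import re

# Assumed convention (not a Unity Catalog rule): lowercase letters, digits, underscores
_NAME_PART = re.compile(r"[a-z0-9_]+")

def uc_model_name(catalog: str, schema: str, model: str) -> str:
    """Build a catalog.schema.model_name string, validating each part."""
    for label, part in (("catalog", catalog), ("schema", schema), ("model", model)):
        if not _NAME_PART.fullmatch(part):
            raise ValueError(f"Invalid {label} name: {part!r}")
    return f"{catalog}.{schema}.{model}"

def alias_uri(full_name: str, alias: str) -> str:
    """Model URI that resolves an alias such as 'champion' when the model is loaded."""
    return f"models:/{full_name}@{alias}"

rf_name = uc_model_name("jpg_ws_us_3", "default", "wine_classifier_rf")
print(rf_name)
print(alias_uri(rf_name, "champion"))
```

The `models:/<name>@<alias>` form is the same URI pattern used later in this notebook when loading the champion and challenger.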


In [None]:
# Check Current Model Status
def get_model_status(model_name):
    """Get comprehensive status of a model and its versions"""
    try:
        # Get model info
        model = client.get_registered_model(model_name)
        
        # Get all versions
        versions = client.search_model_versions(f"name='{model_name}'")
        
        print(f"Model: {model_name}")
        print(f"Description: {model.description if model.description else 'No description'}")
        print(f"Total Versions: {len(versions)}")
        
        # Check for aliases
        if hasattr(model, 'aliases'):
            print("\nCurrent Aliases:")
            for alias, version in model.aliases.items():
                print(f"  {alias}: Version {version}")
        
        # Show recent versions
        print("\nRecent Versions (last 3):")
        # Version numbers are returned as strings, so sort numerically (otherwise '10' < '9')
        for mv in sorted(versions, key=lambda x: int(x.version), reverse=True)[:3]:
            print(f"  Version {mv.version}:")
            print(f"    Created: {datetime.fromtimestamp(mv.creation_timestamp/1000).strftime('%Y-%m-%d %H:%M')}")
            print(f"    Run ID: {mv.run_id}")
            
            # Get metrics from the run
            run = client.get_run(mv.run_id)
            if run.data.metrics:
                key_metrics = ['test_accuracy', 'test_f1', 'val_accuracy']
                metrics = {name: value for name, value in run.data.metrics.items() if name in key_metrics}
                if metrics:
                    print(f"    Metrics: {metrics}")
        
        return model, versions
    
    except Exception as e:
        print(f"Error accessing model: {str(e)}")
        return None, None

# Check current champion model
model, versions = get_model_status(FULL_MODEL_NAME)


## 2. Creating a Challenger Model

### 🎯 Challenger Model Strategy

A **challenger model** is a new version that competes with the current champion. The workflow is:

1. **Champion** → Current production model
2. **Challenger** → New model being tested
3. **A/B Testing** → Compare performance in production
4. **Promotion** → Challenger becomes champion if better

### Best Practice Workflow:

```mermaid
graph LR
    A[Train New Model] --> B[Register as New Version]
    B --> C[Tag as Challenger]
    C --> D[A/B Test in Production]
    D --> E{Better than Champion?}
    E -->|Yes| F[Promote to Champion]
    E -->|No| G[Archive or Iterate]
    F --> H[Previous Champion → Archived]
```


In [None]:
# Example: Creating a Challenger Model (Improved Version)

from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
import mlflow.sklearn

# Load data (same as before)
wine = datasets.load_wine()
X = wine.data
y = wine.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# IMPORTANT: Set experiment for challenger model
experiment_name = f"/Users/{current_user}/wine_classifier_challenger_experiments"
mlflow.set_experiment(experiment_name)

print(f"Creating challenger model experiment: {experiment_name}")


In [None]:
# Train Challenger Model with Improved Hyperparameters
with mlflow.start_run(run_name="Challenger_v2_Enhanced_RF") as run:
    
    # Log experiment metadata
    mlflow.set_tag("model_type", "RandomForestClassifier")
    mlflow.set_tag("purpose", "challenger")
    mlflow.set_tag("improvement_strategy", "enhanced_hyperparameters")
    mlflow.set_tag("challenger_version", "v2")
    
    # Log what improvements we're trying
    improvement_notes = """
    Improvements over champion:
    1. Expanded hyperparameter search space
    2. Added class weight balancing
    3. Increased number of estimators
    4. Added bootstrap parameter tuning
    """
    mlflow.log_param("improvements", improvement_notes)
    
    # Enhanced hyperparameter grid
    enhanced_param_grid = {
        'n_estimators': [100, 200, 300],  # More trees
        'max_depth': [10, 20, 30, None],   # More depth options
        'min_samples_split': [2, 5, 10],    
        'min_samples_leaf': [1, 2, 4],
        'max_features': ['sqrt', 'log2'],   # Feature sampling
        'bootstrap': [True, False],         # Bootstrap sampling
        'class_weight': ['balanced', None]  # Handle imbalanced classes
    }
    
    # Train model with enhanced parameters
    rf_challenger = RandomForestClassifier(random_state=42)
    
    # Grid search with 10-fold CV: the grid above has 864 combinations, so this runs 8,640 fits plus one final refit
    grid_search = GridSearchCV(
        rf_challenger,
        enhanced_param_grid,
        cv=10,  # More folds for better validation
        scoring='f1_weighted',  # Focus on F1 score
        n_jobs=-1,
        verbose=1
    )
    
    print("Training challenger model with enhanced hyperparameters...")
    grid_search.fit(X_train_scaled, y_train)
    
    # Get best model
    challenger_model = grid_search.best_estimator_
    
    # Evaluate
    test_predictions = challenger_model.predict(X_test_scaled)
    test_accuracy = accuracy_score(y_test, test_predictions)
    test_f1 = f1_score(y_test, test_predictions, average='weighted')
    
    # Log metrics
    mlflow.log_params(grid_search.best_params_)
    mlflow.log_metric("test_accuracy", test_accuracy)
    mlflow.log_metric("test_f1", test_f1)
    mlflow.log_metric("cv_best_score", grid_search.best_score_)
    
    # Create pipeline for deployment
    from sklearn.pipeline import Pipeline
    challenger_pipeline = Pipeline([
        ('scaler', scaler),
        ('classifier', challenger_model)
    ])
    
    # Log model
    mlflow.sklearn.log_model(
        sk_model=challenger_pipeline,
        artifact_path="model",
        signature=mlflow.models.infer_signature(X_test, test_predictions)
    )
    
    challenger_run_id = run.info.run_id
    
    print(f"\nChallenger Model Results:")
    print(f"  Test Accuracy: {test_accuracy:.4f}")
    print(f"  Test F1 Score: {test_f1:.4f}")
    print(f"  Best CV Score: {grid_search.best_score_:.4f}")
    print(f"  Run ID: {challenger_run_id}")
    print(f"\nBest Parameters: {grid_search.best_params_}")


## 3. Registering Challenger Model - Best Practices

### ⚡ IMPORTANT: Same Model, New Version!

For iterative improvements of the same algorithm, register as a **new version** of the existing model, NOT a new model.

### Decision Tree:
```
Is it the same algorithm? → YES → New Version of Same Model
                         ↓
                         NO → Create New Model
```
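
The decision can also be written as a small function that folds in the other criteria from Section 1 (feature changes and problem formulation), not only the algorithm. This is a sketch under those assumptions; `ChangeSummary` and `registration_target` are hypothetical names, not part of MLflow.

```python
from dataclasses import dataclass

@dataclass
class ChangeSummary:
    """What changed between the champion and the candidate (illustrative fields)."""
    same_algorithm: bool
    same_features: bool
    same_problem: bool

def registration_target(change: ChangeSummary) -> str:
    """Return 'new_version' for iterative changes, 'new_model' otherwise."""
    if change.same_algorithm and change.same_features and change.same_problem:
        return "new_version"
    return "new_model"

# Retuned Random Forest on the same features: stays under the same registered model
print(registration_target(ChangeSummary(True, True, True)))
# Switching to XGBoost: register under a separate model name
print(registration_target(ChangeSummary(False, True, True)))
```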


In [None]:
# Register Challenger as New Version of SAME Model
try:
    # Register as new version of the existing model
    challenger_version = mlflow.register_model(
        model_uri=f"runs:/{challenger_run_id}/model",
        name=FULL_MODEL_NAME  # SAME model name - creates new version
    )
    
    print(f"✓ Challenger registered as Version {challenger_version.version} of {FULL_MODEL_NAME}")
    
    # Add comprehensive metadata
    client.update_model_version(
        name=FULL_MODEL_NAME,
        version=challenger_version.version,
        description=f"""
        Challenger model created on {datetime.now().strftime('%Y-%m-%d')}
        
        Improvements:
        - Enhanced hyperparameter search space
        - Added class weight balancing
        - Increased estimators and CV folds
        
        Performance:
        - Test Accuracy: {test_accuracy:.4f}
        - Test F1 Score: {test_f1:.4f}
        
        Status: Ready for A/B testing against champion
        """
    )
    
    # Add version tags
    tags_to_add = {
        "model_type": "RandomForestClassifier",
        "purpose": "challenger",
        "test_accuracy": str(test_accuracy),
        "test_f1": str(test_f1),
        "training_date": datetime.now().strftime('%Y-%m-%d'),
        "created_by": current_user
    }
    
    for key, value in tags_to_add.items():
        client.set_model_version_tag(
            name=FULL_MODEL_NAME,
            version=challenger_version.version,
            key=key,
            value=value
        )
    
    # IMPORTANT: Set challenger alias
    client.set_registered_model_alias(
        name=FULL_MODEL_NAME,
        alias="challenger",
        version=challenger_version.version
    )
    
    print(f"✓ Version {challenger_version.version} assigned the 'challenger' alias")
    print(f"✓ Ready for A/B testing")
    
except Exception as e:
    print(f"Error registering challenger: {str(e)}")


## 4. Comparing Champion vs Challenger Models

### 📊 Model Comparison Framework

Before promoting a challenger to champion, you need a comprehensive comparison across four areas:

1. **Statistical Performance**: Accuracy, Precision, Recall, F1
2. **Business Metrics**: Revenue impact, user satisfaction
3. **Operational Metrics**: Latency, throughput, resource usage
4. **Robustness**: Performance on edge cases, data drift
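
For the statistical comparison, a raw difference in accuracy on a small test set can easily be noise. One option, which is not used elsewhere in this notebook, is an exact McNemar test on the paired predictions of both models. The sketch below uses only the standard library; the labels and predictions are made-up illustration data.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (a_only_correct, b_only_correct, p_value), where the counts are
    the examples that exactly one of the two models classified correctly.
    """
    rows = list(zip(y_true, pred_a, pred_b))
    a_only = sum(1 for t, a, b in rows if a == t and b != t)
    b_only = sum(1 for t, a, b in rows if b == t and a != t)
    n = a_only + b_only
    if n == 0:
        return a_only, b_only, 1.0  # the models never disagree on correctness
    k = min(a_only, b_only)
    # Binomial tail under H0: each discordant pair favors either model with p = 0.5
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_only, b_only, min(1.0, 2 * tail)

# Made-up labels and predictions for illustration
y_true     = [0, 1, 2, 1, 0, 2, 1, 0]
champion   = [0, 1, 2, 0, 0, 1, 1, 0]
challenger = [0, 1, 2, 1, 0, 2, 1, 1]
print(mcnemar_exact(y_true, champion, challenger))
```

A large p-value means the disagreement between the two models is compatible with chance, in which case a promotion decision based on accuracy alone is weakly supported.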


In [None]:
# Compare Champion vs Challenger Models
def compare_models(model_name, champion_alias="champion", challenger_alias="challenger"):
    """Compare performance of champion and challenger models"""
    
    try:
        # Load both models
        print("Loading models...")
        champion_model = mlflow.pyfunc.load_model(f"models:/{model_name}@{champion_alias}")
        challenger_model = mlflow.pyfunc.load_model(f"models:/{model_name}@{challenger_alias}")
        
        # Get model versions
        model_info = client.get_registered_model(model_name)
        champion_version = model_info.aliases.get(champion_alias)
        challenger_version = model_info.aliases.get(challenger_alias)
        
        print(f"Champion: Version {champion_version}")
        print(f"Challenger: Version {challenger_version}")
        
        # Prepare test data
        test_data = pd.DataFrame(X_test, columns=wine.feature_names)
        
        # Get predictions
        import time
        
        # Champion predictions and timing
        start_time = time.time()
        champion_predictions = champion_model.predict(test_data)
        champion_inference_time = time.time() - start_time
        
        # Challenger predictions and timing
        start_time = time.time()
        challenger_predictions = challenger_model.predict(test_data)
        challenger_inference_time = time.time() - start_time
        
        # Calculate metrics
        from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
        
        metrics = {
            'Champion': {
                'version': champion_version,
                'accuracy': accuracy_score(y_test, champion_predictions),
                'precision': precision_score(y_test, champion_predictions, average='weighted'),
                'recall': recall_score(y_test, champion_predictions, average='weighted'),
                'f1': f1_score(y_test, champion_predictions, average='weighted'),
                'inference_time': champion_inference_time,
                'avg_latency_ms': (champion_inference_time / len(test_data)) * 1000
            },
            'Challenger': {
                'version': challenger_version,
                'accuracy': accuracy_score(y_test, challenger_predictions),
                'precision': precision_score(y_test, challenger_predictions, average='weighted'),
                'recall': recall_score(y_test, challenger_predictions, average='weighted'),
                'f1': f1_score(y_test, challenger_predictions, average='weighted'),
                'inference_time': challenger_inference_time,
                'avg_latency_ms': (challenger_inference_time / len(test_data)) * 1000
            }
        }
        
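        # Create comparison DataFrame (rows: metrics, columns: Champion/Challenger)
        comparison_df = pd.DataFrame(metrics)
        # Relative change per numeric metric; 'version' is a label, so it is left as NaN
        numeric = comparison_df.drop(index='version').astype(float)
        comparison_df['change_%'] = (numeric['Challenger'] - numeric['Champion']) / numeric['Champion'] * 100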
        
        # Display results
        print("\n" + "="*60)
        print("MODEL COMPARISON RESULTS")
        print("="*60)
        display(comparison_df)
        
        # Determine winner
        print("\n" + "="*60)
        print("RECOMMENDATION")
        print("="*60)
        
        if metrics['Challenger']['f1'] > metrics['Champion']['f1']:
            improvement = (metrics['Challenger']['f1'] - metrics['Champion']['f1']) * 100
            print(f"✓ Challenger improves weighted F1 by {improvement:.2f} percentage points")
            print(f"✓ Recommendation: PROMOTE challenger to champion")
        else:
            print(f"✗ Challenger does not improve upon champion")
            print(f"✗ Recommendation: KEEP current champion")
        
        return comparison_df
        
    except Exception as e:
        print(f"Error comparing models: {str(e)}")
        return None

# Compare the models
comparison_results = compare_models(FULL_MODEL_NAME)


## 5. A/B Testing in Production

### 🔀 Traffic Splitting Strategies

A/B testing allows you to safely test your challenger model with real production traffic:

1. **Canary Deployment**: Start with 5% traffic, gradually increase
2. **50/50 Split**: Equal traffic to both models, which reaches statistical significance fastest for a fixed request volume
3. **Multi-Armed Bandit**: Dynamically adjust traffic based on performance
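
The multi-armed bandit option can be sketched with Thompson sampling, one common way to shift traffic toward the better-performing model as evidence accumulates. The class name `ThompsonRouter` and the success rates below are hypothetical and exist only to drive the simulation.

```python
import random

class ThompsonRouter:
    """Thompson sampling over two arms with Beta(1, 1) priors."""

    def __init__(self, arms=("champion", "challenger"), seed=None):
        self.rng = random.Random(seed)
        self.stats = {arm: {"successes": 0, "failures": 0} for arm in arms}

    def choose(self):
        # Draw a plausible success rate per arm and route to the highest draw
        draws = {
            arm: self.rng.betavariate(s["successes"] + 1, s["failures"] + 1)
            for arm, s in self.stats.items()
        }
        return max(draws, key=draws.get)

    def update(self, arm, success):
        self.stats[arm]["successes" if success else "failures"] += 1

# Hypothetical true success rates, used only for this simulation
TRUE_RATES = {"champion": 0.90, "challenger": 0.95}

router = ThompsonRouter(seed=42)
sim_rng = random.Random(0)
for _ in range(2000):
    arm = router.choose()
    router.update(arm, sim_rng.random() < TRUE_RATES[arm])

for arm, s in router.stats.items():
    print(arm, s["successes"] + s["failures"], "requests")
```

Unlike a fixed split, the share of traffic each model receives here depends on observed outcomes, so it requires a success signal (for example, a confirmed correct prediction) to be available soon after each request.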

### Implementation Approaches:

1. **Model Serving Endpoints**: Use Databricks Model Serving with traffic routing
2. **Application-Level**: Implement routing logic in your application
3. **Feature Flags**: Use feature flag services for dynamic control
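
For application-level routing, a deterministic hash of a stable request key keeps each caller on the same model for the duration of the test, which purely random routing does not guarantee. The function name `assign_variant`, the salt value, and the key format below are assumptions made for illustration.

```python
import hashlib

def assign_variant(request_key: str, challenger_share: float, salt: str = "wine-ab-test") -> str:
    """Map a stable key (e.g. a user ID) to 'champion' or 'challenger'.

    The same key always lands in the same bucket, and raising
    challenger_share only moves keys from champion to challenger.
    """
    digest = hashlib.sha256(f"{salt}:{request_key}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2 ** 64  # uniform in [0, 1)
    return "challenger" if bucket < challenger_share else "champion"

keys = [f"user-{i}" for i in range(10_000)]
share = sum(assign_variant(k, 0.20) == "challenger" for k in keys) / len(keys)
print(f"Observed challenger share: {share:.3f}")
```

Changing the salt reshuffles the assignment, which is useful when starting a new experiment with the same user population.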


In [None]:
# A/B Testing Simulation
class ABTestingFramework:
    """Simulate A/B testing between champion and challenger models"""
    
    def __init__(self, model_name, test_duration_hours=24):
        self.model_name = model_name
        self.test_duration_hours = test_duration_hours
        self.results = {'champion': [], 'challenger': []}
        
    def route_traffic(self, traffic_split=0.5):
        """Route traffic based on split percentage"""
        return 'challenger' if np.random.random() < traffic_split else 'champion'
    
    def simulate_production_traffic(self, n_requests=1000, challenger_traffic=0.2):
        """Simulate production traffic with A/B split"""
        
        print(f"Simulating A/B Test: {challenger_traffic*100:.0f}% challenger, {(1-challenger_traffic)*100:.0f}% champion")
        print(f"Total requests: {n_requests}")
        print("-" * 60)
        
        # Load models
        champion = mlflow.pyfunc.load_model(f"models:/{self.model_name}@champion")
        challenger = mlflow.pyfunc.load_model(f"models:/{self.model_name}@challenger")
        
        # Track metrics
        champion_metrics = {'predictions': [], 'latencies': [], 'errors': 0}
        challenger_metrics = {'predictions': [], 'latencies': [], 'errors': 0}
        
        # Simulate requests
        import time
        
        for i in range(n_requests):
            # Sample random data point
            idx = np.random.randint(0, len(X_test))
            sample = pd.DataFrame([X_test[idx]], columns=wine.feature_names)
            true_label = y_test[idx]
            
            # Route traffic and pick the matching model and metrics bucket
            model_choice = self.route_traffic(challenger_traffic)
            if model_choice == 'challenger':
                active_model, bucket = challenger, challenger_metrics
            else:
                active_model, bucket = champion, champion_metrics
            
            # Make prediction and measure latency
            try:
                start = time.perf_counter()
                pred = active_model.predict(sample)[0]
                latency = (time.perf_counter() - start) * 1000  # ms
                
                bucket['predictions'].append((pred, true_label))
                bucket['latencies'].append(latency)
            except Exception:
                bucket['errors'] += 1
        
        # Calculate results
        results = self._calculate_metrics(champion_metrics, challenger_metrics)
        return results
    
    def _calculate_metrics(self, champion_metrics, challenger_metrics):
        """Calculate A/B test metrics"""
        from sklearn.metrics import accuracy_score
        
        results = {}
        
        for model_name, metrics in [('Champion', champion_metrics), ('Challenger', challenger_metrics)]:
            if metrics['predictions']:
                preds, labels = zip(*metrics['predictions'])
                accuracy = accuracy_score(labels, preds)
                avg_latency = np.mean(metrics['latencies'])
                p95_latency = np.percentile(metrics['latencies'], 95)
                
                results[model_name] = {
                    'requests': len(metrics['predictions']),
                    'accuracy': accuracy,
                    'avg_latency_ms': avg_latency,
                    'p95_latency_ms': p95_latency,
                    'errors': metrics['errors'],
                    'error_rate': metrics['errors'] / (len(metrics['predictions']) + metrics['errors'])
                }
        
        return results
    
    def progressive_rollout(self, stages=(0.05, 0.20, 0.50, 1.0), requests_per_stage=500):
        """Simulate progressive rollout of challenger model"""
        print("PROGRESSIVE ROLLOUT SIMULATION")
        print("=" * 60)
        
        all_results = []
        
        for stage, traffic_pct in enumerate(stages):
            print(f"\nStage {stage + 1}: {traffic_pct*100:.0f}% traffic to challenger")
            
            results = self.simulate_production_traffic(
                n_requests=requests_per_stage,
                challenger_traffic=traffic_pct
            )
            
            # Display stage results
            stage_df = pd.DataFrame(results).T
            display(stage_df)
            
            # Check if challenger is performing well
            if 'Challenger' in results and 'Champion' in results:
                if results['Challenger']['accuracy'] < results['Champion']['accuracy'] * 0.95:
                    print("⚠️ WARNING: Challenger performing poorly, consider rollback")
                else:
                    print("✓ Challenger performing well, safe to continue")
            
            all_results.append(results)
        
        return all_results

# Run A/B Testing Simulation
ab_test = ABTestingFramework(FULL_MODEL_NAME)

# Simulate initial canary deployment (5% traffic)
print("CANARY DEPLOYMENT (5% traffic to challenger)")
print("-" * 60)
canary_results = ab_test.simulate_production_traffic(n_requests=1000, challenger_traffic=0.05)
display(pd.DataFrame(canary_results).T)


In [None]:
# Simulate Progressive Rollout
print("\n" + "="*60)
print("PROGRESSIVE ROLLOUT STRATEGY")
print("="*60)

# Define rollout stages
rollout_stages = [0.05, 0.20, 0.50, 1.0]  # 5% -> 20% -> 50% -> 100%
rollout_results = ab_test.progressive_rollout(stages=rollout_stages, requests_per_stage=200)


## 6. Model Promotion Workflow

### 📈 Promoting Challenger to Champion

Once A/B testing confirms the challenger performs better, follow this promotion workflow:

1. **Review Metrics**: Ensure all KPIs meet thresholds
2. **Approval Process**: Get stakeholder sign-off
3. **Update Aliases**: Point the `champion` alias at the challenger version and remove the `challenger` alias
4. **Archive Previous**: Keep the previous champion under a `previous_champion` alias for rollback
5. **Monitor**: Watch for issues post-promotion
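
What counts as "confirms" in an A/B test is a judgment call. One possible check, sketched here with the standard library only, is a one-sided two-proportion z-test on the accuracy observed for each arm. The function name and the request counts are hypothetical; the normal approximation also assumes each arm has served a reasonably large number of requests.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """One-sided test of H1: accuracy of B (challenger) > accuracy of A (champion).

    Returns (z, p_value) using the pooled-variance normal approximation.
    """
    p_a, p_b = correct_a / n_a, correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no observed variation, so no evidence either way
    z = (p_b - p_a) / se
    return z, 1 - NormalDist().cdf(z)

# Hypothetical A/B counts: champion 760/800 correct, challenger 194/200 correct
z, p_value = two_proportion_z_test(760, 800, 194, 200)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value supports promotion on accuracy grounds; the latency and business checks from the earlier comparison framework still apply separately.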


In [None]:
# Model Promotion Workflow Implementation
class ModelPromotion:
    """Manage model promotion from challenger to champion"""
    
    def __init__(self, model_name, client):
        self.model_name = model_name
        self.client = client
        
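    def validate_promotion_criteria(self, min_accuracy=0.90, min_f1=0.90, max_latency_ms=100):
        """Validate if challenger meets promotion criteria.
        
        Note: max_latency_ms is accepted but not checked here, because no latency
        metric is logged with the training run.
        """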
        
        print("PROMOTION CRITERIA VALIDATION")
        print("="*60)
        
        criteria_met = True
        validation_results = {}
        
        # Get challenger version
        model_info = self.client.get_registered_model(self.model_name)
        challenger_version = model_info.aliases.get('challenger')
        
        if not challenger_version:
            print("❌ No challenger model found")
            return False, {}
        
        # Get run metrics
        version_info = self.client.get_model_version(self.model_name, challenger_version)
        run = self.client.get_run(version_info.run_id)
        
        # Check accuracy
        test_accuracy = run.data.metrics.get('test_accuracy', 0)
        if test_accuracy >= min_accuracy:
            validation_results['accuracy'] = f"✓ Accuracy: {test_accuracy:.4f} >= {min_accuracy}"
        else:
            validation_results['accuracy'] = f"✗ Accuracy: {test_accuracy:.4f} < {min_accuracy}"
            criteria_met = False
        
        # Check F1 score
        test_f1 = run.data.metrics.get('test_f1', 0)
        if test_f1 >= min_f1:
            validation_results['f1'] = f"✓ F1 Score: {test_f1:.4f} >= {min_f1}"
        else:
            validation_results['f1'] = f"✗ F1 Score: {test_f1:.4f} < {min_f1}"
            criteria_met = False
        
        # Display results
        for criterion, result in validation_results.items():
            print(f"  {result}")
        
        print(f"\nValidation Result: {'PASSED ✓' if criteria_met else 'FAILED ✗'}")
        
        return criteria_met, validation_results
    
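    def promote_model(self, require_approval=True, **criteria):
        """Promote challenger to champion with proper governance.
        
        Extra keyword arguments (e.g. min_accuracy, min_f1) are forwarded to
        validate_promotion_criteria so promotion uses the same thresholds as validation.
        """
        
        print("\nMODEL PROMOTION PROCESS")
        print("="*60)
        
        # Step 1: Validate criteria
        criteria_met, validation = self.validate_promotion_criteria(**criteria)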
        
        if not criteria_met:
            print("\n❌ Promotion blocked: Criteria not met")
            return False
        
        # Step 2: Approval process (simulated)
        if require_approval:
            print("\n📋 Approval Process:")
            print("  - Technical Review: ✓")
            print("  - Business Review: ✓")
            print("  - Risk Assessment: ✓")
            approval = True  # In production, this would be a real approval workflow
            
            if not approval:
                print("\n❌ Promotion blocked: Approval not granted")
                return False
        
        # Step 3: Execute promotion
        try:
            model_info = self.client.get_registered_model(self.model_name)
            current_champion = model_info.aliases.get('champion')
            current_challenger = model_info.aliases.get('challenger')
            
            if not current_challenger:
                print("❌ No challenger version to promote")
                return False
            
            print(f"\n🔄 Executing Promotion:")
            print(f"  Current Champion: Version {current_champion}")
            print(f"  Current Challenger: Version {current_challenger}")
            
            # Archive current champion
            if current_champion:
                self.client.set_registered_model_alias(
                    name=self.model_name,
                    alias="previous_champion",
                    version=current_champion
                )
                print(f"  ✓ Archived Version {current_champion} as 'previous_champion'")
            
            # Promote challenger to champion
            self.client.set_registered_model_alias(
                name=self.model_name,
                alias="champion",
                version=current_challenger
            )
            print(f"  ✓ Promoted Version {current_challenger} to 'champion'")
            
            # Remove challenger alias
            self.client.delete_registered_model_alias(
                name=self.model_name,
                alias="challenger"
            )
            print(f"  ✓ Removed 'challenger' alias")
            
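            # Append a promotion note to the version description (keeps the earlier notes)
            promotion_timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            existing_description = self.client.get_model_version(self.model_name, current_challenger).description or ""
            self.client.update_model_version(
                name=self.model_name,
                version=current_challenger,
                description=f"{existing_description}\n\nPromoted to champion on {promotion_timestamp}. Previous champion: v{current_champion}".strip()
            )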
            
            # Log promotion event
            print(f"\n✅ PROMOTION SUCCESSFUL")
            print(f"  New Champion: Version {current_challenger}")
            print(f"  Promoted at: {promotion_timestamp}")
            
            return True
            
        except Exception as e:
            print(f"\n❌ Promotion failed: {str(e)}")
            return False
    
    def rollback_model(self):
        """Rollback to previous champion if issues detected"""
        
        print("\nMODEL ROLLBACK PROCESS")
        print("="*60)
        
        try:
            model_info = self.client.get_registered_model(self.model_name)
            current_champion = model_info.aliases.get('champion')
            previous_champion = model_info.aliases.get('previous_champion')
            
            if not previous_champion:
                print("❌ No previous champion to rollback to")
                return False
            
            print(f"⚠️ Rolling back from Version {current_champion} to Version {previous_champion}")
            
            # Set previous champion as current champion
            self.client.set_registered_model_alias(
                name=self.model_name,
                alias="champion",
                version=previous_champion
            )
            
            # Archive the failed version
            self.client.set_registered_model_alias(
                name=self.model_name,
                alias="rolled_back",
                version=current_champion
            )
            
            print(f"✅ Rollback successful")
            print(f"  Current Champion: Version {previous_champion}")
            print(f"  Rolled back version: {current_champion}")
            
            return True
            
        except Exception as e:
            print(f"❌ Rollback failed: {str(e)}")
            return False

# Example: Promote the challenger model
promoter = ModelPromotion(FULL_MODEL_NAME, client)

# Validate promotion criteria
can_promote, validation = promoter.validate_promotion_criteria(
    min_accuracy=0.85,  # Lower threshold for demo
    min_f1=0.85
)


In [None]:
# OPTIONAL: Execute promotion if criteria are met
# Uncomment the following lines to actually promote the model
# if can_promote:
#     promotion_success = promoter.promote_model(require_approval=True)

print("\n💡 To promote the model, uncomment and run the promotion code above")


## 7. Model Monitoring & Retraining Strategy

### 📊 Key Monitoring Metrics

**Performance Monitoring:**
- Model accuracy over time
- Prediction latency
- Error rates
- Feature importance changes

**Data Quality Monitoring:**
- Input data distribution (drift detection)
- Missing values
- Outliers
- Schema changes

**Business Metrics:**
- Business KPIs affected by the model
- User feedback/satisfaction
- Cost per prediction
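
For input drift, a widely used summary alongside per-feature statistical tests is the Population Stability Index (PSI). The sketch below is a standard-library implementation under simple assumptions: equal-width bins taken from the reference data, and synthetic Gaussian samples standing in for a real feature. A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but thresholds should be tuned per feature.

```python
import math
import random

def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two samples of one numeric feature.

    Bin edges come from the reference sample; values outside that range are
    clipped into the first or last bin. eps avoids log(0) for empty bins.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

# Synthetic data for illustration: the "current" sample is shifted by one unit
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(1000)]
print(f"PSI (shifted vs reference): {population_stability_index(reference, shifted):.3f}")
```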

### 🔄 Retraining Triggers

1. **Scheduled Retraining**: Weekly/Monthly/Quarterly
2. **Performance-Based**: When metrics drop below threshold
3. **Data-Driven**: When drift exceeds limits
4. **Event-Based**: New data source, business changes
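
The four triggers can be evaluated together so that a scheduler or job sees every reason that applies, not just the first. This is a minimal sketch: `RetrainPolicy`, `retraining_reasons`, and all threshold values are hypothetical and would need to be set per model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class RetrainPolicy:
    """Illustrative thresholds, one per trigger type."""
    max_model_age: timedelta = timedelta(days=30)
    min_accuracy: float = 0.90
    max_drift_score: float = 0.25

def retraining_reasons(policy, trained_at, now, accuracy, drift_score, new_data_source=False):
    """Return the list of triggers that currently apply (empty list: no retraining)."""
    reasons = []
    if now - trained_at > policy.max_model_age:
        reasons.append("scheduled")
    if accuracy < policy.min_accuracy:
        reasons.append("performance")
    if drift_score > policy.max_drift_score:
        reasons.append("drift")
    if new_data_source:
        reasons.append("event")
    return reasons

policy = RetrainPolicy()
reasons = retraining_reasons(
    policy,
    trained_at=datetime(2024, 1, 1),
    now=datetime(2024, 3, 1),
    accuracy=0.93,
    drift_score=0.31,
)
print(reasons)
```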


In [None]:
# Model Monitoring Implementation
class ModelMonitor:
    """Monitor model performance and trigger retraining when needed"""
    
    def __init__(self, model_name):
        self.model_name = model_name
        self.metrics_history = []
        
    def calculate_data_drift(self, reference_data, current_data):
        """Calculate drift between reference and current data distributions"""
        from scipy import stats
        
        drift_scores = {}
        
        for col_idx, col_name in enumerate(wine.feature_names):
            ref_col = reference_data[:, col_idx]
            curr_col = current_data[:, col_idx]
            
            # Two-sample Kolmogorov-Smirnov test, run per feature (no multiple-testing correction)
            ks_statistic, p_value = stats.ks_2samp(ref_col, curr_col)
            
            drift_scores[col_name] = {
                'ks_statistic': ks_statistic,
                'p_value': p_value,
                'drift_detected': p_value < 0.05  # 5% significance level
            }
        
        # Overall drift score (average KS statistic)
        overall_drift = np.mean([score['ks_statistic'] for score in drift_scores.values()])
        
        return overall_drift, drift_scores
    
    def monitor_performance(self, current_accuracy, current_f1, baseline_accuracy=0.90, baseline_f1=0.90):
        """Compare current metrics with baselines and flag a relative drop of more than 5%"""
        
        performance_status = {
            'timestamp': datetime.now(),
            'current_accuracy': current_accuracy,
            'current_f1': current_f1,
            'baseline_accuracy': baseline_accuracy,
            'baseline_f1': baseline_f1,
            'accuracy_degradation': baseline_accuracy - current_accuracy,
            'f1_degradation': baseline_f1 - current_f1,
            'requires_retraining': False,
            'reason': None  # Always present so callers can read it safely
        }
        
        # Retraining is needed when a metric falls more than 5% below its baseline (relative)
        reasons = []
        if current_accuracy < baseline_accuracy * 0.95:
            reasons.append('Accuracy degradation > 5%')
        if current_f1 < baseline_f1 * 0.95:
            reasons.append('F1 score degradation > 5%')
        
        # Report every degraded metric, not only the first one found
        if reasons:
            performance_status['requires_retraining'] = True
            performance_status['reason'] = '; '.join(reasons)
        
        self.metrics_history.append(performance_status)
        return performance_status
    
    def generate_monitoring_report(self, test_data, test_labels, reference_data):
        """Generate comprehensive monitoring report"""
        
        print("MODEL MONITORING REPORT")
        print("="*60)
        print(f"Model: {self.model_name}")
        print(f"Report Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        print("-"*60)
        
        # Load current champion model
        champion_model = mlflow.pyfunc.load_model(f"models:/{self.model_name}@champion")
        
        # Get predictions
        predictions = champion_model.predict(pd.DataFrame(test_data, columns=wine.feature_names))
        
        # Calculate performance metrics
        from sklearn.metrics import accuracy_score, f1_score
        current_accuracy = accuracy_score(test_labels, predictions)
        current_f1 = f1_score(test_labels, predictions, average='weighted')
        
        print("\n📊 Performance Metrics:")
        print(f"  Current Accuracy: {current_accuracy:.4f}")
        print(f"  Current F1 Score: {current_f1:.4f}")
        
        # Check data drift
        overall_drift, feature_drift = self.calculate_data_drift(reference_data, test_data)
        
        print("\n📈 Data Drift Analysis:")
        print(f"  Overall Drift Score: {overall_drift:.4f}")
        
        # Count features with significant drift
        drifted_features = [f for f, scores in feature_drift.items() if scores['drift_detected']]
        print(f"  Features with Drift: {len(drifted_features)}/{len(feature_drift)}")
        
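        if drifted_features:
            print("  Drifted Features (top 5 by KS statistic):")
            # Sort so the features with the largest distribution shift are listed first
            top_drifted = sorted(
                drifted_features,
                key=lambda f: feature_drift[f]['ks_statistic'],
                reverse=True
            )
            for feature in top_drifted[:5]:
                print(f"    - {feature}: KS={feature_drift[feature]['ks_statistic']:.3f}")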
        
        # Performance monitoring
        perf_status = self.monitor_performance(current_accuracy, current_f1)
        
        print("\n⚠️ Retraining Assessment:")
        if perf_status['requires_retraining']:
            print("  Status: RETRAINING RECOMMENDED")
            print(f"  Reason: {perf_status['reason']}")
        else:
            print("  Status: Model performing within acceptable limits")
        
        if overall_drift > 0.2:  # Drift threshold
            print(f"  ⚠️ High data drift detected (score: {overall_drift:.3f})")
            print(f"     Consider retraining with recent data")
        
        return {
            'performance': perf_status,
            'drift': {'overall': overall_drift, 'features': feature_drift},
            'recommendations': self._get_recommendations(perf_status, overall_drift)
        }
    
    def _get_recommendations(self, performance_status, drift_score):
        """Generate actionable recommendations"""
        recommendations = []
        
        if performance_status['requires_retraining']:
            recommendations.append("🔴 Immediate: Retrain model with recent data")
        
        if drift_score > 0.2:
            recommendations.append("🟡 Soon: Investigate data drift causes")
            recommendations.append("🟡 Soon: Update training data distribution")
        
        if drift_score > 0.1:
            recommendations.append("🟢 Monitor: Increase monitoring frequency")
        
        if not recommendations:
            recommendations.append("✅ No action needed - model healthy")
        
        return recommendations

# Run monitoring simulation
monitor = ModelMonitor(FULL_MODEL_NAME)

# Simulate monitoring: the held-out test set stands in for current production data,
# and the training set serves as the drift reference
monitoring_report = monitor.generate_monitoring_report(
    test_data=X_test,
    test_labels=y_test,
    reference_data=X_train
)

print("\n📋 Recommendations:")
for rec in monitoring_report['recommendations']:
    print(f"  {rec}")
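
The monitor above tests each feature separately at the 5% significance level, so the chance of at least one false drift flag grows with the number of features. One common remedy is a Bonferroni correction, which divides the significance level by the number of tests. The sketch below is an illustration only: `family_wise_false_alarm_rate` and `bonferroni_drifted_features` are hypothetical helpers, the p-values are made up, and the false-alarm formula assumes the per-feature tests are independent.

```python
from typing import Dict, List


def family_wise_false_alarm_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false drift flag across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_drifted_features(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Flag a feature only if its p-value is below alpha divided by the number of tests."""
    if not p_values:
        return []
    corrected_alpha = alpha / len(p_values)
    return [name for name, p in p_values.items() if p < corrected_alpha]


# Made-up p-values for illustration; not results from the notebook
example_p_values = {"alcohol": 0.001, "ash": 0.03, "hue": 0.40}
print(bonferroni_drifted_features(example_p_values))  # ['alcohol']

# With 13 uncorrected tests at alpha = 0.05, a false flag is close to a coin flip
print(round(family_wise_false_alarm_rate(13), 2))  # 0.49
```

In this notebook, the p-values could be collected from `monitoring_report['drift']['features']` by reading each entry's `'p_value'` key. Bonferroni is conservative; whether it is the right trade-off depends on how costly a missed drift is compared with an unnecessary retraining run.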
