# Class II: Deep-dive into MLOps & Infrastructure

Welcome to our hands-on MLOps session! In this notebook, we'll work with:
- **MLflow**: For experiment tracking and model registry
- **Docker**: All services are containerized (no local installation needed!)
- **JupyterLab**: You're running this from a container right now

## Quick Setup Check
Make sure all containers are running. From your host browser, the services are available at:
- MLflow UI: http://localhost:5000
- JupyterLab: http://localhost:8888 (you're here!)
- API: http://localhost:8080
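
The check above can also be scripted from inside the notebook. The sketch below is a minimal example using only the standard library. It assumes the containers reach each other by the service names `mlflow` and `api` (the hostnames used in later cells of this notebook) and that any HTTP response, even a 404, means the service is up. The names `is_reachable`, `report`, and `SERVICES` are made up for illustration.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP request to `url` gets any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status (e.g. 404 on the root path)
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout
        return False


# Assumed in-network addresses, based on the URLs used later in this notebook
SERVICES = {
    "MLflow": "http://mlflow:5000/health",
    "API": "http://api:8080",
}


def report(services: dict, check=is_reachable) -> dict:
    """Run `check` on every service URL and print one status line per entry."""
    status = {name: check(url) for name, url in services.items()}
    for name, ok in status.items():
        print(f"{'✅' if ok else '❌'} {name}: {services[name]}")
    return status
```

Calling `report(SERVICES)` in a cell prints one line per service. The `check` argument exists only so the reporting logic can be exercised without a network.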

Let's start with experiment tracking!

In [None]:
# First, let's install MLflow in this container and set up the connection
import os
import subprocess
import sys

# Install MLflow if not already installed
try:
    import mlflow
    print("✅ MLflow already installed")
except ImportError:
    print("📦 Installing MLflow...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "mlflow"])
    import mlflow

# Point the client at our containerized MLflow server.
# Inside the Docker network the server is reached by its service name (mlflow);
# http://localhost:5000 is the address to use from your host browser.
TRACKING_URI = 'http://mlflow:5000'
os.environ['MLFLOW_TRACKING_URI'] = TRACKING_URI
mlflow.set_tracking_uri(TRACKING_URI)

print(f"🎯 MLflow Tracking URI: {mlflow.get_tracking_uri()}")
print("🚀 Ready to track experiments!")

# Experiment Tracking with MLflow

Now let's train a model and track the experiment. We'll log:
- **Parameters**: Model configuration (n_estimators, etc.)
- **Metrics**: Performance metrics (accuracy, etc.)
- **Artifacts**: The trained model itself

After running the cell below, check the MLflow UI at http://localhost:5000 to see your experiments!

In [None]:
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import pandas as pd

# Load the Iris dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

print("📊 Dataset loaded:")
print(f"Training samples: {X_train.shape[0]}")
print(f"Test samples: {X_test.shape[0]}")
print(f"Features: {X_train.shape[1]}")
print(f"Classes: {iris.target_names}")

# Start MLflow experiment tracking
with mlflow.start_run():
    # Model parameters
    n_estimators = 100
    max_depth = 3
    
    # Train the model
    clf = RandomForestClassifier(
        n_estimators=n_estimators, 
        max_depth=max_depth,
        random_state=42
    )
    clf.fit(X_train, y_train)
    
    # Make predictions
    preds = clf.predict(X_test)
    acc = accuracy_score(y_test, preds)
    
    # Log parameters to MLflow
    mlflow.log_param('n_estimators', n_estimators)
    mlflow.log_param('max_depth', max_depth)
    mlflow.log_param('model_type', 'RandomForestClassifier')
    
    # Log metrics to MLflow
    mlflow.log_metric('accuracy', acc)
    # The test-set size describes the data rather than performance, so log it as a parameter
    mlflow.log_param('test_samples', len(y_test))
    
    # Log the trained model directly as an MLflow artifact (no manual file creation needed)
    mlflow.sklearn.log_model(clf, artifact_path="model")
    
    print('✅ Model trained!')
    print(f'🎯 Accuracy: {acc:.3f}')
    print('📝 Experiment logged to MLflow')
    print('🌐 View at: http://localhost:5000')

# Model Registry

Now let's register our best model in the MLflow Model Registry. This allows us to:
- Version our models
- Promote models through stages (Staging → Production)
- Track model lineage and metadata

The Model Registry is like a "model store" for your organization.
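
Once a model is in the registry, it is addressed by URI rather than by file path. The sketch below builds the two URI forms MLflow uses: `models:/<name>/<version-or-stage>` for registered models and `runs:/<run_id>/<artifact_path>` for a model logged by a specific run. The helper names `registry_uri` and `run_uri` are made up for illustration, the model name is the one registered later in this notebook, and the run ID is a placeholder.

```python
def registry_uri(name: str, version_or_stage) -> str:
    """Build a `models:/<name>/<version-or-stage>` URI for a registered model."""
    return f"models:/{name}/{version_or_stage}"


def run_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build a `runs:/<run_id>/<artifact_path>` URI for a model logged in a run."""
    return f"runs:/{run_id}/{artifact_path}"


# Model name taken from this notebook; "<run_id>" is a placeholder, not a real ID
print(registry_uri("Iris-RandomForest-Best", 1))             # a pinned version
print(registry_uri("Iris-RandomForest-Best", "Production"))  # whichever version is in Production
print(run_uri("<run_id>"))
```

Passing one of these URIs to `mlflow.pyfunc.load_model` or `mlflow.sklearn.load_model` loads the corresponding model, so a serving process can follow the Production stage without knowing the version number.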

**Note**: First, let's check that the MLflow server is reachable and that its Model Registry API responds.

In [None]:
# Check MLflow server connectivity and Model Registry availability
import requests

try:
    # Test basic MLflow server connectivity
    response = requests.get('http://mlflow:5000/health', timeout=5)
    if response.status_code == 200:
        print("✅ MLflow server is running")
    else:
        print(f"⚠️ MLflow server response: {response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"❌ Cannot connect to MLflow server: {e}")
    print("💡 Make sure MLflow container is running: docker-compose up -d")

# Test the Model Registry API endpoint.
# Newer MLflow releases dropped `registered-models/list`, so we use `registered-models/search`.
try:
    response = requests.get('http://mlflow:5000/api/2.0/mlflow/registered-models/search', timeout=5)
    if response.status_code == 200:
        print("✅ Model Registry is available")
        models = response.json()
        # An empty registry returns a body without the 'registered_models' key
        print(f"📋 Registered models: {len(models.get('registered_models', []))}")
    else:
        print(f"⚠️ Model Registry response: {response.status_code}")
        print("💡 Check the MLflow server version and its backend store configuration")
except requests.exceptions.RequestException as e:
    print(f"❌ Model Registry not accessible: {e}")

In [None]:
# Register the model in MLflow Model Registry with error handling
model_name = 'Iris-RandomForest-Best'

# Train a second candidate with more, deeper trees; this is the one we register
with mlflow.start_run():
    # Candidate hyperparameters (compare its accuracy with the first run in the MLflow UI)
    n_estimators = 150
    max_depth = 5
    
    clf = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        random_state=42
    )
    clf.fit(X_train, y_train)
    preds = clf.predict(X_test)
    acc = accuracy_score(y_test, preds)
    
    # Log everything
    mlflow.log_param('n_estimators', n_estimators)
    mlflow.log_param('max_depth', max_depth)
    mlflow.log_metric('accuracy', acc)
    
    # Log the model and register it in the Model Registry in one step
    mlflow.sklearn.log_model(
        clf, 
        'model',
        registered_model_name=model_name
    )
    
    print('✅ Model trained and logged!')
    print(f'🎯 Accuracy: {acc:.3f}')
    print(f'📦 Model registered: {model_name}')
    print('📋 Go to MLflow UI → Models tab to promote to Production!')
    print(f'🌐 http://localhost:5000/#/models/{model_name}')

# Model Promotion Workflow

Once a model is registered, we typically move it through stages in order: **None** → **Staging** → **Production**.

- This allows for testing and validation before production deployment
- Multiple versions can exist in different stages simultaneously
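
The promotion order above can be sketched as a tiny state machine. This is a toy illustration in plain Python, not MLflow's implementation: the ordering is treated here as a team convention, and `STAGE_ORDER`, `next_stage`, and `promote` are made-up names. The two versions in the example are hypothetical.

```python
# Toy illustration of the promotion convention; this is not MLflow code
STAGE_ORDER = ["None", "Staging", "Production"]


def next_stage(current: str) -> str:
    """Return the stage that follows `current` in the promotion order."""
    index = STAGE_ORDER.index(current)  # raises ValueError for unknown stages
    if index == len(STAGE_ORDER) - 1:
        raise ValueError(f"{current} is already the final stage")
    return STAGE_ORDER[index + 1]


def promote(stages: dict, version: int) -> dict:
    """Return a copy of `stages` with `version` moved one step forward."""
    updated = dict(stages)
    updated[version] = next_stage(stages[version])
    return updated


# Two hypothetical versions of one model, each in its own stage
stages = {1: "Staging", 2: "None"}
stages = promote(stages, 1)  # version 1: Staging -> Production
stages = promote(stages, 2)  # version 2: None -> Staging
print(stages)  # {1: 'Production', 2: 'Staging'}
```

Keeping the stage per version, rather than per model, is what lets one version serve Production while a newer one is still being validated in Staging.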

Let's promote our model to the Production stage!

In [None]:
# Promote model to Production (typical MLOps workflow)
from mlflow.tracking import MlflowClient

try:
    client = MlflowClient()
    
    # Check if model exists
    try:
        model_info = client.get_registered_model(model_name)
        print(f'📦 Found model: {model_name}')
        except Exception:
        print(f'❌ Model {model_name} not found in registry')
        print('💡 Please run the previous cell to register the model first')
        raise
    
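    # Get the latest version that has not been assigned a stage yet
    model_versions = client.get_latest_versions(model_name, stages=["None"])
    if not model_versions:
        print(f'❌ No unstaged versions found for model {model_name}')
        print('💡 If you already promoted it, register a new version first')
        raise RuntimeError("No model versions available")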
    
    latest_version = model_versions[0].version
    print(f'🔍 Latest version: {latest_version}')
    
    # Transition to Staging first
    client.transition_model_version_stage(
        name=model_name,
        version=latest_version,
        stage="Staging"
    )
    print(f'📋 Version {latest_version} moved to Staging')
    
    # After review, promote to Production
    client.transition_model_version_stage(
        name=model_name,
        version=latest_version,
        stage="Production"
    )
    print(f'🚀 Version {latest_version} promoted to Production!')
    print('✅ Ready for inference!')
    print(f'🌐 Check MLflow UI: http://localhost:5000/#/models/{model_name}')
    
except Exception as e:
    print(f'⚠️ Model promotion failed: {e}')
    print('💡 Possible solutions:')
    print('   1. Make sure MLflow server is running')
    print('   2. Register a model first (run previous cells)')
    print('   3. Check MLflow UI for registered models')

# Testing Our API

Finally, let's test our model-serving API! From your host, the API is available at http://localhost:8080.

**Important**: The model must be in the "Production" stage before testing. If the promotion cell above did not succeed, promote it manually in the MLflow UI:
1. Go to http://localhost:5000
2. Click "Models" tab
3. Click on "Iris-RandomForest-Best"
4. Click "Stage Transition" → "Production"
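
Before calling the API, it helps to see the data going in and out. The sketch below assumes the request body is `{"features": [four numbers]}` and the response body is `{"prediction": <class index>}`, which is what the test cell in this notebook sends and reads; the API's actual schema is not shown here. `build_payload` and `parse_prediction` are made-up helper names.

```python
import json

# Class order of scikit-learn's Iris dataset (same as iris.target_names)
CLASS_NAMES = ["setosa", "versicolor", "virginica"]


def build_payload(features) -> dict:
    """Validate one flower's measurements and wrap them in the request body."""
    values = [float(v) for v in features]
    if len(values) != 4:
        raise ValueError(f"expected 4 features, got {len(values)}")
    return {"features": values}


def parse_prediction(body: str) -> str:
    """Map the `prediction` index in a JSON response body to a class name."""
    index = int(json.loads(body)["prediction"])
    return CLASS_NAMES[index]


print(json.dumps(build_payload([5.1, 3.5, 1.4, 0.2])))
print(parse_prediction('{"prediction": 0}'))  # setosa
```

Validating the feature count on the client side gives a clearer error than whatever the server returns for a malformed request.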

In [None]:
import requests

# Test data - these are actual Iris flower measurements
test_cases = [
    {
        "name": "Setosa (should predict 0)",
        "features": [5.1, 3.5, 1.4, 0.2]  # Typical Setosa measurements
    },
    {
        "name": "Versicolor (should predict 1)", 
        "features": [6.0, 2.7, 4.8, 1.4]  # Typical Versicolor measurements
    },
    {
        "name": "Virginica (should predict 2)",
        "features": [7.2, 3.6, 6.1, 2.5]  # Typical Virginica measurements
    }
]

api_url = "http://api:8080/predict"  # Using container name
# If that doesn't work, try: api_url = "http://localhost:8080/predict"

print("🧪 Testing our containerized ML API...")
print("=" * 50)

for test_case in test_cases:
    try:
        response = requests.post(
            api_url,
            json={"features": test_case["features"]},
            timeout=10
        )
        
        if response.status_code == 200:
            prediction = response.json()["prediction"]
            class_name = iris.target_names[prediction]
            print(f"✅ {test_case['name']}")
            print(f"   Features: {test_case['features']}")
            print(f"   Prediction: {prediction} ({class_name})")
        else:
            print(f"❌ Error: {response.status_code} - {response.text}")
            
    except requests.exceptions.RequestException as e:
        print(f"❌ Connection error: {e}")
        print("💡 Make sure the API container is running!")
    
    print("-" * 30)

print("🎉 That's MLOps in action!")
print("📊 Experiment tracking ✓")
print("📝 Model registry ✓") 
print("🚀 Containerized API ✓")