# Class II: Infrastructure as Code for MLOps

## 🌱 Project: Bonsai Species Classifier for Plant E-commerce

Welcome to our hands-on MLOps session! We're building a **bonsai species classifier** for a plant website that will:
- **Identify bonsai species** from plant measurements
- **Provide care recommendations** based on species type
- **Help customers** choose the right bonsai for their needs

### Infrastructure Stack (All Containerized!)
- **MLflow**: For experiment tracking and model registry
- **Docker**: All services are containerized (no local installation needed!)
- **JupyterLab**: You're running this from a container right now
- **API**: Model serving for the plant website

## Quick Setup Check
Make sure all containers are running. From a browser on your host machine, these URLs should respond:
- MLflow UI: http://localhost:5000 (track bonsai model experiments)
- JupyterLab: http://localhost:8888 (you're here!)
- API: http://localhost:8080 (bonsai species prediction service)
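
The same check can be scripted from inside the notebook. The block below is a minimal sketch using only the standard library. The helper names `http_ok` and `check_services` are hypothetical, and the in-container hostnames (`mlflow`, `api`) are assumptions based on the service names this notebook uses later. A service may answer its root path with an error status, so the probe treats any HTTP response as "up".

```python
import urllib.error
import urllib.request


def http_ok(url, timeout=3.0):
    """Return True if a server answers HTTP at `url`, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example 404 on the root path), so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, or timeout.
        return False


def check_services(services, probe=http_ok):
    """Map each service name to True/False using `probe` (injectable for testing)."""
    return {name: probe(url) for name, url in services.items()}


# Hostnames as seen from inside the JupyterLab container (assumed).
SERVICES = {
    "mlflow": "http://mlflow:5000",
    "api": "http://api:8080",
}
```

Calling `check_services(SERVICES)` in a cell returns a dictionary of service names and booleans; a `False` entry is a prompt to inspect that container's logs.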

Let's start building our bonsai classifier! 🌳

In [None]:
# First, let's install MLflow in this container and set up the connection
import subprocess
import sys

# Install MLflow if not already installed
try:
    import mlflow
    print("✅ MLflow already installed")
except ImportError:
    print("📦 Installing MLflow...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "mlflow"])
    import mlflow

# Set the MLflow tracking URI to connect to our containerized MLflow server
import os
os.environ['MLFLOW_TRACKING_URI'] = 'http://mlflow:5000'
mlflow.set_tracking_uri('http://mlflow:5000')

print(f"🎯 MLflow Tracking URI: {mlflow.get_tracking_uri()}")
print("🚀 Ready to track experiments!")

# 🌳 Bonsai Species Classification with MLflow

Now let's train our bonsai species classifier and track the experiment. We'll classify 4 types of bonsai:
- **Juniper Bonsai** (0): Hardy, needle-like foliage
- **Ficus Bonsai** (1): Broad leaves, aerial roots  
- **Pine Bonsai** (2): Long needles, rugged bark
- **Maple Bonsai** (3): Distinctive lobed leaves

We'll track:
- **Parameters**: Model configuration (n_estimators, etc.)
- **Metrics**: Classification performance (accuracy, etc.)
- **Artifacts**: The trained bonsai classifier model

Check the MLflow UI at http://localhost:5000 to see your bonsai classification experiments! 🌱

In [None]:
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import pandas as pd
import numpy as np

# Create a simulated bonsai dataset (replacing Iris for our bonsai classifier)
# Features: leaf_length, leaf_width, branch_thickness, height
X, y = make_classification(
    n_samples=300,
    n_features=4,
    n_classes=4,
    n_informative=4,
    n_redundant=0,
    random_state=42
)

# Add realistic feature names and scaling for bonsai measurements
feature_names = ['leaf_length_cm', 'leaf_width_cm', 'branch_thickness_mm', 'height_cm']
bonsai_species = ['Juniper', 'Ficus', 'Pine', 'Maple']

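# Shift and scale features toward plausible bonsai measurements.
# The "typical" ranges below assume raw values in [-1, 1]; make_classification
# output is not bounded, so some samples fall outside them (even below zero).
X[:, 0] = X[:, 0] * 0.5 + 2.0  # leaf_length: typically 1.5-2.5 cm
X[:, 1] = X[:, 1] * 0.3 + 1.5  # leaf_width: typically 1.2-1.8 cm
X[:, 2] = X[:, 2] * 2.0 + 5.0  # branch_thickness: typically 3-7 mm
X[:, 3] = X[:, 3] * 10.0 + 25.0  # height: typically 15-35 cm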

# Create DataFrame for better visualization
bonsai_df = pd.DataFrame(X, columns=feature_names)
bonsai_df['species'] = [bonsai_species[i] for i in y]

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("🌱 Bonsai Dataset created:")
print(f"Training samples: {X_train.shape[0]}")
print(f"Test samples: {X_test.shape[0]}")
print(f"Features: {feature_names}")
print(f"Species classes: {bonsai_species}")

# Show sample data
print("\n📊 Sample bonsai measurements:")
print(bonsai_df.head())
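
The scaling step applies an affine map `x * scale + offset` to each feature. Such a map sends a raw interval `[lo, hi]` to `[lo * scale + offset, hi * scale + offset]` for a positive scale, so a nominal leaf length of 1.5 to 2.5 cm corresponds to raw values in [-1, 1]. `make_classification` does not confine its features to that interval, which is why some simulated measurements can land outside the nominal ranges. The plain-Python sketch below works through the arithmetic; the helper name `scaled_range` is hypothetical.

```python
def scaled_range(scale, offset, raw_lo=-1.0, raw_hi=1.0):
    """Return the (min, max) reached by x * scale + offset for x in [raw_lo, raw_hi]."""
    ends = (raw_lo * scale + offset, raw_hi * scale + offset)
    return (min(ends), max(ends))


# (scale, offset) pairs copied from the scaling step in the cell above.
SCALING = {
    "leaf_length_cm": (0.5, 2.0),
    "leaf_width_cm": (0.3, 1.5),
    "branch_thickness_mm": (2.0, 5.0),
    "height_cm": (10.0, 25.0),
}

# Nominal range of each feature when the raw value stays within [-1, 1].
nominal = {name: scaled_range(s, o) for name, (s, o) in SCALING.items()}

# A raw value of 3.0 is plausible for unbounded synthetic data and
# maps well outside the nominal height range.
outlier_height = 3.0 * 10.0 + 25.0
```

Comparing `nominal` with `bonsai_df.describe()` shows how far the simulated data strays from these ranges.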

In [None]:
# Start MLflow experiment tracking for bonsai classification
mlflow.set_experiment("Bonsai-Species-Classification")

with mlflow.start_run():
    # Model parameters
    n_estimators = 100
    max_depth = 3
    
    # Train the bonsai classifier
    bonsai_classifier = RandomForestClassifier(
        n_estimators=n_estimators, 
        max_depth=max_depth,
        random_state=42
    )
    bonsai_classifier.fit(X_train, y_train)
    
    # Make predictions
    preds = bonsai_classifier.predict(X_test)
    acc = accuracy_score(y_test, preds)
    
    # Log parameters to MLflow
    mlflow.log_param('n_estimators', n_estimators)
    mlflow.log_param('max_depth', max_depth)
    mlflow.log_param('model_type', 'RandomForestClassifier')
    mlflow.log_param('dataset', 'Bonsai Species')
    mlflow.log_param('n_species', len(bonsai_species))
    
    # Log metrics to MLflow
    mlflow.log_metric('accuracy', acc)
    mlflow.log_metric('test_samples', len(y_test))
    
    # Log the trained bonsai classifier directly as an MLflow artifact
    mlflow.sklearn.log_model(bonsai_classifier, name="bonsai_classifier")
    
    print(f'\n✅ Bonsai classifier trained!')
    print(f'🎯 Accuracy: {acc:.3f}')
    print(f'🌳 Species classified: {bonsai_species}')
    print(f'📝 Experiment logged to MLflow')
    print(f'🌐 View at: http://localhost:5000')

# 🌱 Bonsai Model Registry

Now let's register our best bonsai classifier in the MLflow Model Registry. This allows us to:
- **Version our bonsai models** as we improve them
- **Promote models through stages** (Staging → Production) for the plant website
- **Track model lineage and metadata** for different bonsai classification approaches

The Model Registry is like a "model store" for the plant e-commerce website.

**Note**: Registration only works if our MLflow server supports the Model Registry. If it doesn't, the next cell will fail when it tries to register the model.

In [None]:
# Register the bonsai classifier in MLflow Model Registry
model_name = 'Bonsai-Species-Classifier-Production'

# Let's train a better bonsai classifier for the plant website
with mlflow.start_run():
    # Improved hyperparameters for better bonsai classification
    n_estimators = 200
    max_depth = 6
    min_samples_split = 3
    
    bonsai_classifier = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        random_state=42
    )
    bonsai_classifier.fit(X_train, y_train)
    preds = bonsai_classifier.predict(X_test)
    acc = accuracy_score(y_test, preds)
    
    # Log everything for the bonsai model
    mlflow.log_param('n_estimators', n_estimators)
    mlflow.log_param('max_depth', max_depth)
    mlflow.log_param('min_samples_split', min_samples_split)
    mlflow.log_param('model_purpose', 'Plant E-commerce Website')
    mlflow.log_metric('accuracy', acc)
    
    # Log and register the bonsai classifier
    mlflow.sklearn.log_model(
        bonsai_classifier, 
        name='bonsai_classifier',
        registered_model_name=model_name
    )
    
    print(f'✅ Bonsai classifier trained and logged!')
    print(f'🎯 Accuracy: {acc:.3f}')
    print(f'🌳 Model registered: {model_name}')
    print(f'💡 Go to MLflow UI → Models tab to promote to Production!')
    print(f'🌐 http://localhost:5000/#/models/{model_name}')

In [None]:
# Promote bonsai classifier to Production (typical MLOps workflow for plant website)
from mlflow.tracking import MlflowClient

try:
    client = MlflowClient()
    
    # Check if bonsai model exists
    try:
        model_info = client.get_registered_model(model_name)
        print(f'📦 Found bonsai model: {model_name}')
    except Exception:
        print(f'❌ Bonsai model {model_name} not found in registry')
        print('💡 Please run the previous cell to register the bonsai model first')
        raise
    
    # Get the latest version
    model_versions = client.get_latest_versions(model_name)
    if not model_versions:
        print(f'❌ No versions found for bonsai model {model_name}')
        raise Exception("No bonsai model versions available")
    
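    # get_latest_versions returns the newest version per stage, so take the highest number
    latest_version = max(model_versions, key=lambda mv: int(mv.version)).version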
    print(f'🔍 Latest bonsai model version: {latest_version}')
    
    # Transition to Staging first (for plant website testing)
    client.transition_model_version_stage(
        name=model_name,
        version=latest_version,
        stage="Staging"
    )
    print(f'📋 Version {latest_version} moved to Staging for plant website testing')
    
    # After review, promote to Production (live plant website)
    client.transition_model_version_stage(
        name=model_name,
        version=latest_version,
        stage="Production"
    )
    print(f'🚀 Version {latest_version} promoted to Production!')
    print(f'✅ Bonsai classifier ready for the plant website!')
    print(f'🌐 Check MLflow UI: http://localhost:5000/#/models/{model_name}')
    
except Exception as e:
    print(f'⚠️ Bonsai model promotion failed: {e}')
    print('💡 Possible solutions:')
    print('   1. Make sure MLflow server is running')
    print('   2. Register a bonsai model first (run previous cells)')
    print('   3. Check MLflow UI for registered models')
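
Recent MLflow releases deprecate model stages in favour of model version aliases. If your installed MLflow supports them, the same promotion can be expressed as assigning an alias rather than transitioning a stage. The sketch below is an illustration under assumptions, not part of this course's setup: the helper name `promote_with_alias` and the alias `production` are hypothetical, and it assumes `MlflowClient.set_registered_model_alias` is available in your MLflow version.

```python
def promote_with_alias(client, name, version, alias="production"):
    """Point `alias` at `version` of the registered model `name`.

    `client` is expected to be an mlflow.tracking.MlflowClient, or any
    object exposing the same set_registered_model_alias method.
    """
    client.set_registered_model_alias(name, alias, str(version))
    # The model can then be addressed by alias instead of by stage.
    return f"models:/{name}@{alias}"
```

In this notebook the call would be `promote_with_alias(client, model_name, latest_version)`, and the returned URI can be handed to whatever loads the model.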

# 🌳 Testing Our Bonsai Classification API

Finally, let's test our bonsai species prediction API! The API is running at http://localhost:8080

**Important**: The model must be in the "Production" stage before you test the API. If the promotion cell above failed, promote your bonsai model manually in the MLflow UI:
1. Go to http://localhost:5000
2. Click "Models" tab
3. Click on "Bonsai-Species-Classifier-Production"
4. Click "Stage Transition" → "Production"

This API will help customers identify bonsai species on the plant website! 🌱
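
This notebook does not show the API's code, so the following is only a guess at how a serving process could pick up the promoted model: by loading it from the registry through a stage-based model URI. The helper names and the default tracking URI are assumptions; `mlflow.pyfunc.load_model` and the `models:/<name>/<stage>` URI form are standard MLflow.

```python
def production_model_uri(model_name, stage="Production"):
    """Build a registry URI that resolves to the latest version in `stage`."""
    return f"models:/{model_name}/{stage}"


def load_production_model(model_name, tracking_uri="http://mlflow:5000"):
    """Load the current Production model as a generic pyfunc model."""
    # Imported lazily so the URI helper also works where MLflow is absent.
    import mlflow
    import mlflow.pyfunc

    mlflow.set_tracking_uri(tracking_uri)
    return mlflow.pyfunc.load_model(production_model_uri(model_name))
```

With this design the service never hard-codes a version number: whichever version currently holds the stage is the one that gets loaded.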

In [15]:
import requests
import json

# Test data - realistic bonsai measurements for different species
bonsai_test_cases = [
    {
        "name": "Juniper Bonsai (should predict 0)",
        "features": [1.8, 1.3, 4.2, 22.0],  # Small leaves, thin branches
        "description": "Hardy evergreen with needle-like foliage"
    },
    {
        "name": "Ficus Bonsai (should predict 1)", 
        "features": [2.3, 1.7, 6.1, 28.5],  # Broader leaves, thicker trunk
        "description": "Broad leaves with aerial root potential"
    },
    {
        "name": "Pine Bonsai (should predict 2)",
        "features": [2.1, 1.2, 5.8, 31.2],  # Long needles, moderate thickness
        "description": "Classic pine with distinctive needle clusters"
    },
    {
        "name": "Maple Bonsai (should predict 3)",
        "features": [2.4, 1.9, 5.2, 26.8],  # Distinctive lobed leaves
        "description": "Beautiful seasonal color changes"
    }
]

api_url = "http://api:8080/predict"  # Using container name
# If that doesn't work, try: api_url = "http://localhost:8080/predict"

print("🧪 Testing our bonsai classification API...")
print("=" * 60)

for test_case in bonsai_test_cases:
    try:
        response = requests.post(
            api_url,
            json={"features": test_case["features"]},
            timeout=10
        )
        
        if response.status_code == 200:
            prediction = response.json()["prediction"]
            species_name = bonsai_species[prediction]
            print(f"✅ {test_case['name']}")
            print(f"   Measurements: {test_case['features']}")
            print(f"   🌳 Predicted: {prediction} ({species_name})")
            print(f"   📝 {test_case['description']}")
        else:
            print(f"❌ Error: {response.status_code} - {response.text}")
            
    except requests.exceptions.RequestException as e:
        print(f"❌ Connection error: {e}")
        print("💡 Make sure the API container is running!")
    
    print("-" * 40)

print("🎉 That's MLOps for a Plant E-commerce Website!")
print("📊 Experiment tracking ✓")
print("📝 Model registry ✓") 
print("🚀 Containerized bonsai classification API ✓")
print("🌱 Ready to help customers identify their bonsai! ✓")

🧪 Testing our bonsai classification API...
❌ Error: 500 - {"detail":"Prediction error: 'DecisionTreeClassifier' object has no attribute 'monotonic_cst'"}
----------------------------------------
❌ Error: 500 - {"detail":"Prediction error: 'DecisionTreeClassifier' object has no attribute 'monotonic_cst'"}
----------------------------------------
❌ Error: 500 - {"detail":"Prediction error: 'DecisionTreeClassifier' object has no attribute 'monotonic_cst'"}
----------------------------------------
❌ Error: 500 - {"detail":"Prediction error: 'DecisionTreeClassifier' object has no attribute 'monotonic_cst'"}
----------------------------------------
🎉 That's MLOps for a Plant E-commerce Website!
📊 Experiment tracking ✓
📝 Model registry ✓
🚀 Containerized bonsai classification API ✓
🌱 Ready to help customers identify their bonsai! ✓
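
The 500 responses above deserve a closer look. An error of the form `'DecisionTreeClassifier' object has no attribute 'monotonic_cst'` usually points to a scikit-learn version mismatch: the model was pickled under one release and unpickled under another whose tree classes expect an attribute the stored object lacks (the `monotonic_cst` parameter arrived in scikit-learn 1.4). A likely remedy is to pin the same scikit-learn version in the JupyterLab and API images, then retrain and re-register the model. The sketch below uses only the standard library to report the installed version and to compare two version strings at major.minor granularity. The helper names are hypothetical and the comparison rule is a simplification.

```python
from importlib import metadata


def installed_version(package="scikit-learn"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def same_minor(version_a, version_b):
    """True when both versions share the same major.minor prefix."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]


# Run this in both containers and compare the printed values.
print(f"scikit-learn here: {installed_version()}")
```

If the two containers print versions for which `same_minor` returns `False`, aligning them is the first thing to try.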
