# Class II: Infrastructure as Code for MLOps

## 🌱 Project: Bonsai Species Classifier for Plant E-commerce

Welcome to our hands-on MLOps session! We're building a **bonsai species classifier** for a plant website that will:
- **Identify bonsai species** from plant measurements
- **Provide care recommendations** based on species type
- **Help customers** choose the right bonsai for their needs

### Infrastructure Stack (All Containerized!)
- **MLflow**: For experiment tracking and model registry
- **Docker**: All services are containerized (no local installation needed!)
- **JupyterLab**: You're running this from a container right now
- **API**: Model serving for the plant website

## Quick Setup Check
Make sure all containers are running:
- MLflow UI: http://localhost:5000 (track bonsai model experiments)
- JupyterLab: http://localhost:8888 (you're here!)
- API: http://localhost:8080 (bonsai species prediction service)
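
The check above can also be scripted. The following is a minimal sketch, assuming the three URLs are reachable from wherever the code runs. They are written from the host's point of view, so from inside the JupyterLab container you would likely need to swap in container service names such as `http://mlflow:5000`. The helper names `is_reachable` and `check_services` are hypothetical and exist only for this illustration.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# URLs as listed in the setup check (host point of view; adjust inside containers).
SERVICES = {
    "MLflow UI": "http://localhost:5000",
    "JupyterLab": "http://localhost:8888",
    "API": "http://localhost:8080",
}


def is_reachable(url, timeout=2.0, opener=urlopen):
    """Return True if a request to `url` gets any HTTP response at all."""
    try:
        with opener(url, timeout=timeout):
            return True
    except HTTPError:
        # The server answered, just not with a success status (e.g. 404 on "/").
        return True
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


def check_services(services=SERVICES, **kwargs):
    """Map each service name to True/False depending on reachability."""
    return {name: is_reachable(url, **kwargs) for name, url in services.items()}
```

Calling `check_services()` returns a dictionary such as `{"MLflow UI": True, ...}`. A `False` entry usually means the container is down or the URL does not apply from where the code is running.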

Let's start building our bonsai classifier! 🌳

In [None]:
# First, let's install MLflow in this container and set up the connection
import subprocess
import sys

# Install MLflow if not already installed
try:
    import mlflow
    print("✅ MLflow already installed")
except ImportError:
    print("📦 Installing MLflow...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "mlflow"])
    import mlflow

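# Point MLflow at the containerized tracking server. Inside this container the
# server is addressed as `mlflow` (presumably its Docker service name); the
# localhost:5000 URL is for a browser on the host machine.
import os
os.environ['MLFLOW_TRACKING_URI'] = 'http://mlflow:5000'
mlflow.set_tracking_uri('http://mlflow:5000')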

print(f"🎯 MLflow Tracking URI: {mlflow.get_tracking_uri()}")
print("🚀 Ready to track experiments!")

# 🌳 Bonsai Species Classification with MLflow

Now let's train our bonsai species classifier and track the experiment. We'll classify 4 types of bonsai:
- **Juniper Bonsai** (0): Hardy, needle-like foliage
- **Ficus Bonsai** (1): Broad leaves, aerial roots  
- **Pine Bonsai** (2): Long needles, rugged bark
- **Maple Bonsai** (3): Distinctive lobed leaves

We'll track:
- **Parameters**: Model configuration (n_estimators, etc.)
- **Metrics**: Classification performance (accuracy, etc.)
- **Artifacts**: The trained bonsai classifier model

Check the MLflow UI at http://localhost:5000 to see your bonsai classification experiments! 🌱

In [None]:
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import pandas as pd
import numpy as np

# Create a simulated bonsai dataset (replacing Iris for our bonsai classifier)
# Features: leaf_length, leaf_width, branch_thickness, height
X, y = make_classification(
    n_samples=300,
    n_features=4,
    n_classes=4,
    n_informative=4,
    n_redundant=0,
    random_state=42
)

# Add realistic feature names and scaling for bonsai measurements
feature_names = ['leaf_length_cm', 'leaf_width_cm', 'branch_thickness_mm', 'height_cm']
bonsai_species = ['Juniper', 'Ficus', 'Pine', 'Maple']

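# Shift and scale features toward plausible bonsai measurements.
# The raw features are unbounded, so these are rough centers, not hard ranges;
# individual values can fall outside what a real plant would measure.
X[:, 0] = X[:, 0] * 0.5 + 2.0    # leaf_length: around 2.0 cm
X[:, 1] = X[:, 1] * 0.3 + 1.5    # leaf_width: around 1.5 cm
X[:, 2] = X[:, 2] * 2.0 + 5.0    # branch_thickness: around 5 mm
X[:, 3] = X[:, 3] * 10.0 + 25.0  # height: around 25 cm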

# Create DataFrame for better visualization
bonsai_df = pd.DataFrame(X, columns=feature_names)
bonsai_df['species'] = [bonsai_species[i] for i in y]

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("🌱 Bonsai Dataset created:")
print(f"Training samples: {X_train.shape[0]}")
print(f"Test samples: {X_test.shape[0]}")
print(f"Features: {feature_names}")
print(f"Species classes: {bonsai_species}")

# Show sample data
print("\n📊 Sample bonsai measurements:")
print(bonsai_df.head())

In [None]:
# Run experiments with different configurations to compare performance
import mlflow
import mlflow.sklearn
from sklearn.metrics import precision_score, recall_score, f1_score
import warnings
warnings.filterwarnings('ignore')

# Set the experiment for bonsai classification
mlflow.set_experiment("Bonsai-Species-Classification")

# Configurations to test
experiment_configs = [
    {
        "name": "baseline_model",
        "n_estimators": 50,
        "max_depth": 3,
        "min_samples_split": 2,
        "description": "Baseline model with conservative settings"
    },
    {
        "name": "balanced_model",
        "n_estimators": 100,
        "max_depth": 5,
        "min_samples_split": 3,
        "description": "Model balanced between complexity and performance"
    },
    {
        "name": "complex_model",
        "n_estimators": 200,
        "max_depth": 8,
        "min_samples_split": 2,
        "description": "More complex model for maximum performance"
    },
    {
        "name": "optimized_model",
        "n_estimators": 150,
        "max_depth": 6,
        "min_samples_split": 4,
        "description": "Optimized model based on the previous results"
    }
]

print("🧪 Running multiple experiments for comparison...")
print("=" * 60)

# Run the experiments
experiment_results = []

for config in experiment_configs:
    with mlflow.start_run(run_name=config["name"]):
        # Train a model with this configuration
        bonsai_classifier = RandomForestClassifier(
            n_estimators=config["n_estimators"],
            max_depth=config["max_depth"],
            min_samples_split=config["min_samples_split"],
            random_state=42
        )

        bonsai_classifier.fit(X_train, y_train)
        preds = bonsai_classifier.predict(X_test)

        # Compute more detailed metrics (weighted across the four species)
        accuracy = accuracy_score(y_test, preds)
        precision = precision_score(y_test, preds, average='weighted')
        recall = recall_score(y_test, preds, average='weighted')
        f1 = f1_score(y_test, preds, average='weighted')

        # Log parameters
        mlflow.log_params({
            "n_estimators": config["n_estimators"],
            "max_depth": config["max_depth"],
            "min_samples_split": config["min_samples_split"],
            "model_type": "RandomForestClassifier",
            "dataset": "Bonsai Species",
            "n_species": len(bonsai_species),
            "description": config["description"]
        })

        # Log metrics
        mlflow.log_metrics({
            "accuracy": accuracy,
            "precision": precision,
            "recall": recall,
            "f1_score": f1,
            "test_samples": len(y_test),
            "training_samples": len(y_train)
        })

        # Log the model; the name "bonsai_classifier" is needed later to build its URI
        mlflow.sklearn.log_model(
            bonsai_classifier,
            name="bonsai_classifier",
            signature=mlflow.models.infer_signature(X_train, y_train)
        )

        # Keep the results for comparison
        experiment_results.append({
            "name": config["name"],
            "accuracy": accuracy,
            "precision": precision,
            "recall": recall,
            "f1_score": f1,
            "description": config["description"]
        })

        print(f"✅ {config['name']}: Accuracy={accuracy:.3f}, F1={f1:.3f}")

print("\n📊 Experiment summary:")
print("-" * 60)
for result in sorted(experiment_results, key=lambda x: x['accuracy'], reverse=True):
    print(f"🏆 {result['name']}: {result['accuracy']:.3f} acc | {result['f1_score']:.3f} f1")
    print(f"   📝 {result['description']}")

print("\n🌐 Compare experiments in the MLflow UI: http://localhost:5000")
print("💡 Use 'Compare' to view the differences side by side!")
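
The summary above ranks runs by accuracy alone. With only 60 test samples (20% of 300), accuracy moves in steps of 1/60, so two configurations can easily tie. The following is a minimal sketch of a tie-aware ranking that falls back to F1 when accuracy is equal. The helper `rank_results` is hypothetical, and the numbers are made up for illustration; they are not output from the runs above.

```python
def rank_results(results, keys=("accuracy", "f1_score")):
    """Sort result dicts best-first, comparing metrics in the order given by `keys`."""
    return sorted(results, key=lambda r: tuple(r[k] for k in keys), reverse=True)


# Hypothetical values for illustration only.
example_results = [
    {"name": "baseline_model", "accuracy": 0.75, "f1_score": 0.74},
    {"name": "balanced_model", "accuracy": 0.80, "f1_score": 0.79},
    {"name": "complex_model", "accuracy": 0.80, "f1_score": 0.81},
]

ranked = rank_results(example_results)
for position, result in enumerate(ranked, start=1):
    print(f"{position}. {result['name']}: "
          f"{result['accuracy']:.3f} acc | {result['f1_score']:.3f} f1")
```

In the notebook, `rank_results(experiment_results)` would apply the same ordering to the real runs, since each entry already carries `accuracy` and `f1_score` keys.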

# 🌱 MLflow Model Registry and Experiment Comparison

## Model Registry - Model Management
The **MLflow Model Registry** (v3.3.1) is a centralized repository to manage the model lifecycle:

### Main Features:
- **🔄 Versioning**: Each registered model receives a unique version
- **📋 Stages**: None → Staging → Production → Archived (deprecated since MLflow 2.9; newer releases favor model version aliases and tags)
- **🏷️ Tags and Annotations**: Custom metadata for organization
- **🔍 Lineage**: Tracking model origin (experiment/run)
- **🚀 Deploy**: Integration with deployment systems

### Recommended Workflow:
1. **Experiments**: Multiple runs to find the best model
2. **Registration**: Register the best model in the registry
3. **Staging**: Test model in staging environment
4. **Production**: Promote after complete validation
5. **Monitoring**: Track performance in production
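
Steps 3 and 4 of this workflow can be guarded by a simple quality gate. The following is a minimal sketch, assuming promotion is expressed by pointing a registry alias at a model version through `MlflowClient.set_registered_model_alias`, which recent MLflow releases provide. The function name `promote_if_valid`, the `"staging"` alias, and the 0.80 accuracy threshold are all hypothetical choices for illustration.

```python
def promote_if_valid(client, name, version, metrics,
                     alias="staging", min_accuracy=0.80):
    """Point `alias` at `version` only if its accuracy meets the threshold.

    `client` is expected to expose
    `set_registered_model_alias(name, alias, version)`, as
    `mlflow.tracking.MlflowClient` does. The threshold is an assumption,
    not a recommended value.
    """
    accuracy = metrics.get("accuracy", 0.0)
    if accuracy < min_accuracy:
        return False  # validation failed: leave the alias where it is
    client.set_registered_model_alias(name=name, alias=alias, version=str(version))
    return True
```

With a real client this would be called as `promote_if_valid(MlflowClient(), model_name, version, run.data.metrics)`. A serving process could then load whichever version currently holds the alias through a URI of the form `models:/<model-name>@staging`.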

## Experiment Comparison
Use comparison features for:
- **📊 Side-by-side metrics**: Accuracy, F1-score, Precision, Recall
- **⚙️ Hyperparameters**: Compare different configurations
- **📈 Visualizations**: Automatic performance charts
- **🔄 Reproducibility**: All details to reproduce results

### MLflow 3.3.1 Tip:
Call `mlflow.autolog()` before training to capture metrics, parameters, and artifacts automatically!

In [None]:
# Select and register the best model from the experiments
import textwrap

from mlflow.tracking import MlflowClient
import mlflow.pyfunc

client = MlflowClient()

# Find the best run by the accuracy metric
experiment = mlflow.get_experiment_by_name("Bonsai-Species-Classification")
best_run = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.accuracy DESC"],
    max_results=1
)[0]

print("🏆 Best model found:")
print(f"   Run ID: {best_run.info.run_id}")
print(f"   Accuracy: {best_run.data.metrics['accuracy']:.3f}")
print(f"   F1-Score: {best_run.data.metrics['f1_score']:.3f}")
print(f"   Configuration: {best_run.data.params}")

# Register the best model in the Model Registry.
# The last URI segment must match the name passed to log_model ("bonsai_classifier").
model_name = "Bonsai-Species-Classifier-Production"
model_uri = f"runs:/{best_run.info.run_id}/bonsai_classifier"

try:
    # Register a new model version. The description is set on the version further
    # down; mlflow.register_model's documented arguments do not include one.
    model_version = mlflow.register_model(model_uri=model_uri, name=model_name)

    print("\n✅ Model registered successfully!")
    print(f"📦 Name: {model_name}")
    print(f"🔢 Version: {model_version.version}")
    print(f"🌐 MLflow UI: http://localhost:5000/#/models/{model_name}")

    # Add tags to the registered model version
    client.set_model_version_tag(
        name=model_name,
        version=model_version.version,
        key="use_case",
        value="plant_ecommerce_classification"
    )

    client.set_model_version_tag(
        name=model_name,
        version=model_version.version,
        key="model_type",
        value="RandomForestClassifier"
    )

    # Set a detailed description (dedented so the Markdown is not indented)
    client.update_model_version(
        name=model_name,
        version=model_version.version,
        description=textwrap.dedent(f"""
        🌳 Bonsai Species Classifier for E-commerce

        **Performance:**
        - Accuracy: {best_run.data.metrics['accuracy']:.3f}
        - Precision: {best_run.data.metrics['precision']:.3f}
        - Recall: {best_run.data.metrics['recall']:.3f}
        - F1-Score: {best_run.data.metrics['f1_score']:.3f}

        **Configuration:**
        - Estimators: {best_run.data.params['n_estimators']}
        - Max Depth: {best_run.data.params['max_depth']}
        - Min Samples Split: {best_run.data.params['min_samples_split']}

        **Use Cases:**
        - Automatic identification of bonsai species
        - Care recommendations based on species
        - Purchase decision support for customers

        **Classes:** Juniper (0), Ficus (1), Pine (2), Maple (3)
        """).strip()
    )

    print("🏷️  Tags and description added to the model")

except Exception as e:
    print(f"❌ Error registering model: {e}")
    print("💡 Check that the MLflow server is running")

print("\n📈 Next steps:")
print("1. Review the model in the MLflow UI")
print("2. Test the model in staging")
print("3. Promote to production once approved")