## MLflow 3.3.1 Best Practices and MLOps Workflow

This notebook demonstrates a complete MLOps workflow using MLflow 3.3.1, including:

- **Experiment Tracking:** Logging parameters, metrics, and artifacts (confusion matrix, predictions CSV) for reproducibility.
- **Model Registry:** Registering models, managing lifecycle stages (Staging, Production), and promoting models when approved.
- **Artifact Logging:** Ensuring all relevant outputs are saved for future analysis and auditability.
- **Automation:** GitHub Actions for notebook cleaning and dependency updates to maintain code quality and security.

The workflow is designed for educational clarity, following best practices for experiment management, model versioning, and CI/CD automation in MLOps projects.

# Class II: Infrastructure as Code for MLOps

## 🌱 Project: Bonsai Species Classifier for Plant E-commerce

Welcome to our hands-on MLOps session! We're building a **bonsai species classifier** for a plant website that will:
- **Identify bonsai species** from plant measurements
- **Provide care recommendations** based on species type
- **Help customers** choose the right bonsai for their needs

### Infrastructure Stack (All Containerized!)
- **MLflow**: For experiment tracking and model registry
- **Docker**: All services are containerized (no local installation needed!)
- **JupyterLab**: You're running this from a container right now
- **API**: Model serving for the plant website

## Quick Setup Check
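Make sure all containers are running. The URLs below are for a browser on your host machine; code inside this notebook reaches the services by container name instead (for example `http://mlflow:5000`):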
- MLflow UI: http://localhost:5001 (track bonsai model experiments)
- JupyterLab: http://localhost:8888 (you're here!)
- API: http://localhost:8080 (bonsai species prediction service)
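Before running the cells below, it can help to confirm that the services are actually listening. The following is a minimal sketch using only the standard library. The hostnames `mlflow:5000` and `api:8080` are taken from later cells in this notebook, while `SERVICES`, `is_reachable`, and `report` are hypothetical names introduced here. A successful TCP connection only shows that the port is open, not that the service is healthy.

```python
import socket
from urllib.parse import urlparse

# In-container addresses used later in this notebook
# (assumed to match your Docker Compose setup)
SERVICES = {
    "MLflow": "http://mlflow:5000",
    "API": "http://api:8080",
}


def is_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def report(services, probe=is_reachable):
    """Print one status line per service and return the names that failed."""
    failed = []
    for name, url in services.items():
        ok = probe(url)
        print(f"{'OK  ' if ok else 'DOWN'} {name}: {url}")
        if not ok:
            failed.append(name)
    return failed
```

Calling `report(SERVICES)` in a notebook cell prints one line per service and returns the names that did not respond.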

Let's start building our bonsai classifier! 🌳

In [2]:
# First, let's install MLflow in this container and set up the connection
import subprocess
import sys

# Install MLflow if not already installed
try:
    import mlflow
    print("✅ MLflow already installed")
except ImportError:
    print("📦 Installing MLflow...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "mlflow==3.3.1"])
    import mlflow

# Set the MLflow tracking URI to connect to our containerized MLflow server
import os
os.environ['MLFLOW_TRACKING_URI'] = 'http://mlflow:5000'
mlflow.set_tracking_uri('http://mlflow:5000')

print(f"🎯 MLflow Tracking URI: {mlflow.get_tracking_uri()}")
print("🚀 Ready to track experiments!")

✅ MLflow already installed
🎯 MLflow Tracking URI: http://mlflow:5000
🚀 Ready to track experiments!


# 🌳 Bonsai Species Classification with MLflow

Now let's train our bonsai species classifier and track the experiment. We'll classify 4 types of bonsai:
- **Juniper Bonsai** (0): Hardy, needle-like foliage
- **Ficus Bonsai** (1): Broad leaves, aerial roots  
- **Pine Bonsai** (2): Long needles, rugged bark
- **Maple Bonsai** (3): Distinctive lobed leaves

We'll track:
- **Parameters**: Model configuration (n_estimators, etc.)
- **Metrics**: Classification performance (accuracy, etc.)
- **Artifacts**: The trained bonsai classifier model

Check the MLflow UI at http://localhost:5001 to see your bonsai classification experiments! 🌱

In [3]:
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import pandas as pd
import numpy as np

# Create a simulated bonsai dataset (replacing Iris for our bonsai classifier)
# Features: leaf_length, leaf_width, branch_thickness, height
X, y = make_classification(
    n_samples=300,
    n_features=4,
    n_classes=4,
    n_informative=4,
    n_redundant=0,
    random_state=42
)

# Add realistic feature names and scaling for bonsai measurements
feature_names = ['leaf_length_cm', 'leaf_width_cm', 'branch_thickness_mm', 'height_cm']
bonsai_species = ['Juniper', 'Ficus', 'Pine', 'Maple']

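# Shift and scale features toward plausible bonsai measurements.
# make_classification output is unbounded, so these are rough centers, not hard limits.
X[:, 0] = X[:, 0] * 0.5 + 2.0  # leaf_length: roughly centered near 2 cm
X[:, 1] = X[:, 1] * 0.3 + 1.5  # leaf_width: roughly centered near 1.5 cm
X[:, 2] = X[:, 2] * 2.0 + 5.0  # branch_thickness: roughly centered near 5 mm
X[:, 3] = X[:, 3] * 10.0 + 25.0  # height: roughly centered near 25 cm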

# Create DataFrame for better visualization
bonsai_df = pd.DataFrame(X, columns=feature_names)
bonsai_df['species'] = [bonsai_species[i] for i in y]

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("🌱 Bonsai Dataset created:")
print(f"Training samples: {X_train.shape[0]}")
print(f"Test samples: {X_test.shape[0]}")
print(f"Features: {feature_names}")
print(f"Species classes: {bonsai_species}")

# Show sample data
print("\n📊 Sample bonsai measurements:")
print(bonsai_df.head())

🌱 Bonsai Dataset created:
Training samples: 240
Test samples: 60
Features: ['leaf_length_cm', 'leaf_width_cm', 'branch_thickness_mm', 'height_cm']
Species classes: ['Juniper', 'Ficus', 'Pine', 'Maple']

📊 Sample bonsai measurements:
   leaf_length_cm  leaf_width_cm  branch_thickness_mm  height_cm species
0        1.380904       1.188396             1.650313  19.065886   Ficus
1        1.893503       1.425476             5.341917  30.000305   Maple
2        2.007036       1.334003             7.963199  24.248633   Maple
3        2.892563       1.966694             5.784386  14.109763   Maple
4        3.342822       2.035872             5.564911   9.854393   Maple


# 🆕 MLflow 3.3.1 Dataset Feature
MLflow Datasets allow you to track, version, and reuse input data for your experiments. This ensures reproducibility and makes it easy to compare results across different runs.
- **Track the exact data used for each run**
- **Version datasets for auditability**
- **Share and reuse datasets in future experiments**
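Tracking "the exact data" comes down to being able to tell whether two runs saw the same bytes. One way to check that, independent of MLflow, is to record a content hash of the exported CSV. This is a minimal sketch using only the standard library. `file_sha256` is a hypothetical helper, and MLflow computes its own digest for dataset objects, which this does not replace.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Inside a run, the value could be stored with something like `mlflow.set_tag("dataset_sha256", file_sha256(csv_path))`, where the tag key and `csv_path` are illustrative. Two runs with matching digests used byte-identical files.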

Let's log our bonsai dataset using MLflow's Dataset API.

In [None]:
"""
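Disabled draft, kept for reference: log the bonsai dataset using MLflow Datasets.
The next cell defines `date` and logs the dataset inside each training run.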
import mlflow.data

bonsai_df.to_csv(f"bonsai_dataset_{date}.csv", index=False)

with mlflow.start_run() as run:
    dataset = mlflow.data.from_pandas(
        bonsai_df,
        source=f"bonsai_dataset_{date}.csv",
        name="bonsai_species_measurements"
    )
    mlflow.log_input(dataset, context="training")
    print("✅ Bonsai dataset logged as MLflow Dataset!")
    run_id = run.info.run_id
"""

In [5]:
# Experiments with different configurations to compare performance
import mlflow
import mlflow.sklearn
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import os
import mlflow.data
from datetime import datetime

date = datetime.today().strftime('%Y-%m-%d')

# First, version our dataset by exporting a dated CSV snapshot
bonsai_df.to_csv(f"bonsai_dataset_{date}.csv", index=False)

dataset = mlflow.data.from_pandas(
    bonsai_df,
    source=f"bonsai_dataset_{date}.csv",
    name=f"bonsai_species_measurements_{date}",
    targets="species" if "species" in bonsai_df.columns else None,
)
# Set up experiment for bonsai classification
mlflow.set_experiment("Bonsai-Species-Classification")

# List of configurations to test
experiment_configs = [
    {
        "name": "baseline_model",
        "n_estimators": 50,
        "max_depth": 3,
        "min_samples_split": 2,
        "description": "Baseline model with conservative settings"
    },
    {
        "name": "balanced_model", 
        "n_estimators": 100,
        "max_depth": 5,
        "min_samples_split": 3,
        "description": "Balanced model between complexity and performance"
    },
    {
        "name": "complex_model",
        "n_estimators": 200,
        "max_depth": 8,
        "min_samples_split": 2,
        "description": "More complex model for maximum performance"
    },
    {
        "name": "optimized_model",
        "n_estimators": 150,
        "max_depth": 6,
        "min_samples_split": 4,
        "description": "Optimized model based on previous results"
    }
]

print("🧪 Running multiple experiments for comparison...")
print("=" * 60)

# Run experiments
experiment_results = []

for config in experiment_configs:
    with mlflow.start_run(run_name=config["name"]):

        # Log the bonsai dataset using MLflow Datasets
        mlflow.log_input(dataset, context="training")
        print("✅ Bonsai dataset logged as MLflow Dataset!")

        # Train model with specific configuration
        bonsai_classifier = RandomForestClassifier(
            n_estimators=config["n_estimators"],
            max_depth=config["max_depth"],
            min_samples_split=config["min_samples_split"],
            random_state=42
        )

        bonsai_classifier.fit(X_train, y_train)
        preds = bonsai_classifier.predict(X_test)

        
        # Calculate detailed metrics
        accuracy = accuracy_score(y_test, preds)
        precision = precision_score(y_test, preds, average='weighted')
        recall = recall_score(y_test, preds, average='weighted')
        f1 = f1_score(y_test, preds, average='weighted')
        
        # Log parameters
        mlflow.log_params({
            "n_estimators": config["n_estimators"],
            "max_depth": config["max_depth"],
            "min_samples_split": config["min_samples_split"],
            "model_type": "RandomForestClassifier",
            "dataset": "Bonsai Species",
            "n_species": len(bonsai_species),
            "description": config["description"]
        })
        
        # Log metrics
        mlflow.log_metrics({
            "accuracy": accuracy,
            "precision": precision,
            "recall": recall,
            "f1_score": f1,
            "test_samples": len(y_test),
            "training_samples": len(y_train)
        })
        
        # Log model
        mlflow.sklearn.log_model(
            bonsai_classifier, 
            name="bonsai_classifier",
            signature=mlflow.models.infer_signature(X_train, y_train)
        )
        
        # Log confusion matrix as artifact
        cm = confusion_matrix(y_test, preds)
        fig, ax = plt.subplots(figsize=(6, 4))
        im = ax.imshow(cm, cmap='Blues')
        ax.set_title('Confusion Matrix')
        ax.set_xlabel('Predicted')
        ax.set_ylabel('Actual')
        plt.colorbar(im)
        plt.xticks(np.arange(len(bonsai_species)), bonsai_species)
        plt.yticks(np.arange(len(bonsai_species)), bonsai_species)
        for i in range(len(bonsai_species)):
            for j in range(len(bonsai_species)):
                ax.text(j, i, cm[i, j], ha='center', va='center', color='black')
        plt.tight_layout()
        cm_path = f"confusion_matrix_{config['name']}.png"
        plt.savefig(cm_path)
        plt.close(fig)
        mlflow.log_artifact(cm_path)
        os.remove(cm_path)

        # Log sample predictions as CSV artifact
        sample_df = pd.DataFrame({
            'actual': [bonsai_species[i] for i in y_test],
            'predicted': [bonsai_species[i] for i in preds]
        })
        sample_path = f"sample_predictions_{config['name']}.csv"
        sample_df.to_csv(sample_path, index=False)
        mlflow.log_artifact(sample_path)
        os.remove(sample_path)

        # Save results for comparison
        experiment_results.append({
            "name": config["name"],
            "accuracy": accuracy,
            "precision": precision,
            "recall": recall,
            "f1_score": f1,
            "description": config["description"]
        })
        
        print(f"✅ {config['name']}: Accuracy={accuracy:.3f}, F1={f1:.3f}")

print("\n📊 Experiment Summary:")
print("-" * 60)
for result in sorted(experiment_results, key=lambda x: x['accuracy'], reverse=True):
    print(f"🏆 {result['name']}: {result['accuracy']:.3f} acc | {result['f1_score']:.3f} f1")
    print(f"   📝 {result['description']}")

print(f"\n🌐 Compare experiments in MLflow UI: http://localhost:5001")
print("💡 Use 'Compare' to view differences side by side!")

🧪 Running multiple experiments for comparison...
✅ Bonsai dataset logged as MLflow Dataset!
✅ baseline_model: Accuracy=0.617, F1=0.621
🏃 View run baseline_model at: http://mlflow:5000/#/experiments/1/runs/6148af1389864d35921ebde8284c98df
🧪 View experiment at: http://mlflow:5000/#/experiments/1
✅ Bonsai dataset logged as MLflow Dataset!
✅ balanced_model: Accuracy=0.750, F1=0.753
🏃 View run balanced_model at: http://mlflow:5000/#/experiments/1/runs/bbe13636b0bb4f3484026731cef86897
🧪 View experiment at: http://mlflow:5000/#/experiments/1
✅ Bonsai dataset logged as MLflow Dataset!
✅ complex_model: Accuracy=0.767, F1=0.762
🏃 View run complex_model at: http://mlflow:5000/#/experiments/1/runs/f4d891f0d0174d9086da22ab32a60435
🧪 View experiment at: http://mlflow:5000/#/experiments/1
✅ Bonsai dataset logged as MLflow Dataset!
✅ optimized_model: Accuracy=0.817, F1=0.817
🏃 View run optimized_model at: http://mlflow:5000/#/experiments/1/runs/de7169403e2243fca71c7e67af63e447
🧪 View experiment at: ht

# 🌱 MLflow Model Registry and Experiment Comparison

## Model Registry - Model Management
The **MLflow Model Registry** is a centralized repository to manage the model lifecycle:

### Main Features:
- **🔄 Versioning**: Each registered model receives a unique version
- **📋 Stages**: None → Staging → Production → Archived
- **🏷️ Tags and Annotations**: Custom metadata for organization
- **🔍 Lineage**: Tracking model origin (experiment/run)
- **🚀 Deploy**: Integration with deployment systems

### Recommended Workflow:
1. **Experiments**: Multiple runs to find the best model
2. **Registration**: Register the best model in the registry
3. **Staging**: Test model in staging environment
4. **Production**: Promote after complete validation
5. **Monitoring**: Track performance in production

## Experiment Comparison
Use comparison features for:
- **📊 Side-by-side metrics**: Accuracy, F1-score, Precision, Recall
- **⚙️ Hyperparameters**: Compare different configurations
- **📈 Visualizations**: Automatic performance charts
- **🔄 Reproducibility**: All details to reproduce results
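The same side-by-side view can be approximated outside the UI. This is a minimal sketch that formats a list of result dictionaries shaped like `experiment_results` from the training cell. `comparison_table` is a hypothetical helper, and the numbers are the rounded accuracy and F1 values printed by that cell.

```python
def comparison_table(results, metrics=("accuracy", "f1_score"), sort_by="accuracy"):
    """Return text lines comparing runs, best `sort_by` value first."""
    ranked = sorted(results, key=lambda r: r[sort_by], reverse=True)
    width = max(len(r["name"]) for r in ranked)
    header = f"{'run':<{width}}  " + "  ".join(f"{m:>9}" for m in metrics)
    rows = [
        f"{r['name']:<{width}}  " + "  ".join(f"{r[m]:>9.3f}" for m in metrics)
        for r in ranked
    ]
    return [header] + rows


# Rounded values as printed by the training cell above
results = [
    {"name": "baseline_model", "accuracy": 0.617, "f1_score": 0.621},
    {"name": "balanced_model", "accuracy": 0.750, "f1_score": 0.753},
    {"name": "complex_model", "accuracy": 0.767, "f1_score": 0.762},
    {"name": "optimized_model", "accuracy": 0.817, "f1_score": 0.817},
]

print("\n".join(comparison_table(results)))
```

Passing `sort_by="f1_score"` ranks the runs by F1 instead, which can matter when classes are imbalanced.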

### MLflow Tip:
Call `mlflow.autolog()` before training to capture metrics, parameters, and artifacts automatically!

In [6]:
# Select and register the best model based on experiments
from mlflow.tracking import MlflowClient
import mlflow.pyfunc

client = MlflowClient()

# Find the best run based on accuracy metric
experiment = mlflow.get_experiment_by_name("Bonsai-Species-Classification")
best_run = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.accuracy DESC"],
    max_results=1
)[0]

print(f"🏆 Best model found:")
print(f"   Run ID: {best_run.info.run_id}")
print(f"   Accuracy: {best_run.data.metrics['accuracy']:.3f}")
print(f"   F1-Score: {best_run.data.metrics['f1_score']:.3f}")
print(f"   Configuration: {best_run.data.params}")

# Register the best model in the Model Registry
model_name = "Bonsai-Species-Classifier-Production"
model_uri = f"runs:/{best_run.info.run_id}/bonsai_classifier"  # path must match the name passed to log_model

try:
    # Register new model version
    model_version = mlflow.register_model(
        model_uri=model_uri,
        name=model_name
    )
    print(f"\n✅ Model registered successfully!")
    print(f"📦 Name: {model_name}")
    print(f"🔢 Version: {model_version.version}")
    print(f"🌐 MLflow UI: http://localhost:5001/#/models/{model_name}")

    # Add tags and annotations to the registered model
    client.set_model_version_tag(
        name=model_name,
        version=model_version.version,
        key="use_case",
        value="plant_ecommerce_classification"
    )
    client.set_model_version_tag(
        name=model_name,
        version=model_version.version,
        key="model_type",
        value="RandomForestClassifier"
    )
    # Update description with detailed information
    client.update_model_version(
        name=model_name,
        version=model_version.version,
        description=f"""
        🌳 Bonsai Species Classifier for E-commerce

        **Performance:**
        - Accuracy: {best_run.data.metrics['accuracy']:.3f}
        - Precision: {best_run.data.metrics['precision']:.3f}
        - Recall: {best_run.data.metrics['recall']:.3f}
        - F1-Score: {best_run.data.metrics['f1_score']:.3f}

        **Configuration:**
        - Estimators: {best_run.data.params['n_estimators']}
        - Max Depth: {best_run.data.params['max_depth']}
        - Min Samples Split: {best_run.data.params['min_samples_split']}

        **Use Cases:**
        - Automatic bonsai species identification
        - Care recommendations based on species
        - Purchase decision support for customers

        **Classes:** Juniper (0), Ficus (1), Pine (2), Maple (3)
        """
    )
    print(f"🏷️  Tags and description added to the model")

    # Transition model version to Staging (best practice)
    client.transition_model_version_stage(
        name=model_name,
        version=model_version.version,
        stage="Staging"
    )
    print(f"🚦 Model version {model_version.version} transitioned to Staging.")

    # To promote to Production after validation:
    # client.transition_model_version_stage(
    #     name=model_name,
    #     version=model_version.version,
    #     stage="Production"
    # )
    # print(f"🚀 Model version {model_version.version} promoted to Production.")

except Exception as e:
    print(f"❌ Error registering model: {e}")
    print("💡 Check if the MLflow server is running")

print(f"\n📈 Next steps:")
print(f"1. Review model in MLflow UI")
print(f"2. Test model in staging")
print(f"3. Promote to production when approved")

Successfully registered model 'Bonsai-Species-Classifier-Production'.


🏆 Best model found:
   Run ID: de7169403e2243fca71c7e67af63e447
   Accuracy: 0.817
   F1-Score: 0.817
   Configuration: {'n_estimators': '150', 'max_depth': '6', 'min_samples_split': '4', 'model_type': 'RandomForestClassifier', 'dataset': 'Bonsai Species', 'n_species': '4', 'description': 'Optimized model based on previous results'}


2025/09/04 19:56:09 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: Bonsai-Species-Classifier-Production, version 1
Created version '1' of model 'Bonsai-Species-Classifier-Production'.



✅ Model registered successfully!
📦 Name: Bonsai-Species-Classifier-Production
🔢 Version: 1
🌐 MLflow UI: http://localhost:5000/#/models/Bonsai-Species-Classifier-Production
🏷️  Tags and description added to the model
🚦 Model version 1 transitioned to Staging.

📈 Next steps:
1. Review model in MLflow UI
2. Test model in staging
3. Promote to production when approved


# MLflow 3.3.1: Experiments, Artifact Logging, and Model Registry Best Practices

**Experiments & Artifact Logging**
- Track multiple experiments (runs) with parameters, metrics, and artifacts.
- Log artifacts (models, plots, files) using `mlflow.log_artifact()` or `mlflow.<flavor>.log_model()`.
- Artifacts help reproduce results and compare runs.

**Model Registry Workflow**
- Register your best model with `mlflow.register_model(model_uri, name)`.
- Registry supports model versioning and lifecycle stages: `None`, `Staging`, `Production`, `Archived`.
- Use MLflow UI or `MlflowClient` to transition model versions:
  - `client.transition_model_version_stage(name, version, stage="Staging")`
  - `client.transition_model_version_stage(name, version, stage="Production")`

**Identifying Staging/Production Versions**
- Assign tags and descriptions to model versions for clarity.
- Promote/demote versions between stages using UI or API.
- Only one version should be in `Production` at a time for a given model.
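Model stages have been deprecated since MLflow 2.9 in favor of model version aliases, although the stage calls still ran in this notebook. The following is a hedged sketch of an alias-based promotion. `promote_with_alias` and the alias name `champion` are illustrative choices, and `client` is expected to be an `MlflowClient`; it is passed in as a parameter so the function can be exercised without a server. Because an alias points to exactly one version, reassigning it also keeps a single "production" version per model.

```python
def promote_with_alias(client, name, version, alias="champion"):
    """Point `alias` at `version` and return the model URI that resolves to it."""
    # MlflowClient.set_registered_model_alias reassigns the alias if it already exists
    client.set_registered_model_alias(name=name, alias=alias, version=str(version))
    # The models:/<name>@<alias> form resolves to whichever version holds the alias
    return f"models:/{name}@{alias}"
```

With a live server this might look like `uri = promote_with_alias(MlflowClient(), "Bonsai-Species-Classifier-Production", 1)`, after which `mlflow.pyfunc.load_model(uri)` loads the version the alias currently points to.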

**Best Practices**
- Always log relevant artifacts for each run.
- Use clear naming and tagging for models and versions.
- Automate stage transitions as part of your CI/CD pipeline.


## 🔬 Test Model in Staging

Before promoting a model to production, it's essential to validate its performance in a controlled staging environment. Here, we load the model from the MLflow registry (in the 'Staging' stage) and run a prediction on a held-out test sample. This step checks that the model loads and behaves as expected before serving real users.

In [8]:
# Test model in staging
from mlflow.pyfunc import load_model
from mlflow.tracking import MlflowClient

# Get the latest model version in staging
model_name = "Bonsai-Species-Classifier-Production"
client = MlflowClient()
staging_versions = client.get_latest_versions(model_name, stages=["Staging"])
if staging_versions:
    staging_model_uri = f"models:/{model_name}/Staging"
    print(f"Loading model from: {staging_model_uri}")
    model = load_model(staging_model_uri)
    # Example prediction (using first test sample)
    sample = X_test[0].reshape(1, -1)
    pred = model.predict(sample)
    print(f"Predicted class for first test sample: {bonsai_species[pred[0]]}")
else:
    print("No model in Staging stage found.")

Loading model from: models:/Bonsai-Species-Classifier-Production/Staging
Predicted class for first test sample: Maple
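A single prediction confirms that the model loads, but it says little about quality. This is a minimal sketch of a staging gate that scores the loaded model on the whole test set. `passes_staging_gate` and the `min_accuracy` threshold of 0.75 are assumptions for illustration, not values from this project. The only requirement on `model` is a `predict` method like the one on the object returned by `load_model`.

```python
def passes_staging_gate(model, X, y, min_accuracy=0.75):
    """Return (passed, accuracy) for `model` on the labeled samples X, y."""
    preds = model.predict(X)
    # Count exact label matches; works for lists and NumPy arrays alike
    correct = sum(int(p == t) for p, t in zip(preds, y))
    accuracy = correct / len(y)
    return accuracy >= min_accuracy, accuracy
```

In this notebook the call would be `passed, acc = passes_staging_gate(model, X_test, y_test)`, and the promotion step would run only when `passed` is true.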


## 🚀 Promote Model to Production

Once the model passes all tests in staging, it's ready to be promoted to production. This step updates the model's stage in the MLflow registry, making it available for real-world predictions. Production models should be monitored for performance and retrained as needed.

In [9]:
# Promote model to production when approved
from mlflow.tracking import MlflowClient

model_name = "Bonsai-Species-Classifier-Production"
client = MlflowClient()
staging_versions = client.get_latest_versions(model_name, stages=["Staging"])
if staging_versions:
    version = staging_versions[0].version
    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production"
    )
    print(f"🚀 Model version {version} promoted to Production.")
else:
    print("No model in Staging stage to promote.")

🚀 Model version 1 promoted to Production.


# 🎓 What Did We Learn?

In this notebook, you:
- Built and tracked a bonsai species classifier using MLflow
- Logged parameters, metrics, and artifacts for reproducibility
- Registered your model and managed its lifecycle (Staging → Production)
- Validated model performance before production deployment

## 🛠️ Next Steps
- **Monitor** your production model for drift and performance
- **Automate** retraining and deployment with CI/CD pipelines
- **Serve** your model via an API for real-time predictions
- **Integrate** with business systems for end-to-end MLOps

Explore MLflow's advanced features and try deploying your own models in a real project!

In [10]:
# Test the Bonsai Species Classification API
import requests

# Example input features: [leaf_length_cm, leaf_width_cm, branch_thickness_mm, height_cm]
features = [5.2, 3.1, 2.0, 25.0]

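# API endpoint (the `api` service name resolves inside the Docker network)
url = "http://api:8080/predict"
payload = {"features": features}

try:
    # A timeout keeps the cell from hanging if the API container is down
    response = requests.post(url, json=payload, timeout=10)
    response.raise_for_status()
    result = response.json()
    print("Prediction result:", result)
except (requests.RequestException, ValueError) as e:
    # ValueError covers a non-JSON response body on older requests versions
    print(f"API call failed: {e}")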

Prediction result: {'prediction': 3, 'species': 'Maple', 'confidence': 'high', 'care_recommendations': 'Needs partial shade, consistent moisture, protection from wind', 'input_features': {'leaf_length_cm': 5.2, 'leaf_width_cm': 3.1, 'branch_thickness_mm': 2.0, 'height_cm': 25.0}}
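Since the API expects the four measurements in a fixed order, a small client-side check can catch malformed input before the request is sent. This is a minimal sketch. `build_payload` is a hypothetical helper, the feature order comes from the comment in the cell above, and the `{"features": [...]}` shape mirrors the payload used there. The API's own validation rules are not shown in this notebook, so this check is an assumption about what counts as valid input.

```python
FEATURE_NAMES = ["leaf_length_cm", "leaf_width_cm", "branch_thickness_mm", "height_cm"]


def build_payload(features):
    """Validate a feature list and return the JSON body for the /predict endpoint."""
    if len(features) != len(FEATURE_NAMES):
        raise ValueError(f"expected {len(FEATURE_NAMES)} features, got {len(features)}")
    for name, value in zip(FEATURE_NAMES, features):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number, got {value!r}")
    return {"features": [float(v) for v in features]}


print(build_payload([5.2, 3.1, 2.0, 25.0]))
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.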
