## MLflow 3.3.1 Best Practices and MLOps Workflow

This notebook demonstrates a complete MLOps workflow using MLflow 3.3.1, including:

- **Experiment Tracking:** Logging parameters, metrics, and artifacts (confusion matrix, predictions CSV) for reproducibility.
- **Model Registry:** Registering models, managing lifecycle stages (Staging, Production), and promoting models when approved.
- **Artifact Logging:** Ensuring all relevant outputs are saved for future analysis and auditability.
- **Automation:** GitHub Actions for notebook cleaning and dependency updates to maintain code quality and security.

The workflow is designed for educational clarity, following best practices for experiment management, model versioning, and CI/CD automation in MLOps projects.

# Class II: Infrastructure as Code for MLOps

## 🌱 Project: Bonsai Species Classifier for Plant E-commerce

Welcome to our hands-on MLOps session! We're building a **bonsai species classifier** for a plant website that will:
- **Identify bonsai species** from plant measurements
- **Provide care recommendations** based on species type
- **Help customers** choose the right bonsai for their needs

### Infrastructure Stack (All Containerized!)
- **MLflow**: For experiment tracking and model registry
- **Docker**: All services are containerized (no local installation needed!)
- **JupyterLab**: You're running this from a container right now
- **API**: Model serving for the plant website

## Quick Setup Check
Make sure all containers are running:
- MLflow UI: http://localhost:5000 (track bonsai model experiments)
- JupyterLab: http://localhost:8888 (you're here!)
- API: http://localhost:8080 (bonsai species prediction service)
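
As a minimal sketch, the check above can be scripted with the standard library from the host machine. The URLs are the ones listed above; the helper name `check_service`, the 2-second timeout, and the assumption that each service answers a plain HTTP GET on its root path are illustrative and not part of the project. Inside the notebook container `localhost` refers to the container itself, which is why the tracking server is addressed there as `http://mlflow:5000`.

```python
import http.client
import urllib.error
import urllib.request

# URLs from the list above, as seen from the host machine
SERVICES = {
    "MLflow UI": "http://localhost:5000",
    "JupyterLab": "http://localhost:8888",
    "API": "http://localhost:8080",
}


def check_service(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers an HTTP request at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status (e.g. 403 or 404)
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, or a malformed response
        return False


for name, url in SERVICES.items():
    status = "up" if check_service(url) else "not reachable"
    print(f"{name} ({url}): {status}")
```

A service reported as "not reachable" usually means its container is stopped or its port is not published to the host.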

Let's start building our bonsai classifier! 🌳

In [None]:
# Install MLflow in this container if needed, then point it at the tracking server
import os
import subprocess
import sys

# Install the pinned MLflow version only when it is missing
try:
    import mlflow
    print(f"✅ MLflow {mlflow.__version__} already installed")
except ImportError:
    print("📦 Installing MLflow...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "mlflow==3.3.1"])
    import mlflow

# Connect to the containerized MLflow server. Inside the Docker network the
# server is reached by the hostname `mlflow`, not `localhost`.
TRACKING_URI = "http://mlflow:5000"
os.environ["MLFLOW_TRACKING_URI"] = TRACKING_URI
mlflow.set_tracking_uri(TRACKING_URI)

print(f"🎯 MLflow Tracking URI: {mlflow.get_tracking_uri()}")
print("🚀 Ready to track experiments!")

# 🌳 Bonsai Species Classification with MLflow

Now let's train our bonsai species classifier and track the experiment. We'll classify 4 types of bonsai:
- **Juniper Bonsai** (0): Hardy, needle-like foliage
- **Ficus Bonsai** (1): Broad leaves, aerial roots  
- **Pine Bonsai** (2): Long needles, rugged bark
- **Maple Bonsai** (3): Distinctive lobed leaves

We'll track:
- **Parameters**: Model configuration (n_estimators, etc.)
- **Metrics**: Classification performance (accuracy, etc.)
- **Artifacts**: The trained bonsai classifier model

Check the MLflow UI at http://localhost:5000 to see your bonsai classification experiments! 🌱

In [None]:
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import pandas as pd
import numpy as np

# Create a simulated bonsai dataset (replacing Iris for our bonsai classifier)
# Features: leaf_length, leaf_width, branch_thickness, height
X, y = make_classification(
    n_samples=300,
    n_features=4,
    n_classes=4,
    n_informative=4,
    n_redundant=0,
    random_state=42
)

# Add realistic feature names and scaling for bonsai measurements
feature_names = ['leaf_length_cm', 'leaf_width_cm', 'branch_thickness_mm', 'height_cm']
bonsai_species = ['Juniper', 'Ficus', 'Pine', 'Maple']

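# Shift and scale each feature toward plausible bonsai measurements.
# make_classification output is unbounded, so the values below are rough
# centers, not hard ranges; a few synthetic samples can fall well outside them.
X[:, 0] = X[:, 0] * 0.5 + 2.0    # leaf_length: roughly centered on 2.0 cm
X[:, 1] = X[:, 1] * 0.3 + 1.5    # leaf_width: roughly centered on 1.5 cm
X[:, 2] = X[:, 2] * 2.0 + 5.0    # branch_thickness: roughly centered on 5 mm
X[:, 3] = X[:, 3] * 10.0 + 25.0  # height: roughly centered on 25 cm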

# Create DataFrame for better visualization
bonsai_df = pd.DataFrame(X, columns=feature_names)
bonsai_df['species'] = [bonsai_species[i] for i in y]

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("🌱 Bonsai Dataset created:")
print(f"Training samples: {X_train.shape[0]}")
print(f"Test samples: {X_test.shape[0]}")
print(f"Features: {feature_names}")
print(f"Species classes: {bonsai_species}")

# Show sample data
print("\n📊 Sample bonsai measurements:")
print(bonsai_df.head())

In [None]:
# Run experiments with different configurations to compare performance
import os
import warnings

import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

# Silence library warnings to keep the notebook output readable.
# Note that this also hides MLflow deprecation warnings.
warnings.filterwarnings('ignore')

# Set up experiment for bonsai classification
mlflow.set_experiment("Bonsai-Species-Classification")

# List of configurations to test
experiment_configs = [
    {
        "name": "baseline_model",
        "n_estimators": 50,
        "max_depth": 3,
        "min_samples_split": 2,
        "description": "Baseline model with conservative settings"
    },
    {
        "name": "balanced_model", 
        "n_estimators": 100,
        "max_depth": 5,
        "min_samples_split": 3,
        "description": "Balanced model between complexity and performance"
    },
    {
        "name": "complex_model",
        "n_estimators": 200,
        "max_depth": 8,
        "min_samples_split": 2,
        "description": "More complex model for maximum performance"
    },
    {
        "name": "optimized_model",
        "n_estimators": 150,
        "max_depth": 6,
        "min_samples_split": 4,
        "description": "Optimized model based on previous results"
    }
]

print("🧪 Running multiple experiments for comparison...")
print("=" * 60)

# Run experiments
experiment_results = []

for config in experiment_configs:
    with mlflow.start_run(run_name=config["name"]):
        # Train model with specific configuration
        bonsai_classifier = RandomForestClassifier(
            n_estimators=config["n_estimators"],
            max_depth=config["max_depth"],
            min_samples_split=config["min_samples_split"],
            random_state=42
        )
        
        bonsai_classifier.fit(X_train, y_train)
        preds = bonsai_classifier.predict(X_test)
        
        # Calculate detailed metrics
        accuracy = accuracy_score(y_test, preds)
        precision = precision_score(y_test, preds, average='weighted')
        recall = recall_score(y_test, preds, average='weighted')
        f1 = f1_score(y_test, preds, average='weighted')
        
        # Log parameters
        mlflow.log_params({
            "n_estimators": config["n_estimators"],
            "max_depth": config["max_depth"],
            "min_samples_split": config["min_samples_split"],
            "model_type": "RandomForestClassifier",
            "dataset": "Bonsai Species",
            "n_species": len(bonsai_species),
            "description": config["description"]
        })
        
        # Log metrics
        mlflow.log_metrics({
            "accuracy": accuracy,
            "precision": precision,
            "recall": recall,
            "f1_score": f1,
            "test_samples": len(y_test),
            "training_samples": len(y_train)
        })
        
        # Log model; infer the signature from the inputs and the model's own predictions
        mlflow.sklearn.log_model(
            bonsai_classifier,
            name="bonsai_classifier",
            signature=mlflow.models.infer_signature(X_train, bonsai_classifier.predict(X_train))
        )
        
        # Log confusion matrix as artifact
        ticks = np.arange(len(bonsai_species))
        # Pass labels explicitly so the matrix is always 4x4, even if a class is absent
        cm = confusion_matrix(y_test, preds, labels=ticks)
        fig, ax = plt.subplots(figsize=(6, 4))
        im = ax.imshow(cm, cmap='Blues')
        ax.set_title('Confusion Matrix')
        ax.set_xlabel('Predicted')
        ax.set_ylabel('Actual')
        fig.colorbar(im, ax=ax)
        ax.set_xticks(ticks)
        ax.set_xticklabels(bonsai_species)
        ax.set_yticks(ticks)
        ax.set_yticklabels(bonsai_species)
        # Annotate each cell with its count
        for i in ticks:
            for j in ticks:
                ax.text(j, i, cm[i, j], ha='center', va='center', color='black')
        fig.tight_layout()
        cm_path = f"confusion_matrix_{config['name']}.png"
        fig.savefig(cm_path)
        plt.close(fig)
        mlflow.log_artifact(cm_path)
        os.remove(cm_path)

        # Log sample predictions as CSV artifact
        sample_df = pd.DataFrame({
            'actual': [bonsai_species[i] for i in y_test],
            'predicted': [bonsai_species[i] for i in preds]
        })
        sample_path = f"sample_predictions_{config['name']}.csv"
        sample_df.to_csv(sample_path, index=False)
        mlflow.log_artifact(sample_path)
        os.remove(sample_path)

        # Save results for comparison
        experiment_results.append({
            "name": config["name"],
            "accuracy": accuracy,
            "precision": precision,
            "recall": recall,
            "f1_score": f1,
            "description": config["description"]
        })
        
        print(f"✅ {config['name']}: Accuracy={accuracy:.3f}, F1={f1:.3f}")

print("\n📊 Experiment Summary:")
print("-" * 60)
for result in sorted(experiment_results, key=lambda x: x['accuracy'], reverse=True):
    print(f"🏆 {result['name']}: {result['accuracy']:.3f} acc | {result['f1_score']:.3f} f1")
    print(f"   📝 {result['description']}")

print("\n🌐 Compare experiments in MLflow UI: http://localhost:5000")
print("💡 Use 'Compare' to view differences side by side!")

# 🌱 MLflow Model Registry and Experiment Comparison

## Model Registry - Model Management
The **MLflow Model Registry** (v3.3.1) is a centralized repository to manage the model lifecycle:

### Main Features:
- **🔄 Versioning**: Each registration creates a new, numbered version of the model
- **📋 Stages**: None → Staging → Production → Archived (deprecated since MLflow 2.9 in favor of model version aliases, but still used in this class)
- **🏷️ Tags and Annotations**: Custom metadata for organization
- **🔍 Lineage**: Tracking model origin (experiment/run)
- **🚀 Deploy**: Integration with deployment systems

### Recommended Workflow:
1. **Experiments**: Multiple runs to find the best model
2. **Registration**: Register the best model in the registry
3. **Staging**: Test model in staging environment
4. **Production**: Promote after complete validation
5. **Monitoring**: Track performance in production
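
Step 4, promoting only after validation, can be made explicit as a small gate that runs before any promotion. The sketch below is purely illustrative: the function name `should_promote`, the 0.80 accuracy floor, and the 0.01 minimum F1 gain are hypothetical choices, not values taken from this project.

```python
from typing import Mapping, Optional


def should_promote(
    candidate: Mapping[str, float],
    production: Optional[Mapping[str, float]] = None,
    min_accuracy: float = 0.80,  # hypothetical absolute floor
    min_gain: float = 0.01,      # hypothetical required F1 improvement
) -> bool:
    """Decide whether a staged model may replace the production model."""
    # Gate 1: the candidate must clear an absolute quality bar
    if candidate["accuracy"] < min_accuracy:
        return False
    # Gate 2: with no production model yet, the floor is the only requirement
    if production is None:
        return True
    # Gate 3: otherwise the candidate must beat production by a margin
    return candidate["f1_score"] - production["f1_score"] >= min_gain


staged = {"accuracy": 0.85, "f1_score": 0.84}
current = {"accuracy": 0.83, "f1_score": 0.82}
print(should_promote(staged, current))
```

In this notebook the `candidate` mapping could come from `best_run.data.metrics`, which already holds the `accuracy` and `f1_score` keys logged during the experiments.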

## Experiment Comparison
Use comparison features for:
- **📊 Side-by-side metrics**: Accuracy, F1-score, Precision, Recall
- **⚙️ Hyperparameters**: Compare different configurations
- **📈 Visualizations**: Automatic performance charts
- **🔄 Reproducibility**: All details to reproduce results

### MLflow 3.3.1 Tip:
Call `mlflow.autolog()` before training to capture parameters, metrics, and the fitted model automatically for supported libraries such as scikit-learn.

In [None]:
# Select and register the best model based on experiments
from mlflow.tracking import MlflowClient
import mlflow.pyfunc

client = MlflowClient()

# Find the best run based on accuracy metric
experiment = mlflow.get_experiment_by_name("Bonsai-Species-Classification")
best_run = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.accuracy DESC"],
    max_results=1
)[0]

print("🏆 Best model found:")
print(f"   Run ID: {best_run.info.run_id}")
print(f"   Accuracy: {best_run.data.metrics['accuracy']:.3f}")
print(f"   F1-Score: {best_run.data.metrics['f1_score']:.3f}")
print(f"   Configuration: {best_run.data.params}")

# Register the best model in the Model Registry
model_name = "Bonsai-Species-Classifier-Production"
# The last URI segment must match the `name` passed to log_model
model_uri = f"runs:/{best_run.info.run_id}/bonsai_classifier"

try:
    # Register new model version
    model_version = mlflow.register_model(
        model_uri=model_uri,
        name=model_name
    )
    print("\n✅ Model registered successfully!")
    print(f"📦 Name: {model_name}")
    print(f"🔢 Version: {model_version.version}")
    print(f"🌐 MLflow UI: http://localhost:5000/#/models/{model_name}")

    # Add tags and annotations to the registered model
    client.set_model_version_tag(
        name=model_name,
        version=model_version.version,
        key="use_case",
        value="plant_ecommerce_classification"
    )
    client.set_model_version_tag(
        name=model_name,
        version=model_version.version,
        key="model_type",
        value="RandomForestClassifier"
    )
    # Update description with detailed information
    client.update_model_version(
        name=model_name,
        version=model_version.version,
        description=f"""
        🌳 Bonsai Species Classifier for E-commerce

        **Performance:**
        - Accuracy: {best_run.data.metrics['accuracy']:.3f}
        - Precision: {best_run.data.metrics['precision']:.3f}
        - Recall: {best_run.data.metrics['recall']:.3f}
        - F1-Score: {best_run.data.metrics['f1_score']:.3f}

        **Configuration:**
        - Estimators: {best_run.data.params['n_estimators']}
        - Max Depth: {best_run.data.params['max_depth']}
        - Min Samples Split: {best_run.data.params['min_samples_split']}

        **Use Cases:**
        - Automatic bonsai species identification
        - Care recommendations based on species
        - Purchase decision support for customers

        **Classes:** Juniper (0), Ficus (1), Pine (2), Maple (3)
        """
    )
    print("🏷️  Tags and description added to the model")

    # Transition model version to Staging (best practice)
    client.transition_model_version_stage(
        name=model_name,
        version=model_version.version,
        stage="Staging"
    )
    print(f"🚦 Model version {model_version.version} transitioned to Staging.")

    # To promote to Production after validation:
    # client.transition_model_version_stage(
    #     name=model_name,
    #     version=model_version.version,
    #     stage="Production"
    # )
    # print(f"🚀 Model version {model_version.version} promoted to Production.")

except Exception as e:
    print(f"❌ Error registering model: {e}")
    print("💡 Check if the MLflow server is running")

print("\n📈 Next steps:")
print("1. Review model in MLflow UI")
print("2. Test model in staging")
print("3. Promote to production when approved")

# MLflow 3.3.1: Experiments, Artifact Logging, and Model Registry Best Practices

**Experiments & Artifact Logging**
- Track multiple experiments (runs) with parameters, metrics, and artifacts.
- Log artifacts (models, plots, files) using `mlflow.log_artifact()` or `mlflow.<flavor>.log_model()`.
- Artifacts help reproduce results and compare runs.

**Model Registry Workflow**
- Register your best model with `mlflow.register_model(model_uri, name)`.
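- The registry supports model versioning and lifecycle stages: `None`, `Staging`, `Production`, `Archived`. Stages have been deprecated since MLflow 2.9 in favor of aliases and tags; the stage API is used here because this class follows the Staging/Production flow.
- Use the MLflow UI or `MlflowClient` to transition model versions:
  - `client.transition_model_version_stage(name, version, stage="Staging")`
  - `client.transition_model_version_stage(name, version, stage="Production")`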

**Identifying Staging/Production Versions**
- Assign tags and descriptions to model versions for clarity.
- Promote/demote versions between stages using UI or API.
- Only one version should be in `Production` at a time for a given model.

**Best Practices**
- Always log relevant artifacts for each run.
- Use clear naming and tagging for models and versions.
- Automate stage transitions as part of your CI/CD pipeline.
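
Since registry stages are deprecated, the same promotion step can also be expressed with model version aliases. The following is a minimal sketch and not the workflow used elsewhere in this notebook: the helper names `alias_uri` and `promote_with_alias` and the alias name `champion` are illustrative assumptions, while `set_registered_model_alias` and `get_model_version_by_alias` are existing `MlflowClient` methods. The helper takes the client as an argument so it can be exercised without a running server.

```python
def alias_uri(model_name: str, alias: str) -> str:
    """Build a model URI that resolves through an alias instead of a stage."""
    return f"models:/{model_name}@{alias}"


def promote_with_alias(client, model_name: str, version, alias: str = "champion") -> str:
    """Point `alias` at `version` and return the URI that now loads it.

    `client` is expected to be an mlflow.tracking.MlflowClient.
    An alias points to one version at a time, so reassigning it
    moves it off the previously aliased version.
    """
    client.set_registered_model_alias(name=model_name, alias=alias, version=str(version))
    # Read the alias back to confirm which version it resolves to
    resolved = client.get_model_version_by_alias(model_name, alias)
    if str(resolved.version) != str(version):
        raise RuntimeError(
            f"Alias {alias!r} resolves to version {resolved.version}, expected {version}"
        )
    return alias_uri(model_name, alias)


print(alias_uri("Bonsai-Species-Classifier-Production", "champion"))
```

With the notebook's objects this might look like `uri = promote_with_alias(client, model_name, model_version.version)`, followed by `mlflow.pyfunc.load_model(uri)` to load whichever version the alias currently points to.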


In [None]:
# Test model in staging
from mlflow.pyfunc import load_model
from mlflow.tracking import MlflowClient

model_name = "Bonsai-Species-Classifier-Production"
client = MlflowClient()

# Get the latest model version in the Staging stage
staging_versions = client.get_latest_versions(model_name, stages=["Staging"])
if staging_versions:
    staging_model_uri = f"models:/{model_name}/Staging"
    print(f"Loading model from: {staging_model_uri}")
    model = load_model(staging_model_uri)
    # Example prediction (using first test sample)
    sample = X_test[0].reshape(1, -1)
    pred = model.predict(sample)
    print(f"Predicted class for first test sample: {bonsai_species[int(pred[0])]}")
else:
    print("No model version found in the Staging stage.")

In [None]:
# Promote model to production when approved
from mlflow.tracking import MlflowClient

model_name = "Bonsai-Species-Classifier-Production"
client = MlflowClient()

staging_versions = client.get_latest_versions(model_name, stages=["Staging"])
if staging_versions:
    version = staging_versions[0].version
    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production",
        archive_existing_versions=True  # keep a single version in Production
    )
    print(f"🚀 Model version {version} promoted to Production.")
else:
    print("No model version in the Staging stage to promote.")