# Complete MLflow Tutorial: Learn Machine Learning Experiment Tracking

## 🎯 Learning Objectives
By the end of this notebook, you'll understand:
- How to set up and use MLflow for experiment tracking
- How to log parameters, metrics, and artifacts
- How to manage model versions with MLflow Model Registry
- How to compare experiments and make data-driven decisions
- Best practices for MLflow in production workflows

## 📖 What is MLflow?
MLflow is an open-source platform for managing the machine learning lifecycle, including:
- **Tracking**: Record and query experiments (code, data, config, results)
- **Projects**: Package data science code in a reusable, reproducible form
- **Models**: Manage and deploy models from various ML libraries
- **Registry**: Centralized model store for collaborative model management

Let's dive into each component step by step!

## 1. Import MLflow and Required Libraries

First, let's import all the necessary libraries. Each import serves a specific purpose in our ML workflow:

In [1]:
# MLflow - Main library for experiment tracking and model management
import mlflow
from mlflow.models import infer_signature  # For automatically inferring model signatures
from mlflow.tracking import MlflowClient    # Client for programmatic access to MLflow

# Data manipulation and analysis
import pandas as pd                         # For data manipulation and analysis
import numpy as np                          # For numerical operations

# Machine Learning libraries
from sklearn import datasets               # For loading sample datasets
from sklearn.model_selection import train_test_split  # For splitting data
from sklearn.linear_model import LogisticRegression   # Our ML model
from sklearn.ensemble import RandomForestClassifier   # Alternative model for comparison
from sklearn.metrics import (               # For calculating model performance metrics
    accuracy_score, 
    precision_score, 
    recall_score, 
    f1_score,
    confusion_matrix,
    classification_report
)

# Visualization libraries
import matplotlib.pyplot as plt             # For creating plots
import seaborn as sns                       # For advanced statistical visualizations

# System and utility libraries
import os                                   # For file system operations
import warnings                             # For handling warnings
warnings.filterwarnings('ignore')          # Suppress warnings for cleaner output

print("✅ All libraries imported successfully!")
print(f"MLflow version: {mlflow.__version__}")



✅ All libraries imported successfully!
MLflow version: 3.1.4


## 2. Set Up MLflow Tracking Server Connection

MLflow can store tracking data in different locations:
- **Local filesystem**: `file:./mlruns` (default)
- **Database**: `sqlite:///mlflow.db` or `postgresql://...`
- **Remote server**: `http://localhost:5000` or `http://your-server:5000`

We'll use local file-based tracking in this tutorial; pointing at a running tracking server only requires changing the URI:

In [2]:
# Set the MLflow tracking URI
# This tells MLflow where to store experiment data and artifacts
# Options:
# 1. Local file system: "file:./mlruns" 
# 2. Local server: "http://127.0.0.1:8080"
# 3. Remote server: "http://your-mlflow-server.com:5000"

# For this tutorial, we'll use local file-based tracking for simplicity
# You can change this to "http://127.0.0.1:8080" if you have the server running
tracking_uri = "file:./mlruns"
mlflow.set_tracking_uri(tracking_uri)

# Verify the tracking URI is set correctly
current_uri = mlflow.get_tracking_uri()
print(f"🔗 MLflow tracking URI: {current_uri}")

# Create an MLflow client for programmatic access
# This allows us to interact with MLflow programmatically
client = MlflowClient()

print("✅ MLflow tracking setup complete!")

# Let's check if we can connect and list any existing experiments
try:
    # Use search_experiments() instead of list_experiments() for newer MLflow versions
    experiments = client.search_experiments()
    print(f"📊 Found {len(experiments)} existing experiments")
    for exp in experiments:
        print(f"   - {exp.name} (ID: {exp.experiment_id})")
except AttributeError:
    # Fallback for older MLflow versions
    try:
        experiments = client.list_experiments()
        print(f"📊 Found {len(experiments)} existing experiments")
        for exp in experiments:
            print(f"   - {exp.name} (ID: {exp.experiment_id})")
    except Exception as e:
        print(f"⚠️  Note: Could not list experiments: {e}")
        print("   This is normal if no experiments exist yet or if using file-based tracking.")
except Exception as e:
    print(f"⚠️  Note: Could not connect to MLflow: {e}")
    print("   This is normal if no experiments exist yet or if using file-based tracking.")

🔗 MLflow tracking URI: file:./mlruns
✅ MLflow tracking setup complete!
📊 Found 1 existing experiments
   - iris-classification-tutorial (ID: 636605946707093340)

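The tracking location can also come from the `MLFLOW_TRACKING_URI` environment variable, which MLflow consults when no URI is set in code. Below is a minimal sketch that makes this fallback explicit. The helper name `resolve_tracking_uri` is hypothetical (it is not part of MLflow), and the example URIs are the ones used in this section.

```python
import os
from typing import Mapping


def resolve_tracking_uri(
    env: Mapping[str, str] = os.environ,
    default: str = "file:./mlruns",
) -> str:
    """Return the URI from MLFLOW_TRACKING_URI, or `default` if unset or empty.

    Hypothetical helper for illustration; not an MLflow function.
    """
    return env.get("MLFLOW_TRACKING_URI") or default


# With the variable set, the server URL wins over the local default.
uri_from_env = resolve_tracking_uri({"MLFLOW_TRACKING_URI": "http://127.0.0.1:8080"})

# Without it, we fall back to file-based tracking.
uri_default = resolve_tracking_uri({})

print(uri_from_env)
print(uri_default)
```

The returned string can then be passed to `mlflow.set_tracking_uri(...)`, so the same notebook works locally and against a shared server.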

## 3. Create and Start an MLflow Experiment

**Experiments** in MLflow are containers for organizing related runs. Think of them as project folders where you group related model training attempts.

### Best Practices for Experiment Naming:
- Use descriptive names: `iris-classification-comparison`
- Include version or date: `model-v2-2024`
- Separate by model type: `logistic-regression-experiments`
- Group by business objective: `customer-churn-prediction`

In [3]:
# Define experiment name with descriptive naming
experiment_name = "iris-classification-tutorial"

# Create or get the experiment
# mlflow.set_experiment() either:
# 1. Creates a new experiment if it doesn't exist
# 2. Sets the active experiment if it already exists
experiment = mlflow.set_experiment(experiment_name)

print(f"🧪 Experiment: '{experiment_name}'")
print(f"📍 Experiment ID: {experiment.experiment_id}")
print(f"📂 Artifact Location: {experiment.artifact_location}")

# You can also create experiments with additional metadata
# This is useful for providing context and description
experiment_description = """
This experiment focuses on comparing different machine learning models 
for Iris flower classification. We'll track various algorithms, 
hyperparameters, and performance metrics to find the best approach.

Dataset: Iris flower dataset (150 samples, 4 features, 3 classes)
Goal: Achieve highest accuracy while maintaining good generalization
"""

# Add tags to categorize and organize experiments
experiment_tags = {
    "project": "ml-tutorial",
    "dataset": "iris",
    "task": "classification",
    "team": "data-science",
    "priority": "learning"
}

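# Set experiment tags (metadata for organization)
for key, value in experiment_tags.items():
    mlflow.set_experiment_tag(key, value)
    print(f"🏷️  Tag: {key} = {value}")

# Attach the description; MLflow stores it in the reserved "mlflow.note.content" tag
mlflow.set_experiment_tag("mlflow.note.content", experiment_description.strip())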

print("\n✅ Experiment setup complete!")
print(f"💡 All runs in this notebook will be tracked under experiment: '{experiment_name}'")

🧪 Experiment: 'iris-classification-tutorial'
📍 Experiment ID: 636605946707093340
📂 Artifact Location: file:///Users/Farhad/Documents/ML_project/MLFLOW/mlruns/636605946707093340
🏷️  Tag: project = ml-tutorial
🏷️  Tag: dataset = iris
🏷️  Tag: task = classification
🏷️  Tag: team = data-science
🏷️  Tag: priority = learning

✅ Experiment setup complete!
💡 All runs in this notebook will be tracked under experiment: 'iris-classification-tutorial'


## 4. Load and Prepare Data

Before we start experimenting with models, let's load and explore our dataset. We'll use the classic Iris dataset for this tutorial.

In [16]:
# Load the Iris dataset from scikit-learn
# This is a classic dataset with 150 samples, 4 features, and 3 classes
print("📊 Loading Iris dataset...")
iris_data = datasets.load_iris()

# Extract features (X) and target labels (y)
X = iris_data.data          # Features: sepal length, sepal width, petal length, petal width
y = iris_data.target        # Target: iris species (0: setosa, 1: versicolor, 2: virginica)

# Get feature names and target names for better understanding
feature_names = iris_data.feature_names
target_names = iris_data.target_names

print(f"📈 Dataset shape: {X.shape}")
print(f"🎯 Target classes: {target_names}")
print(f"📋 Features: {feature_names}")

# Create a pandas DataFrame for better data exploration
# This makes it easier to visualize and understand our data
df = pd.DataFrame(X, columns=feature_names)
df['species'] = pd.Categorical.from_codes(y, target_names)

print("\n📋 First 5 rows of the dataset:")
print(df.head())

print("\n📊 Dataset statistics:")
print(df.describe())

print("\n🎯 Class distribution:")
print(df['species'].value_counts())

# Split the data into training and testing sets
# This is crucial for evaluating model performance on unseen data
test_size = 0.2         # Use 20% for testing, 80% for training
random_state = 42       # Set random seed for reproducibility

X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=test_size, 
    random_state=random_state,
    stratify=y  # Ensure balanced split across all classes
)

print(f"\n✂️  Data split complete:")
print(f"   Training samples: {X_train.shape[0]}")
print(f"   Testing samples: {X_test.shape[0]}")
print(f"   Features: {X_train.shape[1]}")

# Collect data information in a dict; it is logged to MLflow in a later run
data_info = {
    "dataset_name": "iris",
    "total_samples": len(X),
    "n_features": X.shape[1],
    "n_classes": len(target_names),
    "train_samples": len(X_train),
    "test_samples": len(X_test),
    "test_size": test_size,
    "random_state": random_state
}

print("\n✅ Data preparation complete!")
print("   Ready to start model training and MLflow tracking!")

📊 Loading Iris dataset...
📈 Dataset shape: (150, 4)
🎯 Target classes: ['setosa' 'versicolor' 'virginica']
📋 Features: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

📋 First 5 rows of the dataset:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

  species  
0  setosa  
1  setosa  
2  setosa  
3  setosa  
4  setosa  

📊 Dataset statistics:
       sepal length (cm)  sepal width (cm)  petal length (cm)  \
count         150.000000        150.000000         150.000000   
mean            5.843333          3.057333           3.758000   
std             0.828066          0.435866           1.765298   
min             4.300000          2.000000           1.000000   
25%             5.100000          2.800000           1.600000   
50%             5.800000          3.000000           4.350000   
75%             6.400000          3.300000           5.100000   
max             7.900000          4.400000           6.900000   

       petal width (cm)  
count        150.000000  
mean           1.199333  
std            0.762238  
min            0.100000  
25%            0.300000  
50%            1.300000  
75%            1.800000  
max            2.500000  

🎯 Class distribution:
species
setosa        50
versicolor    50
virginica     50
Name: count, dtype: int64

✂️  Data split complete:
   Training samples: 120
   Testing samples: 30
   Features: 4

✅ Data preparation complete!
   Ready to start model training and MLflow tracking!

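To see what `stratify=y` buys us, we can count the classes on each side of the split. This is a small self-contained check using the same split settings as above; the `_demo` variable names are only there to avoid overwriting the notebook's own variables.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True)

# Same settings as the tutorial split: 20% test, fixed seed, stratified by class.
_, _, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Count how many samples of each class ended up in each subset.
train_counts = np.bincount(y_train_demo)
test_counts = np.bincount(y_test_demo)

print("Train class counts:", train_counts)
print("Test class counts: ", test_counts)
```

Because Iris has 50 samples per class, a stratified 80/20 split keeps the three classes equally represented in both subsets.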
## 5. Log Parameters and Hyperparameters

**Parameters** in MLflow are the configuration values that control your experiment. This includes:
- **Hyperparameters**: Model-specific settings (learning rate, number of trees, etc.)
- **Data parameters**: Dataset splits, preprocessing settings
- **Environment parameters**: Library versions, hardware specs

### Why Log Parameters?
- **Reproducibility**: Recreate exact same results
- **Comparison**: Compare different configurations
- **Optimization**: Track what works best
- **Documentation**: Maintain experiment history

In [17]:
# Let's start our first MLflow run to demonstrate parameter logging
# A "run" is a single execution of your ML code (one training session)

print("🚀 Starting MLflow run for parameter logging demonstration...")

with mlflow.start_run(run_name="parameter-logging-demo") as run:
    print(f"📝 Run ID: {run.info.run_id}")
    
    # Method 1: Log individual parameters using mlflow.log_param()
    # This is useful for single values or when you want to log parameters one by one
    print("\n📋 Logging individual parameters...")
    
    mlflow.log_param("model_type", "logistic_regression")  # String parameter
    mlflow.log_param("test_size", test_size)               # Float parameter (from the split above)
    mlflow.log_param("random_state", random_state)         # Integer parameter (from the split above)
    mlflow.log_param("cross_validation", False)            # Boolean parameter (no cross-validation is run in this tutorial)
    
    # Method 2: Log multiple parameters at once using mlflow.log_params()
    # This is more efficient when you have many parameters
    print("📋 Logging multiple parameters at once...")
    
    # Model hyperparameters
    model_params = {
        "solver": "lbfgs",           # Algorithm to use in optimization
        "max_iter": 1000,            # Maximum number of iterations
        "multi_class": "auto",       # How to handle multi-class classification
        "penalty": "l2",             # Regularization penalty
        "C": 1.0,                    # Inverse of regularization strength
        "fit_intercept": True,       # Whether to calculate intercept
        "class_weight": None         # Weights for balancing classes
    }
    mlflow.log_params(model_params)
    
    # Data processing parameters
    data_params = {
        "dataset_version": "1.0",
        "feature_scaling": "none",
        "feature_selection": "all",
        "data_split_strategy": "stratified"
    }
    mlflow.log_params(data_params)
    
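    # Environment and system parameters (read from the runtime instead of hard-coding)
    import platform
    import sklearn
    system_params = {
        "sklearn_version": sklearn.__version__,
        "python_version": platform.python_version(),
        "platform": platform.system(),
        "cpu_count": os.cpu_count()
    }
    mlflow.log_params(system_params)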
    
    print("✅ All parameters logged successfully!")
    print(f"\n🔍 View this run in MLflow UI:")
    print(f"   Run ID: {run.info.run_id}")
    print(f"   Experiment: {experiment_name}")
    
    # Log the data information we prepared earlier
    mlflow.log_params(data_info)
    
    print("\n💡 Parameter Logging Best Practices:")
    print("   ✓ Use descriptive parameter names")
    print("   ✓ Log all hyperparameters that affect model behavior")
    print("   ✓ Include data processing parameters")
    print("   ✓ Track environment information for reproducibility")
    print("   ✓ Use consistent naming conventions")
    
print("\n🎯 Parameter logging demonstration complete!")
print("These parameters are now stored in MLflow and can be:")
print("   - Viewed in the MLflow UI")
print("   - Queried programmatically") 
print("   - Used for experiment comparison")
print("   - Referenced for model reproduction")

🚀 Starting MLflow run for parameter logging demonstration...
📝 Run ID: 96ad5c7c035f47e9bcc70025e20b1fea

📋 Logging individual parameters...
📋 Logging multiple parameters at once...
✅ All parameters logged successfully!

🔍 View this run in MLflow UI:
   Run ID: 96ad5c7c035f47e9bcc70025e20b1fea
   Experiment: iris-classification-tutorial

💡 Parameter Logging Best Practices:
   ✓ Use descriptive parameter names
   ✓ Log all hyperparameters that affect model behavior
   ✓ Include data processing parameters
   ✓ Track environment information for reproducibility
   ✓ Use consistent naming conventions

🎯 Parameter logging demonstration complete!
These parameters are now stored in MLflow and can be:
   - Viewed in the MLflow UI
   - Queried programmatically
   - Used for experiment comparison
   - Referenced for model reproduction


## 6. Train a Simple Model with MLflow Tracking

Now let's train our first machine learning model while tracking everything with MLflow. We'll train a Logistic Regression model and log every step of the process.

In [18]:
# Start a new MLflow run for model training
print("🎯 Starting model training with complete MLflow tracking...")

with mlflow.start_run(run_name="logistic-regression-baseline") as run:
    
    print(f"🆔 Run ID: {run.info.run_id}")
    print(f"📅 Start Time: {run.info.start_time}")
    
    # Step 1: Define model hyperparameters
    # These parameters control how the model learns
    model_params = {
        "solver": "lbfgs",         # Optimization algorithm
        "max_iter": 1000,          # Maximum iterations for convergence
        "multi_class": "auto",     # Multi-class strategy (deprecated since scikit-learn 1.5; omit this key on newer versions)
        "random_state": 42,        # For reproducible results
        "C": 1.0,                  # Regularization strength (inverse)
        "penalty": "l2"            # Regularization type
    }
    
    # Step 2: Log all hyperparameters to MLflow
    print("📝 Logging model hyperparameters...")
    mlflow.log_params(model_params)
    
    # Also log data-related parameters
    mlflow.log_params({
        "train_samples": len(X_train),
        "test_samples": len(X_test),
        "n_features": X_train.shape[1],
        "n_classes": len(np.unique(y_train))
    })
    
    # Step 3: Create and train the model
    print("🏋️ Training Logistic Regression model...")
    
    # Initialize the model with our hyperparameters
    model = LogisticRegression(**model_params)
    
    # Train the model on training data
    # This is where the actual machine learning happens
    model.fit(X_train, y_train)
    
    print("✅ Model training complete!")
    
    # Step 4: Make predictions on both training and test sets
    print("🔮 Making predictions...")
    
    # Predict on training data (to check for overfitting)
    y_train_pred = model.predict(X_train)
    y_train_prob = model.predict_proba(X_train)  # Get probability scores
    
    # Predict on test data (for true performance evaluation)
    y_test_pred = model.predict(X_test)
    y_test_prob = model.predict_proba(X_test)
    
    print(f"📊 Training predictions shape: {y_train_pred.shape}")
    print(f"📊 Test predictions shape: {y_test_pred.shape}")
    
    # Step 5: Calculate performance metrics
    print("📈 Calculating performance metrics...")
    
    # Training metrics (to detect overfitting)
    train_accuracy = accuracy_score(y_train, y_train_pred)
    train_precision = precision_score(y_train, y_train_pred, average='weighted')
    train_recall = recall_score(y_train, y_train_pred, average='weighted')
    train_f1 = f1_score(y_train, y_train_pred, average='weighted')
    
    # Test metrics (true performance indicators)
    test_accuracy = accuracy_score(y_test, y_test_pred)
    test_precision = precision_score(y_test, y_test_pred, average='weighted')
    test_recall = recall_score(y_test, y_test_pred, average='weighted')
    test_f1 = f1_score(y_test, y_test_pred, average='weighted')
    
    # Print results for immediate feedback
    print("\n📊 Training Performance:")
    print(f"   Accuracy:  {train_accuracy:.4f}")
    print(f"   Precision: {train_precision:.4f}")
    print(f"   Recall:    {train_recall:.4f}")
    print(f"   F1-Score:  {train_f1:.4f}")
    
    print("\n🎯 Test Performance:")
    print(f"   Accuracy:  {test_accuracy:.4f}")
    print(f"   Precision: {test_precision:.4f}")
    print(f"   Recall:    {test_recall:.4f}")
    print(f"   F1-Score:  {test_f1:.4f}")
    
    # Calculate overfitting indicator
    accuracy_diff = train_accuracy - test_accuracy
    print("\n⚖️  Overfitting Check:")
    print(f"   Accuracy difference: {accuracy_diff:.4f}")
    if accuracy_diff > 0.05:
        print("   ⚠️  Potential overfitting detected!")
    else:
        print("   ✅ Model generalizes well!")
    
    # Step 6: Log all metrics to MLflow
    print("\n💾 Logging metrics to MLflow...")
    
    # Log training metrics
    mlflow.log_metric("train_accuracy", train_accuracy)
    mlflow.log_metric("train_precision", train_precision)
    mlflow.log_metric("train_recall", train_recall)
    mlflow.log_metric("train_f1", train_f1)
    
    # Log test metrics  
    mlflow.log_metric("test_accuracy", test_accuracy)
    mlflow.log_metric("test_precision", test_precision)
    mlflow.log_metric("test_recall", test_recall)
    mlflow.log_metric("test_f1", test_f1)
    
    # Log derived metrics
    mlflow.log_metric("accuracy_diff", accuracy_diff)
    mlflow.log_metric("model_complexity", len(model.coef_[0]))  # Number of features
    
    # Step 7: Add run tags for organization
    print("🏷️  Adding tags for organization...")
    mlflow.set_tag("model_type", "logistic_regression")
    mlflow.set_tag("status", "completed")
    mlflow.set_tag("dataset", "iris")
    mlflow.set_tag("purpose", "baseline_model")
    
    print("\n✅ Model training and tracking complete!")
    print(f"🔍 View results in MLflow UI - Run ID: {run.info.run_id}")

print("\n🎉 Your first tracked ML experiment is done!")
print("This run now contains:")
print("   ✓ All hyperparameters")
print("   ✓ Training and test metrics") 
print("   ✓ Run metadata and tags")
print("   ✓ Timestamps and run information")

🎯 Starting model training with complete MLflow tracking...
🆔 Run ID: 1b22db2a24a34e73880e57ebc0da8a85
📅 Start Time: 1756448173293
📝 Logging model hyperparameters...
🏋️ Training Logistic Regression model...
✅ Model training complete!
🔮 Making predictions...
📊 Training predictions shape: (120,)
📊 Test predictions shape: (30,)
📈 Calculating performance metrics...

📊 Training Performance:
   Accuracy:  0.9750
   Precision: 0.9752
   Recall:    0.9750
   F1-Score:  0.9750

🎯 Test Performance:
   Accuracy:  0.9667
   Precision: 0.9697
   Recall:    0.9667
   F1-Score:  0.9666

⚖️  Overfitting Check:
   Accuracy difference: 0.0083
   ✅ Model generalizes well!

💾 Logging metrics to MLflow...

## 7. Log Metrics During Training

**Metrics** in MLflow track the performance and behavior of your models over time. Unlike parameters (which are input configurations), metrics are output measurements that change during or after training.

### Types of Metrics to Track:
- **Performance metrics**: Accuracy, precision, recall, F1-score
- **Training metrics**: Loss, convergence rate, training time
- **Resource metrics**: Memory usage, CPU time, GPU utilization
- **Business metrics**: Model interpretability scores, fairness metrics

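For multi-class problems, a performance metric such as precision can be averaged in several ways, and the choice changes the number you log. The toy labels below are made up purely to show how the `macro`, `micro`, and `weighted` averages relate to the per-class values.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score

# Toy labels (illustrative only): 3 classes with unequal support.
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred_demo = np.array([0, 0, 0, 1, 1, 1, 2, 0])

# Per-class precision and the number of true samples per class.
per_class = precision_score(y_true_demo, y_pred_demo, average=None)
support = np.bincount(y_true_demo)

# macro: plain mean over classes; weighted: mean weighted by class support;
# micro: computed from global counts (equals accuracy for single-label tasks).
macro = precision_score(y_true_demo, y_pred_demo, average="macro")
micro = precision_score(y_true_demo, y_pred_demo, average="micro")
weighted = precision_score(y_true_demo, y_pred_demo, average="weighted")

print("per class:", np.round(per_class, 4))
print(f"macro={macro:.4f}  micro={micro:.4f}  weighted={weighted:.4f}")
```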
In [19]:
# Demonstrate advanced metric logging techniques
print("📊 Advanced Metrics Logging Demonstration...")

with mlflow.start_run(run_name="advanced-metrics-demo") as run:
    
    print(f"🆔 Run ID: {run.info.run_id}")
    
    # Train a model quickly for demonstration
    model = LogisticRegression(random_state=42, max_iter=1000)
    
    # Time the training process
    import time
    start_time = time.time()
    model.fit(X_train, y_train)
    training_time = time.time() - start_time
    
    # Make predictions
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)
    
    # Method 1: Log individual metrics with mlflow.log_metric()
    print("\n📈 Logging individual metrics...")
    
    # Basic performance metrics
    accuracy = accuracy_score(y_test, y_pred)
    mlflow.log_metric("accuracy", accuracy)
    print(f"   ✓ Accuracy: {accuracy:.4f}")
    
    # Precision under different averaging strategies (macro, micro, weighted)
    precision_macro = precision_score(y_test, y_pred, average='macro')
    precision_micro = precision_score(y_test, y_pred, average='micro')
    precision_weighted = precision_score(y_test, y_pred, average='weighted')
    
    mlflow.log_metric("precision_macro", precision_macro)
    mlflow.log_metric("precision_micro", precision_micro) 
    mlflow.log_metric("precision_weighted", precision_weighted)
    
    print(f"   ✓ Precision (macro): {precision_macro:.4f}")
    print(f"   ✓ Precision (micro): {precision_micro:.4f}")
    print(f"   ✓ Precision (weighted): {precision_weighted:.4f}")
    
    # Method 2: Log metrics with step parameter (useful for iterative training)
    print("\n📊 Logging metrics with steps (simulating iterative training)...")
    
    # Simulate logging metrics at different training epochs/steps
    # This is useful for tracking training progress over time
    for step in range(0, 6):
        # Simulate improving accuracy over training steps
        simulated_accuracy = accuracy * (0.8 + 0.04 * step)  # Gradually improve
        mlflow.log_metric("training_accuracy", simulated_accuracy, step=step)
        print(f"   Step {step}: Training accuracy = {simulated_accuracy:.4f}")
    
    # Method 3: Log system and performance metrics
    print("\n⚡ Logging system performance metrics...")
    
    # Training time
    mlflow.log_metric("training_time_seconds", training_time)
    print(f"   ✓ Training time: {training_time:.4f} seconds")
    
    # Model complexity metrics
    n_features = X_train.shape[1]
    n_params = len(model.coef_.flatten()) + len(model.intercept_)
    mlflow.log_metric("model_parameters", n_params)
    mlflow.log_metric("feature_count", n_features)
    
    print(f"   ✓ Model parameters: {n_params}")
    print(f"   ✓ Features used: {n_features}")
    
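    # Memory usage (very rough): __sizeof__() reports only the shallow size of the
    # estimator object, not its fitted arrays; serialize the model for a realistic figure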
    model_size_bytes = model.__sizeof__()
    mlflow.log_metric("model_size_bytes", model_size_bytes)
    print(f"   ✓ Model size: {model_size_bytes} bytes")
    
    # Method 4: Log business and interpretability metrics
    print("\n💼 Logging business-relevant metrics...")
    
    # Confidence scores (average prediction probability)
    avg_confidence = np.mean(np.max(y_prob, axis=1))
    mlflow.log_metric("average_prediction_confidence", avg_confidence)
    print(f"   ✓ Average prediction confidence: {avg_confidence:.4f}")
    
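    # Class balance in predictions (minlength keeps one entry per class,
    # even if the highest-numbered class is never predicted)
    pred_class_distribution = np.bincount(y_pred, minlength=len(target_names)) / len(y_pred)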
    for i, ratio in enumerate(pred_class_distribution):
        mlflow.log_metric(f"predicted_class_{i}_ratio", ratio)
        print(f"   ✓ Predicted class {target_names[i]} ratio: {ratio:.4f}")
    
    # Method 5: Log derived and comparative metrics
    print("\n🔍 Logging derived metrics...")
    
    # Calculate per-class metrics
    per_class_precision = precision_score(y_test, y_pred, average=None)
    per_class_recall = recall_score(y_test, y_pred, average=None)
    
    for i, (prec, rec) in enumerate(zip(per_class_precision, per_class_recall)):
        mlflow.log_metric(f"precision_class_{target_names[i]}", prec)
        mlflow.log_metric(f"recall_class_{target_names[i]}", rec)
        print(f"   ✓ {target_names[i]} - Precision: {prec:.4f}, Recall: {rec:.4f}")
    
    # Worst and best performing class
    worst_class_idx = np.argmin(per_class_precision)
    best_class_idx = np.argmax(per_class_precision)
    
    mlflow.log_metric("worst_class_precision", per_class_precision[worst_class_idx])
    mlflow.log_metric("best_class_precision", per_class_precision[best_class_idx])
    
    print(f"   📉 Worst performing class: {target_names[worst_class_idx]} ({per_class_precision[worst_class_idx]:.4f})")
    print(f"   📈 Best performing class: {target_names[best_class_idx]} ({per_class_precision[best_class_idx]:.4f})")
    
    # Add descriptive tags
    mlflow.set_tag("metrics_logged", "comprehensive")
    mlflow.set_tag("analysis_depth", "detailed")
    
    print("\n✅ Advanced metrics logging complete!")
    print(f"🔍 View all metrics in MLflow UI - Run ID: {run.info.run_id}")

print("\n💡 Metrics Logging Best Practices:")
print("   ✓ Log metrics consistently across all experiments")
print("   ✓ Use descriptive metric names")
print("   ✓ Track both training and validation metrics")
print("   ✓ Log system performance metrics for optimization")
print("   ✓ Include business-relevant metrics")
print("   ✓ Use step parameter for iterative training tracking")
print("   ✓ Log derived metrics for deeper insights")

📊 Advanced Metrics Logging Demonstration...
🆔 Run ID: 627754b683394b8c831bf35540128002

📈 Logging individual metrics...
   ✓ Accuracy: 0.9667
   ✓ Precision (macro): 0.9697
   ✓ Precision (micro): 0.9667
   ✓ Precision (weighted): 0.9697

📊 Logging metrics with steps (simulating iterative training)...
   Step 0: Training accuracy = 0.7733
   Step 1: Training accuracy = 0.8120
   Step 2: Training accuracy = 0.8507
   Step 3: Training accuracy = 0.8893
   Step 4: Training accuracy = 0.9280
   Step 5: Training accuracy = 0.9667

⚡ Logging system performance metrics...
   ✓ Training time: 0.0393 seconds
   ✓ Model parameters: 15
   ✓ Features used: 4
   ✓ Model size: 32 bytes

💼 Logging business-relevant metrics...
   ✓ Average prediction confidence: 0.8681
   ✓ Predicted class setosa ratio: 0.3333
   ✓ Predicted class versicolor ratio: 0.3000
   ✓ Predicted class virginica ratio: 0.3667

🔍 Logging derived metrics...
   ✓ setosa - Precision: 1.0000, Recall: 1.0000

## 8. Log Model Artifacts

**Artifacts** in MLflow are files associated with your experiments. This includes:
- **Models**: Trained model files (pickle, joblib, etc.)
- **Plots**: Visualizations and charts
- **Data files**: Processed datasets, feature importance files
- **Reports**: Model evaluation reports, documentation
- **Code**: Scripts, notebooks, configuration files

Artifacts enable full reproducibility and comprehensive experiment documentation.

In [20]:
# 📦 Comprehensive Artifact Logging Demonstration
print("📦 Comprehensive Artifact Logging Demonstration...")
print(f"🆔 Run ID: {run.info.run_id}")

# Create visualizations and save as artifacts
print("\n📊 Creating and logging visualization artifacts...")

# 1. Confusion Matrix Heatmap
plt.figure(figsize=(10, 8))
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=target_names, yticklabels=target_names)
plt.title('Confusion Matrix - Test Set Performance')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
confusion_matrix_path = "confusion_matrix.png"
plt.savefig(confusion_matrix_path, dpi=300, bbox_inches='tight')
plt.close()
print(f"   ✓ Saved confusion matrix: {confusion_matrix_path}")

# 2. Feature Importance Plot
plt.figure(figsize=(10, 6))
# For multi-class logistic regression, coef_ has one row per class, so we average
# the absolute coefficients across classes as a rough importance score
feature_importance = np.abs(model.coef_).mean(axis=0)
plt.barh(range(len(feature_names)), feature_importance)
plt.yticks(range(len(feature_names)), feature_names)
plt.xlabel('Feature Importance (mean |Coefficient| across classes)')
plt.title('Feature Importance - Logistic Regression')
plt.tight_layout()
feature_importance_path = "feature_importance.png"
plt.savefig(feature_importance_path, dpi=300, bbox_inches='tight')
plt.close()
print(f"   ✓ Saved feature importance: {feature_importance_path}")

# 3. Prediction Analysis Plot (the figure itself is created by plt.subplots below)
# Get prediction probabilities
y_prob = model.predict_proba(X_test)
max_probs = np.max(y_prob, axis=1)

# Create subplots for detailed analysis
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))

# Plot 1: Prediction confidence distribution
ax1.hist(max_probs, bins=20, alpha=0.7, color='skyblue', edgecolor='black')
ax1.set_xlabel('Maximum Prediction Probability')
ax1.set_ylabel('Frequency')
ax1.set_title('Prediction Confidence Distribution')
ax1.axvline(np.mean(max_probs), color='red', linestyle='--', 
           label=f'Mean: {np.mean(max_probs):.3f}')
ax1.legend()

# Plot 2: Class-wise precision and recall
class_report = classification_report(y_test, y_pred, target_names=target_names, output_dict=True)
classes = list(target_names)
precisions = [class_report[cls]['precision'] for cls in classes]
recalls = [class_report[cls]['recall'] for cls in classes]

x = np.arange(len(classes))
width = 0.35
ax2.bar(x - width/2, precisions, width, label='Precision', alpha=0.8)
ax2.bar(x + width/2, recalls, width, label='Recall', alpha=0.8)
ax2.set_xlabel('Classes')
ax2.set_ylabel('Score')
ax2.set_title('Per-Class Performance Metrics')
ax2.set_xticks(x)
ax2.set_xticklabels(classes)
ax2.legend()
ax2.set_ylim(0, 1.1)

# Plot 3: True labels in the space of the first two features
ax3.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='viridis', alpha=0.6, s=50)
ax3.set_xlabel(feature_names[0])
ax3.set_ylabel(feature_names[1])
ax3.set_title('True Labels - Feature Space')

# Plot 4: Prediction errors analysis
incorrect_mask = y_test != y_pred
if np.any(incorrect_mask):
    ax4.scatter(X_test[~incorrect_mask, 0], X_test[~incorrect_mask, 1], 
               c=y_test[~incorrect_mask], cmap='viridis', alpha=0.6, s=50, label='Correct')
    ax4.scatter(X_test[incorrect_mask, 0], X_test[incorrect_mask, 1], 
               c='red', marker='x', s=100, label='Incorrect')
    ax4.set_xlabel(feature_names[0])
    ax4.set_ylabel(feature_names[1])
    ax4.set_title('Prediction Errors Analysis')
    ax4.legend()
else:
    ax4.text(0.5, 0.5, 'No prediction errors!', ha='center', va='center', 
            transform=ax4.transAxes, fontsize=16, color='green')
    ax4.set_title('Prediction Errors Analysis')

plt.tight_layout()
prediction_analysis_path = "prediction_analysis.png"
plt.savefig(prediction_analysis_path, dpi=300, bbox_inches='tight')
plt.close()
print(f"   ✓ Saved prediction analysis: {prediction_analysis_path}")

# Log all visualization artifacts to MLflow
mlflow.log_artifact(confusion_matrix_path)
mlflow.log_artifact(feature_importance_path)
mlflow.log_artifact(prediction_analysis_path)

print("\n💾 Creating and logging data artifacts...")

# 4. Classification Report as CSV
class_report_df = pd.DataFrame(classification_report(y_test, y_pred, 
                                                   target_names=target_names, 
                                                   output_dict=True)).T
class_report_path = "classification_report.csv"
class_report_df.to_csv(class_report_path)
mlflow.log_artifact(class_report_path)
print(f"   ✓ Saved classification report: {class_report_path}")

# 5. Detailed Predictions with Probabilities
results_df = pd.DataFrame({
    'true_label': y_test,
    'predicted_label': y_pred,
    'true_class_name': [target_names[i] for i in y_test],
    'predicted_class_name': [target_names[i] for i in y_pred],
    'is_correct': y_test == y_pred,
    'max_probability': max_probs,
    'confidence': max_probs  # Same values as max_probability, kept under a second name
})

# Add individual class probabilities
for i, class_name in enumerate(target_names):
    results_df[f'prob_{class_name}'] = y_prob[:, i]

predictions_path = "detailed_predictions.csv"
results_df.to_csv(predictions_path, index=False)
mlflow.log_artifact(predictions_path)
print(f"   ✓ Saved detailed predictions: {predictions_path}")

# Create model summary report
model_summary = {
    "model_type": "LogisticRegression",
    "training_samples": len(X_train),
    "test_samples": len(X_test),
    "features": list(feature_names),  # Plain list so json.dump can serialize it
    "classes": target_names.tolist(),
    "hyperparameters": model_params,
    "performance": {
        "accuracy": float(accuracy_score(y_test, y_pred)),
        "precision": float(precision_score(y_test, y_pred, average='weighted')),
        "recall": float(recall_score(y_test, y_pred, average='weighted')),
        "f1": float(f1_score(y_test, y_pred, average='weighted'))
    },
    "feature_coefficients": model.coef_.tolist(),
    "intercept": model.intercept_.tolist()
}

# Save as JSON
import json
model_summary_path = "model_summary.json"
with open(model_summary_path, 'w') as f:
    json.dump(model_summary, f, indent=2)

mlflow.log_artifact(model_summary_path)
print(f"   ✓ Saved model summary: {model_summary_path}")

# 6. Model Performance Over Time (simulated)
print("\n📈 Creating performance tracking data...")
performance_history = []
for step in range(1, 11):
    # Simulate training progress (this would come from actual training loop)
    simulated_accuracy = 0.4 + (step/10) * 0.5 + np.random.normal(0, 0.02)
    performance_history.append({
        'step': step,
        'accuracy': min(max(simulated_accuracy, 0), 1),  # Clamp between 0 and 1
        'loss': max(0.1, 2.0 - step * 0.15 + np.random.normal(0, 0.05))
    })

performance_df = pd.DataFrame(performance_history)
performance_path = "training_history.csv"
performance_df.to_csv(performance_path, index=False)
mlflow.log_artifact(performance_path)
print(f"   ✓ Saved training history: {performance_path}")

print(f"\n✅ Artifact logging complete!")
print(f"📁 All files saved locally and logged to MLflow run: {run.info.run_id}")
print(f"🔗 Artifacts will be available in the MLflow UI under this run")

# Clean up local files (optional)
import os
for file_path in [confusion_matrix_path, feature_importance_path, prediction_analysis_path,
                 class_report_path, predictions_path, model_summary_path, performance_path]:
    if os.path.exists(file_path):
        os.remove(file_path)
        
print("🧹 Local artifact files cleaned up!")

📦 Comprehensive Artifact Logging Demonstration...
🆔 Run ID: 627754b683394b8c831bf35540128002

📊 Creating and logging visualization artifacts...
   ✓ Saved confusion matrix: confusion_matrix.png
   ✓ Saved feature importance: feature_importance.png
   ✓ Saved prediction analysis: prediction_analysis.png

💾 Creating and logging data artifacts...
   ✓ Saved classification report: classification_report.csv
   ✓ Saved detailed predictions: detailed_predictions.csv
   ✓ Saved model summary: model_summary.json

📈 Creating performance tracking data...
   ✓ Saved training history: training_history.csv

✅ Artifact logging complete!
📁 All files saved locally and logged to MLflow run: 627754b683394b8c831bf35540128002
🔗 Artifacts will be available in the MLflow UI under this run
🧹 Local artifact files cleaned up!

<Figure size 1200x800 with 0 Axes>
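
The cell above writes each artifact into the working directory and removes it by hand at the end. A possible alternative, sketched here with only the standard library, is to write everything into a temporary directory that cleans itself up. The helper name `write_summary_artifacts` and the sample payload are hypothetical; in the notebook, the whole directory could be logged with `mlflow.log_artifacts(tmp_dir)` (shown only as a comment) before the `with` block exits.

```python
import json
import tempfile
from pathlib import Path


def write_summary_artifacts(out_dir, summary):
    """Write a JSON summary into out_dir and return the paths created (hypothetical helper)."""
    out_dir = Path(out_dir)
    summary_path = out_dir / "model_summary.json"
    summary_path.write_text(json.dumps(summary, indent=2))
    return [summary_path]


# Illustrative payload only; the real values come from the training cell.
example_summary = {"model_type": "LogisticRegression", "test_samples": 30}

with tempfile.TemporaryDirectory() as tmp_dir:
    written = write_summary_artifacts(tmp_dir, example_summary)
    written_names = [p.name for p in written]
    # In the notebook, this is where the directory would be logged:
    # mlflow.log_artifacts(tmp_dir)
    existed_inside = all(p.exists() for p in written)

# The directory and its files are removed once the with-block exits.
exists_after = any(p.exists() for p in written)
print(written_names, existed_inside, exists_after)
```

This keeps the working directory clean even if an exception interrupts the cell partway through, at the cost of not leaving local copies behind for inspection.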

## 9. Register Model in MLflow Model Registry

The **MLflow Model Registry** is a centralized model store that provides:
- **Model Versioning**: Track different versions of the same model
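- **Stage Management**: Move models through Staging → Production (stages still work but have been deprecated since MLflow 2.9 in favor of model version aliases and tags)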
- **Model Lineage**: Connect models to their training runs
- **Collaboration**: Share models across teams
- **Governance**: Control model deployments and approvals

### Model Lifecycle Stages:
- **None**: Newly registered model
- **Staging**: Model being tested/validated
- **Production**: Model deployed for real use
- **Archived**: Retired model
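
The stage list above implies an ordering (None → Staging → Production → Archived), but MLflow itself does not enforce it. As a rough sketch, a team could guard promotions with a small check before calling the registry client. The `ALLOWED_TRANSITIONS` table and the `can_transition` helper are hypothetical and encode just one possible policy.

```python
# Hypothetical promotion policy: which target stages are allowed from each stage.
ALLOWED_TRANSITIONS = {
    "None": {"Staging", "Archived"},
    "Staging": {"Production", "Archived"},
    "Production": {"Archived"},
    "Archived": {"Staging"},
}


def can_transition(current_stage, target_stage):
    """Return True if the policy above permits moving between the two stages."""
    return target_stage in ALLOWED_TRANSITIONS.get(current_stage, set())


staging_ok = can_transition("None", "Staging")
skip_staging_ok = can_transition("None", "Production")
print(staging_ok, skip_staging_ok)
```

Under this policy a model cannot jump straight from None to Production, which matches the "validate in Staging first" practice recommended later in this section.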

In [21]:
# End any active MLflow run before starting the model registry demo.
# mlflow.end_run() does not raise when no run is active, so check explicitly.
if mlflow.active_run() is not None:
    mlflow.end_run()
    print("✅ Ended previous MLflow run")
else:
    print("ℹ️  No active run to end")

print("🔄 Ready to start model registry demonstration...")

✅ Ended previous MLflow run
🔄 Ready to start model registry demonstration...


In [22]:
# Demonstrate MLflow Model Registry functionality
print("🏪 MLflow Model Registry Demonstration...")

# Define the registered model name
registered_model_name = "iris-classifier-production"

with mlflow.start_run(run_name="model-for-registry") as run:
    
    print(f"🆔 Run ID: {run.info.run_id}")
    
    # Train a high-quality model for registration
    print("\n🏋️ Training production-ready model...")
    
    # Use optimized hyperparameters
    production_params = {
        "solver": "lbfgs",
        "max_iter": 2000,        # More iterations for better convergence
        "multi_class": "auto",   # Default value; deprecated since scikit-learn 1.5, drop it on newer versions
        "random_state": 42,
        "C": 0.8,                # Slightly more regularization than the default C=1.0
        "penalty": "l2"
    }
    
    # Train the model
    production_model = LogisticRegression(**production_params)
    production_model.fit(X_train, y_train)
    
    # Evaluate performance
    y_pred = production_model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    
    # Log everything to MLflow
    mlflow.log_params(production_params)
    mlflow.log_metrics({
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1_score": f1
    })
    
    print(f"   📊 Model Performance:")
    print(f"      Accuracy: {accuracy:.4f}")
    print(f"      F1-Score: {f1:.4f}")
    
    # Create model signature for proper schema documentation
    signature = infer_signature(X_train, production_model.predict(X_train))
    
    # === STEP 1: LOG MODEL WITH REGISTRY NAME ===
    print("\n📝 Registering model in MLflow Model Registry...")
    
    # Log model and register it simultaneously
    model_info = mlflow.sklearn.log_model(
        sk_model=production_model,
        artifact_path="model",                           # Path within run artifacts
        signature=signature,                             # Input/output schema
        input_example=X_train[:3],                      # Sample input for testing
        registered_model_name=registered_model_name,    # Register with this name
        pip_requirements=["scikit-learn>=1.0", "pandas", "numpy"]
    )
    
    print(f"   ✅ Model registered as: {registered_model_name}")
    print(f"   📍 Model URI: {model_info.model_uri}")
    
    # Add tags for this run
    mlflow.set_tag("model_purpose", "production_candidate")
    mlflow.set_tag("quality_check", "passed")
    mlflow.set_tag("model_type", "logistic_regression")

# === STEP 2: WORK WITH REGISTERED MODEL VERSIONS ===
print("\n🔄 Working with Model Registry...")

# Get model registry client
client = MlflowClient()

try:
    # Get information about the registered model
    registered_model = client.get_registered_model(registered_model_name)
    print(f"\n📋 Registered Model: {registered_model.name}")
    print(f"   Description: {registered_model.description or 'No description'}")
    print(f"   Creation Time: {registered_model.creation_timestamp}")
    
    # List all versions of this model
    model_versions = client.search_model_versions(f"name='{registered_model_name}'")
    print(f"\n📚 Model Versions ({len(model_versions)} total):")
    
    for version in model_versions:
        print(f"   Version {version.version}:")
        print(f"      Stage: {version.current_stage}")
        print(f"      Status: {version.status}")
        print(f"      Run ID: {version.run_id}")
        print(f"      Creation Time: {version.creation_timestamp}")
    
    # === STEP 3: MANAGE MODEL STAGES ===
    print("\n🎭 Managing Model Stages...")
    
    # Pick the highest version number explicitly instead of relying on result order
    latest_version = max(model_versions, key=lambda v: int(v.version))
    current_version_number = latest_version.version
    
    print(f"\n🎯 Working with Version {current_version_number}...")
    
    # Transition model to Staging
    print("   📤 Transitioning to Staging...")
    client.transition_model_version_stage(
        name=registered_model_name,
        version=current_version_number,
        stage="Staging",
        archive_existing_versions=False  # Keep other versions in their current stages
    )
    
    # Add annotation explaining the transition
    client.update_model_version(
        name=registered_model_name,
        version=current_version_number,
        description=f"Iris classifier trained on {len(X_train)} samples. Accuracy: {accuracy:.4f}"
    )
    
    print(f"   ✅ Model version {current_version_number} moved to Staging")
    
    # === STEP 4: ADD MODEL AND VERSION TAGS ===
    print("\n🏷️  Adding descriptive tags...")
    
    # Add tags to the registered model (applies to all versions)
    client.set_registered_model_tag(
        name=registered_model_name,
        key="task",
        value="classification"
    )
    
    client.set_registered_model_tag(
        name=registered_model_name,
        key="dataset",
        value="iris"
    )
    
    # Add tags to specific model version
    client.set_model_version_tag(
        name=registered_model_name,
        version=current_version_number,
        key="validation_accuracy",
        value=str(round(accuracy, 4))
    )
    
    client.set_model_version_tag(
        name=registered_model_name,
        version=current_version_number,
        key="training_framework",
        value="scikit-learn"
    )
    
    print("   ✅ Tags added successfully")
    
    # === STEP 5: SIMULATE PROMOTION TO PRODUCTION ===
    print("\n🚀 Simulating production deployment...")
    
    # In a real scenario, you would run additional validation here
    # For demo, we'll promote directly to production
    
    print("   ✔️  Model validation passed")
    print("   ✔️  Performance benchmarks met")
    print("   ✔️  Security review completed")
    
    # Promote to production
    client.transition_model_version_stage(
        name=registered_model_name,
        version=current_version_number,
        stage="Production",
        archive_existing_versions=True  # Archive previous production versions
    )
    
    print(f"   🎉 Model version {current_version_number} promoted to Production!")
    
    # === STEP 6: VIEW MODEL REGISTRY STATUS ===
    print("\n📊 Current Model Registry Status:")
    
    # Refresh model versions list
    updated_versions = client.search_model_versions(f"name='{registered_model_name}'")
    
    for version in updated_versions:
        stage_emoji = {
            "None": "⚪",
            "Staging": "🟡", 
            "Production": "🟢",
            "Archived": "🔴"
        }
        
        emoji = stage_emoji.get(version.current_stage, "❓")
        print(f"   {emoji} Version {version.version}: {version.current_stage}")
        
        # Show tags for this version
        version_tags = client.get_model_version(registered_model_name, version.version).tags
        if version_tags:
            for tag_key, tag_value in version_tags.items():
                print(f"        🏷️  {tag_key}: {tag_value}")

except Exception as e:
    print(f"⚠️  Model registry operation failed: {e}")
    print("   This might happen if using file-based tracking without a server")
    print("   Model registry works best with MLflow tracking server")

print("\n✅ Model Registry demonstration complete!")
print("\n💡 Model Registry Best Practices:")
print("   ✓ Use descriptive model names and versions")
print("   ✓ Add detailed descriptions and tags")
print("   ✓ Follow proper stage transitions (None → Staging → Production)")
print("   ✓ Archive old versions when promoting new ones")
print("   ✓ Include performance metrics in model descriptions")
print("   ✓ Use tags for categorization and searchability")
print("   ✓ Implement proper validation before production promotion")
print("   ✓ Consider using MLflow tracking server for team collaboration")

🏪 MLflow Model Registry Demonstration...
🆔 Run ID: 1e6590f9b61749c782be833208a47b83
\n🏋️ Training production-ready model...
   📊 Model Performance:
      Accuracy: 0.9667
      F1-Score: 0.9666
\n📝 Registering model in MLflow Model Registry...


Registered model 'iris-classifier-production' already exists. Creating a new version of this model...
Created version '2' of model 'iris-classifier-production'.


   ✅ Model registered as: iris-classifier-production
   📍 Model URI: models:/m-f08c9eba181142e8b14799d644081f52
\n🔄 Working with Model Registry...
\n📋 Registered Model: iris-classifier-production
   Description: No description
   Creation Time: 1756448150627
\n📚 Model Versions (2 total):
   Version 2:
      Stage: None
      Status: READY
      Run ID: 1e6590f9b61749c782be833208a47b83
      Creation Time: 1756448184335
   Version 1:
      Stage: None
      Status: READY
      Run ID: 9c7f1df7423c4d63bc808120e437653e
      Creation Time: 1756448150640
\n🎭 Managing Model Stages...
\n🎯 Working with Version 2...
   📤 Transitioning to Staging...
⚠️  Model registry operation failed: ('cannot represent an object', <Metric: dataset_digest=None, dataset_name=None, key='accuracy', model_id='m-f08c9eba181142e8b14799d644081f52', run_id='1e6590f9b61749c782be833208a47b83', step=0, timestamp=1756448183564, value=0.9666666666666667>)
   This might happen if using file-based tracking without a server
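
The `Creation Time` values printed above are raw epoch timestamps in milliseconds. A small sketch for rendering them in readable form, using one of the values from the output above; the helper name `format_mlflow_timestamp` is hypothetical.

```python
from datetime import datetime, timezone


def format_mlflow_timestamp(ms_since_epoch):
    """Convert a millisecond epoch timestamp to an ISO 8601 UTC string (second precision)."""
    moment = datetime.fromtimestamp(ms_since_epoch // 1000, tz=timezone.utc)
    return moment.isoformat(timespec="seconds")


created = format_mlflow_timestamp(1756448150627)
print(created)
```

Using an explicit UTC timezone avoids results that depend on the machine's local settings, which matters when several people read the same registry.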
 

## 10. Load and Use Registered Model

Once models are registered, you can load them for inference in various ways:
- **By version**: Load a specific model version
- **By stage**: Load the current production model
- **By run ID**: Load model from a specific training run

This enables consistent model deployment and easy rollbacks if needed.
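
Each loading style corresponds to a URI scheme that `mlflow.pyfunc.load_model` accepts. A minimal sketch of how those strings are assembled follows; the helper names are hypothetical, and the run ID is a made-up placeholder.

```python
def registry_uri(model_name, version_or_stage):
    """URI for a registered model, addressed by version number or stage name."""
    return f"models:/{model_name}/{version_or_stage}"


def run_uri(run_id, artifact_path):
    """URI for a model logged under a run's artifact directory."""
    return f"runs:/{run_id}/{artifact_path}"


by_version = registry_uri("iris-classifier-production", 2)
by_stage = registry_uri("iris-classifier-production", "Production")
by_run = run_uri("0123456789abcdef", "demo_model")  # placeholder run ID

for uri in (by_version, by_stage, by_run):
    print(uri)
```

Building the URI in one place makes it harder to mix up the `models:/` and `runs:/` schemes when switching between loading styles.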

In [23]:
# Demonstrate loading and using registered models
print("📥 Loading and Using Registered Models...")

# === METHOD 1: LOAD BY STAGE ===
print("\n🎭 Method 1: Loading model by stage...")

try:
    # Load the current production model
    # This is the most common approach in production systems
    production_model_uri = f"models:/{registered_model_name}/Production"
    production_model = mlflow.pyfunc.load_model(production_model_uri)
    
    print(f"✅ Loaded production model from: {production_model_uri}")
    
    # Make predictions with the loaded model
    print("\n🔮 Making predictions with production model...")
    
    # Use test data for predictions
    predictions = production_model.predict(X_test)
    
    print(f"   📊 Predictions shape: {predictions.shape}")
    print(f"   🎯 Sample predictions: {predictions[:5]}")
    
    # Calculate and display performance
    prod_accuracy = accuracy_score(y_test, predictions)
    print(f"   📈 Production Model Accuracy: {prod_accuracy:.4f}")
    
except Exception as e:
    print(f"⚠️  Could not load production model: {e}")
    print("   This is normal if no model is in Production stage yet")

# === METHOD 2: LOAD BY VERSION ===
print("\n🔢 Method 2: Loading model by specific version...")

try:
    # Find the highest version number for this registered model
    client = MlflowClient()
    all_versions = client.search_model_versions(f"name='{registered_model_name}'")
    
    if all_versions:
        latest_version = max(all_versions, key=lambda v: int(v.version)).version
        
        # Load specific version
        version_model_uri = f"models:/{registered_model_name}/{latest_version}"
        version_model = mlflow.pyfunc.load_model(version_model_uri)
        
        print(f"✅ Loaded model version {latest_version} from: {version_model_uri}")
        
        # Make predictions
        version_predictions = version_model.predict(X_test)
        version_accuracy = accuracy_score(y_test, version_predictions)
        print(f"   📈 Version {latest_version} Accuracy: {version_accuracy:.4f}")
        
    else:
        print("   ⚠️  No model versions found")
        
except Exception as e:
    print(f"⚠️  Could not load model by version: {e}")

# === METHOD 3: LOAD BY RUN ID ===
print("\n🆔 Method 3: Loading model by run ID...")

# For demonstration, let's train a new model and load it by run ID
with mlflow.start_run(run_name="model-for-run-id-demo") as demo_run:
    
    # Train a simple model
    demo_model = LogisticRegression(random_state=42)
    demo_model.fit(X_train, y_train)
    
    # Log the model
    demo_signature = infer_signature(X_train, demo_model.predict(X_train))
    demo_model_info = mlflow.sklearn.log_model(
        sk_model=demo_model,
        artifact_path="demo_model",
        signature=demo_signature
    )
    
    demo_run_id = demo_run.info.run_id
    print(f"   🆔 Demo run ID: {demo_run_id}")

# Load model by run ID
run_model_uri = f"runs:/{demo_run_id}/demo_model"
run_model = mlflow.pyfunc.load_model(run_model_uri)

print(f"✅ Loaded model from run: {run_model_uri}")

# Make predictions
run_predictions = run_model.predict(X_test)
run_accuracy = accuracy_score(y_test, run_predictions)
print(f"   📈 Run Model Accuracy: {run_accuracy:.4f}")

# === MODEL COMPARISON AND ANALYSIS ===
print("\n📊 Model Loading and Inference Analysis...")

# Create a comprehensive prediction analysis
prediction_results = pd.DataFrame({
    'actual': y_test,
    'actual_name': [target_names[i] for i in y_test]
})

# Add predictions from different loading methods
if 'predictions' in locals():
    prediction_results['production_pred'] = predictions
    prediction_results['production_correct'] = y_test == predictions

if 'version_predictions' in locals():
    prediction_results['version_pred'] = version_predictions
    prediction_results['version_correct'] = y_test == version_predictions

prediction_results['run_pred'] = run_predictions
prediction_results['run_correct'] = y_test == run_predictions

print("\n📋 Prediction Comparison (first 10 rows):")
print(prediction_results.head(10))

# Calculate consistency between models
if 'predictions' in locals() and 'version_predictions' in locals():
    consistency = np.mean(predictions == version_predictions)
    print("\n🔄 Model Consistency:")
    print(f"   Production vs Version: {consistency:.4f} agreement")

# === PRACTICAL INFERENCE EXAMPLES ===
print("\n🚀 Practical Inference Examples...")

# Example 1: Single prediction
print("\n🌸 Example 1: Single flower prediction")
single_flower = X_test[0:1]  # First test sample
single_prediction = run_model.predict(single_flower)[0]
single_actual = y_test[0]

print(f"   Input features: {single_flower[0]}")
print(f"   Predicted class: {target_names[int(single_prediction)]} (index: {single_prediction})")
print(f"   Actual class: {target_names[single_actual]} (index: {single_actual})")
print(f"   Correct: {'✅' if single_prediction == single_actual else '❌'}")

# Example 2: Batch prediction with confidence
print("\n🌻 Example 2: Batch prediction with confidence")

# Load the sklearn model directly for probability scores
try:
    sklearn_model = mlflow.sklearn.load_model(run_model_uri)
    batch_samples = X_test[:5]
    batch_predictions = sklearn_model.predict(batch_samples)
    batch_probabilities = sklearn_model.predict_proba(batch_samples)
    
    print("   Batch Predictions with Confidence:")
    for i, (pred, prob, actual) in enumerate(zip(batch_predictions, batch_probabilities, y_test[:5])):
        confidence = np.max(prob)
        predicted_class = target_names[int(pred)]
        actual_class = target_names[actual]
        status = '✅' if pred == actual else '❌'
        
        print(f"   Sample {i+1}: {predicted_class} (confidence: {confidence:.3f}) vs {actual_class} {status}")
        
except Exception as e:
    print(f"   ⚠️  Could not load sklearn model directly: {e}")

# === MODEL METADATA AND INFORMATION ===
print("\n📋 Model Metadata and Information...")

try:
    # Get model information
    model_version_info = client.get_model_version(registered_model_name, latest_version)
    
    print("\n🏷️  Model Registry Information:")
    print(f"   Name: {model_version_info.name}")
    print(f"   Version: {model_version_info.version}")
    print(f"   Stage: {model_version_info.current_stage}")
    print(f"   Status: {model_version_info.status}")
    print(f"   Description: {model_version_info.description}")
    print(f"   Creation Time: {model_version_info.creation_timestamp}")
    print(f"   Source Run: {model_version_info.run_id}")
    
    # Show tags
    if model_version_info.tags:
        print(f"   Tags:")
        for tag_key, tag_value in model_version_info.tags.items():
            print(f"      {tag_key}: {tag_value}")
            
except Exception as e:
    print(f"   ⚠️  Could not retrieve model metadata: {e}")

print("\n✅ Model loading and inference demonstration complete!")

print("\n💡 Model Loading Best Practices:")
print("   ✓ Use stage-based loading for production systems")
print("   ✓ Load by version for specific model comparisons")
print("   ✓ Load by run ID for debugging and development")
print("   ✓ Always validate loaded models before use")
print("   ✓ Handle loading exceptions gracefully")
print("   ✓ Log inference requests and results for monitoring")
print("   ✓ Cache loaded models for better performance")
print("   ✓ Implement model health checks in production")

print("\n🎯 Model Serving Options:")
print("   🌐 REST API: mlflow models serve")
print("   ☁️  Cloud: Deploy to AWS SageMaker, Azure ML, etc.")
print("   🐳 Docker: mlflow models build-docker")
print("   ⚡ Real-time: Apache Spark, Ray Serve")
print("   📱 Edge: Convert to TensorFlow Lite, ONNX")

📥 Loading and Using Registered Models...
\n🎭 Method 1: Loading model by stage...
⚠️  Could not load production model: No versions of model with name 'iris-classifier-production' and stage 'Production' found
   This is normal if no model is in Production stage yet
\n🔢 Method 2: Loading model by specific version...
✅ Loaded model version 2 from: models:/iris-classifier-production/2
   📈 Version 2 Accuracy: 0.9667
\n🆔 Method 3: Loading model by run ID...
   🆔 Demo run ID: 970b1545effe48609e99d3f91e3ce650
✅ Loaded model from run: runs:/970b1545effe48609e99d3f91e3ce650/demo_model
   📈 Run Model Accuracy: 0.9667
\n📊 Model Loading and Inference Analysis...
\n📋 Prediction Comparison (first 10 rows):
   actual actual_name  version_pred  version_correct  run_pred  run_correct
0       0      setosa             0             True         0         True
1       2   virginica             2             True         2         True
2       1  versicolor             1             True         1         True
3       1  versicolor             1             True         1         True
4       0      setosa             0             True         0         True
5       1  versicolor             1             True         1         True
6       0      setosa             0             True         0         True
7     

## 11. Compare Multiple Experiment Runs

One of MLflow's most useful features is systematic comparison of runs. Because every run's parameters, metrics, and tags are stored together, you can line up candidates side by side and choose between them based on recorded evidence.

### What to Compare:
- **Different algorithms**: Logistic Regression vs Random Forest vs SVM
- **Hyperparameter variations**: Different learning rates, regularization values
- **Feature engineering**: Different preprocessing techniques
- **Data variations**: Different train/test splits, feature selections
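
On a small dataset, several runs can tie on accuracy, so a single metric may not settle the comparison. One possible approach, sketched with illustrative numbers rather than real results, is to rank by accuracy and break ties with training time. The `pick_best_run` helper and the sample records are hypothetical.

```python
def pick_best_run(results):
    """Pick the run with the highest accuracy; on ties, prefer the shorter training time."""
    return max(results, key=lambda r: (r["accuracy"], -r["training_time"]))


# Illustrative records only, not measured results.
sample_results = [
    {"name": "run-a", "accuracy": 0.95, "training_time": 0.02},
    {"name": "run-b", "accuracy": 0.97, "training_time": 0.30},
    {"name": "run-c", "accuracy": 0.97, "training_time": 0.05},
]

best = pick_best_run(sample_results)
print(best["name"])
```

The tuple key compares accuracy first and only falls back to the negated training time when accuracies are equal, so the tie-break never overrides a real accuracy difference.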

In [24]:
# Demonstrate comprehensive experiment comparison
print("🔬 Comprehensive Experiment Comparison...")

# === EXPERIMENT 1: DIFFERENT ALGORITHMS ===
print("\n🧪 Experiment 1: Comparing Different Algorithms...")

algorithms = [
    {
        "name": "Logistic Regression",
        "model": LogisticRegression(random_state=42, max_iter=1000),
        "params": {"algorithm": "logistic_regression", "solver": "lbfgs", "max_iter": 1000}
    },
    {
        "name": "Random Forest",
        "model": RandomForestClassifier(random_state=42, n_estimators=100),
        "params": {"algorithm": "random_forest", "n_estimators": 100, "max_depth": None}
    }
]

algorithm_results = []

for algo_config in algorithms:
    with mlflow.start_run(run_name=f"algorithm-comparison-{algo_config['name'].lower().replace(' ', '-')}"):
        
        print(f"\n   🚀 Training {algo_config['name']}...")
        
        # Train the model
        model = algo_config['model']
        start_time = time.time()
        model.fit(X_train, y_train)
        training_time = time.time() - start_time
        
        # Make predictions
        y_pred = model.predict(X_test)
        
        # Calculate metrics
        metrics = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, average='weighted'),
            "recall": recall_score(y_test, y_pred, average='weighted'),
            "f1_score": f1_score(y_test, y_pred, average='weighted'),
            "training_time": training_time
        }
        
        # Log parameters and metrics
        mlflow.log_params(algo_config['params'])
        mlflow.log_metrics(metrics)
        
        # Add tags
        mlflow.set_tag("experiment_type", "algorithm_comparison")
        mlflow.set_tag("algorithm", algo_config['params']['algorithm'])
        
        # Store results for comparison
        result = {
            "algorithm": algo_config['name'],
            "run_id": mlflow.active_run().info.run_id,
            **metrics
        }
        algorithm_results.append(result)
        
        print(f"      ✅ {algo_config['name']} - Accuracy: {metrics['accuracy']:.4f}, Time: {training_time:.4f}s")

# === EXPERIMENT 2: HYPERPARAMETER TUNING ===
print("\n🎛️  Experiment 2: Logistic Regression Hyperparameter Tuning...")

# Test different C values (C is the inverse of regularization strength, so smaller C regularizes more)
c_values = [0.1, 0.5, 1.0, 2.0, 5.0]
hyperparameter_results = []

for c_value in c_values:
    with mlflow.start_run(run_name=f"lr-hyperparameter-C-{c_value}"):
        
        print(f"\n   🔧 Testing C = {c_value}...")
        
        # Train model with specific C value
        model = LogisticRegression(C=c_value, random_state=42, max_iter=1000)
        model.fit(X_train, y_train)
        
        # Evaluate
        y_pred = model.predict(X_test)
        
        metrics = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, average='weighted'),
            "recall": recall_score(y_test, y_pred, average='weighted'),
            "f1_score": f1_score(y_test, y_pred, average='weighted')
        }
        
        # Log parameters and metrics
        mlflow.log_params({
            "algorithm": "logistic_regression",
            "C": c_value,
            "solver": "lbfgs",
            "regularization": "l2"
        })
        mlflow.log_metrics(metrics)
        
        # Tags for organization
        mlflow.set_tag("experiment_type", "hyperparameter_tuning")
        mlflow.set_tag("parameter_tuned", "C")
        
        result = {
            "C": c_value,
            "run_id": mlflow.active_run().info.run_id,
            **metrics
        }
        hyperparameter_results.append(result)
        
        print(f"      📊 C={c_value}: Accuracy = {metrics['accuracy']:.4f}")

# === EXPERIMENT 3: FEATURE ENGINEERING ===
print("\n⚙️  Experiment 3: Feature Engineering Comparison...")

from sklearn.preprocessing import StandardScaler, MinMaxScaler

feature_configs = [
    {"name": "No Scaling", "scaler": None},
    {"name": "Standard Scaling", "scaler": StandardScaler()},
    {"name": "MinMax Scaling", "scaler": MinMaxScaler()}
]

feature_results = []

for config in feature_configs:
    with mlflow.start_run(run_name=f"feature-engineering-{config['name'].lower().replace(' ', '-')}"):
        
        print(f"\n   🔬 Testing {config['name']}...")
        
        # Prepare data
        if config['scaler'] is not None:
            X_train_scaled = config['scaler'].fit_transform(X_train)
            X_test_scaled = config['scaler'].transform(X_test)
        else:
            X_train_scaled = X_train
            X_test_scaled = X_test
        
        # Train model
        model = LogisticRegression(random_state=42, max_iter=1000)
        model.fit(X_train_scaled, y_train)
        
        # Evaluate
        y_pred = model.predict(X_test_scaled)
        
        metrics = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, average='weighted'),
            "recall": recall_score(y_test, y_pred, average='weighted'),
            "f1_score": f1_score(y_test, y_pred, average='weighted')
        }
        
        # Log parameters and metrics
        mlflow.log_params({
            "algorithm": "logistic_regression",
            "feature_scaling": config['name'].lower().replace(' ', '_'),
            # Explicit None check, consistent with the data-preparation branch above
            "scaler_type": type(config['scaler']).__name__ if config['scaler'] is not None else "none"
        })
        mlflow.log_metrics(metrics)
        
        # Tags
        mlflow.set_tag("experiment_type", "feature_engineering")
        mlflow.set_tag("scaling_method", config['name'])
        
        result = {
            "scaling": config['name'],
            "run_id": mlflow.active_run().info.run_id,
            **metrics
        }
        feature_results.append(result)
        
        print(f"      📈 {config['name']}: Accuracy = {metrics['accuracy']:.4f}")

# === ANALYZE AND COMPARE RESULTS ===
print("\n📊 Comprehensive Results Analysis...")

# Create comparison DataFrames
print("\n🏆 Algorithm Comparison Results:")
algo_df = pd.DataFrame(algorithm_results)
print(algo_df.round(4))

print("\n🎛️  Hyperparameter Tuning Results:")
hyperparam_df = pd.DataFrame(hyperparameter_results)
print(hyperparam_df.round(4))

print("\n⚙️  Feature Engineering Results:")
feature_df = pd.DataFrame(feature_results)
print(feature_df.round(4))

# Find best performers (idxmax returns the first row when several rows tie)
print("\n🥇 Best Performers Summary:")

best_algorithm = algo_df.loc[algo_df['accuracy'].idxmax()]
print(f"   🎯 Best Algorithm: {best_algorithm['algorithm']} (Accuracy: {best_algorithm['accuracy']:.4f})")

best_hyperparameter = hyperparam_df.loc[hyperparam_df['accuracy'].idxmax()]
print(f"   🎛️  Best C Value: {best_hyperparameter['C']} (Accuracy: {best_hyperparameter['accuracy']:.4f})")

best_feature = feature_df.loc[feature_df['accuracy'].idxmax()]
print(f"   ⚙️  Best Scaling: {best_feature['scaling']} (Accuracy: {best_feature['accuracy']:.4f})")

# === PROGRAMMATIC EXPERIMENT QUERYING ===
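print("\n🔍 Programmatic Experiment Querying...")

# Get experiment ID (get_experiment_by_name returns None for an unknown name)
experiment = mlflow.get_experiment_by_name(experiment_name)
if experiment is None:
    raise ValueError(f"Experiment '{experiment_name}' not found")
experiment_id = experiment.experiment_id

# Search runs programmatically
print(f"\n📋 Querying experiment: {experiment_name} (ID: {experiment_id})")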

# Query all runs in this experiment
all_runs = mlflow.search_runs(experiment_ids=[experiment_id])
print(f"   📈 Total runs in experiment: {len(all_runs)}")

# Filter runs by tag
algorithm_comparison_runs = mlflow.search_runs(
    experiment_ids=[experiment_id],
    filter_string="tags.experiment_type = 'algorithm_comparison'"
)
print(f"   🧪 Algorithm comparison runs: {len(algorithm_comparison_runs)}")

# Get best run by metric
if len(all_runs) > 0:
    best_run = all_runs.loc[all_runs['metrics.accuracy'].idxmax()]
    print("\n🏆 Overall Best Run:")
    print(f"   Run ID: {best_run['run_id']}")
    print(f"   Accuracy: {best_run['metrics.accuracy']:.4f}")
    print(f"   Algorithm: {best_run.get('params.algorithm', 'Unknown')}")
    print(f"   Experiment type: {best_run.get('tags.experiment_type', 'Unknown')}")

# === VISUALIZATION RECOMMENDATIONS ===
print("\n📊 Visualization and Analysis Recommendations:")
print("\n💡 To visualize these results, you can:")
print("   📈 Use MLflow UI to compare runs side-by-side")
print("   📊 Create custom plots comparing metrics across runs")
print("   🎯 Use parallel coordinates plots for hyperparameter analysis")
print("   📋 Export run data to pandas for custom analysis")
print("   🔍 Filter runs by tags and parameters for focused comparisons")

print("\n🎯 Advanced Analysis Ideas:")
print("   📊 Statistical significance testing between models")
print("   🔄 Cross-validation across multiple runs")
print("   📈 Learning curves for training progression")
print("   🎛️  Hyperparameter optimization with tools like Optuna")
print("   📋 A/B testing frameworks for model comparison")

print("\n✅ Comprehensive experiment comparison complete!")
print("\n🚀 Next Steps:")
print("   1. Open MLflow UI to visually compare runs")
print("   2. Export best model to production")
print("   3. Set up automated hyperparameter tuning")
print("   4. Implement model monitoring and drift detection")
print("   5. Scale experiments with MLflow Projects")

🔬 Comprehensive Experiment Comparison...

🧪 Experiment 1: Comparing Different Algorithms...

   🚀 Training Logistic Regression...
      ✅ Logistic Regression - Accuracy: 0.9667, Time: 0.0990s

   🚀 Training Random Forest...
      ✅ Random Forest - Accuracy: 0.9000, Time: 0.3229s

🎛️  Experiment 2: Logistic Regression Hyperparameter Tuning...

   🔧 Testing C = 0.1...
      📊 C=0.1: Accuracy = 0.9667

   🔧 Testing C = 0.5...
      📊 C=0.5: Accuracy = 0.9667

   🔧 Testing C = 1.0...
      📊 C=1.0: Accuracy = 0.9667

   🔧 Testing C = 2.0...
      📊 C=2.0: Accuracy = 0.96
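
Several C values in the output above report the same accuracy, and `idxmax()` returns the first row that reaches the maximum, so the "best" C is effectively decided by loop order. The sketch below shows one possible way to make ties visible and break them with a secondary metric. It works on plain dicts shaped like the entries in `hyperparameter_results`. The helper names `pick_best` and `tied_with_best` are hypothetical, and the numbers are invented for illustration; they are not the notebook's results.

```python
def pick_best(results, primary="accuracy", tie_breakers=("f1_score",)):
    """Return the result dict with the highest primary metric.

    Ties on the primary metric are broken by the tie-breaker metrics, in order.
    If every metric ties, the earliest entry wins (max() keeps the first maximum).
    """
    if not results:
        raise ValueError("results is empty")
    keys = (primary, *tie_breakers)
    return max(results, key=lambda r: tuple(r[k] for k in keys))


def tied_with_best(results, primary="accuracy", tol=1e-9):
    """Return every result whose primary metric equals the best value (within tol)."""
    best_value = max(r[primary] for r in results)
    return [r for r in results if abs(r[primary] - best_value) <= tol]


# Illustrative values only (hypothetical, not taken from the runs above)
example_results = [
    {"C": 0.1, "accuracy": 0.9667, "f1_score": 0.9664},
    {"C": 0.5, "accuracy": 0.9667, "f1_score": 0.9666},
    {"C": 1.0, "accuracy": 0.9667, "f1_score": 0.9666},
    {"C": 2.0, "accuracy": 0.9333, "f1_score": 0.9330},
]

best = pick_best(example_results)
ties = tied_with_best(example_results)
print(f"Best C: {best['C']} ({len(ties)} runs tied on accuracy)")
```

On a test set this small, a tie on accuracy usually means the runs are indistinguishable, so reporting the number of tied runs is often more honest than naming a single winner.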

## 🎉 Conclusion and Next Steps

Congratulations! You've completed an MLflow tutorial covering the core of experiment tracking and model management: logging parameters, metrics, and artifacts, using the Model Registry, loading models, and comparing runs.

### 📚 What You've Learned:

1. **✅ MLflow Setup**: Configured tracking URI and experiment organization
2. **✅ Parameter Logging**: Tracked hyperparameters and configuration settings  
3. **✅ Metric Tracking**: Logged performance metrics and training progress
4. **✅ Artifact Management**: Saved models, plots, and data files
5. **✅ Model Registry**: Registered, versioned, and staged models
6. **✅ Model Loading**: Loaded models for inference and production use
7. **✅ Experiment Comparison**: Systematically compared multiple approaches

### 🚀 Production Best Practices:

- **🔧 Set up MLflow Tracking Server** for team collaboration
- **📊 Use consistent experiment naming** and tagging strategies  
- **🎯 Implement automated model validation** before production deployment
- **📈 Monitor model performance** in production with drift detection
- **🔄 Set up CI/CD pipelines** with MLflow for automated training and deployment
- **📋 Document experiments** with detailed descriptions and metadata
- **🛡️ Implement model governance** with approval workflows
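
The naming and tagging practice above can be sketched as a small helper that derives run names and tag filters from one place, so the names used when logging and the filters used when querying cannot drift apart. This is a minimal sketch, not an MLflow API: `slugify`, `make_run_name`, and `tag_filter` are hypothetical helpers, and the tag keys are the ones this notebook already uses. The filter string follows the `tags.<key> = '<value>'` form passed to `mlflow.search_runs(filter_string=...)` earlier.

```python
import re


def slugify(text):
    """Lowercase the text and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_run_name(experiment_type, variant):
    """Build a run name such as 'feature-engineering-standard-scaling'."""
    return f"{slugify(experiment_type)}-{slugify(variant)}"


def tag_filter(tags):
    """Build a filter string that matches runs carrying all of the given tags."""
    clauses = []
    for key, value in tags.items():
        if "'" in str(value):
            # Quoting rules are out of scope for this sketch, so reject such values
            raise ValueError(f"Tag value for '{key}' must not contain a single quote")
        clauses.append(f"tags.{key} = '{value}'")
    return " and ".join(clauses)


tags = {"experiment_type": "feature_engineering", "scaling_method": "Standard Scaling"}
name = make_run_name(tags["experiment_type"], tags["scaling_method"])
query = tag_filter(tags)

print(name)
print(query)
```

The resulting `name` would be passed as `run_name=` to `mlflow.start_run`, each tag set with `mlflow.set_tag`, and `query` passed as `filter_string=` to `mlflow.search_runs`.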

### 🎯 Next Learning Steps:

1. **MLflow Projects**: Package your ML code for reproducible runs
2. **MLflow Plugins**: Extend MLflow with custom tracking backends
3. **Model Serving**: Deploy models with `mlflow models serve`
4. **Docker Integration**: Containerize models with `mlflow models build-docker`
5. **Cloud Deployment**: Deploy to AWS SageMaker, Azure ML, or GCP
6. **Advanced Tracking**: Custom metrics, nested runs, and system metrics
7. **Integration**: Connect with Apache Airflow, Kubernetes, and Spark

### 🔗 Useful Resources:

- **📖 MLflow Documentation**: https://mlflow.org/docs/latest/
- **🎓 MLflow Tutorials**: https://mlflow.org/docs/latest/tutorials-and-examples/
- **💻 GitHub Repository**: https://github.com/mlflow/mlflow
- **🏪 MLflow Model Registry**: https://mlflow.org/docs/latest/model-registry.html
- **🚀 Deployment Guide**: https://mlflow.org/docs/latest/models.html#deployment

### 💡 Remember:

> *"The best MLflow setup is the one your team actually uses consistently."*

Start simple, iterate, and gradually add complexity as your ML operations mature. Focus on creating reproducible experiments and maintainable ML workflows.

**Happy experimenting! 🧪✨**