# Phase 3.1: Registering Models in MLflow Model Registry

This notebook demonstrates:
1. **Training and Logging a Model** - Build and track a model
2. **Model Registration** - Register models in the Model Registry
3. **Adding Descriptions** - Document your models
4. **Auto-Registration** - Register during logging
5. **Viewing Registered Models** - List registered models and their versions

## What is the Model Registry?

The **Model Registry** is MLflow's central hub for managing the full lifecycle of ML models. Think of it as a "model database" where you can:

- **Store** different versions of your models
- **Track** which models are in production
- **Document** what each model does
- **Compare** different model versions

## Why Use Model Registry?

| Without Registry | With Registry |
|-----------------|---------------|
| Models scattered in folders | Centralized model storage |
| No version tracking | Full version history |
| Manual deployment | Stage-based deployment |
| No documentation | Rich model metadata |

## Learning Goals
- Understand what the Model Registry is
- Learn to register models from runs
- Know how to add descriptions to models
- Use auto-registration during logging

## Step 1: Import Libraries

We'll import MLflow along with the MlflowClient for registry operations.

In [None]:
# mlflow: Main library for experiment tracking
import mlflow
import mlflow.sklearn

# MlflowClient: Provides programmatic access to MLflow
# Used for registry operations like updating descriptions
from mlflow.tracking import MlflowClient

# sklearn for model building
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Data handling
import pandas as pd
import os

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

print("All libraries imported successfully!")
print("Ready to learn about Model Registry!")

## Step 2: Connect to MLflow

Set up our connection and create the MlflowClient for registry operations.

In [None]:
# Get MLflow tracking server URL
TRACKING_URI = os.getenv("MLFLOW_TRACKING_URI", "http://localhost:5000")

# Connect to MLflow
mlflow.set_tracking_uri(TRACKING_URI)

# Set experiment
mlflow.set_experiment("phase3-model-registry")

# Define a name for our model in the registry
# This is like a "product name" - all versions will be under this name
MODEL_NAME = "iris-classifier"

# Create MlflowClient for advanced operations
# The client gives us programmatic access to the registry
client = MlflowClient()

print(f"Connected to MLflow at: {TRACKING_URI}")
print("Experiment: phase3-model-registry")
print(f"Model name: {MODEL_NAME}")
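
Before moving on, it can help to confirm that the tracking server is actually reachable, because a wrong `MLFLOW_TRACKING_URI` otherwise only surfaces later as a failed logging call. The sketch below is one way to check this with the standard library alone. It assumes the server exposes a `/health` endpoint, which MLflow tracking servers generally do; `server_is_reachable` is a hypothetical helper name, not part of MLflow.

```python
import urllib.error
import urllib.request


def server_is_reachable(tracking_uri: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to <tracking_uri>/health answers with 200."""
    # Assumption: the tracking server serves a /health endpoint
    url = tracking_uri.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error, or malformed URL
        return False
```

Calling `server_is_reachable(TRACKING_URI)` right after `mlflow.set_tracking_uri(...)` gives an early, readable failure instead of a stack trace from the first logging call.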

## Step 3: Prepare Data

Load and split the Iris dataset for training and testing.

In [None]:
# Load Iris dataset
iris = load_iris()

# Create DataFrame with feature names
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2, 
    random_state=42
)

print("Data loaded and split!")
print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")

## Step 4: Train and Log a Model

First, let's train a model and log it to MLflow (without registering it yet).

In [None]:
print("="*60)
print("[1] Training and Logging Model")
print("="*60)

# Start an MLflow run
with mlflow.start_run(run_name="registry-demo") as run:
    # Train a RandomForest model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    
    # Evaluate the model
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    
    # Log parameters and metrics
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", accuracy)
    
    # Infer the signature (input/output schema) from the training data,
    # then log the model as an artifact of this run
    signature = mlflow.models.infer_signature(X_train, model.predict(X_train))
    mlflow.sklearn.log_model(model, "model", signature=signature)
    
    # Save run_id for later use
    saved_run_id = run.info.run_id
    
    print("\nModel trained and logged!")
    print(f"Run ID: {saved_run_id}")
    print(f"Accuracy: {accuracy:.4f}")

## Step 5: Register the Model

Now let's register the logged model in the Model Registry. This creates a "registered model" that can have multiple versions.

**Model URI Format:**
- `runs:/<run_id>/<artifact_path>` - Points to a model in a specific run
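
To make the URI format concrete, here is a minimal sketch of two helpers that build and parse `runs:/` URIs. Both `run_model_uri` and `parse_run_model_uri` are hypothetical names introduced for illustration; MLflow itself only needs the resulting string.

```python
from typing import Tuple


def run_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build runs:/<run_id>/<artifact_path> for a model logged under a run."""
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    # Strip stray slashes so the URI has exactly one separator per segment
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


def parse_run_model_uri(uri: str) -> Tuple[str, str]:
    """Split a runs:/ URI back into (run_id, artifact_path)."""
    prefix = "runs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a runs:/ URI: {uri!r}")
    run_id, _, artifact_path = uri[len(prefix):].partition("/")
    if not run_id or not artifact_path:
        raise ValueError(f"URI is missing a run ID or artifact path: {uri!r}")
    return run_id, artifact_path
```

The `artifact_path` must match the name the model was logged under, which is `"model"` in this notebook.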

In [None]:
print("\n" + "="*60)
print("[2] Registering Model in Registry")
print("="*60)

# Create the model URI pointing to our logged model
# Format: runs:/<run_id>/<artifact_path>
model_uri = f"runs:/{saved_run_id}/model"

print(f"\nModel URI: {model_uri}")

# Register the model using mlflow.register_model()
# This creates a new registered model (or adds a version to existing one)
result = mlflow.register_model(
    model_uri=model_uri,      # Where the model is stored
    name=MODEL_NAME           # Name in the registry
)

print("\nModel registered successfully!")
print(f"Registered model name: {result.name}")
print(f"Version: {result.version}")
print(f"Status: {result.status}")
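
The printed status is normally `READY`, since `mlflow.register_model()` waits for registration by default (see its `await_registration_for` parameter). If it shows `PENDING_REGISTRATION` instead, a polling loop like the sketch below can wait for the version to become usable. `wait_until_ready` is a hypothetical helper; it only assumes that `client` offers `get_model_version(name, version)` as `MlflowClient` does.

```python
import time


def wait_until_ready(client, name, version, timeout_s=60.0, poll_s=1.0):
    """Poll the registry until the model version reports status READY.

    Returns True once READY, False if the timeout expires first, and
    raises RuntimeError if the registry reports FAILED_REGISTRATION.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = client.get_model_version(name, version).status
        if status == "READY":
            return True
        if status == "FAILED_REGISTRATION":
            raise RuntimeError(f"Registration failed for {name} version {version}")
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

In this notebook the call would be `wait_until_ready(client, MODEL_NAME, result.version)`.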

## Step 6: Add Descriptions to Model

Good documentation is crucial! Let's add descriptions to both:
- The **registered model** (overall description)
- The **specific version** (version-specific notes)

In [None]:
print("\n" + "="*60)
print("[3] Adding Descriptions")
print("="*60)

# Add description to the registered model (applies to all versions)
# This is like the "product description"
client.update_registered_model(
    name=MODEL_NAME,
    description=(
        "Iris flower species classifier using Random Forest algorithm. "
        "Predicts setosa, versicolor, or virginica based on sepal/petal measurements."
    )
)
print("\nAdded model description (overall).")

# Add description to the specific version
# This is like "release notes" for this version
client.update_model_version(
    name=MODEL_NAME,
    version=result.version,
    description=f"Initial version trained with 100 estimators. Accuracy: {accuracy:.4f}"
)
print(f"Added version {result.version} description.")

print("\nDescriptions added successfully!")
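
To confirm the descriptions were stored, they can be read back from the registry. The sketch below is one possible way to do this; `describe_model` is a hypothetical helper that relies only on the `get_registered_model` and `get_model_version` methods of `MlflowClient`.

```python
def describe_model(client, name, version):
    """Return the stored descriptions for a registered model and one version."""
    registered = client.get_registered_model(name)
    model_version = client.get_model_version(name, version)
    return {
        # A description that was never set comes back empty, so normalize to ""
        "model": registered.description or "",
        "version": model_version.description or "",
    }
```

For example, `describe_model(client, MODEL_NAME, result.version)` returns a dictionary with the overall description under `"model"` and the release notes under `"version"`.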

## Step 7: Alternative - Auto-Registration During Logging

You can also register a model automatically when logging it by using the `registered_model_name` parameter. This is convenient when you know you want to register the model immediately.

In [None]:
print("\n" + "="*60)
print("[4] Alternative: Auto-Registration During Logging")
print("="*60)

# Train a slightly different model
with mlflow.start_run(run_name="auto-register-demo"):
    # Train with different hyperparameters
    model2 = RandomForestClassifier(n_estimators=150, random_state=42)
    model2.fit(X_train, y_train)
    
    # Evaluate
    accuracy2 = accuracy_score(y_test, model2.predict(X_test))
    mlflow.log_param("n_estimators", 150)
    mlflow.log_metric("accuracy", accuracy2)
    
    # Log AND register in one step!
    # The registered_model_name parameter auto-registers the model
    mlflow.sklearn.log_model(
        model2,
        "model",
        registered_model_name=MODEL_NAME  # This triggers auto-registration!
    )
    
    print("\nModel logged and registered automatically!")
    print(f"Accuracy: {accuracy2:.4f}")
    print(f"\nThis created a new VERSION of '{MODEL_NAME}'")
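
Unlike `mlflow.register_model()`, the cell above does not capture the new version number. One way to recover it is to look up the versions that were created from a given run. The sketch below assumes the run ID was kept (for example by writing `with mlflow.start_run(...) as run2:` and reading `run2.info.run_id`, where `run2` is a hypothetical name); `versions_for_run` is likewise a hypothetical helper.

```python
def versions_for_run(client, name, run_id):
    """Return the version numbers of `name` that were created from `run_id`."""
    # Fetch every version of the registered model, then match on run_id locally
    matches = client.search_model_versions(f"name='{name}'")
    return sorted(int(mv.version) for mv in matches if mv.run_id == run_id)
```

Matching on `run_id` client-side keeps the filter string simple; an empty list means the run did not register a version under that name.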

## Step 8: View Registered Models

Let's see all the models we have registered and their versions.

In [None]:
print("\n" + "="*60)
print("Registered Models Overview")
print("="*60)

# Search for all registered models
for rm in client.search_registered_models():
    print(f"\nModel: {rm.name}")
    
    # Show description (truncated if too long)
    if rm.description:
        desc = (rm.description[:80] + "...") if len(rm.description) > 80 else rm.description
        print(f"Description: {desc}")
    else:
        print("Description: (none)")
    
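    # Show all versions
    # Note: rm.latest_versions only holds the newest version per stage,
    # so query the registry to list every version of this model
    print("Versions:")
    for version in client.search_model_versions(f"name='{rm.name}'"):
        print(f"  - Version {version.version}: Stage = {version.current_stage}")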

## Summary: Model Registration

### Two Ways to Register Models

**Method 1: Register After Logging**
```python
# Log the model first
mlflow.sklearn.log_model(model, "model")

# Then register it
mlflow.register_model(f"runs:/{run_id}/model", "model-name")
```

**Method 2: Auto-Register During Logging**
```python
# Log AND register in one step
mlflow.sklearn.log_model(
    model, 
    "model",
    registered_model_name="model-name"  # Auto-registers!
)
```

### Key Concepts

| Concept | Description |
|---------|-------------|
| **Registered Model** | A named model in the registry (like a product) |
| **Model Version** | A specific iteration of the model |
| **Model URI** | Address to locate a model (`runs:/...` or `models:/...`) |
| **MlflowClient** | API for advanced registry operations |
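
The `models:/` form of the Model URI addresses a registered version directly, as `models:/<name>/<version>`. A minimal sketch of loading a registered version this way follows. `load_registered_model` and its `loader` parameter are hypothetical; by default the helper delegates to `mlflow.pyfunc.load_model`.

```python
def load_registered_model(name, version, loader=None):
    """Load version `version` of registered model `name` via a models:/ URI."""
    if loader is None:
        # Imported lazily so the helper can be exercised without MLflow installed
        import mlflow.pyfunc
        loader = mlflow.pyfunc.load_model
    return loader(f"models:/{name}/{version}")
```

With the objects from this notebook, `load_registered_model(MODEL_NAME, result.version).predict(X_test)` would score the test set using the registered version rather than the in-memory `model`.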

In [None]:
print("="*60)
print("Model Registration Tutorial Complete!")
print("="*60)
print(f"\nView your model at: {TRACKING_URI}/#/models/{MODEL_NAME}")
print("\nWhat you learned:")
print("  1. What the Model Registry is and why to use it")
print("  2. How to register models using mlflow.register_model()")
print("  3. How to add descriptions to models and versions")
print("  4. How to auto-register during logging")
print("  5. How to view registered models programmatically")