# MLflow model registry

In this notebook, we will explore the MLflow model registry component. The MLflow model registry is a centralized repository where we can store, annotate, and manage our machine learning models. It provides features like versioning, stage transitions (e.g., Staging, Production; recent MLflow releases deprecate stages in favor of model version aliases), and model lineage, which help in managing the lifecycle of our models.

In [1]:
import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

import warnings
warnings.filterwarnings('ignore')

In [2]:
# Load dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

### Logging to a tracking server
First, we need to make sure an MLflow Tracking Server is running, because the Model Registry is accessed through the tracking server.

In [3]:
# Point the MLflow client at the tracking server (without this call, MLflow logs to a local store instead)
mlflow.set_tracking_uri("http://localhost:5000")

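- `mlflow.set_tracking_uri("http://localhost:5000")`: Points the MLflow client at the Tracking Server. No connection is made until the first API call, so if we are running a local server, we need to make sure it is up and running first (for example, started with `mlflow server --host 127.0.0.1 --port 5000`).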
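Because every later cell depends on the server, a quick reachability check can save a confusing stack trace. The sketch below is not part of MLflow: `is_tracking_server_up` is a hypothetical helper, and it assumes the server exposes the `/health` route that `mlflow server` provides (it answers `OK` with HTTP 200).

```python
import urllib.request


def is_tracking_server_up(tracking_uri: str, timeout: float = 2.0) -> bool:
    """Return True if GET <tracking_uri>/health answers with HTTP 200 (hypothetical helper)."""
    url = tracking_uri.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


if not is_tracking_server_up("http://localhost:5000"):
    print("No tracking server at http://localhost:5000; start one with:")
    print("    mlflow server --host 127.0.0.1 --port 5000")
```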

### Setting up the experiment

In [4]:
# Create a new experiment
mlflow.set_experiment("MLflow Model Registry Example")

2024/08/29 16:53:35 INFO mlflow.tracking.fluent: Experiment with name 'MLflow Model Registry Example' does not exist. Creating a new experiment.


<Experiment: artifact_location='file:///C:/Users/israe/Documents/Codes/Notebooks/mlruns/678057457486770417', creation_time=1724943215764, experiment_id='678057457486770417', last_update_time=1724943215764, lifecycle_stage='active', name='MLflow Model Registry Example', tags={}>

## Register models
We will demonstrate two methods to register models: using a flavor's `log_model()` function (here `mlflow.sklearn.log_model()`) and using `mlflow.register_model()`. Both methods create the following in the `mlruns` directory:
- **Model artifacts**: Both methods log the model artifacts (for scikit-learn models, a pickled `model.pkl` plus an `MLmodel` file describing how to load it) in the `mlruns` directory. This directory contains subdirectories for each experiment and run, organized by experiment ID and run ID. Within each run directory, we will find:
  - The `artifacts` directory, which contains the logged model files.
  - Metadata files like `meta.yaml` that describe the run and the logged model.
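- **Model registry metadata**: When a model is registered, metadata about the model version, aliases, and other details is stored in a folder named after the registered model under `mlruns/models`. This applies to the default file-based backend store; with a database backend (e.g., SQLite or PostgreSQL), the same metadata is kept in database tables instead. Either way, it is managed by the MLflow Tracking Server and kept separate from the experiment directories.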
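To see this layout on disk, the following sketch walks a file-based store and lists the experiment/run folders and the registered-model version folders. It assumes the file backend's `mlruns/<experiment_id>/<run_id>/` and `mlruns/models/<model_name>/version-<n>/` layout; `describe_mlruns` is a hypothetical helper, and the `mlruns` root is an assumption (the server may have been started with a different backend store).

```python
from pathlib import Path


def describe_mlruns(root="mlruns"):
    """List experiment/run folders and registered-model versions in a file-based store."""
    root = Path(root)
    lines = []
    for entry in sorted(p for p in root.iterdir() if p.is_dir()):
        if entry.name == "models":
            # Registry metadata: <root>/models/<model_name>/version-<n>/
            for model_dir in sorted(p for p in entry.iterdir() if p.is_dir()):
                versions = sorted(
                    (v.name for v in model_dir.glob("version-*")),
                    key=lambda name: int(name.split("-")[1]),
                )
                lines.append(f"registered model {model_dir.name}: {', '.join(versions)}")
        elif (entry / "meta.yaml").is_file():
            # Experiment folder: each run is a sub-folder with its own meta.yaml
            for run_dir in sorted(p for p in entry.iterdir() if (p / "meta.yaml").is_file()):
                has_artifacts = (run_dir / "artifacts").is_dir()
                lines.append(
                    f"experiment {entry.name} / run {run_dir.name} (artifacts: {has_artifacts})"
                )
    return lines


# Only inspect the store if it exists in the current working directory (assumption)
if Path("mlruns").is_dir():
    for line in describe_mlruns("mlruns"):
        print(line)
```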


### Method 1: Register a model using `mlflow.sklearn.log_model()`
In this method, we call `mlflow.sklearn.log_model()` with `registered_model_name` to log and register a random forest model in a single step. It is less flexible than Method 2, but it is straightforward and sufficient for most use cases.

In [5]:
# Start a new MLflow run
with mlflow.start_run(run_name="RandomForest_Model") as run:
    # Train a Random Forest model
    rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
    rf_model.fit(X_train, y_train)

    # Infer the model signature
    y_pred = rf_model.predict(X_test)
    signature = infer_signature(X_test, y_pred)

    # Log the model to MLflow
    mlflow.sklearn.log_model(
        sk_model=rf_model, 
        artifact_path="random_forest_model", 
        signature=signature, 
        registered_model_name="random-forest-class-model"
    )

Successfully registered model 'random-forest-class-model'.
2024/08/29 16:53:43 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: random-forest-class-model, version 1
Created version '1' of model 'random-forest-class-model'.
2024/08/29 16:53:43 INFO mlflow.tracking._tracking_service.client: 🏃 View run RandomForest_Model at: http://localhost:5000/#/experiments/678057457486770417/runs/7c72da2a0bcd464d824b8de66d06619a.
2024/08/29 16:53:43 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://localhost:5000/#/experiments/678057457486770417.


The `mlflow.sklearn.log_model()` function logs a model as an artifact of the current MLflow run and can optionally register it in the MLflow model registry. Its main arguments are:
  - `sk_model`: The trained model we want to log (in this case, the `RandomForestClassifier`).
  - `artifact_path`: The path within the MLflow run where the model artifact will be stored.
  - `signature`: (Optional) The model signature, which is an MLflow object that captures the input and output schema of the model. It helps to ensure consistency between training and serving.
  - `registered_model_name`: (Optional) If provided, MLflow registers the model in the model registry under the specified name. When we first register a model with a particular name, it becomes version 1. Each time we register a new model with the same name, MLflow automatically increments the version number (e.g., version 2, version 3, and so on). Model versions can have tags, which are useful for tracking specific attributes, like whether pre-deployment checks have passed.

### Method 2: Register a model using `mlflow.register_model()`
In this method, we first log the model with `mlflow.sklearn.log_model()` and then register it explicitly with `mlflow.register_model()`. This gives us more control over registration: we can log a model now and decide later whether to register it, handle registration in a separate step, or register it only when a condition (such as a minimum accuracy) is met.

In [6]:
# Start a new MLflow run
with mlflow.start_run(run_name="LogisticRegression_Model") as run:
    # Train a Logistic Regression model
    lr_model = LogisticRegression(max_iter=200, random_state=42)
    lr_model.fit(X_train, y_train)

    # Infer the model signature
    y_pred = lr_model.predict(X_test)
    signature = infer_signature(X_test, y_pred)

    # Log the model without registering it
    mlflow.sklearn.log_model(
        sk_model=lr_model, 
        artifact_path="logistic_regression_model",
        signature=signature
    )

    run_id_lr1 = run.info.run_id
    # Manually register the model in the model registry
    mlflow.register_model(
        model_uri=f"runs:/{run.info.run_id}/logistic_regression_model", 
        name="logistic-regression-class-model"
    )

Successfully registered model 'logistic-regression-class-model'.
2024/08/29 16:53:48 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: logistic-regression-class-model, version 1
Created version '1' of model 'logistic-regression-class-model'.
2024/08/29 16:53:48 INFO mlflow.tracking._tracking_service.client: 🏃 View run LogisticRegression_Model at: http://localhost:5000/#/experiments/678057457486770417/runs/aff94cc3bc8843f08a785568ad964b8e.
2024/08/29 16:53:48 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://localhost:5000/#/experiments/678057457486770417.


The `mlflow.register_model()` function explicitly registers a model that has already been logged as an artifact in an MLflow run.
  - `model_uri`: The URI of the logged model that we want to register, typically of the form `runs:/<run_id>/<artifact_path>`, which points to a model logged in a specific run.
  - `name`: The name under which we want to register the model in the model registry. When we register a model with a new name, it becomes version 1. Each time we register another model with the same name, MLflow automatically assigns the next version number (e.g., version 2, version 3, and so on). Model versions can have tags, which are useful for tracking important details, such as whether the model has passed pre-deployment checks.

## Adding a new version to the model registry
In MLflow, adding a new version to the model registry means registering a model artifact under an existing model name, usually one with different content or configuration. Each new version of a model in the registry is associated with a unique, monotonically increasing version number.
- **Version increment**: Every time we register a new model under an existing model name, MLflow assigns it a new version number. This version number is incremented sequentially. For example, if we previously registered a model as version 1, a new registration will automatically be assigned version 2.
- **Same model and hyperparameters**: If we log and register a model with the exact same parameters, hyperparameters, and content as an existing model, MLflow will still create a new version. This is because each registration is treated as a new entry, even if the model content hasn’t changed. The versioning is based on registration events, not model differences.

Now, we will demonstrate how to add a new version of a model to the MLflow model registry. Under the same name, we will register a logistic regression classifier trained with different hyperparameters (`max_iter=300`, `C=0.5`) than the previous version.

In [7]:
# Start a new MLflow run
with mlflow.start_run(run_name="LogisticRegression_Model_Version_2") as run:
    # Train a new Logistic Regression model with different hyperparameters
    lr_model_v2 = LogisticRegression(max_iter=300, C=0.5, random_state=42)
    lr_model_v2.fit(X_train, y_train)

    # Infer the model signature
    y_pred = lr_model_v2.predict(X_test)
    signature = infer_signature(X_test, y_pred)

    # Log the new model
    mlflow.sklearn.log_model(
        sk_model=lr_model_v2, 
        artifact_path="logistic_regression_model", 
        signature=signature
    )

    run_id_lr2 = run.info.run_id
    # Register the new model version
    mlflow.register_model(
        model_uri=f"runs:/{run.info.run_id}/logistic_regression_model", 
        name="logistic-regression-class-model"
    )

Registered model 'logistic-regression-class-model' already exists. Creating a new version of this model...
2024/08/29 16:53:52 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: logistic-regression-class-model, version 2
Created version '2' of model 'logistic-regression-class-model'.
2024/08/29 16:53:52 INFO mlflow.tracking._tracking_service.client: 🏃 View run LogisticRegression_Model_Version_2 at: http://localhost:5000/#/experiments/678057457486770417/runs/b1b344bf325c47b2978580957bc50fd7.
2024/08/29 16:53:52 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://localhost:5000/#/experiments/678057457486770417.


## Loading a registered model
Now, we will demonstrate how to load a registered model using two methods.

### Method 1: Load via the run ID
We can load a model through the tracking server using a `runs:/<run_id>/<artifact_path>` URI. This method requires the run ID and the path of the model artifact relative to the run (the `artifact_path` we used to log the model).

In [8]:
# Load the model from the Tracking Server
model_via_tracking_server = mlflow.sklearn.load_model(f"runs:/{run_id_lr2}/logistic_regression_model")

# Make predictions and print 5 values
y_pred = model_via_tracking_server.predict(X_test)
print("Predictions:", y_pred[:5])

Predictions: [1 0 2 1 1]


### Method 2: Load via name and version
We can load a model from the model registry using its registered name and a specific version number, with a URI of the form `models:/<name>/<version>`. This is convenient when we want to access a model version directly by its name (the `name`/`registered_model_name` we used to register the model) and version, without needing the run details.

##### Example 1: Load version 1

In [9]:
# Load the model from the Model Registry
model_via_name_version_1 = mlflow.sklearn.load_model("models:/logistic-regression-class-model/1")

# Make predictions and print 5 values
y_pred = model_via_name_version_1.predict(X_test)
print("Predictions for version 1:", y_pred[:5])

Predictions for version 1: [1 0 2 1 1]


##### Example 2: Load version 2

In [10]:
# Load the model from the Model Registry
model_via_name_version_2 = mlflow.sklearn.load_model("models:/logistic-regression-class-model/2")

# Make predictions and print 5 values
y_pred = model_via_name_version_2.predict(X_test)
print("Predictions for version 2:", y_pred[:5])

Predictions for version 2: [1 0 2 1 1]
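Both versions return the same first five predictions, but that does not mean the models are identical. A fuller check compares predictions across the whole test set, accuracy, and the learned coefficients. This sketch retrains the two configurations locally with the notebook's split and hyperparameters so it runs on its own; in the notebook, one would call `predict()` on `model_via_name_version_1` and `model_via_name_version_2` instead.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same split and hyperparameters as the two registered versions, retrained locally
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
v1 = LogisticRegression(max_iter=200, random_state=42).fit(X_train, y_train)
v2 = LogisticRegression(max_iter=300, C=0.5, random_state=42).fit(X_train, y_train)

pred_v1, pred_v2 = v1.predict(X_test), v2.predict(X_test)
n_diff = int(np.sum(pred_v1 != pred_v2))
print(f"Predictions that differ: {n_diff} of {len(y_test)}")
print(f"Accuracy v1: {accuracy_score(y_test, pred_v1):.3f}, v2: {accuracy_score(y_test, pred_v2):.3f}")
# Coefficients differ even where labels agree, because C changes the regularization strength
print("Max coefficient difference:", np.abs(v1.coef_ - v2.coef_).max())
```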
