# Registering a model in the MLflow Model Registry

This notebook walks through a toy example of registering a model in the MLflow Model Registry. We use the models logged by the previously executed notebooks and register the best model of each experiment.

The first step is to import all the required libraries:

In [2]:
# The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
# P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
# Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from urllib.parse import urlparse
import mlflow
import mlflow.sklearn
import boto3

import logging

logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)

warnings.filterwarnings("ignore")
np.random.seed(40)

Before registering anything in the MLflow Model Registry, it is recommended to search all the runs for the best model and register only that one, rather than registering every logged model.

Once all the runs of an experiment have finished, we can retrieve their metadata using the experiment ID and MLflow's `search_runs` function, which returns a pandas DataFrame with one row per run. Check the [documentation](https://www.mlflow.org/docs/latest/python_api/mlflow.html#mlflow.search_runs) for further details.

Go to the MLflow UI and get the experiment ID of your choice.

In [None]:
mlflow.set_tracking_uri("http://localhost:80")
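
The experiment ID can also be looked up by name instead of being copied from the UI. The following is a minimal sketch, assuming the experiment name used in this notebook (`winequality_elasticnet_autolog`). The helper `resolve_experiment_id` and its `lookup` parameter are hypothetical; `lookup` exists only so the function can be exercised without a tracking server.

```python
def resolve_experiment_id(name, lookup=None):
    """Return the ID of the experiment called `name`.

    `lookup` defaults to mlflow.get_experiment_by_name, which returns
    an Experiment object, or None when no experiment has that name.
    """
    if lookup is None:
        import mlflow  # imported lazily so the helper can be tested offline
        lookup = mlflow.get_experiment_by_name
    experiment = lookup(name)
    if experiment is None:
        raise ValueError(f"No experiment named {name!r}")
    return experiment.experiment_id


# Usage (requires a reachable tracking server and a tracking URI already set):
# experiment_id = resolve_experiment_id("winequality_elasticnet_autolog")
```

Raising on a missing experiment avoids silently passing `None` to `search_runs`.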

In [10]:
# Search all runs in experiment_id
experiment_id = '1'  # in this case experiment_id 1 = 'winequality_elasticnet_autolog'
mlflow.search_runs([experiment_id])

|  | run_id | experiment_id | status | artifact_uri | start_time | end_time | metrics.training_mse | metrics.training_r2_score | metrics.training_mae | metrics.training_score | ... | params.copy_X | params.l1_ratio | params.tol | params.max_iter | tags.mlflow.source.name | tags.mlflow.user | tags.mlflow.source.type | tags.mlflow.log-model.history | tags.estimator_class | tags.estimator_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 49e52e988cd1436eb48d99104ae0d407 | 1 | FINISHED | s3://mlflow-bucket/mlflow/1/49e52e988cd1436eb4... | 2020-12-08 17:56:40.503000+00:00 | 2020-12-08 17:56:41.145000+00:00 | 0.574932 | 0.091266 | 0.621656 | 0.091266 | ... | True | 0.6 | 0.0001 | 1000 | /home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt... | mcanizo | LOCAL | [{"run_id": "49e52e988cd1436eb48d99104ae0d407"... | sklearn.linear_model._coordinate_descent.Elast... | ElasticNet |
| 1 | c823e563d0c84de78b140f5fafc676cf | 1 | FINISHED | s3://mlflow-bucket/mlflow/1/c823e563d0c84de78b... | 2020-12-08 17:56:39.318000+00:00 | 2020-12-08 17:56:40.241000+00:00 | 0.606808 | 0.040883 | 0.643434 | 0.040883 | ... | True | 0.8 | 0.0001 | 1000 | /home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt... | mcanizo | LOCAL | [{"run_id": "c823e563d0c84de78b140f5fafc676cf"... | sklearn.linear_model._coordinate_descent.Elast... | ElasticNet |
| 2 | 1f5ab48a5cea4118ba8538268c5eb283 | 1 | FINISHED | s3://mlflow-bucket/mlflow/1/1f5ab48a5cea4118ba... | 2020-12-08 17:56:38.275000+00:00 | 2020-12-08 17:56:39.079000+00:00 | 0.553625 | 0.124943 | 0.606329 | 0.124943 | ... | True | 0.5 | 0.0001 | 1000 | /home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt... | mcanizo | LOCAL | [{"run_id": "1f5ab48a5cea4118ba8538268c5eb283"... | sklearn.linear_model._coordinate_descent.Elast... | ElasticNet |
| 3 | dc299a55a6c74c9085fcdd9ca122b9d1 | 1 | FINISHED | s3://mlflow-bucket/mlflow/1/dc299a55a6c74c9085... | 2020-12-08 17:56:37.479000+00:00 | 2020-12-08 17:56:38.082000+00:00 | 0.549027 | 0.132211 | 0.603006 | 0.132211 | ... | True | 0.77 | 0.0001 | 1000 | /home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt... | mcanizo | LOCAL | [{"run_id": "dc299a55a6c74c9085fcdd9ca122b9d1"... | sklearn.linear_model._coordinate_descent.Elast... | ElasticNet |
| 4 | e458c99ffadb4cb9a07b9274c2be7422 | 1 | FINISHED | s3://mlflow-bucket/mlflow/1/e458c99ffadb4cb9a0... | 2020-12-08 17:56:36.709000+00:00 | 2020-12-08 17:56:37.353000+00:00 | 0.475265 | 0.248799 | 0.544079 | 0.248799 | ... | True | 0.7 | 0.0001 | 1000 | /home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt... | mcanizo | LOCAL | [{"run_id": "e458c99ffadb4cb9a07b9274c2be7422"... | sklearn.linear_model._coordinate_descent.Elast... | ElasticNet |
| 5 | d4c7e5ac674643e8bd37d69353109ace | 1 | FINISHED | s3://mlflow-bucket/mlflow/1/d4c7e5ac674643e8bd... | 2020-12-08 17:56:35.542000+00:00 | 2020-12-08 17:56:36.490000+00:00 | 0.47483 | 0.249486 | 0.543255 | 0.249486 | ... | True | 0.2 | 0.0001 | 1000 | /home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt... | mcanizo | LOCAL | [{"run_id": "d4c7e5ac674643e8bd37d69353109ace"... | sklearn.linear_model._coordinate_descent.Elast... | ElasticNet |


We can also ask MLflow to sort the returned DataFrame by a column, for example by descending training MAE:

In [9]:
# Autologged metrics are stored with a "training_" prefix (see the columns above)
mlflow.search_runs([experiment_id], order_by=["metrics.training_mae DESC"])

|  | run_id | experiment_id | status | artifact_uri | start_time | end_time | metrics.training_mse | metrics.training_r2_score | metrics.training_mae | metrics.training_score | ... | params.copy_X | params.l1_ratio | params.tol | params.max_iter | tags.mlflow.source.name | tags.mlflow.user | tags.mlflow.source.type | tags.mlflow.log-model.history | tags.estimator_class | tags.estimator_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | c823e563d0c84de78b140f5fafc676cf | 1 | FINISHED | s3://mlflow-bucket/mlflow/1/c823e563d0c84de78b... | 2020-12-08 17:56:39.318000+00:00 | 2020-12-08 17:56:40.241000+00:00 | 0.606808 | 0.040883 | 0.643434 | 0.040883 | ... | True | 0.8 | 0.0001 | 1000 | /home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt... | mcanizo | LOCAL | [{"run_id": "c823e563d0c84de78b140f5fafc676cf"... | sklearn.linear_model._coordinate_descent.Elast... | ElasticNet |
| 1 | 49e52e988cd1436eb48d99104ae0d407 | 1 | FINISHED | s3://mlflow-bucket/mlflow/1/49e52e988cd1436eb4... | 2020-12-08 17:56:40.503000+00:00 | 2020-12-08 17:56:41.145000+00:00 | 0.574932 | 0.091266 | 0.621656 | 0.091266 | ... | True | 0.6 | 0.0001 | 1000 | /home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt... | mcanizo | LOCAL | [{"run_id": "49e52e988cd1436eb48d99104ae0d407"... | sklearn.linear_model._coordinate_descent.Elast... | ElasticNet |
| 2 | 1f5ab48a5cea4118ba8538268c5eb283 | 1 | FINISHED | s3://mlflow-bucket/mlflow/1/1f5ab48a5cea4118ba... | 2020-12-08 17:56:38.275000+00:00 | 2020-12-08 17:56:39.079000+00:00 | 0.553625 | 0.124943 | 0.606329 | 0.124943 | ... | True | 0.5 | 0.0001 | 1000 | /home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt... | mcanizo | LOCAL | [{"run_id": "1f5ab48a5cea4118ba8538268c5eb283"... | sklearn.linear_model._coordinate_descent.Elast... | ElasticNet |
| 3 | dc299a55a6c74c9085fcdd9ca122b9d1 | 1 | FINISHED | s3://mlflow-bucket/mlflow/1/dc299a55a6c74c9085... | 2020-12-08 17:56:37.479000+00:00 | 2020-12-08 17:56:38.082000+00:00 | 0.549027 | 0.132211 | 0.603006 | 0.132211 | ... | True | 0.77 | 0.0001 | 1000 | /home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt... | mcanizo | LOCAL | [{"run_id": "dc299a55a6c74c9085fcdd9ca122b9d1"... | sklearn.linear_model._coordinate_descent.Elast... | ElasticNet |
| 4 | e458c99ffadb4cb9a07b9274c2be7422 | 1 | FINISHED | s3://mlflow-bucket/mlflow/1/e458c99ffadb4cb9a0... | 2020-12-08 17:56:36.709000+00:00 | 2020-12-08 17:56:37.353000+00:00 | 0.475265 | 0.248799 | 0.544079 | 0.248799 | ... | True | 0.7 | 0.0001 | 1000 | /home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt... | mcanizo | LOCAL | [{"run_id": "e458c99ffadb4cb9a07b9274c2be7422"... | sklearn.linear_model._coordinate_descent.Elast... | ElasticNet |
| 5 | d4c7e5ac674643e8bd37d69353109ace | 1 | FINISHED | s3://mlflow-bucket/mlflow/1/d4c7e5ac674643e8bd... | 2020-12-08 17:56:35.542000+00:00 | 2020-12-08 17:56:36.490000+00:00 | 0.47483 | 0.249486 | 0.543255 | 0.249486 | ... | True | 0.2 | 0.0001 | 1000 | /home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt... | mcanizo | LOCAL | [{"run_id": "d4c7e5ac674643e8bd37d69353109ace"... | sklearn.linear_model._coordinate_descent.Elast... | ElasticNet |


We can also keep only the columns that are most relevant to our search:

In [11]:
runs_metadata = mlflow.search_runs([experiment_id], order_by=["metrics.training_mae DESC"])
# Select the columns that actually exist in the autologged runs
runs_metadata[['run_id', 'status', 'metrics.training_mse', 'metrics.training_r2_score']]

|  | run_id | status | metrics.training_mse | metrics.training_r2_score |
|---|---|---|---|---|
| 0 | c823e563d0c84de78b140f5fafc676cf | FINISHED | 0.606808 | 0.040883 |
| 1 | 49e52e988cd1436eb48d99104ae0d407 | FINISHED | 0.574932 | 0.091266 |
| 2 | 1f5ab48a5cea4118ba8538268c5eb283 | FINISHED | 0.553625 | 0.124943 |
| 3 | dc299a55a6c74c9085fcdd9ca122b9d1 | FINISHED | 0.549027 | 0.132211 |
| 4 | e458c99ffadb4cb9a07b9274c2be7422 | FINISHED | 0.475265 | 0.248799 |
| 5 | d4c7e5ac674643e8bd37d69353109ace | FINISHED | 0.47483 | 0.249486 |
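
Because every metric column shares the `metrics.` prefix, the selection does not have to be written out by hand. Here is a small sketch with pandas, using a stand-in DataFrame built from two of the runs shown above. The variable names `runs`, `metric_columns` and `summary` are illustrative only.

```python
import pandas as pd

# Stand-in for the DataFrame returned by mlflow.search_runs,
# built from two of the runs listed in this notebook.
runs = pd.DataFrame({
    "run_id": [
        "c823e563d0c84de78b140f5fafc676cf",
        "d4c7e5ac674643e8bd37d69353109ace",
    ],
    "status": ["FINISHED", "FINISHED"],
    "metrics.training_mse": [0.606808, 0.47483],
    "metrics.training_r2_score": [0.040883, 0.249486],
    "params.l1_ratio": ["0.8", "0.2"],
})

# Keep the identifying columns plus every column named "metrics.*"
metric_columns = [c for c in runs.columns if c.startswith("metrics.")]
summary = runs[["run_id", "status"] + metric_columns]
print(summary.columns.tolist())
```

This keeps the cell working when a different flavor or autolog configuration records a different set of metrics.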


If there are too many runs, we can filter them with a `filter_string`:

In [13]:
# Search the experiment with a filter_string that keeps only the runs
# whose training MSE is above a threshold
filter_string = "metrics.training_mse > 0.55"
runs_metadata = mlflow.search_runs([experiment_id], filter_string=filter_string)
runs_metadata[['run_id', 'artifact_uri', 'status', 'metrics.training_mse', 'metrics.training_r2_score']]

|  | run_id | artifact_uri | status | metrics.training_mse | metrics.training_r2_score |
|---|---|---|---|---|---|
| 0 | 49e52e988cd1436eb48d99104ae0d407 | s3://mlflow-bucket/mlflow/1/49e52e988cd1436eb4... | FINISHED | 0.574932 | 0.091266 |
| 1 | c823e563d0c84de78b140f5fafc676cf | s3://mlflow-bucket/mlflow/1/c823e563d0c84de78b... | FINISHED | 0.606808 | 0.040883 |
| 2 | 1f5ab48a5cea4118ba8538268c5eb283 | s3://mlflow-bucket/mlflow/1/1f5ab48a5cea4118ba... | FINISHED | 0.553625 | 0.124943 |
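
The same kind of threshold can also be applied client-side on the returned DataFrame, which may be convenient for conditions that are awkward to express in a filter string. The sketch below uses plain pandas and reuses the training MSE values listed in the outputs above. It illustrates the idea and does not replace the server-side filter.

```python
import pandas as pd

# Stand-in for the unfiltered search_runs result (values from this notebook)
runs = pd.DataFrame({
    "run_id": [
        "49e52e988cd1436eb48d99104ae0d407",
        "c823e563d0c84de78b140f5fafc676cf",
        "1f5ab48a5cea4118ba8538268c5eb283",
        "dc299a55a6c74c9085fcdd9ca122b9d1",
        "e458c99ffadb4cb9a07b9274c2be7422",
        "d4c7e5ac674643e8bd37d69353109ace",
    ],
    "metrics.training_mse": [0.574932, 0.606808, 0.553625, 0.549027, 0.475265, 0.47483],
})

# Boolean mask: keep only the rows whose training MSE exceeds the threshold
filtered = runs[runs["metrics.training_mse"] > 0.55]
print(filtered["run_id"].tolist())
```

Filtering on the server is usually preferable for large experiments, since it avoids downloading metadata for runs that will be discarded.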


In [21]:
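# Sort the filtered runs by training MSE and take the artifact URI of the first row.
# Note: ascending=False puts the run with the highest error first;
# use ascending=True to pick the run with the lowest error instead.
best_artifact_uri = runs_metadata.sort_values(by='metrics.training_mse', ascending=False)['artifact_uri'].values[0]
best_artifact_uri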

's3://mlflow-bucket/mlflow/1/c823e563d0c84de78b140f5fafc676cf/artifacts'
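
For error metrics such as MSE and MAE, lower is better, so the best run is the one with the smallest value. Below is a minimal sketch of that selection, using the training MSE values shown earlier. The helper name `best_run_model_uri` is hypothetical, and the `model` artifact path is an assumption about where the earlier notebooks logged the model, so check the run's artifacts in the MLflow UI before relying on it.

```python
import pandas as pd


def best_run_model_uri(runs, metric, artifact_path="model"):
    """Return a runs:/ URI for the run with the smallest value of `metric`.

    `artifact_path` is assumed to be the folder in which the model was logged.
    """
    best_row = runs.loc[runs[metric].idxmin()]
    return f"runs:/{best_row['run_id']}/{artifact_path}"


# Stand-in for the search_runs result (values from this notebook)
runs = pd.DataFrame({
    "run_id": [
        "49e52e988cd1436eb48d99104ae0d407",
        "c823e563d0c84de78b140f5fafc676cf",
        "1f5ab48a5cea4118ba8538268c5eb283",
        "dc299a55a6c74c9085fcdd9ca122b9d1",
        "e458c99ffadb4cb9a07b9274c2be7422",
        "d4c7e5ac674643e8bd37d69353109ace",
    ],
    "metrics.training_mse": [0.574932, 0.606808, 0.553625, 0.549027, 0.475265, 0.47483],
})

model_uri = best_run_model_uri(runs, "metrics.training_mse")
print(model_uri)
```

The resulting URI can be passed to `mlflow.register_model` in place of the raw S3 path. According to the MLflow documentation, registering from a `runs:/` URI also records the run ID on the model version.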

In [22]:
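# Register the selected artifact location as a new version of the named model.
# The output recorded below comes from a run that used the model name 'Prueba'.
mlflow.register_model(best_artifact_uri, 'ElasticNetWineModel')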

Successfully registered model 'Prueba'.
2020/12/09 22:11:11 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: Prueba, version 1
Created version '1' of model 'Prueba'.


<ModelVersion: creation_timestamp=1607548271216, current_stage='None', description='', last_updated_timestamp=1607548271216, name='Prueba', run_id='', run_link='', source='s3://mlflow-bucket/mlflow/1/c823e563d0c84de78b140f5fafc676cf/artifacts', status='READY', status_message='', tags={}, user_id='', version='1'>
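
Once a version exists, it can be referenced with a `models:/<name>/<version>` URI. The sketch below shows one way to build that URI and load the model through `mlflow.pyfunc.load_model`. The helper names are hypothetical, and loading is only expected to work if the registered source points at a directory that contains an `MLmodel` file, which may not be the case when the run's artifact root is registered directly.

```python
def registered_model_uri(name, version):
    """Build the models:/ URI for a specific registered model version."""
    return f"models:/{name}/{version}"


def load_registered_model(name, version):
    """Load a registered model version as a generic pyfunc model."""
    import mlflow.pyfunc  # imported lazily; needs access to the tracking server
    return mlflow.pyfunc.load_model(registered_model_uri(name, version))


# The name and version are taken from the registration output above
print(registered_model_uri("Prueba", 1))

# Usage (requires the tracking server and the artifact store to be reachable):
# model = load_registered_model("Prueba", 1)
# predictions = model.predict(test_features)  # test_features is a placeholder
```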