# Registering a model in the MLflow Registry

This notebook contains the solution to the exercise in which the best model from a given experiment must be registered in the MLflow Registry.

The first step is to import all the required libraries.

In [2]:
import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.linear_model import ElasticNet
from urllib.parse import urlparse
import mlflow
import mlflow.sklearn
import boto3

import logging

logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)

warnings.filterwarnings("ignore")
np.random.seed(40)

Before registering a model in the MLflow Registry, it is recommended to search for the best model among all the runs and register only that one, rather than registering every model.

Once we have performed all the runs in an experiment, we can access the metadata of the experiment using its ID and the method ``search_runs`` provided by MLflow. Check the [documentation](https://www.mlflow.org/docs/latest/python_api/mlflow.html#mlflow.search_runs) for further details.

Go to the MLflow UI and get the experiment ID of your choice.

In [8]:
# Search all runs in experiment_id
experiment_id = 4  # in this case experiment_id 4 = 'iris_gridsearch'
mlflow.search_runs([experiment_id])

Unnamed: 0,run_id,experiment_id,status,artifact_uri,start_time,end_time,metrics.recall,metrics.f1,metrics.accuracy,params.kernel,params.C,tags.mlflow.user,tags.mlflow.log-model.history,tags.mlflow.source.type,tags.mlflow.source.name
0,37f94a7a414c4ac983c12a9a1aa1613c,4,FINISHED,s3://mlflow-bucket/mlflow/4/37f94a7a414c4ac983...,2020-12-08 20:11:38.649000+00:00,2020-12-08 20:11:38.979000+00:00,0.947368,0.947976,0.947368,rbf,1.0,mcanizo,"[{""run_id"": ""37f94a7a414c4ac983c12a9a1aa1613c""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
1,e9d212f6c6334232ba87d64284e2167c,4,FINISHED,s3://mlflow-bucket/mlflow/4/e9d212f6c6334232ba...,2020-12-08 20:11:38.357000+00:00,2020-12-08 20:11:38.612000+00:00,0.947368,0.947976,0.947368,rbf,0.8,mcanizo,"[{""run_id"": ""e9d212f6c6334232ba87d64284e2167c""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
2,e2d04069c2c6416283f05036acc35234,4,FINISHED,s3://mlflow-bucket/mlflow/4/e2d04069c2c6416283...,2020-12-08 20:11:38.055000+00:00,2020-12-08 20:11:38.316000+00:00,0.921053,0.921955,0.921053,rbf,0.6,mcanizo,"[{""run_id"": ""e2d04069c2c6416283f05036acc35234""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
3,de1803bac63f44e3a0ac242629218eed,4,FINISHED,s3://mlflow-bucket/mlflow/4/de1803bac63f44e3a0...,2020-12-08 20:11:37.771000+00:00,2020-12-08 20:11:38.027000+00:00,0.947368,0.947976,0.947368,rbf,0.4,mcanizo,"[{""run_id"": ""de1803bac63f44e3a0ac242629218eed""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
4,6478189caa474d2faf6d45b0556a54ba,4,FINISHED,s3://mlflow-bucket/mlflow/4/6478189caa474d2faf...,2020-12-08 20:11:37.475000+00:00,2020-12-08 20:11:37.730000+00:00,0.894737,0.895534,0.894737,rbf,0.2,mcanizo,"[{""run_id"": ""6478189caa474d2faf6d45b0556a54ba""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
5,70a0a00e67c74d17b8e0a4a58572d103,4,FINISHED,s3://mlflow-bucket/mlflow/4/70a0a00e67c74d17b8...,2020-12-08 20:11:37.132000+00:00,2020-12-08 20:11:37.433000+00:00,0.973684,0.973364,0.973684,poly,1.0,mcanizo,"[{""run_id"": ""70a0a00e67c74d17b8e0a4a58572d103""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
6,ca3ada1abe9d46dd850eb30ffce94de2,4,FINISHED,s3://mlflow-bucket/mlflow/4/ca3ada1abe9d46dd85...,2020-12-08 20:11:36.808000+00:00,2020-12-08 20:11:37.108000+00:00,0.973684,0.973364,0.973684,poly,0.8,mcanizo,"[{""run_id"": ""ca3ada1abe9d46dd850eb30ffce94de2""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
7,3e5ca9dfbd534ce5abb79d0b8c31a836,4,FINISHED,s3://mlflow-bucket/mlflow/4/3e5ca9dfbd534ce5ab...,2020-12-08 20:11:36.521000+00:00,2020-12-08 20:11:36.764000+00:00,0.973684,0.973364,0.973684,poly,0.6,mcanizo,"[{""run_id"": ""3e5ca9dfbd534ce5abb79d0b8c31a836""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
8,9c1c718524784d1882337358dd5b03a0,4,FINISHED,s3://mlflow-bucket/mlflow/4/9c1c718524784d1882...,2020-12-08 20:11:35.948000+00:00,2020-12-08 20:11:36.500000+00:00,1.0,1.0,1.0,poly,0.4,mcanizo,"[{""run_id"": ""9c1c718524784d1882337358dd5b03a0""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
9,e9d93fea98d945528e2b5c3337010785,4,FINISHED,s3://mlflow-bucket/mlflow/4/e9d93fea98d945528e...,2020-12-08 20:11:35.321000+00:00,2020-12-08 20:11:35.926000+00:00,0.947368,0.947976,0.947368,poly,0.2,mcanizo,"[{""run_id"": ""e9d93fea98d945528e2b5c3337010785""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
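
Instead of copying the experiment ID from the UI, the ID can also be looked up by experiment name. The following is a minimal sketch, assuming the experiment is called `iris_gridsearch` as in the comment above; the helper name `resolve_experiment_id` is hypothetical, while `mlflow.get_experiment_by_name` is part of the MLflow API and returns `None` when no experiment matches.

```python
def resolve_experiment_id(name):
    """Return the ID of the experiment called `name`, or raise if it is missing."""
    # Imported lazily so the helper can be defined before MLflow is configured.
    import mlflow

    experiment = mlflow.get_experiment_by_name(name)
    if experiment is None:
        raise ValueError(f"No experiment named {name!r} on this tracking server")
    # MLflow returns the experiment ID as a string.
    return experiment.experiment_id
```

Calling `resolve_experiment_id("iris_gridsearch")` against the tracking server used in this notebook would then replace the hard-coded `experiment_id = 4`.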


Sort the metadata by the column that holds the metric of your choice. Pick the metric that is most relevant for selecting the best model.

Hint: you can use the MLflow UI and the different plots it provides when comparing multiple runs.

In [10]:
mlflow.search_runs([experiment_id], order_by=["metrics.accuracy"])

Unnamed: 0,run_id,experiment_id,status,artifact_uri,start_time,end_time,metrics.recall,metrics.f1,metrics.accuracy,params.kernel,params.C,tags.mlflow.user,tags.mlflow.log-model.history,tags.mlflow.source.type,tags.mlflow.source.name
0,6478189caa474d2faf6d45b0556a54ba,4,FINISHED,s3://mlflow-bucket/mlflow/4/6478189caa474d2faf...,2020-12-08 20:11:37.475000+00:00,2020-12-08 20:11:37.730000+00:00,0.894737,0.895534,0.894737,rbf,0.2,mcanizo,"[{""run_id"": ""6478189caa474d2faf6d45b0556a54ba""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
1,e2d04069c2c6416283f05036acc35234,4,FINISHED,s3://mlflow-bucket/mlflow/4/e2d04069c2c6416283...,2020-12-08 20:11:38.055000+00:00,2020-12-08 20:11:38.316000+00:00,0.921053,0.921955,0.921053,rbf,0.6,mcanizo,"[{""run_id"": ""e2d04069c2c6416283f05036acc35234""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
2,37f94a7a414c4ac983c12a9a1aa1613c,4,FINISHED,s3://mlflow-bucket/mlflow/4/37f94a7a414c4ac983...,2020-12-08 20:11:38.649000+00:00,2020-12-08 20:11:38.979000+00:00,0.947368,0.947976,0.947368,rbf,1.0,mcanizo,"[{""run_id"": ""37f94a7a414c4ac983c12a9a1aa1613c""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
3,e9d212f6c6334232ba87d64284e2167c,4,FINISHED,s3://mlflow-bucket/mlflow/4/e9d212f6c6334232ba...,2020-12-08 20:11:38.357000+00:00,2020-12-08 20:11:38.612000+00:00,0.947368,0.947976,0.947368,rbf,0.8,mcanizo,"[{""run_id"": ""e9d212f6c6334232ba87d64284e2167c""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
4,de1803bac63f44e3a0ac242629218eed,4,FINISHED,s3://mlflow-bucket/mlflow/4/de1803bac63f44e3a0...,2020-12-08 20:11:37.771000+00:00,2020-12-08 20:11:38.027000+00:00,0.947368,0.947976,0.947368,rbf,0.4,mcanizo,"[{""run_id"": ""de1803bac63f44e3a0ac242629218eed""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
5,e9d93fea98d945528e2b5c3337010785,4,FINISHED,s3://mlflow-bucket/mlflow/4/e9d93fea98d945528e...,2020-12-08 20:11:35.321000+00:00,2020-12-08 20:11:35.926000+00:00,0.947368,0.947976,0.947368,poly,0.2,mcanizo,"[{""run_id"": ""e9d93fea98d945528e2b5c3337010785""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
6,70a0a00e67c74d17b8e0a4a58572d103,4,FINISHED,s3://mlflow-bucket/mlflow/4/70a0a00e67c74d17b8...,2020-12-08 20:11:37.132000+00:00,2020-12-08 20:11:37.433000+00:00,0.973684,0.973364,0.973684,poly,1.0,mcanizo,"[{""run_id"": ""70a0a00e67c74d17b8e0a4a58572d103""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
7,ca3ada1abe9d46dd850eb30ffce94de2,4,FINISHED,s3://mlflow-bucket/mlflow/4/ca3ada1abe9d46dd85...,2020-12-08 20:11:36.808000+00:00,2020-12-08 20:11:37.108000+00:00,0.973684,0.973364,0.973684,poly,0.8,mcanizo,"[{""run_id"": ""ca3ada1abe9d46dd850eb30ffce94de2""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
8,3e5ca9dfbd534ce5abb79d0b8c31a836,4,FINISHED,s3://mlflow-bucket/mlflow/4/3e5ca9dfbd534ce5ab...,2020-12-08 20:11:36.521000+00:00,2020-12-08 20:11:36.764000+00:00,0.973684,0.973364,0.973684,poly,0.6,mcanizo,"[{""run_id"": ""3e5ca9dfbd534ce5abb79d0b8c31a836""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
9,531757563d794e63953082b255839f24,4,FINISHED,s3://mlflow-bucket/mlflow/4/531757563d794e6395...,2020-12-08 20:11:34.933000+00:00,2020-12-08 20:11:35.287000+00:00,0.973684,0.973364,0.973684,linear,1.0,mcanizo,"[{""run_id"": ""531757563d794e63953082b255839f24""...",LOCAL,/home/mcanizo/anaconda3/envs/mlflowEnv/lib/pyt...
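
The table above is sorted in ascending order, which is the default when `order_by` gives no direction, so the best runs appear last. A minimal sketch of asking for descending order and only the top run follows; the helper names `order_clause` and `top_run` are hypothetical, while `order_by` and `max_results` are parameters of `mlflow.search_runs`.

```python
def order_clause(column, descending=True):
    """Build an order_by entry such as 'metrics.accuracy DESC'."""
    direction = "DESC" if descending else "ASC"
    return f"{column} {direction}"


def top_run(experiment_id, metric="metrics.accuracy"):
    """Return a one-row DataFrame with the run that scores highest on `metric`."""
    # Imported lazily so order_clause can be used without a tracking server.
    import mlflow

    return mlflow.search_runs(
        [experiment_id],
        order_by=[order_clause(metric)],
        max_results=1,
    )


print(order_clause("metrics.accuracy"))  # metrics.accuracy DESC
```

When several runs share the top score, `max_results=1` returns only one of them, so it is worth checking for ties before relying on this shortcut.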


Select only the most relevant columns to get a cleaner view of the metadata.

In [13]:
runs_metadata = mlflow.search_runs([experiment_id], order_by=["metrics.accuracy DESC"])
runs_metadata[['run_id', 'artifact_uri', 'status', 'metrics.accuracy', 'metrics.recall', 'metrics.f1']]

Unnamed: 0,run_id,artifact_uri,status,metrics.accuracy,metrics.recall,metrics.f1
0,9c1c718524784d1882337358dd5b03a0,s3://mlflow-bucket/mlflow/4/9c1c718524784d1882...,FINISHED,1.0,1.0,1.0
1,9726da2c7fbd4466acef5fba09cfda95,s3://mlflow-bucket/mlflow/4/9726da2c7fbd4466ac...,FINISHED,1.0,1.0,1.0
2,70a0a00e67c74d17b8e0a4a58572d103,s3://mlflow-bucket/mlflow/4/70a0a00e67c74d17b8...,FINISHED,0.973684,0.973684,0.973364
3,ca3ada1abe9d46dd850eb30ffce94de2,s3://mlflow-bucket/mlflow/4/ca3ada1abe9d46dd85...,FINISHED,0.973684,0.973684,0.973364
4,3e5ca9dfbd534ce5abb79d0b8c31a836,s3://mlflow-bucket/mlflow/4/3e5ca9dfbd534ce5ab...,FINISHED,0.973684,0.973684,0.973364
5,531757563d794e63953082b255839f24,s3://mlflow-bucket/mlflow/4/531757563d794e6395...,FINISHED,0.973684,0.973684,0.973364
6,7426c3cca6014f9ebbfe8b4b86467ac7,s3://mlflow-bucket/mlflow/4/7426c3cca6014f9ebb...,FINISHED,0.973684,0.973684,0.973889
7,dba8573569ef44a8942c9105d3f5bf1a,s3://mlflow-bucket/mlflow/4/dba8573569ef44a894...,FINISHED,0.973684,0.973684,0.973889
8,58d04d8e510142aab25e115dd08f28cb,s3://mlflow-bucket/mlflow/4/58d04d8e510142aab2...,FINISHED,0.973684,0.973684,0.973889
9,37f94a7a414c4ac983c12a9a1aa1613c,s3://mlflow-bucket/mlflow/4/37f94a7a414c4ac983...,FINISHED,0.947368,0.947368,0.947976


Filter the runs so that only those matching a given condition are kept.

In [14]:
# Search the runs of the experiment with a filter_string that keeps
# only the runs whose accuracy is above 0.96
filter_string = "metrics.accuracy > 0.96"
runs_metadata = mlflow.search_runs([experiment_id], filter_string=filter_string)
runs_metadata[['run_id', 'artifact_uri', 'status', 'metrics.accuracy', 'metrics.recall', 'metrics.f1']]

Unnamed: 0,run_id,artifact_uri,status,metrics.accuracy,metrics.recall,metrics.f1
0,70a0a00e67c74d17b8e0a4a58572d103,s3://mlflow-bucket/mlflow/4/70a0a00e67c74d17b8...,FINISHED,0.973684,0.973684,0.973364
1,ca3ada1abe9d46dd850eb30ffce94de2,s3://mlflow-bucket/mlflow/4/ca3ada1abe9d46dd85...,FINISHED,0.973684,0.973684,0.973364
2,3e5ca9dfbd534ce5abb79d0b8c31a836,s3://mlflow-bucket/mlflow/4/3e5ca9dfbd534ce5ab...,FINISHED,0.973684,0.973684,0.973364
3,9c1c718524784d1882337358dd5b03a0,s3://mlflow-bucket/mlflow/4/9c1c718524784d1882...,FINISHED,1.0,1.0,1.0
4,531757563d794e63953082b255839f24,s3://mlflow-bucket/mlflow/4/531757563d794e6395...,FINISHED,0.973684,0.973684,0.973364
5,7426c3cca6014f9ebbfe8b4b86467ac7,s3://mlflow-bucket/mlflow/4/7426c3cca6014f9ebb...,FINISHED,0.973684,0.973684,0.973889
6,dba8573569ef44a8942c9105d3f5bf1a,s3://mlflow-bucket/mlflow/4/dba8573569ef44a894...,FINISHED,0.973684,0.973684,0.973889
7,9726da2c7fbd4466acef5fba09cfda95,s3://mlflow-bucket/mlflow/4/9726da2c7fbd4466ac...,FINISHED,1.0,1.0,1.0
8,58d04d8e510142aab25e115dd08f28cb,s3://mlflow-bucket/mlflow/4/58d04d8e510142aab2...,FINISHED,0.973684,0.973684,0.973889
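
Filter strings can combine several conditions with `and`, and parameter values are compared as quoted strings. The sketch below builds such a string from small helpers; the helper names are hypothetical, and the extra condition on `params.kernel` is only an illustration based on the columns shown earlier in this notebook.

```python
def metric_above(name, threshold):
    """Condition that keeps runs whose metric is strictly above `threshold`."""
    return f"metrics.{name} > {threshold}"


def param_equals(name, value):
    """Condition that keeps runs whose parameter equals `value` (quoted as a string)."""
    return f"params.{name} = '{value}'"


def all_of(*conditions):
    """Join conditions so that a run must satisfy every one of them."""
    return " and ".join(conditions)


filter_string = all_of(metric_above("accuracy", 0.96), param_equals("kernel", "poly"))
print(filter_string)  # metrics.accuracy > 0.96 and params.kernel = 'poly'
```

The resulting string can be passed to `mlflow.search_runs` through its `filter_string` argument, exactly as in the cell above.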


Get the artifact URI of the best model

In [16]:
# Sort the filtered runs by accuracy (highest first) and keep the artifact URI of the top run
best_artifact_uri = runs_metadata.sort_values(by='metrics.accuracy', ascending=False)['artifact_uri'].values[0]
best_artifact_uri

's3://mlflow-bucket/mlflow/4/9c1c718524784d1882337358dd5b03a0/artifacts'
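
Several runs in the filtered table tie on accuracy, so sorting on a single column picks one of the tied runs without any stated preference. A minimal sketch of an explicit tie-break follows, written in plain Python over rows such as those returned by `runs_metadata.to_dict("records")`; the helper name `pick_best` and the choice of F1 as the second criterion are assumptions made for illustration. In pandas, passing a list of columns to `sort_values` achieves the same ordering.

```python
def pick_best(records, keys=("metrics.accuracy", "metrics.f1")):
    """Return the record that ranks highest on `keys`, compared in the given order."""
    return max(records, key=lambda row: tuple(row[k] for k in keys))


# Three rows copied from the filtered output above (other columns omitted).
records = [
    {"run_id": "70a0a00e67c74d17b8e0a4a58572d103", "metrics.accuracy": 0.973684, "metrics.f1": 0.973364},
    {"run_id": "7426c3cca6014f9ebbfe8b4b86467ac7", "metrics.accuracy": 0.973684, "metrics.f1": 0.973889},
    {"run_id": "9c1c718524784d1882337358dd5b03a0", "metrics.accuracy": 1.0, "metrics.f1": 1.0},
]

best = pick_best(records)
print(best["run_id"])  # 9c1c718524784d1882337358dd5b03a0
```

The two runs with an accuracy of 1.0 also tie on recall and F1, so separating them would need a further criterion, such as the start time or a parameter value.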

Register the best model into the MLflow registry

In [17]:
mlflow.register_model(best_artifact_uri, 'IrisSVMModel')

Successfully registered model 'IrisSVMModel'.
2020/12/09 22:58:45 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: IrisSVMModel, version 1
Created version '1' of model 'IrisSVMModel'.


<ModelVersion: creation_timestamp=1607551125134, current_stage='None', description='', last_updated_timestamp=1607551125134, name='IrisSVMModel', run_id='', run_link='', source='s3://mlflow-bucket/mlflow/4/9c1c718524784d1882337358dd5b03a0/artifacts', status='READY', status_message='', tags={}, user_id='', version='1'>
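
In the output above, `run_id` and `run_link` are empty because the model was registered from a plain S3 location. `mlflow.register_model` also accepts URIs of the form `runs:/<run_id>/<artifact_path>`, which lets the registry record the run that produced the model. The sketch below shows that variant; the helper names are hypothetical, and the artifact path `model` is an assumption, since the path used when the model was logged is not shown in this notebook.

```python
def runs_model_uri(run_id, artifact_path="model"):
    """Build a runs:/ URI; artifact_path is assumed, check it in the MLflow UI."""
    return f"runs:/{run_id}/{artifact_path}"


def register_from_run(run_id, name="IrisSVMModel", artifact_path="model"):
    """Register the model logged by `run_id` under `name` and return the new version."""
    # Imported lazily so runs_model_uri can be used without a tracking server.
    import mlflow

    return mlflow.register_model(runs_model_uri(run_id, artifact_path), name)


print(runs_model_uri("9c1c718524784d1882337358dd5b03a0"))
# runs:/9c1c718524784d1882337358dd5b03a0/model
```

The URI should point at the directory that contains the `MLmodel` file, so it is worth confirming the artifact path in the run's artifact view before registering.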