In this notebook we use MLflow to keep track of our experiments and models. MLflow Tracking is organized around the concept of runs, which are executions of some piece of data science code.

In [1]:
from pathlib import Path
import pandas as pd
import tempfile
from joblib import dump

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

import mlflow
from azureml.core import Workspace, Experiment, Run, Model

In [2]:
# Read the data from the file
DATA_DIR = Path("data/")
df_train = pd.read_csv(DATA_DIR/'train.csv')
df_test = pd.read_csv(DATA_DIR/'test.csv')

In [3]:
# We define the hyperparameters we want to tune
param_grid = {
    "n_estimators": [10, 25, 100],
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 5, 10, None],
}
n_cross_vals = 5
print("Hyper-parameters:")
for param, value in param_grid.items():
    print(f"gridsearch-{param}", str(value))

Hyper-parameters:
gridsearch-n_estimators [10, 25, 100]
gridsearch-criterion ['gini', 'entropy']
gridsearch-max_depth [2, 5, 10, None]
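
To get a feel for the cost of this search, the grid can be expanded by hand. The sketch below uses only the standard library and repeats the `param_grid` and `n_cross_vals` values from the cell above; the names `candidates`, `n_candidates` and `n_cv_fits` are introduced here for illustration.

```python
from itertools import product

# Same values as in the cell above, repeated so the snippet is self-contained.
param_grid = {
    "n_estimators": [10, 25, 100],
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 5, 10, None],
}
n_cross_vals = 5

# Every combination of one value per hyperparameter is one candidate model.
candidates = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
n_candidates = len(candidates)

# Each candidate is fitted once per cross-validation fold.
n_cv_fits = n_candidates * n_cross_vals

print(f"{n_candidates} candidates, {n_cv_fits} cross-validation fits")
```

With the grid above this gives 3 x 2 x 4 = 24 candidates and 120 cross-validation fits. With its default `refit=True`, `GridSearchCV` then fits the best candidate once more on the full training set.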


<b>MLflow</b> stores the details of a run, such as logs, code versions, data, and model artifacts. The first thing we need to do is tell MLflow where to store them.

We can do this using an MLflow tracking URI ([docs](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri)). An MLflow tracking URI can point to a local directory, a database, or an MLflow server.
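
As a rough illustration of these three kinds of target, the sketch below labels a few example URIs by their scheme. `describe_tracking_uri` and `DATABASE_SCHEMES` are hypothetical names made up for this notebook and are not part of the MLflow API. The example URIs are placeholders, and the scheme list is an assumption that may not cover every store MLflow supports.

```python
from urllib.parse import urlparse

# Hypothetical helper for illustration only; it is not part of the MLflow API.
# Assumed set of database dialects; check the MLflow docs for the full list.
DATABASE_SCHEMES = {"sqlite", "postgresql", "mysql", "mssql"}


def describe_tracking_uri(uri: str) -> str:
    """Return a rough label for the kind of store a tracking URI points at."""
    scheme = urlparse(uri).scheme.lower()
    # SQLAlchemy-style URIs may carry a driver suffix, e.g. "postgresql+psycopg2".
    dialect = scheme.split("+")[0]
    if dialect in ("", "file"):
        return "local directory"
    if dialect in DATABASE_SCHEMES:
        return "database"
    if dialect in ("http", "https"):
        return "tracking server"
    # Other schemes (such as "azureml") are handled by plugins.
    return f"plugin-provided store ({dialect})"


# Placeholder URIs, not real endpoints.
for uri in ["./mlruns", "sqlite:///mlflow.db", "http://localhost:5000", "azureml://example"]:
    print(f"{uri!r:26} -> {describe_tracking_uri(uri)}")
```

Any of these strings could be passed to `mlflow.set_tracking_uri`. The `azureml://` form used later in this notebook is the one returned by the AzureML workspace.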

In [4]:
# Check whether a tracking URI has been set.
mlflow.is_tracking_uri_set()

True

The training runs in this notebook, but the logs are collected in the cloud.

In AzureML, you can get and set the tracking URI by running:

In [5]:
# Get the AzureML workspace.
workspace = Workspace.from_config()

In [6]:
azure_ws_mlflow_tracking_uri = workspace.get_mlflow_tracking_uri()
print(f"AzureML workspace mlflow uri: {azure_ws_mlflow_tracking_uri}")

AzureML workspace mlflow uri: azureml://westeurope.api.azureml.ms/mlflow/v1.0/subscriptions/8e155238-93f7-4377-9b62-6a2f4e51052e/resourceGroups/prashant-srivastava-sandbox/providers/Microsoft.MachineLearningServices/workspaces/azureml-mlflow?


In [7]:
mlflow.set_tracking_uri(azure_ws_mlflow_tracking_uri)
mlflow.get_tracking_uri()

'azureml://westeurope.api.azureml.ms/mlflow/v1.0/subscriptions/8e155238-93f7-4377-9b62-6a2f4e51052e/resourceGroups/prashant-srivastava-sandbox/providers/Microsoft.MachineLearningServices/workspaces/azureml-mlflow?'

We can now create MLflow runs to record our training experiments.
AzureML lets us create MLflow runs as jobs. These jobs are created under an experiment namespace.

In [9]:
# Create AzureML Experiment and use it for this run.
# If you do not create an Experiment, AzureML will create a Default experiment for runs.
experiment = Experiment(workspace, 'DevExp')
mlflow.set_experiment('DevExp')

<Experiment: artifact_location='', creation_time=1678278807682, experiment_id='61468a99-3ed0-4291-89c7-3227fca073ac', last_update_time=None, lifecycle_stage='active', name='DevExp', tags={}>

In [11]:
# Start mlflow run
with mlflow.start_run(
    run_name="MyTraining"
) as run:
    run_id = run.info.run_id
    print(f"Starting mlflow run with id {run_id}")
    for param, value in param_grid.items():
        mlflow.log_param(f"gridsearch/{param}", str(value))

    model = RandomForestClassifier()
    grid_search = GridSearchCV(model, param_grid, cv=n_cross_vals, n_jobs=-1)

    # We train the model
    print("Fitting the model...")
    grid_search.fit(df_train[["x1", "x2"]], df_train["y"])
    model = grid_search.best_estimator_
    
    # Here we evaluate the model
    predictions = model.predict(df_test[["x1", "x2"]])
    test_accuracy = accuracy_score(df_test["y"], predictions)
    print(f"Done! Test accuracy: {test_accuracy}")

    # We log the accuracy to azureml using mlflow
    # You can see the logged metrics in the azureml UI under the "Metrics" tab
    print("Logging to mlflow.")
    mlflow.log_metric("test_accuracy", test_accuracy)

    # We log the selected hyper-parameters to azureml using mlflow
    # You can find the best hyper-parameters in the azureml UI under parameters.
    for k, v in grid_search.best_params_.items():
        mlflow.log_param(f"selected-{k}", v)

    # Export the model and log it to azureml using mlflow
    with tempfile.TemporaryDirectory() as tmp_dir:
        dump(model, f"{tmp_dir}/model.joblib")
        mlflow.log_artifact(f"{tmp_dir}/model.joblib")

Starting mlflow run with id d66d0956-e52e-4e58-9c5c-1318db9a0c0d
Fitting the model...
Done! Test accuracy: 0.975
Logging to mlflow.
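
The run above logs the search grid one parameter at a time, converting each list of values to a string. A possible alternative, sketched below, builds the whole dictionary first. `grid_to_params` is a hypothetical helper introduced here; the `gridsearch/` prefix mirrors the one used in the run.

```python
def grid_to_params(param_grid: dict, prefix: str = "gridsearch") -> dict:
    """Turn a search grid into a flat {name: string} dict suitable for logging.

    Hypothetical helper for illustration; not part of MLflow or scikit-learn.
    """
    # Lists are stringified so that each grid is stored as a single parameter value.
    return {f"{prefix}/{name}": str(values) for name, values in param_grid.items()}


# Same grid as earlier in the notebook, repeated so the snippet is self-contained.
param_grid = {
    "n_estimators": [10, 25, 100],
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 5, 10, None],
}

grid_params = grid_to_params(param_grid)
for name, value in grid_params.items():
    print(name, value)
```

Inside an active run, the resulting dictionary could be passed to `mlflow.log_params(grid_params)` to log all entries in one call instead of looping over `mlflow.log_param`.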


In [12]:
run = Run.get(workspace=workspace, run_id=run_id)
run

Experiment,Id,Type,Status,Details Page,Docs Page
DevExp,d66d0956-e52e-4e58-9c5c-1318db9a0c0d,,Completed,Link to Azure Machine Learning studio,Link to Documentation


We can now register the model artifact logged by this run in the model registry.

In [14]:
artifact_path = "model.joblib"
model_uri = f"runs:/{run.id}/{artifact_path}"

mlflow.register_model(model_uri=model_uri, name='DevModel')

Successfully registered model 'DevModel'.
2023/03/08 12:34:23 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: DevModel, version 1
Created version '1' of model 'DevModel'.


<ModelVersion: creation_timestamp=1678278863844, current_stage='None', description='', last_updated_timestamp=1678278863844, name='DevModel', run_id='d66d0956-e52e-4e58-9c5c-1318db9a0c0d', run_link='', source='azureml://experiments/DevExp/runs/d66d0956-e52e-4e58-9c5c-1318db9a0c0d/artifacts/model.joblib', status='READY', status_message='', tags={}, user_id='', version='1'>
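
The `runs:/` URI used above has the shape `runs:/<run_id>/<artifact_path>`, where the artifact path is relative to the run's artifact root. The sketch below shows how such a URI is put together and taken apart. `build_runs_uri` and `split_runs_uri` are hypothetical helpers written for this notebook, not MLflow functions.

```python
RUNS_PREFIX = "runs:/"


def build_runs_uri(run_id: str, artifact_path: str) -> str:
    """Compose a runs:/ URI from a run id and a relative artifact path."""
    # A leading slash on the artifact path would produce a double slash.
    return f"{RUNS_PREFIX}{run_id}/{artifact_path.lstrip('/')}"


def split_runs_uri(uri: str) -> tuple:
    """Split a runs:/ URI back into (run_id, artifact_path)."""
    if not uri.startswith(RUNS_PREFIX):
        raise ValueError(f"not a runs:/ URI: {uri!r}")
    # Everything up to the first slash is the run id; the rest is the path.
    run_id, _, artifact_path = uri[len(RUNS_PREFIX):].partition("/")
    return run_id, artifact_path


# Run id taken from the output earlier in this notebook.
uri = build_runs_uri("d66d0956-e52e-4e58-9c5c-1318db9a0c0d", "model.joblib")
print(uri)
print(split_runs_uri(uri))
```

Because `mlflow.log_artifact` was called without an `artifact_path` argument, the file sits at the root of the run's artifacts, so the artifact path is just `model.joblib`.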

In [15]:
Model.list(workspace=workspace, name='DevModel')

[Model(workspace=Workspace.create(name='azureml-mlflow', subscription_id='8e155238-93f7-4377-9b62-6a2f4e51052e', resource_group='prashant-srivastava-sandbox'), name=DevModel, id=DevModel:1, version=1, tags={}, properties={'azureml.artifactPrefix': 'ExperimentRun/dcid.d66d0956-e52e-4e58-9c5c-1318db9a0c0d/model.joblib', 'mlflow.modelSourceUri': 'azureml://experiments/DevExp/runs/d66d0956-e52e-4e58-9c5c-1318db9a0c0d/artifacts/model.joblib'})]