Tutorial: Training models in Azure Databricks and deploying them on Azure ML

This notebook demonstrates how to train models in Azure Databricks (or any Databricks implementation) and deploy those models on Azure ML.

Training and tracking experiments in Azure Databricks with model registries in Azure ML: This example shows how to train and track models in Azure Databricks. Experiments are tracked in the MLflow instance running on Azure Databricks. Models, however, are registered in Azure ML, which allows quick deployment from a centralized model registry.

Mount the training data from Azure Blob Storage to DBFS: /dbfs/mnt/training-data/diabetes-training

In [None]:
# OAuth (service principal) settings; replace the <...> placeholders with your own values
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<client-id>",
           "fs.azure.account.oauth2.client.secret": "<client-secret>",
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

dbutils.fs.mount(
    source="abfss://container@storage.dfs.core.windows.net/",
    mount_point="/mnt/training-data",
    extra_configs=configs)
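
The configuration dictionary above could also be produced by a small helper, so that each credential is supplied in one place and empty values are caught early. The following is only a minimal sketch: `build_oauth_configs` is a hypothetical helper name that is not part of Databricks or any Azure SDK, and the values passed to it are placeholders. In a real workspace the secret would more safely come from a Databricks secret scope than from a literal string.

```python
def build_oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Return the extra_configs dict for an OAuth (service principal) mount.

    Hypothetical helper for illustration; the keys mirror the cell above.
    """
    supplied = {"client_id": client_id, "client_secret": client_secret, "tenant_id": tenant_id}
    missing = [name for name, value in supplied.items() if not value]
    if missing:
        raise ValueError(f"Missing values for: {', '.join(missing)}")
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        "fs.azure.createRemoteFileSystemDuringInitialization": "true",
    }


# Placeholder values, not real credentials
configs = build_oauth_configs("<client-id>", "<client-secret>", "<tenant-id>")
print(configs["fs.azure.account.oauth2.client.endpoint"])
```

The returned dictionary can be passed as `extra_configs` to `dbutils.fs.mount` exactly as in the cell above.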

Run the next cell to install the latest versions of the required libraries

In [None]:
%pip install azureml-mlflow
%pip install azure-ai-ml
%pip install mlflow

Configure the following variables

In [None]:
aml_region = ""
subscription_id = ""
aml_resource_group = ""
aml_workspace_name = ""
adb_user_id = ""

In [None]:
azureml_mlflow_uri = f"azureml://{aml_region}.api.azureml.ms/mlflow/v1.0/subscriptions/{subscription_id}/resourceGroups/{aml_resource_group}/providers/Microsoft.MachineLearningServices/workspaces/{aml_workspace_name}"
print(azureml_mlflow_uri)
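
Because the variables in the configuration cell default to empty strings, a URI built from them silently contains empty segments. A minimal sketch of a guard follows; `build_azureml_mlflow_uri` is a hypothetical helper name, and the region, subscription ID, resource group and workspace name used in the example call are made-up illustrative values.

```python
def build_azureml_mlflow_uri(region: str, subscription_id: str,
                             resource_group: str, workspace_name: str) -> str:
    """Build the Azure ML MLflow URI, failing fast if any part is empty."""
    parts = {
        "region": region,
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    missing = [name for name, value in parts.items() if not value]
    if missing:
        raise ValueError(f"Missing values for: {', '.join(missing)}")
    return (
        f"azureml://{region}.api.azureml.ms/mlflow/v1.0"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace_name}"
    )


# Made-up example values for illustration only
example_uri = build_azureml_mlflow_uri(
    "eastus", "00000000-0000-0000-0000-000000000000", "my-rg", "my-workspace"
)
print(example_uri)
```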

In some cases we may want to keep tracking experiments in the MLflow instance that comes with Azure Databricks. This is the case, for instance, for customers who were already using MLflow in Azure Databricks and want to keep their existing experiments there. However, they may still want to take advantage of the deployment capabilities of Azure ML, including managed inference solutions, no-code deployments, etc.

In [None]:
import warnings

warnings.simplefilter("ignore")

Configuring the model registry

MLflow allows us to separate the instance where experiments are tracked from the instance where models are registered. The first is referred to as the Tracking URI, while the second is referred to as the Registry URI. By default, both of them are set to the same value, and in Azure Databricks both are set to "databricks", meaning that tracking and model registration happen inside the MLflow instance that Databricks runs for us.

We are going to track the experiments in Azure Databricks, but model registries will be held in Azure ML. This will allow us to manage the model's lifecycle - including deployments - in Azure ML.

In [None]:
import mlflow
mlflow.set_registry_uri(azureml_mlflow_uri)
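
After this cell runs, the tracking URI and the registry URI should point at different services. A minimal sketch of a sanity check is shown below; `check_split_configuration` is a hypothetical helper, and the URIs in the example call are made-up illustrative values rather than output from a real workspace.

```python
def check_split_configuration(tracking_uri: str, registry_uri: str) -> None:
    """Raise if tracking and registry are not configured as separate services."""
    if tracking_uri == registry_uri:
        raise ValueError(
            "Tracking and registry URIs are identical; the registry was not redirected."
        )
    if not registry_uri.startswith("azureml://"):
        raise ValueError(f"Registry URI does not look like an Azure ML URI: {registry_uri}")


# Illustrative values only: "databricks" is the default tracking URI described above
check_split_configuration(
    "databricks",
    "azureml://eastus.api.azureml.ms/mlflow/v1.0/subscriptions/0000/resourceGroups/my-rg"
    "/providers/Microsoft.MachineLearningServices/workspaces/my-workspace",
)
print("Tracking and registry are configured separately")
```

In the notebook, the two arguments would be `mlflow.get_tracking_uri()` and `mlflow.get_registry_uri()`.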

Configuring the experiment

Tracking of experiments will happen in Azure Databricks, so the experiment name has to follow the naming convention used there: a workspace path under the user's folder.

In [None]:
mlflow.set_experiment(experiment_name=f"/Users/{adb_user_id}/diabetes-prediction-databricks")

Exploring the data

In [None]:
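import glob
import pandas as pd
# pd.read_csv does not expand wildcards, so list the CSV files under the mount point first
csv_files = sorted(glob.glob('/dbfs/mnt/training-data/diabetes-training/*.csv'))
df_diabetes = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)
df_diabetes.head()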

Training a diabetes prediction model (logistic regression classifier)

In [None]:
from sklearn.model_selection import train_test_split
# Separate features and labels
feature_columns = ['Pregnancies', 'PlasmaGlucose', 'DiastolicBloodPressure', 'TricepsThickness',
                   'SerumInsulin', 'BMI', 'DiabetesPedigree', 'Age']
X, y = df_diabetes[feature_columns].values, df_diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)


We are going to use autologging capabilities in MLflow to track parameters and metrics.

In [None]:
mlflow.autolog()  # enable autologging for supported libraries, including scikit-learn

Create a model & train it

In [None]:
from sklearn.linear_model import LogisticRegression
import numpy as np
from sklearn.metrics import roc_auc_score

# Set regularization hyperparameter
reg = 0.01

with mlflow.start_run() as run:
    model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)
    # calculate accuracy
    y_hat = model.predict(X_test)
    acc = np.average(y_hat == y_test)
    # calculate AUC
    y_scores = model.predict_proba(X_test)
    auc = roc_auc_score(y_test,y_scores[:,1])
    print("Accuracy: %.2f%%" % (acc * 100.0))
    print("AUC: %.2f%%" % (auc * 100.0))
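
To make the two reported metrics concrete, the sketch below recomputes them on a tiny hand-made example in plain Python. Accuracy is the fraction of predictions that match the labels. AUC is computed here from its pairwise-ranking interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting half. The labels and scores are made up for illustration and are unrelated to the diabetes data, and the 0.5 threshold is an assumption of this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pairwise_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # threshold at 0.5

print("Accuracy: %.2f%%" % (accuracy(y_true, y_pred) * 100.0))  # Accuracy: 75.00%
print("AUC: %.2f%%" % (pairwise_auc(y_true, y_score) * 100.0))  # AUC: 75.00%
```

Accuracy depends on the decision threshold, while AUC only depends on how the scores rank the examples, which is why the training cell uses predict for the former and predict_proba for the latter.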

Registering the model in Azure ML

So far, our model is trained and tracked inside the MLflow instance in Azure Databricks. Now we want to register this model in Azure ML to manage its lifecycle there. However, if we try to register the model as we usually do, using the syntax mlflow.register_model(model_uri=f"runs:/{run.info.run_id}/model"), we will get an error. The reason is related to where runs are being stored.

Right now, runs are stored in Azure Databricks and models in Azure ML. If you try to create a registered model from a run, Azure ML has no way to access that run, because it is stored in a different service. Because of that, you can't use a runs:/ URI to register models.

To overcome this limitation, we have to register the model from the artifacts themselves, which we can achieve by first downloading them.

In [None]:
client = mlflow.tracking.MlflowClient()
model_path = client.download_artifacts(run.info.run_id, path="model")

model_path is a local path to the artifacts of the MLflow model that was created. We can now use these artifacts to register the model:

In [None]:
mlflow.register_model(
    model_uri=f"file://{model_path}", name="databricks-diabetes-prediction"
)
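
The f-string above produces a valid file URI as long as model_path is an absolute path without special characters. As a minimal sketch, the same URI could be built with the standard library, which also percent-encodes characters such as spaces. `to_file_uri` is a hypothetical helper name, the example path is made up, and the sketch assumes a POSIX-style path such as the one a Linux driver node would return.

```python
from pathlib import Path


def to_file_uri(local_path: str) -> str:
    """Convert an absolute local path into a file:// URI."""
    path = Path(local_path)
    if not path.is_absolute():
        raise ValueError(f"Expected an absolute path, got: {local_path}")
    return path.as_uri()


# Made-up path for illustration
print(to_file_uri("/tmp/artifacts/model"))
```

The result can be passed as `model_uri` to `mlflow.register_model` in place of the f-string.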

Summary:

In this tutorial we learned how to use MLflow to integrate Azure ML (AML) with Azure Databricks (ADB). We trained and tracked a model in Databricks and used the Azure ML model registry to register it.
You'll find your model in the Azure ML workspace. You can check the job and model pages in the UI for more details.