Tutorial-1: Training on Azure Databricks while tracking experiments and models in Azure ML

This tutorial shows how to train models in Azure Databricks while tracking all experiments in Azure ML (instead of in the MLflow instance running on Azure Databricks). This also lets you deploy models seamlessly to Azure ML deployment targets.

Mount the training data from Azure Blob Storage to DBFS. The cell below mounts the container at /mnt/training-data, which local file APIs such as pandas see as /dbfs/mnt/training-data.

In [None]:
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<client-id>",
    "fs.azure.account.oauth2.client.secret": "<client-secret>",
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    "fs.azure.createRemoteFileSystemDuringInitialization": "true",
}

dbutils.fs.mount(
    source="abfss://container@storage.dfs.core.windows.net/",
    mount_point="/mnt/training-data",
    extra_configs=configs,
)
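
Hard-coding the client secret in a notebook makes it easy to leak. As a minimal sketch, the same configuration can be built by a small helper so that the secret is passed in from elsewhere. On Databricks it would typically be read from a secret scope with dbutils.secrets.get. The helper name build_oauth_configs and the scope and key names in the comment are hypothetical and not part of the original notebook.

```python
def build_oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Return the extra_configs dict for an OAuth (service principal) ABFS mount."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        "fs.azure.createRemoteFileSystemDuringInitialization": "true",
    }


# On Databricks, the secret would be read from a secret scope, for example
# (scope and key names here are hypothetical):
#   client_secret = dbutils.secrets.get(scope="my-scope", key="sp-client-secret")
configs = build_oauth_configs("<client-id>", "<client-secret>", "<tenant-id>")
```

The resulting dict can be passed as extra_configs to dbutils.fs.mount exactly as in the cell above.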

Run the next cell to install the latest versions of the required libraries.

In [None]:
%pip install azureml-mlflow
%pip install azure-ai-ml
%pip install mlflow

Set the following variables to match your Azure ML workspace.

In [None]:
aml_region = ""
subscription_id = ""
aml_resource_group = ""
aml_workspace_name = ""

In [None]:
azureml_mlflow_uri=f"azureml://{aml_region}.api.azureml.ms/mlflow/v1.0/subscriptions/{subscription_id}/resourceGroups/{aml_resource_group}/providers/Microsoft.MachineLearningServices/workspaces/{aml_workspace_name}"
print(azureml_mlflow_uri)
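
If any of the four variables is left empty, the URI above is still produced but is malformed, and the failure only surfaces later when MLflow tries to connect. One possible safeguard, sketched here, is to build the URI in a function that rejects empty values. The function name build_azureml_mlflow_uri and the example values are hypothetical.

```python
def build_azureml_mlflow_uri(region, subscription_id, resource_group, workspace_name):
    """Build the Azure ML MLflow tracking URI, failing early on empty inputs."""
    values = {
        "aml_region": region,
        "subscription_id": subscription_id,
        "aml_resource_group": resource_group,
        "aml_workspace_name": workspace_name,
    }
    missing = [name for name, value in values.items() if not value.strip()]
    if missing:
        raise ValueError(f"Missing configuration values: {', '.join(missing)}")
    return (
        f"azureml://{region}.api.azureml.ms/mlflow/v1.0"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace_name}"
    )


# Made-up example values, for illustration only
example_uri = build_azureml_mlflow_uri(
    "westeurope", "00000000-0000-0000-0000-000000000000", "my-rg", "my-workspace"
)
print(example_uri)
```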

Training on Azure Databricks while tracking experiments and models in Azure ML

In [None]:
import warnings

warnings.simplefilter("ignore")

Experiments are tracked in Azure ML, so we point MLflow at the Azure ML workspace by setting the tracking URI built above. The rest of the MLflow API is used as usual.

In [None]:
import mlflow
mlflow.set_tracking_uri(azureml_mlflow_uri)

Configuring the experiment

In [None]:
mlflow.set_experiment(experiment_name="diabetes-prediction")

In [None]:
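import glob
import pandas as pd
# pandas.read_csv does not expand wildcards, so collect the CSV files explicitly
csv_files = sorted(glob.glob('/dbfs/mnt/training-data/*.csv'))
df_diabetes = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)
df_diabetes.head()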
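
The training step that follows selects eight feature columns and the Diabetic label by name, so a missing column would raise a KeyError at that point. A small check like the sketch below can report missing columns up front. The helper name missing_columns and the sample header are hypothetical, and the sample deliberately omits Age to show the result.

```python
import csv
import io

# Columns used by the training step in this tutorial
REQUIRED_COLUMNS = [
    "Pregnancies", "PlasmaGlucose", "DiastolicBloodPressure", "TricepsThickness",
    "SerumInsulin", "BMI", "DiabetesPedigree", "Age", "Diabetic",
]


def missing_columns(header, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from header, in order."""
    present = set(header)
    return [name for name in required if name not in present]


# Made-up sample header without the Age column, for illustration only
sample_csv = io.StringIO(
    "Pregnancies,PlasmaGlucose,DiastolicBloodPressure,TricepsThickness,"
    "SerumInsulin,BMI,DiabetesPedigree,Diabetic\n"
)
header = next(csv.reader(sample_csv))
missing = missing_columns(header)
print(missing)
```

In the notebook itself, the same check could be applied to the loaded data with missing_columns(df_diabetes.columns).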

Training a diabetes prediction model (logistic regression classifier)

In [None]:
from sklearn.model_selection import train_test_split
# Separate features and labels
X, y = df_diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, df_diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)


We are going to use autologging capabilities in MLflow to track parameters and metrics.

In [None]:
mlflow.autolog()  # enable autologging for supported libraries, including scikit-learn

Create a model and train it

In [None]:
from sklearn.linear_model import LogisticRegression
import numpy as np
from sklearn.metrics import roc_auc_score

# Set regularization hyperparameter
reg = 0.01

with mlflow.start_run() as run:
    model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)
    # calculate accuracy
    y_hat = model.predict(X_test)
    acc = np.average(y_hat == y_test)
    # calculate AUC
    y_scores = model.predict_proba(X_test)
    auc = roc_auc_score(y_test,y_scores[:,1])
    print("Accuracy: %.2f%%" % (acc * 100.0))
    print("AUC: %.2f%%" % (auc * 100.0))
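
To make the two metrics above concrete, here is a plain-Python sketch of what they measure. Accuracy is the fraction of predictions that match the labels. AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The function names and the four-row toy data are hypothetical. The pairwise loop is for illustration only and is not how scikit-learn computes the value internally.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
# Threshold the scores at 0.5 to obtain class predictions
y_pred = [1 if s >= 0.5 else 0 for s in scores]

toy_acc = accuracy(y_true, y_pred)
toy_auc = pairwise_auc(y_true, scores)
print("Accuracy: %.2f" % toy_acc)
print("AUC: %.2f" % toy_auc)
```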

Registering the model in Azure ML
Since our experiments are being tracked in Azure ML, we can simply register models in the registry like this:

In [None]:
mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model", name="diabetes-prediction"
)

Summary:

In this tutorial we learned how to use MLflow to integrate Azure ML and Azure Databricks: we trained a model in Databricks and tracked it from Azure ML.
You'll find your model in your Azure ML workspace. Check the Jobs and Models pages in the Azure ML UI for more details.