# Hands-on example (Azure ML)

## Use MLflow with Azure Machine Learning to Train and Deploy An ML Model

This example shows how to use MLflow with Azure Machine Learning to track metrics and artifacts while training an ElasticNet regression model that predicts wine quality, and then deploy that model as a web service. You'll learn how to:

 1. Set the MLflow tracking URI to use Azure ML
 2. Create an experiment
 3. Instrument your model with MLflow tracking
 4. Train a regression model locally with MLflow
 5. View your experiment in your Azure ML workspace in the Azure portal
 6. Deploy the model as a web service on Azure Container Instances
 7. Call the model to make predictions
 
### Pre-requisites
 
If you are using a Notebook VM, you are all set. Otherwise, go through the [Configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) notebook to set up your Azure Machine Learning workspace and ensure other common prerequisites are met.

Install the `azureml-mlflow` package with ```pip install azureml-mlflow```. Note that `azureml-mlflow` installs the `mlflow` package as a dependency if it isn't already installed.

### Set-up

Import packages and check versions of Azure ML SDK and MLflow installed on your computer. Then connect to your Workspace.

In [69]:
import os
import warnings
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
import json

#Sklearn
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

#Azure
import azureml.core
import azureml.dataprep
from azureml.core import Workspace, Dataset, Experiment
from azureml.core.webservice import AciWebservice, Webservice
from azure.storage.blob import BlobServiceClient
from azureml.core.authentication import ServicePrincipalAuthentication

#MLFlow
import mlflow
import mlflow.sklearn
import mlflow.azureml
from mlflow.entities import ViewType

#Temporarily suppress all warnings for the demo
warnings.filterwarnings("ignore")

In [4]:
# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.15.0


### Update Working Directory

In [5]:
os.getcwd()

'/Users/antfra/Desktop/MLFlow In Azure/notebooks'

In [6]:
#get path to parent folder (root folder)
parent_folder=os.path.dirname(os.getcwd())
parent_folder

'/Users/antfra/Desktop/MLFlow In Azure'

In [7]:
#set system path to parent folder
sys.path.insert(0,parent_folder)

### Set Up Authentication

In [8]:
#get environment variables
from dotenv import load_dotenv
from settings import (AZURE_SUBSCRIPTION_ID,AZURE_RESOURCE_GROUP, 
                     AZURE_TENANT_ID)

In [37]:
subscription_id = AZURE_SUBSCRIPTION_ID
resource_group = AZURE_RESOURCE_GROUP
workspace_name = 'mlflow_tutorial' #previous existing azureml workspace for this tutorial
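
The `settings` module imported above is not shown in this notebook. As a minimal sketch, assuming the three identifiers are supplied as environment variables (for example, loaded from a `.env` file beforehand), it could be built around a helper like the hypothetical `load_azure_settings` below. The function name and the fail-fast behaviour are illustrative assumptions, not part of the original project.

```python
import os


def load_azure_settings(env=os.environ):
    """Return the Azure identifiers this notebook needs.

    Assumption: the values live in environment variables with the same
    names as the constants imported from `settings` above.
    """
    names = ("AZURE_SUBSCRIPTION_ID", "AZURE_RESOURCE_GROUP", "AZURE_TENANT_ID")
    # Fail early with a clear message instead of passing None to the SDK
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}
```

Failing early here keeps a missing subscription ID from surfacing later as a less obvious authentication error.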

## 0. The data

The data set used in this example is the [Wine Quality](http://archive.ics.uci.edu/ml/datasets/Wine+Quality) data set from the UCI Machine Learning Repository:

* P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

#### Get the workspace object from the Azure ML SDK and load the dataset

In [38]:
##Use this if inside AML Studio
# ws = Workspace.from_config()
ws = Workspace(subscription_id, resource_group, workspace_name)

dataset = Dataset.get_by_name(ws, name='wine-quality')
dataset.to_pandas_dataframe().head()

| Unnamed: 0 | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.001 | 3.0 | 0.45 | 8.8 | 6 |
| 1 | 6.3 | 0.3 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.994 | 3.3 | 0.49 | 9.5 | 6 |
| 2 | 8.1 | 0.28 | 0.4 | 6.9 | 0.05 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |


## 1. Set Up Tracking Server

### Set Tracking URI to Azure ML Workspace

#### Azure Tracking URI

In [12]:
ws.get_mlflow_tracking_uri()

'azureml://eastus.experiments.azureml.net/mlflow/v1.0/subscriptions/e1eb783d-78a5-42a4-bae3-bc0ddd433898/resourceGroups/robotdemo/providers/Microsoft.MachineLearningServices/workspaces/mlflow_tutorial?'
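
The tracking URI encodes the region host, subscription, resource group and workspace. As a quick sanity check that MLflow is pointed at the workspace you expect, the sketch below splits such a URI into those parts using only the standard library. The helper name `parse_azureml_tracking_uri` is hypothetical, and it assumes the path layout seen in the output above.

```python
from urllib.parse import urlparse


def parse_azureml_tracking_uri(uri):
    """Split an azureml:// tracking URI into its named path components."""
    parsed = urlparse(uri)
    parts = [p for p in parsed.path.split("/") if p]
    # Each of these keys is followed by its value in the path
    keys = ("subscriptions", "resourceGroups", "workspaces")
    found = {}
    for i, part in enumerate(parts[:-1]):
        if part in keys:
            found[part] = parts[i + 1]
    return {"scheme": parsed.scheme, "host": parsed.netloc, **found}


# Placeholder values, not a real subscription or workspace
example_uri = (
    "azureml://eastus.experiments.azureml.net/mlflow/v1.0/"
    "subscriptions/00000000-0000-0000-0000-000000000000/"
    "resourceGroups/my-rg/providers/Microsoft.MachineLearningServices/"
    "workspaces/my-ws?"
)
print(parse_azureml_tracking_uri(example_uri))
```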

#### Local Tracking URI

In [11]:
mlflow.tracking.get_tracking_uri()

'file:///Users/antfra/Desktop/MLFlow%20In%20Azure/notebooks/mlruns'

#### Point MLflow To Azure URI

In [13]:
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

## 2. Create Experiment

#### Create Experiment

In [48]:
exp_name = "mlflow_aml_spark"

In [49]:
exp_id=mlflow.create_experiment(exp_name)

In [50]:
mlflow.set_experiment(exp_name)

#### List experiments

In [51]:
mlflow.tracking.MlflowClient().list_experiments()

[<Experiment: artifact_location='', experiment_id='fe3ff3a4-0fa3-4075-bd4f-8352d3b501d9', lifecycle_stage='active', name='ElasticNet_wine_AML', tags={}>,
 <Experiment: artifact_location='', experiment_id='fc950002-c0b7-4ca0-b1dc-d1685a7ab854', lifecycle_stage='active', name='My Simple New Experiment AML', tags={}>,
 <Experiment: artifact_location='', experiment_id='7fa3ecfb-f601-46c0-a54b-90e9784e6785', lifecycle_stage='active', name='My Simple New Experiment AML - SPARK', tags={}>,
 <Experiment: artifact_location='', experiment_id='9dfeb90d-68ad-4e4c-8eb8-f996428ae994', lifecycle_stage='active', name='My New Experiment AML - SPARK', tags={}>,
 <Experiment: artifact_location='', experiment_id='a3f991b6-5c32-4f14-b448-6f695fd25c76', lifecycle_stage='active', name='mlflow_aml_spark', tags={}>]

## 3. Instrument Model

### What do we track?

- **Code Version**: Git commit hash used for the run (if it was run from an MLflow Project)
- **Start & End Time**: Start and end time of the run
- **Source**: Which code was run
- **Parameters**: Key-value input parameters.
- **Metrics**: Key-value metrics, where the value is numeric (can be updated over the run)
- **Artifacts**: Output files in any format.
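
For the **Metrics** entry, this notebook logs RMSE, MAE and R². The notebook itself computes them with scikit-learn; the dependency-free sketch below is only meant to show what those three numbers measure. The function name `regression_metrics` and the toy values are illustrative.

```python
import math


def regression_metrics(actual, predicted):
    """Compute RMSE, MAE and R^2 for two equal-length sequences."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)                  # root mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot                      # coefficient of determination
    return rmse, mae, r2


# Toy example: three wines with true and predicted quality scores
rmse, mae, r2 = regression_metrics([3, 5, 7], [2, 5, 9])
print(rmse, mae, r2)
```

Lower RMSE and MAE are better, while R² is better the closer it gets to 1.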

In [52]:
#Evaluation Metrics to Assess and compare the model
def eval_metrics(actual, pred):
    # compute relevant metrics
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2

#Load Data and partition
def load_data(dataset):
    data = dataset.to_pandas_dataframe()

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]
    return train_x, train_y, test_x, test_y


#Train the model and include tracking metrics
def train(alpha=0.5, l1_ratio=0.5,full_view=False):
    # train a model with given parameters
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Load the registered wine-quality dataset (the global `dataset`) and split it
    train_x, train_y, test_x, test_y = load_data(dataset)

    # Useful for multiple runs (only doing one run in this sample notebook)    
    with mlflow.start_run(experiment_id=exp_id) as run:
        # Execute ElasticNet
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        # Evaluate Metrics
        predicted_qualities = lr.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        # Print out metrics
        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        # Log parameter, metrics, and model to MLflow
        mlflow.log_param(key="alpha", value=alpha)
        mlflow.log_param(key="l1_ratio", value=l1_ratio)
        mlflow.log_metric(key="rmse", value=rmse)
        mlflow.log_metrics({"mae": mae, "r2": r2})
        
        #Log model
        mlflow.sklearn.log_model(lr, "model")
        
        #print artifact uri
        print("Save to: {}".format(mlflow.get_artifact_uri()))
        
        #print run_id
        print(f"RunID: {run.info.run_id}")
        
        #print experiment id
        print(f"Experiment ID: {run.info.experiment_id}")
        
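        if full_view:
            # Caution: the first positional parameter of mlflow.search_runs is
            # `experiment_ids`, so ViewType.ACTIVE_ONLY is treated here as an
            # experiment ID rather than a view type. To list this experiment's
            # runs, pass experiment_ids=[exp_id] and
            # run_view_type=ViewType.ACTIVE_ONLY as keyword arguments instead.
            print("Run IDs: \n{}".format(mlflow.search_runs(ViewType.ACTIVE_ONLY)))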

## 4. Train Model Locally

In [53]:
train(0.5, 0.5)

Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.82224284975954
  MAE: 0.6278761410160693
  R2: 0.12678721972772689




Save to: azureml://experiments/mlflow_aml_spark/runs/9df06b86-7311-41d2-922b-7953ea316417/artifacts
RunID: 9df06b86-7311-41d2-922b-7953ea316417
Experiment ID: a3f991b6-5c32-4f14-b448-6f695fd25c76


In [54]:
train(0.2, 0.2)

Elasticnet model (alpha=0.200000, l1_ratio=0.200000):
  RMSE: 0.7859129997062341
  MAE: 0.6155290394093893
  R2: 0.20224631822892103




Save to: azureml://experiments/mlflow_aml_spark/runs/dce56ea7-ef4f-49fe-acad-c4ecc1f7eb15/artifacts
RunID: dce56ea7-ef4f-49fe-acad-c4ecc1f7eb15
Experiment ID: a3f991b6-5c32-4f14-b448-6f695fd25c76


In [55]:
train(0.1, 0.1,full_view=True)

Elasticnet model (alpha=0.100000, l1_ratio=0.100000):
  RMSE: 0.7792546522251949
  MAE: 0.6112547988118587
  R2: 0.2157063843066196




Save to: azureml://experiments/mlflow_aml_spark/runs/5a96ee4f-cef7-4503-8e09-4a6f2adf4de9/artifacts
RunID: 5a96ee4f-cef7-4503-8e09-4a6f2adf4de9
Experiment ID: a3f991b6-5c32-4f14-b448-6f695fd25c76
Run IDs: 
Empty DataFrame
Columns: [run_id, experiment_id, status, artifact_uri, start_time, end_time]
Index: []
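
The three calls above try hand-picked parameter pairs. For a larger sweep, a small helper can call the training function for every combination. This is a sketch: the name `sweep` is hypothetical, and it only assumes a function that accepts `alpha` and `l1_ratio` positionally, as `train` does.

```python
from itertools import product


def sweep(train_fn, alphas, l1_ratios):
    """Call train_fn(alpha, l1_ratio) for every combination; return the pairs tried."""
    tried = []
    for alpha, l1_ratio in product(alphas, l1_ratios):
        train_fn(alpha, l1_ratio)
        tried.append((alpha, l1_ratio))
    return tried


# Usage with this notebook's train() function:
# sweep(train, alphas=[0.1, 0.2, 0.5], l1_ratios=[0.1, 0.5])
```

Each call starts its own MLflow run, so every combination shows up as a separate run in the experiment.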


## 5. Comparing Runs in UI (Azure Portal)

Open the Azure portal to view the experiments and their run metrics.

In [78]:
#Get a handle to the experiment in the Azure ML workspace
exp = Experiment(ws, exp_name)
#the handle can be reused for future remote runs

In [60]:
print(exp_name)
print(ws.experiments)

mlflow_aml_spark
{'ElasticNet_wine_AML': Experiment(Name: ElasticNet_wine_AML,
Workspace: mlflow_tutorial), 'mlflow_aml_spark': Experiment(Name: mlflow_aml_spark,
Workspace: mlflow_tutorial), 'test_exp_name': Experiment(Name: test_exp_name,
Workspace: mlflow_tutorial)}


In [58]:
ws.experiments['mlflow_aml_spark']

| Name | Workspace | Report Page | Docs Page |
|---|---|---|---|
| mlflow_aml_spark | mlflow_tutorial | Link to Azure Machine Learning studio | Link to Documentation |


### 5.1 Comparing Runs Programmatically

### Look at MLflow Directory Structure

In [61]:
from mlflow.tracking import MlflowClient

list_of_experiments = MlflowClient().list_experiments()
list_of_experiments

[<Experiment: artifact_location='', experiment_id='fe3ff3a4-0fa3-4075-bd4f-8352d3b501d9', lifecycle_stage='active', name='ElasticNet_wine_AML', tags={}>,
 <Experiment: artifact_location='', experiment_id='fc950002-c0b7-4ca0-b1dc-d1685a7ab854', lifecycle_stage='active', name='My Simple New Experiment AML', tags={}>,
 <Experiment: artifact_location='', experiment_id='7fa3ecfb-f601-46c0-a54b-90e9784e6785', lifecycle_stage='active', name='My Simple New Experiment AML - SPARK', tags={}>,
 <Experiment: artifact_location='', experiment_id='9dfeb90d-68ad-4e4c-8eb8-f996428ae994', lifecycle_stage='active', name='My New Experiment AML - SPARK', tags={}>,
 <Experiment: artifact_location='', experiment_id='a3f991b6-5c32-4f14-b448-6f695fd25c76', lifecycle_stage='active', name='mlflow_aml_spark', tags={}>,
 <Experiment: artifact_location='', experiment_id='94a773f8-0f5b-4414-96e9-2fdf050c02c8', lifecycle_stage='active', name='test_exp_name', tags={}>]

In [62]:
#Set up above
exp_id

'a3f991b6-5c32-4f14-b448-6f695fd25c76'

In [63]:
MlflowClient().get_experiment(exp_id)

<Experiment: artifact_location='', experiment_id='a3f991b6-5c32-4f14-b448-6f695fd25c76', lifecycle_stage='active', name='mlflow_aml_spark', tags={}>

#### List Run Info for a given experiment

In [64]:
MlflowClient().list_run_infos(experiment_id=exp_id)

[<RunInfo: artifact_uri='', end_time=1612362968589, experiment_id='a3f991b6-5c32-4f14-b448-6f695fd25c76', lifecycle_stage='active', run_id='dce56ea7-ef4f-49fe-acad-c4ecc1f7eb15', run_uuid='dce56ea7-ef4f-49fe-acad-c4ecc1f7eb15', start_time=1612362965942, status='FINISHED', user_id='c8427b67-4103-4ced-b0c1-2200b7c3dcfa'>,
 <RunInfo: artifact_uri='', end_time=1612362971933, experiment_id='a3f991b6-5c32-4f14-b448-6f695fd25c76', lifecycle_stage='active', run_id='5a96ee4f-cef7-4503-8e09-4a6f2adf4de9', run_uuid='5a96ee4f-cef7-4503-8e09-4a6f2adf4de9', start_time=1612362969331, status='FINISHED', user_id='c8427b67-4103-4ced-b0c1-2200b7c3dcfa'>,
 <RunInfo: artifact_uri='', end_time=1612362965186, experiment_id='a3f991b6-5c32-4f14-b448-6f695fd25c76', lifecycle_stage='active', run_id='9df06b86-7311-41d2-922b-7953ea316417', run_uuid='9df06b86-7311-41d2-922b-7953ea316417', start_time=1612362962242, status='FINISHED', user_id='c8427b67-4103-4ced-b0c1-2200b7c3dcfa'>]

The run info above lists the same runs that appear in the UI. Take the ID of the first run in the list:

In [65]:
runID=MlflowClient().list_run_infos(experiment_id=exp_id)[0].run_id
runID

'dce56ea7-ef4f-49fe-acad-c4ecc1f7eb15'

Return the evaluation metrics for the run.

In [73]:
MlflowClient().get_run(runID).data.metrics

{'rmse': 0.7859129997062341,
 'mae': 0.6155290394093893,
 'r2': 0.20224631822892103}
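
To compare runs programmatically rather than one at a time, you can collect each run's metrics into a dictionary (for example from `MlflowClient().get_run(run_id).data.metrics`) and pick the best one. The sketch below uses the metric values printed by the three training runs in section 4; the helper name `best_run` is hypothetical.

```python
def best_run(metrics_by_run, metric="rmse", lower_is_better=True):
    """Return the run ID with the best value for `metric`."""
    pick = min if lower_is_better else max
    return pick(metrics_by_run, key=lambda run_id: metrics_by_run[run_id][metric])


# Metric values copied from the training output shown earlier
metrics_by_run = {
    "9df06b86-7311-41d2-922b-7953ea316417": {"rmse": 0.82224284975954, "r2": 0.12678721972772689},
    "dce56ea7-ef4f-49fe-acad-c4ecc1f7eb15": {"rmse": 0.7859129997062341, "r2": 0.20224631822892103},
    "5a96ee4f-cef7-4503-8e09-4a6f2adf4de9": {"rmse": 0.7792546522251949, "r2": 0.2157063843066196},
}

print(best_run(metrics_by_run))                               # lowest RMSE
print(best_run(metrics_by_run, "r2", lower_is_better=False))  # highest R2
```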

### 5.2 Tagging runs

In [75]:
# get the run
_run = MlflowClient().get_run(run_id=runID)
print(_run)

<Run: data=<RunData: metrics={'mae': 0.6155290394093893,
 'r2': 0.20224631822892103,
 'rmse': 0.7859129997062341}, params={'alpha': '0.2', 'l1_ratio': '0.2'}, tags={'mlflow.source.git.commit': '70635dade88cc31365a8a0b2275b639981cbab63',
 'mlflow.source.name': '/Users/antfra/.pyenv/versions/3.8.1/envs/mlflow_tutorial/lib/python3.8/site-packages/ipykernel_launcher.py',
 'mlflow.source.type': 'LOCAL',
 'mlflow.user': 'antfra'}>, info=<RunInfo: artifact_uri='azureml://experiments/mlflow_aml_spark/runs/dce56ea7-ef4f-49fe-acad-c4ecc1f7eb15/artifacts', end_time=1612362968589, experiment_id='a3f991b6-5c32-4f14-b448-6f695fd25c76', lifecycle_stage='active', run_id='dce56ea7-ef4f-49fe-acad-c4ecc1f7eb15', run_uuid='dce56ea7-ef4f-49fe-acad-c4ecc1f7eb15', start_time=1612362965942, status='FINISHED', user_id='c8427b67-4103-4ced-b0c1-2200b7c3dcfa'>>


In [76]:
from datetime import datetime
# add a tag to the run
dt = datetime.now().strftime("%d-%m-%Y (%H:%M:%S.%f)")
MlflowClient().set_tag(_run.info.run_id, "deployed", dt)
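
MLflow stores tag values as strings, so the timestamp has to be parsed again if you want to sort or filter on it later. The sketch below shows the round trip with the same format string as the cell above. The helper names and the example timestamp are illustrative.

```python
from datetime import datetime

# Same format string as in the tagging cell above
TAG_FORMAT = "%d-%m-%Y (%H:%M:%S.%f)"


def format_deploy_tag(moment):
    """Turn a datetime into the string stored in the `deployed` tag."""
    return moment.strftime(TAG_FORMAT)


def parse_deploy_tag(value):
    """Turn the stored tag string back into a datetime."""
    return datetime.strptime(value, TAG_FORMAT)


stamp = datetime(2021, 2, 3, 10, 44, 36, 123456)  # example value only
tag_value = format_deploy_tag(stamp)
print(tag_value)
print(parse_deploy_tag(tag_value) == stamp)
```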

## 6. Deploy Model

In [83]:
_run.info.artifact_uri

'azureml://experiments/mlflow_aml_spark/runs/dce56ea7-ef4f-49fe-acad-c4ecc1f7eb15/artifacts'

In [85]:
model_path = "model"

model_uri=f'{_run.info.artifact_uri}/{model_path}'
print(model_uri)

azureml://experiments/mlflow_aml_spark/runs/dce56ea7-ef4f-49fe-acad-c4ecc1f7eb15/artifacts/model


In [86]:
# Create a deployment config
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

In [87]:
# Register and deploy model to Azure Container Instance (ACI)
(webservice, model) = mlflow.azureml.deploy(model_uri=model_uri,
                                            workspace=ws,
                                            model_name='winemodel',
                                            service_name='winemodelservice',
                                            deployment_config=aci_config)

Registering model winemodel


2021/02/03 10:44:36 INFO mlflow.azureml: Registered an Azure Model with name: `winemodel` and version: `1`
2021/02/03 10:44:46 INFO mlflow.azureml: Deploying an Azure Webservice with name: `winemodelservice`


Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running........................................................................................................
Succeeded
ACI service creation operation finished, operation "Succeeded"


## 7. Test Endpoint

In [88]:
# After the model deployment completes, requests can be posted via HTTP to the new ACI
# webservice's scoring URI. The following example posts a sample input from the wine dataset
# used in the MLflow ElasticNet example:
# https://github.com/mlflow/mlflow/tree/master/examples/sklearn_elasticnet_wine
print("Scoring URI is: %s" % webservice.scoring_uri)

import requests
import json

# `sample_input` is a JSON-serialized pandas DataFrame with the `split` orientation
sample_input = {
    "columns": [
        "alcohol",
        "chlorides",
        "citric acid",
        "density",
        "fixed acidity",
        "free sulfur dioxide",
        "pH",
        "residual sugar",
        "sulphates",
        "total sulfur dioxide",
        "volatile acidity"
    ],
    "data": [
        [8.8, 0.045, 0.36, 1.001, 7, 45, 3, 20.7, 0.45, 170, 0.27]
    ]
}
# Post the payload to the scoring endpoint and decode the JSON response
response = requests.post(
    url=webservice.scoring_uri,
    data=json.dumps(sample_input),
    headers={"Content-type": "application/json"},
)
response_json = json.loads(response.text)
print(response_json)

Scoring URI is: http://6634922d-8f09-4b2e-a98b-98ea019a8cd5.eastus.azurecontainer.io/score
[3.5662844909341738]
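
Writing the `split`-orientation payload by hand gets error-prone with more than one row. The sketch below builds the same structure from a list of row dictionaries using only the standard library. The helper name `to_split_payload` is hypothetical, and the two-column row is shortened for illustration.

```python
import json


def to_split_payload(rows):
    """Build a pandas 'split'-orientation payload from a list of row dicts."""
    if not rows:
        raise ValueError("at least one row is required")
    columns = list(rows[0])
    for row in rows:
        # Every row must provide exactly the same columns
        if set(row) != set(columns):
            raise ValueError("all rows must have the same columns")
    data = [[row[column] for column in columns] for row in rows]
    return {"columns": columns, "data": data}


payload = to_split_payload([{"alcohol": 8.8, "pH": 3}])
print(json.dumps(payload))
```

The resulting dictionary can be passed to `json.dumps` and posted to the scoring URI in the same way as `sample_input` above.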


#### Delete Webservice

In [89]:
webservice.delete()