## Use MLflow with Azure Machine Learning for Remote Training Run

This example shows how to use the MLflow tracking APIs with Azure Machine Learning to store the metrics and artifacts of a training run that you submit from a local notebook to a remote cluster. You'll learn how to:

 1. Set up the MLflow tracking URI to use Azure ML
 2. Create an experiment
 3. Train a model on Machine Learning Compute while logging metrics and artifacts
 4. View your experiment in your Azure ML Workspace in the Azure portal

<img src="https://github.com/retkowsky/images/blob/master/AzureMLservicebanniere.png?raw=true">

<img src="https://mlflow.org/docs/0.2.1/_static/MLflow-logo-final-black.png" width="400">

## Prerequisites

Make sure you have completed the [Configuration](../../../configuration.ipynb) notebook to set up your Azure Machine Learning workspace and to ensure that the other common prerequisites are met.

In [1]:
#pip install azureml-mlflow
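
Before continuing, it can help to confirm that the required packages are importable from the notebook kernel. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the module names `azureml.core`, `mlflow`, and `azureml.mlflow` are assumed import names for the installed packages.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in module_names:
        try:
            # find_spec imports parent packages, so a missing parent raises ImportError
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


# Assumed import names; adjust them if your installation differs
required = ["azureml.core", "mlflow", "azureml.mlflow"]
print("Missing modules:", missing_packages(required))
```

An empty list suggests that the kernel can import everything this notebook relies on; any listed name points at a package that probably still needs to be installed.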

## Set-up

Check the version of the Azure ML SDK installed on your computer, and then connect to your Workspace.

In [1]:
import time
datedujour = time.strftime("%Y-%m-%d")
print(datedujour)

2020-01-08


In [2]:
# Check core SDK version number
import azureml.core
from azureml.core import Workspace, Experiment

print("SDK version:", azureml.core.VERSION)

ws = Workspace.from_config()

SDK version: 1.0.72


Let's also create a Machine Learning Compute cluster for submitting the remote run. 

In [3]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster"

# Reuse the cluster if it already exists; otherwise create it
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print("Found existing cpu-cluster")
except ComputeTargetException:
    print("Creating new cpu-cluster")
    
    # Specify the configuration for the new cluster
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                           min_nodes=0,
                                                           max_nodes=1)

    # Create the cluster with the specified name and configuration
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
    
    # Wait for provisioning to complete and show the output log
    cpu_cluster.wait_for_completion(show_output=True)

Found existing cpu-cluster


### Create Azure ML Experiment

The following steps show how to submit a training Python script to a cluster as an Azure ML run, while logging happens through MLflow APIs to your Azure ML Workspace. Let's first create an experiment to hold the training runs.

In [4]:
from azureml.core import Experiment

experiment_name = "MLFlow"
exp = Experiment(workspace=ws, name=experiment_name)

### Instrument remote training script using MLflow

As the example, let's use [*train_diabetes.py*](train_diabetes.py) to train a regression model on the diabetes dataset. The training script calls mlflow.start_run() to start logging, then logs metrics, saves the trained scikit-learn model, and saves a plot as an artifact.

Run the following command to view the script file. Notice the MLflow logging statements, and notice that the script has no explicit dependency on the azureml library.

In [5]:
training_script = 'train_diabetes.py'
with open(training_script, 'r') as f:
    print(f.read())

# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import mlflow
import mlflow.sklearn

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

with mlflow.start_run():
    X, y = load_diabetes(return_X_y=True)
    columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    data = {
        "train": {"X": X_train, "y": y_train},
        "test": {"X": X_test, "y": y_test}}

    mlflow.log_metric("Training samples", len(data['train']['X']))
    mlflow.log_metric("Test samples", len(data['test']['X']))

    # Log the algorithm parameter alpha to the run
    mlflow.log_metric('alpha', 0.03)
    # Create, fit, and test the scikit-lea
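
One way to check the claim that the script does not depend on the azureml library is to list the modules it imports. The sketch below parses the import lines shown in the output above with the standard `ast` module. The helper name `top_level_imports` is hypothetical, and only the import section of the script is reproduced here.

```python
import ast


def top_level_imports(source):
    """Collect the top-level package names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


# Import section copied from the train_diabetes.py listing above
script_imports = """\
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import mlflow
import mlflow.sklearn

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
"""

modules = top_level_imports(script_imports)
print(sorted(modules))
print("azureml imported:", "azureml" in modules)
```

To check the whole file rather than this excerpt, you could pass the result of reading `train_diabetes.py` to the same helper.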

### Submit Run to Cluster 

Let's submit the run to the cluster. When the script runs on the remote cluster as a submitted run, Azure ML sets the MLflow tracking URI to point to your Azure ML Workspace, so that the metrics and artifacts are automatically logged there.
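
MLflow reads its tracking location from the `MLFLOW_TRACKING_URI` environment variable when no URI is set in code. The sketch below is one way to inspect that variable from inside a script, assuming the remote run exposes the workspace URI through it. The helper name `describe_tracking_target` and the `azureml://example` value are hypothetical and serve only as an illustration.

```python
import os


def describe_tracking_target(environ=None):
    """Summarize where MLflow would log, based on MLFLOW_TRACKING_URI."""
    environ = os.environ if environ is None else environ
    uri = environ.get("MLFLOW_TRACKING_URI")
    if not uri:
        return "MLFLOW_TRACKING_URI is not set; MLflow falls back to its default local store"
    # Use the URI scheme (for example 'azureml' or 'http') as a short label
    scheme = uri.split("://", 1)[0] if "://" in uri else "file path"
    return f"MLflow logs to a '{scheme}' tracking URI"


# Hypothetical value for illustration; a real run would have its own URI
print(describe_tracking_target({"MLFLOW_TRACKING_URI": "azureml://example"}))
print(describe_tracking_target({}))
```

Calling `describe_tracking_target()` without arguments inspects the environment of the current process, which can be a quick sanity check at the top of a training script.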

Note that you have to specify the packages your script depends on, including *azureml-mlflow*, which implicitly enables MLflow logging to Azure ML.

First, create an environment with Docker enabled and the required package dependencies specified.

In [6]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

env = Environment(name="mlflow-env")

env.docker.enabled = True

# Specify the conda and pip dependencies, including the azureml-mlflow extension
cd = CondaDependencies.create(
    conda_packages=["scikit-learn", "matplotlib"],
    pip_packages=["azureml-mlflow", "numpy"]
    )

env.python.conda_dependencies = cd

Next, specify a script run configuration that includes the training script, environment and CPU cluster created earlier.

In [7]:
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory=".", script=training_script)
src.run_config.environment = env
src.run_config.target = cpu_cluster.name

Finally, submit the run. The first run typically takes several minutes longer because the Docker-based environment has to be built. Subsequent runs reuse the image and are faster.

In [8]:
%%time
run = exp.submit(src)
run.wait_for_completion(show_output=True)

RunId: MLFlow_1578489177_74b589f6
Web View: https://ml.azure.com/experiments/MLFlow/runs/MLFlow_1578489177_74b589f6?wsid=/subscriptions/70b8f39e-8863-49f7-b6ba-34a80799550c/resourcegroups/azuremlserviceresourcegroup/workspaces/azuremlservice

Streaming azureml-logs/20_image_build_log.txt

2020/01/08 13:13:08 Downloading source code...
2020/01/08 13:13:10 Finished downloading source code
2020/01/08 13:13:11 Creating Docker network: acb_default_network, driver: 'bridge'
2020/01/08 13:13:11 Successfully set up Docker network: acb_default_network
2020/01/08 13:13:11 Setting up Docker configuration...
2020/01/08 13:13:12 Successfully set up Docker configuration
2020/01/08 13:13:12 Logging in to registry: azuremlservice8791701193.azurecr.io
2020/01/08 13:13:13 Successfully logged into azuremlservice8791701193.azurecr.io
2020/01/08 13:13:13 Executing step ID: acb_step_0. Timeout(sec): 5400, Working directory: '', Network: 'acb_default_network'
2020/01/08 13:13:13 Scanning for dependencies...


{'runId': 'MLFlow_1578489177_74b589f6',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2020-01-08T13:23:28.571728Z',
 'endTimeUtc': '2020-01-08T13:25:47.920Z',
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': '942d5480-9481-4715-86f3-1f5f1e70dd10',
  'AzureML.DerivedImageName': 'azureml/azureml_a1c64b7c7a7449ee70ec47de9d3700d4',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [],
 'runDefinition': {'script': 'train_diabetes.py',
  'arguments': [],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'cpu-cluster',
  'dataReferences': {},
  'data': {},
  'jobName': None,
  'maxRunDurationSeconds': None,
  'nodeCount': 1,
  'environment': {'name': 'mlflow-env',
   'version': 'Autosave_2020-01-08T13:12:59Z_9a63d81c',
   'python': {'interpreterPath': 'python',
    'userManagedDependencies': False,
    'cond
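
The timestamps in the output above give a rough picture of where the time went in this first run. The sketch below computes the script execution time from `startTimeUtc` and `endTimeUtc`, and the delay between the first image build log line and the start of the run. It assumes that the build log timestamps are also in UTC, which the output does not confirm. The helper name `parse_utc` is hypothetical.

```python
from datetime import datetime, timezone

# Values copied from the run details printed above
run_details = {
    "startTimeUtc": "2020-01-08T13:23:28.571728Z",
    "endTimeUtc": "2020-01-08T13:25:47.920Z",
}
# First line of the image build log; assumed (not confirmed) to be UTC
first_build_log_line = "2020/01/08 13:13:08"


def parse_utc(stamp):
    """Parse the 'YYYY-MM-DDTHH:MM:SS.fffZ' timestamps used in the run details."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


start = parse_utc(run_details["startTimeUtc"])
end = parse_utc(run_details["endTimeUtc"])
build_start = datetime.strptime(
    first_build_log_line, "%Y/%m/%d %H:%M:%S"
).replace(tzinfo=timezone.utc)

run_seconds = (end - start).total_seconds()
wait_seconds = (start - build_start).total_seconds()

print(f"Script execution: {run_seconds:.1f} s")
print(f"Image build and queueing before start: {wait_seconds:.1f} s")
```

Under that assumption, most of the elapsed time in this run was spent before the script started, which matches the note that the first run is slower while the Docker image is built.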

You can navigate to your Azure ML Workspace in the Azure portal to view the run metrics and artifacts.

In [9]:
run

Experiment,Id,Type,Status,Details Page,Docs Page
MLFlow,MLFlow_1578489177_74b589f6,azureml.scriptrun,Completed,Link to Azure Machine Learning studio,Link to Documentation


You can also retrieve the metrics into your local notebook and view the details of the run.

In [12]:
run.get_metrics()

{'Training samples': 353.0,
 'Test samples': 89.0,
 'alpha': 0.03,
 'mse': 3424.900315896017}
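
Because the metrics come back as a plain dictionary, you can post-process them locally. As a small sketch, the code below copies the values printed above and derives the total sample count, the test fraction, and the root mean squared error (RMSE), which is expressed in the same units as the target variable.

```python
import math

# Values copied from the run.get_metrics() output above
metrics = {
    "Training samples": 353.0,
    "Test samples": 89.0,
    "alpha": 0.03,
    "mse": 3424.900315896017,
}

total = metrics["Training samples"] + metrics["Test samples"]
test_fraction = metrics["Test samples"] / total
rmse = math.sqrt(metrics["mse"])

print(f"Total samples: {total:.0f}")
print(f"Test fraction: {test_fraction:.3f}")
print(f"RMSE: {rmse:.2f}")
```

The test fraction comes out close to the test_size=0.2 used in the training script, which is a quick consistency check that the logged sample counts belong to this run.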

In [13]:
ws.get_details()

{'id': '/subscriptions/70b8f39e-8863-49f7-b6ba-34a80799550c/resourceGroups/azuremlserviceresourcegroup/providers/Microsoft.MachineLearningServices/workspaces/azuremlservice',
 'name': 'azuremlservice',
 'location': 'westeurope',
 'type': 'Microsoft.MachineLearningServices/workspaces',
 'tags': {'market': 'Serge', 'Cost Center': 'Internal Azure Cost Center'},
 'sku': 'Enterprise',
 'workspaceid': 'b7a492c5-1d27-4c35-99bf-ef22bdee0fbb',
 'description': '',
 'friendlyName': '',
 'creationTime': '2019-03-27T16:29:47.5576428+00:00',
 'containerRegistry': '/subscriptions/70b8f39e-8863-49f7-b6ba-34a80799550c/resourcegroups/azuremlserviceresourcegroup/providers/microsoft.containerregistry/registries/azuremlservice8791701193',
 'keyVault': '/subscriptions/70b8f39e-8863-49f7-b6ba-34a80799550c/resourcegroups/azuremlserviceresourcegroup/providers/microsoft.keyvault/vaults/azuremlservice7711339503',
 'applicationInsights': '/subscriptions/70b8f39e-8863-49f7-b6ba-34a80799550c/resourcegroups/azuremls