# Lab 2 - Using Azure Machine Learning service Model Versioning and Run History

In this lab you will use the Azure Machine Learning service to collect model performance metrics and capture model versions, and then query the experiment run history to retrieve the captured metrics.

## Download the datasets

The following cell prints the current working directory and lists the `data` folder that holds the dataset used by this lab. Click into the cell and press `Shift + Enter` to execute it.

In [1]:
import os

main_path = os.path.abspath(os.path.curdir)
print("Current working directory is ", main_path)
data_path = os.path.join(main_path, 'data')
os.listdir(data_path)

Current working directory is  C:\Users\sasever\Desktop\SelfLearning\AzureML\AML-service-labs-master\starter-artifacts\jupyter\azure-ml-labs\02-model-management


['UsedCars_Affordability.csv']
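Listing the folder only confirms the file by eye. Below is a minimal sketch of an explicit check, assuming the file name shown in the output above. `require_dataset` is a hypothetical helper that is not part of the lab, and the demo runs against a temporary folder so that it works on any machine.

```python
import os
import tempfile


def require_dataset(data_dir, filename):
    """Return the full path to filename inside data_dir, or raise if it is missing."""
    path = os.path.join(data_dir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "Expected %s in %s but did not find it" % (filename, data_dir)
        )
    return path


# Demo against a throwaway folder that stands in for the lab's data folder
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "UsedCars_Affordability.csv"), "w").close()
    found_path = require_dataset(demo_dir, "UsedCars_Affordability.csv")
    print(os.path.basename(found_path))
```

In the lab itself, the equivalent call would be `require_dataset(data_path, 'UsedCars_Affordability.csv')`, placed right after the directory listing.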

## Exercise 2 - Train a simple model

This lab builds upon the lessons learned in the previous lab, but it is self-contained, so you can work through it without having run the previous lab. For that reason, Steps 1, 2 and 3 are not explored in detail: their goal is to set up a few experiment runs, which Lab 1 covered in detail.

Read through the following cell, then use `Shift + Enter` to execute it. Take a moment to look at the data loaded into the Pandas DataFrame. As the output shows, it contains data about used cars: the age (in years), KM (kilometers driven), and an `Affordable` label that is either 0 or 1.

In [3]:
# Step 1 - load the training data locally
#########################################
import os
import pickle

import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# The unused `sklearn.externals.joblib` import was dropped; newer scikit-learn releases no longer ship it
import azureml.core
from azureml.core import Experiment, Run, Workspace

pathToCsvFile = os.path.join(data_path, 'UsedCars_Affordability.csv')
df_affordability = pd.read_csv(pathToCsvFile, delimiter=',')
print(df_affordability)

full_X = df_affordability[["Age", "KM"]]
full_Y = df_affordability[["Affordable"]]

      Age     KM  Affordable
0      23  46986           0
1      23  72937           0
2      24  41711           0
3      26  48000           0
4      30  38500           0
5      32  61000           0
6      27  94612           0
7      30  75889           0
8      27  19700           0
9      23  71138           0
10     25  31461           0
11     22  43610           0
12     25  32189           0
13     31  23000           0
14     32  34131           0
15     28  18739           0
16     30  34000           0
17     24  21716           0
18     24  25563           0
19     30  64359           0
20     30  67660           0
21     29  43905           0
22     28  56349           0
23     28  32220           0
24     29  25813           0
25     25  28450           0
26     27  34545           0
27     29  41415           0
28     28  44142           0
29     30  11090           0
...   ...    ...         ...
1406   70  44850           1
1407   69  44826           1
1408   80  444

In the following cell, we will define a helper method that trains, evaluates and then registers the trained model with Azure Machine Learning. Execute the following cell.

In [4]:
# Step 2 - Define a helper method for training, evaluating and registering a model
################################################################################### 
def train_eval_register_model(experiment_name, full_X, full_Y, training_set_percentage):

    # start a training run by defining an experiment
    myexperiment = Experiment(ws, experiment_name)
    run = myexperiment.start_logging()


    train_X, test_X, train_Y, test_Y = train_test_split(full_X, full_Y, train_size=training_set_percentage, random_state=42)

    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(train_X)
    clf = linear_model.LogisticRegression(C=1)
    clf.fit(X_scaled, train_Y)

    scaled_inputs = scaler.transform(test_X)
    predictions = clf.predict(scaled_inputs)
    score = accuracy_score(test_Y, predictions)

    print("With %0.2f percent of data, model accuracy reached %0.4f." % (training_set_percentage, score))

    # Log the training metrics to Azure Machine Learning service run history
    run.log("Training_Set_Percentage", training_set_percentage)
    run.log("Accuracy", score)
    run.complete()

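    # Make sure the local outputs folder exists before writing the model file
    os.makedirs('outputs', exist_ok=True)
    output_model_path = 'outputs/' + experiment_name + '.pkl'
    with open(output_model_path, 'wb') as model_file:
        pickle.dump(clf, model_file)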

    # Register and upload this version of the model with Azure Machine Learning service
    registered_model = run.register_model(model_name='usedcarsmodel', model_path=output_model_path)

    print(registered_model.name, registered_model.id, registered_model.version, sep = '\t')

    return (clf, score)
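The helper fits the scaler on the training rows only and then reuses the same scaling for the test rows. The sketch below illustrates that idea in plain Python, assuming the usual standardization formula (subtract the column mean, divide by the population standard deviation). The function names are hypothetical stand-ins, not scikit-learn APIs, and the rows are the first five `Age`/`KM` pairs from the output above.

```python
from statistics import mean, pstdev


def fit_scaler(rows):
    """Compute (mean, std) per column from the training rows only."""
    columns = list(zip(*rows))
    # Fall back to 1.0 so a constant column does not cause a division by zero
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def apply_scaler(rows, params):
    """Scale rows with previously fitted parameters."""
    return [[(value - m) / s for value, (m, s) in zip(row, params)] for row in rows]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


train_rows = [[23, 46986], [23, 72937], [24, 41711], [26, 48000]]  # Age, KM
test_rows = [[30, 38500]]

params = fit_scaler(train_rows)                 # statistics come from training data
scaled_train = apply_scaler(train_rows, params)
scaled_test = apply_scaler(test_rows, params)   # test rows reuse the same statistics

print(scaled_test)
print(accuracy([0, 0, 1, 1], [0, 1, 1, 1]))
```

Fitting on the training rows alone keeps information about the test rows out of the model, which is why the helper calls `fit_transform` on `train_X` but only `transform` on `test_X`.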

In the next cell, we retrieve an existing Azure Machine Learning Workspace (or create it if it does not exist yet). Be sure to set the values for `subscription_id`, `resource_group`, `workspace_name` and `workspace_region` as directed by the comments. With the Workspace retrieved, we train three models, each on a different fraction of the data (25%, 50% and 75%). Execute the cell.

In [6]:
# Step 3 - Run a few experiments in your Azure ML Workspace
###########################################################
# Verify AML SDK Installed
print("SDK Version:", azureml.core.VERSION)


# Create a new Workspace or retrieve the existing one
# Provide the Subscription ID of your existing Azure subscription
subscription_id ='757c4165-0823-49f7-9678-5a85fe5e13cc'

# Provide values for the Resource Group and Workspace that will be created
resource_group = 'MLworkspace2'
workspace_name = 'snml2'
workspace_region = 'westeurope'  # eastus, westcentralus, southeastasia, australiaeast, westeurope

# With exist_ok=True, if the workspace already exists we get a reference to it instead of an error
ws = Workspace.create(
    name = workspace_name,
    subscription_id = subscription_id,
    resource_group = resource_group, 
    location = workspace_region,
    exist_ok = True)

print("Workspace Provisioning complete.")


# Create an experiment, log metrics and register the created models for multiple training runs
experiment_name = "Experiment-02-01"
training_set_percentage = 0.25
model, score = train_eval_register_model(experiment_name, full_X, full_Y, training_set_percentage)

experiment_name = "Experiment-02-02"
training_set_percentage = 0.5
model, score = train_eval_register_model(experiment_name, full_X, full_Y, training_set_percentage)

experiment_name = "Experiment-02-03"
training_set_percentage = 0.75
model, score = train_eval_register_model(experiment_name, full_X, full_Y, training_set_percentage)



SDK Version: 1.0.6
Workspace Provisioning complete.


  y = column_or_1d(y, warn=True)


With 0.25 percent of data, model accuracy reached 0.9192.
usedcarsmodel	usedcarsmodel:1	1
With 0.50 percent of data, model accuracy reached 0.9109.
usedcarsmodel	usedcarsmodel:2	2
With 0.75 percent of data, model accuracy reached 0.9081.
usedcarsmodel	usedcarsmodel:3	3
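Each call registered the model under the same name, and the printed version went from 1 to 3. The sketch below picks the most recent version from the printed ids, assuming the `name:version` pattern seen in this output holds. `parse_model_id` is a hypothetical helper, not part of the Azure ML SDK.

```python
def parse_model_id(model_id):
    """Split an id such as 'usedcarsmodel:3' into ('usedcarsmodel', 3)."""
    name, _, version = model_id.rpartition(":")
    return name, int(version)


# Ids copied from the output above
printed_ids = ["usedcarsmodel:1", "usedcarsmodel:2", "usedcarsmodel:3"]

parsed = [parse_model_id(model_id) for model_id in printed_ids]
latest_name, latest_version = max(parsed, key=lambda pair: pair[1])

print(latest_name, latest_version)
```

When working against a live workspace, the `name` and `version` attributes of the registered model object (printed by the helper above) carry the same information without any parsing.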


## Use Azure Machine Learning to query for performance metrics

As was demonstrated in the previous lab, you can use the Workspace to get a list of Experiments. You can also query for a particular Experiment by name. With an Experiment in hand, you can review all runs associated with that Experiment and retrieve the metrics associated with each run. Execute the following cell to see this process. What was the accuracy of the only run for Experiment-02-03?

In [7]:
# Step 4 - Query for all Experiments.
#####################################
# You can retrieve the list of all experiments in the Workspace using the following:
all_experiments = ws.experiments

print(all_experiments)

# Query for the metrics of a particular experiment
# You can retrieve an existing experiment by constructing an Experiment object using the name of an existing experiment.
my_experiment = Experiment(ws, "Experiment-02-03")
print(my_experiment)

# Query an experiment for metrics
# With an experiment in hand, you can retrieve any metrics collected for any of its child runs
my_experiment_runs = my_experiment.get_runs()
print( [ (run.experiment.name, run.id, run.get_metrics()) for run in my_experiment_runs] )

{'GEAR-EXP': Experiment(Name: GEAR-EXP,
Workspace: snml2), 'UsedCars_Experiment': Experiment(Name: UsedCars_Experiment,
Workspace: snml2), 'Experiment-02-01': Experiment(Name: Experiment-02-01,
Workspace: snml2), 'Experiment-02-02': Experiment(Name: Experiment-02-02,
Workspace: snml2), 'Experiment-02-03': Experiment(Name: Experiment-02-03,
Workspace: snml2)}
Experiment(Name: Experiment-02-03,
Workspace: snml2)
[('Experiment-02-03', '9aff433e-5425-48e1-9806-271448988641', {'Training_Set_Percentage': 0.75, 'Accuracy': 0.9080779944289693})]
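Once metrics come back as dictionaries, comparing runs is ordinary Python. The sketch below flattens run-like objects into records and picks the one with the highest `Accuracy`. It assumes only the attributes used in the cell above (`experiment.name`, `id`, `get_metrics()`). The stand-in runs, their ids for the first two experiments, and the helper names are hypothetical; the accuracy values for those two are the rounded figures printed in Step 3.

```python
from types import SimpleNamespace


def collect_metrics(runs):
    """Flatten run-like objects into one dict per run."""
    records = []
    for run in runs:
        record = {"experiment": run.experiment.name, "run_id": run.id}
        record.update(run.get_metrics())
        records.append(record)
    return records


def best_by(records, metric):
    """Return the record with the highest value for metric, or None if none has it."""
    scored = [record for record in records if metric in record]
    return max(scored, key=lambda record: record[metric]) if scored else None


def fake_run(experiment_name, run_id, metrics):
    """Build a stand-in object that mimics the parts of a run used here."""
    return SimpleNamespace(
        experiment=SimpleNamespace(name=experiment_name),
        id=run_id,
        get_metrics=lambda: dict(metrics),
    )


stand_in_runs = [
    fake_run("Experiment-02-01", "hypothetical-run-1",
             {"Training_Set_Percentage": 0.25, "Accuracy": 0.9192}),
    fake_run("Experiment-02-02", "hypothetical-run-2",
             {"Training_Set_Percentage": 0.5, "Accuracy": 0.9109}),
    fake_run("Experiment-02-03", "9aff433e-5425-48e1-9806-271448988641",
             {"Training_Set_Percentage": 0.75, "Accuracy": 0.9080779944289693}),
]

records = collect_metrics(stand_in_runs)
best = best_by(records, "Accuracy")
print(best["experiment"], best["Accuracy"])
```

With a live workspace, the same functions should accept the runs returned by `Experiment(ws, name).get_runs()` for each experiment name, since they rely only on the attributes already used in Step 4.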
