# Train Models

The central goal of machine learning is to train predictive models that applications can use. In Azure Machine Learning, you can use scripts to train models with common machine learning frameworks such as scikit-learn, TensorFlow, PyTorch, and SparkML. Running these training scripts as experiments lets you track metrics and outputs, including the trained models.

## Connect to your workspace

To get started, connect to your workspace.

> **Note**: If you haven't already established an authenticated session with your Azure subscription, you'll be prompted to authenticate by clicking a link, entering an authentication code, and signing into Azure.

In [18]:
import azureml.core
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))

Ready to use Azure ML 1.39.0 to work with amlworkshop
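`Workspace.from_config()` reads the workspace details from a saved `config.json` file. The sketch below shows the shape that file is generally expected to have in the v1 SDK; the three values are placeholders rather than real identifiers, and the file is written to a temporary directory so that no real configuration is overwritten.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values (assumptions): replace them with your own
# subscription, resource group, and workspace name.
config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

# Write to a temporary directory so this sketch never touches a real config file.
config_dir = Path(tempfile.mkdtemp())
config_path = config_dir / "config.json"
config_path.write_text(json.dumps(config, indent=4))

# Read the file back to confirm it is valid JSON with the expected keys.
loaded = json.loads(config_path.read_text())
print(sorted(loaded))
```

In a real project the file would normally sit next to the notebook or in a `.azureml` subdirectory, and is usually downloaded from the Azure portal rather than written by hand.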


In [19]:
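# Set userid to your own user ID (for example, userid = '<userid>')
# so that your model and experiment names are unique in the shared workspace
userid = ''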

model_name='diabetes_model_' + userid
experiment_name = 'mslearn-train-diabetes-' + userid

print(
model_name,
experiment_name
)

diabetes_model_jodobrze mslearn-train-diabetes-jodobrze
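If `userid` is left empty, both names end in a dangling separator and are shared by everyone who runs the notebook. One way to guard against that is a small helper like the one below. `build_names` is a hypothetical function introduced here for illustration; it is not part of the Azure ML SDK or the original notebook.

```python
def build_names(userid):
    """Return (model_name, experiment_name) for a user ID.

    Hypothetical helper: it only checks that the ID is non-empty.
    """
    userid = userid.strip()
    if not userid:
        raise ValueError("Set userid to a non-empty value before running the notebook")
    return 'diabetes_model_' + userid, 'mslearn-train-diabetes-' + userid


# Example with the user ID shown in the output above
model_name, experiment_name = build_names('jodobrze')
print(model_name, experiment_name)
```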


## Train model

We'll train our model using local compute, tracking the experiment with a run that captures the metrics and the trained model (.pkl).

In [20]:
# Import libraries
from azureml.core import Experiment, Run
import pandas as pd
import numpy as np
import joblib
import os
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve

# Create an Azure ML experiment in your workspace
experiment = Experiment(workspace=ws, name=experiment_name)

# Start logging data from the experiment, obtaining a reference to the experiment run
run = experiment.start_logging()

# load the diabetes dataset
print("Loading Data...")
diabetes = pd.read_csv('../../data/diabetes/diabetes.csv')

# Separate features and labels
features = ['Pregnancies', 'PlasmaGlucose', 'DiastolicBloodPressure', 'TricepsThickness',
            'SerumInsulin', 'BMI', 'DiabetesPedigree', 'Age']
X, y = diabetes[features].values, diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Set regularization hyperparameter
reg = 0.01

# Train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg)
run.log('Regularization Rate', float(reg))  # np.float was removed in NumPy 1.24; use the built-in float
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# Calculate accuracy (fraction of test predictions that match the labels)
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', float(acc))

# Calculate AUC from the predicted probability of the positive class
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test, y_scores[:, 1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

# Save the trained model in the outputs folder
os.makedirs('outputs', exist_ok=True)
joblib.dump(value=model, filename='outputs/diabetes_model.pkl')

run.complete()

print('Training run complete')

Loading Data...
Training a logistic regression model with regularization rate of 0.01
Accuracy: 0.774
AUC: 0.848370565699786
Training run complete
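To make the two logged metrics concrete, here is a minimal pure-Python sketch of what they measure. It is not how scikit-learn implements them: accuracy is the fraction of matching predictions, and AUC is computed here as the fraction of (positive, negative) pairs in which the positive example gets the higher score. The toy labels, scores, and the 0.5 threshold are made-up values for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumed values, not taken from the diabetes dataset)
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_acc = accuracy(toy_labels, toy_preds)
toy_auc = pairwise_auc(toy_labels, toy_scores)
print('Accuracy:', toy_acc)  # 3 of 4 predictions match -> 0.75
print('AUC:', toy_auc)       # 3 of 4 pairs ranked correctly -> 0.75
```

Accuracy depends on the decision threshold, while AUC depends only on how the scores rank the examples, which is why the notebook logs both.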


## Register the trained model

Note that the outputs of the experiment include the trained model file (**diabetes_model.pkl**). You can register this model in your Azure Machine Learning workspace, making it possible to track model versions and retrieve them later.

In [21]:
from azureml.core import Model

# Register the model
# run.register_model(model_path='', #TODO: Supply path to pkl file.
#                    model_name=model_name,
#                    tags={'Training context':'Script'},
#                    properties={})#TODO: Add the AUC and Accuracy run metrics as properties 

# Fetch the logged metrics once and reuse them for the model properties
metrics = run.get_metrics()
run.register_model(model_path='outputs/diabetes_model.pkl', model_name=model_name,
                   tags={'Training context':'Script'},
                   properties={'AUC': metrics['AUC'], 'Accuracy': metrics['Accuracy']})

Model(workspace=Workspace.create(name='amlworkshop', subscription_id='f5052c6f-7974-48eb-b0ba-f390a5c40ab8', resource_group='TrainingWorkshopAML'), name=diabetes_model_jodobrze, id=diabetes_model_jodobrze:6, version=6, tags={'Training context': 'Script'}, properties={'AUC': '0.848370565699786', 'Accuracy': '0.774'})
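The output above shows the property values stored as strings. The sketch below shows one way to build such a properties dictionary from selected metrics. The `metrics` dictionary is a hypothetical stand-in for the result of `run.get_metrics()`, filled with the values logged earlier, so the snippet runs without a workspace.

```python
# Hypothetical stand-in for the dictionary returned by run.get_metrics()
metrics = {
    'Regularization Rate': 0.01,
    'Accuracy': 0.774,
    'AUC': 0.848370565699786,
}

# Keep only the metrics worth attaching to the model, converted to strings
wanted = ('AUC', 'Accuracy')
properties = {name: str(metrics[name]) for name in wanted}
print(properties)
```

Selecting the metrics by name keeps hyperparameters such as the regularization rate out of the model properties unless you add them deliberately.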

### Clean up the local working directory
Remove the files and directories created during the exercise.

In [22]:
import os
import shutil

shutil.rmtree('outputs', ignore_errors=True)
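The `ignore_errors=True` argument makes the clean-up safe to run more than once. The following sketch demonstrates this in a temporary directory, so it does not depend on the notebook's real `outputs` folder; the placeholder file only stands in for the saved model.

```python
import os
import shutil
import tempfile

# Build a throwaway 'outputs' folder containing a placeholder file
work_dir = tempfile.mkdtemp()
outputs_dir = os.path.join(work_dir, 'outputs')
os.makedirs(outputs_dir, exist_ok=True)
with open(os.path.join(outputs_dir, 'diabetes_model.pkl'), 'wb') as f:
    f.write(b'placeholder')

# The first call removes the folder and everything in it
shutil.rmtree(outputs_dir, ignore_errors=True)
exists_after_first = os.path.exists(outputs_dir)

# The second call finds nothing to delete but does not raise
shutil.rmtree(outputs_dir, ignore_errors=True)

# Remove the temporary working directory itself
shutil.rmtree(work_dir, ignore_errors=True)
print('outputs still exists:', exists_after_first)
```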