# Train Models

The central goal of machine learning is to train predictive models that can be used by applications. In Azure Machine Learning, you can use scripts to train models with common machine learning frameworks such as scikit-learn, TensorFlow, PyTorch, and SparkML. You can run these training scripts as experiments to track metrics and outputs, including the trained models.

## Connect to your workspace

To get started, connect to your workspace.

> **Note**: If you haven't already established an authenticated session with your Azure subscription, you'll be prompted to authenticate by clicking a link, entering an authentication code, and signing into Azure.

In [None]:
import os
directory_path = os.getcwd()
print("My current directory is : " + directory_path)
folder_name = os.path.basename(directory_path)
print("My directory name is : " + folder_name)

parent = os.path.dirname(directory_path)
parent_folder_name = os.path.basename(parent)
print("My user directory name is: " + parent_folder_name)

user = parent_folder_name
user = user.replace('_', '')
user = user.replace('-', '')
user = user[:10]

experiment_name = parent_folder_name + '-002-code-a-thon-diabetes'
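
As a minimal sketch of what the cell above derives, the following standalone snippet applies the same string operations to a made-up path. The path `Users/jane_doe-smith/code-a-thon` and the helper name `short_user_prefix` are assumptions for illustration only; they are not part of the notebook.

```python
import os


def short_user_prefix(folder_name: str, max_len: int = 10) -> str:
    """Strip underscores and hyphens, then truncate to max_len characters."""
    cleaned = folder_name.replace('_', '').replace('-', '')
    return cleaned[:max_len]


# Hypothetical working directory, used only for illustration
example_cwd = os.path.join('Users', 'jane_doe-smith', 'code-a-thon')

# The parent folder of the working directory is treated as the user folder
example_parent = os.path.basename(os.path.dirname(example_cwd))

# Short prefix (later used in the model name) and full experiment name
example_user = short_user_prefix(example_parent)
example_experiment = example_parent + '-002-code-a-thon-diabetes'

print(example_parent)
print(example_user)
print(example_experiment)
```

Note that the experiment name keeps the original folder name, while the shortened prefix is the one used later when the model is registered.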

In [None]:
import azureml.core
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))
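
`Workspace.from_config()` reads the workspace details from a `config.json` file. As a rough sketch, the snippet below prints the general shape of that file; every value is a placeholder, and nothing is written to disk, so an existing configuration is not affected.

```python
import json

# Placeholder values only; substitute your own subscription, resource group, and workspace
sample_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

# Render the dictionary as it would appear in a config.json file
sample_config_text = json.dumps(sample_config, indent=4)
print(sample_config_text)
```

On an Azure Machine Learning compute instance this file is usually already present, so you would not normally need to create it yourself.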

## Create a training script

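You're going to use Python code to train a machine learning model based on the diabetes data. The cell below runs the training inline as an experiment run: it logs the regularization rate, accuracy, and AUC, saves the trained model to the `outputs` folder, and registers the model in your workspace.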

In [None]:
# Import libraries
from azureml.core import Experiment
from azureml.core import Model
from azureml.core import Run
import pandas as pd
import numpy as np
import joblib
import os
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve

# Create an Azure ML experiment in your workspace
experiment = Experiment(workspace=ws, name=experiment_name)
run = experiment.start_logging()
print("Starting experiment:", experiment.name)


# load the diabetes dataset
print("Loading Data...")
diabetes = pd.read_parquet('./data/diabetes.parquet')

# Separate features and labels
X, y = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, diabetes['Diabetic'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Set regularization hyperparameter
reg = 0.01

# Train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg)
run.log('Regularization Rate', float(reg))  # np.float was removed in NumPy 1.24; use the built-in float
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

# Save the trained model in the outputs folder
os.makedirs('outputs', exist_ok=True)
# Note: files saved in the outputs folder are automatically uploaded to the experiment record

model_file = 'outputs/diabetes_model.pkl'
joblib.dump(value=model, filename=model_file)

run.complete()



# Register the model
run.register_model(model_path=model_file, model_name=user + '-diabetes_code-a-thon-model',
                   tags={'Model Type': 'Logistic Regression'})

print('Model trained and registered.')
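
To make the logged values more concrete, here is a small pure-Python sketch of the same quantities on toy data. The helper names `accuracy` and `pairwise_auc`, the toy labels and scores, and the 0.5 threshold are all assumptions for illustration; the notebook itself relies on scikit-learn for these calculations. The sketch computes accuracy as the fraction of matching predictions, AUC as the share of positive/negative pairs that are ranked correctly, and the `C` value that corresponds to a regularization rate of 0.01.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def pairwise_auc(y_true, scores):
    """Share of (positive, negative) pairs where the positive scores higher; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (made up for this example)
toy_labels = [0, 0, 1, 1, 1, 0]
toy_scores = [0.1, 0.6, 0.8, 0.4, 0.9, 0.3]

# Turn probabilities into class predictions with a 0.5 threshold
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_accuracy = accuracy(toy_labels, toy_preds)
toy_auc = pairwise_auc(toy_labels, toy_scores)

# scikit-learn's C is the inverse of the regularization strength
toy_reg = 0.01
toy_C = 1 / toy_reg

print('Accuracy:', toy_accuracy)
print('AUC:', toy_auc)
print('C:', toy_C)
```

Because `C` is an inverse, a smaller regularization rate gives a larger `C` and therefore weaker regularization.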


## Register a new version of your model, outside of an experiment

Optionally, you can register a trained model directly, without an experiment run. After running the cell below, view the model in the Model List and note that no Experiment or Run ID is associated with the new entry, but the model version has increased.

In [None]:
# Register locally-trained model into AML workspace
# Note: you can optionally register a saved model directly in the AML Studio UI
from azureml.core import Model

model = Model.register(model_name=user + '-diabetes_code-a-thon-model',
                       model_path=model_file,
                       workspace=ws,
                       model_framework='scikit-learn',
                       tags={'Model Type': 'Logistic Regression'})
model
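
One way to check the versions from code rather than in the studio is sketched below. The helper `summarize_versions` is hypothetical, and it assumes each model object exposes `name`, `version`, and `run_id` attributes. The stand-in objects are made up so the snippet runs without a workspace.

```python
from types import SimpleNamespace


def summarize_versions(models):
    """Return one line per model, ordered by version, flagging entries with no run."""
    lines = []
    for m in sorted(models, key=lambda m: m.version):
        origin = 'run ' + m.run_id if m.run_id else 'no run (registered directly)'
        lines.append('{} v{}: {}'.format(m.name, m.version, origin))
    return lines


# Stand-in objects with made-up values, used in place of real registered models
stand_ins = [
    SimpleNamespace(name='demo-model', version=2, run_id=None),
    SimpleNamespace(name='demo-model', version=1, run_id='hypothetical-run-id'),
]

for line in summarize_versions(stand_ins):
    print(line)
```

In the notebook, you could pass the result of `Model.list(ws, name=user + '-diabetes_code-a-thon-model')` instead of the stand-ins, assuming the returned objects carry the attributes listed above.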