# Train a model using the Python SDK

This notebook demonstrates how to train a regression model using the Azure ML Python SDK from a Jupyter notebook.

The first step is to connect to an Azure ML workspace:

In [1]:
from azureml.core import Workspace
ws = Workspace.from_config()

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FLAFQ2HKF to authenticate.

Interactive authentication successfully completed.


In [16]:
print('ML Workspace: ' + ws.name, 'Resource Group: ' + ws.resource_group, 'Location: ' + ws.location, sep='\n')

ML Workspace: CT-AML-WS
Resource Group: CT-AML-RG
Location: southcentralus


## Create an experiment and load a dataset

Next we create an experiment in the workspace:

In [3]:
from azureml.core import Experiment
experiment = Experiment(workspace=ws, name="diabetes-experiment")

Then load the Diabetes open dataset, separate the target column Y from the features, and split the data into training and test sets (80% and 20%):

In [4]:
from azureml.opendatasets import Diabetes
from sklearn.model_selection import train_test_split

x_df = Diabetes.get_tabular_dataset().to_pandas_dataframe().dropna()
y_df = x_df.pop("Y")

X_train, X_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=66)

ActivityStarted, get_tabular_dataset
ActivityCompleted: Activity=get_tabular_dataset, HowEnded=Success, Duration=45059.75 [ms]
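
To make the effect of `test_size=0.2` and `random_state=66` concrete, here is a minimal pure-Python sketch of a seeded hold-out split. It is not scikit-learn's implementation: `holdout_split` is a hypothetical helper, the shuffle order differs from what `train_test_split` produces, and the ten integer rows are stand-ins for dataset rows.

```python
import math
import random


def holdout_split(rows, test_size=0.2, seed=66):
    """Return (train_rows, test_rows) after a seeded shuffle of the row order."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)       # same seed gives the same shuffle
    n_test = math.ceil(len(rows) * test_size)  # number of rows held out for testing
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(10))  # stand-in for 10 dataset rows
train_rows, test_rows = holdout_split(rows)
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split reproducible, so RMSE values from different runs are measured on the same held-out rows.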


In [5]:
X_train.head()

Unnamed: 0,AGE,SEX,BMI,BP,S1,S2,S3,S4,S5,S6
440,36,1,30.0,95.0,201,125.2,42.0,4.79,5.1299,85
389,47,2,26.5,70.0,181,104.8,63.0,3.0,4.1897,70
5,23,1,22.6,89.0,139,64.8,61.0,2.0,4.1897,68
289,28,2,31.5,83.0,228,149.4,38.0,6.0,5.3132,83
101,53,2,22.2,113.0,197,115.2,67.0,3.0,4.3041,100


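## Train the model

Next we train ten Ridge regression models, one per run, each with a different value of the regularization hyperparameter alpha. Each run logs its alpha value and the resulting RMSE on the test set, then uploads the serialized model file: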

In [7]:
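import math
import os

# sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone joblib package.
import joblib
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Make sure the local directory for the serialized models exists.
os.makedirs("outputs", exist_ok=True)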

alphas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

for alpha in alphas:
    # Start a new run for this alpha and record the hyperparameter.
    run = experiment.start_logging()
    run.log("alpha_value", alpha)

    # Fit the model and evaluate it on the held-out test set.
    model = Ridge(alpha=alpha)
    model.fit(X=X_train, y=y_train)
    y_pred = model.predict(X=X_test)
    rmse = math.sqrt(mean_squared_error(y_true=y_test, y_pred=y_pred))
    run.log("rmse", rmse)

    # Serialize the model locally, then attach the file to the run.
    model_name = "model_alpha_" + str(alpha) + ".pkl"
    filename = "outputs/" + model_name
    joblib.dump(value=model, filename=filename)
    run.upload_file(name=model_name, path_or_stream=filename)
    run.complete()
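
The metric logged for each run is the root-mean-squared error: the square root of the mean of the squared differences between the true and predicted values. As a rough illustration of what `math.sqrt(mean_squared_error(...))` computes, the following sketch does the same arithmetic in plain Python. The `rmse` helper and the three sample values are made up for this example and are not taken from the diabetes data.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values, for illustration only.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

# Squared errors are 1, 0 and 4, so the result is sqrt(5 / 3).
print(rmse(y_true, y_pred))
```

Because RMSE is expressed in the same units as the target column, a lower value means the predictions are closer to the observed values on average.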

## Review the experiment in Azure ML

Once all runs have completed, displaying the experiment object in the notebook shows a summary with links to the experiment and its runs in Azure ML:

In [8]:
experiment

Name,Workspace,Report Page,Docs Page
diabetes-experiment,CT-AML-WS,Link to Azure Machine Learning studio,Link to Documentation


## Get the best model

Besides downloading model files from the experiment in the portal, you can download them programmatically. The following code iterates over the runs in the experiment and reads both the logged metrics and the run details (which contain the run ID under the runId key). It keeps track of the best run, which here is the run with the lowest root-mean-squared error (RMSE).

In [9]:
minimum_rmse_runid = None
minimum_rmse = None

for run in experiment.get_runs():
    run_metrics = run.get_metrics()
    run_details = run.get_details()
    # each logged metric becomes a key in this returned dict
    run_rmse = run_metrics["rmse"]
    run_id = run_details["runId"]
    
    # The first run seen, or any run with a lower rmse, becomes the best so far.
    if minimum_rmse is None or run_rmse < minimum_rmse:
        minimum_rmse = run_rmse
        minimum_rmse_runid = run_id

print("Best run_id: " + minimum_rmse_runid)
print("Best run_id rmse: " + str(minimum_rmse))  

Best run_id: 66a504e7-007a-4317-9f89-b8ac39ef3792
Best run_id rmse: 56.60520331339142
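
The same selection can be written more compactly with Python's built-in `min()` and a key function. The sketch below uses a hypothetical `run_results` list in place of the values returned by `get_metrics()` and `get_details()`, with invented run IDs and RMSE values. It also skips runs that logged no `rmse` metric, on the assumption that such runs (for example, failed ones) can exist in an experiment; looking up `run_metrics["rmse"]` directly would raise a `KeyError` for them.

```python
# Hypothetical stand-in for the (run ID, metrics) pairs gathered from the runs.
run_results = [
    {"run_id": "run-a", "metrics": {"rmse": 58.2}},
    {"run_id": "run-b", "metrics": {"rmse": 56.6}},
    {"run_id": "run-c", "metrics": {}},  # a run that logged no rmse
    {"run_id": "run-d", "metrics": {"rmse": 57.1}},
]

# Keep only the runs that actually logged an rmse value.
scored = [r for r in run_results if "rmse" in r["metrics"]]

# min() with a key function returns the record with the smallest rmse.
best = min(scored, key=lambda r: r["metrics"]["rmse"])

print("Best run_id: " + best["run_id"])
print("Best run_id rmse: " + str(best["metrics"]["rmse"]))
```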


## Download the model file

Use the best run ID, together with the experiment object, to retrieve that run through the Run constructor. Then call get_file_names() to list the files available for download from the run.


In [12]:
from azureml.core import Run
best_run = Run(experiment=experiment, run_id=minimum_rmse_runid)
print(best_run.get_file_names())

['model_alpha_0.1.pkl']


In [None]:
best_run.download_file(name="model_alpha_0.1.pkl")