# Run a training script with the Python SDK

You can use the Python SDK for Azure Machine Learning to submit scripts as jobs. By using jobs, you can easily keep track of the input parameters and outputs when training a machine learning model.

## Before you start

You'll need the latest version of the **azure-ai-ml** package to run the code in this notebook. Run the cell below to verify that it is installed.

> **Note**:
> If the **azure-ai-ml** package is not installed, run `pip install azure-ai-ml` to install it.

In [1]:
%pip show azure-identity && azure-ai-ml

Name: azure-identity
Version: 1.17.1
Summary: Microsoft Azure Identity Library for Python
Home-page: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/identity/azure-identity
Author: Microsoft Corporation
Author-email: azpysdkhelp@microsoft.com
License: MIT License
Location: /opt/anaconda3/lib/python3.12/site-packages
Requires: azure-core, cryptography, msal, msal-extensions, typing-extensions
Required-by: opencensus-ext-azure
zsh:1: command not found: azure-ai-ml
Note: you may need to restart the kernel to use updated packages.


## Connect to your workspace

With the required SDK packages installed, now you're ready to connect to your workspace.

To connect to a workspace, we need identifier parameters - a subscription ID, resource group name, and workspace name. A `config.json` file containing these parameters can be downloaded from the Azure Machine Learning workspace or Azure portal.

In [2]:
from azureml.core import Workspace

ws = Workspace.from_config(path="../config.json")

## Create the Experiment

In [3]:
from azureml.core import Experiment
import pandas as pd

# create the experiment
experiment = Experiment(workspace=ws, name="diabetes-train-predict")


## Create the Experiment Python script to train and score a model

To train a model, you'll first create the **diabetes_training.py** script in the **src** folder. The script uses the **diabetes.csv** file in the same folder as the training data.

In [4]:
%%writefile diabetes-training.py
# import libraries
from azureml.core import Run, Model
import joblib
import argparse
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve

# Get the experiment run context 
run = Run.get_context()

# Set regularization hyperparameter
parser = argparse.ArgumentParser()
parser.add_argument('--reg-rate', type=float, dest='reg_rate', default=0.01)
parser.add_argument('--test-size', type=float, dest='test_size', default=0.30)
args = parser.parse_args()

reg_rate = args.reg_rate
test_size = args.test_size

# load the diabetes dataset
print("Loading Data...")
diabetes = pd.read_csv('diabetes.csv')

# separate features and labels
X = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values
y = diabetes['Diabetic'].values

# split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=0)

# train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg_rate)
model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', float(acc))
run.log('Accuracy', float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

# Save the model
filename = 'outputs/model.pkl'
os.makedirs('outputs', exist_ok=True)
joblib.dump(value=model, filename= filename)

# complete the experiment
run.complete()

Overwriting diabetes-training.py


## Submit script to the run as an Experiment

Submit the script that trains a classification model to predict diabetes, to run as an experiment. This will create a job base don the ScriptConfiguration.

Here, the Experiment is created in Azure with local compute target. An Azure compute instance target can also be specified using the  RunConfiguration of the ScriptRunConfig. 

In [5]:
from azureml.core import ScriptRunConfig, Environment

myenv = Environment("user-managed-env")
myenv.python.user_managed_dependencies = True

# Define arguments / parameters

test_size = 0.30
reg_rate = 0.01

script_config = ScriptRunConfig(
    source_directory=".",
    script="diabetes-training.py",
    arguments=["--reg-rate", reg_rate, "--test-size", test_size],
    environment=myenv,
)

run = experiment.submit(config=script_config)
run.wait_for_completion(show_output=False)

{'runId': 'diabetes-train-predict_1726128117_70c05993',
 'target': 'local',
 'status': 'Finalizing',
 'startTimeUtc': '2024-09-12T08:02:00.421266Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'local',
  '_azureml.ClusterName': 'local',
  'ContentSnapshotId': '7e3c3183-65ac-4bbb-9e39-11974a013b69',
  'azureml.git.repository_uri': 'https://github.com/kennedyopokuasare/Azure_datascience.git',
  'mlflow.source.git.repoURL': 'https://github.com/kennedyopokuasare/Azure_datascience.git',
  'azureml.git.branch': 'main',
  'mlflow.source.git.branch': 'main',
  'azureml.git.commit': 'e63012099664eb62f3f33b338f35aba4fa3da8ed',
  'mlflow.source.git.commit': 'e63012099664eb62f3f33b338f35aba4fa3da8ed',
  'azureml.git.dirty': 'True'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': 'diabetes-training.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': ['--reg-rate', '0.01', '--test-size', '0.3'],
  'sourceDirectoryDataStore': None,
  'framewo

## Register the model output

In [6]:
import sklearn
from azureml.core import Model

filename = 'outputs/model.pkl'

run.register_model(
    model_name="diabetes-classification-model",
    model_path = filename,
    description = "A LogisticRegression classification model for Diabetes",
    tags = { 'data-format':"CSV", "regularization-rate":reg_rate, "test-size": test_size},
    model_framework = Model.Framework.SCIKITLEARN,
    model_framework_version = str(sklearn.__version__)
)

Model(workspace=Workspace.create(name='Diabetes-prediction', subscription_id='ed463f81-92a5-476c-b6e1-82f1a28d21e2', resource_group='ml-prod-scale'), name=diabetes-classification-model, id=diabetes-classification-model:1, version=1, tags={'data-format': 'CSV', 'regularization-rate': '0.01', 'test-size': '0.3'}, properties={})