# Run a training script with the Python SDK

You can use the Python SDK for Azure Machine Learning to submit scripts as jobs. By using jobs, you can easily keep track of the input parameters and outputs when training a machine learning model.

## Before you start

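The cell below checks for the **azure-identity** and **azure-ai-ml** packages. The later cells import from `azureml.core`, which is provided by the separate **azureml-core** package, so that package must be installed as well. Run the cell below to verify the installation.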

> **Note**:
> If the **azure-ai-ml** package is not installed, run `pip install azure-ai-ml` to install it.

In [1]:
%pip show azure-identity azure-ai-ml

Name: azure-identity
Version: 1.17.1
Summary: Microsoft Azure Identity Library for Python
Home-page: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/identity/azure-identity
Author: Microsoft Corporation
Author-email: azpysdkhelp@microsoft.com
License: MIT License
Location: /opt/anaconda3/lib/python3.12/site-packages
Requires: azure-core, cryptography, msal, msal-extensions, typing-extensions
Required-by: opencensus-ext-azure
Note: you may need to restart the kernel to use updated packages.
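
As an alternative to `%pip show`, the check can also be done from Python itself. The following is a minimal sketch using the standard-library `importlib.metadata` module; the helper name `installed_version` is hypothetical and not part of any Azure SDK.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the packages mentioned in this section.
for name in ("azure-identity", "azure-ai-ml"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

A result of `not installed` means the package still has to be added with `pip install`.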


## Connect to your workspace

With the required SDK packages installed, you're ready to connect to your workspace.

To connect to a workspace, you need three identifiers: a subscription ID, a resource group name, and a workspace name. If you're working on a compute instance managed by Azure Machine Learning, you can use the default values. The cell below reads the identifiers from a `config.json` file in the parent folder.

In [2]:
from azureml.core import Workspace
ws = Workspace.from_config(path="../config.json")
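
The configuration file is plain JSON that stores the three identifiers under the keys `subscription_id`, `resource_group`, and `workspace_name`. The sketch below shows one way to confirm that a file contains all three before connecting. It is an illustration only: the helper `missing_config_keys` is hypothetical, the values are placeholders, and the file is written to a temporary folder rather than to `../config.json`.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(path):
    """Return the required keys that are absent or empty in the config file."""
    with open(path) as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Placeholder values for illustration only; replace them with your own identifiers.
example = {
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>",
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w") as f:
        json.dump(example, f)
    missing = missing_config_keys(config_path)

print("Missing keys:", missing)
```

An empty list means the file has every identifier the workspace connection needs.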

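## Create an experiment

* Create an experiment
* Optionally start logging

The experiment is registered in the Azure Machine Learning workspace, while the script itself runs on the local compute target. The `experiment.start_logging()` call is left commented out because the run is created later, when the script is submitted.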

In [3]:
from azureml.core import Experiment

# create the experiment
experiment = Experiment(workspace=ws, name="diabetest-train-predict")

# start logging 
#run = experiment.start_logging()

## Create the experiment script to train and score a model

To train a model, you'll first create the **diabetes-training.py** script in the current folder. The script uses the **diabetes.csv** file in the same folder as the training data.

In [4]:
%%writefile diabetes-training.py
# import libraries
import os  # needed for os.makedirs when saving the model
from azureml.core import Run
import joblib
import argparse
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Get the experiment run context 
run = Run.get_context()

# Set regularization hyperparameter
parser = argparse.ArgumentParser()
parser.add_argument('--reg-rate', type=float, dest='reg_rate', default=0.01)
args = parser.parse_args()
reg = args.reg_rate

# load the diabetes dataset
print("Loading Data...")
diabetes = pd.read_csv('diabetes.csv')

# separate features and labels
X = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values
y = diabetes['Diabetic'].values

# split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg)
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', float(acc))
run.log('Accuracy', float(acc))

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test,y_scores[:,1])
print('AUC: ' + str(auc))
run.log('AUC', float(auc))

# Save the model
os.makedirs('outputs', exist_ok=True)
joblib.dump(value=model, filename='outputs/model.pkl')

# complete the experiment
run.complete()

Overwriting diabetes-training.py
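
The script reads its regularization rate from the command line and passes the inverse of that rate to `LogisticRegression` as `C`, so a larger `--reg-rate` means stronger regularization. The sketch below isolates that step so it can be tried without a workspace or dataset. The function name `parse_reg_rate` is hypothetical; the parser definition mirrors the one in the script.

```python
import argparse


def parse_reg_rate(argv):
    """Parse --reg-rate the same way the training script does."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--reg-rate', type=float, dest='reg_rate', default=0.01)
    return parser.parse_args(argv).reg_rate


# Command-line arguments arrive as strings; type=float converts them.
reg = parse_reg_rate(['--reg-rate', '0.1'])

# The script passes 1/reg as C, the inverse regularization strength.
inverse_strength = 1 / reg

# With no arguments, the parser falls back to the default rate.
default_reg = parse_reg_rate([])

print('reg:', reg, 'C:', inverse_strength, 'default:', default_reg)
```

Passing an explicit list to `parse_args` keeps the sketch independent of the notebook's own command line.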


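## Submit the script as an experiment run

Run the cell below to submit the script as a run that trains a classification model to predict diabetes. The run uses a user-managed environment, so the script executes in the Python environment that is already active.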

In [5]:
from azureml.core import ScriptRunConfig, Environment

myenv = Environment("user-managed-env")
myenv.python.user_managed_dependencies = True

script_config = ScriptRunConfig(
    source_directory=".",
    script="diabetes-training.py",
    arguments=['--reg-rate', 0.01],
    environment=myenv
)

# submit the script and stream its log until the run finishes
run = experiment.submit(config=script_config)
run.wait_for_completion(show_output=True)

RunId: diabetest-train-predict_1726074198_93a312fc
Web View: https://ml.azure.com/runs/diabetest-train-predict_1726074198_93a312fc?wsid=/subscriptions/ed463f81-92a5-476c-b6e1-82f1a28d21e2/resourcegroups/AzureML/workspaces/ConnectionFromVSCode&tid=746a21b3-a76e-4d67-a142-eeb97ab5314f

Streaming azureml-logs/70_driver_log.txt

  """
  print("[{}] Entering context manager injector.".format(datetime.datetime.utcnow().isoformat()))
[2024-09-11T17:03:20.904512] Entering context manager injector.
  print("[{}] {} Command line Options: {}".format(datetime.datetime.utcnow().isoformat(), os.path.basename(__file__), options))
[2024-09-11T17:03:21.228155] context_manager_injector.py Command line Options: Namespace(inject=['ProjectPythonPath:context_managers.ProjectPythonPath', 'RunHistory:context_managers.RunHistory', 'TrackUserError:context_managers.TrackUserError'], invocation=['diabetes-training.py', '--reg-rate', '0.01'])
Script type = None
[2024-09-11T17:03:21.229239] Entering Run History Con

{'runId': 'diabetest-train-predict_1726074198_93a312fc',
 'target': 'local',
 'status': 'Completed',
 'startTimeUtc': '2024-09-11T17:03:20.759625Z',
 'endTimeUtc': '2024-09-11T17:04:29.128423Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'local',
  '_azureml.ClusterName': 'local',
  'ContentSnapshotId': '0d20c200-1b29-45c5-8877-5b56facf3ec3',
  'azureml.git.repository_uri': 'https://github.com/kennedyopokuasare/Azure_datascience.git',
  'mlflow.source.git.repoURL': 'https://github.com/kennedyopokuasare/Azure_datascience.git',
  'azureml.git.branch': 'main',
  'mlflow.source.git.branch': 'main',
  'azureml.git.commit': 'b87e479665afa323368385ae10508a13af4771f6',
  'mlflow.source.git.commit': 'b87e479665afa323368385ae10508a13af4771f6',
  'azureml.git.dirty': 'True'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': 'diabetes-training.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': ['--reg-rate', '0.01'],
  'sourceDirectoryData