# Run a training script with the Python SDK

You can use the Python SDK for Azure Machine Learning to submit scripts as jobs. By using jobs, you can easily keep track of the input parameters and outputs when training a machine learning model.

## Before you start

You'll need the latest version of the **azure-ai-ml** package to run the code in this notebook. Run the cell below to verify that it is installed.

> **Note**:
> If the **azure-ai-ml** package is not installed, run `pip install azure-ai-ml` to install it.

In [None]:
# pip install azure-ai-ml

In [None]:
pip show azure-ai-ml
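To check the installed version from Python instead of shelling out to pip, you can use the standard library's `importlib.metadata`. This is a minimal sketch. The helper name `installed_version` is illustrative and is not part of the Azure SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


ml_version = installed_version("azure-ai-ml")
if ml_version is None:
    print("azure-ai-ml is not installed; run: pip install azure-ai-ml")
else:
    print("azure-ai-ml version:", ml_version)
```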

## Connect to your workspace

With the required SDK packages installed, now you're ready to connect to your workspace.

To connect to a workspace, you need three identifiers: a subscription ID, a resource group name, and a workspace name. On a compute instance managed by Azure Machine Learning, these values are already stored in a **config.json** file, so `MLClient.from_config()` can read them for you.

In [None]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient

try:
    credential = DefaultAzureCredential()
    # Check that the credential can obtain a token
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential doesn't work
    credential = InteractiveBrowserCredential()

In [None]:
# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)
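`MLClient.from_config()` looks for a **config.json** file that holds the three identifiers described above. If it can't find one, for example when you run the notebook somewhere other than a compute instance, it helps to know where the file is and what it contains. The sketch below assumes the lookup order is the starting folder, then a `.azureml` subfolder, then each parent folder. The helper names are hypothetical and are not part of the SDK.

```python
import json
from pathlib import Path
from typing import Optional, Tuple

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def find_workspace_config(start: str = ".") -> Optional[Path]:
    """Walk from `start` up to the filesystem root looking for config.json.

    Assumption: this mirrors the search order used by MLClient.from_config
    (the folder itself, then its .azureml subfolder, then each parent).
    """
    current = Path(start).resolve()
    for folder in (current, *current.parents):
        for candidate in (folder / "config.json", folder / ".azureml" / "config.json"):
            if candidate.is_file():
                return candidate
    return None


def workspace_ids_from_json(text: str) -> Tuple[str, str, str]:
    """Return (subscription_id, resource_group, workspace_name) from config.json text."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing: {', '.join(missing)}")
    return tuple(config[key] for key in REQUIRED_KEYS)


config_path = find_workspace_config()
print("Workspace config:", config_path or "not found")
```

If no file is found, you can pass the identifiers explicitly with `MLClient(credential, subscription_id, resource_group, workspace_name)`.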


In [None]:
import os
print("Current working directory:", os.getcwd())

In [None]:
import os
# Replace this path with the folder in your workspace that contains the src folder
os.chdir('/home/azureuser/cloudfiles/code/Users/Mohamed.Outahajala/AzurePortal')
print("Working directory set to:", os.getcwd())

## Use the Python SDK to train a model

To train a model, you'll first create the **diabetes-training.py** script in the **src** folder. The script reads its training data from the **diabetes.csv** file in the same folder.

In [None]:
# Make sure diabetes.csv is in the src folder
import pandas as pd
diabetes = pd.read_csv('./src/diabetes.csv')
print(diabetes.head())
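The training script selects eight feature columns and the `Diabetic` label by name. If the file has a different header, the job fails only after it has already started on the cluster. You can catch this earlier with a quick local check against the columns loaded above. This is a sketch, and the helper name `missing_columns` is hypothetical.

```python
from typing import Iterable, List

# Column names used by diabetes-training.py
FEATURES = [
    "Pregnancies", "PlasmaGlucose", "DiastolicBloodPressure", "TricepsThickness",
    "SerumInsulin", "BMI", "DiabetesPedigree", "Age",
]
LABEL = "Diabetic"


def missing_columns(columns: Iterable[str]) -> List[str]:
    """Return the required feature and label columns that are absent from `columns`."""
    present = {str(name).strip() for name in columns}
    return [col for col in FEATURES + [LABEL] if col not in present]
```

For a file that matches the script, `missing_columns(diabetes.columns)` should return an empty list.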

In [None]:
%%writefile src/diabetes-training.py
# import libraries
import os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
import joblib

# load the diabetes dataset
print("Loading Data...")
diabetes = pd.read_csv('./diabetes.csv')

# separate features and labels
X, y = diabetes[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness','SerumInsulin','BMI','DiabetesPedigree','Age']].values, diabetes['Diabetic'].values

# split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# set regularization rate (scikit-learn's C is its inverse: C = 1/reg = 100)
reg = 0.01

# train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg)
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# save the model
output_path = './outputs/model.pkl'
os.makedirs('./outputs', exist_ok=True)  # Ensure the outputs directory exists
joblib.dump(model, output_path)
print(f"Model saved to {output_path}")

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)

# calculate AUC
y_scores = model.predict_proba(X_test)
auc = roc_auc_score(y_test, y_scores[:,1])
print('AUC: ' + str(auc))
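The script reports two metrics. Accuracy is the fraction of test rows where the predicted class matches the label (`np.average(y_hat == y_test)`). AUC, the area under the ROC curve, equals the probability that a randomly chosen positive (diabetic) row receives a higher predicted score than a randomly chosen negative row, with ties counting as half. The sketch below computes both by hand on a tiny made-up example to make the numbers from `roc_auc_score` easier to interpret. It is an illustration and is not meant to replace scikit-learn.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels (same as np.average(y_hat == y_test))."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score via broadcasting
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))


# Made-up labels and predicted probabilities of the positive class
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
preds = [int(p >= 0.5) for p in probs]

print("Accuracy:", accuracy(labels, preds))  # 0.75
print("AUC:", pairwise_auc(labels, probs))  # 0.75
```

In this example, three of the four (positive, negative) pairs are ordered correctly. The only exception is 0.35 versus 0.4, so the AUC is 0.75.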

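Run the cell below to submit a command job. The job uploads the **src** folder and runs **diabetes-training.py** on the **aml-cluster-dev** compute cluster to train a classification model that predicts diabetes. If your cluster has a different name, change the `compute` value.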

In [None]:
from azure.ai.ml import command

# configure job
job = command(
    code="./src",
    command="python diabetes-training.py",
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="aml-cluster-dev",
    display_name="diabetes-pythonv2-train_regression-job",
    experiment_name="diabetes-training-regression-experiment",
)

# submit job
returned_job = ml_client.create_or_update(job)
aml_url = returned_job.studio_url
print("Monitor your job at", aml_url)

In [None]:
print(returned_job.name + " job submitted successfully.")
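The job runs asynchronously, so **model.pkl** doesn't exist until the job completes. The minimal sketch below polls the job status before you register the model. It assumes that `Completed`, `Failed`, and `Canceled` are the final statuses worth waiting for. The helper name `wait_for_job` and its timeouts are hypothetical.

```python
import time

# Assumption: these job statuses are final; any other status means the job is still in progress
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(client, job_name: str, poll_seconds: float = 30, timeout_seconds: float = 3600) -> str:
    """Poll client.jobs.get(job_name).status until the job reaches a final status.

    `client` is expected to be an MLClient or any object exposing jobs.get(name).
    Returns the final status string; raises TimeoutError if the job runs too long.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.jobs.get(job_name).status
        print(f"Job {job_name}: {status}")
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_name} is still {status} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
```

For example, call `status = wait_for_job(ml_client, returned_job.name)` and register the model only if `status == "Completed"`. As an alternative, `ml_client.jobs.stream(returned_job.name)` streams the job's logs and returns when the job finishes.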

In [None]:
#check current directory
import os
print("Current working directory:", os.getcwd())

In [None]:
# Step 1: Register the model from the job outputs
# Run this only after the training job has completed, so that model.pkl exists
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# Get the job name
job_name = returned_job.name

model = Model(
    path=f"azureml://jobs/{job_name}/outputs/artifacts/outputs/model.pkl",
    name="diabetes-logistic-regression",
    description="Logistic regression model for diabetes prediction",
    type=AssetTypes.CUSTOM_MODEL,
    tags={"accuracy": "0.774", "AUC": "0.848"}
)

registered_model = ml_client.models.create_or_update(model)
print(f"✅ Model registered: {registered_model.name} (version: {registered_model.version})")
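The tags above hard-code the accuracy and AUC from one particular run, so they go stale if you retrain. The training script prints lines that start with `Accuracy:` and `AUC:`. One option is to copy the job's log text, for example from the job's **Outputs + logs** tab in Azure Machine Learning studio, and derive the tags from it. This is a minimal sketch. The helper name `metrics_to_tags` is hypothetical, and it rounds to three decimals to match the tags above.

```python
import re
from typing import Dict

# Patterns for the lines printed by diabetes-training.py, e.g. "Accuracy: 0.77..." and "AUC: 0.84..."
METRIC_PATTERNS = {
    "accuracy": re.compile(r"^Accuracy:\s*([0-9.eE+-]+)", re.MULTILINE),
    "AUC": re.compile(r"^AUC:\s*([0-9.eE+-]+)", re.MULTILINE),
}


def metrics_to_tags(log_text: str, digits: int = 3) -> Dict[str, str]:
    """Extract metrics from training log text and format them as string tags."""
    tags = {}
    for name, pattern in METRIC_PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            tags[name] = f"{float(match.group(1)):.{digits}f}"
    return tags
```

You could then pass `tags=metrics_to_tags(log_text)` to `Model(...)`, where `log_text` holds the copied log.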

In [None]:
# Get the latest registered version of the model by name
model_name = "diabetes-logistic-regression"
registered_model = ml_client.models.get(name=model_name, label="latest")

# Print the model ID
print(f"Registered Model ID: {registered_model.id}")

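## Step 2: Create a scoring script

Create a scoring script that handles inference requests. The `init()` function runs once when the deployment starts and loads the model. The `run()` function runs for every request to the endpoint.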

In [None]:
%%writefile src/score.py
import json
import joblib
import numpy as np
import os

def init():
    """
    This function is called when the container is initialized/started.
    """
    global model
    # Get the path to the registered model file and load it
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pkl')
    model = joblib.load(model_path)
    print("Model loaded successfully")

def run(raw_data):
    """
    This function is called for every invocation of the endpoint.
    """
    try:
        data = json.loads(raw_data)['data']
        data = np.array(data)
        result = model.predict(data)
        # Return a dict; the inference server serializes it to JSON
        return {"result": result.tolist()}
    except Exception as e:
        error = str(e)
        return {"error": error}
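As written, `run()` passes whatever arrives directly to `model.predict`, so a row with the wrong number of values comes back as an error message from scikit-learn. The minimal sketch below shows a stricter parser that you could call at the top of `run()` in place of `json.loads(raw_data)['data']`. The helper name `parse_rows` and its error messages are hypothetical.

```python
import json
from typing import List

N_FEATURES = 8  # Pregnancies ... Age, in the order used for training


def parse_rows(raw_data: str, n_features: int = N_FEATURES) -> List[List[float]]:
    """Parse {"data": [[...], ...]} and check that every row has n_features numbers."""
    payload = json.loads(raw_data)
    rows = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(rows, list) or not rows:
        raise ValueError('Expected a JSON object like {"data": [[...], ...]}')
    for i, row in enumerate(rows):
        if not isinstance(row, list) or len(row) != n_features:
            raise ValueError(f"Row {i} must contain {n_features} values")
        # bool is a subclass of int, so reject it explicitly
        if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in row):
            raise ValueError(f"Row {i} contains a non-numeric value")
    return [[float(v) for v in row] for row in rows]
```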

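## Step 3: Create an environment configuration

Create a conda dependencies file that lists the packages the scoring script needs. The deployment in Step 5 uses the curated `AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest` environment rather than this file. To use the file instead, define a custom environment from it with `Environment(image=<base image>, conda_file="src/conda.yml")` from `azure.ai.ml.entities`, and then pass that environment to the deployment.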

In [None]:
%%writefile src/conda.yml
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy
  - pip
  - scikit-learn
  - scipy
  - pip:
      - azureml-defaults
      - inference-schema[numpy-support]
      - joblib

## Step 4: Create a managed online endpoint

In [None]:
from azure.ai.ml.entities import ManagedOnlineEndpoint
from datetime import datetime

# Create a unique endpoint name
endpoint_name = f"diabetes-endpoint-{datetime.now().strftime('%m%d%H%M%f')}"

# Define the endpoint
endpoint = ManagedOnlineEndpoint(
    name=endpoint_name,
    description="Endpoint for diabetes prediction",
    auth_mode="key"  # or "aml_token" / "aad_token" for token-based authentication
)

# Create the endpoint
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
print(f"✅ Endpoint created: {endpoint_name}")
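Endpoint names must be unique within an Azure region, which is why the cell appends a timestamp. Names are also restricted in length and characters. The sketch below assumes a rule of 3 to 32 characters, starting with a letter and containing only letters, digits, and hyphens. The `diabetes-endpoint-` prefix has 18 characters and the `%m%d%H%M%f` timestamp has 14 digits, so the generated name is exactly 32 characters, and any longer prefix would exceed the assumed limit. Check the current naming rules before relying on this. The helper name `make_endpoint_name` is hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed naming rule: 3-32 characters, starting with a letter, then letters, digits, or hyphens
ENDPOINT_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]{2,31}")


def make_endpoint_name(prefix: str = "diabetes-endpoint", now: Optional[datetime] = None) -> str:
    """Append a %m%d%H%M%f timestamp to prefix and validate the result."""
    now = now or datetime.now()
    name = f"{prefix}-{now.strftime('%m%d%H%M%f')}"
    if not ENDPOINT_NAME_RE.fullmatch(name):
        raise ValueError(f"Invalid endpoint name ({len(name)} characters): {name}")
    return name


print(make_endpoint_name())
```

You could call `endpoint_name = make_endpoint_name()` in place of the f-string in the cell above.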

In [None]:
environments = ml_client.environments.list()
for env in environments:
    print(f"Name: {env.name}, Version: {env.version}")

In [None]:
environment = "AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest"

In [None]:
# Step 5: Create a deployment
from azure.ai.ml.entities import ManagedOnlineDeployment, CodeConfiguration

deployment = ManagedOnlineDeployment(
    name="blue",  # Deployment name
    endpoint_name=endpoint_name,  # Endpoint created in Step 4
    model=registered_model.id,
    code_configuration=CodeConfiguration(
        code="./src",
        scoring_script="score.py"
    ),
    environment=environment,
    instance_type="Standard_DS2_v2",
    instance_count=1
)

# Create the deployment
ml_client.online_deployments.begin_create_or_update(deployment).result()
print(f"✅ Deployment created: blue")

# Set traffic to 100% for this deployment
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
print("✅ Traffic set to 100% for blue deployment")
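`endpoint.traffic` maps deployment names to percentages. With a single `blue` deployment the mapping is simply `{"blue": 100}`. If you later add a second deployment for a gradual rollout, for example a hypothetical `green`, the shares have to be split between the two. The small sketch below checks a split before you assign it. It assumes integer percentages from 0 to 100 that add up to exactly 100. The helper is illustrative and is not part of the SDK.

```python
from typing import Dict, Iterable


def validate_traffic(traffic: Dict[str, int], deployments: Iterable[str]) -> Dict[str, int]:
    """Check a traffic split before assigning it to endpoint.traffic.

    Assumptions: every key names an existing deployment, each share is an integer
    percentage from 0 to 100, and the shares add up to exactly 100.
    """
    known = set(deployments)
    unknown = sorted(set(traffic) - known)
    if unknown:
        raise ValueError(f"Unknown deployments: {', '.join(unknown)}")
    for name, share in traffic.items():
        if not isinstance(share, int) or not 0 <= share <= 100:
            raise ValueError(f"Share for {name} must be an integer from 0 to 100, got {share!r}")
    total = sum(traffic.values())
    if total != 100:
        raise ValueError(f"Traffic shares add up to {total}, not 100")
    return traffic
```

For example, `endpoint.traffic = validate_traffic({"blue": 100}, ["blue"])`.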

## Step 6: Test the endpoint

In [None]:
print(f"Endpoint URL: {ml_client.online_endpoints.get(name=endpoint_name).scoring_uri}")

In [None]:
#print endpoint name
print(f"Endpoint Name: {endpoint_name}")

In [None]:
import json
import urllib.request
# Uses endpoint_name from Step 4

# Retrieve the primary key
keys = ml_client.online_endpoints.get_keys(name=endpoint_name)
primary_key = keys.primary_key


# Request data goes here
# The example below assumes JSON formatting which may be updated
# depending on the format your endpoint expects.
# More information can be found here:
# https://docs.microsoft.com/azure/machine-learning/how-to-deploy-advanced-entry-script

data = {
    "data": [
        [5, 200, 80, 30, 100, 35.0, 0.500, 50],  # expected to return 1 (diabetic)
        [1, 100, 60, 20, 50, 22.0, 0.200, 25]   # expected to return 0 (non-diabetic)
    ]
}

body = str.encode(json.dumps(data))

# Get the endpoint details
endpoint_details = ml_client.online_endpoints.get(name=endpoint_name)
url = endpoint_details.scoring_uri
# Replace this with the primary/secondary key, AMLToken, or Microsoft Entra ID token for the endpoint
api_key = primary_key
if not api_key:
    raise Exception("A key should be provided to invoke the endpoint")


headers = {'Content-Type':'application/json', 'Accept': 'application/json', 'Authorization':('Bearer '+ api_key)}

req = urllib.request.Request(url, body, headers)

try:
    response = urllib.request.urlopen(req)

    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(error.read().decode("utf8", 'ignore'))
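The request body is a JSON object whose `data` key holds a list of rows. Each row contains the eight feature values in the order the model was trained on: Pregnancies, PlasmaGlucose, DiastolicBloodPressure, TricepsThickness, SerumInsulin, BMI, DiabetesPedigree, Age. Building rows from named values makes that order explicit. This is a minimal sketch. The helper name `build_request_body` and the example patient in the test are hypothetical.

```python
import json
from typing import Dict, List

# Feature order used by diabetes-training.py
FEATURE_ORDER = [
    "Pregnancies", "PlasmaGlucose", "DiastolicBloodPressure", "TricepsThickness",
    "SerumInsulin", "BMI", "DiabetesPedigree", "Age",
]


def build_request_body(patients: List[Dict[str, float]]) -> bytes:
    """Turn a list of {feature: value} dicts into the UTF-8 JSON body score.py expects."""
    rows = []
    for i, patient in enumerate(patients):
        missing = [name for name in FEATURE_ORDER if name not in patient]
        if missing:
            raise KeyError(f"Patient {i} is missing: {', '.join(missing)}")
        rows.append([patient[name] for name in FEATURE_ORDER])
    return json.dumps({"data": rows}).encode("utf-8")
```

The result can replace `str.encode(json.dumps(data))` as the `body` passed to `urllib.request.Request`.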

## Step 7: Get endpoint details and use the endpoint externally

In [None]:
# Step 7
# Get the endpoint details
endpoint_details = ml_client.online_endpoints.get(name=endpoint_name)

# Get the scoring URI
scoring_uri = endpoint_details.scoring_uri
print(f"Scoring URI: {scoring_uri}")

# Get the authentication key - Store securely, do not print in notebook
keys = ml_client.online_endpoints.get_keys(name=endpoint_name)
primary_key = keys.primary_key
# NOTE: Never print or commit the primary_key. Use environment variables instead.
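The Step 8 cell expects the key and scoring URI in the `ENDPOINT_KEY` and `SCORING_URI` environment variables. The sketch below puts that check in one place: a small loader that fails early with a clear message. The function name and the HTTPS check are assumptions added on top of the notebook's own variable names.

```python
import os
from typing import Mapping, Tuple
from urllib.parse import urlparse


def load_endpoint_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (key, scoring_uri) from the environment, failing early if either is missing."""
    key = env.get("ENDPOINT_KEY", "").strip()
    uri = env.get("SCORING_URI", "").strip()
    missing = [name for name, value in (("ENDPOINT_KEY", key), ("SCORING_URI", uri)) if not value]
    if missing:
        raise RuntimeError(f"Set these environment variables: {', '.join(missing)}")
    # Assumption: the scoring URI is served over HTTPS, so reject anything else
    if urlparse(uri).scheme != "https":
        raise ValueError("SCORING_URI should be an https:// URL")
    return key, uri
```

For example, `primary_key, scoring_uri = load_endpoint_settings()`.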

In [None]:
# Step 8: Call the endpoint from outside the notebook
import requests
import os

# Read credentials from environment variables (for example, populated from Azure Key Vault) instead of hard-coding them
primary_key = os.getenv('ENDPOINT_KEY')
scoring_uri = os.getenv('SCORING_URI')

if not primary_key or not scoring_uri:
    print("ERROR: Set ENDPOINT_KEY and SCORING_URI environment variables")
else:
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {primary_key}'
    }

    test_data = {
        "data": [
            [2, 180, 74, 24, 21, 23.9, 1.488, 22]
        ]
    }

    # Make the prediction request
    response = requests.post(scoring_uri, json=test_data, headers=headers)
    print(f"Status Code: {response.status_code}")
    print(f"Response: {response.json()}")
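A scoring script might return a string that it has already serialized with `json.dumps(...)`. In that case the inference server can encode the string a second time, so `response.json()` yields a string like `'{"result": [1]}'` instead of a dictionary. The defensive parser below unwraps at most one extra layer of encoding. It also raises on the `error` key that `score.py` returns when prediction fails. The helper name is hypothetical.

```python
import json
from typing import Any, Dict, Union


def parse_scoring_response(body: Union[bytes, str]) -> Dict[str, Any]:
    """Decode a scoring response, unwrapping one extra layer of JSON encoding if present."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    parsed = json.loads(body)
    if isinstance(parsed, str):
        # The scoring script returned a JSON string, which the server encoded again
        parsed = json.loads(parsed)
    if not isinstance(parsed, dict):
        raise ValueError(f"Unexpected response shape: {parsed!r}")
    if "error" in parsed:
        raise RuntimeError(f"Scoring script reported an error: {parsed['error']}")
    return parsed
```

With `requests`, call `parse_scoring_response(response.content)`. In the `urllib` cell, call `parse_scoring_response(result)`.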