# Azure Machine Learning Deployment

This tutorial is an introduction to some of the most used features of the Azure Machine Learning service. In it, you will create, register, and deploy a model. This tutorial will help you become familiar with the core concepts of Azure Machine Learning and their most common usage.

You'll learn how to run a training job on a scalable compute resource, then deploy the trained model, and finally test the deployment.

You'll create a training script that handles data preparation, then trains and registers a model. Once the model is trained, you'll *deploy* it as an *endpoint*, then call the endpoint for *inferencing*.

The steps you'll take are:

> * Set up a handle to your Azure Machine Learning workspace
> * Create your training script
> * Create and run a command job that will run the training script on the compute cluster, configured with the appropriate job environment
> * View the output of your training script
> * Deploy the newly-trained model as an endpoint
> * Call the Azure Machine Learning endpoint for inferencing

## Create a handle to the workspace

Before diving into the code, you need a way to reference your workspace. The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning.

You'll create `ml_client` as a handle to the workspace. You'll then use `ml_client` to manage resources and jobs.

In [1]:
from azure.ai.ml import MLClient, Input
from azure.ai.ml.entities import (
    BatchEndpoint,
    ModelBatchDeployment,
    ModelBatchDeploymentSettings,
    Model,
    AmlCompute,
    Data,
    BatchRetrySettings,
    CodeConfiguration,
    Environment,
)
from azure.ai.ml.constants import AssetTypes, BatchDeploymentOutputAction
from azure.identity import DefaultAzureCredential

In [2]:
ml_client = MLClient(
    DefaultAzureCredential(),
    "cb5c6859-b468-4e7a-ac1c-e60f4ffe6dc8",  # subscription_id
    "MLResouercegroup",  # resource_group_name
    "mlworkspacedepi",  # workspace_name
)

In [3]:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# authenticate
credential = DefaultAzureCredential()

SUBSCRIPTION = "cb5c6859-b468-4e7a-ac1c-e60f4ffe6dc8"
RESOURCE_GROUP = "MLResouercegroup"
WS_NAME = "mlworkspacedepi"
# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id=SUBSCRIPTION,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WS_NAME,
)
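The cells above hardcode the subscription, resource group, and workspace name in the notebook. As a minimal sketch of an alternative, the helper below reads the same three values from environment variables. The function name `load_workspace_settings` and the variable names (`AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, `AZURE_ML_WORKSPACE`) are hypothetical and not part of the Azure SDK.

```python
import os


def load_workspace_settings(env=os.environ):
    """Collect the three workspace identifiers from environment variables.

    The variable names are hypothetical; use whatever convention you prefer.
    """
    names = {
        "subscription_id": "AZURE_SUBSCRIPTION_ID",
        "resource_group_name": "AZURE_RESOURCE_GROUP",
        "workspace_name": "AZURE_ML_WORKSPACE",
    }
    # Fail early with a clear message if any value is missing or empty.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}


# Usage with the client from the cells above (requires Azure credentials):
# ml_client = MLClient(credential=DefaultAzureCredential(), **load_workspace_settings())
```

The returned keys match the keyword arguments that `MLClient` is called with in the notebook, so the dictionary can be unpacked directly into the constructor.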

## Create the training script that registers the model as an MLflow model

Let's start by creating the training script, the *main2.py* Python file.

First create a source folder for the script:

In [4]:
import os

train_src_dir = "./src"
os.makedirs(train_src_dir, exist_ok=True)

In [5]:
%%writefile {train_src_dir}/main2.py
import os
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer

def main():
    """Main function of the script."""
   
    # Define parameters
    data_path = "https://mlworkspacedep6198697815.blob.core.windows.net/data/preprocessed_data.csv?sp=rw&st=2024-11-06T13:15:23Z&se=2024-11-30T21:15:23Z&spr=https&sv=2022-11-02&sr=c&sig=YluurvwULdmSEsnbR4fahYRVewWDt7VRulpn7KIHV7k%3D"
    test_train_ratio = 0.2
    max_iter = 600
    registered_model_name = "Depi_Log_Reg_model"

    # Start Logging
    mlflow.start_run()

    # Enable autologging
    mlflow.sklearn.autolog()

    #######################
    #<prepare the data>
    #######################
    print(f"data={data_path} test_train_ratio={test_train_ratio} max_iter={max_iter} registered_model_name={registered_model_name}")
    
    # Load data
    data_df = pd.read_csv(data_path)
    
    # Assuming the dataset has columns 'clean_text' for the review text and 'Sentiment' as the target label
    text_data = data_df['clean_text']
    labels = data_df['Sentiment']

    mlflow.log_metric("num_samples", data_df.shape[0])
    mlflow.log_metric("num_features", 1)  # text feature only

    # Split data
    X_train, X_test, y_train, y_test = train_test_split(
        text_data,
        labels,
        test_size=test_train_ratio,
    )

    # Convert text data to TF-IDF features
    vectorizer = TfidfVectorizer()
    X_train_tfidf = vectorizer.fit_transform(X_train)
    X_test_tfidf = vectorizer.transform(X_test)
    ####################
    #</prepare the data>
    ####################

    ##################
    #<train the model>
    ##################
    print(f"Training with data of shape {X_train_tfidf.shape}")

    clf = LogisticRegression(max_iter=max_iter)
    clf.fit(X_train_tfidf, y_train)

    y_pred = clf.predict(X_test_tfidf)

    print(classification_report(y_test, y_pred))
    ###################
    #</train the model>
    ###################

    ##########################
    #<save and register model>
    ##########################
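    # Register the classifier in the workspace as an MLflow model.
    # Note: only the classifier is logged here; the fitted TF-IDF vectorizer is not part of the registered model.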
    print("Registering the model via MLFlow")
    mlflow.sklearn.log_model(
        sk_model=clf,
        registered_model_name=registered_model_name,
        artifact_path=registered_model_name,
    )

    # Save the model to a local folder (the vectorizer is not saved here)
    model_dir = os.path.join(registered_model_name, "trained_model")
    os.makedirs(model_dir, exist_ok=True)
    mlflow.sklearn.save_model(
        sk_model=clf,
        path=model_dir,
    )
    ###########################
    #</save and register model>
    ###########################
    
    # Stop Logging
    mlflow.end_run()

if __name__ == "__main__":
    main()


Writing ./src/main2.py
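The script above fixes `test_train_ratio`, `max_iter`, and `registered_model_name` inside `main()`. As a hedged sketch, the same values could be exposed as command-line arguments with `argparse`, keeping the current values as defaults. The helper name `parse_args` is an assumption and not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse training parameters; defaults mirror the hardcoded values in main2.py."""
    parser = argparse.ArgumentParser(description="Train a sentiment classifier")
    parser.add_argument("--test_train_ratio", type=float, default=0.2)
    parser.add_argument("--max_iter", type=int, default=600)
    parser.add_argument("--registered_model_name", type=str, default="Depi_Log_Reg_model")
    return parser.parse_args(argv)


# Simulate a command line that overrides only the iteration limit.
args = parse_args(["--max_iter", "300"])
print(args.max_iter, args.test_train_ratio, args.registered_model_name)
```

With this in place, the job's command string could pass overrides, for example `python main2.py --max_iter 300`, without editing the script.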


## Configure the command for the ML job

In [8]:
from azure.ai.ml import command

# Create the Azure ML job (no parameters passed)
job = command(
    code="./src/",  # The folder containing your main2.py script
    command="python main2.py",
    environment="azureml://registries/azureml/environments/sklearn-1.5/labels/latest",
    display_name="sentiment_analysis_logistic_regression6",
)

# Submit the job
ml_client.create_or_update(job)

Class AutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class AutoDeleteConditionSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseAutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class IntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ProtectionLevelSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseIntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Uploading src (0.01 MBs): 100%|██

| Experiment | Name | Type | Status | Details Page |
| --- | --- | --- | --- | --- |
| ma29805271600256 | kind_calypso_qc6tzzrl5s | command | Starting | Link to Azure Machine Learning studio |
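The job is still in the `Starting` state when the submit call returns, and the training script registers the model only when it runs to completion. A minimal polling sketch follows. The helper `wait_for_job` is hypothetical, and the set of terminal status names is an assumption about the strings the service reports.

```python
import time

# Assumed terminal states; adjust if your SDK version reports others.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Call get_status() repeatedly until it returns a terminal status.

    Raises TimeoutError if no terminal status is seen within timeout_seconds.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still in status {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage against the workspace (requires Azure credentials):
# returned_job = ml_client.create_or_update(job)
# final = wait_for_job(lambda: ml_client.jobs.get(returned_job.name).status)
```

If you prefer to follow the logs while waiting, `ml_client.jobs.stream(returned_job.name)` blocks until the job finishes.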


In [10]:
# List all registered models
models = ml_client.models.list()

# Print the details of each model
for model in models:
    print(f"Model name: {model.name}")
    print(f"Model version: {model.version}")
    print(f"Model description: {model.description}")
    print("-" * 30)

Model name: azureml_affable_lock_57c6whpxhr_output_mlflow_log_model_1301799385
Model version: None
Model description: None
------------------------------
Model name: azureml_affable_lock_57c6whpxhr_output_mlflow_log_model_1702644655
Model version: None
Model description: None
------------------------------
Model name: credit_defaults_model
Model version: None
Model description: None
------------------------------
Model name: logistic_regression_model
Model version: None
Model description: None
------------------------------
Model name: tfidf_vectorizer
Model version: None
Model description: None
------------------------------
Model name: DEPI_Logistic_Regression_model
Model version: None
Model description: None
------------------------------
Model name: DEPI_tfidf_vectorizer
Model version: None
Model description: None
------------------------------
Model name: azureml_kind_calypso_qc6tzzrl5s_output_mlflow_log_model_2056784825
Model version: None
Model description: None
----------------

In [11]:
model = ml_client.models.get(name="Depi_Log_Reg_model", label="latest")


## Deploy the model as a batch endpoint

In [12]:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import BatchEndpoint, BatchDeployment

# Create a batch endpoint
endpoint = BatchEndpoint(
    name="depi-cloud-sentiment-endpoint",
    description="Cloud Endpoint for sentiment analysis using logistic regression",
)

# Create or update the endpoint
ml_client.batch_endpoints.begin_create_or_update(endpoint).wait()


In [13]:
endpoint.name

'depi-cloud-sentiment-endpoint'
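Endpoint names generally need to be unique within an Azure region, so a fixed name like the one above can collide with an endpoint that already exists. One common workaround is to append a short random suffix. The sketch below assumes lowercase letters and digits are acceptable in the name; the helper `unique_endpoint_name` is hypothetical.

```python
import secrets
import string


def unique_endpoint_name(prefix, suffix_length=5):
    """Append a random lowercase alphanumeric suffix to an endpoint name prefix."""
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(suffix_length))
    return f"{prefix}-{suffix}"


# Each call yields a different name, so no expected output is shown.
print(unique_endpoint_name("depi-cloud-sentiment"))
```

The generated string could then be passed as `name=` when constructing `BatchEndpoint`.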

In [14]:
compute_name = "batch-cluster"
if not any(c.name == compute_name for c in ml_client.compute.list()):
    compute_cluster = AmlCompute(
        name=compute_name, description="amlcompute", min_instances=0, max_instances=5
    )
    ml_client.begin_create_or_update(compute_cluster).result()

In [15]:
deployment = ModelBatchDeployment(
    name="classifier-LogReg",
    description="A Sentiment classifier based on Logistic Regression",
    endpoint_name=endpoint.name,
    model=model,
    compute=compute_name,
    settings=ModelBatchDeploymentSettings(
        instance_count=2,
        max_concurrency_per_instance=2,
        mini_batch_size=10,
        output_action=BatchDeploymentOutputAction.APPEND_ROW,
        output_file_name="predictions.csv",
        retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
        logging_level="info",
    ),
)

Class ModelBatchDeploymentSettings: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ModelBatchDeployment: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
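The deployment settings above interact: `mini_batch_size` controls how many input files go into each mini-batch, while `instance_count` times `max_concurrency_per_instance` bounds how many mini-batches can be processed at once. The following back-of-the-envelope sketch shows that arithmetic. The helper `plan_batch_run` is hypothetical, and the "waves" figure is a simplification that assumes every mini-batch takes about the same time.

```python
import math


def plan_batch_run(num_files, mini_batch_size, instance_count, max_concurrency_per_instance):
    """Estimate how a batch job splits its input files into parallel work."""
    mini_batches = math.ceil(num_files / mini_batch_size)
    parallel_workers = instance_count * max_concurrency_per_instance
    # Number of rounds needed if all workers finish a mini-batch in lockstep.
    waves = math.ceil(mini_batches / parallel_workers)
    return {
        "mini_batches": mini_batches,
        "parallel_workers": parallel_workers,
        "waves": waves,
    }


# Same settings as the deployment above, applied to a hypothetical 100 input files.
plan = plan_batch_run(
    num_files=100, mini_batch_size=10, instance_count=2, max_concurrency_per_instance=2
)
print(plan)  # {'mini_batches': 10, 'parallel_workers': 4, 'waves': 3}
```

With only a handful of input files, most workers would sit idle, so a smaller `instance_count` may be enough for small datasets.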


In [16]:
ml_client.batch_deployments.begin_create_or_update(deployment).result()

BatchDeployment({'provisioning_state': 'Succeeded', 'endpoint_name': 'depi-cloud-sentiment-endpoint', 'type': None, 'name': 'classifier-logreg', 'description': 'A Sentiment classifier based on Logistic Regression', 'tags': {}, 'properties': {}, 'print_as_yaml': False, 'id': '/subscriptions/cb5c6859-b468-4e7a-ac1c-e60f4ffe6dc8/resourceGroups/MLResouercegroup/providers/Microsoft.MachineLearningServices/workspaces/mlworkspacedepi/batchEndpoints/depi-cloud-sentiment-endpoint/deployments/classifier-logreg', 'Resource__source_path': '', 'base_path': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/ma298052716002561/code/Users/ma29805271600256', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x7fe9e86388b0>, 'serialize': <msrest.serialization.Serializer object at 0x7fe9e8638d00>, 'model': '/subscriptions/cb5c6859-b468-4e7a-ac1c-e60f4ffe6dc8/resourceGroups/MLResouercegroup/providers/Microsoft.MachineLearningServices/workspaces/mlworkspacedepi/models/Depi_Log_Reg_mo

In [17]:
endpoint = ml_client.batch_endpoints.get(endpoint.name)
endpoint.defaults.deployment_name = deployment.name
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

BatchEndpoint({'scoring_uri': 'https://depi-cloud-sentiment-endpoint.eastus2.inference.ml.azure.com/jobs', 'openapi_uri': None, 'provisioning_state': 'Succeeded', 'name': 'depi-cloud-sentiment-endpoint', 'description': 'Cloud Endpoint for sentiment analysis using logistic regression', 'tags': {}, 'properties': {'BatchEndpointCreationApiVersion': '2023-10-01', 'azureml.onlineendpointid': '/subscriptions/cb5c6859-b468-4e7a-ac1c-e60f4ffe6dc8/resourceGroups/MLResouercegroup/providers/Microsoft.MachineLearningServices/workspaces/mlworkspacedepi/batchEndpoints/depi-cloud-sentiment-endpoint'}, 'print_as_yaml': False, 'id': '/subscriptions/cb5c6859-b468-4e7a-ac1c-e60f4ffe6dc8/resourceGroups/MLResouercegroup/providers/Microsoft.MachineLearningServices/workspaces/mlworkspacedepi/batchEndpoints/depi-cloud-sentiment-endpoint', 'Resource__source_path': '', 'base_path': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/ma298052716002561/code/Users/ma29805271600256', 'creation_context': None, 'seriali

In [18]:
print(f"The default deployment is {endpoint.defaults.deployment_name}")

The default deployment is classifier-logreg


In [21]:
data_path = "data"
dataset_name = "sentiment-dataset-unlabeledv2"

sentiment_dataset_unlabeled = Data(
    path=data_path,
    type=AssetTypes.URI_FOLDER,
    description="An unlabeled dataset for sentiment classification",
    name=dataset_name,
)

In [22]:
ml_client.data.create_or_update(sentiment_dataset_unlabeled)

Uploading data (0.04 MBs): 100%|██████████| 43170/43170 [00:00<00:00, 862882.39it/s]



Data({'path': 'azureml://subscriptions/cb5c6859-b468-4e7a-ac1c-e60f4ffe6dc8/resourcegroups/MLResouercegroup/workspaces/mlworkspacedepi/datastores/workspaceblobstore/paths/LocalUpload/3b82557f2e3420ef17fa20430d3d5829/data/', 'skip_validation': False, 'mltable_schema_url': None, 'referenced_uris': None, 'type': 'uri_folder', 'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': 'sentiment-dataset-unlabeledv2', 'description': 'An unlabeled dataset for sentiment classification', 'tags': {}, 'properties': {}, 'print_as_yaml': False, 'id': '/subscriptions/cb5c6859-b468-4e7a-ac1c-e60f4ffe6dc8/resourceGroups/MLResouercegroup/providers/Microsoft.MachineLearningServices/workspaces/mlworkspacedepi/data/sentiment-dataset-unlabeledv2/versions/1', 'Resource__source_path': '', 'base_path': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/ma298052716002561/code/Users/ma29805271600256', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x

In [25]:
datasets = ml_client.data.list()

# Print the details of each dataset
for dataset in datasets:
    print(f"Dataset name: {dataset.name}")
    print(f"Dataset version: {dataset.version}")
    print(f"Dataset description: {dataset.description}")
    print("-" * 30)

Dataset name: dataset
Dataset version: None
Dataset description: None
------------------------------
Dataset name: 12bb6350-ac26-4b0d-982f-dcdfda9615f5
Dataset version: None
Dataset description: None
------------------------------
Dataset name: data
Dataset version: None
Dataset description: None
------------------------------
Dataset name: reviewdata
Dataset version: None
Dataset description: None
------------------------------
Dataset name: f144f1b9-764a-4c77-b7a9-b3e7cde10472
Dataset version: None
Dataset description: None
------------------------------
Dataset name: 1e547ccf-3e65-4f1f-8ee3-bed1dd6497dd
Dataset version: None
Dataset description: None
------------------------------
Dataset name: 3bf173b2-a6ab-4467-bcd5-689bdb3a09b3
Dataset version: None
Dataset description: None
------------------------------
Dataset name: sentiment-dataset-unlabeled
Dataset version: None
Dataset description: None
------------------------------
Dataset name: sentiment-dataset-unlabeledv2
Dataset vers

In [30]:
dataset = ml_client.data.get(name="SentimentDataset", label="latest")

In [31]:
# Use a name other than `input` to avoid shadowing the Python builtin
job_input = Input(type=AssetTypes.URI_FOLDER, path=dataset.id)

In [32]:
job = ml_client.batch_endpoints.invoke(endpoint_name=endpoint.name, input=job_input)

In [33]:
ml_client.jobs.get(job.name)


| Experiment | Name | Type | Status | Details Page |
| --- | --- | --- | --- | --- |
| depi-cloud-sentiment-endpoint | batchjob-aaa3076d-a5bf-4824-bde1-360cd0008412 | pipeline | Running | Link to Azure Machine Learning studio |
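Once the batch job completes, the deployment writes its results to the `predictions.csv` file configured earlier. A rough sketch for summarizing such a file is shown below. It assumes a header-less CSV whose columns are row index, prediction, and source file name; that layout, the sample labels, and the helper `count_predictions` are assumptions for illustration, so check them against your actual output.

```python
import csv
import io
from collections import Counter


def count_predictions(csv_file, prediction_column=1):
    """Count how often each predicted label appears in a header-less CSV."""
    reader = csv.reader(csv_file)
    # Skip empty lines; read the label from the assumed prediction column.
    return Counter(row[prediction_column] for row in reader if row)


# Made-up sample rows standing in for a downloaded predictions.csv.
sample = io.StringIO(
    "0,positive,reviews_1.csv\n"
    "1,negative,reviews_1.csv\n"
    "2,positive,reviews_2.csv\n"
)
print(count_predictions(sample))

# To fetch the real file first (requires Azure credentials; the output name
# "score" is an assumption taken from Azure's batch endpoint examples):
# ml_client.jobs.download(name=job.name, download_path=".", output_name="score")
```

For a real file, open it with `open(path, newline="")` and pass the file object to `count_predictions`.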
