# Tutorial: Model development on a compute instance

This notebook contains only the code cells used in [Tutorial: Model development on a cloud workstation](https://learn.microsoft.com/azure/machine-learning/tutorial-cloud-workstation). See the article for the full walkthrough.

In [1]:
import os
import argparse
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

[Learn more about this data on the UCI Machine Learning Repository.](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients)

## Create handle to workspace

Before you dive into the code, you need a way to reference your workspace. You'll create `ml_client` as a handle to the workspace, and then use `ml_client` to manage resources and jobs.

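The credential cell below reads your subscription ID, resource group name, and workspace name from a `.env` file (keys `SUBSCRIPTION_ID`, `RESOURCE_GP`, and `WORKSPACE`), along with the service principal values `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, and `AZURE_TENANT_ID`. To find the workspace values:

1. In the upper-right corner of the Azure Machine Learning studio toolbar, select your workspace name.
1. Copy the values for workspace, resource group, and subscription ID into your `.env` file.
1. You'll need to copy one value, close the area, and paste it, then come back for the next one.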
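The credential cell reads six keys from `.env` and fails with a bare `KeyError` if one is missing. A minimal sketch of an up-front check, assuming the key names used in that cell; `missing_config_keys` is a hypothetical helper, not part of `python-dotenv`:

```python
from typing import Mapping, Optional

# Key names taken from the credential cell in this notebook
REQUIRED_KEYS = (
    "SUBSCRIPTION_ID",
    "RESOURCE_GP",
    "WORKSPACE",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "AZURE_TENANT_ID",
)


def missing_config_keys(config: Mapping[str, Optional[str]]) -> list[str]:
    """Return the required keys that are absent or empty in a parsed .env mapping."""
    # A key may map to None (no value) or "" (an unfilled KEY= line); treat both as missing
    return [key for key in REQUIRED_KEYS if not config.get(key)]


if __name__ == "__main__":
    sample = {"SUBSCRIPTION_ID": "<subscription-id>", "RESOURCE_GP": "", "WORKSPACE": None}
    print(missing_config_keys(sample))
```

Calling it right after `dotenv_values(".env")` and raising when the returned list is non-empty gives a clearer message than a `KeyError` partway through the cell.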

In [2]:
# python-dotenv provides dotenv_values, which the credential cell uses
%pip install --upgrade mlflow azureml-mlflow python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [3]:
import os

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from dotenv import dotenv_values

# Read workspace details and service principal secrets from a local .env file
config = dotenv_values(".env")
subscription_id = config["SUBSCRIPTION_ID"]
resource_group = config["RESOURCE_GP"]
workspace = config["WORKSPACE"]

# DefaultAzureCredential picks up a service principal from these environment variables
os.environ["AZURE_CLIENT_ID"] = config["AZURE_CLIENT_ID"]
os.environ["AZURE_CLIENT_SECRET"] = config["AZURE_CLIENT_SECRET"]
os.environ["AZURE_TENANT_ID"] = config["AZURE_TENANT_ID"]

credential = DefaultAzureCredential()
# Check that the credential can get a token before going further
credential.get_token("https://management.azure.com/.default")

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id=subscription_id,
    resource_group_name=resource_group,
    workspace_name=workspace,
)

In [4]:
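# Load version 2 of the registered "credit-card" data asset.
# If data_asset.path is an azureml:// URI, pandas needs the azureml-fsspec package to read it.
data_asset = ml_client.data.get(name="credit-card", version="2")
credit_df = pd.read_parquet(data_asset.path)

# Hold out 25% of the rows for testing (no random_state, so the split differs on each run)
train_df, test_df = train_test_split(
    credit_df,
    test_size=0.25,
)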

If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.
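The classification reports later in this notebook show 5,817 negatives and 1,683 positives in the test set, so the classes are imbalanced. `train_test_split` accepts a `stratify` argument that keeps the class proportions the same in both splits; the tutorial does not use it. The sketch below uses small synthetic labels (an illustration, not the credit data); applied to this notebook, the equivalent would be passing `stratify=credit_df["default"]`.

```python
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 78 negatives and 22 positives (illustrative only)
y_demo = [0] * 78 + [1] * 22
X_demo = [[i] for i in range(len(y_demo))]  # one dummy feature per row

X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    X_demo,
    y_demo,
    test_size=0.25,
    stratify=y_demo,  # keep the 78/22 class ratio in both splits
    random_state=0,  # fixed seed so the split is reproducible
)

print(len(y_demo_train), len(y_demo_test))  # sizes of the two splits
print(sum(y_demo_test) / len(y_demo_test))  # share of positives in the test split
```

With 22 positives and a 25-row test split, stratification places 5 or 6 positives in the test set, close to the overall 22% rate; an unstratified split can drift further from it.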


## Carry out data processing, EDA, and modeling

In [5]:
# Separate the label column from the training features
y_train = train_df.pop("default")
# Convert the remaining feature columns to a NumPy array
X_train = train_df.values

# Do the same for the test set
y_test = test_df.pop("default")
X_test = test_df.values
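`DataFrame.pop` returns the named column as a Series and removes it from the frame, which is why `train_df.values` afterwards contains only feature columns. A toy sketch with made-up column names and values (not the credit data):

```python
import pandas as pd

# Toy frame with two feature columns and a "default" label column (made-up values)
toy_df = pd.DataFrame(
    {
        "LIMIT_BAL": [20000, 120000, 90000],
        "AGE": [24, 26, 34],
        "default": [1, 0, 0],
    }
)

toy_labels = toy_df.pop("default")  # returns the column and drops it from toy_df in place
toy_features = toy_df.values        # NumPy array of the remaining columns only

print(list(toy_df.columns))  # ['LIMIT_BAL', 'AGE']
print(toy_features.shape)    # (3, 2)
print(toy_labels.tolist())   # [1, 0, 0]
```

The pandas documentation recommends `DataFrame.to_numpy()` over `.values`; both return the same array for an all-numeric frame like this one.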

In [6]:
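# Set the experiment that runs are logged under
mlflow.set_experiment("Develop on cloud tutorial")
# Enable MLflow autologging for scikit-learn estimators
mlflow.sklearn.autolog()
# Show where runs are tracked (the Azure ML workspace)
mlflow.get_tracking_uri()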

'azureml://eastus.api.azureml.ms/mlflow/v1.0/subscriptions/9017d57d-c4df-480d-b92d-7aea2266b0f0/resourceGroups/azure-mlops-demo/providers/Microsoft.MachineLearningServices/workspaces/demo-ws'
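The tracking URI returned above encodes the region, subscription, resource group, and workspace that MLflow logs to. A hypothetical stdlib helper that splits such a URI into those parts, assuming the `azureml://<region>.api.azureml.ms/mlflow/<version>/subscriptions/...` layout seen in this output; it can help confirm runs land in the intended workspace:

```python
from urllib.parse import urlparse


def parse_azureml_tracking_uri(uri: str) -> dict[str, str]:
    """Split an azureml:// MLflow tracking URI into its named parts.

    Assumes the layout seen in this notebook's output:
    azureml://<region>.api.azureml.ms/mlflow/<version>/subscriptions/<id>/
    resourceGroups/<rg>/providers/Microsoft.MachineLearningServices/workspaces/<ws>
    """
    parsed = urlparse(uri)
    if parsed.scheme != "azureml":
        raise ValueError(f"not an azureml:// URI: {uri!r}")
    # Path segments alternate between a fixed label and its value
    segments = [s for s in parsed.path.split("/") if s]
    pairs = dict(zip(segments[::2], segments[1::2]))
    return {
        "region": parsed.netloc.split(".", 1)[0],
        "subscription_id": pairs["subscriptions"],
        "resource_group": pairs["resourceGroups"],
        "workspace": pairs["workspaces"],
    }
```

In the notebook, `parse_azureml_tracking_uri(mlflow.get_tracking_uri())` would return the parts for the current workspace. The helper raises `KeyError` if a label such as `workspaces` is missing, since it does not try to handle other URI layouts.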


In [7]:
# Train Gradient Boosting Classifier
print(f"Training with data of shape {X_train.shape}")

with mlflow.start_run() as gb_run:
    clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_test)

    print(classification_report(y_test, y_pred))
# Leaving the `with` block already ends the run; this call is a harmless safeguard
mlflow.end_run()

Training with data of shape (22500, 23)
              precision    recall  f1-score   support

           0       0.84      0.95      0.89      5817
           1       0.68      0.38      0.49      1683

    accuracy                           0.82      7500
   macro avg       0.76      0.66      0.69      7500
weighted avg       0.81      0.82      0.80      7500
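Each row of `classification_report` is derived from that class's true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. `macro avg` is the unweighted mean over classes, and `weighted avg` weights each class by its `support`. A stdlib sketch of these formulas, checked against the rounded Gradient Boosting report above; `f1_from`, `precision_recall_f1`, and `macro_average` are illustrative helpers, not scikit-learn functions:

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Per-class precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


def macro_average(values: list[float]) -> float:
    """Unweighted mean across classes, as in the `macro avg` row."""
    return sum(values) / len(values)


# Class-1 row of the Gradient Boosting report: precision 0.68, recall 0.38
print(round(f1_from(0.68, 0.38), 2))          # 0.49
# Macro-averaged precision over classes 0 and 1
print(round(macro_average([0.84, 0.68]), 2))  # 0.76
```

Recomputing averages from the two-decimal values can differ from the report in the last digit, because the report rounds only when it prints.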





In [8]:
# Train an AdaBoost classifier
from sklearn.ensemble import AdaBoostClassifier

print(f"Training with data of shape {X_train.shape}")

with mlflow.start_run() as ada_run:
    ada = AdaBoostClassifier()

    ada.fit(X_train, y_train)

    y_pred = ada.predict(X_test)

    print(classification_report(y_test, y_pred))

# The `with` block has already ended the run; end_run() is a no-op here
mlflow.end_run()

Training with data of shape (22500, 23)
              precision    recall  f1-score   support

           0       0.83      0.96      0.89      5817
           1       0.70      0.33      0.45      1683

    accuracy                           0.82      7500
   macro avg       0.76      0.64      0.67      7500
weighted avg       0.80      0.82      0.79      7500
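Both models reach 0.82 accuracy, but with 1,683 positives against 5,817 negatives in the test set, accuracy alone hides how well each one finds defaulters. A minimal sketch that compares the two runs on their class-1 scores, copied from the reports above; using these class-1 metrics as the selection criterion is an assumption, not something the tutorial prescribes:

```python
# Class-1 ("default") scores copied from the two classification reports above
class1_scores = {
    "GradientBoostingClassifier": {"precision": 0.68, "recall": 0.38, "f1": 0.49},
    "AdaBoostClassifier": {"precision": 0.70, "recall": 0.33, "f1": 0.45},
}


def best_by(scores: dict[str, dict[str, float]], metric: str) -> str:
    """Return the model name with the highest value for the given metric."""
    return max(scores, key=lambda name: scores[name][metric])


for metric in ("precision", "recall", "f1"):
    print(f"{metric:>9}: {best_by(class1_scores, metric)}")
```

AdaBoost wins on class-1 precision while Gradient Boosting wins on recall and F1, so the better choice depends on whether missed defaults or false alarms cost more.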





In [9]:
from mlflow.tracking import MlflowClient

# Fetch the AdaBoost run to inspect its autologged metrics, params and tags
client = MlflowClient()
run_info = client.get_run(run_id=ada_run.info.run_id)
run_info

<Run: data=<RunData: metrics={'training_accuracy_score': 0.8184444444444444,
 'training_f1_score': 0.7915902532810785,
 'training_log_loss': 0.6803693858364175,
 'training_precision_score': 0.8014140997509144,
 'training_recall_score': 0.8184444444444444,
 'training_roc_auc': 0.7843666637820831,
 'training_score': 0.8184444444444444}, params={'algorithm': 'SAMME.R',
 'base_estimator': 'deprecated',
 'estimator': 'None',
 'learning_rate': '1.0',
 'n_estimators': '50',
 'random_state': 'None'}, tags={'estimator_class': 'sklearn.ensemble._weight_boosting.AdaBoostClassifier',
 'estimator_name': 'AdaBoostClassifier',
 'mlflow.rootRunId': '3b99587c-df97-42ec-addb-8bc5d7ef5971',
 'mlflow.runName': 'affable_bulb_47gglnjj',
 'mlflow.user': '6088b4d8-928f-4743-b30f-30e4c4a860f8'}>, info=<RunInfo: artifact_uri='azureml://eastus.api.azureml.ms/mlflow/v2.0/subscriptions/9017d57d-c4df-480d-b92d-7aea2266b0f0/resourceGroups/azure-mlops-demo/providers/Microsoft.MachineLearningServices/workspaces/demo-w

In [10]:
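model_name = "credit-default-model-adaboost"
# The AdaBoost run ended with its `with` block, so log_model() starts a new run
# implicitly; take the model URI from that active run.
mlflow.sklearn.log_model(ada, model_name)
run_id = mlflow.active_run().info.run_id
model_uri = f"runs:/{run_id}/{model_name}"

# Register the model, tagged with the run ID and the AdaBoost run's metrics
tags = {"run_id": run_id}
tags.update(run_info.data.metrics)
mlflow_model = mlflow.register_model(model_uri, model_name, tags=tags)

# Close the run that log_model() opened
mlflow.end_run()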

Registered model 'credit-default-model-adaboost' already exists. Creating a new version of this model...
2023/06/16 11:08:02 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: credit-default-model-adaboost, version 5
Created version '5' of model 'credit-default-model-adaboost'.
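Tag values in the MLflow model registry are strings, so the float metrics passed in `tags` are stored in their string form. A sketch of a hypothetical helper that builds the tag dictionary explicitly, rounding each metric so the tags are easier to read; the four-digit precision is an arbitrary choice, and the sample values are copied from the AdaBoost run output above:

```python
def build_model_tags(run_id: str, metrics: dict[str, float], digits: int = 4) -> dict[str, str]:
    """Build string-valued registry tags from a run ID and its metrics."""
    tags = {"run_id": run_id}
    # Round and stringify each metric explicitly instead of relying on implicit conversion
    tags.update({name: str(round(value, digits)) for name, value in metrics.items()})
    return tags


if __name__ == "__main__":
    metrics = {
        "training_accuracy_score": 0.8184444444444444,
        "training_roc_auc": 0.7843666637820831,
    }
    print(build_model_tags("<run-id>", metrics))
    # {'run_id': '<run-id>', 'training_accuracy_score': '0.8184', 'training_roc_auc': '0.7844'}
```

In the registration cell, `tags=build_model_tags(run_id, run_info.data.metrics)` would replace the two-step dictionary construction.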
