# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
import json
import logging
import os
import pandas as pd
import requests
import sys
import azureml.core
from azureml.automl.core.onnx_convert import OnnxConvertConstants
from azureml.automl.runtime.onnx_convert import OnnxConverter
from azureml.automl.runtime.onnx_convert import OnnxInferenceHelper
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.dataset import Dataset
from azureml.core.environment import Environment
from azureml.core.experiment import Experiment
from azureml.core.model import InferenceConfig
from azureml.core.model import Model
from azureml.core.run import Run
from azureml.core.webservice import AciWebservice
from azureml.core.webservice import Webservice
from azureml.core.workspace import Workspace
from azureml.interpret import ExplanationClient
from azureml.train.automl import AutoMLConfig
from azureml.train.automl import constants
from azureml.widgets import RunDetails
from matplotlib import pyplot as plt


## Dataset

### Overview
TODO: In this markdown cell, give an overview of the dataset you are using. Also mention the task you will be performing.


TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

Title: Car Evaluation Database

Task: Train a classification model that predicts car acceptability (the `class` column) from six categorical attributes.

The data evaluates cars according to the following concept structure:

 - CAR: car acceptability (the target)

 - buying: buying price; v-high, high, med, low

 - maint: price of the maintenance; v-high, high, med, low

 - doors: number of doors; 2, 3, 4, 5-more

 - persons: capacity in terms of persons to carry; 2, 4, more

 - lug_boot: the size of the luggage boot; small, med, big

 - safety: estimated safety of the car; low, med, high

Class values: unacc, acc, good, v-good
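
As a quick sanity check on the attribute domains listed above, the sketch below enumerates every combination of the six input attributes. The name `ATTRIBUTE_VALUES` is an assumption made for illustration, and the labels follow the spelling used in this overview, which may differ slightly from the spelling in the raw file.

```python
from itertools import product

# Attribute domains as listed in the overview (hypothetical helper structure)
ATTRIBUTE_VALUES = {
    "buying": ["v-high", "high", "med", "low"],
    "maint": ["v-high", "high", "med", "low"],
    "doors": ["2", "3", "4", "5-more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}

# Every possible car description is one element of the Cartesian product
combinations = list(product(*ATTRIBUTE_VALUES.values()))
print(len(combinations))  # 4 * 4 * 4 * 3 * 3 * 3 = 1728
```

The input space is small: at most 1728 distinct cars can be described with these attributes.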

In [2]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'auto-ml'
experiment=Experiment(ws, experiment_name)

2023-08-02:01:21:35,750 INFO     [workspace.py:291] Found the config file in: /config.json
2023-08-02:01:21:36,446 INFO     [clientbase.py:192] Created a worker pool for first use


In [3]:
print(ws)

Workspace.create(name='quick-starts-ws-239593', subscription_id='b968fb36-f06a-4c76-a15f-afab68ae7667', resource_group='aml-quickstarts-239593')


In [4]:
# Create the cluster
cpu_cluster_name = "auto-ml"
# Verify that cluster does not exist already
try:
    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print("Found existing cluster, use it.")
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='STANDARD_D2_V2',
        max_nodes=4
    )
    compute_target = ComputeTarget.create(
        ws, cpu_cluster_name, compute_config)
compute_target.wait_for_completion(show_output=True)

InProgress..
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [5]:
# Load the external data and upload it to the workspace datastore
columns = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
# car.data has no header row, so pass the column names explicitly;
# otherwise the first record would be consumed as the header and lost
data = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data',
    header=None,
    names=columns
)

# Save the data to a CSV file to be uploaded to the datastore
os.makedirs("data", exist_ok=True)
data.to_csv("data/data.csv", index=False)
ds = ws.get_default_datastore()
key='car evaluation data set'
ds.upload(
    src_dir="./data",
    target_path=key,
    overwrite=True,
    show_progress=True
)

# Create a tabular dataset from the uploaded file for access during training on remote compute
dataset = Dataset.Tabular.from_delimited_files(
    path=ds.path(key+"/data.csv")
)


2023-08-02:01:23:19,434 INFO     [datastore_client.py:991] <azureml.core.authentication.InteractiveLoginAuthentication object at 0x7fb8b4b20f70>
2023-08-02:01:23:20,72 INFO     [azure_storage_datastore.py:923] Called AzureBlobDatastore.upload
2023-08-02:01:23:20,86 INFO     [azure_storage_datastore.py:372] Uploading an estimated of 1 files
2023-08-02:01:23:20,338 INFO     [azure_storage_datastore.py:372] Uploading ./data/data.csv
2023-08-02:01:23:20,339 INFO     [azure_storage_datastore.py:372] Uploaded ./data/data.csv, 1 files out of an estimated total of 1
2023-08-02:01:23:20,340 INFO     [azure_storage_datastore.py:372] Uploaded 1 files
2023-08-02:01:23:20,340 INFO     [azure_storage_datastore.py:941] Finished AzureBlobDatastore.upload with count=1.


Uploading an estimated of 1 files
Uploading ./data/data.csv
Uploaded ./data/data.csv, 1 files out of an estimated total of 1
Uploaded 1 files
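
When a CSV file has no header row, the column names have to be supplied by the reader; if the first line is treated as a header instead, one record is silently lost. The sketch below illustrates the idea with the standard library only. `SAMPLE_TEXT` and `read_headerless` are hypothetical names, and the two records are made up for illustration rather than taken from the real file.

```python
import csv
import io

COLUMNS = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]

# Two made-up records in a headerless layout (illustrative only)
SAMPLE_TEXT = "low,low,2,2,small,low,unacc\nmed,low,4,4,big,high,acc\n"


def read_headerless(text, columns):
    """Parse headerless CSV text into a list of dicts keyed by the given column names."""
    # Passing fieldnames tells DictReader that the first line is data, not a header
    reader = csv.DictReader(io.StringIO(text), fieldnames=columns)
    return [dict(row) for row in reader]


rows = read_headerless(SAMPLE_TEXT, COLUMNS)
print(len(rows))         # both records are kept
print(rows[0]["class"])  # label of the first record
```

Comparing the number of parsed rows with the number of lines in the file is a cheap way to confirm that no record was dropped.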


## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.

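AutoML settings

 - experiment_timeout_minutes = 20; caps the total run time of the experiment so that it cannot keep running (and incurring cost) indefinitely

 - max_concurrent_iterations = 4; runs up to four iterations in parallel, matching the cluster's `max_nodes=4`, as a trade-off between cost and speed

 - primary_metric = 'AUC_weighted'; AUC is a widely used metric for evaluating binary classifiers. The weighted version extends it to multi-class classification by averaging the per-class AUC with each class weighted by its number of samples, which accounts for class imbalance.

AutoML configuration

 - task = "classification"; the target is a categorical label, so a classification model is trained

 - label_column_name = "class"; this column holds the evaluation result to predict

 - enable_early_stopping = True; ends the experiment early when the score is no longer improving meaningfully, which saves cost

 - featurization = 'auto'; lets AutoML preprocess the features automatically, for example by encoding the categorical columns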
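
To make the idea of a weighted multi-class AUC concrete, here is a minimal sketch in plain Python. It computes a one-vs-rest AUC per class and averages the results weighted by class frequency. The function names and the toy probabilities are assumptions made for illustration; the metric reported by AutoML is computed by the service itself and may differ in detail.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Assumes both classes are present in `labels`.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_ovr_auc(y_true, proba, classes):
    """Average the one-vs-rest AUC of each class, weighted by class frequency."""
    total = len(y_true)
    result = 0.0
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        support = sum(labels)  # number of true samples of this class
        class_scores = [row[k] for row in proba]
        result += (support / total) * binary_auc(labels, class_scores)
    return result


# Toy example: six samples, predicted probabilities in the order of `classes`
classes = ["unacc", "acc", "good"]
y_true = ["unacc", "unacc", "unacc", "acc", "acc", "good"]
proba = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],  # an "unacc" sample ranked too low
    [0.3, 0.6, 0.1],
    [0.5, 0.4, 0.1],  # an "acc" sample ranked too low
    [0.1, 0.2, 0.7],
]
auc = weighted_ovr_auc(y_true, proba, classes)
print(round(auc, 4))
```

Because the frequent class contributes most to the average, a model cannot reach a high score by ranking only the rare classes well.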


In [6]:
# TODO: Put your automl settings here
automl_settings = {
    "experiment_timeout_minutes": 20,
    "enable_early_stopping": True,
    "iteration_timeout_minutes": 5,
    "max_concurrent_iterations": 4,
    "max_cores_per_iteration": -1,
    "primary_metric" : 'AUC_weighted',
    "featurization": "auto",
    "verbosity": logging.INFO,
    "enable_code_generation": True,
}

project_folder = './project'

# TODO: Put your automl config here
automl_config = AutoMLConfig(
    task = "classification",
    debug_log = "automl_errors.log",
    compute_target=compute_target,
    enable_onnx_compatible_models=True,
    training_data=dataset,
    label_column_name="class",
    path = project_folder,
    **automl_settings
)
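
Before submitting, it can help to check that the timeout and concurrency values are consistent with each other and with the cluster size. The helper below is a hypothetical sketch, not part of the Azure ML SDK, and the two rules it applies are assumptions about what is sensible for this setup.

```python
def check_automl_settings(settings, cluster_max_nodes):
    """Return a list of human-readable warnings for an AutoML settings dict."""
    warnings = []
    if settings["iteration_timeout_minutes"] > settings["experiment_timeout_minutes"]:
        warnings.append("a single iteration may outlast the whole experiment")
    if settings["max_concurrent_iterations"] > cluster_max_nodes:
        warnings.append("more concurrent iterations than cluster nodes")
    return warnings


# Values copied from the settings cell above; the cluster was created with max_nodes=4
settings = {
    "experiment_timeout_minutes": 20,
    "iteration_timeout_minutes": 5,
    "max_concurrent_iterations": 4,
}
print(check_automl_settings(settings, cluster_max_nodes=4))
```

An empty list means that neither rule was violated for the values used in this notebook.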

In [7]:
# TODO: Submit your experiment

remote_run = experiment.submit(automl_config)


Submitting remote run.


Experiment,Id,Type,Status,Details Page,Docs Page
auto-ml,AutoML_e7f39757-41ab-4806-9036-ceb666bd7771,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [8]:
from azureml.widgets import RunDetails

RunDetails(remote_run).show()


## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [9]:
# Wait for the AutoML run to complete
remote_run.wait_for_completion(show_output=True)

# "best_run" is the run object of the best child run (metrics, properties, and run ID)
# "onnx_mdl" is the trained model of that run in ONNX format
best_run, onnx_mdl = remote_run.get_output(return_onnx_model=True)

# "best_run_metrics" is a dictionary of the metrics associated with the best run
best_run_metrics = best_run.get_metrics()

# Display all the properties of the best run
print(best_run.get_properties())


Experiment,Id,Type,Status,Details Page,Docs Page
auto-ml,AutoML_e7f39757-41ab-4806-9036-ceb666bd7771,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation




********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Cross validation
STATUS:       DONE
DESCRIPTION:  In order to accurately evaluate the model(s) trained by AutoML, we leverage a dataset that the model is not trained on. Hence, if the user doesn't provide an explicit validation dataset, a part of the training dataset is used to achieve this. For smaller datasets (fewer than 20,000 samples), cross-validation is leveraged, else a single hold-out set is split from the training data to serve as the validation dataset. Hence, for your input data we leverage cross-validation with 10 folds, if the number of training samples are fewer than 1000, and 3 folds in all other cases.
              Learn more about cross validation: https://aka.ms/AutomatedMLCrossValidation
DETAILS:      
+------------------------------+
|Number of folds               |
|3                             |
+------------------------------+

******
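
The validation rule described in the guardrail message can be written down as a small function. This is a sketch that mirrors the wording of the message only; the function name is hypothetical, the sample count of 1728 is an assumed figure, and the SDK's actual logic may differ.

```python
def choose_validation(n_samples):
    """Mirror the rule stated in the guardrail message (not the SDK's actual code)."""
    if n_samples >= 20000:
        # Larger datasets: a single hold-out split serves as the validation set
        return ("holdout", None)
    if n_samples < 1000:
        return ("cross_validation", 10)
    return ("cross_validation", 3)


print(choose_validation(1728))
```

For a sample count in the low thousands, the rule selects 3-fold cross-validation, which agrees with the number of folds reported in the table above.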

In [13]:
print(best_run.properties['model_name'])
print(best_run.id)

AutoMLe7f3975740
AutoML_e7f39757-41ab-4806-9036-ceb666bd7771_0


In [10]:
#TODO: Save the best model
# Save the best ONNX model

onnx_fl_path = "./automl_best_model.onnx"
OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)


## Model Deployment

Remember that you only have to deploy one of the two models you trained, but you still need to register both models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [11]:
model_name = best_run.properties["model_name"]
script_file_name = "inference/score.py"
# Download the scoring script that AutoML generated for the best run
best_run.download_file("outputs/scoring_file_v_1_0_0.py", script_file_name)

description = "AutoML model trained on car evaluation data to predict car acceptability"
tags = None
model = remote_run.register_model(
    model_name=model_name, description=description, tags=tags
)

# ID of the registered model
print(remote_run.model_id)

AutoMLe7f3975740


In [12]:
inference_config = InferenceConfig(
    environment=best_run.get_environment(), entry_script=script_file_name
)

aciconfig = AciWebservice.deploy_configuration(
    cpu_cores=2,
    memory_gb=2,
    tags={"area": "bmData", "type": "automl_classification"},
    description="sample service for Automl Classification",
)

aci_service_name = model_name.lower()
print(aci_service_name)
aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)
aci_service.wait_for_deployment(True)
print(aci_service.state)



automle7f3975740
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2023-08-02 02:25:56+00:00 Creating Container Registry if not exists..
2023-08-02 02:35:57+00:00 Registering the environment.
2023-08-02 02:35:57+00:00 Use the existing image.
2023-08-02 02:35:58+00:00 Submitting deployment to compute..
2023-08-02 02:36:04+00:00 Checking the status of deployment automle7f3975740..
2023-08-02 02:38:29+00:00 Checking the status of inference endpoint automle7f3975740.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


TODO: In the cell below, send a request to the web service you deployed to test it.

In [15]:
# Endpoint URL of the web service
scoring_uri = aci_service.scoring_uri

# Create the request data: one record with the six input attributes
input_data = {
    "data": [
        {"buying":"low",
         "maint":"low",
         "doors":"2",
         "persons":"2",
         "lug_boot":"small",
         "safety":"low"}
    ]}

input_json = json.dumps(input_data)

# Send the POST request
headers = {'Content-Type': 'application/json'}
response = requests.post(scoring_uri, data=input_json, headers=headers)
# Fail early on an HTTP error status instead of parsing an error body
response.raise_for_status()

# Parse the response
predictions = response.json()

In [17]:
print(predictions)

{"result": ["unacc"]}
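
The printed value above uses double quotes, which suggests that `response.json()` returned a JSON-encoded string rather than a dictionary. That is an inference from the output, not something the notebook verifies. Under that assumption, a small hypothetical helper such as `parse_prediction` can extract the predicted labels in either case.

```python
import json


def parse_prediction(body):
    """Return the list under "result", decoding the body first if it is still a JSON string."""
    if isinstance(body, str):
        body = json.loads(body)
    return body["result"]


# Works for a JSON string as well as for an already decoded dictionary
print(parse_prediction('{"result": ["unacc"]}'))
print(parse_prediction({"result": ["unacc"]}))
```

Both calls yield the same list of labels, so downstream code does not need to know how the service encoded its response.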


TODO: In the cell below, print the logs of the web service and delete the service

In [16]:
# Retrieve and print the logs
logs = aci_service.get_logs()
print(logs)


2023-08-02T02:38:13,326604400+00:00 - rsyslog/run 
2023-08-02T02:38:13,330469200+00:00 - gunicorn/run 
2023-08-02T02:38:13,334848400+00:00 | gunicorn/run | 
2023-08-02T02:38:13,340475200+00:00 | gunicorn/run | ###############################################
2023-08-02T02:38:13,343211000+00:00 | gunicorn/run | AzureML Container Runtime Information
2023-08-02T02:38:13,345343200+00:00 - nginx/run 
2023-08-02T02:38:13,347683600+00:00 | gunicorn/run | ###############################################
2023-08-02T02:38:13,351964900+00:00 | gunicorn/run | 
2023-08-02T02:38:13,359308100+00:00 | gunicorn/run | 
2023-08-02T02:38:13,370304300+00:00 | gunicorn/run | AzureML image information: openmpi4.1.0-ubuntu20.04, Materializaton Build:20230628.v2
2023-08-02T02:38:13,373733600+00:00 | gunicorn/run | 
2023-08-02T02:38:13,379488600+00:00 | gunicorn/run | 
2023-08-02T02:38:13,384296600+00:00 | gunicorn/run | PATH environment variable: /azureml-envs/azureml-automl/bin:/opt/miniconda/bin:/usr/local/sbi

In [21]:
# Save the logs to a text file
logs_file_path = './aci_logs.txt'
with open(logs_file_path, 'w') as f:
    f.write(logs)


In [22]:
# Delete the web service and the compute cluster
aci_service.delete()
compute_target.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
