# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
from azureml.core import Workspace, Experiment
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.train.automl import AutoMLConfig
import os
import joblib
from azureml.core.webservice import AciWebservice, Webservice
from azureml.core.model import Model
import requests
import json

## Dataset

### Overview
The Car Evaluation dataset comes from the UCI Machine Learning Repository. It describes cars by six categorical attributes covering price and technical characteristics, and assigns each car to one of four acceptability classes.
<br>The attributes are:
<br>1. Cost of buying the car (low, med, high, vhigh)
<br>2. Maintenance cost of the car (low, med, high, vhigh)
<br>3. Number of doors (2, 3, 4, 5more)
<br>4. Number of passengers the car can accommodate (2, 4, more)
<br>5. Luggage space in the car (small, med, big)
<br>6. Safety of the car (low, med, high)
<br>The target class has 4 possible values: unacc, acc, good, vgood (ordered from the worst category to the best).
<br>The dataset can be accessed from the following link: https://archive.ics.uci.edu/ml/datasets/Car+Evaluation
<br><br>We perform classification on this dataset using AutoML. Rather than committing to a single algorithm, we let Azure AutoML search for the best-performing model.
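
The attribute domains can also be written down as a small lookup table, which is handy for checking a record before it is sent to a model. This is a minimal sketch and not part of the original notebook: the attribute names (`buying`, `maint`, and so on) follow the UCI description and are an assumption here, because the data file has no header row. `invalid_fields` is a hypothetical helper.

```python
from math import prod

# Allowed values per attribute. The attribute names are an assumption
# (taken from the UCI description); the UCI description also lists "big"
# as a luggage-space value.
FEATURE_VALUES = {
    "buying": ["low", "med", "high", "vhigh"],
    "maint": ["low", "med", "high", "vhigh"],
    "doors": ["2", "3", "4", "5more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}
TARGET_CLASSES = ["unacc", "acc", "good", "vgood"]  # worst to best


def invalid_fields(record):
    """Return the names of attributes that are missing or hold an unknown value."""
    return [
        name
        for name, allowed in FEATURE_VALUES.items()
        if str(record.get(name)) not in allowed
    ]


# Number of distinct attribute combinations (product of the domain sizes).
n_combinations = prod(len(values) for values in FEATURE_VALUES.values())
```

With these domains, `n_combinations` is 4 × 4 × 4 × 3 × 3 × 3 = 1728, which is the number of instances the UCI page reports for this dataset.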

In [2]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'Car_Evaluation_AutoML'

experiment=Experiment(ws, experiment_name)

In [3]:
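from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# TODO: Create compute cluster
# Use vm_size = "Standard_D2_V2" in your provisioning configuration.
# max_nodes should be no greater than 4.

# Use one name for both lookup and creation so that a rerun finds the cluster created earlier.
cluster_name = "capstone-compute"

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # The cluster does not exist yet, so provision it.
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)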

cpu_cluster.wait_for_completion(show_output=True)

Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## AutoML Configuration

For the AutoML run, we use the following settings:
<br>1. Experiment Timeout: 30 minutes (the experiment has to finish within a limited timeframe).
<br>2. Primary Metric: Accuracy (the metric used to evaluate and rank the models).
<br>3. Enable DNN: True (allows the AutoML run to consider Deep Neural Network models).
<br>4. Enable Early Stopping: True (the run terminates early if the score stops improving).
<br>5. Enable Voting Ensemble & Enable Stack Ensemble: False (we don't want AutoML to build ensembles of the trained models at the end of the run).
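
Accuracy, the primary metric above, is the fraction of predictions that match the true label. When one class dominates, a model that always predicts that class already scores well, so it helps to know that baseline before reading the results. The sketch below computes it in plain Python. The class counts are taken from the UCI description of the dataset and should be treated as an assumption, since this notebook does not verify them.

```python
from collections import Counter

# Class counts as reported for the UCI Car Evaluation dataset
# (assumption: not computed in this notebook).
class_counts = Counter({"unacc": 1210, "acc": 384, "good": 69, "vgood": 65})


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Expand the counts into a label list and predict the majority class everywhere.
y_true = list(class_counts.elements())
majority_class, _ = class_counts.most_common(1)[0]
y_pred = [majority_class] * len(y_true)

baseline = accuracy(y_true, y_pred)
print(f"majority class: {majority_class}, baseline accuracy: {baseline:.3f}")
```

Under these counts the baseline is about 0.70, so an accuracy figure is only informative to the extent that it clearly exceeds that value.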

In [4]:
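# car.data has no header row, so the columns are auto-named Column1 ... Column7 (Column7 is the target class).
ds = TabularDatasetFactory.from_delimited_files(path="https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", header=False)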

In [5]:
# TODO: Put your automl settings here
automl_settings = {
    "experiment_timeout_minutes": 30,
    "primary_metric": 'accuracy',
    "enable_dnn": True,
    "enable_early_stopping": True,
    "enable_voting_ensemble": False,
    "enable_stack_ensemble": False
}

# TODO: Put your automl config here
automl_config = AutoMLConfig(
    task="classification",
    compute_target=cpu_cluster,
    training_data=ds,
    label_column_name="Column7",
    **automl_settings
)

In [6]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config, show_output=False)

Running on remote.


## Run Details

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [21]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show()


In [22]:
remote_run.wait_for_completion(show_output=True)



****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Cross validation
STATUS:       DONE
DESCRIPTION:  Each iteration of the trained model was validated through cross-validation.
              
DETAILS:      
+---------------------------------+
|Number of folds                  |
|3                                |
+---------------------------------+

****************************************************************************************************

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  To decrease model bias, please cancel the current run and fix balancing problem.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData
DETAILS:      Imbalanced data can lead to a falsely perceived positive effect of a model's accuracy because the input data has bias towards one class.
+---------------------------------+-----------------------------

## Best Model

Now we retrieve the best-performing model from the AutoML run and print its properties.



In [23]:
best_automl_run, best_automl_model = remote_run.get_output()

print(best_automl_run)

print(best_automl_run.get_metrics()['accuracy'])

Run(Experiment: Car_Evaluation_AutoML,
Id: AutoML_865b9eff-b5a9-4328-9e4c-d7f5cb1c9293_0,
Type: azureml.scriptrun,
Status: Completed)
0.9901620370370371


In [24]:
best_automl_model.steps

[('datatransformer',
  DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                  feature_sweeping_config=None, feature_sweeping_timeout=None,
                  featurization_config=None, force_text_dnn=None,
                  is_cross_validation=None, is_onnx_compatible=None, logger=None,
                  observer=None, task=None, working_dir=None)),
 ('MaxAbsScaler', MaxAbsScaler(copy=True)),
 ('LightGBMClassifier',
  LightGBMClassifier(boosting_type='gbdt', class_weight=None,
                     colsample_bytree=1.0, importance_type='split',
                     learning_rate=0.1, max_depth=-1, min_child_samples=20,
                     min_child_weight=0.001, min_split_gain=0.0, n_estimators=100,
                     n_jobs=1, num_leaves=31, objective=None, random_state=None,
                     reg_alpha=0.0, reg_lambda=0.0, silent=True, subsample=1.0,
                     subsample_for_bin=200000, subsample_freq=0, verbose=-10))]

## Model Deployment

Because the best AutoML model (accuracy of about 0.990) outperforms the HyperDrive-optimized model, we register and deploy the AutoML model.

In [26]:
model = best_automl_run.register_model("car-evaluation-model", model_path="outputs/model.pkl")

In [27]:

best_automl_run.download_file('outputs/scoring_file_v_1_0_0.py', 'automl_inf/score.py')
best_automl_run.download_file('outputs/model.pkl', 'automl_inf/model.pkl')

In [28]:
print(model)

Model(workspace=Workspace.create(name='quick-starts-ws-136623', subscription_id='3d1a56d2-7c81-4118-9790-f85d1acf0c77', resource_group='aml-quickstarts-136623'), name=car-evaluation-model, id=car-evaluation-model:1, version=1, tags={}, properties={})


In [29]:
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(entry_script='automl_inf/score.py')

In [30]:
service = Model.deploy(ws, "car-evaluation-service", [model], inference_config, overwrite=True)
service.wait_for_deployment(show_output=True)
print(service.state)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running.........................................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


TODO: In the cell below, send a request to the web service you deployed to test it.

In [49]:
%run endpoint.py

{"result": ["unacc", "acc"]}


In [47]:
test_data = {"data":
        [
          {
            "Column1": "vhigh",
            "Column2": "vhigh",
            "Column3": 2,
            "Column4": 2,
            "Column5": "small",
            "Column6": "low"
          },
          {
            "Column1": "high",
            "Column2": "med",
            "Column3": 4,
            "Column4": 4,
            "Column5": "med",
            "Column6": "med"
          }
      ]
    }

# Convert to JSON string
input_data = json.dumps(test_data)

In [50]:
service.run(input_data)

'{"result": ["unacc", "acc"]}'
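
The same prediction can be requested over HTTP, which is presumably what `endpoint.py` does. That script is not shown here, so the sketch below is an assumption about its general shape, not a copy of it. `build_payload`, `parse_predictions`, and `score_over_http` are hypothetical helper names. `service.scoring_uri` is the endpoint URL exposed by the deployed web service, and `requests` is imported inside the function so the two pure helpers can be tried without it installed.

```python
import json


def build_payload(rows):
    """Serialize input rows into the {"data": [...]} JSON body used above."""
    return json.dumps({"data": rows})


def parse_predictions(raw):
    """Extract the list under "result" from a service response.

    The response may be a JSON string that itself contains JSON,
    so decode while the value is still a string (at most twice).
    """
    parsed = raw
    for _ in range(2):
        if isinstance(parsed, str):
            parsed = json.loads(parsed)
    return parsed["result"]


def score_over_http(scoring_uri, rows, key=None):
    """POST rows to the scoring URI and return the predicted labels."""
    import requests  # third-party; imported here so the helpers above work without it

    headers = {"Content-Type": "application/json"}
    if key is not None:
        # Only needed when key-based authentication is enabled on the service.
        headers["Authorization"] = f"Bearer {key}"
    response = requests.post(scoring_uri, data=build_payload(rows), headers=headers)
    response.raise_for_status()
    return parse_predictions(response.text)
```

A call would then look like `score_over_http(service.scoring_uri, test_data["data"])`, made while the service is still running.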

TODO: In the cell below, print the logs of the web service and delete the service

In [34]:
print(service.get_logs())



In [43]:
service.update(enable_app_insights=True)

In [51]:
service.delete()