# Automated ML

The cell below imports the dependencies needed for this project.

In [1]:
import os
import joblib
from pprint import pprint

from azureml.core import Workspace, Experiment, Model
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.run import Run
from azureml.widgets import RunDetails
from azureml.train.automl import AutoMLConfig
from azureml.data.dataset_factory import TabularDatasetFactory


## Dataset

### Overview
This notebook uses a heart failure clinical records dataset. The task is binary classification: predicting death by heart failure, which is recorded in the `DEATH_EVENT` column.

These are the twelve features in the dataset:
- age (float): age of the patient in years
- anaemia (bool): whether there has been a decrease of red blood cells or hemoglobin
- creatinine_phosphokinase (int): level of the CPK enzyme in the blood in mcg/L
- diabetes (bool): whether the patient has diabetes
- ejection_fraction (int): percentage of blood leaving the heart at each contraction
- high_blood_pressure (bool): whether the patient has hypertension
- platelets (float): platelets in the blood in kiloplatelets/mL
- serum_creatinine (float): level of serum creatinine in the blood in mg/dL
- serum_sodium (int): level of serum sodium in the blood in mEq/L
- sex (int): female or male (binary)
- smoking (bool): whether the patient smokes
- time (int): follow-up period in days
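As a minimal sketch of how this schema could be checked before training or scoring, the snippet below validates one record against the expected column names. The feature names are taken from the scoring payload printed later in this notebook; `check_record` is a hypothetical helper, and the 0/1 encoding of the binary columns is an assumption based on that payload.

```python
# Feature names as they appear in the scoring payload later in this notebook.
FEATURES = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets",
    "serum_creatinine", "serum_sodium", "sex", "smoking", "time",
]
LABEL = "DEATH_EVENT"

# Columns assumed to be encoded as 0/1 indicators.
BINARY_FEATURES = ["anaemia", "diabetes", "high_blood_pressure", "sex", "smoking"]


def check_record(record):
    """Return a list of problems found in one patient record (hypothetical helper)."""
    problems = []
    missing = [name for name in FEATURES if name not in record]
    if missing:
        problems.append("missing features: " + ", ".join(missing))
    for name in BINARY_FEATURES:
        if name in record and record[name] not in (0, 1):
            problems.append(f"{name} should be 0 or 1, got {record[name]!r}")
    return problems


# First record of the sample payload shown in the deployment section.
sample = {
    "age": 70.0, "anaemia": 0, "creatinine_phosphokinase": 161, "diabetes": 0,
    "ejection_fraction": 25, "high_blood_pressure": 0, "platelets": 244000.0,
    "serum_creatinine": 1.2, "serum_sodium": 142, "sex": 0, "smoking": 0, "time": 66,
}
print(check_record(sample))
```

An empty list means the record has every expected feature and valid binary flags.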

In [2]:
# acquiring data
ds = TabularDatasetFactory.from_delimited_files("https://raw.githubusercontent.com/eparamasari/ML_Engineer_ND_Capstone/main/data/heart_failure_clinical_records_dataset.csv")

In [3]:
ws = Workspace.from_config()

# creating an experiment object
experiment_name = 'HeartFailureClassificationExp'

experiment = Experiment(ws, experiment_name)

In [4]:
# Reusing an existing compute cluster or creating a new one
from azureml.core.compute_target import ComputeTargetException

compute_name = "ComputeCluster"
try:
    compute_target = ComputeTarget(ws, compute_name)
    print(compute_name + " already exists.")
except ComputeTargetException:
    # The cluster was not found in the workspace, so provision it
    compute_config = AmlCompute.provisioning_configuration(vm_size="Standard_D2_V2", min_nodes=1, max_nodes=5)
    compute_target = ComputeTarget.create(ws, compute_name, compute_config)
compute_target.wait_for_completion(show_output=True)

print(compute_target.get_status().serialize())

Creating
Succeeded........................
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
{'currentNodeCount': 1, 'targetNodeCount': 1, 'nodeStateCounts': {'preparingNodeCount': 1, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2021-02-12T07:29:54.987000+00:00', 'errors': None, 'creationTime': '2021-02-12T07:27:41.449347+00:00', 'modifiedTime': '2021-02-12T07:27:56.758676+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 5, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_D2_V2'}
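The serialized status above can also be checked programmatically. The sketch below is one possible readiness test on that dictionary; `cluster_is_ready` is a hypothetical helper, and treating `allocationState == 'Steady'` as "no resize in progress" is an assumption rather than something this notebook states.

```python
def cluster_is_ready(status):
    """Return True if the serialized cluster status looks usable (hypothetical helper)."""
    return (
        status.get("provisioningState") == "Succeeded"
        and status.get("allocationState") == "Steady"  # assumed to mean: not resizing
        and not status.get("errors")
    )


# Subset of the status dictionary printed above.
status = {
    "currentNodeCount": 1,
    "targetNodeCount": 1,
    "allocationState": "Steady",
    "errors": None,
    "provisioningState": "Succeeded",
    "scaleSettings": {"minNodeCount": 1, "maxNodeCount": 5},
}

print(cluster_is_ready(status))
```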


## AutoML Configuration

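The AutoML run is configured as a classification task on the `DEATH_EVENT` label with 5-fold cross-validation. The experiment is limited to 20 minutes, runs up to 5 iterations concurrently (matching the cluster's maximum of 5 nodes), and uses accuracy as the primary metric.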

In [5]:
# setting up automl

automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 5,
    "primary_metric": 'accuracy'
}

automl_config = AutoMLConfig(
    task='classification',
    compute_target=compute_target,
    training_data=ds,
    label_column_name='DEATH_EVENT',
    n_cross_validations=5,
    **automl_settings
)
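To illustrate what `n_cross_validations=5` implies, the sketch below splits row indices into five folds so that every row is used for validation exactly once. This is a plain k-fold illustration, not AutoML's internal splitter, and the row count of 299 is an assumption because the notebook does not print the dataset size.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_rows, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds take one additional row each.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


N_ROWS = 299  # assumed dataset size, not stated in this notebook
folds = k_fold_indices(N_ROWS, 5)
for i, validation_idx in enumerate(folds):
    train_size = N_ROWS - len(validation_idx)
    print(f"fold {i}: train on {train_size} rows, validate on {len(validation_idx)} rows")
```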

In [6]:
# Submitting the experiment
remote_run = experiment.submit(automl_config)

Running on remote.


## Run Details

The `RunDetails` widget shows the progress of the AutoML run and its child runs.

In [7]:
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [8]:
remote_run.wait_for_completion(show_output=True)


Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

******************************************************************

        69                                                  0:00:05          nan    0.8630
ERROR: {
    "additional_properties": {},
    "error": {
        "additional_properties": {
            "debugInfo": null
        },
        "code": "UserError",
        "severity": null,
        "message": "Experiment timeout reached, please consider increasing your experiment timeout.",
        "message_format": "Experiment timeout reached, please consider increasing your experiment timeout.",
        "message_parameters": {},
        "reference_code": null,
        "details_uri": null,
        "target": null,
        "details": [],
        "inner_error": {
            "additional_properties": {},
            "code": "ResourceExhausted",
            "inner_error": {
                "additional_properties": {},
                "code": "Timeout",
                "inner_error": {
                    "additional_properties": {},
                    "code": "ExperimentTimeoutForIterations",
        

{'runId': 'AutoML_90926142-d8d8-4f9d-a593-3d5fff6a318c',
 'target': 'ComputeCluster',
 'status': 'Completed',
 'startTimeUtc': '2021-02-12T07:31:04.552432Z',
 'endTimeUtc': '2021-02-12T07:58:55.212453Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'ComputeCluster',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"0211f99c-7cf7-4da8-b8fb-2d5f36813395\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"isArchive\\\\\\": false, \\\\\\"path\\\\\\": {\\\\\\"target\\\\\\": 4, \\\\\\"resourceDetails\\\\\\": [{\\\\\\"path\\\\\\": \\\\\\"https://raw.githubusercontent.com/eparamasari/ML_Engineer_ND_Capstone/main/data/heart_failure_clinical_records_dataset.csv\\\\\\"}]}}, \\\\\\"localData\\\\\\": {}, \\\\\
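The `ERROR` block in the output above reports that the experiment timeout was reached, while the run details still show the status `Completed`. The error is a nested structure in which each `inner_error` narrows down the cause. The sketch below walks such a structure and collects the codes; `error_code_chain` is a hypothetical helper, and the dictionary contains only the codes visible in the truncated output.

```python
def error_code_chain(error):
    """Collect the `code` of an error and of each nested `inner_error` (hypothetical helper)."""
    codes = []
    node = error
    while node:
        code = node.get("code")
        if code:
            codes.append(code)
        node = node.get("inner_error")
    return codes


# Only the codes visible in the truncated output above.
error = {
    "code": "UserError",
    "inner_error": {
        "code": "ResourceExhausted",
        "inner_error": {
            "code": "Timeout",
            "inner_error": {"code": "ExperimentTimeoutForIterations"},
        },
    },
}

print(" > ".join(error_code_chain(error)))
```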

## Best Model

The cell below retrieves the best run and its fitted model from the AutoML experiment and prints all metrics of that run.



In [9]:
# Getting the best run and model
best_run, fitted_model = remote_run.get_output()

# Printing the best run
print(best_run)

# Getting all metrics of the best run
best_run_metrics = best_run.get_metrics()

# Printing all metrics of the best run
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)

Package:azureml-automl-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-core, training version:1.21.0.post1, current version:1.20.0
Package:azureml-dataprep, training version:2.8.2, current version:2.7.3
Package:azureml-dataprep-native, training version:28.0.0, current version:27.0.0
Package:azureml-dataprep-rslex, training version:1.6.0, current version:1.5.0
Package:azureml-dataset-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-defaults, training version:1.21.0, current version:1.20.0
Package:azureml-interpret, training version:1.21.0, current version:1.20.0
Package:azureml-pipeline-core, training version:1.21.0, current version:1.20.0
Package:azureml-telemetry, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-client, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-runtime, training version:1.21.0, current version:1.20.0


Run(Experiment: HeartFailureClassificationExp,
Id: AutoML_90926142-d8d8-4f9d-a593-3d5fff6a318c_70,
Type: azureml.scriptrun,
Status: Completed)
f1_score_micro 0.8763276836158193
f1_score_macro 0.8495459791619602
weighted_accuracy 0.8968878829357403
average_precision_score_weighted 0.9256618927723945
accuracy 0.8763276836158193
recall_score_macro 0.8456547619047619
AUC_weighted 0.9148786683277963
recall_score_weighted 0.8763276836158193
norm_macro_recall 0.6913095238095238
AUC_micro 0.9160299562705481
average_precision_score_micro 0.9175761251835782
log_loss 0.3712387980151897
matthews_correlation 0.7258331573961796
precision_score_macro 0.8837545144509594
balanced_accuracy 0.8456547619047619
precision_score_micro 0.8763276836158193
recall_score_micro 0.8763276836158193
average_precision_score_macro 0.9032289354516564
precision_score_weighted 0.8934916290495668
f1_score_weighted 0.8720743353669373
AUC_macro 0.9148786683277963
confusion_matrix aml://artifactId/ExperimentRun/dcid.AutoML_90
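Several of the metrics above are related by simple identities, which explains the repeated values. For single-label classification, the micro-averaged precision, recall, and F1 score all equal accuracy, as does the weighted recall. The balanced accuracy equals the macro recall. The normalized macro recall rescales the macro recall so that chance level maps to 0. The sketch below checks these relations on the printed values, assuming chance level is `1 / n_classes` with two classes.

```python
import math

# Metric values copied from the output above.
metrics = {
    "accuracy": 0.8763276836158193,
    "f1_score_micro": 0.8763276836158193,
    "precision_score_micro": 0.8763276836158193,
    "recall_score_micro": 0.8763276836158193,
    "recall_score_weighted": 0.8763276836158193,
    "recall_score_macro": 0.8456547619047619,
    "balanced_accuracy": 0.8456547619047619,
    "norm_macro_recall": 0.6913095238095238,
}

# Micro-averaged scores and weighted recall coincide with accuracy.
same_as_accuracy = all(
    math.isclose(metrics[name], metrics["accuracy"])
    for name in ("f1_score_micro", "precision_score_micro",
                 "recall_score_micro", "recall_score_weighted")
)

# Normalized macro recall: (macro recall - chance) / (1 - chance).
n_classes = 2
chance = 1 / n_classes
norm_recall = (metrics["recall_score_macro"] - chance) / (1 - chance)

print("micro scores equal accuracy:", same_as_accuracy)
print("balanced accuracy equals macro recall:",
      math.isclose(metrics["balanced_accuracy"], metrics["recall_score_macro"]))
print("normalized macro recall matches:",
      math.isclose(norm_recall, metrics["norm_macro_recall"]))
```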

In [10]:
# Registering the best model in the workspace
best_automl_model = best_run.register_model(model_path='outputs/model.pkl', model_name='heart_failure_automl',
                        tags = {'Training type': 'Automated ML'},
                        properties = {'Accuracy': best_run_metrics['accuracy']})

print(best_automl_model)

Model(workspace=Workspace.create(name='quick-starts-ws-138666', subscription_id='1b944a9b-fdae-4f97-aeb1-b7eea0beac53', resource_group='aml-quickstarts-138666'), name=heart_failure_automl, id=heart_failure_automl:1, version=1, tags={'Training type': 'Automated ML'}, properties={'Accuracy': '0.8763276836158193'})


In [22]:
# Listing registered models to verify that the model has been saved
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

HyperDrive_HighAccuracy version: 1
	 Accuracy : 0.7555555555555555


heart_failure_automl version: 1
	 Training type : Automated ML
	 Accuracy : 0.8763276836158193
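The listing above can be used to decide which model to deploy. The sketch below compares the two registered models by their `Accuracy` property, using plain dictionaries in place of `Model` objects. `best_by_accuracy` is a hypothetical helper. The property is converted with `float` because the registration output above shows it stored as a string for the AutoML model; that the HyperDrive model stores it the same way is an assumption.

```python
# Plain-dictionary stand-ins for the registered models listed above.
registered = [
    {"name": "HyperDrive_HighAccuracy", "version": 1,
     "properties": {"Accuracy": "0.7555555555555555"}},
    {"name": "heart_failure_automl", "version": 1,
     "properties": {"Accuracy": "0.8763276836158193"}},
]


def best_by_accuracy(models):
    """Return the model with the highest `Accuracy` property (hypothetical helper)."""
    # Properties are stored as strings, so convert before comparing.
    return max(models, key=lambda m: float(m["properties"]["Accuracy"]))


best = best_by_accuracy(registered)
print(best["name"], best["properties"]["Accuracy"])
```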




## Model Deployment

Only one of the two trained models needs to be deployed. The AutoML model reached an accuracy of 0.876, higher than the 0.756 of the HyperDrive model, so the AutoML model is deployed here.

The cells below download the environment and scoring files of the best run, create an inference config, and deploy the registered model as a web service.

In [12]:
# Downloading the environment file
best_run.download_file('outputs/conda_env_v_1_0_0.yml', 'envFile.yml')

# Downloading the scoring file 
best_run.download_file('outputs/scoring_file_v_1_0_0.py', 'scoreScript.py')

In [14]:
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(entry_script='scoreScript.py',
                                   environment=best_run.get_environment())

# Deploying the registered model as an Azure Container Instances (ACI) web service
from azureml.core.webservice import AciWebservice

deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
service = Model.deploy(ws, "myservice", [best_automl_model], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)
print(service.state)

print(service.scoring_uri)

print(service.swagger_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running...............
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://f3f19266-2c36-4fa3-b682-b0a253f3537e.southcentralus.azurecontainer.io/score
http://f3f19266-2c36-4fa3-b682-b0a253f3537e.southcentralus.azurecontainer.io/swagger.json


In [18]:
import json

# Sampling five rows from the original data to use as a test request
data_df = ds.to_pandas_dataframe().dropna()
test_df = data_df.sample(5)
label_df = test_df.pop('DEATH_EVENT')  # separating the ground truth labels

test_sample = json.dumps({'data': test_df.to_dict(orient='records')})

print(test_sample)

{"data": [{"age": 70.0, "anaemia": 0, "creatinine_phosphokinase": 161, "diabetes": 0, "ejection_fraction": 25, "high_blood_pressure": 0, "platelets": 244000.0, "serum_creatinine": 1.2, "serum_sodium": 142, "sex": 0, "smoking": 0, "time": 66}, {"age": 85.0, "anaemia": 0, "creatinine_phosphokinase": 212, "diabetes": 0, "ejection_fraction": 38, "high_blood_pressure": 0, "platelets": 186000.0, "serum_creatinine": 0.9, "serum_sodium": 136, "sex": 1, "smoking": 0, "time": 187}, {"age": 78.0, "anaemia": 1, "creatinine_phosphokinase": 64, "diabetes": 0, "ejection_fraction": 40, "high_blood_pressure": 0, "platelets": 277000.0, "serum_creatinine": 0.7, "serum_sodium": 137, "sex": 1, "smoking": 1, "time": 187}, {"age": 63.0, "anaemia": 1, "creatinine_phosphokinase": 61, "diabetes": 1, "ejection_fraction": 40, "high_blood_pressure": 0, "platelets": 221000.0, "serum_creatinine": 1.1, "serum_sodium": 140, "sex": 0, "smoking": 0, "time": 86}, {"age": 67.0, "anaemia": 0, "creatinine_phosphokinase": 21

The cell below sends the sample to the deployed web service as an HTTP POST request to test it.

In [19]:
import requests # for http post request

# Set the content type
headers = {'Content-type': 'application/json'}

response = requests.post(service.scoring_uri, test_sample, headers=headers)

In [20]:
# Printing results from the inference
print(response.text)

"{\"result\": [1, 0, 0, 0, 0]}"


In [21]:
# Printing ground truth labels
print(label_df)

68     1
207    0
204    0
99     0
272    0
Name: DEATH_EVENT, dtype: int64
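The response text printed above is a JSON string that itself contains JSON, so it has to be decoded twice before the predictions can be compared with the labels. The sketch below does this for the values shown in the two outputs above; `parse_predictions` is a hypothetical helper. Because the five rows were sampled from the same data used for training, the agreement is a sanity check of the endpoint rather than an estimate of accuracy on unseen data.

```python
import json

# Response text and ground truth labels as printed in the cells above.
response_text = r'"{\"result\": [1, 0, 0, 0, 0]}"'
labels = [1, 0, 0, 0, 0]


def parse_predictions(text):
    """Decode the scoring response, handling the extra layer of JSON encoding."""
    payload = json.loads(text)
    if isinstance(payload, str):
        # The first decode yields a JSON string; decode again to get the dict.
        payload = json.loads(payload)
    return payload["result"]


predictions = parse_predictions(response_text)
matches = sum(p == y for p, y in zip(predictions, labels))
print(f"{matches}/{len(labels)} predictions match the labels")
```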


The cell below prints the logs of the web service and then deletes the service.

In [23]:
print(service.get_logs())

2021-02-12T09:52:42,762385200+00:00 - rsyslog/run 
2021-02-12T09:52:42,769884100+00:00 - iot-server/run 
2021-02-12T09:52:42,781209500+00:00 - gunicorn/run 
2021-02-12T09:52:42,804964600+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
rsyslogd

In [24]:
service.delete()