# Automated ML

Import all the dependencies that we will need to complete the project.

In [1]:
from azureml.core import Workspace, Experiment, Dataset
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.widgets import RunDetails
import pandas as pd
from azureml.data.datapath import DataPath
from azureml.train.automl import AutoMLConfig
import joblib 
import os

## Dataset

### Overview

The dataset that we will be using for this project is the [Heart Failure Prediction](https://www.kaggle.com/andrewmvd/heart-failure-clinical-data) dataset from Kaggle. 

Heart failure is a common event caused by cardiovascular diseases (CVDs). This dataset contains 12 features that can be used to predict mortality from heart failure.

People with cardiovascular disease, or at high cardiovascular risk, need early detection and management, and a machine learning model can help with both.

**12 clinical features:**

* age - Age

* anaemia - Decrease of red blood cells or hemoglobin (boolean)

* creatinine_phosphokinase - Level of the CPK enzyme in the blood (mcg/L)

* diabetes - If the patient has diabetes (boolean)

* ejection_fraction - Percentage of blood leaving the heart at each contraction (percentage)

* high_blood_pressure - If the patient has hypertension (boolean)
  
* platelets - Platelets in the blood (kiloplatelets/mL)

* serum_creatinine - Level of serum creatinine in the blood (mg/dL)

* serum_sodium - Level of serum sodium in the blood (mEq/L)
  
* sex - Woman or man (binary)
  
* smoking - If the patient smokes or not (boolean)

* time - Follow-up period (days)

In this project we will use Azure Automated ML to predict the death event (`DEATH_EVENT`) from the clinical features listed above.
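As a quick sanity check on the schema described above, the following sketch lists the 12 feature names and verifies that a record carries exactly those columns. The helper `missing_or_extra` is hypothetical and not part of the original notebook; the sample values are those of the first row of the dataset.

```python
# The 12 clinical feature names listed above; DEATH_EVENT is the label, not a feature.
FEATURES = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets",
    "serum_creatinine", "serum_sodium", "sex", "smoking", "time",
]
LABEL = "DEATH_EVENT"


def missing_or_extra(record):
    """Return (missing, extra) feature names for one record (hypothetical helper)."""
    keys = set(record) - {LABEL}
    missing = sorted(set(FEATURES) - keys)
    extra = sorted(keys - set(FEATURES))
    return missing, extra


# First row of the dataset, including its label.
first_row = {
    "age": 75.0, "anaemia": 0, "creatinine_phosphokinase": 582, "diabetes": 0,
    "ejection_fraction": 20, "high_blood_pressure": 1, "platelets": 265000.0,
    "serum_creatinine": 1.9, "serum_sodium": 130, "sex": 1, "smoking": 0,
    "time": 4, "DEATH_EVENT": 1,
}

print(missing_or_extra(first_row))  # ([], [])
```

A check like this can be useful before sending data to a deployed model, since a misspelled or missing column is easier to catch locally than in service logs.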


In [2]:
ws = Workspace.from_config()

# Choose a name for the experiment
experiment_name = 'new-experiment'

experiment = Experiment(ws, experiment_name)

In [3]:
print('Workspace name: ' + ws.name,
      'Azure region: ' + ws.location,
      'Subscription id: ' + ws.subscription_id,
      'Resource group: ' + ws.resource_group, sep="\n")

run = experiment.start_logging()

Workspace name: quick-starts-ws-138669
Azure region: southcentralus
Subscription id: a0a76bad-11a1-4a2d-9887-97a29122c8ed
Resource group: aml-quickstarts-138669


In [4]:
ds = Dataset.get_by_name(ws, 'heart-failure-dataset')

In [5]:
df = ds.to_pandas_dataframe()
df.head()

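| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |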


In [6]:
# Split the dataset: about 90% for training, the rest held out for testing the deployed service
train_data, test_data = ds.random_split(0.9)
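To make the idea of a 90/10 random split concrete, here is a minimal pure-Python sketch. It is not Azure's implementation: `split_rows` is a hypothetical helper that shuffles row indices with a fixed seed and cuts at the requested fraction, whereas the sizes returned by `random_split` on a `TabularDataset` may only be approximate.

```python
import random


def split_rows(rows, percentage, seed=None):
    """Shuffle row indices and cut at `percentage` (illustrative, not Azure's algorithm)."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = int(round(len(rows) * percentage))
    # Sort the selected indices so each part keeps the original row order.
    first = [rows[i] for i in sorted(indices[:cut])]
    second = [rows[i] for i in sorted(indices[cut:])]
    return first, second


rows = list(range(100))  # stand-in for 100 dataset rows
train, test = split_rows(rows, 0.9, seed=42)
print(len(train), len(test))  # 90 10
```

Fixing the seed makes the split reproducible across runs, which helps when comparing models trained at different times.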

## Create Compute Cluster

In [7]:
cpu_cluster_name = "compute-cluster"

# Reuse the cluster if it already exists; otherwise create it
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print("Found existing cluster. Use it")
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

# Block until the cluster is provisioned
cpu_cluster.wait_for_completion(show_output=True)


Creating
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## AutoML Configuration

Instantiate an AutoMLConfig object for AutoML Configuration. 

The parameters used here are:

* `n_cross_validations = 3` - Because the dataset is small, we use 3-fold cross validation instead of a single train/validation split.


* `primary_metric = 'accuracy'` - The primary metric is the metric that is optimized during model training. Accuracy is chosen here because this is a binary classification task.


* `experiment_timeout_minutes = 30` - The maximum time, in minutes, that the experiment should continue to run. Here it is set to 30 minutes.


* `max_concurrent_iterations = 4` - The maximum number of child runs that can execute in parallel. We match it to the maximum number of nodes in the cluster (4), so the cluster is dedicated to this experiment.


* `task = 'classification'` - This specifies the experiment type as classification.


* `compute_target = cpu_cluster` - Azure Machine Learning Managed Compute is a managed service for training machine learning models on clusters of Azure virtual machines. Here the compute target is `cpu_cluster`, defined earlier with VM size `STANDARD_D2_V2` and a maximum of 4 nodes.


* `training_data = train_data` - The training data for this experiment. `train_data` is the training portion of the dataset uploaded to the datastore.


* `label_column_name = 'DEATH_EVENT'` - The target column, `DEATH_EVENT`, is 1 if the patient died and 0 if the patient survived.


* `featurization = 'auto'` - Data guardrails and featurization steps are performed automatically as part of preprocessing.
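To illustrate what 3-fold cross validation means for a small dataset, the sketch below generates the train/validation index sets for each fold. It is only an illustration under simplifying assumptions: `k_fold_indices` is a hypothetical helper that uses contiguous folds without shuffling or stratification, and AutoML's own splitting logic is not shown in this notebook and may differ.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds (illustrative only)."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        valid = list(range(start, stop))
        train = [i for i in range(n_rows) if i < start or i >= stop]
        yield train, valid
        start = stop


folds = list(k_fold_indices(10, 3))
for train_idx, valid_idx in folds:
    print(len(train_idx), len(valid_idx))
```

Every row is used for validation exactly once and for training in the other folds, which is why cross validation gives a steadier estimate than a single split when there is little data.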


In [8]:
# AutoML settings
automl_settings = {
    "n_cross_validations": 3,
    "primary_metric": 'accuracy',
    "experiment_timeout_minutes": 30,
    "max_concurrent_iterations": 4
}

# AutoML configuration
automl_config = AutoMLConfig(task='classification',
                             compute_target=cpu_cluster,
                             training_data=train_data,
                             label_column_name='DEATH_EVENT',
                             featurization='auto',
                             **automl_settings)

In [9]:
# Submit the experiment
remote_run = experiment.submit(automl_config)

Running on remote.


## Run Details

The `RunDetails` widget shows the progress and status of the AutoML run and its child runs.

In [10]:
RunDetails(remote_run).show()
remote_run.wait_for_completion(show_output=True)

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…


Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

******************************************************************

{'runId': 'AutoML_faab6e12-a632-4021-8373-7cfab7444c1b',
 'target': 'compute-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-02-12T08:21:37.50759Z',
 'endTimeUtc': '2021-02-12T09:03:14.302602Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '3',
  'target': 'compute-cluster',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"783b3010-5f13-4171-abe2-8d1cc3cfaf42\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"UI/02-12-2021_081708_UTC/heart_failure_clinical_records_dataset.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"aml-quickstarts-138669\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"a

## Best Model

Retrieve the best model from the AutoML experiment and inspect its properties.



In [11]:
best_automl_run, best_automl_model = remote_run.get_output()

Package:azureml-automl-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-core, training version:1.21.0.post1, current version:1.20.0
Package:azureml-dataprep, training version:2.8.2, current version:2.7.3
Package:azureml-dataprep-native, training version:28.0.0, current version:27.0.0
Package:azureml-dataprep-rslex, training version:1.6.0, current version:1.5.0
Package:azureml-dataset-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-defaults, training version:1.21.0, current version:1.20.0
Package:azureml-interpret, training version:1.21.0, current version:1.20.0
Package:azureml-pipeline-core, training version:1.21.0, current version:1.20.0
Package:azureml-telemetry, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-client, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-runtime, training version:1.21.0, current version:1.20.0


In [12]:
print(best_automl_run)

Run(Experiment: new-experiment,
Id: AutoML_faab6e12-a632-4021-8373-7cfab7444c1b_96,
Type: azureml.scriptrun,
Status: Completed)


In [13]:
print(best_automl_model)

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                                    max_samples=None,
                                                                                                    min_impurity_decrease=0.0,
                                                                                                    min_impurity_split=None,
                      

In [14]:
best_automl_run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| new-experiment | AutoML_faab6e12-a632-4021-8373-7cfab7444c1b_96 | azureml.scriptrun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


In [15]:
get_best_automl_metrics = best_automl_run.get_metrics()

for metric_name in get_best_automl_metrics:
    metric = get_best_automl_metrics[metric_name]
    print(metric_name, metric)

matthews_correlation 0.6876159828584966
AUC_weighted 0.917153186106022
precision_score_micro 0.8671550671550672
precision_score_weighted 0.8656087018140589
f1_score_weighted 0.8643095067861993
recall_score_micro 0.8671550671550672
recall_score_weighted 0.8671550671550672
AUC_macro 0.917153186106022
f1_score_micro 0.8671550671550672
recall_score_macro 0.8298296829769306
f1_score_macro 0.8414189399492545
accuracy 0.8671550671550672
balanced_accuracy 0.8298296829769306
average_precision_score_macro 0.9022903467197653
average_precision_score_micro 0.9267713377931074
norm_macro_recall 0.6596593659538611
average_precision_score_weighted 0.9196967887161375
log_loss 0.3670245325078148
AUC_micro 0.9272409894632117
precision_score_macro 0.8584611568986569
weighted_accuracy 0.8950327066347574
accuracy_table aml://artifactId/ExperimentRun/dcid.AutoML_faab6e12-a632-4021-8373-7cfab7444c1b_96/accuracy_table
confusion_matrix aml://artifactId/ExperimentRun/dcid.AutoML_faab6e12-a632-4021-8373-7cfab7444c
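Several of the metrics printed above are related by simple formulas. The sketch below computes accuracy and balanced accuracy (macro-averaged recall) from a confusion matrix, then checks that normalized macro recall for a binary problem reproduces the printed `norm_macro_recall` from the printed `recall_score_macro`. The confusion-matrix counts are made up for illustration, and the normalization formula reflects my understanding of the AutoML metric, so it is worth confirming against the documentation.

```python
import math


def accuracy(tn, fp, fn, tp):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tn + fp + fn + tp)


def balanced_accuracy(tn, fp, fn, tp):
    """Mean of per-class recall (macro recall) for a binary problem."""
    recall_pos = tp / (tp + fn)
    recall_neg = tn / (tn + fp)
    return (recall_pos + recall_neg) / 2


def norm_macro_recall(recall_macro, n_classes=2):
    """Rescale macro recall so that chance level maps to 0 and perfect recall to 1."""
    chance = 1 / n_classes
    return (recall_macro - chance) / (1 - chance)


# Hypothetical confusion-matrix counts, not taken from the run.
tn, fp, fn, tp = 50, 5, 10, 25
print(accuracy(tn, fp, fn, tp))
print(balanced_accuracy(tn, fp, fn, tp))

# Values copied from the metrics printed above.
recall_macro = 0.8298296829769306
reported_norm_macro_recall = 0.6596593659538611
print(math.isclose(norm_macro_recall(recall_macro),
                   reported_norm_macro_recall, rel_tol=1e-9))  # True
```

Note that `balanced_accuracy` and `recall_score_macro` have the same value in the output above, consistent with balanced accuracy being the macro average of per-class recall.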

In [16]:
# Register the best model in the workspace
model = best_automl_run.register_model(model_name = 'best_automl_model', model_path = 'outputs/model.pkl', 
                                       tags = {'Training context':'Auto ML'},
                                       properties={'Accuracy': get_best_automl_metrics['accuracy']})
print(model)

Model(workspace=Workspace.create(name='quick-starts-ws-138669', subscription_id='a0a76bad-11a1-4a2d-9887-97a29122c8ed', resource_group='aml-quickstarts-138669'), name=best_automl_model, id=best_automl_model:1, version=1, tags={'Training context': 'Auto ML'}, properties={'Accuracy': '0.8671550671550672'})


In [17]:
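# List the registered models (HyperDrive run and AutoML run) to compare their accuracy.
# Model must be imported first; without this import the cell fails with
# "NameError: name 'Model' is not defined".
from azureml.core.model import Model

# Use a separate loop variable so `model` (the registered AutoML model) is not overwritten
for registered_model in Model.list(ws):
    print(registered_model.name)
    for tag_name, tag in registered_model.tags.items():
        print('\t', tag_name, ':', tag)
    for prop_name, prop in registered_model.properties.items():
        print('\t', prop_name, ':', prop)
    print("\n")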

## Model Deployment

Register the model, create an inference config and deploy the model as a web service.

In [18]:
# Download scoring file
best_automl_run.download_file('outputs/scoring_file_v_1_0_0.py','score.py')

# Download environment file
best_automl_run.download_file('outputs/conda_env_v_1_0_0.yml', 'envFile.yml')

In [19]:
# Create an inference config

from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig


# env = Environment.get(ws, "AzureML-Minimal").clone(env_name)

# for pip_package in ["scikit-learn"]:
#     env.python.conda_dependencies.add_pip_package(pip_package)

inference_config = InferenceConfig(entry_script='score.py',
                                    environment=best_automl_run.get_environment())
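The downloaded `score.py` is not reproduced in this notebook. Azure ML entry scripts follow an `init()` / `run()` contract: `init()` is called once when the service starts, and `run()` is called for each request. The sketch below mimics that shape with a stand-in model so it can run without Azure; `StubModel` and its toy rule are hypothetical and are not what AutoML generates.

```python
import json

model = None


class StubModel:
    """Hypothetical stand-in for the deserialized AutoML pipeline."""

    def predict(self, rows):
        # Toy rule for illustration only; it has no clinical meaning.
        return [1 if row["time"] < 50 else 0 for row in rows]


def init():
    """Called once at service start; a real script would load the registered model here."""
    global model
    model = StubModel()


def run(raw_data):
    """Called per request with the raw JSON body; returns a JSON string."""
    try:
        rows = json.loads(raw_data)["data"]
        return json.dumps({"result": model.predict(rows)})
    except Exception as exc:
        # Report the error to the caller instead of crashing the service.
        return json.dumps({"error": str(exc)})


init()
print(run(json.dumps({"data": [{"time": 4}, {"time": 120}]})))  # {"result": [1, 0]}
```

Because `run()` returns a JSON string in this sketch, a web framework that serializes the return value again would produce a doubly encoded response body.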

In [20]:
# Deploy the model as a web service
from azureml.core.webservice import AciWebservice, Webservice
from azureml.core.model import Model

deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
service = Model.deploy(ws, "aciservice", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running....................................................
Succeeded
ACI service creation operation finished, operation "Succeeded"


In [21]:
# Enable Application Insights logging for the deployed service
service.update(enable_app_insights = True)

In [24]:
print("State : "+service.state)
print("Swagger URI : "+service.swagger_uri)
print("Scoring URI : "+service.scoring_uri)

State : Healthy
Swagger URI : http://f4b51c89-2d5e-469f-9132-b8a5132ff160.southcentralus.azurecontainer.io/swagger.json
Scoring URI : http://f4b51c89-2d5e-469f-9132-b8a5132ff160.southcentralus.azurecontainer.io/score


Send a request to the deployed web service to test it.

In [39]:
import requests
import json

# Take two random rows from the test set, so we get two results back
td = test_data.to_pandas_dataframe()
sample_data = td.sample(2)

# Separate the labels from the features
y_test = sample_data["DEATH_EVENT"]
x_test = sample_data.drop(columns=["DEATH_EVENT"])
data = {"data": x_test.to_dict()}

# Convert to JSON string
input_data = json.dumps(data)
print(input_data)

{"data": {"age": {"3": 90.0, "1": 75.0}, "anaemia": {"3": 1, "1": 1}, "creatinine_phosphokinase": {"3": 60, "1": 81}, "diabetes": {"3": 1, "1": 0}, "ejection_fraction": {"3": 50, "1": 38}, "high_blood_pressure": {"3": 0, "1": 1}, "platelets": {"3": 226000.0, "1": 368000.0}, "serum_creatinine": {"3": 1.0, "1": 4.0}, "serum_sodium": {"3": 134, "1": 131}, "sex": {"3": 1, "1": 1}, "smoking": {"3": 0, "1": 1}, "time": {"3": 30, "1": 10}}}
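The payload printed above is column-oriented: each feature maps to a dictionary keyed by the DataFrame index (`"3"` and `"1"`). Some scoring scripts expect a list of row records instead. The following sketch converts between the two shapes with plain dictionaries; `columns_to_records` is a hypothetical helper, and which shape a given endpoint accepts depends on its scoring script.

```python
import json


def columns_to_records(columns):
    """Turn {column: {index: value}} into a list of {column: value} rows."""
    first_column = next(iter(columns.values()))
    return [
        {name: values[index] for name, values in columns.items()}
        for index in first_column
    ]


# Two features from the payload printed above, kept short for readability.
columns = {"age": {"3": 90.0, "1": 75.0}, "time": {"3": 30, "1": 10}}
records = columns_to_records(columns)
print(json.dumps({"data": records}))
```

With pandas, `x_test.to_dict(orient="records")` produces the row-oriented shape directly.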


In [40]:
# Set the content type
headers = {'Content-Type': 'application/json'}

# Make the request and display the response
resp = requests.post(service.scoring_uri, input_data, headers=headers)
print(resp.text)

"{\"result\": [1, 1]}"


In [41]:
# Print original labels
print(y_test)

3    1
1    1
Name: DEATH_EVENT, dtype: int64
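The response body printed earlier is a JSON string whose content is itself JSON, so it has to be decoded twice to reach the predictions. A small sketch using the exact text and labels printed above:

```python
import json

# Response body exactly as printed by print(resp.text) above.
response_text = '"{\\"result\\": [1, 1]}"'

inner_text = json.loads(response_text)          # first pass: a str holding JSON
predictions = json.loads(inner_text)["result"]  # second pass: the actual dict

# Original labels printed above for the two sampled rows (index 3 and 1).
labels = [1, 1]

matches = sum(p == y for p, y in zip(predictions, labels))
print(predictions, f"{matches}/{len(labels)} correct")  # [1, 1] 2/2 correct
```

When using `requests`, `resp.json()` performs the first decoding pass, so one further `json.loads` on its result would be needed here.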


Print the logs of the web service, then delete the service and the compute cluster.

In [42]:
print(service.get_logs())

2021-02-12T09:12:40,103765300+00:00 - gunicorn/run 
2021-02-12T09:12:40,102932200+00:00 - iot-server/run 
2021-02-12T09:12:40,138524100+00:00 - rsyslog/run 
2021-02-12T09:12:40,164934300+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
rsyslogd

In [43]:
service.delete()
cpu_cluster.delete()