# Automated ML

Importing dependencies: the cell below imports everything needed to complete the project.

In [1]:
import argparse
import os
import joblib
from pprint import pprint

from azureml.core import Workspace, Experiment, Model
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.run import Run
from azureml.widgets import RunDetails
from azureml.train.automl import AutoMLConfig
from azureml.data.dataset_factory import TabularDatasetFactory


## Dataset

### Overview

In this notebook we run Automated ML (AutoML) for a classification task.

The binary classification task predicts death by heart failure, recorded in the `DEATH_EVENT` column.

These are the twelve features in the dataset:
- age (float): age of the patient in years
- anaemia (bool): whether there has been a decrease of red blood cells or hemoglobin
- creatinine_phosphokinase (int): level of the CPK enzyme in the blood in mcg/L
- diabetes (bool): whether the patient has diabetes
- ejection_fraction (int): percentage of blood leaving the heart at each contraction
- high_blood_pressure (bool): whether the patient has hypertension
- platelets (float): platelets in the blood in kiloplatelets/mL
- serum_creatinine (float): level of serum creatinine in the blood in mg/dL
- serum_sodium (int): level of serum sodium in the blood in mEq/L
- sex (int): female or male (binary)
- smoking (bool): whether the patient smokes
- time (int): follow-up period in days

In [2]:
# acquiring data
ds = TabularDatasetFactory.from_delimited_files("https://raw.githubusercontent.com/eparamasari/ML_Engineer_ND_Capstone/main/data/heart_failure_clinical_records_dataset.csv")

In [3]:
ws = Workspace.from_config()

# creating an experiment object
experiment_name = 'HeartFailureClassification-AutoML'

experiment = Experiment(ws, experiment_name)

In [4]:
# Reusing an existing compute cluster or provisioning a new one
from azureml.core.compute_target import ComputeTargetException

compute_name = "AutoML-Compute"
try:
    compute_target = ComputeTarget(ws, compute_name)
    print(compute_name + " already exists.")
except ComputeTargetException:
    # The cluster was not found in the workspace, so create it
    compute_config = AmlCompute.provisioning_configuration(vm_size="Standard_DS3_V2", min_nodes=1, max_nodes=5)
    compute_target = ComputeTarget.create(ws, compute_name, compute_config)
compute_target.wait_for_completion(show_output=True)

print(compute_target.get_status().serialize())

Creating...
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded..................
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
{'currentNodeCount': 1, 'targetNodeCount': 1, 'nodeStateCounts': {'preparingNodeCount': 1, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2021-04-13T04:33:47.714000+00:00', 'errors': None, 'creationTime': '2021-04-13T04:31:58.860006+00:00', 'modifiedTime': '2021-04-13T04:32:14.225765+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 5, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_DS3_V2'}


## AutoML Configuration

In the cells below, we define the AutoML settings and configuration.

In [5]:
# setting up automl

automl_settings = {
    "experiment_timeout_minutes": 45,
    "max_concurrent_iterations": 5,
    "primary_metric": 'accuracy'
}

automl_config = AutoMLConfig(
    task='classification',
    compute_target=compute_target,
    training_data=ds,
    label_column_name='DEATH_EVENT',
    n_cross_validations=5,
    **automl_settings
)
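
With `n_cross_validations=5`, each candidate model is scored on five train/validation splits rather than on a single hold-out set. The sketch below shows, in plain Python, how row indices can be partitioned into such folds. `k_fold_indices` is a hypothetical helper written for illustration and is not part of the AutoML API; the way AutoML actually builds its splits (for example, shuffling or stratification) may differ.

```python
def k_fold_indices(n_rows, k):
    """Split range(n_rows) into k contiguous validation folds.

    Returns a list of (train_indices, validation_indices) pairs.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1
    base, extra = divmod(n_rows, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        validation = list(range(start, start + size))
        # Every row outside the validation fold is used for training
        train = [j for j in range(n_rows) if j < start or j >= start + size]
        folds.append((train, validation))
        start += size
    return folds


# Illustrative row count, not the size of the heart failure dataset
folds = k_fold_indices(n_rows=12, k=5)
for train, validation in folds:
    print(len(train), validation)
```

Each row lands in exactly one validation fold, so every record contributes to the validation score once.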

In [6]:
# Submitting the experiment
automl_run = experiment.submit(automl_config)

Submitting remote run.


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| HeartFailureClassification-AutoML | AutoML_b9d87534-b5da-46b4-b55f-20385b7e8713 | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |


## Run Details

In the cell below, we use the `RunDetails` widget to monitor the AutoML run and its child runs.

In [8]:
RunDetails(automl_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [9]:
automl_run.wait_for_completion(show_output=True)

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| HeartFailureClassification-AutoML | AutoML_b9d87534-b5da-46b4-b55f-20385b7e8713 | automl | Running | Link to Azure Machine Learning studio | Link to Documentation |



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

******************************************************************

{'runId': 'AutoML_b9d87534-b5da-46b4-b55f-20385b7e8713',
 'target': 'AutoML-Compute',
 'status': 'Completed',
 'startTimeUtc': '2021-04-13T04:36:39.619686Z',
 'endTimeUtc': '2021-04-13T05:27:19.094212Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'AutoML-Compute',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"3c1ce54a-1913-4562-8aa0-79f53477a0b2\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-widgets": "1.26.0", "azureml-train": "1.26.0", "azureml-train-restclients-hyperdrive": "1.26.0", "azureml-train-core": "1.26.0", "azureml-train-automl": "1.26.0", "azureml-train-automl-runtime": "1.26.0", "azureml-train-automl-client": "1.26.0"
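
The run details above include `startTimeUtc` and `endTimeUtc`, from which the wall-clock duration of the whole run can be derived. The following is a minimal sketch using only the standard library, with the two timestamps copied from the output; `run_duration` is a hypothetical helper, not an SDK function.

```python
from datetime import datetime

# Timestamp layout used by the run details shown above
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def run_duration(start_utc, end_utc):
    """Return the elapsed time between two run timestamps as a timedelta."""
    start = datetime.strptime(start_utc, TIMESTAMP_FORMAT)
    end = datetime.strptime(end_utc, TIMESTAMP_FORMAT)
    return end - start


duration = run_duration("2021-04-13T04:36:39.619686Z",
                        "2021-04-13T05:27:19.094212Z")
print(f"Run took {duration.total_seconds() / 60:.1f} minutes")
```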

## Best Model

In the cells below, we retrieve the best model from the AutoML run and display its properties.


In [10]:
# Getting the best run and model
aml_best_run, aml_fitted_model = automl_run.get_output()

# Printing the best run
print(aml_best_run)

Run(Experiment: HeartFailureClassification-AutoML,
Id: AutoML_b9d87534-b5da-46b4-b55f-20385b7e8713_181,
Type: azureml.scriptrun,
Status: Completed)


In [11]:
# Printing the model details
print(aml_fitted_model)

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                                    min_samples_leaf=0.035789473684210524,
                                                                                                    min_samples_split=0.01,
                                                                                                    min_weight_fraction_leaf=0.0,

In [12]:
# Getting all metrics of the best run
best_run_metrics = aml_best_run.get_metrics()

# Printing all metrics of the best run
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)

recall_score_micro 0.8797175141242939
AUC_micro 0.9175964282294361
precision_score_macro 0.8855827457434765
recall_score_macro 0.8519047619047619
AUC_weighted 0.9146040513104466
balanced_accuracy 0.8519047619047619
average_precision_score_micro 0.9192686968496734
f1_score_macro 0.8551316173562942
matthews_correlation 0.7346724541734359
norm_macro_recall 0.7038095238095238
precision_score_micro 0.8797175141242939
weighted_accuracy 0.8984080729594932
f1_score_weighted 0.876044013986579
AUC_macro 0.9146040513104466
log_loss 0.3621858836241265
accuracy 0.8797175141242939
precision_score_weighted 0.8961565085606933
recall_score_weighted 0.8797175141242939
average_precision_score_macro 0.8988617805657771
average_precision_score_weighted 0.9238262377324796
f1_score_micro 0.8797175141242939
accuracy_table aml://artifactId/ExperimentRun/dcid.AutoML_b9d87534-b5da-46b4-b55f-20385b7e8713_181/accuracy_table
confusion_matrix aml://artifactId/ExperimentRun/dcid.AutoML_b9d87534-b5da-46b4-b55f-20385b7e
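
Several of the metrics above are different views of the same confusion matrix. As a rough sketch, the code below computes accuracy, macro recall (which coincides with `balanced_accuracy` in the output above), normalized macro recall and the Matthews correlation for a binary problem. The confusion counts are made up for illustration and are not taken from this run; `binary_metrics` and `norm_macro_recall` are hypothetical helpers, and the normalization assumes the chance-level recall is `1 / n_classes`.

```python
import math


def norm_macro_recall(macro_recall, n_classes=2):
    """Rescale macro recall so that chance level maps to 0 and perfect to 1."""
    chance = 1.0 / n_classes
    return (macro_recall - chance) / (1.0 - chance)


def binary_metrics(tp, fp, fn, tn):
    """Compute a few classification metrics from binary confusion counts."""
    total = tp + fp + fn + tn
    recall_pos = tp / (tp + fn)  # recall of the positive class
    recall_neg = tn / (tn + fp)  # recall of the negative class
    macro_recall = (recall_pos + recall_neg) / 2
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "recall_score_macro": macro_recall,
        "norm_macro_recall": norm_macro_recall(macro_recall),
        "matthews_correlation": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }


# Made-up counts, for illustration only
for name, value in binary_metrics(tp=30, fp=5, fn=10, tn=55).items():
    print(name, round(value, 4))

# Consistency check against the reported values of the best run
print(norm_macro_recall(0.8519047619047619))
```

The last line reproduces the reported `norm_macro_recall` from the reported `recall_score_macro`, which suggests the two values in the output are related in this way.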

In [13]:
aml_fitted_model.steps[1][1].estimators

[('110',
  Pipeline(memory=None,
           steps=[('sparsenormalizer',
                   <azureml.automl.runtime.shared.model_wrappers.SparseNormalizer object at 0x7ff53a240f98>),
                  ('xgboostclassifier',
                   XGBoostClassifier(base_score=0.5, booster='gbtree',
                                     colsample_bylevel=1, colsample_bynode=1,
                                     colsample_bytree=0.6, eta=0.3, gamma=0.1,
                                     learning_rate=0.1, max_delta_step=0,
                                     max_depth=6, max_leaves=0,
                                     min_child_weight=1, missing=nan,
                                     n_estimators=100, n_jobs=1, nthread=None,
                                     objective='reg:logistic', random_state=0,
                                     reg_alpha=1.0416666666666667,
                                     reg_lambda=2.0833333333333335,
                                     scale_pos_we

In [14]:
# Registering the best model in the workspace
best_automl_model = aml_best_run.register_model(model_path='outputs/model.pkl', model_name='heart_failure_automl',
                        tags = {'Training context': 'Automated ML'},
                        properties = {'Accuracy': best_run_metrics['accuracy']})

print(best_automl_model)

Model(workspace=Workspace.create(name='quick-starts-ws-142548', subscription_id='1b944a9b-fdae-4f97-aeb1-b7eea0beac53', resource_group='aml-quickstarts-142548'), name=heart_failure_automl, id=heart_failure_automl:1, version=1, tags={'Training context': 'Automated ML'}, properties={'Accuracy': '0.8797175141242939'})


In [15]:
# Listing registered models to verify that the model has been saved
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

heart_failure_automl version: 1
	 Training context : Automated ML
	 Accuracy : 0.8797175141242939


heart_failure_hyperdrive version: 1
	 Accuracy : 0.7555555555555555
	 N Estimators : 20
	 Min Samples Split : 2
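
The listing above lets us compare the AutoML model with the HyperDrive model by their `Accuracy` property. Below is a small sketch of that comparison using values copied from the listing. It assumes that model properties come back as strings (as the printed `Model` object above shows for the AutoML model), so they are converted to `float` before comparing; `best_by_accuracy` is a hypothetical helper.

```python
# Values copied from the listing above; properties are assumed to be strings
registered = {
    "heart_failure_automl": {"Accuracy": "0.8797175141242939"},
    "heart_failure_hyperdrive": {"Accuracy": "0.7555555555555555"},
}


def best_by_accuracy(models):
    """Return the name of the model with the highest 'Accuracy' property."""
    # Convert to float so the comparison is numeric rather than lexicographic
    return max(models, key=lambda name: float(models[name]["Accuracy"]))


best_name = best_by_accuracy(registered)
print(best_name, float(registered[best_name]["Accuracy"]))
```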




## Model Deployment

In the cells below, we download the environment and scoring files of the best run, create an inference config, and deploy the registered model as a web service.

In [17]:
# Downloading the environment file
aml_best_run.download_file('outputs/conda_env_v_1_0_0.yml', 'envFile.yml')

# Downloading the scoring file 
aml_best_run.download_file('outputs/scoring_file_v_1_0_0.py', 'scoreScript.py')

In [19]:
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(entry_script='scoreScript.py',
                                   environment=aml_best_run.get_environment())

# Deploying
from azureml.core.webservice import AciWebservice

deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
service = Model.deploy(ws, "myservice", [best_automl_model], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)
print(service.state)

print(service.scoring_uri)

print(service.swagger_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-04-13 05:45:34+00:00 Creating Container Registry if not exists.
2021-04-13 05:45:34+00:00 Registering the environment.
2021-04-13 05:45:36+00:00 Use the existing image.
2021-04-13 05:45:36+00:00 Generating deployment configuration.
2021-04-13 05:45:37+00:00 Submitting deployment to compute..
2021-04-13 05:45:42+00:00 Checking the status of deployment myservice..
2021-04-13 05:50:14+00:00 Checking the status of inference endpoint myservice.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://23f71748-edf8-4269-97cb-f8df0fc734f3.southcentralus.azurecontainer.io/score
http://23f71748-edf8-4269-97cb-f8df0fc734f3.southcentralus.azurecontainer.io/swagger.json


In [20]:
import json

# Importing test data
data_df = ds.to_pandas_dataframe().dropna()
test_df = data_df.sample(5) # data_df is the pandas dataframe of the original data
label_df = test_df.pop('DEATH_EVENT')

test_sample = json.dumps({'data': test_df.to_dict(orient='records')})

print(test_sample)

{"data": [{"age": 51.0, "anaemia": 0, "creatinine_phosphokinase": 78, "diabetes": 0, "ejection_fraction": 50, "high_blood_pressure": 0, "platelets": 406000.0, "serum_creatinine": 0.7, "serum_sodium": 140, "sex": 1, "smoking": 0, "time": 79}, {"age": 72.0, "anaemia": 1, "creatinine_phosphokinase": 328, "diabetes": 0, "ejection_fraction": 30, "high_blood_pressure": 1, "platelets": 621000.0, "serum_creatinine": 1.7, "serum_sodium": 138, "sex": 0, "smoking": 1, "time": 88}, {"age": 66.0, "anaemia": 1, "creatinine_phosphokinase": 68, "diabetes": 1, "ejection_fraction": 38, "high_blood_pressure": 1, "platelets": 162000.0, "serum_creatinine": 1.0, "serum_sodium": 136, "sex": 0, "smoking": 0, "time": 95}, {"age": 59.0, "anaemia": 1, "creatinine_phosphokinase": 280, "diabetes": 1, "ejection_fraction": 25, "high_blood_pressure": 1, "platelets": 302000.0, "serum_creatinine": 1.0, "serum_sodium": 141, "sex": 0, "smoking": 0, "time": 78}, {"age": 63.0, "anaemia": 1, "creatinine_phosphokinase": 122,
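
Before sending a payload like the one above, it can help to check that every record carries exactly the twelve feature columns and that the label was removed. The sketch below does this in plain Python; `check_payload` and `EXPECTED_FEATURES` are hypothetical names, and the sample record is the first one from the output above.

```python
import json

# Feature names as they appear in the printed payload
EXPECTED_FEATURES = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets",
    "serum_creatinine", "serum_sodium", "sex", "smoking", "time",
]


def check_payload(payload_json):
    """Return a list of problems found in a scoring payload (empty if none)."""
    problems = []
    records = json.loads(payload_json).get("data", [])
    if not records:
        problems.append("payload has no records under 'data'")
    for i, record in enumerate(records):
        missing = sorted(set(EXPECTED_FEATURES) - set(record))
        extra = sorted(set(record) - set(EXPECTED_FEATURES))
        if missing:
            problems.append(f"record {i}: missing {missing}")
        if extra:
            problems.append(f"record {i}: unexpected {extra}")
    return problems


# First record of the payload printed above
sample = json.dumps({"data": [{
    "age": 51.0, "anaemia": 0, "creatinine_phosphokinase": 78, "diabetes": 0,
    "ejection_fraction": 50, "high_blood_pressure": 0, "platelets": 406000.0,
    "serum_creatinine": 0.7, "serum_sodium": 140, "sex": 1, "smoking": 0,
    "time": 79,
}]})
print(check_payload(sample))
```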


In the cells below, we send a request to the deployed web service to test it.


In [21]:
import requests # for http post request

# Set the content type
headers = {'Content-type': 'application/json'}

response = requests.post(service.scoring_uri, test_sample, headers=headers)

In [22]:
# Printing results from the inference
print(response.text)

"{\"result\": [0, 1, 0, 1, 0]}"


In [23]:
# Printing ground truth labels
print(label_df)

85     0
105    1
121    0
84     1
178    0
Name: DEATH_EVENT, dtype: int64
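
The response text printed earlier is a JSON string that itself contains JSON, so it has to be decoded twice before the predictions can be compared with the labels. The sketch below works on literal copies of the two outputs above rather than on a live service; `parse_predictions` is a hypothetical helper.

```python
import json

# Literal copies of the outputs printed above
response_text = '"{\\"result\\": [0, 1, 0, 1, 0]}"'
labels = [0, 1, 0, 1, 0]  # DEATH_EVENT for rows 85, 105, 121, 84, 178


def parse_predictions(text):
    """Extract the 'result' list from a (possibly double-encoded) response."""
    decoded = json.loads(text)
    # A string after the first pass means the body was JSON wrapped in JSON
    if isinstance(decoded, str):
        decoded = json.loads(decoded)
    return decoded["result"]


predictions = parse_predictions(response_text)
n_correct = sum(p == y for p, y in zip(predictions, labels))
print(f"{n_correct}/{len(labels)} predictions match the ground truth")
```

Since the five samples were drawn from the same data the model was trained on, this is a smoke test of the endpoint rather than an evaluation of the model.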


In the cells below, we print the logs of the web service and then delete the service.

In [24]:
print(service.get_logs())

2021-04-13T05:50:07,090243700+00:00 - rsyslog/run 
2021-04-13T05:50:07,089346000+00:00 - iot-server/run 
2021-04-13T05:50:07,110302100+00:00 - gunicorn/run 
rsyslogd: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libuuid.so.1: no version information available (required by rsyslogd)
2021-04-13T05:50:07,130212900+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml

In [25]:
service.delete()