# Automated ML

In the cell below, we import all the dependencies that we need to complete the project.

In [1]:
from azureml.core import Workspace, Dataset
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
from azureml.pipeline.steps import AutoMLStep

## Dataset

### Overview

For this final project we'll be using the [Heart Failure Prediction](https://www.kaggle.com/datasets/andrewmvd/heart-failure-clinical-data) dataset from Kaggle.

This dataset contains the medical records of 299 patients with heart failure,
along with a binary column, DEATH_EVENT, that indicates whether the patient died during the follow-up period.
Our goal is to predict this outcome from the remaining columns,
which makes this a binary classification problem.

In [3]:
workspace = Workspace.from_config()

experiment_name = 'edu_hf_automl_exp'
experiment = Experiment(workspace, experiment_name)

compute_cluster_name = "edu-compute-cluster"
compute_target = workspace.compute_targets[compute_cluster_name]

dataset_name = 'edu_heart_failure_dataset'
dataset = Dataset.get_by_name(workspace, name=dataset_name)

In [4]:
df = dataset.to_pandas_dataframe()
df.head()

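| Unnamed: 0 | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |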


## AutoML Configuration

In the cells below we use the AutoMLConfig class to configure several aspects of AutoML:
- task is set to classification, which matches our use case
- training_data is set to the registered dataset loaded above
- label_column_name is the name of the target column, DEATH_EVENT
- n_cross_validations=5 splits the training data into five folds and uses each fold in turn as the validation set while training on the remaining four, which gives a more reliable estimate of model performance than a single train/validation split
- primary_metric is AUC_weighted, the metric AutoML uses to rank the candidate models
- enable_early_stopping is set to True, so that the experiment can end early if the score stops improving
- experiment_timeout_hours=1 caps the experiment at one hour, so that it finishes before the Udacity VM session runs out
- compute_target is the compute cluster we created beforehand
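The effect of n_cross_validations=5 can be sketched in plain Python. This is a minimal illustration that assumes contiguous, unshuffled folds; the helper k_fold_indices is hypothetical, and AutoML's internal splitter may shuffle or otherwise differ.

```python
# Hypothetical k-fold splitter (contiguous, unshuffled folds). It only
# illustrates the idea behind n_cross_validations; AutoML may split differently.

def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


if __name__ == "__main__":
    # 299 rows, as in the heart failure dataset, split into 5 folds.
    for train_idx, val_idx in k_fold_indices(299, 5):
        print(len(train_idx), len(val_idx))
```

With 299 rows, four folds hold 60 validation rows and the last holds 59, so each model is always trained on roughly 80% of the data.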

In [5]:
target_column = "DEATH_EVENT"

automl_settings = {
    "task": "classification",
    "primary_metric": "AUC_weighted",
    "training_data": dataset,
    "label_column_name": target_column,
    "n_cross_validations": 5,
    "compute_target": compute_target,
    "enable_early_stopping": True,
    "experiment_timeout_hours": 1
}

automl_config = AutoMLConfig(**automl_settings)
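With enable_early_stopping set to True in the settings above, AutoML may end the experiment when the score stops improving. AutoML's actual stopping criteria are internal; the sketch below only shows a generic patience rule, with a hypothetical should_stop helper, an assumed patience of 3 iterations, and made-up scores.

```python
# Generic patience-based early stopping (hypothetical; not AutoML's internal rule).
# Stop once the last `patience` scores fail to beat the best earlier score.

def should_stop(scores, patience=3, min_delta=0.0):
    """Return True if none of the last `patience` scores improved on the
    best score seen before them by more than `min_delta`."""
    if len(scores) <= patience:
        return False
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent <= best_before + min_delta


if __name__ == "__main__":
    # Illustrative AUC_weighted values per iteration (made up, not from this run).
    history = [0.85, 0.89, 0.91, 0.90, 0.91, 0.905]
    for i in range(1, len(history) + 1):
        print(i, should_stop(history[:i]))
```

Here the rule fires only at iteration 6, after three iterations in a row fail to exceed the best score of 0.91.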

In [6]:
automl_run = experiment.submit(automl_config)

Submitting remote run.


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| edu_hf_automl_exp | AutoML_f83e72a8-31c1-43a6-9208-6ead30c98832 | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |


## Run Details

In [7]:
RunDetails(automl_run).show()


## Best Model

In the cells below, we retrieve the best run and its fitted model from the AutoML experiment and display the run ID, its metrics, and the fitted pipeline.



In [10]:
best_run, fitted_model = automl_run.get_output()

Package:azureml-automl-runtime, training version:1.52.0.post1, current version:1.51.0.post1
Package:azureml-core, training version:1.52.0, current version:1.51.0
Package:azureml-dataprep, training version:4.11.4, current version:4.10.8
Package:azureml-dataprep-rslex, training version:2.18.4, current version:2.17.12
Package:azureml-dataset-runtime, training version:1.52.0, current version:1.51.0
Package:azureml-defaults, training version:1.52.0, current version:1.51.0
Package:azureml-interpret, training version:1.52.0, current version:1.51.0
Package:azureml-mlflow, training version:1.52.0, current version:1.51.0
Package:azureml-pipeline-core, training version:1.52.0, current version:1.51.0
Package:azureml-responsibleai, training version:1.52.0, current version:1.51.0
Package:azureml-telemetry, training version:1.52.0, current version:1.51.0
Package:azureml-train-automl-client, training version:1.52.0, current version:1.51.0.post1
Package:azureml-train-automl-runtime, training version:1.
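The warnings above list packages whose version in the local environment ("current") differs from the version used for training. A small, hypothetical helper can extract these mismatches from such a log; the line format is taken from the output above, and the helper name is illustrative.

```python
import re

# Matches lines such as:
# "Package:azureml-core, training version:1.52.0, current version:1.51.0"
LINE_RE = re.compile(
    r"Package:(?P<name>[\w.-]+), training version:(?P<train>[^,\s]+), "
    r"current version:(?P<current>\S+)"
)


def version_mismatches(log_text):
    """Return {package: (training_version, current_version)} where they differ."""
    mismatches = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line.strip())
        # Truncated lines without a "current version" part simply do not match.
        if match and match["train"] != match["current"]:
            mismatches[match["name"]] = (match["train"], match["current"])
    return mismatches


if __name__ == "__main__":
    sample = (
        "Package:azureml-core, training version:1.52.0, current version:1.51.0\n"
        "Package:azureml-dataprep, training version:4.11.4, current version:4.10.8"
    )
    for name, (train, current) in version_mismatches(sample).items():
        print(f"{name}: trained with {train}, running {current}")
```

In this run the local SDK (1.51) is older than the one used for training (1.52); aligning the local azureml packages with the training versions is generally the safer choice when loading the fitted model.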

In [11]:
print(f"Best run id: {best_run.id}")

Best run id: AutoML_f83e72a8-31c1-43a6-9208-6ead30c98832_30


In [13]:
print(best_run.get_metrics())

{'f1_score_micro': 0.8596610169491525, 'matthews_correlation': 0.6829649457110861, 'AUC_micro': 0.9210893102237543, 'recall_score_micro': 0.8596610169491525, 'recall_score_macro': 0.8220535714285715, 'average_precision_score_macro': 0.9113358251466103, 'AUC_weighted': 0.9240484957548911, 'f1_score_macro': 0.8280186557018453, 'log_loss': 0.3858267646170172, 'accuracy': 0.8596610169491525, 'average_precision_score_micro': 0.9242674506958579, 'precision_score_macro': 0.8644680512606724, 'AUC_macro': 0.9240484957548911, 'recall_score_weighted': 0.8596610169491525, 'precision_score_weighted': 0.8742129604817481, 'f1_score_weighted': 0.8538566359614312, 'precision_score_micro': 0.8596610169491525, 'average_precision_score_weighted': 0.9323235028272618, 'weighted_accuracy': 0.8841729775045412, 'balanced_accuracy': 0.8220535714285715, 'norm_macro_recall': 0.6441071428571429, 'confusion_matrix': 'aml://artifactId/ExperimentRun/dcid.AutoML_f83e72a8-31c1-43a6-9208-6ead30c98832_30/confusion_matrix
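In the metrics above, AUC_weighted and AUC_macro have the same value (0.9240...). For a binary problem this is expected: the one-vs-rest AUC of class 0, scored with 1 - p, equals the AUC of class 1, so any weighted average of the two gives the same number. The sketch below checks this with a pairwise (Mann-Whitney U) AUC; the helper names and toy scores are hypothetical, and AutoML computes its own metrics internally.

```python
# Pairwise (Mann-Whitney U) AUC and a support-weighted one-vs-rest average.
# Hypothetical helpers for illustration only.

def binary_auc(labels, scores):
    """AUC = P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_weighted_binary(labels, prob_class1):
    """Average the one-vs-rest AUCs of classes 0 and 1, weighted by class support."""
    auc1 = binary_auc(labels, prob_class1)
    # For class 0 the "positive" label is 0 and its score is 1 - P(class 1).
    auc0 = binary_auc([1 - y for y in labels], [1 - p for p in prob_class1])
    n1 = sum(labels)
    n0 = len(labels) - n1
    return (n0 * auc0 + n1 * auc1) / len(labels)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]          # toy labels, not from this experiment
    probs = [0.1, 0.4, 0.35, 0.8]  # toy predicted P(DEATH_EVENT = 1)
    print(binary_auc(labels, probs), auc_weighted_binary(labels, probs))  # 0.75 0.75
```

AUC_micro differs (0.9210...) because it pools the predictions of both classes before computing a single curve, rather than averaging per-class AUCs.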

In [16]:
print(fitted_model)

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=False, enable_feature_sweeping=True, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=True, is_onnx_compatible=False, observer=None, task='classification', working_dir='/mnt/batch/tasks/shared/LS_root/mount...
                 PreFittedSoftVotingClassifier(classification_labels=array([0, 1]), estimators=[('16', Pipeline(memory=None, steps=[('minmaxscaler', MinMaxScaler(copy=True, feature_range=(0, 1))), ('extratreesclassifier', ExtraTreesClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=None, max_features=None, max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=0.01, min_samples_split=0.056842105263157895, min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=1, oob_score=False, random_state=None, verbose=0, warm
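The best model is a PreFittedSoftVotingClassifier, i.e. a soft-voting ensemble: each member pipeline outputs class probabilities, the ensemble takes a weighted average of them, and it predicts the class with the highest averaged probability. The minimal sketch below illustrates that rule in plain Python; soft_vote is a hypothetical helper, the member probabilities are made up, and the actual ensemble weights of this run are cut off in the output above.

```python
# Hypothetical soft-voting combiner: weighted average of class probabilities,
# then argmax. Illustrates the technique, not AutoML's implementation.

def soft_vote(member_probas, weights=None):
    """Combine per-member class-probability lists.

    member_probas: one [p_class0, p_class1, ...] list per ensemble member.
    Returns (averaged_probabilities, predicted_class_index).
    """
    if weights is None:
        weights = [1.0] * len(member_probas)
    total = sum(weights)
    n_classes = len(member_probas[0])
    avg = [
        sum(w * probs[c] for w, probs in zip(weights, member_probas)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: avg[c])
    return avg, predicted


if __name__ == "__main__":
    # One confident member for class 0, two weakly favouring class 1 (toy values).
    _, pred = soft_vote([[0.9, 0.1], [0.45, 0.55], [0.45, 0.55]])
    print(pred)  # 0
```

Soft voting predicts class 0 here even though two of the three members would vote for class 1 under hard (majority) voting, because it takes each member's confidence into account.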

## Model Deployment

We do not deploy this model, but we register it nevertheless.

In [17]:
model_name = best_run.properties['model_name']
model = best_run.register_model(model_name=model_name, model_path="outputs")

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shutdown all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
