# Automated ML

In the cell below, we import all the dependencies that we need to complete the project.

In [17]:
from azureml.core import Workspace, Dataset
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails

## Dataset

### Overview

For this final project we'll be using the [Heart Failure Prediction](https://www.kaggle.com/datasets/andrewmvd/heart-failure-clinical-data) dataset from Kaggle.

This dataset contains medical records for 299 patients with heart failure,
along with a binary column, DEATH_EVENT, indicating whether the patient died during the follow-up period.
Our goal is to predict this outcome from the remaining columns,
which makes this a binary classification problem.

In [10]:
# Load the workspace from the local configuration file
ws = Workspace.from_config()

# Choose a name for the experiment
experiment_name = 'edu_heart_failure_exp'
experiment = Experiment(ws, experiment_name)

# Retrieve the dataset registered in the same workspace
dataset_name = 'edu_heart_failure_dataset'
dataset = Dataset.get_by_name(ws, name=dataset_name)

In [11]:
df = dataset.to_pandas_dataframe()
df.head()

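| Unnamed: 0 | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |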


## AutoML Configuration

In the cells below we use the AutoMLConfig class to configure the AutoML run:
- task is set to classification, which matches our use case
- training_data is set to the dataset we loaded above
- label_column_name is the name of the target column, DEATH_EVENT
- n_cross_validations=5 splits the dataset into five folds; each fold is used once as the validation set while the model is trained on the remaining four, which gives a more reliable estimate of model performance
- primary_metric is set to AUC_weighted, the metric that AutoML optimizes and uses to rank models
- enable_early_stopping is set to True, so that the experiment can end early if the score stops improving
- experiment_timeout_hours is set to 1 to avoid running out of time with the Udacity VM
- compute_target is set to a compute cluster we created beforehand
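
To make the five-fold split concrete, here is a minimal sketch in plain Python of how 299 rows can be divided into five folds, each serving once as the validation set. The helper names `k_fold_indices` and `train_validation_splits` are hypothetical, and AutoML's internal splitting strategy may differ (for example in how it shuffles or stratifies the rows).

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle row indices and cut them into n_folds nearly equal parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def train_validation_splits(n_samples, n_folds, seed=0):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = k_fold_indices(n_samples, n_folds, seed)
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 299 patients, five folds, as in the configuration described above.
splits = list(train_validation_splits(299, 5))
fold_sizes = [len(validation) for _, validation in splits]
print(fold_sizes)
```

Every row lands in exactly one validation fold, so each patient record contributes to the validation score once and to training four times.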

In [14]:
ws.compute_targets.keys()

dict_keys(['notebook238978', 'edu-compute-cluster'])

In [15]:
compute_cluster_name = "edu-compute-cluster"

automl_settings = {
    "task": "classification",
    "primary_metric": "AUC_weighted",
    "training_data": dataset,
    "label_column_name": "DEATH_EVENT",
    "n_cross_validations": 5,
    "compute_target": ws.compute_targets[compute_cluster_name],
    "enable_early_stopping": True,
    "experiment_timeout_hours": 1
}

automl_config = AutoMLConfig(**automl_settings)
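
The primary metric in these settings, AUC_weighted, averages the per-class ROC AUC weighted by the number of true instances of each class. The sketch below is not AutoML's implementation; it computes a plain binary ROC AUC from its standard definition, namely the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """Binary ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("Both classes must be present to compute AUC.")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data (not taken from the heart failure dataset).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(toy_labels, toy_scores)
print(auc)
```

Because the metric depends only on how examples are ranked, it does not require choosing a classification threshold.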

In [16]:
remote_run = experiment.submit(automl_config)

Submitting remote run.


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| edu_heart_failure_exp | AutoML_da6d54f5-5de1-4460-bed3-1daabd5092fe | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |


## Run Details

In [18]:
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

## Best Model

In the cells below, we retrieve the best run and its fitted model from the AutoML experiment, then display the model pipeline and the run's metrics.

In [21]:
best_run, fitted_model = remote_run.get_output()

Package:azureml-automl-runtime, training version:1.52.0.post1, current version:1.51.0.post1
Package:azureml-core, training version:1.52.0, current version:1.51.0
Package:azureml-defaults, training version:1.52.0, current version:1.51.0
Package:azureml-interpret, training version:1.52.0, current version:1.51.0
Package:azureml-mlflow, training version:1.52.0, current version:1.51.0
Package:azureml-pipeline-core, training version:1.52.0, current version:1.51.0
Package:azureml-responsibleai, training version:1.52.0, current version:1.51.0
Package:azureml-telemetry, training version:1.52.0, current version:1.51.0
Package:azureml-train-automl-client, training version:1.52.0, current version:1.51.0.post1
Package:azureml-train-automl-runtime, training version:1.52.0, current version:1.51.0.post2
Package:azureml-train-core, training version:1.52.0, current version:1.51.0
Package:azureml-train-restclients-hyperdrive, training version:1.52.0, current version:1.51.0
Package:azureml-training-tabula

In [28]:
print(best_run)

Run(Experiment: edu_heart_failure_exp,
Id: AutoML_da6d54f5-5de1-4460-bed3-1daabd5092fe_32,
Type: azureml.scriptrun,
Status: Completed)


In [29]:
print(fitted_model)

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=False, enable_feature_sweeping=True, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=True, is_onnx_compatible=False, observer=None, task='classification', working_dir='/mnt/batch/tasks/shared/LS_root/mount...
                 PreFittedSoftVotingClassifier(classification_labels=array([0, 1]), estimators=[('21', Pipeline(memory=None, steps=[('standardscalerwrapper', StandardScalerWrapper(copy=True, with_mean=False, with_std=False)), ('randomforestclassifier', RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1, oob_score=False, random_state=None, v

In [27]:
best_run.get_metrics()

{'matthews_correlation': 0.67481831452031,
 'norm_macro_recall': 0.6514714839424143,
 'AUC_macro': 0.9203002722406792,
 'log_loss': 0.38299137905300257,
 'weighted_accuracy': 0.8769823482837316,
 'precision_score_macro': 0.8511980828805216,
 'f1_score_micro': 0.8562146892655367,
 'AUC_micro': 0.9197659516741676,
 'f1_score_weighted': 0.851005090871357,
 'recall_score_macro': 0.8257357419712072,
 'average_precision_score_macro': 0.9048090328160298,
 'precision_score_weighted': 0.8683746221504934,
 'average_precision_score_weighted': 0.9283670277220428,
 'f1_score_macro': 0.826143842554149,
 'recall_score_weighted': 0.8562146892655367,
 'recall_score_micro': 0.8562146892655367,
 'accuracy': 0.8562146892655367,
 'average_precision_score_micro': 0.9234561593302612,
 'balanced_accuracy': 0.8257357419712072,
 'AUC_weighted': 0.9203002722406793,
 'precision_score_micro': 0.8562146892655367,
 'accuracy_table': 'aml://artifactId/ExperimentRun/dcid.AutoML_da6d54f5-5de1-4460-bed3-1daabd5092fe_32/
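
Two of the metrics reported above are directly related. Assuming the definition given in the Azure AutoML documentation, norm_macro_recall rescales recall_score_macro so that random guessing scores 0 and a perfect model scores 1. The sketch below recomputes it from the reported macro recall; the function name `normalized_macro_recall` is hypothetical.

```python
def normalized_macro_recall(recall_macro, n_classes):
    """Rescale macro recall so chance level maps to 0 and perfect recall to 1."""
    chance = 1.0 / n_classes  # expected macro recall of random predictions
    return (recall_macro - chance) / (1.0 - chance)


# Values copied from the metrics printed above.
recall_score_macro = 0.8257357419712072
reported = 0.6514714839424143

recomputed = normalized_macro_recall(recall_score_macro, n_classes=2)
print(abs(recomputed - reported) < 1e-9)
```

The same output also shows that balanced_accuracy and recall_score_macro have the same value, since both average the per-class recall.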

## Model Deployment

Only one of the two trained models has to be deployed, but both must be registered. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [30]:
remote_run.register_model()

Model(workspace=Workspace.create(name='quick-starts-ws-238978', subscription_id='976ee174-3882-4721-b90a-b5fef6b72f24', resource_group='aml-quickstarts-238978'), name=AutoMLda6d54f5532, id=AutoMLda6d54f5532:1, version=1, tags={}, properties={})

TODO: In the cell below, send a request to the web service you deployed to test it.
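
No web service is deployed in this notebook, so the following is only a sketch of how a test request could be prepared with the Python standard library. The endpoint URL, the helper `build_scoring_request`, and the `{"data": [...]}` payload layout are all assumptions; the real scoring URI comes from the deployed service, and the expected layout and column set depend on the scoring script and on the columns the model was trained with. The sample row reuses the first record shown in the data preview, without the label column.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the scoring URI of the deployed service.
SCORING_URI = "http://example.invalid/score"


def build_scoring_request(rows, scoring_uri=SCORING_URI):
    """Wrap feature rows in a JSON body and return an unsent POST request."""
    # Assumed payload layout; check the service's scoring script for the real one.
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# First record from the data preview, with the DEATH_EVENT label removed.
sample_row = {
    "age": 75.0,
    "anaemia": 0,
    "creatinine_phosphokinase": 582,
    "diabetes": 0,
    "ejection_fraction": 20,
    "high_blood_pressure": 1,
    "platelets": 265000.0,
    "serum_creatinine": 1.9,
    "serum_sodium": 130,
    "sex": 1,
    "smoking": 0,
    "time": 4,
}

request = build_scoring_request([sample_row])
payload = json.loads(request.data)
print(request.get_method(), len(payload["data"]))
```

The request is only constructed here; sending it would be a matter of passing it to `urllib.request.urlopen` once a real endpoint exists.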

TODO: In the cell below, print the logs of the web service and delete the service

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a web service.
- I have tested the web service by sending a request to the model endpoint.
- I have deleted the web service and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
