# Automated ML

In the cell below, we import all the dependencies that we need to complete the project.

In [9]:
from azureml.core import Workspace, Dataset
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
from azureml.pipeline.steps import AutoMLStep

## Dataset

### Overview

For this final project we'll be using the [Heart Failure Prediction](https://www.kaggle.com/datasets/andrewmvd/heart-failure-clinical-data) dataset from Kaggle.

This dataset contains medical records for 299 patients with heart failure,
along with a binary column, `DEATH_EVENT`, indicating whether the patient died.
Our goal is to predict this outcome from the rest of the data,
which makes this a binary classification problem.

In [2]:
workspace = Workspace.from_config()

experiment_name = 'edu_heart_failure_exp'
experiment = Experiment(workspace, experiment_name)

dataset_name = 'edu_heart_failure_dataset'
dataset = Dataset.get_by_name(workspace, name=dataset_name)

In [3]:
df = dataset.to_pandas_dataframe()
df.head()

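| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |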
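
Since this is a two-class problem, it can be useful to check how the classes are balanced before training. The snippet below is a minimal sketch and not part of the original notebook: `class_balance` is a hypothetical helper, and the labels passed to it are only the five `DEATH_EVENT` values visible in the preview above, so the result says nothing about the full 299-row dataset.

```python
from collections import Counter


def class_balance(labels):
    """Return the fraction of samples that fall into each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# DEATH_EVENT values of the five preview rows (all equal to 1).
preview_labels = [1, 1, 1, 1, 1]
preview_balance = class_balance(preview_labels)
print(preview_balance)
```

On the real data the same helper could be called as `class_balance(df["DEATH_EVENT"])`; pandas' own `df["DEATH_EVENT"].value_counts(normalize=True)` gives equivalent information.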


## AutoML Configuration

In the cells below we use the `AutoMLConfig` class to configure several aspects of AutoML:
- `task` is set to `classification`, which matches our use case
- `training_data` is set to the dataset we loaded above
- `label_column_name` is the name of the target column, `DEATH_EVENT`
- `n_cross_validations=5` splits the dataset into five folds, using each fold in turn as the validation set while training on the remaining four, to better assess model performance
- `primary_metric` is set to `AUC_weighted`, the metric used to rank the candidate models
- `enable_early_stopping=True` lets the experiment end early if the score stops improving
- `experiment_timeout_hours=1` caps the experiment at one hour, to avoid running out of time with the Udacity VM
- `compute_target` is a compute cluster we created beforehand
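
To make the five-fold setting concrete, here is a minimal sketch of how 299 samples could be divided into five folds. It only illustrates the idea: `k_fold_indices` is a hypothetical helper using plain contiguous folds, whereas AutoML performs its own splitting internally and may shuffle or stratify the data.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


folds = list(k_fold_indices(n_samples=299, k=5))
for i, (train, validation) in enumerate(folds):
    print(f"fold {i}: train={len(train)} validation={len(validation)}")
```

Every sample ends up in exactly one validation fold, so each model is scored on all 299 records without ever being evaluated on data it was trained on.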

In [4]:
compute_cluster_name = "edu-compute-cluster"

automl_settings = {
    "task": "classification",
    "primary_metric": "AUC_weighted",
    "training_data": dataset,
    "label_column_name": "DEATH_EVENT",
    "n_cross_validations": 5,
    "compute_target": workspace.compute_targets[compute_cluster_name],
    "enable_early_stopping": True,
    "experiment_timeout_hours": 1
}

automl_config = AutoMLConfig(**automl_settings)
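
The primary metric configured above is based on the area under the ROC curve. As far as we understand, `AUC_weighted` averages the per-class AUC with weights proportional to class frequency; the sketch below only shows the plain binary AUC that such a score builds on, computed as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function `roc_auc` and the labels and scores are made up for illustration and are not taken from the experiment.

```python
def roc_auc(labels, scores):
    """ROC AUC as the chance that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up labels and predicted probabilities of class 1.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = roc_auc(example_labels, example_scores)
print(example_auc)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, which makes the metric less sensitive to class imbalance than plain accuracy.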

In [8]:
from azureml.pipeline.core import PipelineData, TrainingOutput

ds = workspace.get_default_datastore()
metrics_output_name = 'metrics_output'
best_model_output_name = 'best_model_output'

metrics_data = PipelineData(name='metrics_data',
                           datastore=ds,
                           pipeline_output_name=metrics_output_name,
                           training_output=TrainingOutput(type='Metrics'))
model_data = PipelineData(name='model_data',
                           datastore=ds,
                           pipeline_output_name=best_model_output_name,
                           training_output=TrainingOutput(type='Model'))

In [10]:
automl_step = AutoMLStep(
    name='automl_module',
    automl_config=automl_config,
    outputs=[metrics_data, model_data],
    allow_reuse=True)

In [11]:
from azureml.pipeline.core import Pipeline
pipeline = Pipeline(
    description="pipeline_with_automlstep",
    workspace=workspace,    
    steps=[automl_step])

In [12]:
pipeline_run = experiment.submit(pipeline)

Created step automl_module [e4004bb3][6ca93e61-3330-493a-8949-995cf0b3c281], (This step will run and generate new outputs)
Submitted PipelineRun 52eb2990-41e7-4107-a218-65e5e187cfb1
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/52eb2990-41e7-4107-a218-65e5e187cfb1?wsid=/subscriptions/b968fb36-f06a-4c76-a15f-afab68ae7667/resourcegroups/aml-quickstarts-239589/workspaces/quick-starts-ws-239589&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254


## Run Details

In [14]:
RunDetails(pipeline_run).show()

_PipelineWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', …

KeyError: 'log_files'


## Best Model

In the cells below, we download the best model produced by the AutoML step, load it, and display its properties.

In [15]:
# Retrieve best model from Pipeline Run
best_model_output = pipeline_run.get_pipeline_output(best_model_output_name)
num_file_downloaded = best_model_output.download('.', show_progress=True)

Downloading azureml/c54e19cd-4309-4cf7-aec8-5780bf9bb825/model_data
Downloaded azureml/c54e19cd-4309-4cf7-aec8-5780bf9bb825/model_data, 1 files out of an estimated total of 1


In [16]:
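import pickle

# The downloaded file is a pickled pipeline; unpickling it requires the same
# AutoML packages that were used for training to be importable.
with open(best_model_output._path_on_datastore, "rb") as f:
    best_model = pickle.load(f)
best_model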

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=False, enable_feature_sweeping=True, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=True, is_onnx_compatible=False, observer=None, task='classification', working_dir='/mnt/batch/tasks/shared/LS_root/mount...
                 PreFittedSoftVotingClassifier(classification_labels=array([0, 1]), estimators=[('20', Pipeline(memory=None, steps=[('standardscalerwrapper', StandardScalerWrapper(copy=True, with_mean=False, with_std=False)), ('xgboostclassifier', XGBoostClassifier(booster='gbtree', colsample_bytree=0.9, eta=0.1, gamma=0, max_depth=6, max_leaves=3, n_estimators=25, n_jobs=1, objective='reg:logistic', problem_info=ProblemInfo(gpu_training_param_dict={'processing_unit_type': 'cpu'}), random_state=0, reg_alpha=0, reg_lambda=0.7291666666666667, subsample=0.5, tree_method='auto'))], verbose=False)), ('21',
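
The printed pipeline ends in a `PreFittedSoftVotingClassifier`, that is, an ensemble of several fitted models. As a rough illustration of what soft voting means (and not the actual AutoML implementation), the sketch below averages per-model class probabilities using weights and picks the most probable class. The helper `soft_vote`, the probabilities, and the weights are all hypothetical.

```python
def soft_vote(probabilities, weights):
    """Combine per-model class probabilities by weighted average.

    probabilities: one list per model, holding one probability per class.
    weights: one non-negative weight per model.
    Returns (averaged_probabilities, index_of_winning_class).
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total_weight = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for p, w in zip(probabilities, weights)) / total_weight
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda c: averaged[c])
    return averaged, winner


# Hypothetical outputs of three models for one patient: [P(class 0), P(class 1)].
model_probabilities = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
model_weights = [0.5, 0.25, 0.25]
averaged, predicted_class = soft_vote(model_probabilities, model_weights)
print(averaged, predicted_class)
```

In this made-up case two of the three models favour class 1, yet the heavily weighted first model tips the averaged probabilities towards class 0, which is the kind of behaviour that distinguishes soft voting from a simple majority vote.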

## Model Deployment

In [32]:
published_pipeline = pipeline_run.publish_pipeline(
    name="Heart Failure", description="Heart Failure AutoML Pipeline", version="v1.0")

published_pipeline

Name,Id,Status,Endpoint
Heart Failure,d7e17e07-0977-4f27-8bad-c0900e6d028a,Active,REST Endpoint


In [27]:
from azureml.core.authentication import InteractiveLoginAuthentication

interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()

In [28]:
import requests

rest_endpoint = published_pipeline.endpoint
response = requests.post(rest_endpoint, 
                         headers=auth_header, 
                         json={"ExperimentName": "pipeline-rest-endpoint"}
                        )

In [29]:
try:
    response.raise_for_status()
except Exception:    
    raise Exception("Received bad response from the endpoint: {}\n"
                    "Response Code: {}\n"
                    "Headers: {}\n"
                    "Content: {}".format(rest_endpoint, response.status_code, response.headers, response.content))

run_id = response.json().get('Id')
print('Submitted pipeline run: ', run_id)

Submitted pipeline run:  a916d761-83a4-4401-b98d-8b60de8742bd


In [31]:
from azureml.pipeline.core.run import PipelineRun
from azureml.widgets import RunDetails

published_pipeline_run = PipelineRun(workspace.experiments["pipeline-rest-endpoint"], run_id)
RunDetails(published_pipeline_run).show()

_PipelineWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', …

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shutdown all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
