# Automated ML

In [40]:
import os
import joblib

from azureml.core import Dataset, Workspace, Experiment
from azureml.core.compute import ComputeTarget
from azureml.widgets import RunDetails
from azureml.train.automl import AutoMLConfig

## Dataset

### Overview
This dataset comes from the UCI repository and contains the 14 attributes most commonly used by researchers to identify the presence of heart disease in a patient. Here we take an Automated ML approach to this classification task to see how it compares with a standard approach using tuned hyperparameters.

In [3]:
ws = Workspace.from_config()

experiment_name = 'automl-heart-disease'

experiment = Experiment(ws, experiment_name)

In [5]:
dataset = Dataset.get_by_name(ws, 'heart-disease-uci')

In [31]:
data_test, data_train = dataset.random_split(0.2)
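
`random_split(0.2)` returns two datasets, the first holding roughly 20% of the records, which is why `data_test` comes first in the assignment above. As a rough illustration of what an approximate random split looks like, here is a minimal stand-alone sketch in plain Python. The helper `random_split`, the seed value, and the row count are assumptions made for this example; it does not reproduce how Azure ML implements the split internally.

```python
import random


def random_split(rows, percentage, seed=None):
    """Split rows into two lists; the first gets roughly `percentage` of them."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        # Each row goes to the first split with probability `percentage`,
        # so the split sizes are approximate rather than exact.
        (first if rng.random() < percentage else second).append(row)
    return first, second


rows = list(range(300))  # stand-in records; the row count is illustrative
test_rows, train_rows = random_split(rows, 0.2, seed=42)
print(len(test_rows), len(train_rows))
```

Fixing a seed makes the split reproducible between runs, which helps when comparing AutoML against another approach on the same held-out data.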

## AutoML Configuration

A RAM-optimized compute cluster is used for the calculations. I set the timeout to 30 minutes and enabled early stopping to minimize the cost of this experiment. The primary metric the optimizer targets is accuracy, since this is a classification task. The number of cross-validation folds is set to 5 to balance computational cost against the risk of overfitting.
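
To make the 5-fold setting concrete, the sketch below partitions row indices into 5 folds so that every row is used for validation exactly once. The helper `k_fold_indices` is hypothetical and written only for illustration; AutoML may shuffle or stratify its folds differently.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_idx, valid_idx) pairs; each row is validated exactly once."""
    # Distribute rows as evenly as possible: the first `n_rows % n_folds`
    # folds get one extra row.
    base, extra = divmod(n_rows, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, valid_idx
        start += size


folds = list(k_fold_indices(n_rows=12, n_folds=5))
for train_idx, valid_idx in folds:
    print(len(train_idx), len(valid_idx))
```

With 5 folds, each candidate model is trained 5 times, each time on about 80% of the training data. More folds give a steadier estimate of the metric but multiply the compute time, which matters under a 30-minute timeout.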

In [14]:
# Get created compute cluster
compute_cluster_name = 'RAM-cluster'
compute_cluster = ComputeTarget(workspace=ws, name=compute_cluster_name)

In [34]:
automl_settings = {
    "experiment_timeout_minutes": 30,
    "task": 'classification',
    "enable_early_stopping": True,
    "primary_metric": 'accuracy',
    "label_column_name": 'target',
    "max_cores_per_iteration": -1,
    "n_cross_validations": 5,
}

automl_config = AutoMLConfig(
    compute_target=compute_cluster,
    training_data=data_train,
    **automl_settings
)

In [49]:
# Submitting experiment
remote_run = experiment.submit(automl_config, show_output=True)

Running on remote.
Running on remote compute: RAM-cluster
Parent Run ID: AutoML_e3538ed2-fdec-410e-87a5-6881b7675331

Received interrupt. Returning now.

## Run Details


In [38]:
RunDetails(remote_run).show()


## Best Model

In [41]:
best_auto_run, fitted_model = remote_run.get_output()

os.makedirs('./outputs', exist_ok=True)

joblib.dump(fitted_model, filename='outputs/automl.joblib')

['outputs/automl.joblib']
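
The held-out `data_test` split is not scored anywhere in this notebook. One way to use it would be to compute the same metric, accuracy, on the test rows. The sketch below defines a hypothetical `accuracy` helper and runs it on made-up labels. The commented lines show how it might be wired to `data_test` and `fitted_model`, assuming the label column is `target` as configured above; they are untested here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty inputs")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Possible wiring to the objects in this notebook (an assumption, not run here):
# test_df = data_test.to_pandas_dataframe()
# y_true = test_df.pop('target')
# y_pred = fitted_model.predict(test_df)
# print(accuracy(list(y_true), list(y_pred)))

# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
score = accuracy(y_true, y_pred)  # 3 of 5 predictions match
print(score)
```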

In [50]:
best_auto_run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| automl-heart-disease | AutoML_02bba0d7-38e7-472e-93e9-77c3c7f8004e_25 | azureml.scriptrun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


In [48]:
# Registering model

model = best_auto_run.register_model(
    model_name='automl-heart-disease',
    tags={'automl': 'heart-disease'},
    model_path='.'
)

## Model Deployment

Deployment was carried out with the hyperparameter-tuned model rather than the AutoML model.