# Automated ML

In the next cells, we update the Azure ML SDK to the latest AutoML version to avoid incompatibility issues, and we import the libraries needed to run this notebook.

In [1]:
# Upgrade the AutoML SDK to avoid version mismatches between AutoML training and the later serving of the model on the compute cluster.
# Use Python 3.6: at the time of writing, it is the version compatible with the latest AutoML SDK.

import sys

! {sys.executable} -m pip install --upgrade azureml-sdk[automl]

Collecting azureml-sdk[automl]
  Using cached azureml_sdk-1.36.0-py3-none-any.whl (4.5 kB)
Collecting azureml-train-automl-client~=1.36.0
  Using cached azureml_train_automl_client-1.36.0-py3-none-any.whl (135 kB)
Collecting azureml-core~=1.36.0
  Using cached azureml_core-1.36.0.post2-py3-none-any.whl (2.4 MB)
Collecting azureml-pipeline~=1.36.0
  Using cached azureml_pipeline-1.36.0-py3-none-any.whl (3.7 kB)
Collecting azureml-train-core~=1.36.0
  Using cached azureml_train_core-1.36.0-py3-none-any.whl (8.6 MB)
Collecting azureml-train-automl~=1.36.0; extra == "automl"
  Using cached azureml_train_automl-1.36.0-py3-none-any.whl (3.5 kB)
Collecting azureml-telemetry~=1.36.0
  Using cached azureml_telemetry-1.36.0-py3-none-any.whl (30 kB)
Collecting azureml-automl-core~=1.36.0
  Using cached azureml_automl_core-1.36.1-py3-none-any.whl (221 kB)
Collecting azureml-pipeline-steps~=1.36.0
  Using cached azureml_pipeline_steps-1.36.0-py3-none-any.whl (70 kB)
Collecting azureml-pipeline-core

In [2]:
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset
from azureml.core import Datastore
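import joblib
import logging  # required by the "verbosity" entry in the AutoML settings
from pprint import pprint  # print_model() calls pprint(...) as a function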
import requests
import json
import pandas as pd
from azureml.core import Model
from azureml.core.resource_configuration import ResourceConfiguration


## Dataset

### Overview

We configure the workspace so that we can register the dataset and create a compute cluster for AutoML training and model deployment.

The dataset we use is the "Heart Failure Prediction Dataset" from Kaggle (https://www.kaggle.com/fedesoriano/heart-failure-prediction). It aims to support the early detection of severe heart disease by studying how several health indicators affect its occurrence. It combines 5 different heart disease datasets (more information at the Kaggle URL above).

A copy of the dataset is provided in the GitHub repository, but it can also be accessed through a URL.

In [3]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'Udacitycapsproject'

experiment=Experiment(ws, experiment_name)

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

run = experiment.start_logging()

Workspace name: quick-starts-ws-164599
Azure region: southcentralus
Subscription id: 510b94ba-e453-4417-988b-fbdc37b55ca7
Resource group: aml-quickstarts-164599


In [4]:
cluster_name="udacityprojclust"

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('CPU cluster already exists. Using it.')
except ComputeTargetException:

    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_D2_V2', max_nodes=6)
    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

InProgress....
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [11]:
data = pd.read_csv('./heart.csv')
datastore = Datastore.get(ws, 'workspaceblobstore')

dataset = TabularDatasetFactory.register_pandas_dataframe(data, target=datastore, name='udacitycapsprojdata')

Method register_pandas_dataframe: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


Validating arguments.
Arguments validated.
Successfully obtained datastore reference and path.
Uploading file to managed-dataset/5c61c621-2f75-4b83-84e3-327432684cb9/
Successfully uploaded file to datastore.
Creating and registering a new dataset.
Successfully created and registered a new dataset.


## AutoML Configuration

The task for AutoML is binary classification on the `HeartDisease` label column, which takes a yes/no value. The settings aim for the best results while limiting resource consumption. We selected the `AUC_weighted` metric because it is less sensitive to class imbalance than plain accuracy, and we use 5-fold cross-validation to evaluate each model properly. We also enabled early stopping and set the experiment timeout to one hour to optimize the use of the compute cluster.
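To illustrate why an AUC-based metric is preferred over accuracy on imbalanced data, here is a minimal pure-Python sketch. It computes plain binary AUC as the probability that a random positive example is scored above a random negative one. This is not the AutoML implementation (as far as we understand, `AUC_weighted` averages per-class AUC weighted by class frequency); the helper names `roc_auc` and `accuracy` and the scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """Binary AUC: chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Made-up, imbalanced example: 8 negatives followed by 2 positives.
labels = [0] * 8 + [1] * 2

# A useless model that gives every example the same low score.
constant_scores = [0.1] * 10

# A model that ranks both positives above every negative.
ranked_scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.35, 0.4, 0.45, 0.6]

print("constant model: accuracy", accuracy(labels, constant_scores),
      "AUC", roc_auc(labels, constant_scores))
print("ranking model:  accuracy", accuracy(labels, ranked_scores),
      "AUC", roc_auc(labels, ranked_scores))
```

The constant model still reaches high accuracy simply by predicting the majority class, while its AUC stays at chance level. The ranking model is clearly separated from it by AUC.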

In [14]:
# automl settings
automl_settings = {
       "n_cross_validations": 5,
       "primary_metric": 'AUC_weighted',
       "enable_early_stopping": True,
       "experiment_timeout_hours": 1.0,
       "max_concurrent_iterations": 5,
       "max_cores_per_iteration": -1,
       "verbosity": logging.INFO}

# automl config
automl_config = AutoMLConfig(task = 'classification',
                               compute_target = cluster_name,
                               training_data = dataset,
                               label_column_name = 'HeartDisease',
                               **automl_settings)
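
As a rough picture of what `n_cross_validations = 5` means, the sketch below splits row indices into 5 folds, each used once for validation while the rest is used for training. It is a plain, unshuffled illustration and not the splitting logic AutoML uses internally; `k_fold_indices` is a hypothetical helper and the sample count of 12 is arbitrary.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) once per fold, without shuffling."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


folds = list(k_fold_indices(n_samples=12, n_folds=5))
for number, (train, validation) in enumerate(folds, start=1):
    print(f"fold {number}: validate on {validation}, train on {len(train)} rows")
```

Every row lands in exactly one validation fold, so each candidate model is scored on data it was not trained on, and the reported metric is the average over the folds.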

In [15]:
# Submit the AutoML experiment
remote_run = experiment.submit(automl_config)

Submitting remote run.


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| Udacitycapsproject | AutoML_e6320db9-0c58-4c29-873c-f2a0b9e521f4 | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |


## Run Details

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [16]:
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [None]:
# Retrieve and get insights from your best automl model.

best_run_AutoML, fitted_model_AutoML = remote_run.get_output()

print(hasattr(fitted_model_AutoML, 'steps'))

In [None]:
# Function to list the hyperparameters 

def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators' : list(e[0] for e in step[1].estimators), 'weights' : step[1].weights})
            print()

            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0] + ' - ')
        
        else:
            pprint(step[1].get_params())
            print()
        
print_model(fitted_model_AutoML)

In [None]:
# Get information from guardrails.

print(remote_run.get_guardrails())

In [None]:
# Save the best model by AutoML

joblib.dump(fitted_model_AutoML, 'AutoML.model')

## Model Deployment

Remember you have to deploy only one of the two models you trained but you still need to register both the models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [None]:
df = dataset.to_pandas_dataframe()

# Separate the label column from the feature columns.
df_y = df['HeartDisease']
df_x = df.drop(columns=['HeartDisease'])

model = Model.register(workspace=ws,
                       model_name='my-udacityproj3-automlmodel',                # Name of the registered model in your workspace.
                       model_path='./AutoML.model',  # Local file to upload and register as a model.
                       model_framework=Model.Framework.AutoML,  # Framework used to create the model.
                       resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=1.0),
                       description='AutoML model for heart disease prediction.',
                       tags={'area': 'heartdisease', 'type': 'classification'})

print('Name:', model.name)

In the cell below, we deploy the registered model as a web service.

In [None]:
service_name = 'my-udacityproj3-service'

service = Model.deploy(ws, service_name, [model], overwrite=True)
service.wait_for_deployment(show_output=True)

In the cell below, we send a request to the deployed web service to test it.

In [None]:
input_payload = json.dumps({
    'data': df_x[0:2].values.tolist(),  # a DataFrame has no tolist(); convert the first two rows through .values
    'method': 'predict'  # If you have a classification model, you can get probabilities by changing this to 'predict_proba'.
})

output = service.run(input_payload)

print(output)
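
The request body used above is a JSON string with a `data` list of rows and a `method` name. The sketch below builds and checks such a payload without calling any service. `build_payload` is a hypothetical helper and the two rows are placeholder values, not records from the heart dataset.

```python
import json


def build_payload(rows, method="predict"):
    """Serialize feature rows into the JSON request body used in this notebook."""
    if method not in ("predict", "predict_proba"):
        raise ValueError("method must be 'predict' or 'predict_proba'")
    return json.dumps({"data": rows, "method": method})


# Placeholder rows: one list of feature values per record.
rows = [
    [1, "a", 0.5],
    [2, "b", 1.5],
]

payload = build_payload(rows)
decoded = json.loads(payload)

print(payload)
print("rows in payload:", len(decoded["data"]))
```

Round-tripping the payload through `json.loads` before sending it is a cheap way to confirm that every value is JSON-serializable.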

In [None]:
# Remove WebService endpoint

service.delete()

# Remove compute cluster

cpu_cluster.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
