# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
import json

from azureml.core import Experiment, Model, Workspace
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
from IPython.display import display

from compute_cluster import get_compute_cluster

## Dataset

### Overview
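The data is `heart.csv`, a CSV file hosted in the project's GitHub repository and read by the code below, so it is external to the workspace. Each row has 13 numeric attributes (`age`, `sex`, `cp`, `trtbps`, `chol`, `fbs`, `restecg`, `thalachh`, `exng`, `oldpeak`, `slp`, `caa`, `thall`) and a binary target column, `output`, with values 0 and 1. The task is binary classification: predict `output` from the other columns.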


TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [2]:
ws = Workspace.from_config()

# Choose a name for the experiment
experiment_name = "udacity-project-automl"

experiment = Experiment(ws, experiment_name)

print('Workspace name: ' + ws.name,
      'Azure region: ' + ws.location,
      'Subscription id: ' + ws.subscription_id,
      'Resource group: ' + ws.resource_group, sep='\n'
      )

# External dataset: a TabularDataset that reads the CSV straight from GitHub
heart_dataset = TabularDatasetFactory.from_delimited_files(
    "https://raw.githubusercontent.com/t0m0ffel/udacity-capstone/master/starter_file/heart.csv"
)



Workspace name: quick-starts-ws-227003
Azure region: westus2
Subscription id: 5a4ab2ba-6c51-4805-8155-58759ad589d8
Resource group: aml-quickstarts-227003
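`from_delimited_files` parses the CSV and, by default (`infer_column_types=True`), infers a type for each column. As an offline illustration of that idea only, the following stdlib sketch parses three rows copied from the sample output later in this notebook, infers `int`, `float`, or `str` per column, and splits off the `output` label the way the test cell does with `X.pop('output')`. `infer_type` and `load_delimited` are hypothetical helpers; Azure ML reads the real dataset lazily and its type inference differs in detail.

```python
import csv
import io

# Header and three rows copied from the sample output further down (hypothetical subset).
CSV_TEXT = """age,sex,cp,trtbps,chol,fbs,restecg,thalachh,exng,oldpeak,slp,caa,thall,output
60,1,0,145,282,0,0,142,1,2.8,1,2,3,0
43,0,2,122,213,0,1,165,0,0.2,1,0,2,1
59,1,3,160,273,0,0,125,0,0.0,2,0,2,0
"""


def infer_type(values):
    """Return int if every value parses as int, else float if every value does, else str."""
    for cast in (int, float):
        try:
            for value in values:
                cast(value)
        except ValueError:
            continue
        return cast
    return str


def load_delimited(text, label="output"):
    """Parse CSV text, convert each column to its inferred type, and split off the label."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys())
    types = {c: infer_type([row[c] for row in rows]) for c in columns}
    typed = [{c: types[c](row[c]) for c in columns} for row in rows]
    features = [{c: v for c, v in row.items() if c != label} for row in typed]
    labels = [row[label] for row in typed]
    return features, labels, types


X_demo, y_demo, types = load_delimited(CSV_TEXT)
print(types["oldpeak"].__name__, types["age"].__name__, y_demo)
# float int [0, 1, 0]
```

Trying `int` before `float` keeps integer-coded categorical columns such as `cp` and `thall` as integers, while `oldpeak` (which contains values like `2.8`) falls back to `float`.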


## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.

In [None]:
# Please refer to the Readme for a detailed explanation of the parameters

In [3]:
compute_target = get_compute_cluster(ws)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [4]:
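# AutoML settings: cap the experiment at 20 minutes, run up to 5 child runs
# in parallel, and rank the candidate models by accuracy
automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 5,
    "primary_metric": 'accuracy'
}

# Classification of the "output" column on the remote cluster:
# try at most 200 pipelines, let AutoML featurize the data, and stop early
# when the score stops improving
automl_config = AutoMLConfig(
    compute_target=compute_target,
    task="classification",
    training_data=heart_dataset,
    label_column_name="output",
    enable_early_stopping=True,
    iterations=200,
    featurization='auto',
    **automl_settings
)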
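The time and iteration limits above interact. A rough back-of-the-envelope check, as a sketch: assuming perfect parallelism and no featurization or start-up overhead (both optimistic), each iteration may take on average `timeout / ceil(iterations / concurrency)` minutes if all 200 are to finish. `max_minutes_per_iteration` is a hypothetical helper, not part of the AutoML SDK. The Azure ML documentation also notes that an AmlCompute cluster runs one iteration per node, so `max_concurrent_iterations` only helps up to the node count of the cluster returned by `get_compute_cluster` (not shown here).

```python
import math


def max_minutes_per_iteration(timeout_minutes, max_concurrent, iterations):
    """Average minutes each iteration may take if all iterations must finish before the timeout.

    Assumes perfect parallelism and no featurization or start-up overhead,
    so the real budget is smaller.
    """
    rounds = math.ceil(iterations / max_concurrent)  # sequential "waves" of parallel runs
    return timeout_minutes / rounds


print(max_minutes_per_iteration(timeout_minutes=20, max_concurrent=5, iterations=200))  # 0.5
```

Under these assumptions, unless each child run finishes in about half a minute, the 20-minute timeout, not `iterations=200`, is what ends the experiment.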

In [5]:
# Submit the experiment to the remote cluster and stream its progress
remote_run = experiment.submit(automl_config, show_output=True)

Submitting remote run.
No run_configuration provided, running on project-cluster with default configuration
Running on remote compute: project-cluster


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| udacity-project-automl | AutoML_987e9c5d-9ee9-49fe-ba81-db8bf828cc6f | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Cross validation
STATUS:       DONE
DESCRIPTION:  In order to accurately evaluate the model(s) trained by AutoML, we leverage a dataset that the model is not trained on. Hence, if the user doesn't provide an explicit validation dataset, a part of the training dataset is used to achieve this. For smaller datasets (fewer than 20,000 samples), cross-validation is leveraged, else a single hold-out set is split from the training data to serve as the validation dataset. Hence, for your input data we leverage cross-validation with 10 folds, if the number o
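The guardrail above describes how AutoML validates models when no validation set is given: cross-validation for fewer than 20,000 rows, a single hold-out split otherwise, and the output mentions 10 folds for this data. A minimal stdlib sketch of that rule and of plain k-fold splitting follows. AutoML's actual fold assignment (for example, whether it shuffles or stratifies) is not shown in this output, so `choose_validation` and `k_fold_indices` are hypothetical illustrations using contiguous, unshuffled folds, and the 25-row size is arbitrary.

```python
def choose_validation(n_rows, threshold=20_000):
    """Pick a strategy following the guardrail text: CV below the threshold, hold-out otherwise."""
    return "cross_validation" if n_rows < threshold else "holdout"


def k_fold_indices(n_rows, k=10):
    """Yield (train_idx, valid_idx) pairs for contiguous, unshuffled folds.

    Fold sizes differ by at most one row, and every row is used for
    validation exactly once across the k folds.
    """
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, valid
        start += size


folds = list(k_fold_indices(25, k=10))
print(choose_validation(25), [len(v) for _, v in folds])
# cross_validation [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

Each model is trained k times on the `train` indices and scored on the matching `valid` indices; the reported metric is an aggregate over the folds, so every row contributes to validation once.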

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [6]:
RunDetails(remote_run).show()


## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [7]:
# Get best run and model
best_run, model = remote_run.get_output()
best_run_metrics = best_run.get_metrics()
display(best_run)
print("Primary metric accuracy: ", best_run_metrics['accuracy'])
print("All metrics:")
print(json.dumps(best_run_metrics, indent=4))
print("Model Config")
print(model)

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| udacity-project-automl | AutoML_987e9c5d-9ee9-49fe-ba81-db8bf828cc6f_38 | azureml.scriptrun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


Primary metric accuracy:  0.857741935483871
All metrics:
{
    "average_precision_score_weighted": 0.9239149126994082,
    "AUC_macro": 0.9189068842807233,
    "average_precision_score_micro": 0.9133613798581839,
    "average_precision_score_macro": 0.9205373621144928,
    "AUC_weighted": 0.9189068842807233,
    "matthews_correlation": 0.7123759456169987,
    "precision_score_weighted": 0.8632265123923094,
    "recall_score_micro": 0.857741935483871,
    "precision_score_micro": 0.857741935483871,
    "f1_score_micro": 0.857741935483871,
    "weighted_accuracy": 0.8611574924402319,
    "AUC_micro": 0.9137122210660191,
    "accuracy": 0.857741935483871,
    "balanced_accuracy": 0.8538976415311555,
    "log_loss": 0.3929398731381307,
    "f1_score_weighted": 0.8563643347551155,
    "norm_macro_recall": 0.707795283062311,
    "precision_score_macro": 0.8587735426338368,
    "f1_score_macro": 0.8519645003493818,
    "recall_score_weighted": 0.857741935483871,
    "recall_score_macro": 0.85

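Several of the metrics above are linked. `balanced_accuracy` is the mean of the per-class recalls (macro-averaged recall), and, according to the Azure ML metrics documentation, `norm_macro_recall` is `(recall_macro - R) / (1 - R)`, where `R = 1 / n_classes` is the expected macro recall of random predictions (0.5 for this binary task). For single-label classification, micro-averaged precision, recall, and F1 all equal accuracy, which is why those four values above are identical. The sketch below checks the normalization against the reported numbers; `normalized_macro_recall` is a hypothetical helper, not an SDK function.

```python
def normalized_macro_recall(recall_macro, n_classes):
    """Rescale macro recall so random guessing scores 0 and a perfect model scores 1."""
    chance = 1.0 / n_classes  # expected macro recall of random predictions
    return (recall_macro - chance) / (1.0 - chance)


# balanced_accuracy (macro-averaged recall) copied from the metrics printed above
print(normalized_macro_recall(0.8538976415311555, n_classes=2))
# ≈ 0.7078, the reported norm_macro_recall
```

Because this normalization removes the 0.5 that a binary classifier gets for free, `norm_macro_recall` is a stricter view of the same per-class recalls than `balanced_accuracy`.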
In [18]:
# Register the best model in the workspace, storing its metrics as properties
best_model = best_run.register_model(
    model_path='./outputs/model.pkl',
    model_name='best_automl_model', properties=dict(best_run_metrics)
)


## Model Deployment

Remember that you have to deploy only one of the two models you trained, but you still need to register both models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [19]:

# Fetch the scoring script that AutoML generated for the best run
best_run.download_file('outputs/scoring_file_v_1_0_0.py', 'scoring_file.py')

# Reuse the best run's environment so the container has the same packages
inference_config = InferenceConfig(entry_script='scoring_file.py', environment=best_run.get_environment())

# Deploy to Azure Container Instances with 1 CPU core and 1 GB of memory
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Model.deploy(ws, "automl-deployment", [best_model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)

assert service.state == 'Healthy', "Deployment failed: the endpoint is not healthy"

print(f"Swagger URI: {service.swagger_uri}\nScoring URI: {service.scoring_uri}")

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2023-03-01 17:07:47+00:00 Creating Container Registry if not exists.
2023-03-01 17:07:47+00:00 Registering the environment.
2023-03-01 17:07:47+00:00 Use the existing image.
2023-03-01 17:07:48+00:00 Submitting deployment to compute.
2023-03-01 17:07:53+00:00 Checking the status of deployment automl-deployment..
2023-03-01 17:09:55+00:00 Checking the status of inference endpoint automl-deployment.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Swagger URI: http://3280269c-052f-488b-b1af-7d04f6a3f347.westus2.azurecontainer.io/swagger.json
Scoring URI: http://3280269c-052f-488b-b1af-7d04f6a3f347.westus2.azurecontainer.io/score
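The downloaded `scoring_file.py` follows the Azure ML entry-script convention: an `init()` function called once when the container starts and a `run(raw_data)` function called for every request. The following stdlib-only sketch imitates that contract with a hypothetical stand-in model (a fixed threshold on `thalachh`, chosen only for illustration); it is not the AutoML-generated script. It assumes, consistent with the test cell below decoding the response twice, that `run` returns a JSON string, which the server then encodes again.

```python
import json

model = None


def _threshold_model(record):
    """Hypothetical stand-in for the real AutoML model: 1 if thalachh > 150, else 0."""
    return int(record["thalachh"] > 150)


def init():
    """Called once at container start-up; a real script would deserialize the registered model here."""
    global model
    model = _threshold_model


def run(raw_data):
    """Called per request with the raw request body, e.g. {"data": [{...}, ...]}."""
    records = json.loads(raw_data)["data"]
    predictions = [model(record) for record in records]
    # Returning a JSON string (rather than a dict) is what leads to the
    # double encoding that the client has to undo.
    return json.dumps({"result": predictions})


init()
print(run(json.dumps({"data": [{"thalachh": 142}, {"thalachh": 165}]})))
# {"result": [0, 1]}
```

Loading the model in `init()` rather than in `run()` means the deserialization cost is paid once per container instead of once per request.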


TODO: In the cell below, send a request to the web service you deployed to test it.

In [20]:
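import json

import pandas as pd
from requests import post

n_samples = 10

# Draw random rows and separate the label from the features
X = heart_dataset.to_pandas_dataframe().sample(n_samples)
y = X.pop('output')

# The scoring script expects {"data": [{column: value, ...}, ...]}
sample_payload = json.dumps(dict(data=X.to_dict(orient='records')))

response = post(service.scoring_uri, sample_payload, headers={'Content-type': 'application/json'})
response.raise_for_status()

# The body is a JSON string that itself contains JSON, so decode it twice
# with json instead of calling eval() on untrusted text
predictions = pd.DataFrame(json.loads(response.json()))

(
    pd.concat([v.reset_index(drop=True) for v in [X, y, predictions]], axis='columns')
    .rename(columns=dict(output='actual_output', result='predicted_output'))
    .assign(correct=lambda df: df.actual_output == df.predicted_output)
)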

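| | age | sex | cp | trtbps | chol | fbs | restecg | thalachh | exng | oldpeak | slp | caa | thall | actual_output | predicted_output | correct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60 | 1 | 0 | 145 | 282 | 0 | 0 | 142 | 1 | 2.8 | 1 | 2 | 3 | 0 | 0 | True |
| 1 | 43 | 0 | 2 | 122 | 213 | 0 | 1 | 165 | 0 | 0.2 | 1 | 0 | 2 | 1 | 1 | True |
| 2 | 65 | 1 | 0 | 110 | 248 | 0 | 0 | 158 | 0 | 0.6 | 2 | 2 | 1 | 0 | 0 | True |
| 3 | 61 | 1 | 2 | 150 | 243 | 1 | 1 | 137 | 1 | 1.0 | 1 | 0 | 2 | 1 | 1 | True |
| 4 | 49 | 0 | 0 | 130 | 269 | 0 | 1 | 163 | 0 | 0.0 | 2 | 0 | 2 | 1 | 1 | True |
| 5 | 46 | 0 | 1 | 105 | 204 | 0 | 1 | 172 | 0 | 0.0 | 2 | 0 | 2 | 1 | 1 | True |
| 6 | 44 | 1 | 1 | 130 | 219 | 0 | 0 | 188 | 0 | 0.0 | 2 | 0 | 2 | 1 | 1 | True |
| 7 | 42 | 0 | 0 | 102 | 265 | 0 | 0 | 122 | 0 | 0.6 | 1 | 0 | 2 | 1 | 1 | True |
| 8 | 59 | 1 | 3 | 160 | 273 | 0 | 0 | 125 | 0 | 0.0 | 2 | 0 | 2 | 0 | 1 | False |
| 9 | 58 | 1 | 2 | 132 | 224 | 0 | 0 | 173 | 0 | 3.2 | 2 | 2 | 3 | 0 | 0 | True |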
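The test cell has to decode a response body that is a JSON string containing JSON. A minimal stdlib sketch of doing that with `json.loads` twice (instead of `eval`, which would execute whatever text the endpoint returns), followed by accuracy and balanced accuracy over the ten rows in the table above. The response body is rebuilt locally from the `predicted_output` column rather than fetched, and ten rows are far too few to judge the model; the numbers only illustrate the computation.

```python
import json

# Rebuild a double-encoded response body locally from the predicted_output column above.
response_text = json.dumps(json.dumps({"result": [0, 1, 0, 1, 1, 1, 1, 1, 1, 0]}))
actual = [0, 1, 0, 1, 1, 1, 1, 1, 0, 0]  # actual_output column above

# Decode twice: the outer layer yields a str, the inner layer the dict.
predicted = json.loads(json.loads(response_text))["result"]

accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def recall(cls):
    """Share of rows whose actual label is cls that were predicted as cls."""
    hits = [p == cls for p, a in zip(predicted, actual) if a == cls]
    return sum(hits) / len(hits)


balanced_accuracy = (recall(0) + recall(1)) / 2
print(accuracy, balanced_accuracy)  # 0.9 0.875
```

The single miss (row 8) is a true 0 predicted as 1, so recall for class 0 drops to 3/4 while class 1 stays at 6/6; balanced accuracy weights both classes equally and is therefore lower than plain accuracy.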


TODO: In the cell below, print the logs of the web service and delete the service.

In [21]:

sample_payload

'{"data": [{"age": 60, "sex": 1, "cp": 0, "trtbps": 145, "chol": 282, "fbs": 0, "restecg": 0, "thalachh": 142, "exng": 1, "oldpeak": 2.8, "slp": 1, "caa": 2, "thall": 3}, {"age": 43, "sex": 0, "cp": 2, "trtbps": 122, "chol": 213, "fbs": 0, "restecg": 1, "thalachh": 165, "exng": 0, "oldpeak": 0.2, "slp": 1, "caa": 0, "thall": 2}, {"age": 65, "sex": 1, "cp": 0, "trtbps": 110, "chol": 248, "fbs": 0, "restecg": 0, "thalachh": 158, "exng": 0, "oldpeak": 0.6, "slp": 2, "caa": 2, "thall": 1}, {"age": 61, "sex": 1, "cp": 2, "trtbps": 150, "chol": 243, "fbs": 1, "restecg": 1, "thalachh": 137, "exng": 1, "oldpeak": 1.0, "slp": 1, "caa": 0, "thall": 2}, {"age": 49, "sex": 0, "cp": 0, "trtbps": 130, "chol": 269, "fbs": 0, "restecg": 1, "thalachh": 163, "exng": 0, "oldpeak": 0.0, "slp": 2, "caa": 0, "thall": 2}, {"age": 46, "sex": 0, "cp": 1, "trtbps": 105, "chol": 204, "fbs": 0, "restecg": 1, "thalachh": 172, "exng": 0, "oldpeak": 0.0, "slp": 2, "caa": 0, "thall": 2}, {"age": 44, "sex": 1, "cp": 1

In [None]:
print(service.get_logs())
service.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
