# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [None]:
from azureml.core import Workspace, Experiment, Dataset, Environment, Model, ScriptRunConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice
import requests  # Used for HTTP POST requests
import json

## Dataset
### Context

According to the World Health Organization (WHO), stroke is the second leading cause of death globally, responsible for approximately 11% of total deaths.
This dataset is used to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status. Each row in the data provides relevant information about one patient.

### Overview
The data is provided via the following Kaggle source: https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset

The data is provided as a .csv file and is structured as follows.

Attribute Information:
1) id: unique identifier
2) gender: "Male", "Female" or "Other"
3) age: age of the patient
4) hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension
5) heart_disease: 0 if the patient doesn't have any heart diseases, 1 if the patient has a heart disease
6) ever_married: "No" or "Yes"
7) work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"
8) Residence_type: "Rural" or "Urban"
9) avg_glucose_level: average glucose level in blood
10) bmi: body mass index
11) smoking_status: "formerly smoked", "never smoked", "smokes" or "Unknown"*
12) stroke: 1 if the patient had a stroke or 0 if not
*Note: "Unknown" in smoking_status means that the information is unavailable for this patient
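
The attribute list above can double as a lightweight schema check before training. The sketch below is only an illustration and not part of the original project: `ALLOWED_VALUES` and `validate_row` are hypothetical names, the sample rows are invented, and the `work_type` spelling `Govt_job` is an assumption about how the value appears in the CSV.

```python
# Hypothetical schema check derived from the attribute list above.
# Only columns with a fixed set of values are covered; numeric columns
# (age, avg_glucose_level, bmi) are left out of this sketch.
ALLOWED_VALUES = {
    "gender": {"Male", "Female", "Other"},
    "hypertension": {0, 1},
    "heart_disease": {0, 1},
    "ever_married": {"No", "Yes"},
    # Assumption: the CSV spells this value "Govt_job".
    "work_type": {"children", "Govt_job", "Never_worked", "Private", "Self-employed"},
    "Residence_type": {"Rural", "Urban"},
    "smoking_status": {"formerly smoked", "never smoked", "smokes", "Unknown"},
    "stroke": {0, 1},
}


def validate_row(row):
    """Return the sorted names of columns whose value is missing or not allowed."""
    return sorted(
        column
        for column, allowed in ALLOWED_VALUES.items()
        if row.get(column) not in allowed
    )


# Invented example rows, for illustration only.
good_row = {
    "gender": "Female",
    "hypertension": 0,
    "heart_disease": 1,
    "ever_married": "Yes",
    "work_type": "Private",
    "Residence_type": "Urban",
    "smoking_status": "Unknown",
    "stroke": 0,
}
bad_row = dict(good_row, gender="unknown", stroke=2)

print(validate_row(good_row))
print(validate_row(bad_row))
```

A check like this would flag unexpected categories early, before they reach any encoding step in the training script.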

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

## Create workspace

In [2]:
ws = Workspace.from_config()

## Load data from Datastore

In [3]:
key = "Stroke Dataset"
description_text = "This dataset is used to predict whether a patient is likely to get stroke."

if key in ws.datasets.keys():
    # Reuse the dataset if it is already registered in the workspace
    dataset = ws.datasets[key]
else:
    # Create an AML tabular dataset from the external CSV file
    example_data = 'https://raw.githubusercontent.com/jmtaverne/Udacity--Machine-Learning-Azure-Nanodegree/main/Project%20-%203_%20Capstone%20Project/healthcare-dataset-stroke-data.csv'
    dataset = Dataset.Tabular.from_delimited_files(example_data)
    # Register the dataset in the workspace
    dataset = dataset.register(workspace=ws,
                               name=key,
                               description=description_text)


df = dataset.to_pandas_dataframe()
df.describe()

| | id | age | hypertension | heart_disease | avg_glucose_level | stroke |
|---|---|---|---|---|---|---|
| count | 5110.0 | 5110.0 | 5110.0 | 5110.0 | 5110.0 | 5110.0 |
| mean | 36517.829354 | 43.226614 | 0.097456 | 0.054012 | 106.147677 | 0.048728 |
| std | 21161.721625 | 22.612647 | 0.296607 | 0.226063 | 45.28356 | 0.21532 |
| min | 67.0 | 0.08 | 0.0 | 0.0 | 55.12 | 0.0 |
| 25% | 17741.25 | 25.0 | 0.0 | 0.0 | 77.245 | 0.0 |
| 50% | 36932.0 | 45.0 | 0.0 | 0.0 | 91.885 | 0.0 |
| 75% | 54682.0 | 61.0 | 0.0 | 0.0 | 114.09 | 0.0 |
| max | 72940.0 | 82.0 | 1.0 | 1.0 | 271.74 | 1.0 |
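
Because `stroke` is a binary column, its mean in the summary above is the share of positive cases. The short calculation below, a sketch using only the printed `count` and `mean` values, estimates how many patients had a stroke and what accuracy a trivial "always predict no stroke" model would reach. It assumes the evaluation data has the same class balance as the full dataset; how `train.py` splits or resamples the data is not shown in this notebook, so that assumption may not hold.

```python
# Values copied from the describe() output above.
n_rows = 5110
stroke_rate = 0.048728  # mean of the binary `stroke` column

# Approximate number of positive cases in the full dataset.
n_positive = round(n_rows * stroke_rate)

# Accuracy of a model that always predicts "no stroke",
# assuming the evaluation data has the same class balance.
majority_baseline_accuracy = 1.0 - stroke_rate

print(f"positive cases: {n_positive}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.4f}")
```

With so few positive cases, accuracy on its own can be a weak indicator of how well strokes are detected, which is worth keeping in mind when comparing runs.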


## Create compute cluster

In [4]:

cluster_name = "Udactiy-Project-Cluster"

# Verify that cluster does not exist already
try:
    aml_compute = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_D2_V2',
                                                           max_nodes=4)
    aml_compute = ComputeTarget.create(ws, cluster_name, compute_config)

aml_compute.wait_for_completion(show_output=True)


InProgress.
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Create experiment

In [5]:
experiment_name = 'Stroke_prediction_hyper'

experiment=Experiment(ws, experiment_name)

## Hyperdrive Configuration

TODO: Explain the model you are using and the reason for choosing the different hyperparameters, termination policy and config settings.

In [8]:
# TODO: Create an early termination policy. This is not required if you are using Bayesian sampling.
early_termination_policy = BanditPolicy(evaluation_interval=1, slack_factor=0.2, delay_evaluation=5)

# TODO: Create the different params that you will be using during training
param_sampling = RandomParameterSampling({
    "--n_estimators": choice(100, 200, 300, 400, 500),
    "--max_depth": choice(1, 2, 3, 4, 5),
})

# TODO: Create your estimator and hyperdrive config
env = Environment.get(workspace=ws, name="AzureML-Tutorial")

# Reuse the compute cluster created above
compute_target = ws.compute_targets[cluster_name]
src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      compute_target=compute_target,
                      environment=env)


hyperdrive_run_config = HyperDriveConfig(hyperparameter_sampling=param_sampling,
                                     primary_metric_name='Accuracy',
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     policy=early_termination_policy,
                                     run_config=src,
                                     max_concurrent_runs=4,
                                     max_total_runs=16,                                     
                                    )
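
To make the `BanditPolicy` settings used above more concrete, the following is a simplified sketch of the decision rule for a metric that is maximized: a run is cancelled when its reported metric falls below the best metric so far divided by `1 + slack_factor`. The functions `bandit_cutoff` and `should_terminate` are hypothetical helpers written for this illustration, not part of the Azure ML SDK, and the interval bookkeeping of the real service may differ from this approximation.

```python
def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric value a run may report without being cancelled (maximize goal)."""
    return best_metric / (1.0 + slack_factor)


def should_terminate(run_metric, best_metric, interval,
                     slack_factor=0.2, evaluation_interval=1, delay_evaluation=5):
    """Simplified Bandit decision; defaults mirror the policy configured above."""
    # No run is cancelled before `delay_evaluation` intervals have been reported.
    if interval < delay_evaluation:
        return False
    # The policy is only applied every `evaluation_interval` intervals.
    if interval % evaluation_interval != 0:
        return False
    return run_metric < bandit_cutoff(best_metric, slack_factor)


# Illustrative numbers only: with a best accuracy of 0.80, the cutoff is 0.80 / 1.2.
print(round(bandit_cutoff(0.80, 0.2), 4))
print(should_terminate(0.60, 0.80, interval=6))   # below the cutoff
print(should_terminate(0.70, 0.80, interval=6))   # within the slack
print(should_terminate(0.10, 0.80, interval=2))   # still inside the delay window
```

A larger `slack_factor` lowers the cutoff and lets weaker runs continue longer, while a smaller value cancels them more aggressively.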

In [13]:
#TODO: Submit your experiment
hyperDrive_run = experiment.submit(hyperdrive_run_config)

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [14]:
RunDetails(hyperDrive_run).show()
hyperDrive_run.wait_for_completion(show_output=True)


RunId: HD_84d0a147-cb78-40bb-91fd-086d0e20ff76
Web View: https://ml.azure.com/runs/HD_84d0a147-cb78-40bb-91fd-086d0e20ff76?wsid=/subscriptions/6971f5ac-8af1-446e-8034-05acea24681f/resourcegroups/aml-quickstarts-196660/workspaces/quick-starts-ws-196660&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

Streaming azureml-logs/hyperdrive.txt

[2022-05-25T11:30:55.782653][API][INFO]Experiment created
[2022-05-25T11:30:56.536200][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space
[2022-05-25T11:30:57.3739474Z][SCHEDULER][INFO]Scheduling job, id='HD_84d0a147-cb78-40bb-91fd-086d0e20ff76_0'
[2022-05-25T11:30:57.6462033Z][SCHEDULER][INFO]Scheduling job, id='HD_84d0a147-cb78-40bb-91fd-086d0e20ff76_2'
[2022-05-25T11:30:57.6017987Z][SCHEDULER][INFO]Scheduling job, id='HD_84d0a147-cb78-40bb-91fd-086d0e20ff76_1'
[2022-05-25T11:30:57.7451206Z][SCHEDULER][INFO]Successfully scheduled a job. Id='HD_84d0a147-cb78-40bb-91fd-

{'runId': 'HD_84d0a147-cb78-40bb-91fd-086d0e20ff76',
 'target': 'Udactiy-Project-Cluster',
 'status': 'Completed',
 'startTimeUtc': '2022-05-25T11:30:55.511755Z',
 'endTimeUtc': '2022-05-25T11:39:31.632015Z',
 'services': {},
 'properties': {'primary_metric_config': '{"name": "Accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '83bf0f3b-1fc0-4f76-a883-8bac7f26083a',
  'user_agent': 'python/3.8.5 (Linux-5.4.0-1077-azure-x86_64-with-glibc2.10) msrest/0.6.21 Hyperdrive.Service/1.0.0 Hyperdrive.SDK/core.1.41.0',
  'space_size': '25',
  'score': '0.7619823489477257',
  'best_child_run_id': 'HD_84d0a147-cb78-40bb-91fd-086d0e20ff76_2',
  'best_metric_status': 'Succeeded',
  'best_data_container_id': 'dcid.HD_84d0a147-cb78-40bb-91fd-086d0e20ff76_2'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://mlstrg196660.blob.core.window
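
The `space_size` of 25 reported in the run properties matches the two `choice` lists in the sampling configuration: five values for `--n_estimators` times five values for `--max_depth`. The sketch below enumerates that grid and draws 16 random combinations to mimic `max_total_runs=16`. It uses Python's `random` module with a fixed seed purely for illustration; the draws are independent, so repeats are possible, and how the HyperDrive service itself samples is not modeled here.

```python
import itertools
import random

# Choice lists copied from the RandomParameterSampling configuration.
n_estimators_choices = [100, 200, 300, 400, 500]
max_depth_choices = [1, 2, 3, 4, 5]

# Every (n_estimators, max_depth) combination in the discrete search space.
search_space = list(itertools.product(n_estimators_choices, max_depth_choices))

max_total_runs = 16
rng = random.Random(0)  # fixed seed, only to make the illustration repeatable
sampled = [rng.choice(search_space) for _ in range(max_total_runs)]

# Upper bound on the share of the space that 16 runs can cover.
max_coverage = max_total_runs / len(search_space)

print(f"search space size: {len(search_space)}")
print(f"distinct combinations sampled: {len(set(sampled))}")
print(f"maximum coverage: {max_coverage:.0%}")
```

Since 16 runs can cover at most 64% of the 25 combinations, the best run reported is the best among those sampled, not necessarily the best in the whole grid.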

## Best Model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [16]:
best_run = hyperDrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()

print('Best Run Id: ', best_run.id)
print('\n Accuracy:', best_run_metrics['Accuracy'])
print('\n n_estimators:',best_run_metrics['n_estimators:'])
print('\n max_depth:',best_run_metrics['max_depth:'])




Best Run Id:  HD_84d0a147-cb78-40bb-91fd-086d0e20ff76_2

 Accuracy: 0.7619823489477257

 n_estimators: 100

 max_depth: 1
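
Note that the hyperparameter metrics for this run are stored under names that end in a colon (`'n_estimators:'`, `'max_depth:'`), so every lookup has to include it. If that becomes awkward, the keys can be normalized once after retrieval. The helper `normalize_metric_names` below is a hypothetical convenience function, and the example dictionary simply mirrors the values printed above.

```python
def normalize_metric_names(metrics):
    """Return a copy of `metrics` with trailing colons and spaces removed from the keys."""
    return {name.rstrip(": "): value for name, value in metrics.items()}


# Example dictionary mirroring the metrics printed for the best run.
best_run_metrics_example = {
    "Accuracy": 0.7619823489477257,
    "n_estimators:": 100,
    "max_depth:": 1,
}

clean_metrics = normalize_metric_names(best_run_metrics_example)
print(clean_metrics["n_estimators"], clean_metrics["max_depth"])
```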


In [18]:
#TODO: Save the best model
model = best_run.register_model(model_name='HyperDrive_HighAccuracy', model_path='outputs/',
                                properties={'Accuracy': best_run_metrics['Accuracy'],
                                            'n_estimatorsh': best_run_metrics['n_estimators:'],
                                           'max_depth': best_run_metrics['max_depth:']})

In [23]:
# List registered models to verify that the model has been saved.
# A separate loop variable avoids overwriting `model` from the previous cell.
for registered_model in Model.list(ws):
    print(registered_model.name, 'version:', registered_model.version)
    for tag_name in registered_model.tags:
        tag = registered_model.tags[tag_name]
        print('\t', tag_name, ':', tag)
    for prop_name in registered_model.properties:
        prop = registered_model.properties[prop_name]
        print('\t', prop_name, ':', prop)
    print('\n')


HyperDrive_HighAccuracy version: 1
	 Accuracy : 0.7619823489477257
	 n_estimatorsh : 100
	 max_depth : 1


capstoneModel_automl version: 1
	 Training context : Auto ML
	 Accuracy : 0.7879439518496805
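
The listing above makes it possible to compare the two registered models by their recorded accuracy. The sketch below does that comparison on a plain list of dictionaries that copies the printed values. `best_by_accuracy` is a hypothetical helper, and treating the accuracy values as strings that need converting to `float` is an assumption about how they are stored.

```python
# Values copied from the model listing above.
# Assumption: accuracy values are stored as strings and must be converted.
registered_models = [
    {"name": "HyperDrive_HighAccuracy", "version": 1, "Accuracy": "0.7619823489477257"},
    {"name": "capstoneModel_automl", "version": 1, "Accuracy": "0.7879439518496805"},
]


def best_by_accuracy(models):
    """Return the entry with the highest recorded accuracy."""
    return max(models, key=lambda entry: float(entry["Accuracy"]))


best = best_by_accuracy(registered_models)
print(best["name"], best["Accuracy"])
```

On these recorded values the AutoML model scores higher than the HyperDrive model.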




**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.

