# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [8]:
#!pip install --upgrade azureml-sdk
#!pip install --upgrade azureml-core

In [None]:
#!pip list

In [17]:
import azureml.core
from azureml.core import Workspace, Environment, Experiment, Datastore, Dataset, ScriptRunConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration
from azureml.exceptions import ComputeTargetException
from azureml.pipeline.core import Pipeline, PipelineData, TrainingOutput
from azureml.pipeline.steps import HyperDriveStep, HyperDriveStepRun, PythonScriptStep
from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal
from azureml.train.hyperdrive import choice, loguniform

import os
import shutil
import urllib
import numpy as np
import matplotlib.pyplot as plt


from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from sklearn.ensemble import RandomForestClassifier  # Random Forests の分類器
#from sklearn.ensemble import RandomForestRegressor  # Random Forests の回帰器



# Check core SDK version number
print("SDK version:", azureml.core.VERSION)



SDK version: 1.49.0


In [18]:
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

script_folder = './hyper_data'
os.makedirs(script_folder, exist_ok=True)

experiment_name = 'hyper_drive_exp'
exp=Experiment(ws, experiment_name)

run = exp.start_logging()

quick-starts-ws-237827
aml-quickstarts-237827
westeurope
3e42d11f-d64d-4173-af9b-12ecaa1030b3


In [19]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# Create the cluster
amlcompute_cluster_name = "auto-ml"

compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
    # for GPU, use "STANDARD_NC6"
    #vm_priority = 'lowpriority', # optional
    max_nodes=4)
compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

#compute_target.wait_for_completion(show_output=True, min_node_count = 1, timeout_in_minutes = 10)
# For a more detailed view of current AmlCompute status, use get_status().

InProgress.
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded.....................................................................................................................
AmlCompute wait for completion finished

Wait timeout has been reached
Current provisioning state of AmlCompute is "Succeeded" and current node count is "0"


## Dataset

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [20]:
import pandas as pd
# Create AML Dataset and register it into Workspace
key='car evaluation data set'
data = 'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'

df = pd.read_csv(data)
columns = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
df.columns = columns

# Convert the DataFrame to a TabularDataset
dataset = Dataset.Tabular.register_pandas_dataframe(
    dataframe=df, 
    target=(ws.get_default_datastore(), key), 
    name=key, 
    description='car evaluation data set')

df.describe()

Validating arguments.
Arguments validated.
Successfully obtained datastore reference and path.
Uploading file to car evaluation data set/cf8cb2d8-45fe-4f6a-b9ab-d08cf7081dd4/
Successfully uploaded file to datastore.
Creating and registering a new dataset.
Successfully created and registered a new dataset.


Unnamed: 0,buying,maint,doors,persons,lug_boot,safety,class
count,1727,1727,1727,1727,1727,1727,1727
unique,4,4,4,3,3,3,4
top,high,high,3,4,med,high,unacc
freq,432,432,432,576,576,576,1209


## Hyperdrive Configuration

TODO: Explain the model you are using and the reason for chosing the different hyperparameters, termination policy and config settings.

model:RamdomForests

    Random Forests is an ensemble learning method that combines multiple decision trees. It can exhibit strong classification performance on datasets that include categorical data. By combining multiple decision trees, it helps mitigate overfitting and improves generalization performance.
    
hyperparameters:

    n_estimators: the model's complexity and expressive power
    
    min_samples_split: the minimum number of samples required for a split node. affect to the model's generalization performance.
    
    min_samples_leaf: the minimum number of samples required for a leaf node.
    
    
tarmination policy:

    slack_factor; triggers early termination if the performance of the current run is more than 15% worse than the best performing run.
    
    evaluation_interval;The progress is evaluated at each to make decisions for early termination.
    
    delay_evaluation; to avoid the possibility of the early termination policy reaching the termination condition before the first evaluation.
    
config setting:
    

In [21]:
# TODO: Create an early termination policy. This is not required if you are using Bayesian sampling.
#https://learn.microsoft.com/ja-jp/azure/machine-learning/how-to-tune-hyperparameters?view=azureml-api-1&preserve-view=true
early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)

#TODO: Create the different params that you will be using during training
#https://learn.microsoft.com/ja-jp/azure/machine-learning/how-to-tune-hyperparameters?view=azureml-api-1&preserve-view=true

param_sampling = RandomParameterSampling({
    "--n_estimators": choice(100, 500, 1000),
    "--min_samples_split": choice(2, 10, 20),
    "--min_samples_leaf": choice(1, 5, 10),
})

#TODO: Create your estimator and hyperdrive config
env = Environment.from_conda_specification(
    name='my_environment',
    file_path=script_folder+'/environment.yml'
)
#src = ScriptRunConfig(
#    source_directory="./",
#    script="train.py",
#    compute_target="auto-ml",
#    environment=env
#)
#estimator = src.get_estimator(environment=env)
from azureml.train.estimator import Estimator

estimator = Estimator(
    source_directory=script_folder+"/",
    script_params={},
    compute_target="auto-ml",
    entry_script="train.py",
    #environment=env
)

# Set the environment on the run configuration
estimator.run_config.environment = env

#hyperdrive_run_config = <your config here>
#https://learn.microsoft.com/ja-jp/azure/machine-learning/how-to-tune-hyperparameters?view=azureml-api-1&preserve-view=true
hyperdrive_config = HyperDriveConfig(
    estimator=estimator,
    #run_config=src,
    hyperparameter_sampling=param_sampling,
    policy=early_termination_policy,
    primary_metric_name="AUC_weighted",
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=100,
    max_concurrent_runs=4)



In [22]:
#TODO: Submit your experiment
hyperdrive_run = exp.submit(hyperdrive_config, show_output=True)




In [23]:
hyperdrive_run.wait_for_completion(show_output=True)


RunId: HD_b79e7e88-51ae-40ba-a5e3-3c7f03be75c8
Web View: https://ml.azure.com/runs/HD_b79e7e88-51ae-40ba-a5e3-3c7f03be75c8?wsid=/subscriptions/3e42d11f-d64d-4173-af9b-12ecaa1030b3/resourcegroups/aml-quickstarts-237827/workspaces/quick-starts-ws-237827&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

Streaming azureml-logs/hyperdrive.txt

[2023-06-28T04:36:25.462351][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space
[2023-06-28T04:36:25.9005599Z][SCHEDULER][INFO]Scheduling job, id='HD_b79e7e88-51ae-40ba-a5e3-3c7f03be75c8_0' 
[2023-06-28T04:36:26.1134776Z][SCHEDULER][INFO]Scheduling job, id='HD_b79e7e88-51ae-40ba-a5e3-3c7f03be75c8_1' 
[2023-06-28T04:36:26.2267978Z][SCHEDULER][INFO]Scheduling job, id='HD_b79e7e88-51ae-40ba-a5e3-3c7f03be75c8_2' 
[2023-06-28T04:36:26.3375497Z][SCHEDULER][INFO]Scheduling job, id='HD_b79e7e88-51ae-40ba-a5e3-3c7f03be75c8_3' 
[2023-06-28T04:36:26.302184][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution t

{'runId': 'HD_b79e7e88-51ae-40ba-a5e3-3c7f03be75c8',
 'target': 'auto-ml',
 'status': 'Completed',
 'startTimeUtc': '2023-06-28T04:36:24.877161Z',
 'endTimeUtc': '2023-06-28T04:49:25.836677Z',
 'services': {},
 'properties': {'primary_metric_config': '{"name":"AUC_weighted","goal":"maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '007911b2-5df1-4e08-b429-673a78458576',
  'user_agent': 'python/3.8.5 (Linux-5.15.0-1035-azure-x86_64-with-glibc2.10) msrest/0.7.1 Hyperdrive.Service/1.0.0 Hyperdrive.SDK/core.1.49.0',
  'space_size': '27'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'configuration': None,
  'attribution': None,
  'telemetryValues': {'amlClientType': 'azureml-sdk-train',
   'amlClientModule': '[Scrubbed]',
   'amlClientFunction': '[Scrubbed]',
   'tenantId': '660b3398-b80e-49d2-bc5b-ac1dc93b5254',
   'amlClientRequestId': '92715ac9-2324-40c1-9d30-694b1e58ca

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [24]:
RunDetails(hyperdrive_run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

## Best Model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [25]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
print("Best Run ID: ", best_run.id)
print("AUC_weighted: ", best_run_metrics["AUC_weighted"])

AttributeError: 'NoneType' object has no attribute 'get_metrics'

In [None]:
#TODO: Save the best model
model = best_run.register_model(model_name='hyperdrive_model',
                                model_path=script_folder+'/hyperdrive_model.joblib')

## Model Deployment

Remember you have to deploy only one of the two models you trained but you still need to register both the models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shutdown all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.

