# Hyperparameter Tuning using HyperDrive

In [22]:
from azureml.core import Workspace, Experiment, Model
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.parameter_expressions import uniform, choice
from azureml.widgets import RunDetails
import requests
import json
from azureml.core.webservice import AciWebservice, LocalWebservice
import sklearn

## Workspace setup

First, we set up the workspace used to work with Azure.

In [2]:
ws = Workspace.from_config()
experiment_name = 'hyperdrive'

experiment = Experiment(ws, experiment_name)

## Dataset

The dataset used is the [UCI Glass Identification](https://archive.ics.uci.edu/ml/datasets/Glass+Identification) dataset. All data importing and preprocessing is done by the [train.py](https://github.com/reis-r/nd00333-capstone/blob/master/train.py) script, which is also the script used by our Hyperdrive run. The objective is to classify the glass type according to its composition and other characteristics. This dataset was chosen because it needs little cleaning and is a well-known dataset for experimenting with machine learning.

## Create a compute cluster

In [3]:
cluster_name = "hyperdrive"
# Check if a compute cluster already exists
try:
    print("Trying to connect to an existing cluster...")
    compute_cluster = ComputeTarget(workspace=ws, name=cluster_name)
except ComputeTargetException:
    print("Creating a compute cluster...")
    compute_configuration = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    compute_cluster = ComputeTarget.create(ws, cluster_name, compute_configuration)
    compute_cluster.wait_for_completion(show_output=True)
print("Success!")

Trying to connect to an existing cluster...
Creating a compute cluster...
Creating
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
Success!


## Hyperdrive Configuration

For the Hyperdrive configuration, BanditPolicy was chosen as the early termination policy. It terminates a run when its accuracy is not within the slack amount of the best-performing run. It's a less conservative policy that might prove sufficient for this experiment.
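To make the termination rule concrete, here is a minimal sketch of how a slack factor can be applied when the goal is to maximize the metric. It follows the rule given in the Azure ML documentation for `BanditPolicy` (a run is cancelled when its metric falls below `best / (1 + slack_factor)`). The function name `should_terminate` and the sample accuracies are illustrative assumptions, not part of the SDK.

```python
def should_terminate(run_metric, best_metric, slack_factor):
    """Return True if a run falls outside the allowed slack (maximize goal)."""
    # Lowest metric value still tolerated relative to the best run so far
    threshold = best_metric / (1 + slack_factor)
    return run_metric < threshold


# Hypothetical accuracies, compared against a best run of 0.95
best = 0.95
for accuracy in (0.90, 0.80):
    print(accuracy, should_terminate(accuracy, best, slack_factor=0.1))
```

With `slack_factor=0.1`, the threshold sits at roughly 91% of the best accuracy, so only runs that lag clearly behind are stopped early.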

The algorithm chosen is SVC, a classification algorithm based on support-vector machines.

The parameter sampler is the random sampler. This method is faster than an exhaustive search, but may not provide the best possible results. The regularization parameter (penalty, `--C`) was configured with uniform sampling, which gives a value uniformly distributed between the minimum and maximum possible values. It's the most basic and safe parameter sampling method for continuous variables.

The choice for the kernel will be random from every value supported by scikit-learn.
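The sampling described above can be illustrated with the standard library alone. This is only a sketch of the idea behind `choice` and `uniform`, not how HyperDrive is implemented. The helper `sample_hyperparameters` and the fixed seed are assumptions made for the example.

```python
import random

KERNELS = ['linear', 'poly', 'rbf', 'sigmoid', 'precomputed']


def sample_hyperparameters(rng):
    """Draw one configuration: a discrete kernel and a continuous C."""
    return {
        "--kernel": rng.choice(KERNELS),   # every kernel is equally likely
        "--C": rng.uniform(0.1, 1.0),      # uniformly distributed in [0.1, 1.0]
    }


rng = random.Random(0)  # seeded only to make the sketch repeatable
samples = [sample_hyperparameters(rng) for _ in range(10)]
for sample in samples:
    print(sample)
```

Each of the ten draws is independent, which is why random sampling can miss the best region of the search space that an exhaustive search would cover.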

In [4]:
# Create an early termination policy
early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)

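# Create the different params that will be used during training
# Note: 'precomputed' expects a square kernel matrix as input, so runs that
# sample it are expected to fail unless train.py computes one.
param_sampling = RandomParameterSampling({
    "--kernel": choice(['linear', 'poly', 'rbf', 'sigmoid', 'precomputed']),
    "--C": uniform(0.1, 1.0)
    })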

# Create estimator and hyperdrive config
estimator = SKLearn(source_directory="./",
                    entry_script="train.py",
                    compute_target=compute_cluster)

hyperdrive_run_config = HyperDriveConfig(estimator=estimator,
                                         hyperparameter_sampling=param_sampling, 
                                         primary_metric_name='Accuracy',
                                         primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                         max_total_runs=10,
                                         policy=early_termination_policy,
                                         max_concurrent_runs=2)

'SKLearn' estimator is deprecated. Please use 'ScriptRunConfig' from 'azureml.core.script_run_config' with your own defined environment or the AzureML-Tutorial curated environment.


In [5]:
# Submit the experiment
run = experiment.submit(hyperdrive_run_config)



## Run Details

In [6]:
RunDetails(run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

We obtained an accuracy of 100%. While this usually means the model overfit the training dataset, it should be taken into account that this is a simple, classic machine learning problem. It also suggests that an advanced feature such as Hyperdrive was probably overkill here.

## Best Model

In [8]:
best_run = run.get_best_run_by_primary_metric()
best_run

| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| hyperdrive | HD_77b4c635-fba8-483e-81ca-62a996783d27_2 | azureml.scriptrun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


In [9]:
print(best_run.get_details()['runDefinition']['arguments'])
print('Run properties:')
print(best_run.get_properties())
print('Best metrics:')
print(best_run.get_metrics())

['--C', '0.5240323756618018', '--kernel', 'poly']
Run properties:
{'_azureml.ComputeTargetType': 'amlcompute', 'ContentSnapshotId': '1ad5d282-f1c1-4df5-a1c6-f0377b8ddf09', 'ProcessInfoFile': 'azureml-logs/process_info.json', 'ProcessStatusFile': 'azureml-logs/process_status.json'}
Best metrics:
{'Kernel type': 'poly', 'Regularization parameter': 0.5240323756618018, 'Accuracy': 1.0}


In [10]:
import os

# Make sure the local output directory exists
os.makedirs("outputs", exist_ok=True)

# Save the best model
model_filename = "outputs/model.joblib"
best_run.download_file(model_filename, output_file_path=model_filename)
print("Best model saved.")

Best model saved.


## Model Deployment

In Azure, deploying a model built with one of the supported frameworks is very simple: first we register the model in our workspace, then we deploy it with the `Model.deploy()` method.

In [29]:
# Register the best model
model = best_run.register_model(model_name="glass-prediction", 
                                model_path="outputs/model.joblib",
                                model_framework=Model.Framework.SCIKITLEARN,
                                model_framework_version=sklearn.__version__,
                                description="Glass type prediction model based on UCI data.")
print("Name:", model.name)
print("Version:", model.version)

Name: glass-prediction
Version: 7


In [30]:
# Deploy the model as a web service
service = Model.deploy(ws, "glass-prediction-sklearn", [model], overwrite=True)
service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running....................
Succeeded
ACI service creation operation finished, operation "Succeeded"


Now, we enable Application Insights to get detailed logs from the service and troubleshoot any problems that might occur.

In [31]:
service.update(enable_app_insights=True)

Now that the model is deployed, we can send web requests to it using the HTTP API. We can also get quick predictions using the method `service.run()`.
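As a rough sketch of the HTTP route, the request can be assembled with the standard library. The payload shape (`{"data": [...]}`) mirrors the one used in this notebook. The URI and feature values below are hypothetical placeholders; in practice the URI would come from `service.scoring_uri`. The helper name `build_scoring_request` is an assumption for illustration.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows):
    """Build a JSON POST request for a scoring endpoint without sending it."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and a single made-up row with nine feature values
example_uri = "http://example.invalid/score"
example_rows = [[1.52, 13.6, 4.5, 1.1, 71.8, 0.06, 8.75, 0.0, 0.0]]

request = build_scoring_request(example_uri, example_rows)
print(request.get_method(), request.full_url)
# To actually send it against a live service:
# response = urllib.request.urlopen(request).read()
```

The same body can be posted with `requests.post(uri, data=body, headers=...)`; `urllib` is used here only to keep the sketch dependency-free.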

In [38]:
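# dataset_x is assumed to hold the feature matrix as a NumPy array;
# it is not defined in this notebook.
payload = json.dumps({
    'data': dataset_x[0:2].tolist()
})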

print(service.run(payload))

ERROR:azureml.core.webservice.aci:Received bad response from service. More information can be found by calling `.get_logs()` on the webservice object.
Response Code: 400
Headers: {'Connection': 'keep-alive', 'Content-Length': '77', 'Content-Type': 'application/json', 'Date': 'Fri, 29 Jan 2021 01:15:34 GMT', 'Server': 'nginx/1.10.3 (Ubuntu)', 'X-Ms-Request-Id': 'a87fd729-4e8f-43e7-94f6-74087d267a92', 'X-Ms-Run-Function-Failed': 'False'}
Content: b'{"status_code": 400, "message": "\'SVC\' object has no attribute \'break_ties\'"}'



WebserviceException: WebserviceException:
	Message: Received bad response from service. More information can be found by calling `.get_logs()` on the webservice object.
Response Code: 400
Headers: {'Connection': 'keep-alive', 'Content-Length': '77', 'Content-Type': 'application/json', 'Date': 'Fri, 29 Jan 2021 01:15:34 GMT', 'Server': 'nginx/1.10.3 (Ubuntu)', 'X-Ms-Request-Id': 'a87fd729-4e8f-43e7-94f6-74087d267a92', 'X-Ms-Run-Function-Failed': 'False'}
Content: b'{"status_code": 400, "message": "\'SVC\' object has no attribute \'break_ties\'"}'
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Received bad response from service. More information can be found by calling `.get_logs()` on the webservice object.\nResponse Code: 400\nHeaders: {'Connection': 'keep-alive', 'Content-Length': '77', 'Content-Type': 'application/json', 'Date': 'Fri, 29 Jan 2021 01:15:34 GMT', 'Server': 'nginx/1.10.3 (Ubuntu)', 'X-Ms-Request-Id': 'a87fd729-4e8f-43e7-94f6-74087d267a92', 'X-Ms-Run-Function-Failed': 'False'}\nContent: b'{\"status_code\": 400, \"message\": \"\\'SVC\\' object has no attribute \\'break_ties\\'\"}'"
    }
}

AttributeError: 'AciWebservice' object has no attribute 'getlogs'
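
The 400 response above is consistent with a scikit-learn version mismatch: the `break_ties` parameter was added to `SVC` in scikit-learn 0.22, so a model pickled by an older release lacks that attribute when a newer release loads it. This is an assumption about the cause, since neither the training nor the serving environment is shown here. A minimal sketch of a pre-deployment check follows, with hypothetical version strings and hypothetical helper names:

```python
def minor_version(version):
    """Return (major, minor) from a version string such as '0.24.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def versions_compatible(trained_with, served_with):
    """Treat two releases as compatible only if major and minor match."""
    return minor_version(trained_with) == minor_version(served_with)


# Hypothetical versions: one for the training run, one for the service
print(versions_compatible("0.20.3", "0.24.1"))
```

The second error shown above comes from a misspelled method name: the logs are retrieved with `service.get_logs()`, as the error message itself suggests.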


In [7]:
import azureml.core
print(azureml.core.VERSION)

1.20.0
