# Hyperparameter Tuning using HyperDrive

Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
import azureml.core
from azureml.core import Workspace, Experiment
from azureml.widgets import RunDetails
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice
from azureml.core import Environment
from azureml.core import ScriptRunConfig
import os

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.27.0


## Initialize Workspace

In [3]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

quick-starts-ws-144943
aml-quickstarts-144943
southcentralus
3d1a56d2-7c81-4118-9790-f85d1acf0c77


## Create an Azure ML experiment

In [4]:
experiment_name = 'loan-prediction-h'
project_folder = './loan-prediction-h-project'

experiment=Experiment(ws, experiment_name)
experiment

| Name | Workspace | Report Page | Docs Page |
|---|---|---|---|
| loan-prediction-h | quick-starts-ws-144943 | Link to Azure Machine Learning studio | Link to Documentation |


## Create a compute cluster

In [6]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# NOTE: update the cluster name to match the existing cluster
# Choose a name for your CPU cluster
amlcompute_cluster_name = "cpu-cluster-h"

# Reuse the cluster if it already exists; otherwise provision a new one
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_DS12_v2',# for GPU, use "STANDARD_NC6"
                                                           #vm_priority = 'lowpriority', # optional
                                                           max_nodes=10)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True, min_node_count = 1, timeout_in_minutes = 10)

Found existing cluster, use it.


## Dataset
In this project, we use a [loan prediction problem dataset](https://www.kaggle.com/altruistdelhite04/loan-prediction-problem-dataset) from Kaggle.
The dataset contains 11 features and the target column **Loan_Status**.

The dataset is retrieved inside **train.py**.

## Hyperdrive Configuration

In this project, we use [Scikit-learn Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) as the classification algorithm.

We tune two hyperparameters: the inverse of regularization strength (**C**; smaller values mean stronger regularization) and the maximum number of solver iterations allowed to converge (**max_iter**).
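HyperDrive passes each sampled value to the training script as a command-line argument. **train.py** is not shown in this notebook, so the sketch below is hypothetical: it assumes the script reads the flags `--C` and `--max_iter` with `argparse`, which is consistent with the best run's argument list being read at positions 1 and 3 further down. The defaults match scikit-learn's own `LogisticRegression` defaults (`C=1.0`, `max_iter=100`).

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the hyperparameters that HyperDrive appends to the command line.

    The flag names --C and --max_iter are assumptions about train.py.
    """
    parser = argparse.ArgumentParser(description="Logistic regression hyperparameters")
    # Inverse of regularization strength: smaller values mean stronger regularization.
    parser.add_argument("--C", type=float, default=1.0)
    # Maximum number of iterations the solver may take to converge.
    parser.add_argument("--max_iter", type=int, default=100)
    return parser.parse_args(argv)


# Example: the argument values reported for the best run in this notebook.
args = parse_hyperparameters(["--C", "7.5", "--max_iter", "100"])
print(args.C, args.max_iter)  # 7.5 100
```

The parsed values would then be passed on as `LogisticRegression(C=args.C, max_iter=args.max_iter)`.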

For parameter sampling, we use [Random Parameter Sampling](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.randomparametersampling?view=azure-ml-py). Random sampling supports early termination of low-performance runs, which saves training time and compute cost; this makes it well suited to an initial search.
Here, **C** is sampled from 6 discrete values and **max_iter** from 5 discrete values.
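Because both hyperparameters are sampled with `choice`, the search space is a finite grid of 6 × 5 = 30 combinations, so at most 30 distinct configurations exist even though `max_total_runs` is set to 1000. The sketch below enumerates that grid and draws a batch of distinct configurations uniformly at random. It only illustrates the idea; it is not how HyperDrive's sampler is implemented, and how HyperDrive behaves once the grid is exhausted is not shown here.

```python
import itertools
import random

# Discrete values copied from the RandomParameterSampling cell in this notebook.
SEARCH_SPACE = {
    "C": [0.01, 0.1, 1, 5, 7.5, 10],
    "max_iter": [10, 25, 50, 75, 100],
}


def enumerate_space(space):
    """Return every combination of the discrete values as a list of dicts."""
    names = list(space)
    return [dict(zip(names, values))
            for values in itertools.product(*(space[name] for name in names))]


def sample_configs(space, n, seed=None):
    """Draw up to n distinct configurations uniformly at random from the grid."""
    grid = enumerate_space(space)
    rng = random.Random(seed)
    return rng.sample(grid, min(n, len(grid)))


grid = enumerate_space(SEARCH_SPACE)
print(len(grid))  # 30
batch = sample_configs(SEARCH_SPACE, 10, seed=0)  # e.g. one batch of 10 concurrent runs
```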

For early termination, we use the [Bandit Policy](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.banditpolicy?view=azure-ml-py).
This policy ends runs whose primary metric is not within the specified slack factor or slack amount of the best-performing run.
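For a maximized metric, the Azure ML documentation describes the rule as follows: a run is terminated if its best metric at an evaluation point is below `best / (1 + slack_factor)`, where `best` is the best metric among all runs at that point; with `slack_amount`, the threshold is `best - slack_amount`. The function below is a minimal sketch of that rule only. It ignores `evaluation_interval` and `delay_evaluation`, which control when the rule is applied and are counted in primary-metric reports; with `evaluation_interval=100` and `delay_evaluation=200`, a script that logs **Accuracy** once per run would likely never be evaluated by the policy.

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor=None, slack_amount=None):
    """Return True if a run falls outside the Bandit slack of the best run.

    Assumes the primary metric goal is MAXIMIZE. Exactly one of slack_factor
    or slack_amount must be given.
    """
    if (slack_factor is None) == (slack_amount is None):
        raise ValueError("Specify exactly one of slack_factor or slack_amount")
    if slack_factor is not None:
        threshold = best_metric / (1 + slack_factor)  # ratio-based slack
    else:
        threshold = best_metric - slack_amount  # absolute slack
    return run_metric < threshold


# With slack_factor=0.2 and a best accuracy of 0.875, the cut-off is 0.875 / 1.2 ≈ 0.729.
print(bandit_should_terminate(0.70, 0.875, slack_factor=0.2))  # True
print(bandit_should_terminate(0.75, 0.875, slack_factor=0.2))  # False
```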

In the HyperDrive configuration, we set **Accuracy** as the primary metric (the same metric as in the AutoML project) and the primary metric goal to **PrimaryMetricGoal.MAXIMIZE**, so HyperDrive looks for the run with the highest accuracy.

We also set the following two parameters to limit the number of runs:
- **max_total_runs**: 1000 (The maximum total number of runs to create)
- **max_concurrent_runs**: 10 (The maximum number of runs to execute concurrently)

Reference:
[HyperDriveConfig Class](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.hyperdriveconfig?view=azure-ml-py)

In [7]:
# Create an early termination policy. This is not required when using Bayesian sampling.
# https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters#early-termination
# https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.banditpolicy?view=azure-ml-py#definition
# evaluation_interval and delay_evaluation are counted in primary-metric reports logged by train.py.
early_termination_policy = BanditPolicy(evaluation_interval=100, delay_evaluation=200, slack_factor=0.2)

# Define the hyperparameter search space sampled during training.
# https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters#random-sampling
# https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.randomparametersampling?view=azure-ml-py
# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
# https://towardsdatascience.com/dont-sweat-the-solver-stuff-aea7cddc3451
# https://www.kaggle.com/joparga3/2-tuning-parameters-for-logistic-regression
# Alternative, wider search space (not used):
#param_sampling = RandomParameterSampling({
#    "C": choice(0.1, 0.5, 1, 1.5, 2.5, 5, 7.5, 10),
#    "max_iter": choice(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
#})
param_sampling = RandomParameterSampling({
    "C": choice(0.01, 0.1, 1, 5, 7.5, 10),
    "max_iter": choice(10, 25, 50, 75, 100)
})

# Configure the training run: train.py on the compute cluster, in the conda environment.
# ScriptRunConfig replaces the deprecated SKLearn estimator documented at:
# https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.sklearn.sklearn?view=azure-ml-py
sklearn_env = Environment.from_conda_specification(name='sklearn-env', file_path='./conda_dependencies.yml')
src = ScriptRunConfig(source_directory='.',
                      script='./train.py',
                      compute_target=compute_target,
                      environment=sklearn_env)

# Create a HyperDriveConfig using the run configuration, hyperparameter sampler, and policy.
# https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.hyperdriveconfig?view=azure-ml-py
hd_config = HyperDriveConfig(run_config=src,
                             hyperparameter_sampling=param_sampling,
                             policy=early_termination_policy,
                             primary_metric_name='Accuracy',
                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                             max_total_runs=1000,
                             max_concurrent_runs=10)

In [8]:
# Submit the HyperDrive experiment
hd_run = experiment.submit(config=hd_config)

## Run Details

In the cell below, use the `RunDetails` widget to monitor the child runs of the HyperDrive experiment.

In [9]:
RunDetails(hd_run).show()


In [10]:
hd_run.wait_for_completion(show_output=True)

RunId: HD_849bbb66-1586-49df-a3d0-1fe01790910c
Web View: https://ml.azure.com/runs/HD_849bbb66-1586-49df-a3d0-1fe01790910c?wsid=/subscriptions/3d1a56d2-7c81-4118-9790-f85d1acf0c77/resourcegroups/aml-quickstarts-144943/workspaces/quick-starts-ws-144943&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

Streaming azureml-logs/hyperdrive.txt

"<START>[2021-05-16T10:15:27.142406][API][INFO]Experiment created<END>\n""<START>[2021-05-16T10:15:27.856299][GENERATOR][INFO]Trying to sample '10' jobs from the hyperparameter space<END>\n""<START>[2021-05-16T10:15:28.131403][GENERATOR][INFO]Successfully sampled '10' jobs, they will soon be submitted to the execution target.<END>\n"

Execution Summary
RunId: HD_849bbb66-1586-49df-a3d0-1fe01790910c
Web View: https://ml.azure.com/runs/HD_849bbb66-1586-49df-a3d0-1fe01790910c?wsid=/subscriptions/3d1a56d2-7c81-4118-9790-f85d1acf0c77/resourcegroups/aml-quickstarts-144943/workspaces/quick-starts-ws-144943&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254



{'runId': 'HD_849bbb66-1586-49df-a3d0-1fe01790910c',
 'target': 'cpu-cluster-h',
 'status': 'Completed',
 'startTimeUtc': '2021-05-16T10:15:26.885155Z',
 'endTimeUtc': '2021-05-16T10:29:32.596263Z',
 'properties': {'primary_metric_config': '{"name": "Accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '2c8a3486-1a69-4b20-b894-978b7e521f60',
  'score': '0.875',
  'best_child_run_id': 'HD_849bbb66-1586-49df-a3d0-1fe01790910c_0',
  'best_metric_status': 'Succeeded'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://mlstrg144943.blob.core.windows.net/azureml/ExperimentRun/dcid.HD_849bbb66-1586-49df-a3d0-1fe01790910c/azureml-logs/hyperdrive.txt?sv=2019-02-02&sr=b&sig=V4zotKxN5p8umrWdVHlxgnS%2FcrYM6aK5FOQcb8cJxUc%3D&st=2021-05-16T10%3A19%3A38Z&se=2021-05-16T18%3A29%3A38Z&sp=r'},
 'submittedBy': 'ODL_User 144943'}

## Best Model

In the cell below, get the best run from the HyperDrive experiment and display its accuracy and hyperparameter values.

In [11]:
import joblib  # used later to load the downloaded model file
# Get the best run and print its accuracy and hyperparameter values.
# https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters#find-the-best-model
# https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn?view=azure-ml-py#save-and-register-the-model

best_run = hd_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
# The argument list alternates flags and values (flag, C, flag, max_iter),
# so the hyperparameter values sit at indices 1 and 3.
arguments = best_run.get_details()['runDefinition']['arguments']
print('Best Run Id: ', best_run.id)
print('Accuracy: ', best_run_metrics['Accuracy'])
print('C: ', arguments[1])
print('max_iter: ', arguments[3])

Best Run Id:  HD_849bbb66-1586-49df-a3d0-1fe01790910c_0
Accuracy:  0.875
C:  7.5
max_iter:  100
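Indexing `arguments[1]` and `arguments[3]` relies on the flags appearing in a fixed order. A slightly more robust sketch pairs each flag with the value that follows it, assuming the list has the form `['--C', '7.5', '--max_iter', '100']`. The values may come back as strings, so they are converted before reuse.

```python
def arguments_to_dict(arguments):
    """Map a run's ['--flag', value, ...] argument list to {'flag': value}."""
    if len(arguments) % 2 != 0:
        raise ValueError("Expected alternating flag/value entries")
    flags, values = arguments[::2], arguments[1::2]
    return {flag.lstrip("-"): value for flag, value in zip(flags, values)}


params = arguments_to_dict(["--C", "7.5", "--max_iter", "100"])
print(params)  # {'C': '7.5', 'max_iter': '100'}
best_C = float(params["C"])
best_max_iter = int(params["max_iter"])
```

In the notebook, this would be called as `arguments_to_dict(best_run.get_details()['runDefinition']['arguments'])`.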


### Register the best model

In [12]:
# Register the best model
model = best_run.register_model(model_name='loan-prediction-hd-model',
                               model_path='./outputs/model.joblib',
                               tags={'Method':'Hyperdrive'},
                                description='Hyperdrive Model trained on loan prediction data to predict a loan status of customers',
                               properties={'Accuracy':best_run_metrics['Accuracy']})

In [13]:
# Save the model in the local project folder
best_run.download_file('outputs/model.joblib', project_folder + '/outputs/model.joblib')

## Model Deployment

Remember that you only need to deploy one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

In the cells below, load the registered model, create an inference config, and deploy the model as a web service.

In [5]:
# This is needed only when resuming the project
from azureml.core import Model
model = Model(ws, 'loan-prediction-hd-model')

#### Prepare a service environment

In [15]:
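from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
import sklearn


service_env = Environment('my-sklearn-environment')
# NOTE: this pins the scikit-learn version of the notebook kernel. It should match
# the version train.py used to train the model (see conda_dependencies.yml);
# otherwise unpickling model.joblib may warn or fail.
service_env.python.conda_dependencies = CondaDependencies.create(pip_packages=[
    'azureml-defaults',
    'inference-schema[numpy-support]',
    'joblib',
    'numpy',
    'scikit-learn=={}'.format(sklearn.__version__)
])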

In [18]:
# Save the environment definition to the local project folder
service_env.save_to_directory(project_folder + '/service_env')

#### Deploy the best model

In [16]:
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice
from azureml.core import Model

service_name = 'loan-prediction-hd-service'

inference_config = InferenceConfig(entry_script=project_folder + '/score.py', environment=service_env)
aci_config = AciWebservice.deploy_configuration(cpu_cores=1,
                                                memory_gb=1,
                                                enable_app_insights=True,
                                                description="Loan status prediction service")

service = Model.deploy(workspace=ws,
                       name=service_name,
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=aci_config,
                       overwrite=True)
service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-05-16 11:28:11+00:00 Creating Container Registry if not exists.
2021-05-16 11:28:11+00:00 Registering the environment.
2021-05-16 11:28:12+00:00 Building image..
2021-05-16 11:33:26+00:00 Generating deployment configuration.
2021-05-16 11:33:27+00:00 Submitting deployment to compute..
2021-05-16 11:33:31+00:00 Checking the status of deployment loan-prediction-hd-service..
2021-05-16 11:37:32+00:00 Checking the status of inference endpoint loan-prediction-hd-service.
Succeeded
ACI service creation operation finished, operation "Succeeded"


### Test the API by using test dataset
In the cell below, send a request to the web service you deployed to test it.

#### Prepare test data

In [23]:
from azureml.data.dataset_factory import TabularDatasetFactory

web_path = ['https://raw.githubusercontent.com/fnakashima/nd00333-capstone/master/starter_file/dataset/train_u6lujuX_CVtuZ9i.csv']
ds = TabularDatasetFactory.from_delimited_files(path=web_path)


In [24]:
from train import clean_data #import from local file
from sklearn.model_selection import train_test_split
import pandas as pd

x, y = clean_data(ds)
x


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


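| | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 0 | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | 3 |
| 2 | 1 | 1 | 0 | 1 | 1 | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | 1 |
| 3 | 1 | 1 | 0 | 0 | 0 | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | 1 |
| 4 | 1 | 0 | 0 | 1 | 0 | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | 1 |
| 5 | 1 | 1 | 2 | 1 | 1 | 5417 | 4196.0 | 267.0 | 360.0 | 1.0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 609 | 2 | 0 | 0 | 1 | 0 | 2900 | 0.0 | 71.0 | 360.0 | 1.0 | 3 |
| 610 | 1 | 1 | 3 | 1 | 0 | 4106 | 0.0 | 40.0 | 180.0 | 1.0 | 3 |
| 611 | 1 | 1 | 1 | 1 | 0 | 8072 | 240.0 | 253.0 | 360.0 | 1.0 | 1 |
| 612 | 1 | 1 | 2 | 1 | 0 | 7583 | 0.0 | 187.0 | 360.0 | 1.0 | 1 |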


In [25]:
# Split data into train and test sets.
# https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
# Default test_size: 0.25
x_train, x_test, y_train, y_test = train_test_split(x, y)
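With a float `test_size`, scikit-learn rounds the test-set size up and gives the remainder to the training set. The helper below reproduces that arithmetic for the default `test_size=0.25`. Note also that `train_test_split` shuffles by default and no `random_state` is passed in the cell above, so the rows in `x_test[0:3]` used below change each time the cell is rerun; passing, for example, `random_state=42` would make the split reproducible.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a float test_size, rounding the test size up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


print(split_sizes(480))  # (360, 120)
print(split_sizes(10))   # (7, 3)
```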

#### Run the service with test data

In [32]:
import json

input_payload = json.dumps({
    'data': x_test[0:3].values.tolist()
})

output = service.run(input_payload)

print(output)

{"result": [1, 0, 1]}
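`service.run()` wraps an HTTP POST to the service's scoring endpoint, so other clients can call the endpoint directly. The sketch below builds the same JSON request with Python's standard library. It assumes key-based authentication is disabled, which is the default for ACI deployments; if it is enabled, one of the keys returned by `service.get_keys()` can be passed as `api_key`.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows, api_key=None):
    """Build a POST request carrying {"data": rows} as JSON."""
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Usage in the notebook (requires the deployed service to be running):
# req = build_scoring_request(service.scoring_uri, x_test[0:3].values.tolist())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```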


#### Check the result

In [33]:
y_test[0:3].values

array([1, 0, 1])
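Comparing the two arrays by eye works for three rows; for a larger slice, a small helper can compute the fraction of matching predictions. This sketch assumes the service returns the JSON text printed above (`{"result": [...]}`). Three rows are far too few to estimate the model's accuracy.

```python
import json


def sample_accuracy(service_output, expected):
    """Fraction of predictions in a {"result": [...]} response that match expected labels."""
    # service.run() may return the JSON text or an already-decoded dict.
    parsed = json.loads(service_output) if isinstance(service_output, str) else service_output
    predicted = parsed["result"]
    if len(predicted) != len(expected):
        raise ValueError("Prediction and label counts differ")
    matches = sum(int(p) == int(e) for p, e in zip(predicted, expected))
    return matches / len(expected)


print(sample_accuracy('{"result": [1, 0, 1]}', [1, 0, 1]))  # 1.0
```

In the notebook, this would be called as `sample_accuracy(output, y_test[0:3].values)`.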

In the cells below, print the logs of the web service and then delete the service.

In [34]:
# Show the logs of the web service
print(service.get_logs())

2021-05-16T11:37:27,462069500+00:00 - iot-server/run 
2021-05-16T11:37:27,467475000+00:00 - rsyslog/run 
2021-05-16T11:37:27,486040200+00:00 - gunicorn/run 
2021-05-16T11:37:27,535022400+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_17d14353fe479000f7ea8024c63d2036/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_17d14353fe479000f7ea8024c63d2036/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_17d14353fe479000f7ea8024c63d2036/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_17d14353fe479000f7ea8024c63d2036/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_17d14353fe479000f7ea8024c63d2036/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
EdgeHubC

In [43]:
# Delete the service
service.delete()

In [45]:
from skl2onnx.common.data_types import Int64TensorType
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import StringTensorType

def convert_dataframe_schema(df, drop=None):
    """Map each DataFrame column to a one-column skl2onnx tensor type and report the counts."""
    inputs = []
    count_int = 0
    count_float = 0
    count_str = 0
    for name, dtype in zip(df.columns, df.dtypes):
        if drop is not None and name in drop:
            continue
        if dtype == 'int64':
            tensor_type = Int64TensorType([None, 1])
            count_int += 1
        elif dtype == 'float64':
            tensor_type = FloatTensorType([None, 1])
            count_float += 1
        else:
            tensor_type = StringTensorType([None, 1])
            count_str += 1
        inputs.append(tensor_type)
    print("int : {}, float : {}, str : {}".format(count_int, count_float, count_str))
    return inputs

In [46]:
inputs = convert_dataframe_schema(x_test)
print(inputs)

int : 7, float : 4, str : 0
[Int64TensorType(shape=[None, 1]), Int64TensorType(shape=[None, 1]), Int64TensorType(shape=[None, 1]), Int64TensorType(shape=[None, 1]), Int64TensorType(shape=[None, 1]), Int64TensorType(shape=[None, 1]), FloatTensorType(shape=[None, 1]), FloatTensorType(shape=[None, 1]), FloatTensorType(shape=[None, 1]), FloatTensorType(shape=[None, 1]), Int64TensorType(shape=[None, 1])]


In [47]:
# Convert into ONNX format
# http://onnx.ai/sklearn-onnx/#:~:text=sklearn%2Donnx%20converts%20models%20in,is%20tested%20with%20this%20backend.
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from onnxmltools.utils import save_model

target_model = joblib.load(project_folder + '/outputs/model.joblib')
initial_type = [
    ('inttype', Int64TensorType([None, 6])),
    ('float_input', FloatTensorType([None, 4])),
    ('inttype2', Int64TensorType([None, 1]))
    ]
onnx = convert_sklearn(target_model, initial_types=initial_type)
#onnx_model = onnxmltools.convert_sklearn(lr_model, initial_types=initial_type)
onnx_model_path = project_folder + "/outputs/model.onnx"
save_model(onnx, onnx_model_path)

Trying to unpickle estimator LogisticRegression from version 0.24.1 when using version 0.22.2.post1. This might lead to breaking code or invalid results. Use at your own risk.


RuntimeError: For operator SklearnLinearClassifier (type: SklearnLinearClassifier), at most 1 input(s) is(are) supported but we got 3 output(s) which are ['inttype', 'float_input', 'inttype2']

In [38]:
onnx_model = Model.register(workspace=ws,
                            model_name='loan-prediction-hd-onnx-model',  # Name of the registered model in your workspace.
                            model_path=onnx_model_path,                  # Local ONNX model to upload and register as a model.
                            model_framework=Model.Framework.ONNX,        # Framework used to create the model.
                            model_framework_version='1.3',               # Version of ONNX used to create the model.
                            description='Loan status prediction ONNX model')

print('Name:', onnx_model.name)

Registering model loan-prediction-hd-onnx-model
Name: loan-prediction-hd-onnx-model


In [40]:
from azureml.core import Webservice
from azureml.exceptions import WebserviceException

onnx_service_name = 'loan-prediction-hd-onnx-service'

service = Model.deploy(ws, onnx_service_name, [onnx_model])
service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-05-16 12:37:12+00:00 Creating Container Registry if not exists.
2021-05-16 12:37:12+00:00 Registering the environment.
2021-05-16 12:37:13+00:00 Use the existing image.
2021-05-16 12:37:14+00:00 Submitting deployment to compute.
2021-05-16 12:37:17+00:00 Checking the status of deployment loan-prediction-hd-onnx-service..
2021-05-16 12:38:11+00:00 Checking the status of inference endpoint loan-prediction-hd-onnx-service.
Succeeded
ACI service creation operation finished, operation "Succeeded"


In [43]:
import json

input_payload = json.dumps({
    'float_input': x_test[0:3].values.tolist(),
    'method': 'predict'
})

output = service.run(input_payload)

print(output)

Received bad response from service. More information can be found by calling `.get_logs()` on the webservice object.
Response Code: 502
Headers: {'Content-Length': '368', 'Content-Type': 'application/json', 'X-Ms-Request-Id': '17773f42-f6ee-441b-9bfa-d8e618ccf75e', 'Date': 'Sun, 16 May 2021 12:55:55 GMT'}
Content: b'{"error_code": 500, "error_message": "ONNX Runtime Status Code: 6. Non-zero status code returned while running LinearClassifier node. Name:\'LinearClassifier\' Status Message: /onnxruntime/include/onnxruntime/core/framework/op_kernel.h:90 const T* onnxruntime::OpKernelContext::Input(int) const [with T = onnxruntime::Tensor] Missing Input: float_input\\nStacktrace:\\n"}\n'



WebserviceException: WebserviceException:
	Message: Received bad response from service. More information can be found by calling `.get_logs()` on the webservice object.
Response Code: 502
Headers: {'Content-Length': '368', 'Content-Type': 'application/json', 'X-Ms-Request-Id': '17773f42-f6ee-441b-9bfa-d8e618ccf75e', 'Date': 'Sun, 16 May 2021 12:55:55 GMT'}
Content: b'{"error_code": 500, "error_message": "ONNX Runtime Status Code: 6. Non-zero status code returned while running LinearClassifier node. Name:\'LinearClassifier\' Status Message: /onnxruntime/include/onnxruntime/core/framework/op_kernel.h:90 const T* onnxruntime::OpKernelContext::Input(int) const [with T = onnxruntime::Tensor] Missing Input: float_input\\nStacktrace:\\n"}\n'
	InnerException None