# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [25]:
import ames # The module for loading external data - Ames Housing dataset
import os
import pandas as pd
import json
import ast
import pickle

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core import Workspace, Dataset, Experiment, Model, Environment, ScriptRunConfig
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.widgets import RunDetails


In [2]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

quick-starts-ws-153813
aml-quickstarts-153813
southcentralus
3d1a56d2-7c81-4118-9790-f85d1acf0c77


In [3]:
# Create compute cluster
# Choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster"

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Dataset

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [4]:
# Try to load the dataset from the workspace. Otherwise, load it from Kaggle
found = False
ds_key = 'Ames-housing-dataset'
ds_desc = 'Ames Housing training data.'

if ds_key in ws.datasets.keys():
    found = True
    dataset = ws.datasets[ds_key]
    print(f'Found registered {ds_key}, use it.')
    
if not found:
    train, test = ames.load_data_clean()
    print(f"train.shape = {train.shape}, test.shape = {test.shape}")
    # Register the train dataset
    blob = ws.get_default_datastore()
    dataset = TabularDatasetFactory.register_pandas_dataframe(train, blob, name=ds_key, description=ds_desc)

Found registered Ames-housing-dataset, use it.


In [54]:
%%writefile train_xgb.py
"""Train, evaluate and log metrics for selected ML algorithm 
in the Azure workspace context."""

import argparse
import os
import numpy as np
import pandas as pd
import joblib
import ames

from azureml.core.run import Run
from azureml.core import Workspace

from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score


ws = Workspace.from_config()
ds_key = 'Ames-housing-dataset'
dataset = ws.datasets[ds_key]

train = dataset.to_pandas_dataframe()

X_train, X_test = train_test_split(ames.label_encode(ames.encode_dtypes(train)))
y_train = X_train.pop('SalePrice')
y_test = X_test.pop('SalePrice')

print(f"X_train.shape = {X_train.shape}, X_test.shape = {X_test.shape}")

run = Run.get_context()

parser = argparse.ArgumentParser()

parser.add_argument('--learning_rate', type=float, default=0.1,
                   help='Step size shrinkage used in update to prevent overfitting')

parser.add_argument('--gamma', type=float, default=2,
                   help='Minimum loss reduction required to make a further partition on a leaf node of the tree')

parser.add_argument('--max_depth', type=int, default=3,
                   help='Maximum depth of a tree')

args = parser.parse_args()
# Use the built-in float: the np.float alias was removed in NumPy 1.24
run.log('Learning rate', float(args.learning_rate))
run.log('Gamma', float(args.gamma))
run.log('Maximum depth', float(args.max_depth))

model = XGBRegressor(learning_rate=args.learning_rate, gamma=args.gamma, max_depth=args.max_depth, objective='reg:squarederror')
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)

# Log the primary metric; built-in float replaces the removed np.float alias
run.log("r2_score", float(r2))
print(f'Writing r2 score = {r2} into a log.')

os.makedirs('./outputs', exist_ok=True)
joblib.dump(model, './outputs/model.joblib')

Overwriting train_xgb.py
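
The script above logs `r2_score` as the metric to compare runs. As a reminder of what that number means, here is a minimal sketch of the coefficient of determination, R² = 1 - SS_res / SS_tot, in plain Python. The helper name `r2_manual` and the toy values are illustrative assumptions, not part of the project; the script itself relies on `sklearn.metrics.r2_score`.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only (not taken from the Ames data)
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

score = r2_manual(y_true, y_pred)
print(round(score, 4))
```

A score of 1.0 means perfect predictions, and 0.0 means the model does no better than predicting the mean of `y_true`.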


In [50]:
! python train_xgb.py

X_train.shape = (1095, 79), X_test.shape = (365, 79)
Attempted to log scalar metric Learning rate:
0.1
Attempted to log scalar metric Gamma:
2.0
Attempted to log scalar metric Maximum depth:
3.0
Attempted to log scalar metric r2_score:
0.9165099581562921
Writting r2 score = 0.9165099581562921 into a log.


In [45]:
%%writefile conda_env.yml

dependencies:
- python=3.6.2
- pip
- scikit-learn
- xgboost
- pip:
  - azureml-defaults==1.32.0

Overwriting conda_env.yml


In [55]:
# Define an Azure ML environment
# Dependencies are the same as for the AutoML experiment
env = Environment.from_conda_specification(name='env', file_path='conda_env.yml')

# Configure the training job
src = ScriptRunConfig(source_directory=".",
                     script='train_xgb.py',
                     arguments=['--learning_rate', 0.01, '--gamma', 5, '--max_depth', 5],
                     compute_target=cpu_cluster,
                     environment=env)
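
The `arguments` list reaches the training script as command-line strings, which `argparse` converts back to the declared types. The sketch below reproduces only the parser from `train_xgb.py` to show that round trip; the helper name `build_parser` is an assumption made for this example, and the explicit `str()` conversion stands in for what happens when the command line is built.

```python
import argparse


def build_parser():
    """Parser with the same three options as train_xgb.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--learning_rate', type=float, default=0.1)
    parser.add_argument('--gamma', type=float, default=2)
    parser.add_argument('--max_depth', type=int, default=3)
    return parser


# Values are passed on the command line as strings
cli_args = [str(a) for a in
            ['--learning_rate', 0.01, '--gamma', 5, '--max_depth', 5]]

args = build_parser().parse_args(cli_args)
print(args.learning_rate, args.gamma, args.max_depth)
```

Because `--max_depth` is declared with `type=int`, any value sent for it has to be a string that parses as an integer.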

## Hyperdrive Configuration

TODO: Explain the model you are using and the reason for choosing the different hyperparameters, termination policy and config settings.

In [56]:
# Choose a name for an experiment
experiment_name = 'Ames-housing-hdr'

experiment = Experiment(ws, experiment_name)

In [57]:
# Test the script
run = experiment.submit(src)

In [None]:
# TODO: Create an early termination policy. This is not required if you are using Bayesian sampling.
early_termination_policy = <your policy here>

#TODO: Create the different params that you will be using during training
param_sampling = <your params here>

#TODO: Create your estimator and hyperdrive config
estimator = <your estimator here>

hyperdrive_run_config = <your config here>
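
The placeholders above are left unfilled in this notebook. To illustrate the two ideas they refer to, here is a plain-Python sketch, with no Azure calls, of random sampling over a search space and of a bandit-style early termination check for a metric that is maximized. The ranges, the names `SEARCH_SPACE`, `sample_params` and `bandit_should_stop`, and the slack factor are all assumptions chosen for illustration, not the settings of this project. In the SDK, these roles would be filled by classes such as `RandomParameterSampling` and `BanditPolicy` from `azureml.train.hyperdrive`.

```python
import random

# Hypothetical search space over the three options train_xgb.py accepts
SEARCH_SPACE = {
    'learning_rate': ('uniform', 0.01, 0.3),
    'gamma': ('uniform', 0.0, 10.0),
    'max_depth': ('choice', [3, 4, 5, 6, 8]),
}


def sample_params(space, rng):
    """Draw one random configuration from the search space."""
    params = {}
    for name, spec in space.items():
        if spec[0] == 'uniform':
            params[name] = rng.uniform(spec[1], spec[2])
        else:  # 'choice': pick one of the listed values
            params[name] = rng.choice(spec[1])
    return params


def bandit_should_stop(run_metric, best_metric, slack_factor):
    """Bandit-style rule for a maximized metric: stop a run whose metric
    falls below best_metric / (1 + slack_factor)."""
    return run_metric < best_metric / (1.0 + slack_factor)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
trials = [sample_params(SEARCH_SPACE, rng) for _ in range(4)]
for trial in trials:
    print(trial)

# With a best score of 0.80 and slack 0.2 the cutoff is 0.80 / 1.2
print(bandit_should_stop(0.60, 0.80, 0.2))
print(bandit_should_stop(0.70, 0.80, 0.2))
```

Note that `train_xgb.py` logs `r2_score` only once, at the end of training, so an early termination rule like this one would have little to act on unless the script reported intermediate scores.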

In [None]:
#TODO: Submit your experiment

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [58]:
RunDetails(run).show()
run.wait_for_completion(show_output=True)

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

RunId: Ames-housing-hdr_1628257240_405a3e2a
Web View: https://ml.azure.com/runs/Ames-housing-hdr_1628257240_405a3e2a?wsid=/subscriptions/3d1a56d2-7c81-4118-9790-f85d1acf0c77/resourcegroups/aml-quickstarts-153813/workspaces/quick-starts-ws-153813&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

Streaming azureml-logs/55_azureml-execution-tvmps_3cfbfa5418d554cc015e702d766426ff9357468d96655e0d092db89d30f96195_d.txt

2021-08-06T13:40:53Z Successfully mounted a/an Blobfuse File System at /mnt/batch/tasks/shared/LS_root/jobs/quick-starts-ws-153813/azureml/ames-housing-hdr_1628257240_405a3e2a/mounts/workspaceblobstore
2021-08-06T13:40:54Z The vmsize standard_d2_v2 is not a GPU VM, skipping get GPU count by running nvidia-smi command.
2021-08-06T13:40:54Z Starting output-watcher...
2021-08-06T13:40:54Z IsDedicatedCompute == True, won't poll for Low Pri Preemption
2021-08-06T13:40:54Z Executing 'Copy ACR Details file' on 10.0.0.5
2021-08-06T13:40:54Z Copy ACR Details file succeeded on 10.0.0.5. Output


Streaming azureml-logs/70_driver_log.txt

2021/08/06 13:41:00 Starting App Insight Logger for task:  runTaskLet
2021/08/06 13:41:00 Version: 3.0.01676.0004 Branch: 2021-07-23 Commit: 2766ca7
2021/08/06 13:41:00 Attempt 1 of http call to http://10.0.0.5:16384/sendlogstoartifacts/info
2021/08/06 13:41:00 Send process info logs to master server succeeded
2021/08/06 13:41:00 Attempt 1 of http call to http://10.0.0.5:16384/sendlogstoartifacts/status
2021/08/06 13:41:00 Send process info logs to master server succeeded
[2021-08-06T13:41:00.143560] Entering context manager injector.
[2021-08-06T13:41:00.677274] context_manager_injector.py Command line Options: Namespace(inject=['ProjectPythonPath:context_managers.ProjectPythonPath', 'RunHistory:context_managers.RunHistory', 'TrackUserError:context_managers.TrackUserError'], invocation=['train_xgb.py', '--learning_rate', '0.01', '--gamma', '5', '--max_depth', '5'])
Script type = None
[2021-08-06T13:41:00.682604] Entering Run History Context M

{'runId': 'Ames-housing-hdr_1628257240_405a3e2a',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-08-06T13:40:53.328481Z',
 'endTimeUtc': '2021-08-06T13:45:31.683389Z',
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': '050999db-ac8c-4eb4-a6a2-f557f701e232',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json',
  'azureml.RuntimeType': 'Hosttools'},
 'inputDatasets': [{'dataset': {'id': '9dcb2684-b597-40f4-a301-50963b0ee143'}, 'consumptionDetails': {'type': 'Reference'}}],
 'outputDatasets': [],
 'runDefinition': {'script': 'train_xgb.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': ['--learning_rate', '0.01', '--gamma', '5', '--max_depth', '5'],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'cpu-cluster',
  'dataReferences': {},
  'data': {},
  'outputData': {},
  'datacaches': [],
  'jobName': None,


## Best Model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [None]:
#TODO: Save the best model
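
This cell is also unfilled. Conceptually, picking the best model means finding the child run with the highest logged `r2_score`. The sketch below shows that selection on made-up records; the run ids, scores, and the helper name `best_run` are hypothetical. With a real HyperDrive run, the SDK method `get_best_run_by_primary_metric()` would perform this lookup.

```python
# Hypothetical child-run records; ids and scores are made up for illustration
child_runs = [
    {'run_id': 'child_0', 'metrics': {'r2_score': 0.88, 'Maximum depth': 3.0}},
    {'run_id': 'child_1', 'metrics': {'r2_score': 0.91, 'Maximum depth': 5.0}},
    {'run_id': 'child_2', 'metrics': {'Maximum depth': 8.0}},  # metric missing
]


def best_run(runs, metric='r2_score'):
    """Return the run with the highest value of `metric`.

    Runs that never logged the metric (for example, failed runs) are skipped.
    """
    scored = [r for r in runs if metric in r['metrics']]
    if not scored:
        raise ValueError(f'no run logged {metric!r}')
    return max(scored, key=lambda r: r['metrics'][metric])


best = best_run(child_runs)
print(best['run_id'], best['metrics']['r2_score'])
```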

## Model Deployment

Remember that you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service.

In [59]:
# delete() deprovisions and deletes the AmlCompute target.
cpu_cluster.delete()

Current provisioning state of AmlCompute is "Deleting"


