# Hyperparameter tuning pipeline example

In this example, we'll build a pipeline for hyperparameter tuning. The pipeline tries multiple hyperparameter configurations and then registers the best model.

**Note:** This example requires that you've run the notebook from the first tutorial, so that the dataset and compute cluster are already set up.

In [1]:
import os
import azureml.core
from azureml.core import Workspace, Experiment, Dataset, RunConfiguration
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep, HyperDriveStep, HyperDriveStepRun
from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal
from azureml.train.hyperdrive import choice, loguniform, uniform
from azureml.core import ScriptRunConfig

print("Azure ML SDK version:", azureml.core.VERSION)

Azure ML SDK version: 1.28.0


First, we will connect to the workspace. The command `Workspace.from_config()` will either:
* Read the local `config.json` with the workspace reference (if it exists), or
* Use the `az` CLI to connect to the workspace that was attached to the current folder via `az ml folder attach -g <resource group> -w <workspace name>`
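`Workspace.from_config()` starts its search in the current directory and walks up through the parent directories, also looking inside `.azureml` subfolders such as the one `az ml folder attach` creates. The snippet below is a simplified plain-Python approximation of that lookup, not the SDK's code: `find_workspace_config` and `load_workspace_config` are hypothetical helpers, and the SDK's exact search order and any extra locations it checks may differ. The three keys read are the ones a workspace `config.json` contains.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def find_workspace_config(start: Path) -> Optional[Path]:
    """Walk from `start` up to the filesystem root and return the first
    config.json found directly in a directory or in its .azureml/ subfolder.
    (Simplified approximation; the real SDK search may differ.)"""
    start = start.resolve()
    for directory in [start, *start.parents]:
        for candidate in (directory / "config.json",
                          directory / ".azureml" / "config.json"):
            if candidate.is_file():
                return candidate
    return None


def load_workspace_config(path: Path) -> Dict[str, str]:
    """Read the fields that identify a workspace from a config.json file."""
    with path.open() as f:
        data = json.load(f)
    return {key: data[key]
            for key in ("subscription_id", "resource_group", "workspace_name")}
```

Because the search walks upward, a notebook in a nested subfolder of an attached project still finds the project's workspace configuration.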

In [2]:
ws = Workspace.from_config()
print(f'WS name: {ws.name}\nRegion: {ws.location}\nSubscription id: {ws.subscription_id}\nResource group: {ws.resource_group}')

WS name: demo-ent-ws
Region: westeurope
Subscription id: bcbf34a7-1936-4783-8840-8f324c37f354
Resource group: demo


# Preparation

Let's reference the dataset from the first tutorial:

In [3]:
training_dataset = Dataset.get_by_name(ws, "german-credit-train-tutorial")
training_dataset_consumption = DatasetConsumptionConfig("training_dataset", training_dataset).as_download()
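With `.as_download()`, the dataset files are downloaded onto the compute target at run time and their local path is handed to the training script as a command-line argument. In the HyperDrive step defined below, that path arrives as `--data-path`, and HyperDrive adds each sampled hyperparameter (here `--c`) as a further argument. The training script itself is not shown in this example, so the argument parsing below is a hypothetical sketch of how such a script could read both values; the meaning of `--c` depends on that script.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the arguments a (hypothetical) training entry script receives."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry script")
    # Local path of the downloaded dataset, passed via estimator_entry_script_arguments.
    parser.add_argument("--data-path", type=Path, required=True,
                        help="Local path where the dataset was downloaded")
    # Hyperparameter value sampled by HyperDrive for this child run.
    parser.add_argument("--c", type=float, required=True,
                        help="Hyperparameter value sampled by HyperDrive")
    return parser.parse_args(argv)
```

Whatever the real script does, it must log the primary metric under exactly the name used in `HyperDriveConfig` (`Test accuracy`), for example with `Run.get_context().log('Test accuracy', value)`; otherwise HyperDrive has nothing to compare.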

Here, we define the parameter sampling (the search space of hyperparameter values we want to try) and the early termination policy (which stops poorly performing runs early). We then combine them in a `HyperDriveConfig` and execute it in a `HyperDriveStep`. Lastly, we add a short step that registers the best model.
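`RandomParameterSampling` with `uniform(0.1, 1.9)` draws a value for `--c` uniformly between 0.1 and 1.9 for each child run, and that value is passed to the training script as a command-line argument. The plain-Python illustration below mimics this for four runs (matching `max_total_runs=4`); it is not HyperDrive's sampler, and `sample_configs` and `to_cli_args` are hypothetical helpers.

```python
import random
from typing import Dict, List


def sample_configs(n_runs: int, low: float = 0.1, high: float = 1.9,
                   seed: int = 0) -> List[Dict[str, float]]:
    """Draw n_runs independent values of --c from a uniform distribution on [low, high]."""
    rng = random.Random(seed)
    return [{"--c": rng.uniform(low, high)} for _ in range(n_runs)]


def to_cli_args(config: Dict[str, float]) -> List[str]:
    """Flatten a sampled configuration into command-line arguments."""
    args: List[str] = []
    for name, value in config.items():
        args.extend([name, str(value)])
    return args


for cfg in sample_configs(n_runs=4):  # one configuration per child run
    print(to_cli_args(cfg))
```

Random sampling suits a continuous range like this one; with only four runs it explores the range coarsely, so `max_total_runs` is the main knob for search quality versus cost.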

In [6]:
runconfig = RunConfiguration.load("runconfig.yml")
script_run_config = ScriptRunConfig(source_directory="./",
                                    run_config=runconfig)
script_run_config.data_references = None

ps = RandomParameterSampling(
    {
        '--c': uniform(0.1, 1.9)
    }
)

# BanditPolicy terminates any run whose primary metric is not within the allowed slack
# of the best-performing run. The slack is specified either as:
#   - slack_factor: a ratio relative to the best-performing run, or
#   - slack_amount: an absolute distance from the best-performing run.
# This configuration uses slack_factor=0.1.
#
# The policy is applied every 'evaluation_interval' intervals; each time the training
# script logs the primary metric counts as one interval.
early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)

hd_config = HyperDriveConfig(run_config=script_run_config, # <-- it contains the reference to the training script
                             hyperparameter_sampling=ps,
                             policy=early_termination_policy,
                             primary_metric_name='Test accuracy', 
                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, 
                             max_total_runs=4,
                             max_concurrent_runs=1)

hd_step = HyperDriveStep(name='hyperparameter-tuning',
                         hyperdrive_config=hd_config,
                         estimator_entry_script_arguments=['--data-path', training_dataset_consumption],
                         inputs=[training_dataset_consumption],
                         outputs=None)

register_step = PythonScriptStep(script_name='register.py',
                                 runconfig=runconfig,
                                 name="register-model",
                                 compute_target="cluster",
                                 arguments=['--model_name', 'best_model'],
                                 allow_reuse=False)

# Explicitly state that registration runs after training, because there is no direct dependency through inputs/outputs
register_step.run_after(hd_step)

steps = [hd_step, register_step]
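The `BanditPolicy` configured above reduces to a threshold check. Following the rule described in the Azure ML documentation, for a maximized metric a run is stopped when its metric falls below `best / (1 + slack_factor)` (or `best - slack_amount`), and the check only happens every `evaluation_interval` intervals. The sketch below is a plain-Python illustration of that rule, not HyperDrive's implementation: it ignores `delay_evaluation`, and it assumes each run's latest reported metric is what gets compared. `bandit_threshold` and `runs_to_terminate` are hypothetical helpers.

```python
from typing import Dict, List, Optional


def bandit_threshold(best: float, slack_factor: Optional[float] = None,
                     slack_amount: Optional[float] = None) -> float:
    """Lowest metric a run may report and survive, for a maximization goal."""
    if (slack_factor is None) == (slack_amount is None):
        raise ValueError("specify exactly one of slack_factor or slack_amount")
    if slack_factor is not None:
        return best / (1 + slack_factor)
    return best - slack_amount


def runs_to_terminate(metrics: Dict[str, float], interval: int,
                      evaluation_interval: int, slack_factor: float) -> List[str]:
    """Runs a Bandit-style check would stop at this interval (illustrative only)."""
    if interval % evaluation_interval != 0:
        return []  # the policy is only evaluated every `evaluation_interval` intervals
    best = max(metrics.values())
    threshold = bandit_threshold(best, slack_factor=slack_factor)
    return sorted(run for run, value in metrics.items() if value < threshold)
```

With `slack_factor=0.1` and a best run at 0.80, the threshold is 0.80 / 1.1, roughly 0.727, so a run reporting 0.70 is stopped while one at 0.75 continues.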

Next, we create the pipeline object and validate it. Validation checks that the inputs and outputs are properly linked and that the pipeline graph is acyclic:

In [7]:
pipeline = Pipeline(workspace=ws, steps=steps, description="HyperDrive Pipeline")
pipeline.validate()

Step hyperparameter-tuning is ready to be created [3dffb913]
Step register-model is ready to be created [dc3415ce]


[]
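The empty list returned by `pipeline.validate()` means no problems were found. Because `register_step` and `hd_step` share no inputs or outputs, the only edge in this graph is the one added by `register_step.run_after(hd_step)`. The sketch below illustrates what acyclicity and execution order mean for such a graph using Kahn's topological sort; it is not how the Azure ML SDK implements validation, and `execution_order` is a hypothetical helper.

```python
from collections import deque
from typing import Dict, List, Set


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return the steps in a valid execution order (Kahn's algorithm).

    `deps` maps each step name to the set of steps it must run after.
    Raises ValueError if the dependency graph contains a cycle."""
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {n: set(deps.get(n, ())) for n in nodes}
    ready = deque(sorted(n for n, ds in remaining.items() if not ds))
    order: List[str] = []
    while ready:
        step = ready.popleft()
        order.append(step)
        # Release every step that was only waiting on `step`.
        for other in sorted(remaining):
            if step in remaining[other]:
                remaining[other].discard(step)
                if not remaining[other]:
                    ready.append(other)
    if len(order) != len(nodes):
        raise ValueError("pipeline graph contains a cycle")
    return order


print(execution_order({"hyperparameter-tuning": set(),
                       "register-model": {"hyperparameter-tuning"}}))
```

Without the `run_after` edge, both steps start out ready, so nothing would stop the registration step from running before tuning has finished.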

Now we can submit the pipeline to an experiment:

In [8]:
pipeline_run = Experiment(ws, 'mlops-workshop-pipelines-20210524').submit(pipeline)

Created step hyperparameter-tuning [3dffb913][f0c46008-95c3-4a60-8b2d-88fea1a8209c], (This step will run and generate new outputs)
Created step register-model [dc3415ce][16803518-3427-4765-b5b6-68a80aed09fe], (This step will run and generate new outputs)
Submitted PipelineRun 5e0672cf-874e-421e-a5d5-2ea8370a5bb8
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/5e0672cf-874e-421e-a5d5-2ea8370a5bb8?wsid=/subscriptions/bcbf34a7-1936-4783-8840-8f324c37f354/resourcegroups/demo/workspaces/demo-ent-ws&tid=1f053027-5c7a-4f10-8444-ca55e5715f27


In [9]:
from azureml.widgets import RunDetails
RunDetails(pipeline_run).show()


In [10]:
pipeline_run.wait_for_completion()

PipelineRunId: 5e0672cf-874e-421e-a5d5-2ea8370a5bb8
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/5e0672cf-874e-421e-a5d5-2ea8370a5bb8?wsid=/subscriptions/bcbf34a7-1936-4783-8840-8f324c37f354/resourcegroups/demo/workspaces/demo-ent-ws&tid=1f053027-5c7a-4f10-8444-ca55e5715f27

PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': '5e0672cf-874e-421e-a5d5-2ea8370a5bb8', 'status': 'Completed', 'startTimeUtc': '2021-05-25T10:33:16.109613Z', 'endTimeUtc': '2021-05-25T10:44:31.706553Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': 'SDK', 'runType': 'SDK', 'azureml.parameters': '{}'}, 'inputDatasets': [], 'outputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://demoentws5367325393.blob.core.windows.net/azureml/ExperimentRun/dcid.5e0672cf-874e-421e-a5d5-2ea8370a5bb8/logs/azureml/executionlogs.txt?sv=2019-02-02&sr=b&sig=HclgJLClJgP9RKo7VUqLI7ohsqrBW%2BsXMAcQaOUWn2w%3D&st=2021-05-25T10%3A23%3A23Z&se=2021-05-25T18%3

'Finished'
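`register.py` is not shown in this example. Conceptually, its job is to compare the primary metric across the HyperDrive child runs and keep the best one according to `PrimaryMetricGoal.MAXIMIZE`; in the SDK, `HyperDriveRun.get_best_run_by_primary_metric()` provides this lookup. The sketch below shows only the selection logic on plain dictionaries of logged metrics; fetching the metrics from Azure ML and registering the model are left out, and `best_run` is a hypothetical helper.

```python
from typing import Dict, Optional


def best_run(metrics_by_run: Dict[str, Dict[str, float]],
             metric: str = "Test accuracy",
             maximize: bool = True) -> Optional[str]:
    """Return the run id with the best value of `metric`.

    Runs that never logged the metric (e.g. failed or early-terminated before
    logging) are skipped; returns None if no run logged it."""
    scored = {run_id: m[metric] for run_id, m in metrics_by_run.items() if metric in m}
    if not scored:
        return None
    pick = max if maximize else min
    return pick(scored, key=scored.get)
```

Skipping runs without the metric matters because runs stopped by the early termination policy may never have reported a final value.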