
# Azure Machine Learning Pipeline with HyperDriveStep



## Azure Machine Learning and Pipeline SDK-specific imports


In [5]:
import os
import shutil
import urllib.request  # urlretrieve lives in the urllib.request submodule
import azureml.core
from azureml.core import Workspace, Experiment
from azureml.core.datastore import Datastore
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.exceptions import ComputeTargetException
from azureml.data.data_reference import DataReference
from azureml.pipeline.steps import HyperDriveStep
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.train.dnn import TensorFlow
from azureml.train.hyperdrive import (RandomParameterSampling, BanditPolicy, HyperDriveConfig,
                                      PrimaryMetricGoal, choice, loguniform)

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.0.57


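## Initialize the workspace

Initialize a workspace object from persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure the configuration file is present at `.\config.json`.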
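
For reference, the configuration file is a small JSON document. The following is a minimal sketch of writing one by hand, assuming the three keys that `ws.write_config()` stores (`subscription_id`, `resource_group`, `workspace_name`). The helper name `write_workspace_config` is hypothetical, the values are placeholders, and the file is written to a temporary folder purely for illustration.

```python
import json
import os
import tempfile

def write_workspace_config(folder, subscription_id, resource_group, workspace_name):
    """Write a config.json holding the workspace details (hypothetical helper)."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "config.json")
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return path

# Placeholder values; replace them with your own workspace details.
config_path = write_workspace_config(
    tempfile.mkdtemp(), "yoursubscription_id", "test20191105", "test1106ws")

# Read the file back to confirm what was stored.
with open(config_path) as f:
    loaded_config = json.load(f)
print(loaded_config)
```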

In [6]:
import os

subscription_id = "yoursubscription_id"
resource_group = "test20191105"
workspace_name = "test1106ws"
workspace_region = "eastus2"
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.core import Workspace

try:
    ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)
    # write the details of the workspace to a configuration file to the notebook library
    ws.write_config()
   
    print("Workspace configuration succeeded. Skip the workspace creation steps below")
except Exception:
    print("Workspace not accessible. Change your parameters or create a new workspace below")



Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code BSVP4RQGE to authenticate.
Interactive authentication successfully completed.
Workspace not accessible. Change your parameters or create a new workspace below


In [None]:
auth = InteractiveLoginAuthentication(tenant_id = 'yourtenant_id')
ws = Workspace.from_config(auth = auth)
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

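## Create an Azure ML experiment
Create an experiment named "tf-mnist" and a folder to hold the training scripts. Script runs are recorded under this experiment in Azure.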


In [9]:
script_folder = './tf-mnist'
os.makedirs(script_folder, exist_ok=True)

exp = Experiment(workspace=ws, name='tf-mnist')

## Download the MNIST dataset
To train on the MNIST dataset, first download it directly from Yann LeCun's website and save it in a local "data" folder.

In [10]:
os.makedirs('./data/mnist', exist_ok=True)

urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')

('./data/mnist/test-labels.gz', <http.client.HTTPMessage at 0x7f2971900358>)

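## Upload the MNIST dataset to the Blob datastore
A datastore is a place where data can be stored and then made accessible to a Run, either by mounting it on the compute target or by copying it there. In the next step, we use Azure Blob storage: we upload the training and test sets to the Azure Blob datastore, which is then mounted on the Batch AI cluster for training.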

In [11]:
ds = ws.get_default_datastore()
ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)

Uploading an estimated of 4 files
Uploading ./data/mnist/test-images.gz
Uploading ./data/mnist/test-labels.gz
Uploading ./data/mnist/train-images.gz
Uploading ./data/mnist/train-labels.gz
Uploaded ./data/mnist/train-labels.gz, 1 files out of an estimated total of 4
Uploaded ./data/mnist/test-labels.gz, 2 files out of an estimated total of 4
Uploaded ./data/mnist/test-images.gz, 3 files out of an estimated total of 4
Uploaded ./data/mnist/train-images.gz, 4 files out of an estimated total of 4
Uploaded 4 files


$AZUREML_DATAREFERENCE_5673bdf244224f82a9304441c6f83d90

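## Retrieve or create an Azure Machine Learning Compute
Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines that run machine learning workloads. Let's look up the compute cluster named `cpucluster` in the current workspace. We will then run the training script on this compute target.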

In [23]:
#compute_target = ws.get_default_compute_target("C")
cpu_cluster_name = "cpucluster"

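# Reuse the cluster if it already exists; otherwise create it
try:
    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print("Found existing cpucluster")
except ComputeTargetException:
    print("Creating new cpucluster")
    # Assumed VM size and node count; adjust them to your subscription's quota
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2", max_nodes=4)
    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True)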

Found existing cpucluster


## Copy the training files into the script folder

In [24]:
# the training logic is in the tf_mnist.py file.
shutil.copy('./tf_mnist.py', script_folder)

# the utils.py just helps loading data from the downloaded MNIST dataset into numpy arrays.
shutil.copy('./utils.py', script_folder)

'./tf-mnist/utils.py'
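
The contents of `utils.py` are not reproduced in this notebook. As a rough illustration of what such a loader has to do, the sketch below parses the gzip-compressed IDX format used by the MNIST files. The function name `read_idx` is hypothetical, and the input is a tiny synthetic file built in memory rather than the real dataset.

```python
import gzip
import io
import struct

def read_idx(fileobj):
    """Parse a gzip-compressed IDX file into (shape, raw bytes).

    Illustrative sketch only; this is not the notebook's actual utils.py.
    """
    with gzip.open(fileobj, "rb") as f:
        # Header: two zero bytes, a dtype code (0x08 = unsigned byte),
        # and the number of dimensions.
        zero, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or dtype_code != 0x08:
            raise ValueError("not an unsigned-byte IDX file")
        # Each dimension size is a big-endian 32-bit unsigned integer.
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = f.read()
    return shape, data

# Build a synthetic IDX file in memory: 2 "images" of 2x2 pixels.
raw = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2) + bytes(range(8))
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(raw)
buf.seek(0)

shape, data = read_idx(buf)
print(shape, list(data))
```

With numpy available, the returned bytes could be turned into an array with `np.frombuffer(data, dtype=np.uint8).reshape(shape)`.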

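## Create a TensorFlow estimator
Construct an `azureml.train.dnn.TensorFlow` estimator object, use the Batch AI cluster as the compute target, and pass the mount point of the datastore to the training code as a parameter.
The TensorFlow estimator provides a simple way to launch a TensorFlow training job on a compute target. It automatically provides a Docker image with TensorFlow installed. If additional pip or conda packages are required, pass their names through the `pip_packages` and `conda_packages` arguments and they will be included in the resulting Docker image.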

In [25]:
est = TensorFlow(source_directory=script_folder,                 
                 compute_target=compute_target,
                 entry_script='tf_mnist.py', 
                 use_gpu=False)



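## Intelligent hyperparameter tuning
The estimator above trains the model with a single set of hyperparameters. Now let's see how to tune hyperparameters by launching multiple runs on the cluster. First, let's define the parameter space using random sampling.

In this example, we use random sampling to try different sets of hyperparameter configurations in order to maximize our primary metric, the best validation accuracy (`validation_acc`).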

In [26]:
ps = RandomParameterSampling(
    {
        '--batch-size': choice(25, 50, 100),
        '--first-layer-neurons': choice(10, 50, 200, 300, 500),
        '--second-layer-neurons': choice(10, 50, 200, 500),
        '--learning-rate': loguniform(-6, -1)
    }
)
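
To make the search space concrete, here is a plain-Python sketch of what drawing configurations from it looks like. It assumes `choice` picks uniformly from the listed values and `loguniform(-6, -1)` returns `exp` of a uniform draw from [-6, -1], so learning rates fall roughly between 0.0025 and 0.37. The names `CHOICES`, `LOG_LR_RANGE`, and `sample_config` are illustrative, and this is not the HyperDrive implementation.

```python
import math
import random

# Mirror of the search space above, expressed with plain Python values.
CHOICES = {
    "--batch-size": [25, 50, 100],
    "--first-layer-neurons": [10, 50, 200, 300, 500],
    "--second-layer-neurons": [10, 50, 200, 500],
}
# loguniform(-6, -1): exp of a uniform draw from [-6, -1]
LOG_LR_RANGE = (-6, -1)

def sample_config(rng):
    """Draw one hyperparameter configuration from the space (illustrative)."""
    config = {name: rng.choice(values) for name, values in CHOICES.items()}
    config["--learning-rate"] = math.exp(rng.uniform(*LOG_LR_RANGE))
    return config

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_config(rng) for _ in range(3)]
for sample in samples:
    print(sample)
```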

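### Define an early termination policy
The `BanditPolicy` below checks the jobs every 2 iterations (`evaluation_interval=2`). With `slack_factor=0.1` and a goal of maximizing the primary metric (defined later), Azure ML cancels any run whose best metric value is less than the best run's value divided by 1.1. This spares us from continuing to explore hyperparameters that show no promise of reaching the target metric.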

In [27]:
early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)
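
The slack-factor arithmetic can be sketched in a few lines. This assumes a maximization goal and covers only the cutoff rule; the evaluation interval and any delayed evaluation are left out. The function names are hypothetical.

```python
def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric value a run may report without being cancelled (maximize goal)."""
    return best_metric / (1 + slack_factor)

def should_terminate(run_metric, best_metric, slack_factor=0.1):
    """True if the run falls below the cutoff derived from the best run so far."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)

best_so_far = 0.95
cutoff = bandit_cutoff(best_so_far, 0.1)
print(round(cutoff, 4))
print(should_terminate(0.80, best_so_far))
print(should_terminate(0.90, best_so_far))
```

With a best validation accuracy of 0.95, the cutoff is about 0.864, so a run at 0.80 would be cancelled while a run at 0.90 would continue.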

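Configure a run configuration object and specify the primary metric `validation_acc`, which is logged during the training run. If you go back to the training script, you will notice that this value is logged after every epoch (a full pass over all batches).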

In [29]:
hd_config = HyperDriveConfig(estimator=est, 
                             hyperparameter_sampling=ps,
                             policy=early_termination_policy,
                             primary_metric_name='validation_acc', 
                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, 
                             max_total_runs=1,
                             max_concurrent_runs=1)

## Add HyperDrive as a step of the pipeline

Set up a data reference for the inputs of the hyperdrive step.

In [30]:
data_folder = DataReference(
    datastore=ds,
    data_reference_name="mnist_data")

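### HyperDriveStep
HyperDriveStep can be used as a pipeline step to run a HyperDrive job.
- name: name of the step
- hyperdrive_config: a HyperDriveConfig that defines the configuration for this HyperDrive run
- estimator_entry_script_arguments: list of command-line arguments for the estimator entry script
- inputs: list of input port bindings
- outputs: list of output port bindings
- metrics_output: optional value specifying the location to store the HyperDrive run metrics as a JSON file
- allow_reuse: whether to allow reuse
- version: version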
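
On the receiving end, the entry script has to accept both the fixed arguments from `estimator_entry_script_arguments` and the hyperparameters sampled for each run. `tf_mnist.py` is not shown here, so the sketch below only assumes it parses its command line with `argparse`. The default values and the sample argument list are hypothetical.

```python
import argparse

def parse_args(argv=None):
    """Parse the arguments passed to the entry script (hypothetical defaults)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-folder", type=str, help="mount point of the MNIST data")
    parser.add_argument("--batch-size", type=int, default=50)
    parser.add_argument("--first-layer-neurons", type=int, default=100)
    parser.add_argument("--second-layer-neurons", type=int, default=100)
    parser.add_argument("--learning-rate", type=float, default=0.01)
    return parser.parse_args(argv)

# Example command line, as one sampled configuration might look.
args = parse_args([
    "--data-folder", "/mnt/mnist",
    "--batch-size", "25",
    "--first-layer-neurons", "300",
    "--learning-rate", "0.005",
])
print(args.data_folder, args.batch_size, args.first_layer_neurons,
      args.second_layer_neurons, args.learning_rate)
```

The argument names match the keys used in the parameter space, which is what lets each sampled value reach the script unchanged.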


In [31]:
metrics_output_name = 'metrics_output'
metrics_data = PipelineData(name='metrics_data',
                            datastore=ds,
                            pipeline_output_name=metrics_output_name)

hd_step = HyperDriveStep(
    name="hyperdrive_module",
    hyperdrive_config=hd_config,
    estimator_entry_script_arguments=['--data-folder', data_folder],
    inputs=[data_folder],
    metrics_output=metrics_data)

### Run the pipeline

In [32]:
pipeline = Pipeline(workspace=ws, steps=[hd_step])
pipeline_run = Experiment(ws, 'Hyperdrive_Test').submit(pipeline)

Created step hyperdrive_module [ef6a87db][8220afe4-5eb4-4edc-8e0e-f35278004015], (This step is eligible to reuse a previous run's output)
Using data reference mnist_data for StepId [8d91f553][6b6e635c-45c9-4f24-a7ec-3afba465fd32], (Consumers of this data are eligible to reuse prior runs.)
Submitted pipeline run: 0669da49-2c05-4eae-93b2-507cdb282df6


### Wait for this pipeline run to complete

In [34]:
pipeline_run.wait_for_completion()

PipelineRunId: 0669da49-2c05-4eae-93b2-507cdb282df6
Link to Portal: https://mlworkspace.azure.ai/portal/subscriptions/f01533c9-b5ce-48c8-8ff4-9f472eb56574/resourceGroups/test20191105/providers/Microsoft.MachineLearningServices/workspaces/test1106ws/experiments/Hyperdrive_Test/runs/0669da49-2c05-4eae-93b2-507cdb282df6

PipelineRun Execution Summary
PipelineRun Status: Finished
{'runId': '0669da49-2c05-4eae-93b2-507cdb282df6', 'status': 'Completed', 'startTimeUtc': '2019-11-07T04:46:10.020454Z', 'endTimeUtc': '2019-11-07T04:56:34.35419Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': None, 'runType': 'HTTP', 'azureml.parameters': '{}'}, 'logFiles': {'logs/azureml/executionlogs.txt': 'https://test1106ws4305527614.blob.core.windows.net/azureml/ExperimentRun/dcid.0669da49-2c05-4eae-93b2-507cdb282df6/logs/azureml/executionlogs.txt?sv=2019-02-02&sr=b&sig=Ha3nnCxoASldfFi8p3OnQ1yKxU7BExveB0XXOf48B5c%3D&st=2019-11-07T04%3A52%3A12Z&se=2019-11-07T13%3A02%3A12Z&sp=r', 'log

'Finished'

### Retrieve the metrics
The outputs of the run above can be used as inputs to other steps in the pipeline. In this tutorial, we only display the resulting metrics.

In [35]:
metrics_output = pipeline_run.get_pipeline_output(metrics_output_name)
num_file_downloaded = metrics_output.download('.', show_progress=True)

Downloading azureml/ad8dd174-a73d-4004-b519-6b6c0c988211/metrics_data
Downloaded azureml/ad8dd174-a73d-4004-b519-6b6c0c988211/metrics_data, 1 files out of an estimated total of 1


In [36]:
import pandas as pd
import json
with open(metrics_output._path_on_datastore) as f:  
    metrics_output_result = f.read()
    
deserialized_metrics_output = json.loads(metrics_output_result)
df = pd.DataFrame(deserialized_metrics_output)
df

| | Hyperdrive_Test_1573101975911402_0 |
|---|---|
| final_acc | [0.9742000102996826] |
| training_acc | [0.9900000095367432, 0.9800000190734863, 1, 0.... |
| validation_acc | [0.9463000297546387, 0.9560999870300293, 0.966... |