# Train HydraNet on the WikiSQL benchmark

## Prerequisites:
- Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)
- Install the Python SDK with the `notebooks` and `contrib` extras, then enable the Jupyter widgets:
    ```
    conda create -n azureml -y python=3.6
    source activate azureml
    pip install --upgrade "azureml-sdk[notebooks,contrib]"
    conda install ipywidgets
    jupyter nbextension install --py --user azureml.widgets
    jupyter nbextension enable azureml.widgets --user --py
    ```
 
Restart Jupyter after running these commands.

See the [detailed installation instructions](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python).

In [1]:
import json

# Import Azure ML libraries
import azureml.core
from azureml.core import (ContainerRegistry, Datastore, Dataset, Environment,
                          Experiment, Run, RunConfiguration, Workspace)
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import MpiConfiguration

from azureml.train.estimator import Estimator
from azureml.train.dnn import PyTorch
from azureml.data.datapath import DataPath, DataPathComputeBinding
from azureml.data.data_reference import DataReference

from azureml.widgets import RunDetails

In [2]:
azureml.core.VERSION

'1.11.0'

In [3]:
# Retrieve your workspace
ws = Workspace.get(name='xiaoyzhu-turingrg',
                   subscription_id='a6c2a7cc-d67e-4a1a-b765-983f08c0423a',
                   resource_group='xiaoyzhu-turingrg')
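
As an alternative to hard-coding the identifiers, the SDK can read them from a `config.json` file via `Workspace.from_config()`. The sketch below writes such a file with the standard library only. The helper name `write_workspace_config` and the placeholder values in the usage comment are illustrative assumptions, not part of the original notebook.

```python
import json
from pathlib import Path


def write_workspace_config(subscription_id, resource_group, workspace_name,
                           directory="."):
    """Write <directory>/.azureml/config.json with the workspace identifiers.

    Hypothetical helper for illustration; it only creates the JSON file and
    does not contact Azure.
    """
    config_dir = Path(directory) / ".azureml"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_path.write_text(json.dumps(config, indent=4))
    return config_path


# Example usage (placeholder values; substitute your own identifiers):
#   write_workspace_config("<subscription-id>", "<resource-group>", "<workspace-name>")
#   ws = Workspace.from_config()
```

Keeping the identifiers in a file lets the same notebook run against different workspaces without editing the code cell.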

In [4]:
# Name of the GPU compute cluster to reuse or create
gpu_cluster_name = "nd40-ssh-2"

# Reuse the cluster if it already exists; otherwise provision it
try:
    gpu_compute_target = ComputeTarget(workspace=ws, name=gpu_cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_ND40rs_v2', min_nodes=0, max_nodes=1)
    
    # create the cluster
    gpu_compute_target = ComputeTarget.create(ws, gpu_cluster_name, compute_config)
    gpu_compute_target.wait_for_completion(show_output=True)

Found existing compute target.


In [5]:
# The output will be stored in this blob container in the following blobs:
# azureml/<Run Id>/output/*

default_ds = ws.get_default_datastore()

print('Workspace name: ' + default_ds.workspace.name,
      'Datastore name: ' + default_ds.name,
      'Datastore type: ' + default_ds.datastore_type,
      'Container name: ' + default_ds.container_name, sep='\n')

Workspace name: xiaoyzhu-turingrg
Datastore name: workspaceblobstore
Datastore type: AzureBlob
Container name: azureml-blobstore-1df38ad3-d561-413a-b8c5-603f11808774


In [8]:
# Arguments passed to the entry script; the ND40rs_v2 node exposes 8 GPUs (indices 0-7)
script_params = {
    '--gpu': '0,1,2,3,4,5,6,7'
}

train_est = PyTorch(source_directory='..', 
                    script_params=script_params,
                    compute_target=gpu_compute_target,
                    entry_script='notebooks/prep_and_train.py',
                    framework_version='1.4',
                    use_gpu=True,
                    pip_requirements_file='requirements.txt')
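
The estimator forwards `script_params` to the entry script as command-line arguments, so the script receives `--gpu 0,1,2,3,4,5,6,7`. The contents of `notebooks/prep_and_train.py` are not shown here, so the following is only a sketch of how such a script might consume the flag. The function names `parse_gpu_argument` and `restrict_visible_gpus` are hypothetical, and the use of `CUDA_VISIBLE_DEVICES` is an assumption about how the script selects devices.

```python
import argparse
import os


def parse_gpu_argument(argv=None):
    """Parse --gpu and return the requested device indices as integers."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu", type=str, default="0",
                        help="Comma-separated GPU indices, e.g. 0,1,2,3")
    # Ignore any other arguments the real script may define.
    args, _unknown = parser.parse_known_args(argv)
    return [int(i) for i in args.gpu.split(",") if i.strip()]


def restrict_visible_gpus(gpu_ids):
    """Expose only the requested GPUs to CUDA-based frameworks.

    This must run before the framework initializes CUDA to take effect.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)


gpu_ids = parse_gpu_argument(["--gpu", "0,1,2,3,4,5,6,7"])
restrict_visible_gpus(gpu_ids)
print(gpu_ids)  # [0, 1, 2, 3, 4, 5, 6, 7]
```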

Submit the training run. On a Standard_ND40rs_v2 VM with 8 V100 GPUs, training takes about 2 hours.

In [None]:
experiment = Experiment(ws, name="HydraNet")
run = experiment.submit(train_est)
RunDetails(run).show()
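
Because the run takes about 2 hours, it can be convenient to block until it finishes and then fetch the results. A minimal sketch follows, using the `Run` methods `wait_for_completion`, `get_status`, and `download_files`. The helper name `wait_and_download`, the local directory name, and the `outputs/` prefix are assumptions; the prefix in particular depends on where the training script writes its files.

```python
def wait_and_download(run, output_directory="hydranet_outputs"):
    """Block until the run finishes and download its outputs if it completed.

    `run` is expected to be an azureml.core.Run, e.g. the object returned by
    experiment.submit(...). Hypothetical helper for illustration.
    """
    # raise_on_error=False so the final status can be inspected here
    # instead of surfacing as an exception.
    run.wait_for_completion(show_output=True, raise_on_error=False)
    status = run.get_status()
    if status == "Completed":
        # Assumption: the training script saves its artifacts under ./outputs.
        run.download_files(prefix="outputs/", output_directory=output_directory)
    return status


# Example usage, after the run has been submitted:
#   status = wait_and_download(run)
```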
