## Training DLWP on Azure with Microsoft Azure Machine Learning service
For a reference on getting started with the Microsoft Azure Machine Learning service, refer to the [Microsoft documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/).

First, let's import the core AzureML Python modules.

In [1]:
import azureml.core
from azureml.core import Workspace
from azureml.core import Experiment

from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget

import os

#### Create or import a workspace
In this example, we assume a workspace already exists, but one can also be created on the fly with `Workspace.create()`. Use environment variables to load sensitive information such as `subscription_id` and authentication credentials rather than hard-coding them in the notebook.

In [4]:
ws = Workspace.get(
    name='dlwp-ml-1',
    subscription_id=os.environ.get('AZURE_SUBSCRIPTION_ID'),
    resource_group='DLWP'
)
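
As a minimal sketch of the environment-variable approach, the helper below fails early with a clear message when a required variable is unset, instead of passing `None` on to `Workspace.get()`. Note that `require_env` is a hypothetical name introduced here for illustration; it is not part of the AzureML SDK.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError('environment variable %s is not set' % name)
    return value


# Hypothetical usage with the workspace call above:
# ws = Workspace.get(name='dlwp-ml-1',
#                    subscription_id=require_env('AZURE_SUBSCRIPTION_ID'),
#                    resource_group='DLWP')
```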

#### Set up the compute cluster
This code, adapted from the Microsoft documentation example, checks whether the compute target already exists in the workspace and creates it if it does not. We use GPU nodes, for which several VM sizes are available:
- STANDARD_NC6: Tesla K80
- STANDARD_NC6s_v2: Tesla P100
- STANDARD_NC6s_v3: Tesla V100
- STANDARD_ND6s: Tesla P40
- STANDARD_NV6: Tesla M60

In [5]:
# Name of the cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "dlwp-compute-1")
# Environment variables are strings, so convert the node counts to integers
compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 2))

# Set a GPU VM type
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "STANDARD_NV6")

if compute_name in ws.compute_targets:
    # Re-use the existing compute target
    compute_target = ws.compute_targets[compute_name]
    if isinstance(compute_target, AmlCompute):
        print('found compute target (%s)' % compute_name)
    else:
        print('warning: compute target (%s) exists but is not an AmlCompute cluster' % compute_name)
else:
    print('creating a new compute target (%s)' % compute_name)
    provisioning_config = AmlCompute.provisioning_configuration(vm_size=vm_size,
                                                                min_nodes=compute_min_nodes,
                                                                max_nodes=compute_max_nodes)

    # Create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)

    # Wait for provisioning to finish. We can poll for a minimum number of nodes and for a
    # specific timeout; if no minimum node count is given, the cluster's scale settings are used.
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=10)

found compute target (dlwp-compute-1)
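
Because `os.environ.get()` returns a string whenever the variable is set, node counts read from the environment need to be converted to integers before they are used as numbers. A more defensive reader might look like the sketch below; `env_int` is a hypothetical helper written for this example, not an AzureML function.

```python
import os


def env_int(name, default):
    """Read an integer from environment variable `name`, falling back to `default`."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == '':
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError('%s must be an integer, got %r' % (name, raw))


compute_min_nodes = env_int('AML_COMPUTE_CLUSTER_MIN_NODES', 0)
compute_max_nodes = env_int('AML_COMPUTE_CLUSTER_MAX_NODES', 2)

# Catch an inconsistent configuration before asking Azure to provision anything
if compute_min_nodes > compute_max_nodes:
    raise ValueError('minimum node count exceeds maximum node count')
```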


#### Copy data to the compute cluster
This step is needed only if the data has not yet been uploaded to the blob storage connected to the workspace. The upload call is commented out below because the data are already in the default datastore.

In [6]:
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)

# ds.upload(src_dir=data_folder, target_path='DLWP', overwrite=True, show_progress=True)

AzureBlob dlwpml10633119839 azureml-blobstore-ba10431a-baca-4271-b9e8-283bd838c07e


#### Create the experiment

In [7]:
experiment_name = 'dlwp'

exp = Experiment(workspace=ws, name=experiment_name)

#### Create a TensorFlow estimator
Now we create a TensorFlow estimator, which defines the training script, its arguments, and the software environment that runs on the compute cluster. Azure builds a Docker image for this environment the first time the estimator is submitted; later runs re-use the existing image and start faster.

In [8]:
from azureml.train.dnn import TensorFlow

script_params = {
    '--root-directory': ds.path('DLWP').as_mount(),
    '--predictor-file': 'cfs_1979-2010_hgt-thick_300-500-700_NH_T2.nc',
    '--model-file': 'dlwp_tau-lstm',
    '--log-directory': 'logs/tau-lstm',
    '--temp-dir': '/mnt/tmp'
}

tf_est = TensorFlow(source_directory=os.path.join(os.getcwd(), os.pardir),
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script=os.path.join(os.getcwd(), 'train_tf.py'),
                    conda_packages=['scikit-learn', 'netCDF4', 'dask', 'xarray'],
                    pip_packages=['keras'],
                    use_gpu=True)
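
Each key in `script_params` is passed to the entry script as a command-line argument, with the mounted datastore location substituted for `--root-directory` at run time. The contents of `train_tf.py` are not shown here, so the following is only a sketch, under that assumption, of how such a script might read these arguments with `argparse`. The defaults, help strings, and the example mount path are illustrative.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line arguments named in `script_params` (illustrative sketch)."""
    parser = argparse.ArgumentParser(description='Train a DLWP model.')
    parser.add_argument('--root-directory', required=True,
                        help='directory containing the data (the mounted datastore path)')
    parser.add_argument('--predictor-file', required=True,
                        help='name of the predictor netCDF file inside the root directory')
    parser.add_argument('--model-file', required=True,
                        help='base name for the saved model')
    parser.add_argument('--log-directory', default='logs',
                        help='directory for training logs')
    parser.add_argument('--temp-dir', default=None,
                        help='scratch directory on the compute node')
    return parser.parse_args(argv)


# Example invocation with a made-up mount path; argparse converts the
# hyphens in option names to underscores in attribute names.
args = parse_args([
    '--root-directory', '/mnt/data/DLWP',
    '--predictor-file', 'cfs_1979-2010_hgt-thick_300-500-700_NH_T2.nc',
    '--model-file', 'dlwp_tau-lstm',
    '--log-directory', 'logs/tau-lstm',
    '--temp-dir', '/mnt/tmp',
])
print(args.root_directory, args.predictor_file)
```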

#### Run the experiment

In [9]:
run = exp.submit(config=tf_est)
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| dlwp | dlwp_1553550932_94c28310 | azureml.scriptrun | Starting | Link to Azure Portal | Link to Documentation |
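
The run is reported as `Starting` immediately after submission. The SDK's own `run.wait_for_completion(show_output=True)` blocks until the run finishes; for finer control, a simple polling loop over `run.get_status()` could look like the sketch below. The function name, the set of terminal states, and the polling defaults are assumptions made for this example.

```python
import time

# Status strings assumed to mean that a run has finished
TERMINAL_STATES = {'Completed', 'Failed', 'Canceled'}


def wait_for_terminal_status(run, poll_seconds=30, timeout_seconds=3600):
    """Poll `run.get_status()` until a terminal state is reached or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = run.get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError('run still in state %r after %s seconds' % (status, timeout_seconds))
        time.sleep(poll_seconds)


# Hypothetical usage with the run submitted above:
# final_status = wait_for_terminal_status(run, poll_seconds=60)
```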


In [10]:
run.cancel()