# Training using Azure Machine Learning Compute


<img src='https://cdn.thenewstack.io/media/2018/10/2e4f0988-az-ml-0.png'>

In [15]:
import sys
sys.version

'3.6.2 |Anaconda, Inc.| (default, Sep 30 2017, 18:42:57) \n[GCC 7.2.0]'

In [16]:
# Check core SDK version number
import azureml.core

print("Azure ML service version :", azureml.core.VERSION)

Azure ML service version : 1.0.69


In [17]:
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

azuremlservice
azuremlserviceresourcegroup
westeurope
70b8f39e-8863-49f7-b6ba-34a80799550c


## Create an experiment

In [18]:
from azureml.core import Experiment
experiment_name = 'analysesamlcompute'
experiment = Experiment(workspace = ws, name = experiment_name)

## Introduction to AmlCompute

Azure Machine Learning Compute (AmlCompute) is managed compute infrastructure that lets you create single-node or multi-node compute of the appropriate VM family. It is created **within your workspace region** and is a resource that other users of your workspace can also use. When a job is submitted, it autoscales by default up to `max_nodes` and runs the job in a containerized environment that packages the dependencies you specify.

Since it is managed compute, job scheduling and cluster management are handled internally by Azure Machine Learning service. 

For more information on Azure Machine Learning Compute, please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute).

**Note**: As with other Azure services, there are limits on certain resources (for example, AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

The training script `train.py` is already created for you. Let's have a look.
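A minimal sketch for viewing the script from inside the notebook, assuming `train.py` sits in the current working directory. `show_script` is a hypothetical helper name chosen for this illustration and is not part of the Azure ML SDK.

```python
from pathlib import Path


def show_script(path, max_lines=None):
    """Return the text of a script, optionally cut to its first max_lines lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if max_lines is not None:
        lines = lines[:max_lines]
    return "\n".join(lines)
```

In a notebook cell, `print(show_script('train.py'))` would display the whole file, and `print(show_script('train.py', max_lines=20))` only its first 20 lines.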

## Submit an AmlCompute run in a few different ways

First, let's check which VM families are available in your region. Azure is a regional service, and some specialized SKUs (especially GPUs) are only available in certain regions. Since AmlCompute is created in the region of your workspace, we will use the `supported_vmsizes()` function to see whether the VM family we want to use (`'STANDARD_D2_V2'`) is supported.

You can also pass a different region to check availability and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb).

In [28]:
cts = ws.compute_targets
for ct in cts:
    print(ct)

aks-aml-visual
VMDS3V2
aks-cluster01
vmDS15V2
StandardDS4V2
automlD2V2
automlD2v2
monclusterDS2V2
cpu-cluster


In [19]:
from azureml.core.compute import ComputeTarget, AmlCompute

AmlCompute.supported_vmsizes(workspace = ws)
#AmlCompute.supported_vmsizes(workspace = ws, location='southcentralus')

[{'gpus': 0,
  'maxResourceVolumeMB': 51200,
  'memoryGB': 3.5,
  'name': 'Standard_D1',
  'vCPUs': 1},
 {'gpus': 0,
  'maxResourceVolumeMB': 102400,
  'memoryGB': 7.0,
  'name': 'Standard_D2',
  'vCPUs': 2},
 {'gpus': 0,
  'maxResourceVolumeMB': 204800,
  'memoryGB': 14.0,
  'name': 'Standard_D3',
  'vCPUs': 4},
 {'gpus': 0,
  'maxResourceVolumeMB': 409600,
  'memoryGB': 28.0,
  'name': 'Standard_D4',
  'vCPUs': 8},
 {'gpus': 0,
  'maxResourceVolumeMB': 102400,
  'memoryGB': 14.0,
  'name': 'Standard_D11',
  'vCPUs': 2},
 {'gpus': 0,
  'maxResourceVolumeMB': 204800,
  'memoryGB': 28.0,
  'name': 'Standard_D12',
  'vCPUs': 4},
 {'gpus': 0,
  'maxResourceVolumeMB': 409600,
  'memoryGB': 56.0,
  'name': 'Standard_D13',
  'vCPUs': 8},
 {'gpus': 0,
  'maxResourceVolumeMB': 819200,
  'memoryGB': 112.0,
  'name': 'Standard_D14',
  'vCPUs': 16},
 {'gpus': 0,
  'maxResourceVolumeMB': 51200,
  'memoryGB': 3.5,
  'name': 'Standard_D1_v2',
  'vCPUs': 1},
 {'gpus': 0,
  'maxResourceVolumeMB': 1024
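
Scanning a long listing like this by eye is error-prone. The sketch below shows one way to look up a size by name or filter by hardware, using three entries copied from the output above as sample data. `find_vm_size` and `sizes_with_at_least` are hypothetical helper names, and the case-insensitive comparison is an assumption based on the notebook passing `'STANDARD_D2_V2'` while the listing prints names such as `'Standard_D1_v2'`.

```python
# Sample entries copied from the listing above. In the notebook, pass the full
# list returned by AmlCompute.supported_vmsizes(workspace=ws) instead.
sample_vm_sizes = [
    {'gpus': 0, 'maxResourceVolumeMB': 51200, 'memoryGB': 3.5,
     'name': 'Standard_D1', 'vCPUs': 1},
    {'gpus': 0, 'maxResourceVolumeMB': 102400, 'memoryGB': 7.0,
     'name': 'Standard_D2', 'vCPUs': 2},
    {'gpus': 0, 'maxResourceVolumeMB': 51200, 'memoryGB': 3.5,
     'name': 'Standard_D1_v2', 'vCPUs': 1},
]


def find_vm_size(vm_sizes, wanted):
    """Return the entry whose name matches `wanted` ignoring case, else None."""
    wanted = wanted.lower()
    for entry in vm_sizes:
        if entry['name'].lower() == wanted:
            return entry
    return None


def sizes_with_at_least(vm_sizes, min_vcpus=1, min_memory_gb=0.0):
    """Return the names of sizes that meet the vCPU and memory minimums."""
    return [entry['name'] for entry in vm_sizes
            if entry['vCPUs'] >= min_vcpus and entry['memoryGB'] >= min_memory_gb]


print(find_vm_size(sample_vm_sizes, 'STANDARD_D1_V2'))
print(sizes_with_at_least(sample_vm_sizes, min_vcpus=2))
```

A `None` result from `find_vm_size` would suggest the size is not offered in the workspace region, in which case the `location` argument shown in the commented-out call can be used to check another region.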

In [20]:
import os
import shutil

project_folder = './train-on-amlcompute'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('train.py', project_folder)

'./train-on-amlcompute/train.py'

In [21]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

myenv = Environment("myenv")

myenv.docker.enabled = True
myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])

### Provision as a persistent compute target

Thanks to smart defaults, you can provision a persistent AmlCompute resource by defining just two parameters. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continuously reuse the same target, debug it between jobs, or share the resource with other users of your workspace.

* `vm_size`: VM family of the nodes provisioned by AmlCompute. Choose one of the sizes returned by `supported_vmsizes()` above.
* `max_nodes`: Maximum number of nodes to autoscale to while running a job on AmlCompute.
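
The two parameters above can be collected and sanity-checked before any cluster is created. The sketch below is one way to do that in plain Python. `cluster_settings` is a hypothetical helper, and `min_nodes` and `idle_seconds_before_scaledown` are assumed here to be optional keyword arguments of `AmlCompute.provisioning_configuration`, so check them against your installed SDK version before relying on them.

```python
def cluster_settings(vm_size, max_nodes, min_nodes=0,
                     idle_seconds_before_scaledown=None):
    """Validate and collect keyword arguments for a provisioning configuration."""
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")
    if not 0 <= min_nodes <= max_nodes:
        raise ValueError("min_nodes must be between 0 and max_nodes")
    settings = {"vm_size": vm_size, "min_nodes": min_nodes, "max_nodes": max_nodes}
    # Only include the idle timeout when the caller sets one explicitly,
    # so the service default applies otherwise.
    if idle_seconds_before_scaledown is not None:
        settings["idle_seconds_before_scaledown"] = idle_seconds_before_scaledown
    return settings


# Same values as the notebook cell below: scale between 0 and 4 nodes.
settings = cluster_settings("STANDARD_D2_V2", max_nodes=4)
print(settings)
```

The resulting dictionary could then be unpacked into the SDK call, for example `AmlCompute.provisioning_configuration(**settings)`.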

In [27]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cpu_cluster_name = "cpu-cluster"

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned


In [29]:
cts = ws.compute_targets
for ct in cts:
    print(ct)

aks-aml-visual
VMDS3V2
aks-cluster01
vmDS15V2
StandardDS4V2
automlD2V2
automlD2v2
monclusterDS2V2
cpu-cluster


### Configure & Run

In [24]:
from azureml.core import ScriptRunConfig

# Point the run at the project folder and its entry script
src = ScriptRunConfig(source_directory=project_folder, script='train.py')

# Run on the AmlCompute cluster provisioned above
src.run_config.target = cpu_cluster.name

# Use the Docker-enabled environment that installs scikit-learn
src.run_config.environment = myenv

run = experiment.submit(config=src)
run

Experiment,Id,Type,Status,Details Page,Docs Page
analysesamlcompute,analysesamlcompute_1571927654_31de7d13,azureml.scriptrun,Starting,Link to Azure Portal,Link to Documentation


Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run).
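
Cancellation can also be automated when a run takes longer than expected. The sketch below polls a run and cancels it after a timeout. It relies only on the `get_status()` and `cancel()` methods of the run object. `wait_or_cancel` is a hypothetical helper, and the set of terminal status strings is an assumption to verify against your SDK version.

```python
import time

# Status strings assumed to mean the run has stopped for good.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_or_cancel(run, timeout_seconds, poll_seconds=10.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll run.get_status() until a terminal state; cancel the run on timeout.

    Returns the last status observed.
    """
    deadline = clock() + timeout_seconds
    status = run.get_status()
    while status not in TERMINAL_STATES:
        if clock() >= deadline:
            run.cancel()  # ask the service to stop the run
            return status
        sleep(poll_seconds)
        status = run.get_status()
    return status
```

With the `run` object from the cell above, `wait_or_cancel(run, timeout_seconds=1800)` would give the job 30 minutes before requesting cancellation. The `sleep` and `clock` parameters exist only so the helper can be exercised without waiting in real time.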

In [25]:
run.wait_for_completion(show_output=True)

RunId: analysesamlcompute_1571927654_31de7d13
Web View: https://mlworkspace.azure.ai/portal/subscriptions/70b8f39e-8863-49f7-b6ba-34a80799550c/resourceGroups/azuremlserviceresourcegroup/providers/Microsoft.MachineLearningServices/workspaces/azuremlservice/experiments/analysesamlcompute/runs/analysesamlcompute_1571927654_31de7d13

Streaming azureml-logs/55_azureml-execution-tvmps_4f7016796afdd59e6bb505add6e4ac1ccde9f4b49658b8a7188066b7eca3eb14_d.txt

2019-10-24T14:37:50Z Starting output-watcher...
Login Succeeded
Using default tag: latest
latest: Pulling from azureml/azureml_30e66b3edd8c80fa7056c857b57fdf50
a1298f4ce990: Pulling fs layer
04a3282d9c4b: Pulling fs layer
9b0d3db6dc03: Pulling fs layer
8269c605f3f1: Pulling fs layer
6504d449e70c: Pulling fs layer
4e38f320d0d4: Pulling fs layer
b0a763e8ee03: Pulling fs layer
11917a028ca4: Pulling fs layer
a6c378d11cbf: Pulling fs layer
6cc007ad9140: Pulling fs layer
6c1698a608f3: Pulling fs layer
78b5115f88e4: Pulling fs layer
f92d957afa9e: 

{'endTimeUtc': '2019-10-24T14:40:01.497601Z',
 'inputDatasets': [],
 'logFiles': {'azureml-logs/55_azureml-execution-tvmps_4f7016796afdd59e6bb505add6e4ac1ccde9f4b49658b8a7188066b7eca3eb14_d.txt': 'https://azuremlservice8628362969.blob.core.windows.net/azureml/ExperimentRun/dcid.analysesamlcompute_1571927654_31de7d13/azureml-logs/55_azureml-execution-tvmps_4f7016796afdd59e6bb505add6e4ac1ccde9f4b49658b8a7188066b7eca3eb14_d.txt?sv=2018-11-09&sr=b&sig=08j4lZv5G8kyWZJ8Ph4LLYV6ExNf7I0Kq4bSSl0VX8M%3D&st=2019-10-24T14%3A30%3A02Z&se=2019-10-24T22%3A40%3A02Z&sp=r',
  'azureml-logs/65_job_prep-tvmps_4f7016796afdd59e6bb505add6e4ac1ccde9f4b49658b8a7188066b7eca3eb14_d.txt': 'https://azuremlservice8628362969.blob.core.windows.net/azureml/ExperimentRun/dcid.analysesamlcompute_1571927654_31de7d13/azureml-logs/65_job_prep-tvmps_4f7016796afdd59e6bb505add6e4ac1ccde9f4b49658b8a7188066b7eca3eb14_d.txt?sv=2018-11-09&sr=b&sig=teGua8WHMaEOGAXlb6UMN5%2BRj4CuFMlG5aQ2tiKekpg%3D&st=2019-10-24T14%3A30%3A02Z&se=2019

## Results

In [26]:
run.get_metrics()

{'alpha': [0.0,
  0.1,
  0.2,
  0.30000000000000004,
  0.4,
  0.5,
  0.6000000000000001,
  0.7000000000000001,
  0.8,
  0.9],
 'mse': [3424.3166882137343,
  3372.649627810032,
  3325.2946794678764,
  3302.6736334017255,
  3295.741064355809,
  3298.9096058070622,
  3308.7042707723226,
  3322.8983149039614,
  3340.0246620321604,
  3359.0935697484424]}
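
The logged metrics are two parallel lists, so the best `alpha` is the one paired with the lowest `mse`. The sketch below works on a copy of the output above. In the notebook, `metrics = run.get_metrics()` would supply the same dictionary.

```python
# Values copied from the run.get_metrics() output above.
metrics = {
    'alpha': [0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5,
              0.6000000000000001, 0.7000000000000001, 0.8, 0.9],
    'mse': [3424.3166882137343, 3372.649627810032, 3325.2946794678764,
            3302.6736334017255, 3295.741064355809, 3298.9096058070622,
            3308.7042707723226, 3322.8983149039614, 3340.0246620321604,
            3359.0935697484424],
}

# Pair each mse with its alpha and take the pair with the smallest mse.
best_mse, best_alpha = min(zip(metrics['mse'], metrics['alpha']))
print(f"best alpha = {best_alpha}, mse = {best_mse:.2f}")
# prints: best alpha = 0.4, mse = 3295.74
```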

> End