# Azure ML Compute

<img src='https://github.com/retkowsky/images/blob/master/AzureMLservicebanniere.png?raw=true'>

Documentation:<br>
https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target <br>
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets

## 1. Intro

In [1]:
import sys
sys.version

'3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) \n[GCC 7.3.0]'

In [2]:
import datetime
now = datetime.datetime.now()
print(now)

2020-03-10 14:44:18.091224


In [3]:
import azureml.core
print("Azure ML service version:", azureml.core.VERSION)

Azure ML service version: 1.0.83


## 2. Workspace

Initialize a workspace object from the persisted configuration.

In [4]:
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

AzureMLWorkshop
AzureMLWorkshopRG
westeurope
70b8f39e-8863-49f7-b6ba-34a80799550c


## 3. Experiment

**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments.

In [5]:
from azureml.core import Experiment
experiment_name = 'Exemple5-amlcompute'
experiment = Experiment(workspace = ws, name = experiment_name)

## 4. Introduction to AmlCompute

Azure Machine Learning Compute is a managed-compute infrastructure that allows the user to easily create a single or multi-node compute. The compute is created within your workspace region as a resource that can be shared with other users in your workspace. The compute scales up automatically when a job is submitted, and can be put in an Azure Virtual Network. The compute executes in a containerized environment and packages your model dependencies in a Docker container.

You can use Azure Machine Learning Compute to distribute the training process across a cluster of CPU or GPU compute nodes in the cloud. For more information on the VM sizes that include GPUs, see GPU-optimized virtual machine sizes.

> https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets

## List of defined compute targets

In [6]:
cts = ws.compute_targets
for ct in cts:
    print(ct)

cpu-cluster-aml


### 4.1 Available AML Compute VM sizes

In [7]:
from azureml.core.compute import ComputeTarget, AmlCompute

AmlCompute.supported_vmsizes(workspace = ws)

[{'name': 'Standard_D1_v2',
  'vCPUs': 1,
  'gpus': 0,
  'memoryGB': 3.5,
  'maxResourceVolumeMB': 51200},
 {'name': 'Standard_D2_v2',
  'vCPUs': 2,
  'gpus': 0,
  'memoryGB': 7.0,
  'maxResourceVolumeMB': 102400},
 {'name': 'Standard_D3_v2',
  'vCPUs': 4,
  'gpus': 0,
  'memoryGB': 14.0,
  'maxResourceVolumeMB': 204800},
 {'name': 'Standard_D4_v2',
  'vCPUs': 8,
  'gpus': 0,
  'memoryGB': 28.0,
  'maxResourceVolumeMB': 409600},
 {'name': 'Standard_D11_v2',
  'vCPUs': 2,
  'gpus': 0,
  'memoryGB': 14.0,
  'maxResourceVolumeMB': 102400},
 {'name': 'Standard_D12_v2',
  'vCPUs': 4,
  'gpus': 0,
  'memoryGB': 28.0,
  'maxResourceVolumeMB': 204800},
 {'name': 'Standard_D13_v2',
  'vCPUs': 8,
  'gpus': 0,
  'memoryGB': 56.0,
  'maxResourceVolumeMB': 409600},
 {'name': 'Standard_D14_v2',
  'vCPUs': 16,
  'gpus': 0,
  'memoryGB': 112.0,
  'maxResourceVolumeMB': 819200},
 {'name': 'Standard_DS1_v2',
  'vCPUs': 1,
  'gpus': 0,
  'memoryGB': 3.5,
  'maxResourceVolumeMB': 7168},
 {'name': 'Standar
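
The result of `AmlCompute.supported_vmsizes` is a plain list of dictionaries, so it can be filtered with ordinary Python. The following is a minimal sketch, assuming every entry carries the keys visible in the output above; `pick_vm_sizes` is a hypothetical helper, and the sample data is copied from a few entries of that output.

```python
# Sample entries copied from the output above (the real list is longer).
vm_sizes = [
    {'name': 'Standard_D1_v2', 'vCPUs': 1, 'gpus': 0, 'memoryGB': 3.5, 'maxResourceVolumeMB': 51200},
    {'name': 'Standard_D2_v2', 'vCPUs': 2, 'gpus': 0, 'memoryGB': 7.0, 'maxResourceVolumeMB': 102400},
    {'name': 'Standard_D3_v2', 'vCPUs': 4, 'gpus': 0, 'memoryGB': 14.0, 'maxResourceVolumeMB': 204800},
    {'name': 'Standard_D11_v2', 'vCPUs': 2, 'gpus': 0, 'memoryGB': 14.0, 'maxResourceVolumeMB': 102400},
]


def pick_vm_sizes(sizes, min_vcpus=1, min_memory_gb=0.0, need_gpu=False):
    """Hypothetical helper: names of the sizes that meet the requirements, smallest first."""
    matches = [
        s for s in sizes
        if s['vCPUs'] >= min_vcpus
        and s['memoryGB'] >= min_memory_gb
        and (s['gpus'] > 0 or not need_gpu)
    ]
    matches.sort(key=lambda s: (s['vCPUs'], s['memoryGB']))
    return [s['name'] for s in matches]


candidates = pick_vm_sizes(vm_sizes, min_vcpus=2, min_memory_gb=14.0)
print(candidates)
```

With a live workspace, the same helper could be applied to `AmlCompute.supported_vmsizes(workspace=ws)` instead of the sample list.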

### 4.2 Project directory

Create a directory on your local machine that contains all the code the remote resource will need. This includes the training script and any additional files the training script depends on.

In [8]:
import os
import shutil

project_folder = './train-on-amlcompute'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('train_aml.py', project_folder)

'./train-on-amlcompute/train_aml.py'
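
When the training script depends on additional files, they have to be copied into the project folder as well. The sketch below shows one way to do this with the standard library; `stage_project` and the file name `utils.py` are hypothetical, and the demo runs in a temporary directory so that it does not touch the real project folder.

```python
import shutil
import tempfile
from pathlib import Path


def stage_project(files, project_folder):
    """Copy each file into project_folder (created if missing) and return the copied paths."""
    target = Path(project_folder)
    target.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy(f, target)) for f in files]


# Demo in a temporary directory; 'utils.py' stands for any hypothetical helper module.
workdir = Path(tempfile.mkdtemp())
sources = []
for name in ('train_aml.py', 'utils.py'):
    src = workdir / name
    src.write_text('# placeholder\n')
    sources.append(src)

staged = stage_project(sources, workdir / 'train-on-amlcompute')
print(sorted(p.name for p in staged))
```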

### 4.3 Environment

Create a Docker-based environment with scikit-learn installed.

In [9]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

myenv = Environment("myenv")

myenv.docker.enabled = True
myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])

> Documentation: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute<br>
> Pricing: https://azure.microsoft.com/en-us/pricing/details/machine-learning/

In [10]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster-aml"

# Reuse the cluster if it already exists; otherwise create it
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           min_nodes = 1,  # Set to 0 to let the cluster go idle (no nodes)
                                                           max_nodes = 4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned


In [11]:
# List the available compute targets
listecomputeservers = ws.compute_targets
for ct in listecomputeservers:
    print(ct)

cpu-cluster-aml


In [12]:
cpu_cluster.get_status().serialize()

{'currentNodeCount': 1,
 'targetNodeCount': 1,
 'nodeStateCounts': {'preparingNodeCount': 0,
  'runningNodeCount': 0,
  'idleNodeCount': 1,
  'unusableNodeCount': 0,
  'leavingNodeCount': 0,
  'preemptedNodeCount': 0},
 'allocationState': 'Steady',
 'allocationStateTransitionTime': '2020-03-10T14:40:28.280000+00:00',
 'errors': None,
 'creationTime': '2020-03-10T14:36:34.097268+00:00',
 'modifiedTime': '2020-03-10T14:36:49.717444+00:00',
 'provisioningState': 'Succeeded',
 'provisioningStateTransitionTime': None,
 'scaleSettings': {'minNodeCount': 1,
  'maxNodeCount': 4,
  'nodeIdleTimeBeforeScaleDown': 'PT120S'},
 'vmPriority': 'Dedicated',
 'vmSize': 'STANDARD_D2_V2'}
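
The dictionary returned by `get_status().serialize()` can be reduced to the few values needed to judge cluster load. The sketch below works on a subset of the output above. `summarize_status` and `idle_timeout_seconds` are hypothetical helpers, and the parser only handles the seconds-only ISO 8601 form (`PT120S`) seen here; other duration forms are assumed not to occur.

```python
import re

# Subset of the status dictionary shown above.
status = {
    'currentNodeCount': 1,
    'targetNodeCount': 1,
    'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0,
                        'idleNodeCount': 1, 'unusableNodeCount': 0,
                        'leavingNodeCount': 0, 'preemptedNodeCount': 0},
    'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 4,
                      'nodeIdleTimeBeforeScaleDown': 'PT120S'},
}


def idle_timeout_seconds(duration):
    """Parse a seconds-only ISO 8601 duration such as 'PT120S' (assumed format)."""
    match = re.fullmatch(r'PT(\d+)S', duration)
    if match is None:
        raise ValueError(f'unsupported duration: {duration!r}')
    return int(match.group(1))


def summarize_status(status):
    """Hypothetical helper: keep only the fields needed to judge cluster load."""
    counts = status['nodeStateCounts']
    scale = status['scaleSettings']
    return {
        'busy_nodes': counts['runningNodeCount'] + counts['preparingNodeCount'],
        'idle_nodes': counts['idleNodeCount'],
        'can_scale_up': status['currentNodeCount'] < scale['maxNodeCount'],
        'idle_timeout_s': idle_timeout_seconds(scale['nodeIdleTimeBeforeScaleDown']),
    }


summary = summarize_status(status)
print(summary)
```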

In [13]:
# Node status
cpu_cluster.list_nodes()

[{'nodeId': 'tvmps_4283fd75cca10be2cac635f443cba13b1593525228604f420864f74da89cba9e_d',
  'port': 50000,
  'publicIpAddress': '40.114.171.192',
  'privateIpAddress': '10.0.0.4',
  'nodeState': 'idle'}]

### 4.4 Configuring and submitting the run

In [14]:
from azureml.core import ScriptRunConfig
from azureml.core.runconfig import DEFAULT_CPU_IMAGE

src = ScriptRunConfig(source_directory=project_folder, script='train_aml.py')

# Set compute target to the one created in previous step
src.run_config.target = cpu_cluster.name

# Set environment
src.run_config.environment = myenv


> Let's go! Submit the run.

In [15]:
# Submit the run
run = experiment.submit(config=src)
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| Exemple5-amlcompute | Exemple5-amlcompute_1583851477_7457f7fe | azureml.scriptrun | Starting | Link to Azure Machine Learning studio | Link to Documentation |


### 4.5 Widget for monitoring run progress

In [16]:
from azureml.widgets import RunDetails
RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…
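
Outside a notebook the widget cannot render, so a simple polling loop is one alternative. The sketch below is generic: `wait_until_done` is a hypothetical helper that takes any zero-argument callable returning a status string, and the set of terminal states is an assumption. The demo uses a stand-in that replays a plausible sequence of states rather than a real run.

```python
import time

# Assumed set of states after which a run no longer changes.
TERMINAL_STATES = {'Completed', 'Failed', 'Canceled'}


def wait_until_done(get_status, poll_seconds=10.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Call get_status() until it returns a terminal state, then return that state."""
    waited = 0.0
    while True:
        state = get_status()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f'still {state!r} after {waited:.0f} s')
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for a real run: replays the states one after another.
fake_states = iter(['Starting', 'Queued', 'Running', 'Running', 'Completed'])
final_state = wait_until_done(lambda: next(fake_states), poll_seconds=0.0)
print(final_state)
```

With a real run object, `wait_until_done(run.get_status)` would poll the service; the SDK's own `run.wait_for_completion(show_output=True)` does the waiting for you and is usually the simpler choice.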

### 4.6 Additional information

> Use **run.get_details** to follow **the progress of the run**. <br>If the cluster is idle (no nodes allocated), the run may take longer because nodes must be provisioned first.

In [22]:
# Run status
run.get_details()

{'runId': 'Exemple5-amlcompute_1583851477_7457f7fe',
 'target': 'cpu-cluster-aml',
 'status': 'Completed',
 'startTimeUtc': '2020-03-10T14:44:56.949961Z',
 'endTimeUtc': '2020-03-10T14:46:56.9485Z',
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': 'fde88168-7c98-44b0-954b-a895055d64e9',
  'azureml.git.repository_uri': 'https://github.com/retkowsky/WorkshopAML2020',
  'mlflow.source.git.repoURL': 'https://github.com/retkowsky/WorkshopAML2020',
  'azureml.git.branch': 'master',
  'mlflow.source.git.branch': 'master',
  'azureml.git.commit': '92bcd73fc9ec1037078710902a207fd495a95825',
  'mlflow.source.git.commit': '92bcd73fc9ec1037078710902a207fd495a95825',
  'azureml.git.dirty': 'True',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [],
 'runDefinition': {'script': 'train_aml.py',
  'useAbsolutePath': False,
  'arguments': [],
  'sourceDirectoryDataStore': None,
  'fram
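
The start and end timestamps in the run details make it easy to compute how long the run took. This is a minimal sketch using values copied from the output above; `parse_utc` is a hypothetical helper that assumes timestamps always carry fractional seconds and a trailing `Z`, as they do here.

```python
from datetime import datetime

# Fields copied from the run details above.
details = {
    'status': 'Completed',
    'startTimeUtc': '2020-03-10T14:44:56.949961Z',
    'endTimeUtc': '2020-03-10T14:46:56.9485Z',
}


def parse_utc(stamp):
    """Parse timestamps of the form shown above (fractional seconds, trailing 'Z')."""
    return datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%S.%fZ')


duration = parse_utc(details['endTimeUtc']) - parse_utc(details['startTimeUtc'])
print(f'Run took {duration.total_seconds():.1f} s')
```

For this run the difference is just under two minutes.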

In [23]:
# Node status
cpu_cluster.list_nodes()

[{'nodeId': 'tvmps_4283fd75cca10be2cac635f443cba13b1593525228604f420864f74da89cba9e_d',
  'port': 50000,
  'publicIpAddress': '40.114.171.192',
  'privateIpAddress': '10.0.0.4',
  'nodeState': 'idle'}]

> View the experiment's metrics (available only once the run has finished). The metrics are also visible in the Azure portal.

In [24]:
run.get_metrics()

{'alpha': [0.0,
  0.05,
  0.1,
  0.15000000000000002,
  0.2,
  0.25,
  0.30000000000000004,
  0.35000000000000003,
  0.4,
  0.45,
  0.5,
  0.55,
  0.6000000000000001,
  0.65,
  0.7000000000000001,
  0.75,
  0.8,
  0.8500000000000001,
  0.9,
  0.9500000000000001],
 'mse': [3424.3166882137343,
  3408.9153122589296,
  3372.649627810032,
  3345.1496434741894,
  3325.2946794678764,
  3311.5562509289744,
  3302.6736334017255,
  3297.658733944204,
  3295.741064355809,
  3296.316884705675,
  3298.9096058070622,
  3303.1400555275163,
  3308.7042707723226,
  3315.3568399622563,
  3322.8983149039614,
  3331.1656169285875,
  3340.0246620321604,
  3349.3646443486023,
  3359.0935697484424,
  3369.1347399130477]}
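
Because `get_metrics` returns parallel lists, picking the best hyperparameter value takes only a few lines. The sketch below uses the values copied from the output above and assumes that the i-th `mse` entry belongs to the i-th `alpha` entry.

```python
# Metrics copied from the output above.
metrics = {
    'alpha': [0.0, 0.05, 0.1, 0.15000000000000002, 0.2, 0.25,
              0.30000000000000004, 0.35000000000000003, 0.4, 0.45,
              0.5, 0.55, 0.6000000000000001, 0.65, 0.7000000000000001,
              0.75, 0.8, 0.8500000000000001, 0.9, 0.9500000000000001],
    'mse': [3424.3166882137343, 3408.9153122589296, 3372.649627810032,
            3345.1496434741894, 3325.2946794678764, 3311.5562509289744,
            3302.6736334017255, 3297.658733944204, 3295.741064355809,
            3296.316884705675, 3298.9096058070622, 3303.1400555275163,
            3308.7042707723226, 3315.3568399622563, 3322.8983149039614,
            3331.1656169285875, 3340.0246620321604, 3349.3646443486023,
            3359.0935697484424, 3369.1347399130477],
}

# Index of the smallest mean squared error, then the matching alpha.
best_index = min(range(len(metrics['mse'])), key=metrics['mse'].__getitem__)
best_alpha = metrics['alpha'][best_index]
best_mse = metrics['mse'][best_index]
print(f'best alpha = {best_alpha}, mse = {best_mse:.2f}')
```

For this run the lowest error is reached at `alpha = 0.4`.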

> Information about the compute target:

In [25]:
# get_status() gets the latest status of the AmlCompute target
cpu_cluster.get_status().serialize()

{'currentNodeCount': 1,
 'targetNodeCount': 1,
 'nodeStateCounts': {'preparingNodeCount': 0,
  'runningNodeCount': 1,
  'idleNodeCount': 0,
  'unusableNodeCount': 0,
  'leavingNodeCount': 0,
  'preemptedNodeCount': 0},
 'allocationState': 'Steady',
 'allocationStateTransitionTime': '2020-03-10T14:40:28.280000+00:00',
 'errors': None,
 'creationTime': '2020-03-10T14:36:34.097268+00:00',
 'modifiedTime': '2020-03-10T14:36:49.717444+00:00',
 'provisioningState': 'Succeeded',
 'provisioningStateTransitionTime': None,
 'scaleSettings': {'minNodeCount': 1,
  'maxNodeCount': 4,
  'nodeIdleTimeBeforeScaleDown': 'PT120S'},
 'vmPriority': 'Dedicated',
 'vmSize': 'STANDARD_D2_V2'}

In [26]:
cpu_cluster.list_nodes()

[{'nodeId': 'tvmps_4283fd75cca10be2cac635f443cba13b1593525228604f420864f74da89cba9e_d',
  'port': 50000,
  'publicIpAddress': '40.114.171.192',
  'privateIpAddress': '10.0.0.4',
  'nodeState': 'idle'}]

> The configuration of the compute target can be changed:

In [29]:
# update() takes min_nodes, max_nodes and idle_seconds_before_scaledown and updates the AmlCompute target

cpu_cluster.update(min_nodes=0)  # Set the minimum node count to 0
#cpu_cluster.update(max_nodes=10)
cpu_cluster.update(idle_seconds_before_scaledown=300)  # Change the idle timeout before scale-down
#cpu_cluster.update(min_nodes=2, max_nodes=4, idle_seconds_before_scaledown=600)

> Deleting the compute target:

In [31]:
# Delete the compute target
cpu_cluster.delete()

In [33]:
# Check whether the compute target has been deleted (it may still be listed while deletion is in progress)
cts = ws.compute_targets
for ct in cts:
    print(ct)

cpu-cluster-aml
pipeline


<img src="https://github.com/retkowsky/images/blob/master/Powered-by-MS-Azure-logo-v2.png?raw=true" height="300" width="300">