# Azure ML Compute

<img src='https://github.com/retkowsky/images/blob/master/AzureMLservicebanniere.png?raw=true'>

Documentation:<br>
https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target <br>
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets


**Azure Machine Learning Compute** is a **managed compute infrastructure** that lets you create single-node or multi-node compute. The compute is created within your workspace region as a resource that can be shared with other users in your workspace. It **scales up automatically when a job is submitted**, and can be placed in an Azure Virtual Network. Jobs run in a containerized environment, with your model dependencies packaged in a **Docker container**.

You can use Azure Machine Learning Compute to distribute the training process across a cluster of **CPU or GPU** compute nodes in the cloud. For more information on the VM sizes that include GPUs, see GPU-optimized virtual machine sizes.

Azure Machine Learning Compute has default limits, such as the number of cores that can be allocated. For more information, see Manage and request quotas for Azure resources.
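
As a rough illustration of how the core limit relates to cluster size, the sketch below multiplies the vCPUs of one node by the maximum node count and compares the result with a quota. The function name `required_cores` and the quota value of 24 are hypothetical; the figure of 2 vCPUs is that of a STANDARD_D2_V2 VM. The actual quota of a subscription has to be checked in Azure.

```python
def required_cores(vcpus_per_node: int, max_nodes: int) -> int:
    """Return the number of cores a cluster uses when fully scaled out."""
    if vcpus_per_node < 1 or max_nodes < 0:
        raise ValueError("vcpus_per_node must be >= 1 and max_nodes >= 0")
    return vcpus_per_node * max_nodes


# STANDARD_D2_V2 has 2 vCPUs; 4 is an example maximum node count.
cores = required_cores(vcpus_per_node=2, max_nodes=4)

# Hypothetical quota value, used only for illustration.
example_quota = 24
print(f"Cores needed at full scale: {cores}")
print(f"Fits within example quota: {cores <= example_quota}")
```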

You can create an Azure Machine Learning compute environment **on demand** when you schedule a run, or as a **persistent resource**.


## 1. Intro

In [None]:
import sys
sys.version

In [None]:
import datetime
now = datetime.datetime.now()
print(now)

In [None]:
import azureml.core
print("Azure ML SDK version:", azureml.core.VERSION)

## 2. Workspace

Initialize a workspace object from the persisted configuration.

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()
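
`Workspace.from_config()` reads the workspace coordinates from a `config.json` file, which it looks for in the current directory or in a `.azureml` subfolder. A minimal sketch of creating such a file follows. The three values are placeholders, not real identifiers, and the helper name `write_workspace_config` is hypothetical.

```python
import json
import os
import tempfile


def write_workspace_config(folder, subscription_id, resource_group, workspace_name):
    """Write a config.json in <folder>/.azureml and return its path."""
    config_dir = os.path.join(folder, ".azureml")
    os.makedirs(config_dir, exist_ok=True)
    config_path = os.path.join(config_dir, "config.json")
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config_path


# Placeholder values: replace them with those of your own workspace.
demo_folder = tempfile.mkdtemp()
path = write_workspace_config(
    demo_folder, "<subscription-id>", "<resource-group>", "<workspace-name>"
)
with open(path) as f:
    print(f.read())
```

The same file can also be downloaded from the workspace overview page in the Azure portal.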

## 3. Experiment

**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments.

In [None]:
from azureml.core import Experiment
experiment_name = 'Exemple5-amlcompute'
experiment = Experiment(workspace = ws, name = experiment_name)

## 4. Introduction to AmlCompute

> https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets

## List of defined compute targets

In [None]:
cts = ws.compute_targets
for ct in cts:
    print(ct)

### 4.1 VM sizes available for AML Compute

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute

AmlCompute.supported_vmsizes(workspace = ws)
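
The list returned by `AmlCompute.supported_vmsizes` can be long. A sketch of filtering it follows, assuming each entry is a dictionary with `name`, `vCPUs`, `gpus` and `memoryGB` keys (check the actual output in your workspace before relying on these names). The sample entries are hand-written for illustration, and `filter_vm_sizes` is a hypothetical helper.

```python
def filter_vm_sizes(vm_sizes, min_vcpus=0, min_memory_gb=0.0, gpu_only=False):
    """Return the names of VM sizes that satisfy the given minimums."""
    selected = []
    for vm in vm_sizes:
        if vm.get("vCPUs", 0) < min_vcpus:
            continue
        if vm.get("memoryGB", 0.0) < min_memory_gb:
            continue
        if gpu_only and vm.get("gpus", 0) < 1:
            continue
        selected.append(vm["name"])
    return selected


# Hand-written sample in the assumed format; in the notebook, pass
# AmlCompute.supported_vmsizes(workspace=ws) instead.
sample_sizes = [
    {"name": "Standard_D2_v2", "vCPUs": 2, "gpus": 0, "memoryGB": 7.0},
    {"name": "Standard_D4_v2", "vCPUs": 8, "gpus": 0, "memoryGB": 28.0},
    {"name": "Standard_NC6", "vCPUs": 6, "gpus": 1, "memoryGB": 56.0},
]
print(filter_vm_sizes(sample_sizes, min_vcpus=4))
print(filter_vm_sizes(sample_sizes, gpu_only=True))
```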

### 4.2 Project directory

Create a directory containing all the code from your local machine that must be available on the remote resource. This includes the training script and any additional files it depends on.

In [None]:
import os
import shutil

project_folder = './train-on-amlcompute'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('train_aml.py', project_folder)

In [None]:
# Display the training script that was copied into the project folder
with open(os.path.join(project_folder, 'train_aml.py'), 'r') as f:
    print(f.read())
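
The contents of the project folder are sent to the remote resource with the run, so it can help to check what the folder actually contains. A minimal sketch using only the standard library follows; `list_project_files` is a hypothetical helper, and the demo folder is a temporary one created for the example.

```python
import os
import tempfile


def list_project_files(folder):
    """Return (relative_path, size_in_bytes) for every file under folder."""
    entries = []
    for root, _dirs, files in os.walk(folder):
        for filename in files:
            full_path = os.path.join(root, filename)
            rel_path = os.path.relpath(full_path, folder)
            entries.append((rel_path, os.path.getsize(full_path)))
    return sorted(entries)


# Demo on a temporary folder; in the notebook, pass project_folder instead.
demo_folder = tempfile.mkdtemp()
with open(os.path.join(demo_folder, "train_aml.py"), "w") as f:
    f.write("print('training')\n")

for rel_path, size in list_project_files(demo_folder):
    print(f"{rel_path}: {size} bytes")
```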

### 4.3 Environment

In [None]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

myenv = Environment("myenv")

myenv.docker.enabled = True

myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn==0.22.0'])

> Documentation: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute<br>
> Pricing: https://azure.microsoft.com/en-us/pricing/details/machine-learning/

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Cluster name
cpu_cluster_name = "cpu-cluster"

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           min_nodes = 1, # Set to 0 for automatic shutdown
                                                           max_nodes = 4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

In [None]:
# List the available compute targets
listecomputeservers = ws.compute_targets
for liste in listecomputeservers:
    print(liste)

In [None]:
cpu_cluster.get_status().serialize()

In [None]:
# Nodes of the compute cluster
cpu_cluster.list_nodes()

### 4.4 Configuring and submitting the run

In [None]:
# Python file to execute
!ls -l train_aml.py

In [None]:
# Display the file contents
with open('./train_aml.py', 'r') as f:
    print(f.read())

In [None]:
from azureml.core import ScriptRunConfig
from azureml.core.runconfig import DEFAULT_CPU_IMAGE

src = ScriptRunConfig(source_directory=project_folder, script='train_aml.py')

# Set compute target to the one created in previous step
src.run_config.target = cpu_cluster.name

# Set environment
src.run_config.environment = myenv

> Let's go! Submit the run.

In [None]:
# Define tags for the run
tagsdurun = {"Type": "test" , "Langage" : "Python" , "Framework" : "Scikit-Learn", "Team" : "DataScience" , "Pays" : "France"}

In [None]:
# Submit the run
run = experiment.submit(config=src, tags=tagsdurun)
run

### 4.5 Widget for monitoring run progress

In [None]:
from azureml.widgets import RunDetails
RunDetails(run).show()

### 4.6 Additional information

> Use **run.get_details** to follow the **progress of the run**. <br>If the cluster is idle, the run may take longer to complete.

In [None]:
# Run status
run.get_details()
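
Instead of re-running the cell by hand, the run status can be polled in a loop. The sketch below assumes the object exposes `get_status()` returning a string such as `Completed`, `Failed` or `Canceled`, as Azure ML runs do. `wait_for_terminal_state` and `FakeRun` are hypothetical names, and `FakeRun` only stands in for a real run so the example executes without a workspace. The SDK also provides `run.wait_for_completion(show_output=True)`, which blocks until the run ends.

```python
import time

TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal_state(run, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll run.get_status() until a terminal state or max_polls is reached."""
    status = run.get_status()
    for _ in range(max_polls):
        if status in TERMINAL_STATES:
            break
        sleep(poll_seconds)
        status = run.get_status()
    return status


class FakeRun:
    """Stand-in for an Azure ML run, used only to demonstrate the helper."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def get_status(self):
        # Return the next status, repeating the last one when exhausted.
        if len(self._statuses) > 1:
            return self._statuses.pop(0)
        return self._statuses[0]


fake_run = FakeRun(["Queued", "Running", "Completed"])
final_status = wait_for_terminal_state(fake_run, poll_seconds=0)
print(final_status)
```

In the notebook, `wait_for_terminal_state(run)` would be called with the `run` object returned by `experiment.submit`.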

In [None]:
# Cluster nodes
cpu_cluster.list_nodes()

> The experiment metrics can be displayed once the run has finished. <br>They are also visible in the Azure portal.

In [None]:
print("Metrics:")
run.get_metrics()
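
`run.get_metrics()` returns a dictionary in which a metric logged several times appears as a list of values. A small sketch of flattening it to one value per metric follows. `summarize_metrics` is a hypothetical helper, and the sample dictionary contains made-up values for illustration only, not results from this run.

```python
def summarize_metrics(metrics):
    """Return {name: value}, keeping the last value of metrics logged repeatedly."""
    summary = {}
    for name, value in metrics.items():
        if isinstance(value, list):
            summary[name] = value[-1] if value else None
        else:
            summary[name] = value
    return summary


# Made-up sample; in the notebook, pass run.get_metrics() instead.
sample_metrics = {"alpha": 0.1, "mse": [4200.5, 3900.2, 3424.9]}
for name, value in sorted(summarize_metrics(sample_metrics).items()):
    print(f"{name}: {value}")
```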

### Compute cluster information:

In [None]:
print("Cluster status:")
cpu_cluster.get_status().serialize()
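
The serialized status is a fairly large dictionary. The sketch below extracts a few fields from it, assuming keys named `allocationState`, `currentNodeCount`, `targetNodeCount` and a nested `scaleSettings` dictionary; these names are an assumption to verify against the actual output above. `summarize_cluster_status` is a hypothetical helper and the sample dictionary is hand-written.

```python
def summarize_cluster_status(status):
    """Pick the scaling-related fields out of a serialized cluster status."""
    scale = status.get("scaleSettings") or {}
    return {
        "allocation_state": status.get("allocationState"),
        "current_nodes": status.get("currentNodeCount"),
        "target_nodes": status.get("targetNodeCount"),
        "min_nodes": scale.get("minNodeCount"),
        "max_nodes": scale.get("maxNodeCount"),
    }


# Hand-written sample in the assumed format; in the notebook, pass
# cpu_cluster.get_status().serialize() instead.
sample_status = {
    "allocationState": "Steady",
    "currentNodeCount": 1,
    "targetNodeCount": 1,
    "scaleSettings": {"minNodeCount": 1, "maxNodeCount": 4},
}
for key, value in summarize_cluster_status(sample_status).items():
    print(f"{key}: {value}")
```

Missing keys yield `None` rather than raising, so the helper stays usable if the status format differs from the assumed one.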

In [None]:
print("Cluster nodes:")
cpu_cluster.list_nodes()

### The compute cluster configuration can be changed:

In [None]:
#cpu_cluster.update(min_nodes=0) # Set the minimum number of nodes to 0

In [None]:
print("Cluster status:")
cpu_cluster.get_status().serialize()

In [None]:
#cpu_cluster.update(max_nodes=10)

In [None]:
#cpu_cluster.update(idle_seconds_before_scaledown=1200) # Change the idle timeout before scale-down

In [None]:
print("Cluster status:")
cpu_cluster.get_status().serialize()

In [None]:
#cpu_cluster.update(min_nodes=2, max_nodes=4, idle_seconds_before_scaledown=600)
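
Before uncommenting one of the `update` calls, the new settings can be sanity-checked locally. The sketch below applies only basic consistency rules (non-negative values, minimum not above maximum); it does not reproduce any validation done by the service, and `validate_scale_settings` is a hypothetical helper.

```python
def validate_scale_settings(min_nodes, max_nodes, idle_seconds_before_scaledown):
    """Return a list of problems found in the proposed scale settings."""
    problems = []
    if min_nodes < 0:
        problems.append("min_nodes must be >= 0")
    if max_nodes < min_nodes:
        problems.append("max_nodes must be >= min_nodes")
    if idle_seconds_before_scaledown < 0:
        problems.append("idle_seconds_before_scaledown must be >= 0")
    return problems


# Same values as in the commented-out update call.
settings = {"min_nodes": 2, "max_nodes": 4, "idle_seconds_before_scaledown": 600}
problems = validate_scale_settings(**settings)
print("Settings OK" if not problems else problems)
```

When the list is empty, the same dictionary can be passed on with `cpu_cluster.update(**settings)`.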

### Deleting the compute cluster:

In [None]:
# To delete the compute cluster
#cpu_cluster.delete()

In [None]:
compute_targets = ws.compute_targets
for name, ct in compute_targets.items():
    print(name, " - " , ct.type, " - ", ct.provisioning_state)

<img src="https://github.com/retkowsky/images/blob/master/Powered-by-MS-Azure-logo-v2.png?raw=true" height="300" width="300">