# Lab 3: Azure ML compute clusters

<img src='https://github.com/retkowsky/images/blob/master/AzureMLservicebanniere.png?raw=true'>


**Azure Machine Learning Compute** is a **managed-compute infrastructure** that lets you easily create a single-node or multi-node compute cluster. The compute is created within your workspace region as a resource that can be shared with other users of your workspace. The compute **scales up automatically when a job is submitted** and scales back down after a configurable idle time. It can also be deployed in an Azure Virtual Network. Jobs run in a containerized environment that packages your model dependencies in a **Docker container**.

You can use Azure Machine Learning Compute to distribute the training process across a cluster of **CPU or GPU** compute nodes in the cloud. For more information on the VM sizes that include GPUs, see *GPU-optimized virtual machine sizes* in the Azure documentation.

Azure Machine Learning Compute has default limits, such as the number of cores that can be allocated. For more information, see *Manage and request quotas for Azure resources*.

You can create an Azure Machine Learning compute environment **on demand** when you schedule a run, or as a **persistent resource**.


Documentation:<br>
- https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target <br>
- https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets

## 1. Intro

In [1]:
import sys
sys.version

'3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) \n[GCC 7.3.0]'

In [2]:
import datetime
now = datetime.datetime.now()
print("Date:", now)

Date: 2021-04-12 08:35:37.669497


In [3]:
import azureml.core
print("You are using Azure ML", azureml.core.VERSION)

You are using Azure ML 1.26.0


## 2. Workspace

Initialize a workspace object from the persisted configuration file (`config.json`).

In [4]:
from azureml.core import Workspace
ws = Workspace.from_config()
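`Workspace.from_config()` looks for a `config.json` file that describes the workspace. You can download this file from the Azure portal or create it with `ws.write_config()`. The following minimal sketch shows the file's content using placeholder IDs. The helper `write_workspace_config` is hypothetical, and it writes to a temporary folder so no existing file is overwritten:

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(folder, subscription_id, resource_group, workspace_name):
    """Write a config.json with the keys Workspace.from_config() reads (hypothetical helper)."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(folder) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


# Placeholder values: replace them with your own subscription and resource group
with tempfile.TemporaryDirectory() as tmp:
    config_path = write_workspace_config(tmp, "<subscription-id>", "<resource-group>", "AMLworkshop")
    print(config_path.read_text())
```

To load the configuration from a specific location, pass it to `Workspace.from_config(path=...)`.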

## 3. Experiment

An **experiment** is a logical container in an Azure ML workspace. It hosts run records, which can include run metrics and output artifacts from your experiments.

In [5]:
from azureml.core import Experiment
experiment_name = 'Lab3-AzureMLCompute'

experiment = Experiment(workspace = ws, name = experiment_name)

### List of experiments in your workspace

In [6]:
#list_experiments = Experiment.list(ws)
#print("List of experiments :")
#for expname in list_experiments:
#    print(expname.name)

## 4. Azure ML compute clusters

> Documentation: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets

In [7]:
print("Compute targets in your Azure ML workspace:")
cts = ws.compute_targets
for ct in cts:
    print('-', ct)

Compute targets in your Azure ML workspace:
- AzureDatabricks
- instanceaks
- my-aks-9
- automl
- computeinstancenb
- cpu-cluster
- computeinstanceds12
- automlclus551001


### 4.1 Supported VM sizes for Azure ML compute clusters

In [8]:
#from azureml.core.compute import ComputeTarget, AmlCompute
#AmlCompute.supported_vmsizes(workspace = ws)
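`AmlCompute.supported_vmsizes(workspace=ws)` returns a list of dictionaries, one per VM size available in the workspace region. The sketch below filters such a list by CPU and GPU requirements. The sample entries are illustrative, and the key names (`name`, `vCPUs`, `gpus`, `memoryGB`) are an assumption about the returned shape. Check them against your own output before relying on them:

```python
def filter_vm_sizes(vm_sizes, min_vcpus=1, need_gpu=False):
    """Return the names of VM sizes with enough vCPUs (and at least one GPU if requested)."""
    return [
        vm["name"]
        for vm in vm_sizes
        if vm["vCPUs"] >= min_vcpus and (vm["gpus"] > 0 or not need_gpu)
    ]


# Illustrative entries; in the notebook use AmlCompute.supported_vmsizes(workspace=ws)
sample_sizes = [
    {"name": "Standard_D2_v2", "vCPUs": 2, "gpus": 0, "memoryGB": 7.0},
    {"name": "Standard_D4_v2", "vCPUs": 8, "gpus": 0, "memoryGB": 28.0},
    {"name": "Standard_NC6", "vCPUs": 6, "gpus": 1, "memoryGB": 56.0},
]

print(filter_vm_sizes(sample_sizes, min_vcpus=4))    # ['Standard_D4_v2', 'Standard_NC6']
print(filter_vm_sizes(sample_sizes, need_gpu=True))  # ['Standard_NC6']
```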

### 4.2 Directory

Create a directory that contains all the local code the remote compute needs: the training script and any additional files it depends on.

In [9]:
import os
import shutil

project_folder = './train-on-amlcompute'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('train_aml.py', project_folder)

'./train-on-amlcompute/train_aml.py'
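If the training script imports local helper modules, they must be copied into the same folder. Only the contents of the source directory are snapshotted and uploaded when the run is submitted. The following minimal sketch stages several files and reports the folder size. The helper `stage_files` and the extra file `utils.py` are hypothetical:

```python
import os
import shutil
import tempfile


def stage_files(files, project_folder):
    """Copy each file into project_folder and return the total staged size in bytes."""
    os.makedirs(project_folder, exist_ok=True)
    for f in files:
        shutil.copy(f, project_folder)
    return sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, names in os.walk(project_folder)
        for name in names
    )


# Self-contained demo in a temporary directory with two dummy files
with tempfile.TemporaryDirectory() as tmp:
    files = []
    for name, content in [("train_aml.py", "print('train')\n"), ("utils.py", "X = 1\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w") as fh:
            fh.write(content)
        files.append(path)
    staged = os.path.join(tmp, "train-on-amlcompute")
    size = stage_files(files, staged)
    print(sorted(os.listdir(staged)), size, "bytes")  # ['train_aml.py', 'utils.py'] 21 bytes
```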

### Let's view the Python code we want to submit

In [10]:
with open(os.path.join(project_folder, 'train_aml.py'), 'r') as f:
    print(f.read())

# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from azureml.core.run import Run
from sklearn.externals import joblib
import os
import numpy as np

os.makedirs('./outputs', exist_ok=True)

X, y = load_diabetes(return_X_y=True)

run = Run.get_context()

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=0)
data = {"train": {"X": X_train, "y": y_train},
        "test": {"X": X_test, "y": y_test}}

# list of numbers from 0.0 to 1.0 with a 0.05 interval
alphas = np.arange(0.0, 1.0, 0.05)

for alpha in alphas:
    # Use Ridge algorithm to create a regression model
    reg = Ridge(alpha=alpha)
    reg.fit(data["train"]["X"], data["train"]["y"

In [11]:
import sklearn
print('You are using scikit-learn =', sklearn.__version__)

You are using scikit-learn = 0.22.2.post1


### 4.3 Environment

In [12]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

myenv = Environment("myenv")
# Keep scikit-learn below 0.23: train_aml.py imports sklearn.externals.joblib,
# which was removed in scikit-learn 0.23.
# The deprecated myenv.docker.enabled flag is not set here; use
# azureml.core.runconfig.DockerConfiguration(use_docker=True) on the run configuration instead.
myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn==0.20.3'])


> Documentation : https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute<br>
> Pricing : https://azure.microsoft.com/en-us/pricing/details/machine-learning/

### Create a compute cluster from a predefined VM size

Once it is created, the compute cluster is also visible in Azure ML studio.

In [13]:
%%time
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Use a unique name
cpu_cluster_name = 'clustertest'

# Tags
clusttags = {"Type": "CPU",
             "Priority": "Dedicated",
             "Team": "DataScience",
             "Country": "France"}

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, using it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           vm_priority='dedicated',
                                                           min_nodes=0,  # Min nodes of the cluster
                                                           max_nodes=4,  # Max nodes of the cluster
                                                           tags=clusttags,
                                                           description="Compute Clusters Std D2V2",
                                                           idle_seconds_before_scaledown=18000)  # Idle time (5 h) before scaling down
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Creating....
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
CPU times: user 109 ms, sys: 7.54 ms, total: 117 ms
Wall time: 21.1 s
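The default quota mentioned at the top of this lab is counted in cores. It is therefore worth checking how many cores the cluster can request once it scales out to `max_nodes`. The following small arithmetic sketch assumes the 2 vCPUs of a STANDARD_D2_V2 node. The available quota value is hypothetical:

```python
def cores_needed(max_nodes, vcpus_per_node):
    """Cores requested from the quota when the cluster is fully scaled out."""
    return max_nodes * vcpus_per_node


def fits_quota(max_nodes, vcpus_per_node, available_cores):
    """True if a fully scaled-out cluster stays within the available core quota."""
    return cores_needed(max_nodes, vcpus_per_node) <= available_cores


# STANDARD_D2_V2 has 2 vCPUs; this cluster can scale out to 4 nodes
print(cores_needed(max_nodes=4, vcpus_per_node=2))                     # 8
print(fits_quota(max_nodes=4, vcpus_per_node=2, available_cores=6))    # False (hypothetical quota)
```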


In [14]:
# List all compute targets in your workspace
compute_targets = ws.compute_targets
for name in compute_targets:
    print(name)

AzureDatabricks
instanceaks
my-aks-9
automl
computeinstancenb
cpu-cluster
computeinstanceds12
automlclus551001
clustertest
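`ws.compute_targets` is a dictionary that maps each name to a compute target object. Each object exposes a `type` string, such as `'AmlCompute'` for clusters. The sketch below keeps only one kind of target. It runs here against stand-in objects built with `SimpleNamespace`. The type strings other than `'AmlCompute'` are assumptions:

```python
from types import SimpleNamespace


def targets_of_type(compute_targets, type_name):
    """Return the sorted names of compute targets whose .type equals type_name."""
    return sorted(name for name, target in compute_targets.items() if target.type == type_name)


# Stand-ins for ws.compute_targets (names from this workspace, types assumed)
fake_targets = {
    "cpu-cluster": SimpleNamespace(type="AmlCompute"),
    "clustertest": SimpleNamespace(type="AmlCompute"),
    "computeinstancenb": SimpleNamespace(type="ComputeInstance"),
    "my-aks-9": SimpleNamespace(type="AKS"),
}
print(targets_of_type(fake_targets, "AmlCompute"))  # ['clustertest', 'cpu-cluster']
```

In the notebook, call `targets_of_type(ws.compute_targets, "AmlCompute")` to list only the clusters.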


In [15]:
# Compute cluster information
cpu_cluster.get_status().serialize()

{'currentNodeCount': 0,
 'targetNodeCount': 0,
 'nodeStateCounts': {'preparingNodeCount': 0,
  'runningNodeCount': 0,
  'idleNodeCount': 0,
  'unusableNodeCount': 0,
  'leavingNodeCount': 0,
  'preemptedNodeCount': 0},
 'allocationState': 'Steady',
 'allocationStateTransitionTime': '2021-04-12T08:35:47.680000+00:00',
 'errors': None,
 'creationTime': '2021-04-12T08:35:44.921574+00:00',
 'modifiedTime': '2021-04-12T08:36:00.340921+00:00',
 'provisioningState': 'Succeeded',
 'provisioningStateTransitionTime': None,
 'scaleSettings': {'minNodeCount': 0,
  'maxNodeCount': 4,
  'nodeIdleTimeBeforeScaleDown': 'PT18000S'},
 'vmPriority': 'Dedicated',
 'vmSize': 'STANDARD_D2_V2'}
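The serialized status is a plain dictionary, so it can be checked programmatically. The sketch below summarizes the node counts. It also converts the scale-down setting, stored as an ISO 8601 duration such as `'PT18000S'` (18000 s = 5 h, matching `idle_seconds_before_scaledown`), back into seconds. The parser only handles the simple `PT<n>S` form seen in this output:

```python
import re


def idle_seconds(duration):
    """Convert a 'PT<n>S' duration string to an int number of seconds."""
    match = re.fullmatch(r"PT(\d+)S", duration)
    if match is None:
        raise ValueError(f"unsupported duration format: {duration!r}")
    return int(match.group(1))


def summarize_status(status):
    """Return a one-line summary of an AmlCompute status dictionary."""
    scale = status["scaleSettings"]
    return "{} nodes (min {}, max {}), idle timeout {} s, {}".format(
        status["currentNodeCount"],
        scale["minNodeCount"],
        scale["maxNodeCount"],
        idle_seconds(scale["nodeIdleTimeBeforeScaleDown"]),
        status["allocationState"],
    )


# Subset of the output above; in the notebook pass cpu_cluster.get_status().serialize()
status = {
    "currentNodeCount": 0,
    "allocationState": "Steady",
    "scaleSettings": {"minNodeCount": 0, "maxNodeCount": 4,
                      "nodeIdleTimeBeforeScaleDown": "PT18000S"},
}
print(summarize_status(status))  # 0 nodes (min 0, max 4), idle timeout 18000 s, Steady
```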

### 4.4 Run

### This is the Python file we want to submit

In [16]:
# This is the python code we want to execute
!ls train_aml.py -l

-rwxrwxrwx 1 root root 1538 Nov 16 13:54 train_aml.py


In [17]:
from azureml.core import ScriptRunConfig
from azureml.core.runconfig import DockerConfiguration

# 1. Python file, run inside a Docker container
src = ScriptRunConfig(source_directory=project_folder,
                      script='train_aml.py',
                      docker_runtime_config=DockerConfiguration(use_docker=True))

# 2. Set compute target to the one created in the previous step
src.run_config.target = cpu_cluster.name

# 3. Set Python environment
src.run_config.environment = myenv

In [18]:
# 4. Some tags for the run
runtags = {"Type": "test",
           "Language": "Python",
           "Framework": "Scikit-Learn",
           "Team": "DataScience",
           "Country": "France"}

In [19]:
# Let's submit
run = experiment.submit(config=src, tags=runtags)
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| Lab3-AzureMLCompute | Lab3-AzureMLCompute_1618216566_d24e1d92 | azureml.scriptrun | Preparing | Link to Azure Machine Learning studio | Link to Documentation |


### 4.5 Interactive Notebook widget for viewing the run status

In [20]:
from azureml.widgets import RunDetails
RunDetails(run).show()


### 4.6 Checking run status

You can check the run status with the widget, in Azure ML studio, or with the following Python code.

### Run status
A run typically goes through these states: Preparing -> Running -> Finalizing -> Completed.

In [33]:
print("Status =", run.get_status())

Status = Completed
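`run.wait_for_completion(show_output=True)` blocks until the run ends. If you prefer to poll the status yourself, a minimal sketch follows. `get_status` can be any zero-argument callable, such as `run.get_status`. Treating `Completed`, `Failed` and `Canceled` as the terminal states is an assumption, so check it against the statuses you observe:

```python
import time

# Assumed terminal run states
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal_status(get_status, poll_seconds=30, timeout_seconds=3600):
    """Poll get_status() until it returns a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Demo with a fake status sequence instead of a real run
fake = iter(["Preparing", "Running", "Finalizing", "Completed"])
print(wait_for_terminal_status(lambda: next(fake), poll_seconds=0))  # Completed
```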


In [34]:
run.get_details()

{'runId': 'Lab3-AzureMLCompute_1618216566_d24e1d92',
 'target': 'clustertest',
 'status': 'Completed',
 'startTimeUtc': '2021-04-12T08:46:08.749027Z',
 'endTimeUtc': '2021-04-12T08:49:17.993994Z',
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': 'eff3c2c9-03e5-47d1-8783-82a1c882dd35',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': 'train_aml.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': [],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'clustertest',
  'dataReferences': {},
  'data': {},
  'outputData': {},
  'jobName': None,
  'maxRunDurationSeconds': 2592000,
  'nodeCount': 1,
  'priority': None,
  'credentialPassthrough': False,
  'identity': None,
  'environment': {'name': 'myenv',
   'version': 'Autosave_2021-04-12T08:36:08Z_334fe002',
 

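`run.get_details()` returns the start and end timestamps as UTC strings. The sketch below computes the run's duration from them. It assumes the `%Y-%m-%dT%H:%M:%S.%fZ` format shown in the output above:

```python
from datetime import datetime

UTC_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def run_duration_seconds(details):
    """Elapsed seconds between startTimeUtc and endTimeUtc of run.get_details()."""
    start = datetime.strptime(details["startTimeUtc"], UTC_FORMAT)
    end = datetime.strptime(details["endTimeUtc"], UTC_FORMAT)
    return (end - start).total_seconds()


# Timestamps from the output above; in the notebook pass run.get_details()
details = {"startTimeUtc": "2021-04-12T08:46:08.749027Z",
           "endTimeUtc": "2021-04-12T08:49:17.993994Z"}
print(run_duration_seconds(details))  # 189.244967
```

For this run, that is a little over three minutes between the recorded start and end times.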
In [35]:
run.get_metrics()

{'alpha': [0.0,
  0.05,
  0.1,
  0.15000000000000002,
  0.2,
  0.25,
  0.30000000000000004,
  0.35000000000000003,
  0.4,
  0.45,
  0.5,
  0.55,
  0.6000000000000001,
  0.65,
  0.7000000000000001,
  0.75,
  0.8,
  0.8500000000000001,
  0.9,
  0.9500000000000001],
 'mse': [3424.3166882137343,
  3408.9153122589296,
  3372.649627810032,
  3345.14964347419,
  3325.294679467878,
  3311.5562509289744,
  3302.6736334017264,
  3297.658733944204,
  3295.74106435581,
  3296.316884705676,
  3298.9096058070622,
  3303.140055527517,
  3308.7042707723226,
  3315.3568399622573,
  3322.898314903962,
  3331.1656169285875,
  3340.024662032161,
  3349.364644348603,
  3359.093569748443,
  3369.1347399130477]}

In [36]:
run.get_metrics('mse')

{'mse': [3424.3166882137343,
  3408.9153122589296,
  3372.649627810032,
  3345.14964347419,
  3325.294679467878,
  3311.5562509289744,
  3302.6736334017264,
  3297.658733944204,
  3295.74106435581,
  3296.316884705676,
  3298.9096058070622,
  3303.140055527517,
  3308.7042707723226,
  3315.3568399622573,
  3322.898314903962,
  3331.1656169285875,
  3340.024662032161,
  3349.364644348603,
  3359.093569748443,
  3369.1347399130477]}
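`run.get_metrics()` returns each metric that was logged several times as a list, in logging order. The alpha and MSE lists can therefore be zipped together to pick the best regularization strength. The sketch below uses the values printed above, with the alphas rounded to two decimals:

```python
def best_alpha(metrics):
    """Return the (alpha, mse) pair with the lowest logged MSE."""
    return min(zip(metrics["alpha"], metrics["mse"]), key=lambda pair: pair[1])


# Values printed above; alphas rounded to two decimals
alphas = [round(0.05 * i, 2) for i in range(20)]
mse = [3424.3166882137343, 3408.9153122589296, 3372.649627810032, 3345.14964347419,
       3325.294679467878, 3311.5562509289744, 3302.6736334017264, 3297.658733944204,
       3295.74106435581, 3296.316884705676, 3298.9096058070622, 3303.140055527517,
       3308.7042707723226, 3315.3568399622573, 3322.898314903962, 3331.1656169285875,
       3340.024662032161, 3349.364644348603, 3359.093569748443, 3369.1347399130477]

print(best_alpha({"alpha": alphas, "mse": mse}))  # (0.4, 3295.74106435581)
```

In the notebook, `best_alpha(run.get_metrics())` gives the same result directly from the run.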

### Let's view the results in the experiment

In [37]:
experiment

| Name | Workspace | Report Page | Docs Page |
|---|---|---|---|
| Lab3-AzureMLCompute | AMLworkshop | Link to Azure Machine Learning studio | Link to Documentation |


### Your compute cluster:

In [38]:
print("Status:")
cpu_cluster.get_status().serialize()

Status:


{'currentNodeCount': 1,
 'targetNodeCount': 1,
 'nodeStateCounts': {'preparingNodeCount': 0,
  'runningNodeCount': 1,
  'idleNodeCount': 0,
  'unusableNodeCount': 0,
  'leavingNodeCount': 0,
  'preemptedNodeCount': 0},
 'allocationState': 'Steady',
 'allocationStateTransitionTime': '2021-04-12T08:45:43.619000+00:00',
 'errors': None,
 'creationTime': '2021-04-12T08:35:44.921574+00:00',
 'modifiedTime': '2021-04-12T08:36:00.340921+00:00',
 'provisioningState': 'Succeeded',
 'provisioningStateTransitionTime': None,
 'scaleSettings': {'minNodeCount': 0,
  'maxNodeCount': 4,
  'nodeIdleTimeBeforeScaleDown': 'PT18000S'},
 'vmPriority': 'Dedicated',
 'vmSize': 'STANDARD_D2_V2'}

In [39]:
print("Nodes:")
cpu_cluster.list_nodes()

Nodes:


[{'nodeId': 'tvmps_ed4c230b615ed40c6e01f3ed848baacc30345fa763043c5b5b42595b31549dbb_d',
  'port': 50001,
  'publicIpAddress': '51.105.255.98',
  'privateIpAddress': '10.0.0.5',
  'nodeState': 'idle'}]

In [40]:
cpu_cluster.cluster_location

'westeurope'

In [41]:
cpu_cluster.created_on

datetime.datetime(2021, 4, 12, 8, 35, 44, 921574, tzinfo=tzlocal())

In [42]:
cpu_cluster.vm_size

'STANDARD_D2_V2'

### You can change some compute cluster settings with Python

In [43]:
cpu_cluster.get_status().serialize()

{'currentNodeCount': 1,
 'targetNodeCount': 1,
 'nodeStateCounts': {'preparingNodeCount': 0,
  'runningNodeCount': 1,
  'idleNodeCount': 0,
  'unusableNodeCount': 0,
  'leavingNodeCount': 0,
  'preemptedNodeCount': 0},
 'allocationState': 'Steady',
 'allocationStateTransitionTime': '2021-04-12T08:45:43.619000+00:00',
 'errors': None,
 'creationTime': '2021-04-12T08:35:44.921574+00:00',
 'modifiedTime': '2021-04-12T08:36:00.340921+00:00',
 'provisioningState': 'Succeeded',
 'provisioningStateTransitionTime': None,
 'scaleSettings': {'minNodeCount': 0,
  'maxNodeCount': 4,
  'nodeIdleTimeBeforeScaleDown': 'PT18000S'},
 'vmPriority': 'Dedicated',
 'vmSize': 'STANDARD_D2_V2'}

In [44]:
#cpu_cluster.update(min_nodes=1)

In [45]:
#cpu_cluster.update(max_nodes=6)

In [46]:
#cpu_cluster.update(idle_seconds_before_scaledown=120)

In [47]:
cpu_cluster.update(min_nodes=2, max_nodes=4, idle_seconds_before_scaledown=600)
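A non-zero `min_nodes`, such as the `min_nodes=2` used above, keeps that many nodes allocated and billed even when no job is running. It is worth validating new settings before calling `update`. The helper below is hypothetical. Its checks are our own, and the service still performs its own validation:

```python
def scale_settings(min_nodes, max_nodes, idle_seconds_before_scaledown):
    """Validate cluster scale settings and return them as keyword arguments (hypothetical helper)."""
    if min_nodes < 0 or max_nodes < 1:
        raise ValueError("min_nodes must be >= 0 and max_nodes >= 1")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes cannot exceed max_nodes")
    if idle_seconds_before_scaledown < 0:
        raise ValueError("idle_seconds_before_scaledown must be non-negative")
    return {"min_nodes": min_nodes, "max_nodes": max_nodes,
            "idle_seconds_before_scaledown": idle_seconds_before_scaledown}


settings = scale_settings(min_nodes=2, max_nodes=4, idle_seconds_before_scaledown=600)
print(settings)  # {'min_nodes': 2, 'max_nodes': 4, 'idle_seconds_before_scaledown': 600}
# In the notebook: cpu_cluster.update(**settings)
```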

### Deleting the compute cluster

In [48]:
cpu_cluster.delete()

In [49]:
cpu_cluster.provisioning_state

'Deleting'

Current provisioning state of AmlCompute is "Deleting"



> You can now open the Lab5 notebook

<img src="https://github.com/retkowsky/images/blob/master/Powered-by-MS-Azure-logo-v2.png?raw=true" height="300" width="300">