Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Train using Azure Machine Learning Compute

* Initialize a Workspace
* Create an Experiment
* Introduction to AmlCompute
* Submit an AmlCompute run in a few different ways
    - Provision as a run based compute target 
    - Provision as a persistent compute target (Basic)
    - Provision as a persistent compute target (Advanced)
* Additional operations to perform on AmlCompute
* Find the best model in the run

## Prerequisites
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first, if you haven't already, to establish your connection to the Azure ML Workspace.

In [1]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)

SDK version: 1.0.85


## Initialize a Workspace

Initialize a workspace object from the persisted configuration.

In [2]:
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

aiml50dg2ai
mlops-demo-rg
westeurope
6be80d16-1b1d-465b-bbdd-a298a0cbbf89


## Create An Experiment

An **Experiment** is a logical container in an Azure ML Workspace. It hosts run records, which can include run metrics and output artifacts from your experiments.

In [3]:
from azureml.core import Experiment
experiment_name = 'train-on-amlcompute'
experiment = Experiment(workspace = ws, name = experiment_name)
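
Because an experiment hosts the records of every run submitted to it, those records can be read back later. The following is a minimal sketch, assuming `Experiment.get_runs()` yields runs newest first and that each run exposes `id` and `get_status()`. `latest_runs` is a hypothetical helper, not part of the SDK.

```python
from itertools import islice


def latest_runs(experiment, limit=5):
    """Return (run id, status) pairs for the most recent runs of an experiment."""
    # get_runs() is a generator, so islice stops after `limit` items
    # instead of walking the whole run history.
    return [(run.id, run.get_status()) for run in islice(experiment.get_runs(), limit)]
```

For example, `latest_runs(experiment, limit=3)` would return up to three pairs for the experiment created above.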

## Introduction to AmlCompute

Azure Machine Learning Compute is managed compute infrastructure that lets you create single-node to multi-node compute of the appropriate VM family. It is created **within your workspace region** and is a resource that other users in your workspace can use. By default it autoscales up to `max_nodes` when a job is submitted, and it executes jobs in a containerized environment that packages the dependencies you specify.

Since it is managed compute, job scheduling and cluster management are handled internally by the Azure Machine Learning service.

For more information on Azure Machine Learning Compute, please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute).

If you are an existing BatchAI customer who is migrating to Azure Machine Learning, please read [this article](https://aka.ms/batchai-retirement).

**Note**: As with other Azure services, there are limits on certain resources (e.g. AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.
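
The cells later in this notebook look up a cluster named `cpu-cluster` and assume it already exists. As a hedged sketch of how such a cluster could be provisioned with autoscaling bounds, the helper below reuses an existing compute target or creates one. The VM size and node counts are placeholder assumptions, and `get_or_create_cluster` is a hypothetical name.

```python
# Hypothetical settings: the VM size and node counts are placeholders.
CLUSTER_SETTINGS = {
    "vm_size": "STANDARD_D2_V2",
    "min_nodes": 0,  # allow the cluster to scale down to zero nodes when idle
    "max_nodes": 4,  # upper bound for autoscaling
}


def get_or_create_cluster(ws, name="cpu-cluster", settings=CLUSTER_SETTINGS):
    """Return the named compute target, creating an AmlCompute cluster if it is missing."""
    # Imported here so the settings above can be inspected without the SDK installed.
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    try:
        return ComputeTarget(workspace=ws, name=name)
    except ComputeTargetException:
        config = AmlCompute.provisioning_configuration(**settings)
        cluster = ComputeTarget.create(ws, name, config)
        cluster.wait_for_completion(show_output=True)
        return cluster
```

Keeping `min_nodes` at 0 means no nodes stay allocated between jobs, at the cost of a start-up delay when the next job is submitted.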


The training script `train.py` is already created for you. Let's have a look.
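
The contents of `train.py` are not reproduced in this notebook. As a purely hypothetical sketch of what such an entry script might look like, the version below accepts the two options that appear in the cells of this notebook (`--numbers-in-sequence` and `--data-folder`) and lists the data folder when one is given. The behavior of the real script may differ.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical training entry script")
    parser.add_argument("--data-folder", default=None,
                        help="folder that holds the input files")
    parser.add_argument("--numbers-in-sequence", type=int, default=10,
                        help="illustrative numeric option")
    # parse_known_args ignores any extra arguments instead of exiting with an error.
    args, _unknown = parser.parse_known_args(argv)
    return args


def main(argv=None):
    args = parse_args(argv)
    print("numbers-in-sequence:", args.numbers_in_sequence)
    if args.data_folder:
        files = sorted(os.listdir(args.data_folder))
        print("Files in data folder:", files)
        return files
    return []


if __name__ == "__main__":
    main()
```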

### Run custom container training


In [31]:
import os
import shutil

project_folder = './workspace'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('train.py', project_folder)

'./workspace/train.py'

In [32]:
# use a custom Docker image
from azureml.core.container_registry import ContainerRegistry
from azureml.core.compute import ComputeTarget
from azureml.train.estimator import Estimator
from azureml.widgets import RunDetails

# this is an image available in Docker Hub
image_name = 'continuumio/anaconda3'

# you can also point to an image in a private ACR
image_registry_details = ContainerRegistry()
image_registry_details.address = "myregistry.azurecr.io"
image_registry_details.username = "username"
image_registry_details.password = "password"

# Find the compute
cpu_cluster_name = "cpu-cluster"
cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)

est = Estimator(source_directory=project_folder,
                # We can also pass params:
                # script_params={
                #     '--numbers-in-sequence': 10
                # },
                # Use 'local' instead to run on this machine, provided Docker is installed
                compute_target=cpu_cluster.name,
                entry_script='train.py',
                # All dependencies are already baked into the image in our case,
                # so don't let the system build a new conda environment
                user_managed=True,
                # Otherwise, define the conda dependencies to install, which
                # builds a custom image on top of the one we specified. E.g. if we
                # select miniconda3, which has no data science packages prebaked,
                # you would need to uncomment the following:
                # user_managed=False, # Optional since this is the default value
                # conda_packages=['scikit-learn'],
                custom_docker_image=image_name,
                # Uncomment the line below to use your private ACR
                # image_registry_details=image_registry_details,
                # The following is needed if using the default training images
                # use_gpu=True,
                )

est.run_config.save("./SampleCustomImage.runconfig")

run = experiment.submit(est)
RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…
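
`RunDetails` only displays progress inside the notebook. To work with the outcome in code, the run can be awaited and its status and metrics read back. This is a minimal sketch; `summarize_run` is a hypothetical helper that relies on the `wait_for_completion`, `get_status`, and `get_metrics` methods of the run object.

```python
def summarize_run(run):
    """Block until the run finishes, then return its final status and logged metrics."""
    # raise_on_error=False returns control even when the run fails,
    # so the failed status can still be reported.
    run.wait_for_completion(show_output=False, raise_on_error=False)
    return {"status": run.get_status(), "metrics": run.get_metrics()}
```

Calling `summarize_run(run)` after the submit cell returns a dictionary with the `status` and `metrics` keys.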

## Storage


In [None]:
from azureml.core import Datastore
# Default datastore: the Azure Blob store associated with your workspace.
def_blob_store = ws.get_default_datastore()
# The following call gets the same store by name.
# Note that "workspaceblobstore" is **the name of this store; it CANNOT BE CHANGED and must be used as is**
def_blob_store = Datastore(ws, "workspaceblobstore")
print("Blobstore's name: {}".format(def_blob_store.name))


# Upload a file there
def_blob_store.upload_files(["./SampleCustomImage.runconfig"], target_path="SampleUpload", overwrite=True)
print("Upload call completed")

In [None]:
!pip install azure.storage

In [18]:
from azure.storage.common.cloudstorageaccount import CloudStorageAccount
from azure.storage.common.models import AccessPolicy
from azure.storage.blob import BlockBlobService, PageBlobService, AppendBlobService
from azure.storage.models import CorsRule, Logging, Metrics, RetentionPolicy, ResourceTypes, AccountPermissions
from azure.storage.blob.models import BlobBlock, ContainerPermissions, ContentSettings
from datetime import datetime, timedelta
import time

import json
settings= {}
with open('./settings.json') as f:
    settings = json.load(f)


account_name = settings["STORAGE_ACCOUNT_NAME"]
account_key = settings["STORAGE_ACCOUNT_KEY"]

account = CloudStorageAccount(account_name, account_key)

blobService = account.create_block_blob_service()

container_name = "testattach"

policyId = "2020-04-29-readlist-access"

# Set access policy on container
access_policy = AccessPolicy(permission=ContainerPermissions(read=True, list=True),
                             expiry=datetime.utcnow() + timedelta(hours=10))
identifiers = {policyId: access_policy}
acl = blobService.set_container_acl(container_name, identifiers)

# Wait 30 seconds for acl to propagate
time.sleep(30)


# Indicates to use the access policy set on the container
token = blobService.generate_container_shared_access_signature(
            container_name,
            id=policyId
)

print("Token: {}".format(token))

Token: dsadsadasdsadsads
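
A SAS token is a query string that is appended to a storage URL. As an illustration, the sketch below builds the URL that lists a container's blobs with such a token. It assumes the public Azure cloud endpoint suffix `blob.core.windows.net`, and `container_list_url` is a hypothetical helper name.

```python
def container_list_url(account_name, container_name, sas_token):
    """Build a URL that lists the blobs of a container using a SAS token."""
    # Tokens are sometimes copied with a leading '?'; strip it to avoid '&?'.
    token = sas_token.lstrip("?")
    return (
        f"https://{account_name}.blob.core.windows.net/{container_name}"
        f"?restype=container&comp=list&{token}"
    )
```

Opening the resulting URL with any HTTP client is a quick way to check that the read and list permissions of the access policy took effect.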


In [19]:
from azureml.core.datastore import Datastore

datastore_name="test_datastore"


iris_data = Datastore.register_azure_blob_container(ws, 
                      datastore_name=datastore_name, 
                      container_name= container_name, 
                      account_name=account_name, 
                      sas_token=token,                              
                      overwrite=True)

In [29]:
from azureml.core.dataset import Dataset
from azureml.data.datapath import DataPath
import os

# Get the datastore registered in the previous cell by its name.
datastore = Datastore(ws, "test_datastore")

datastore_path = [
  DataPath(datastore, '*.txt')
]

dataset = Dataset.File.from_files(path=datastore_path)
dataset_name = 'txt_dataset'

# Register the dataset
dataset.register(workspace=ws,
                 name=dataset_name,
                 description='Text files in test_datastore',
                 create_new_version=True)

# Optionally you can create a temp mounting path
# import tempfile
# mounted_path = tempfile.mkdtemp()
# print (mounted_path)
# And you can mount in specific location
# with dataset.mount(mounted_path) as mount_context:

with dataset.mount() as mount_context:
    mount_context.start()
    # This is the point where the dataset is mounted
    print(mount_context.mount_point)
    # list top level mounted files and folders in the dataset
    print(os.listdir(mount_context.mount_point))

/tmp/tmp6lv1fnze
['Install.txt']
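
`os.listdir` only shows the top level of the mount point. If the container holds nested folders, a recursive search may be more useful. The helper below is a sketch using only the standard library; `find_text_files` is a hypothetical name and works on any local directory, including a mount point.

```python
import glob
import os


def find_text_files(mount_point, pattern="*.txt"):
    """Return paths, relative to mount_point, of files matching pattern at any depth."""
    # '**' with recursive=True matches zero or more directory levels.
    matches = glob.glob(os.path.join(mount_point, "**", pattern), recursive=True)
    return sorted(os.path.relpath(path, mount_point) for path in matches)
```

Inside the `with` block above it could be called as `find_text_files(mount_context.mount_point)`.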


In [39]:
# Let's attach the dataset to the estimator
# First let's show how to get the dataset by name
dataset = Dataset.get_by_name(workspace=ws, name=dataset_name)

script_params = {
    # Mount the dataset on the remote compute and pass the mounted path as an argument to the training script.
    # as_named_input also exposes the mount point as the DATA_FOLDER environment variable.
    # It is also accessible via run.input_datasets['DATA_FOLDER'] if you use the Azure ML SDK in the script.
    # https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.abstract_dataset.abstractdataset?view=azure-ml-py#as-named-input-name-
    '--data-folder': dataset.as_named_input('DATA_FOLDER').as_mount()
}

est = Estimator(source_directory=project_folder, 
                # Pass the param so that the script lists the directory
                script_params=script_params,
                # Use 'local' instead to run on this machine, provided Docker is installed
                compute_target=cpu_cluster.name,
                entry_script='train.py',
                user_managed=True,
                custom_docker_image=image_name,
                )

# est.run_config.save("./WithMount_CustomImage.runconfig")

run = experiment.submit(est)
RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…
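
Inside the training script, the mounted path can arrive either through the `--data-folder` argument or, per the comments in the cell above, through the `DATA_FOLDER` environment variable. The following is a sketch of a resolver that prefers the argument and falls back to the environment. `resolve_data_folder` is a hypothetical helper, not part of the SDK.

```python
import os


def resolve_data_folder(cli_value=None, environ=os.environ, input_name="DATA_FOLDER"):
    """Pick the dataset mount path: the CLI argument wins, then the environment variable."""
    path = cli_value or environ.get(input_name)
    if not path:
        raise ValueError(f"No data folder given and {input_name} is not set")
    return path
```

Passing `environ` as a parameter keeps the function easy to exercise locally with a plain dictionary.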

## Blob container lease status

In [None]:
# Reset the lease status of a container
from azure.storage.common.cloudstorageaccount import CloudStorageAccount

# Retrieve the storage account and the storage key
import json
settings= {}
with open('./settings.json') as f:
    settings = json.load(f)
account_name = settings["STORAGE_ACCOUNT_NAME"]
account_key = settings["STORAGE_ACCOUNT_KEY"]

# The container whose lease is in a broken state
container_name='test-lease'

# Create a blobservice client from a storage account client
account = CloudStorageAccount(account_name, account_key)
blobService = account.create_block_blob_service()
    
# Get a container lease
lease_id=blobService.acquire_container_lease(container_name, lease_duration=-1)
# Release that lease
blobService.release_container_lease(container_name,lease_id)
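
Acquiring a lease fails while another lease is still active, so it can help to check the lease state first. The sketch below wraps the acquire and release calls from the cell above. It assumes that `get_container_properties` on the legacy block blob service returns an object exposing `properties.lease.state`; `reset_container_lease` is a hypothetical helper.

```python
def reset_container_lease(blob_service, container_name):
    """Acquire and release a lease when the container's lease state allows it.

    Returns a (state before the call, whether a reset was attempted) tuple.
    """
    props = blob_service.get_container_properties(container_name)
    state = props.properties.lease.state
    # A lease can be acquired from 'available', 'expired', or 'broken';
    # in the 'leased' and 'breaking' states the acquire call would be rejected.
    if state in ("leased", "breaking"):
        return state, False
    lease_id = blob_service.acquire_container_lease(container_name, lease_duration=-1)
    blob_service.release_container_lease(container_name, lease_id)
    return state, True
```

With the objects from the cell above, it would be called as `reset_container_lease(blobService, container_name)`.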