## Managing Azure Machine Learning Services
Using Azure Machine Learning Services as a platform for developing and operationalizing machine learning services depends on configuring several assets that work together to form that platform. This process involves:
* Configuring an Azure Machine Learning **Workspace**
* Configuring one or more Azure Machine Learning **Datastores**, and registering them to the AML Workspace
* Configuring one or more Azure Machine Learning **Compute Targets**, and registering them to the AML Workspace
* Configuring the execution **Environment** that runs on the Compute targets, and that facilitates the execution of Experiments (runs)

The following sections demonstrate executing these steps to enable the **Team Data Science Process** using Azure Machine Learning services. 

## Managing Azure Machine Learning Workspaces
Creating an Azure Machine Learning Services workspace is a straightforward operation that requires only four parameters:
1. An Azure **Subscription ID** (GUID)
2. A **Workspace Name** for the new workspace being created
3. The name of a new or existing Azure **Resource Group**
4. An Azure Datacenter **Location**; e.g., US East, US West 2, North Europe, West Europe
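
Because these four values are reused throughout the notebook, it can help to collect and sanity-check them before calling the SDK. The following is a minimal sketch using only the Python standard library. The `WorkspaceSpec` class and its `validate()` method are hypothetical helpers introduced for illustration and are not part of the Azure Machine Learning SDK. The sketch only checks that the Subscription ID parses as a GUID and that the remaining values are non-empty; it does not enforce Azure's naming rules.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceSpec:
    """Hypothetical container for the four workspace parameters."""
    subscription_id: str
    workspace_name: str
    resource_group: str
    location: str

    def validate(self) -> None:
        # uuid.UUID raises ValueError when the string is not a well-formed GUID
        uuid.UUID(self.subscription_id)
        for field_name in ("workspace_name", "resource_group", "location"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must not be empty")


spec = WorkspaceSpec(
    subscription_id="00000000-0000-0000-0000-000000000000",  # placeholder GUID
    workspace_name="customer-churn",
    resource_group="amls-rg",
    location="eastus",
)
spec.validate()  # raises ValueError if any value is malformed
```

Validating up front surfaces a mistyped Subscription ID locally, rather than as an authentication or lookup failure from the service.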

What's more, Azure Machine Learning Service workspaces can be managed using a variety of tools and approaches:
* Using the Azure Portal interface
* Using an Azure Resource Manager (ARM) template
* Using the **Azure Machine Learning SDK for Python** from a variety of development environments:
  * Azure Jupyter Notebooks
  * Locally installed Jupyter Notebooks
  * Visual Studio Code
  * Azure Data Science Virtual Machines
  * Azure Databricks
* Using the **CLI Extension for Azure Machine Learning** from one of the following Command-Line Interfaces:
  * Azure CLI
  * Azure PowerShell

### Create a New Workspace using the Azure Machine Learning SDK for Python
The following steps demonstrate how easily Azure ML workspaces can be managed using Python.

#### Import Required Libraries

In [None]:
import os
from azureml.core import Workspace
from azureml.core import Datastore
#from azureml.core.model import Model

#### Define and Initialize Globals

In [None]:
const_subscription_id = os.getenv("SUBSCRIPTION_ID", default="<your subscription id here>")

workspace_name = os.getenv("WORKSPACE_NAME", default="customer-churn")
resource_group_name = os.getenv("RESOURCE_GROUP_NAME", default="amls-rg")
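# Environment variables are always strings, so parse the flag explicitly
create_new_rg = os.getenv("CREATE_NEW_RG", default="true").strip().lower() in ("1", "true", "yes")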
location_name = os.getenv("LOCATION_NAME", default="eastus")

#### Create the New Workspace

In [None]:
# First, try to locate an existing Workspace that matches your specifications
try:
    ws = Workspace(subscription_id = const_subscription_id,
                   resource_group = resource_group_name,
                   workspace_name = workspace_name)
    print('An existing Workspace matching your specification was referenced.')
    
except Exception:
    # No matching Workspace could be loaded, so create a new one
    ws = Workspace.create(name=workspace_name,
                          subscription_id = const_subscription_id,
                          resource_group = resource_group_name,
                          create_resource_group = create_new_rg,
                          location = location_name)

#### View Workspace Configuration Details 

In [None]:
ws.get_details()

### Saving the Workspace Configuration Details
Saving the new workspace's configuration details to a local file enables other Jupyter Notebooks and Python scripts to load the same workspace. By default, the configuration file **.azureml\config.json** is created in the directory containing the Jupyter Notebook file used to create the workspace, so it can only be loaded by Jupyter Notebooks and Python scripts that reside in that directory or its sub-directories. Copying the **.azureml\config.json** file to a different directory makes it possible to load this workspace from Jupyter Notebooks and Python scripts residing in that directory or its sub-directories.

In [None]:
ws.write_config()

# Subsequently, the following code is used to load
# the workspace from scripts and Jupyter Notebooks

# ws = Workspace.from_config()
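
The directory rule described above (a script can load the workspace only when the configuration file sits in its own directory or in one of its parent directories) can be pictured as an upward search. The following standard-library sketch illustrates that idea only. `find_workspace_config` is a hypothetical helper, not the SDK's implementation, and it assumes the file is always named `.azureml/config.json`.

```python
from pathlib import Path
from typing import Optional


def find_workspace_config(start: Path) -> Optional[Path]:
    """Walk from `start` up to the filesystem root looking for .azureml/config.json."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".azureml" / "config.json"
        if candidate.is_file():
            return candidate
    return None  # no configuration file in this directory or any parent


# Report where (if anywhere) a configuration file is visible from the current directory
print(find_workspace_config(Path.cwd()))
```

A notebook stored several folders below the configuration file still finds it, while a notebook in a sibling directory tree does not.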

## Managing Azure Machine Learning Datastores
In Azure Machine Learning services, datastores are managed independently from compute resources to provide a layer of abstraction between those resources. This configuration allows for architectural flexibility by enabling storage resources to be added or removed without requiring any coding changes.  
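
To make the abstraction concrete, the sketch below models a datastore as a named pointer to a storage location. It is a simplified illustration and not the SDK's implementation: `StorageLocation`, `DatastoreRegistry`, `training_input_uri`, and the account and container names are all hypothetical. The point is that the consuming code refers only to the datastore name, so the storage behind that name can be swapped without touching that code.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StorageLocation:
    """Hypothetical description of a blob container."""
    account_name: str
    container_name: str


class DatastoreRegistry:
    """Hypothetical name-to-storage mapping, standing in for a Workspace."""

    def __init__(self) -> None:
        self._stores: Dict[str, StorageLocation] = {}

    def register(self, name: str, location: StorageLocation) -> None:
        self._stores[name] = location  # re-registering a name replaces its storage

    def get(self, name: str) -> StorageLocation:
        return self._stores[name]


def training_input_uri(registry: DatastoreRegistry, datastore_name: str, relative_path: str) -> str:
    # Consuming code knows only the datastore name, never the storage account
    loc = registry.get(datastore_name)
    return f"https://{loc.account_name}.blob.core.windows.net/{loc.container_name}/{relative_path}"


registry = DatastoreRegistry()
registry.register("training_data", StorageLocation("devaccount", "samples"))
dev_uri = training_input_uri(registry, "training_data", "churn.csv")

# Point the same name at different storage; the consuming function is unchanged
registry.register("training_data", StorageLocation("prodaccount", "full"))
prod_uri = training_input_uri(registry, "training_data", "churn.csv")
```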

#### Get a Reference to the Workspace's Default Datastore
A Datastore is automatically created as part of the Workspace provisioning process.

In [None]:
ds = ws.get_default_datastore()

#### Defining a Different Default Datastore for the Current Workspace

In [None]:
ws.set_default_datastore('datastore name')

#### Enumerate All Datastores Currently Registered in the Current Workspace

In [None]:
datastores = ws.datastores
for name, ds in datastores.items():
    print(name, ds.datastore_type)

#### Get a Reference to a Specific Datastore Currently Registered in the Current Workspace

In [None]:
ds = Datastore.get(ws, datastore_name='workspaceblobstore')  # datastore name

### Registering New Datastores with the Workspace
* Use the **register_azure_blob_container()** method to register an Azure Blob Container with a Workspace
* Use the **register_azure_file_share()** method to register an Azure File Share with a Workspace

In [None]:
ds = Datastore.register_azure_blob_container(workspace=ws, datastore_name='datastore name', 
                                             container_name='azure blob container name',
                                             account_name='storage account name', account_key='storage account key',
                                             create_if_not_exists=True) 

In [None]:
ds = Datastore.register_azure_file_share(workspace=ws, datastore_name='datastore name', 
                                         file_share_name='file share name',
                                         account_name='storage account name', account_key='storage account key',
                                         create_if_not_exists=True)

### Working with Data in Azure Machine Learning Datastores
* Use the **upload()** method of the Datastore object to load files from a local source directory to the Datastore
* Use the **download()** method of the Datastore object to copy files from the Datastore to a local target directory

In [None]:
import azureml.data
from azureml.data.azure_storage_datastore import AzureFileDatastore, AzureBlobDatastore

ds.upload(src_dir='your source directory',
          target_path='your remote target path',
          overwrite=True, show_progress=True)

In [None]:
ds.download(target_path='your local target path',
            prefix='your prefix', show_progress=True)

### Accessing Datastores for Training and Evaluating Machine Learning Models
* Use the **as_mount()** method to mount a Datastore on a compute target
* Use the **as_download()** method to download Datastore contents to the location specified by the **path_on_compute** parameter
* Use the **as_upload()** method to upload a file to the Datastore from the location specified by the **path_on_compute** parameter
* Use the **path()** method to reference a specific folder or file in the Datastore

In [None]:
from azureml.data.data_reference import DataReference

ds.as_mount()
ds.as_download(path_on_compute='your path on compute')
ds.as_upload(path_on_compute='yourfilename')

# Download the contents of the `./Data` directory in ds to the compute target
ds.path('./Data').as_download()

## Managing Azure Machine Learning Compute Targets
Implementing the **Team Data Science Process** customarily involves activities like exploratory data analysis (EDA), feature selection, dimensionality reduction, and the training and testing of machine learning models. These tasks are often initially undertaken using small samples extracted from data sources, and as such are easily achieved using local resources such as a laptop computer or desktop workstation. As the development cycle progresses, it often becomes necessary to perform more computationally expensive operations such as hyper-parameter tuning, training models with larger amounts of data, and cross-validation. Ultimately, the production solution must be deployed to an operational platform capable of scaling to accommodate potentially massive data volumes.

To that end, Azure Machine Learning services can use a spectrum of **compute targets** to accommodate the development lifecycle. AML **Compute targets** are managed compute infrastructure that enables the rapid provisioning of single to multi-node compute resources. They are created within **Workspace** regions and are available to all authenticated **Workspace** users. They autoscale by default when a job is submitted, and they execute in containerized **Environments** that package all user-specified dependencies. These compute targets include:
* Local Computer
* Azure Machine Learning Compute
* Remote Virtual Machine
* Azure Databricks
* Azure Data Lake Analytics
* Azure HDInsight
* Azure Batch

When an Azure Machine Learning services Workspace is created using Python, the local computer is automatically attached as the default compute target. This is true regardless of whether that target is a laptop, a desktop, or a virtual machine running in the cloud; e.g., Azure Notebooks, Data Science Virtual Machine (DSVM). When the time comes to execute Python machine learning experiments at scale, an appropriate compute target can be created, attached, and configured so that it contains the Python environment needed to execute those scripts, including all of the dependencies they reference.

#### Managing Environments and their Dependencies using Run Configurations
One primary benefit of Azure Machine Learning services is that code developed on a development computer can be promoted to a highly scalable compute target, such as an Azure Machine Learning Services cluster, without making any changes to the code. This greatly simplifies the deployment process and promotes adopting a DevOps approach to delivering machine learning and artificial intelligence solutions.

* **User-Managed Environments:** When you are using your local development computer as the compute target, no further action is needed since you will have already configured it with all the resources (dependencies) required to conduct the experiments; e.g., Scikit-Learn, CNTK, Keras.

* **System-Managed Environment:** When you are ready to promote your experiments to a more scalable compute target, you must ensure all dependencies are satisfied. This is most easily and most commonly accomplished using Conda to manage the Python environment, and is assumed by default. You accomplish this by using the **CondaDependencies** class to add a **conda_dependencies.yml** file to your Workspace. When you provision a new compute target, the YML file defines the configuration for a new Docker containerized environment that contains all the packages and model dependencies required to execute your Python scripts.
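
For readers unfamiliar with Conda environment files, the sketch below renders a small one with plain Python so that its structure is visible. It is illustrative only. `render_conda_yaml` is a hypothetical helper, and the environment name and package list are assumptions, so the file the SDK actually generates will likely contain different entries.

```python
from typing import Sequence


def render_conda_yaml(name: str,
                      conda_packages: Sequence[str],
                      pip_packages: Sequence[str] = ()) -> str:
    """Render a minimal Conda environment file as text."""
    lines = [f"name: {name}", "dependencies:"]
    lines += [f"  - {pkg}" for pkg in conda_packages]
    if pip_packages:
        # pip-installed packages are nested under a 'pip' entry
        lines.append("  - pip:")
        lines += [f"      - {pkg}" for pkg in pip_packages]
    return "\n".join(lines) + "\n"


yaml_text = render_conda_yaml(
    name="churn_environment",                # assumed environment name
    conda_packages=["pip", "scikit-learn"],  # installed by conda
    pip_packages=["azureml-defaults"],       # installed by pip inside the environment
)
print(yaml_text)
```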

What's more, Azure Machine Learning services compute targets can either be created on-demand when a run is scheduled, or they can be created as a persistent resource.

* **Run-based Compute Creation:** As the naming implies, run-based (on-demand) compute targets are created as part of the scheduled run (execution) and are deleted automatically when the script execution completes. As of the time of this writing, the run-based creation feature is still in Preview, and doesn't yet support automated hyper-parameter tuning or automated machine learning. For these reasons I currently recommend creating Persistent compute targets whenever possible.

* **Persistent Compute Creation:** As the naming implies, persistent compute targets are persistently registered with the Azure Machine Learning services Workspace, and are therefore available for use across jobs.  What's more, they can be shared with other users in the Workspace; a capability that helps teams to conserve resources while ensuring consistency on team projects.

#### Example: Provision a Persistent Azure Machine Learning Compute Target
The following code demonstrates provisioning an Azure Machine Learning cluster as a compute target that will be available to all users of the Workspace for multiple job executions (runs). Setting the **gpu_enabled** variable to **True** creates a GPU-enabled cluster.

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

gpu_enabled = False

# Choose a name and VM size for the cluster (GPU or CPU)
if gpu_enabled:
    cluster_name = "gpucluster"
    vm_spec = "STANDARD_NC6"
    
else:
    cluster_name = "cpucluster"
    vm_spec = 'STANDARD_D2_V2'
    

# Verify that cluster does not exist already
try:
    aml_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
    
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size=vm_spec, min_nodes=0, max_nodes=4)
    aml_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

aml_cluster.wait_for_completion(show_output=True)
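
The `min_nodes=0` and `max_nodes=4` arguments passed to `provisioning_configuration()` above bound how far the cluster can scale. As a simplified mental model only, and not the service's actual scheduling logic, the number of running nodes can be thought of as the requested node count clamped to that range. `target_node_count` is a hypothetical helper written for this illustration.

```python
def target_node_count(requested_nodes: int, min_nodes: int = 0, max_nodes: int = 4) -> int:
    """Clamp the number of requested nodes to the cluster's configured bounds."""
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("expected 0 <= min_nodes <= max_nodes")
    return max(min_nodes, min(requested_nodes, max_nodes))


idle = target_node_count(0)    # no work queued: the cluster can scale down to min_nodes
busy = target_node_count(10)   # demand above the limit is capped at max_nodes
```

With `min_nodes=0`, an idle cluster can release all of its nodes, which is why a persistent compute target does not have to mean continuously running virtual machines.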

#### Example: Create a Run Configuration for the Persistent Compute Target
The following code demonstrates how to specify a **system-managed environment** for the persistent Azure Machine Learning **compute target** created in the cell above.  This operation will add all the required packages and other model dependencies to the Docker container in which the AML cluster executes.

In [None]:
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import DEFAULT_CPU_IMAGE

# Create a new runconfig object
run_amlcompute = RunConfiguration()

# Use the aml_cluster compute target created in the cell above
run_amlcompute.target = aml_cluster

# Enable Docker
run_amlcompute.environment.docker.enabled = True

# Set Docker base image to the default CPU-based image
run_amlcompute.environment.docker.base_image = DEFAULT_CPU_IMAGE

# Use conda_dependencies.yml to create a system-managed conda environment in the Docker image
run_amlcompute.environment.python.user_managed_dependencies = False

# Auto-prepare the Docker image when used for execution (if it is not already prepared)
run_amlcompute.auto_prepare_environment = True

# Specify CondaDependencies obj, add necessary packages
run_amlcompute.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])

## Managing Azure Machine Learning Runs
The preceding sections have described how to provision the resources required to establish an Azure Machine Learning services operating environment; e.g., a Workspace, storage resources, compute resources, and run configurations. With these tasks completed, machine learning experiments can finally be executed; these executions are referred to as **runs**.

#### Create an Experiment

In [None]:
from azureml.core import Experiment
experiment_name = 'my_experiment'

exp = Experiment(workspace=ws, name=experiment_name)

#### Submit an Experiment

In [None]:
from azureml.core import ScriptRunConfig
import os 

script_folder = os.getcwd()
# Use the run configuration defined above for the AML compute target
src = ScriptRunConfig(source_directory=script_folder, script='train.py', run_config=run_amlcompute)
run = exp.submit(src)
run.wait_for_completion(show_output = True)