# Intro
In Azure Machine Learning, data scientists can run *experiments* based on scripts that process data, train machine learning models, and perform other data science tasks. The runtime context for each experiment run consists of two elements:
- The *environment* for the script, which includes all packages on which the script depends.
  - Creating Environments
  - Registering and Reusing Environments
- The *compute* target on which the environment will be deployed and the script run.
  - Creating managed compute


# Learning Objectives
- Create and use environments
- Create and use compute targets

# Useful Links
- [Reuse environments for training and deployment by using Azure Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments)
- For more information about compute targets in Azure Machine Learning, see [What are compute targets in Azure Machine Learning?](https://docs.microsoft.com/azure/machine-learning/concept-compute-target).

# Ryan's Thoughts
## Environments

`4:30 pm July 29:` In my `kaggle_bike-sharing-demand-asmlMOD1-Copy1` I read a number of warnings about my sklearn not being the same version as that used in the model training (on workspace). This had me worried that I was getting bad numbers due to this. After testing on the Notebook on my ML workspace in Azure, that didn't seem to be the problem, but it highlights how running on a certain environment and having consistent packages is important!!

## Compute targets

`10:34 am July 30:` In my `kaggle_bike-...` I set the compute target to `myCIWorkstationRyan`, the Compute Instance that I set up early in the tutorials in AZ-900. Local did not work because I was trying to run it from my desktop. From the error docs, it appears to be because I didn't have Docker + other dependencies. The reason I tried local first is that I copied the code I used to run my model with a script in my `myCIWorkstationRyan` notebook. Since the compute target was `local`, when I ran it on my home computer it tried to run in the same way, on the machine where the code was stored. Because that machine was now my own computer rather than the Compute Instance, it didn't work.

If I install docker, will this allow me to use my local desktop or laptop? I suspect yes with proper configuration.

`12:54 pm July 30:` Reading the lab, it says that when you run a Python script as an experiment, a conda environment is created to define the execution context for the script. Perhaps another problem could be that my computer doesn't have conda (I think?)

`1:03 pm July 30:` Cool experience: I ran the DP-100 LP2 MOD4 lab code on my desktop machine with the target set to 'local'. The error output told me to check `60_control_log.txt`, with the error below:

In [None]:
[2020-07-30T19:01:15.435066] Using urllib.request Python 3.0 or later
Streaming log file azureml-logs/60_control_log.txt
FileNotFoundError(2, 'The system cannot find the file specified', None, 2, None)

Docker was not found on the target, check that it is installed and on the path.

Starting the daemon thread to refresh tokens in background for process with pid = 15892
Logging error in history service: SystemExit: 1

Uploading control log...


As we can see, it was the fact that Docker was not found on the target. I'm thinking I will install it on my computer...
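
One quick way to confirm which of these tools a machine has is to look for them on the PATH. The sketch below uses only the Python standard library. The function name `find_tools` and the list of tools are illustrative choices, not part of the Azure ML SDK, and finding an executable does not prove that the Docker daemon is actually running.

```python
import shutil

def find_tools(names=("docker", "conda")):
    """Return {tool name: full path, or None if it is not on the PATH}."""
    return {name: shutil.which(name) for name in names}

# Report what was found for each tool.
for tool, path in find_tools().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```

If `docker` shows as not found, that matches the "Docker was not found on the target" message in the log above.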

`1:27 PM July 30:` Docker Desktop requires me to upgrade to Windows 10 Pro. This could take time so while I think I will upgrade, I am going to hold off until later to do so, for now I'll try just running on my CI instance


# Intro to Environments
Python code runs in the context of a *virtual environment* that defines the version of the Python runtime to be used as well as the installed packages available to the code. In most Python installations, packages are installed and managed in environments using `Conda` or `pip`.
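
To see what the current environment actually provides, you can ask Python for its runtime version and for the versions of installed packages. This is a minimal sketch that assumes Python 3.8 or later (for `importlib.metadata`). The helper name `installed_version` and the package list are illustrative.

```python
import sys
from importlib import metadata

def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Show the runtime version and the versions of a few packages of interest.
print("Python runtime:", sys.version.split()[0])
for package in ("scikit-learn", "pandas", "numpy"):
    print(package, "->", installed_version(package))
```

Running this on two machines and comparing the output is a simple way to spot the kind of scikit-learn version mismatch described in the notes above.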

## Environments in Azure Machine Learning
In general, Azure Machine Learning handles environment creation and package installation for you - usually through the creation of Docker containers. You can specify the Conda or pip packages you need, and have Azure Machine Learning create an environment for the experiment.

In an enterprise machine learning solution, where experiments may be run in a variety of compute contexts, it can be important to be aware of the environments in which your experiment code is running. Environments are encapsulated by the `Environment` class, which you can use to create environments and specify runtime configuration for an experiment.

You can have Azure Machine Learning manage environment creation and package installation to define an environment, and then register it for reuse. Alternatively, you can manage your own environments and register them. This makes it possible to define consistent, reusable runtime contexts for your experiments - regardless of where the experiment script is run.

# Creating environments
There are multiple ways to create environments in Azure Machine Learning:

## Creating an environment from a specification file

You can use a Conda or pip specification file to define the packages required in a Python environment, and use it to create an `Environment` object.

For example, you could save Conda configuration settings in a file named `conda.yml`.
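
The notebook does not show the file's contents at this point. As an illustration only, the sketch below writes a small specification whose packages mirror the `CondaDependencies` example later in these notes (scikit-learn, pandas, numpy, and azureml-defaults via pip). The environment name `py_env` is borrowed from the existing-Conda-environment example and is an assumption here, as is the exact file layout.

```python
from pathlib import Path

# Assumed contents: the package names mirror the CondaDependencies example;
# this is not the file from the original lab.
CONDA_SPEC = """\
name: py_env
dependencies:
  - scikit-learn
  - pandas
  - numpy
  - pip
  - pip:
    - azureml-defaults
"""

# Write the specification next to the notebook so that './conda.yml' resolves.
Path("conda.yml").write_text(CONDA_SPEC)
print(Path("conda.yml").read_text())
```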

You could then use the following code to create an Azure Machine Learning environment from the saved specification file:

In [None]:
from azureml.core import Environment

env = Environment.from_conda_specification(name = 'training_environment',
                                          file_path = './conda.yml')

## Creating environment from an existing Conda environment
If you have an existing Conda environment defined on your workstation, you can use it to define an Azure ML environment:

In [None]:
from azureml.core import Environment

env = Environment.from_existing_conda_environment(name='training_environment',
                                                  conda_environment_name='py_env')

## Creating an environment by specifying packages
You can define an environment by specifying the Conda and pip packages you need in a `CondaDependencies` object, like this:

In [None]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

env = Environment('training_environment')
deps = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas', 'numpy'], 
                                pip_packages=['azureml-defaults'])
env.python.conda_dependencies = deps

# Registering and reusing environments
After you've created an environment, you can register it in your workspace and reuse it for future experiments that have the same Python dependencies.

## Registering an environment
Use the `register` method of an `Environment` object to register an environment:

In [1]:
env.register(workspace = ws)


You can view the registered environments in your workspace like this:

In [None]:
from azureml.core import Environment

env_names=Environment.list(workspace=ws)
for env_name in env_names:
    print('Name:', env_name)

## Retrieving and using an environment
You can retrieve a registered environment using the `get` method of the `Environment` class, and then assign it to a `ScriptRunConfig` or `Estimator`.

For example, the following code sample retrieves the *training_environment* registered environment, and assigns it to an estimator.

In [2]:
from azureml.core import Environment
from azureml.train.estimator import Estimator

training_env = Environment.get(workspace=ws, name='training_environment')
estimator = Estimator(source_directory='experiment_folder',
                      entry_script='training_script.py',
                      compute_target='local',
                      environment_definition = training_env
                     )


When an experiment based on the estimator is run, Azure Machine Learning will look for an existing environment that matches the definition, and if none is found, a new environment will be created based on the registered environment specification.
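
The text above also mentions `ScriptRunConfig`, which the notebook does not demonstrate. The following is a sketch only. It assumes an azureml-core release whose `ScriptRunConfig` constructor accepts `compute_target` and `environment` directly (older releases expect a `RunConfiguration` passed as `run_config` instead). The function name `build_script_run_config` is hypothetical, and the folder and script names are reused from the estimator example.

```python
def build_script_run_config(ws, compute_target="local"):
    """Attach the registered environment to a ScriptRunConfig (sketch)."""
    # Imported inside the function so this sketch can be loaded
    # on a machine that does not have the Azure ML SDK installed.
    from azureml.core import Environment, ScriptRunConfig

    training_env = Environment.get(workspace=ws, name="training_environment")
    return ScriptRunConfig(source_directory="experiment_folder",
                           script="training_script.py",
                           compute_target=compute_target,
                           environment=training_env)
```

With a loaded workspace `ws`, the returned object could be passed to an experiment's `submit` method in the same way as an estimator.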

# Introduction to compute targets
In Azure Machine Learning, Compute Targets are physical or virtual computers on which experiments are run.

The ability to assign experiment runs to specific compute targets helps you implement a flexible data science ecosystem in the following ways:

- Code can be developed and tested on local or low-cost compute, and then moved to more scalable compute for production workloads.
- You can run each individual process on the compute target that best fits its needs; for example, you can use GPU-based compute to train deep learning models and switch to lower-cost CPU-only compute to test and register the trained model.

One of the core benefits of cloud computing is the ability to manage costs by paying only for what you use. In Azure Machine Learning, you can take advantage of this principle by defining compute targets that:

- Start on-demand and stop automatically when no longer required.
- Scale automatically based on workload processing needs.
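
The arithmetic behind this cost benefit can be sketched as follows. The price per node-hour, the node count, and the busy hours are all made-up numbers for illustration, not real Azure rates, and the calculation ignores the idle time a cluster waits before scaling down.

```python
def cluster_cost(node_hours, price_per_node_hour):
    """Cost of a workload given the total node-hours consumed."""
    return node_hours * price_per_node_hour

PRICE = 0.40      # hypothetical price per node-hour, not a real Azure rate
NODES = 4         # nodes used while jobs are running
BUSY_HOURS = 3    # hypothetical hours per day the cluster actually runs jobs

# A cluster that never scales down is billed for all 24 hours.
always_on = cluster_cost(NODES * 24, PRICE)
# A cluster that scales to zero when idle is billed only for busy hours.
on_demand = cluster_cost(NODES * BUSY_HOURS, PRICE)

print(f"always on: {always_on:.2f} per day")
print(f"on demand: {on_demand:.2f} per day")
```

Under these assumptions the always-on cluster costs 38.40 per day and the on-demand cluster 4.80, which is the effect of starting on demand and stopping automatically.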

## Types of Compute

Azure Machine Learning supports multiple types of compute for experimentation and training, and for production inferencing. This enables you to select the most appropriate type of compute target for your particular needs.

### Local compute
You can specify a local compute target for most processing tasks in Azure Machine Learning. This runs the experiment on the same compute target as the code used to initiate the experiment, which may be your physical workstation or a virtual machine such as an Azure Machine Learning compute instance on which you are running a notebook.

Local compute is generally a great choice during development and testing with low to moderate volumes of data.

### Compute clusters
For experiment workloads with high scalability requirements, you can use Azure Machine Learning compute clusters, which are multi-node clusters of virtual machines that automatically scale up or down to meet demand. This is a cost-effective way to run experiments that need to handle large volumes of data or use parallel processing to distribute the workload and reduce the time it takes to run.

### Inference clusters
To deploy trained models as production services, you can use Azure Machine Learning inference clusters, which use containerization technologies to enable rapid initialization of compute for on-demand inferencing.

### Attached compute
If you already use an Azure-based compute environment for data science, such as a virtual machine or an Azure Databricks cluster, you can attach it to your Azure Machine Learning workspace and use it as a compute target for certain types of workload.

#### More Information: 
For more information about the types of compute target supported in Azure Machine Learning, see [What are compute targets in Azure Machine Learning?](https://docs.microsoft.com/azure/machine-learning/concept-compute-target) in the Azure Machine Learning documentation.

# Create compute targets

The most common ways to create or attach a compute target are to use the **Compute** page in Azure Machine Learning studio, or to use the Azure Machine Learning SDK to provision compute targets in code.

## Creating a managed compute target with the SDK
A *managed* compute target is one that is managed by Azure Machine Learning, such as an Azure Machine Learning compute cluster.

To create an Azure Machine Learning compute cluster, use the `azureml.core.compute.ComputeTarget` class and the `AmlCompute` class, like this:

In [3]:
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, AmlCompute

# Load the workspace from the saved config file
ws = Workspace.from_config()

# Specify a name for the compute (unique within the workspace)
compute_name = 'aml-cluster'

#Define compute configuration
compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',
                                                       min_nodes=0, max_nodes=4,
                                                       vm_priority='dedicated'
                                                      )

# Create the compute
aml_cluster = ComputeTarget.create(ws, compute_name, compute_config)
aml_cluster.wait_for_completion(show_output=True)


In this example, a cluster with up to four nodes, based on the STANDARD_DS12_V2 virtual machine size, will be created. Because `min_nodes` is 0, the cluster can scale down to zero nodes when it is idle. The priority for the virtual machines (VMs) is set to `dedicated`, meaning they are reserved for use in this cluster (the alternative is to specify `lowpriority`, which has a lower cost but means that the VMs can be preempted if a higher-priority workload requires the compute).

More information on the `AmlCompute` class is available here: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.amlcompute.amlcompute?view=azure-ml-py

## Attaching an unmanaged compute target with the SDK
An *unmanaged* compute target is one that is defined and managed outside of the Azure Machine Learning workspace; for example, an Azure virtual machine or an Azure Databricks cluster.

The code to attach an existing unmanaged compute target is similar to the code used to create a managed compute target, except that you must use the `ComputeTarget.attach()` method to attach the existing compute based on its target-specific configuration settings.

For example, the following code can be used to attach an existing Azure Databricks cluster:

In [None]:
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, DatabricksCompute

# Load the workspace from the saved config file
ws = Workspace.from_config()

# Specify a name for the compute (unique within the workspace)
compute_name = 'db_cluster'

# Define configuration for existing Azure Databricks cluster
db_workspace_name = 'db_workspace'
db_resource_group = 'db_resource_group'
db_access_token = '1234-abc-5678-defg-90...'
db_config = DatabricksCompute.attach_configuration(resource_group=db_resource_group,
                                                   workspace_name=db_workspace_name,
                                                   access_token=db_access_token
                                                  )

# Create the compute
databricks_compute = ComputeTarget.attach(ws, compute_name, db_config)
databricks_compute.wait_for_completion(show_output=True)  # the lab passes True positionally; show_output is the first parameter, so naming it as a keyword argument is equivalent

## Checking for an existing compute target
In many cases, you will want to check for the existence of a compute target, and only create a new one if there isn't already one with the specified name. To accomplish this, you can catch the `ComputeTargetException` exception, like this:


In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

compute_name = "aml-cluster"

# Check if the compute target exists
try:
    aml_cluster = ComputeTarget(workspace=ws, name=compute_name)
    print('Found existing cluster.')
except ComputeTargetException: #ComputeTargetException is a class
    # If not, create it
    compute_config = AmlCompute.provisioning_configuration(vm_size = 'STANDARD_DS12_V2',
                                                           max_nodes=4
                                                          )
    aml_cluster = ComputeTarget.create(ws, compute_name, compute_config)
    
aml_cluster.wait_for_completion(show_output=True)
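
The same "try to get it, create it only on failure" idiom can be tried without a workspace. In this sketch a plain dictionary stands in for the workspace and `KeyError` stands in for `ComputeTargetException`. The helper `get_or_create` and the stand-in data are hypothetical, not part of the SDK.

```python
def get_or_create(lookup, create, not_found):
    """Return (object, created_flag): call lookup(), falling back to create()."""
    try:
        return lookup(), False
    except not_found:
        return create(), True

# Stand-in for a workspace: a dict of existing targets (hypothetical data).
existing = {"aml-cluster": "existing cluster object"}

def make_lookup(name):
    # The returned function raises KeyError when the name is absent.
    return lambda: existing[name]

cluster, created = get_or_create(make_lookup("aml-cluster"),
                                 lambda: "new cluster object",
                                 KeyError)
print(cluster, "(created)" if created else "(found)")
```

With the SDK, `lookup` would wrap `ComputeTarget(workspace=ws, name=compute_name)`, `create` would wrap `ComputeTarget.create(ws, compute_name, compute_config)`, and `not_found` would be `ComputeTargetException`.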

# Use compute targets
After you've created environments and compute targets in your workspace, you can use them to run specific workloads, such as experiments.

To use a particular compute target, you can specify it in the appropriate parameter for an experiment run configuration or estimator. For example, the following code configures an estimator to use the compute target named *aml-cluster*:

In [4]:
from azureml.core import Environment
from azureml.train.estimator import Estimator

compute_name = 'aml-cluster'

training_env = Environment.get(workspace=ws, name='training_environment')

estimator = Estimator(source_directory='experiment_folder',
                      entry_script='training_script.py',
                      environment_definition=training_env,
                      compute_target=compute_name
                     )


When an experiment for the estimator is submitted, the run will be queued while the compute target is started and the specified environment deployed to it, and then the run will be processed on the compute environment.

Instead of specifying the name of the compute target, you can specify a `ComputeTarget` object, like this:

In [None]:
from azureml.core import Environment
from azureml.train.estimator import Estimator
from azureml.core.compute import ComputeTarget

compute_name = 'aml-cluster'

training_cluster = ComputeTarget(workspace=ws, name=compute_name)

training_env = Environment.get(workspace=ws, name='training_environment')

estimator = Estimator(source_directory='experiment_folder',
                      entry_script='training_script.py',
                      environment_definition=training_env,
                      compute_target=training_cluster)