# Train on AML Compute

Train MLflow Projects on Azure Machine Learning Compute.

## Table of Contents
1. Prerequisites
    - 1.1 Initialize Tracking Store and Experiment
    - 1.2 Create the Backend Configuration Object
    - 1.3 Modify your Environment specification
2. Submit Run


# Prerequisites 
Before running this notebook, make sure you have:
- A connection to an AML Workspace
- An existing Compute cluster in that Workspace
- An MLproject file with an environment specification

In [None]:
# Prereq Checks

# Workspace check
from azureml.core import Workspace

workspace = Workspace.from_config()
print(workspace.name, workspace.resource_group, workspace.location, workspace.subscription_id, sep = '\n')

# Existing compute check
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

cpu_cluster_name = "cpu-cluster"
try:
    cpu_cluster = ComputeTarget(workspace = workspace, name = cpu_cluster_name)
    print("Found existing cluster, yay!")
except ComputeTargetException:
    print("This compute target is not associated with your workspace!")


## Initialize Tracking Store and Experiment

### Set Tracking URI 

Set the MLflow tracking URI to point to your Azure ML Workspace. The subsequent logging calls from MLflow APIs will go to Azure ML services and will be tracked under your Workspace.

In [None]:
from azureml.core import Workspace

import mlflow

workspace = Workspace.from_config()  
mlflow.set_tracking_uri(workspace.get_mlflow_tracking_uri())

### Create Experiment

Create an MLflow Experiment to organize your runs. You can set it either by passing the experiment name as a **parameter** in the `mlflow.projects.run` call or by calling `mlflow.set_experiment`:

In [None]:
experiment_name = "mlflow-example"
mlflow.set_experiment(experiment_name)
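
The other option mentioned above, passing the name in the run call itself, could look like the sketch below. The helper name `submit_with_experiment` is hypothetical and exists only for illustration; it assumes the `experiment_name` argument of `mlflow.projects.run` and a backend configuration like the one built in the next section.

```python
def submit_with_experiment(experiment_name, backend_config, uri="."):
    """Submit an MLflow Project and name the experiment in the run call itself.

    Hypothetical helper for illustration. MLflow is imported lazily so the
    function can be defined even where MLflow is not installed.
    """
    import mlflow

    return mlflow.projects.run(
        uri=uri,
        experiment_name=experiment_name,  # replaces mlflow.set_experiment(...)
        backend="azureml",
        backend_config=backend_config,
    )
```

With this approach a separate `mlflow.set_experiment` call should not be needed for the submitted run.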

## Create the Backend Configuration Object

The backend configuration object stores the information the integration needs, such as the compute target and whether to use your local managed environment or a system managed environment.

The integration accepts two keys, "COMPUTE" and "USE_CONDA". "COMPUTE" is set to the name of your remote compute cluster, and "USE_CONDA" creates a new environment for the project from the environment configuration file. If "COMPUTE" is present in the object, the project is automatically submitted to the remote compute and "USE_CONDA" is ignored. MLflow accepts either a dictionary object or a JSON file.

In [None]:
# dictionary
backend_config = {"COMPUTE": "cpu-cluster", "USE_CONDA": False}
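
Since a JSON file is also accepted, the same configuration can be written to disk and passed by path. The following is a minimal sketch; the file name `backend_config.json` and the temporary directory are assumptions chosen for illustration.

```python
import json
import os
import tempfile

backend_config = {"COMPUTE": "cpu-cluster", "USE_CONDA": False}

# Hypothetical location; any writable path ending in ".json" would do.
config_path = os.path.join(tempfile.mkdtemp(), "backend_config.json")

# Write the configuration as JSON.
with open(config_path, "w") as f:
    json.dump(backend_config, f, indent=2)

# Read it back to confirm the file holds the same keys and values.
with open(config_path) as f:
    loaded_config = json.load(f)
```

The path can then be passed in place of the dictionary, for example `backend_config=config_path` in the `mlflow.projects.run` call.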


## Modify your Environment specification

Add the azureml-mlflow package as a pip dependency to your environment configuration file. The project can run without this addition, but key artifacts and metrics will not be logged to your Workspace. After the addition, the file will look like this:

```
name: mlflow-example
channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.6
  - scikit-learn=0.19.1
  - pip
  - pip:
    - mlflow
    - azureml-mlflow
```
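
Because a missing azureml-mlflow entry only shows up later as missing metrics, it can help to check the file before submitting. The sketch below is a simple line scan rather than a full YAML parser, and the helper names `pip_dependencies` and `has_pip_package` are hypothetical. It assumes the pip packages are listed as indented items directly under a `- pip:` entry, as in the file above.

```python
import re

ENV_TEXT = """\
name: mlflow-example
channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.6
  - scikit-learn=0.19.1
  - pip
  - pip:
    - mlflow
    - azureml-mlflow
"""


def pip_dependencies(env_text):
    """Return the entries nested under `- pip:` in a conda environment file."""
    packages = []
    pip_indent = None  # indentation of the `- pip:` line once it is seen
    for line in env_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if stripped == "- pip:":
            pip_indent = indent
            continue
        if pip_indent is not None:
            if stripped.startswith("- ") and indent > pip_indent:
                packages.append(stripped[2:].strip())
            elif stripped:
                pip_indent = None  # a less-indented line ends the pip list
    return packages


def has_pip_package(env_text, name):
    """Check whether `name` is a pip dependency, ignoring version pins."""
    for entry in pip_dependencies(env_text):
        if re.split(r"[<>=!~\[ ]", entry, maxsplit=1)[0] == name:
            return True
    return False


found = has_pip_package(ENV_TEXT, "azureml-mlflow")
```

In practice the text would come from reading your own environment file instead of the inline `ENV_TEXT` string.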

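# Submit Run

Submit the project with `mlflow.projects.run`, setting `backend` to `"azureml"` and passing the backend configuration object created earlier.
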
In [None]:
remote_mlflow_run = mlflow.projects.run(
    uri=".",
    parameters={"alpha": 0.3},
    backend="azureml",
    backend_config=backend_config,
)
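
Once the run is submitted, the returned object can be used to follow its progress. The sketch below assumes that the object returned by the `azureml` backend follows MLflow's `SubmittedRun` interface (`run_id`, `wait()`, `get_status()`); the helper name `wait_and_report` is hypothetical.

```python
def wait_and_report(submitted_run):
    """Block until the run finishes, then return a small summary dict.

    Assumes `submitted_run` exposes `run_id`, `wait()` and `get_status()`,
    as MLflow's SubmittedRun interface describes.
    """
    succeeded = submitted_run.wait()  # True if the run completed successfully
    return {
        "run_id": submitted_run.run_id,
        "status": submitted_run.get_status(),
        "succeeded": succeeded,
    }
```

For example, `wait_and_report(remote_mlflow_run)` would summarize the run submitted in the cell above.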