# Azure ML Batch AI Run
In this notebook, we read in an existing Azure ML workspace and use it to run the training script on a Batch AI cluster.
## Imports and definitions

In [None]:
import os
import shutil
import json
import pandas as pd
from azureml.core import Workspace, Experiment, ScriptRunConfig
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.compute import ComputeTarget, BatchAiCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.train.estimator import Estimator
import azureml.core
print('azureml.core.VERSION={}'.format(azureml.core.VERSION))

## Read in the Azure ML workspace
Read in the workspace created in a previous notebook.

In [None]:
ws = Workspace.from_config()

## Configure a Batch AI cluster
Define the properties of the cluster.

In [None]:
batchai_cluster_name = 'mabouhype'
provisioning_config = BatchAiCompute.provisioning_configuration(
        vm_size='Standard_D4_v2',
        cluster_min_nodes=0,
        cluster_max_nodes=16,
        autoscale_enabled=True)

Create a configured Batch AI cluster, if it doesn't already exist.

In [None]:
if batchai_cluster_name in ws.compute_targets:
    compute_target = ws.compute_targets[batchai_cluster_name]
    if not isinstance(compute_target, BatchAiCompute):
        raise Exception('Compute target {} is not a Batch AI cluster.'
                        .format(batchai_cluster_name))
    print('Using pre-existing Batch AI cluster {}'
          .format(batchai_cluster_name))
else:
    # Create the cluster
    compute_target = ComputeTarget.create(ws, batchai_cluster_name, provisioning_config)

    # You can poll for a minimum number of nodes and set a specific timeout.
    # If no min node count is provided, provisioning uses the scale settings for the cluster.
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
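
The polling-with-timeout idea mentioned in the comments above can be sketched in plain Python. This is only an illustration of the pattern, not the SDK's implementation: `wait_until`, `at_least_two_nodes`, and the simulated node counts are hypothetical names introduced here.

```python
import time


def wait_until(predicate, timeout_seconds, poll_interval_seconds=1.0):
    """Poll `predicate` until it returns True or the timeout elapses.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_seconds)


# Hypothetical stand-in for a cluster whose node count grows over time.
node_counts = iter([0, 0, 1, 2, 4])
current = {'nodes': 0}


def at_least_two_nodes():
    # Each poll reads the next simulated node count.
    current['nodes'] = next(node_counts, current['nodes'])
    return current['nodes'] >= 2


reached = wait_until(at_least_two_nodes, timeout_seconds=5,
                     poll_interval_seconds=0.01)
print(reached, current['nodes'])
```

A monotonic clock is used for the deadline so that changes to the system clock do not shorten or extend the wait.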

Print a detailed view of the Batch AI cluster status.

In [None]:
pd.Series(compute_target.get_status().serialize()).to_frame()

## Upload the data to the cloud

In [None]:
ds = ws.get_default_datastore()
ds.upload(src_dir=os.path.join('.', 'data'), target_path='data', overwrite=True, show_progress=True)
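
Before uploading, it can help to check what is actually in the local directory. The sketch below lists the files under a folder and totals their size using only the standard library. `list_files` and `total_size_bytes` are hypothetical helpers, and the temporary directory stands in for the notebook's `data` folder.

```python
import os
import tempfile


def list_files(src_dir):
    """Return the paths of all files under src_dir, relative to it, sorted."""
    paths = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            full_path = os.path.join(root, name)
            paths.append(os.path.relpath(full_path, src_dir))
    return sorted(paths)


def total_size_bytes(src_dir):
    """Return the combined size in bytes of all files under src_dir."""
    return sum(os.path.getsize(os.path.join(src_dir, p))
               for p in list_files(src_dir))


# Demonstrate on a throwaway directory rather than the real data folder.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    with open(os.path.join(tmp, 'a.csv'), 'w') as f:
        f.write('x,y\n')
    with open(os.path.join(tmp, 'sub', 'b.csv'), 'w') as f:
        f.write('1,2\n3,4\n')
    found = list_files(tmp)
    size = total_size_bytes(tmp)

print(found, size)
```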

Create an estimator that specifies the location of the script, sets up its parameters, including the location of the data, defines the compute target, and specifies the packages needed to run the script.

In [None]:
est = Estimator(source_directory=os.path.join('.', 'scripts'), 
                entry_script='TrainTestClassifier.py',
                script_params={'--data-folder': ds.as_mount(),
                               '--estimators': '1000',
                               '--match': '5',
                               '--ngrams': '2',
                               '--min_child_samples': '10'},
                compute_target=compute_target,
                conda_packages=['pandas==0.23.4',
                                'scikit-learn==0.20.0'],
                pip_packages=[# 'azureml-sdk',
                              'lightgbm==2.1.2'])
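
The keys in `script_params` are passed to the entry script as command-line arguments. The contents of `TrainTestClassifier.py` are not shown in this notebook, so the following is only a sketch of how such a script might read them with `argparse`. The `parse_args` function, the integer types, and the help strings are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments named in script_params (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description='Hypothetical argument parsing for the entry script.')
    # The mounted datastore path arrives as a plain string.
    parser.add_argument('--data-folder', dest='data_folder', required=True,
                        help='folder containing the training data')
    # Values are passed as strings; the int types here are an assumption.
    parser.add_argument('--estimators', type=int, required=True)
    parser.add_argument('--match', type=int, required=True)
    parser.add_argument('--ngrams', type=int, required=True)
    parser.add_argument('--min_child_samples', type=int, required=True)
    return parser.parse_args(argv)


# Simulate the command line the estimator would build; the path is made up.
args = parse_args(['--data-folder', '/mnt/data',
                   '--estimators', '1000',
                   '--match', '5',
                   '--ngrams', '2',
                   '--min_child_samples', '10'])
print(args.data_folder, args.estimators, args.min_child_samples)
```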

Get an experiment to run the script; create it if it doesn't already exist.

In [None]:
exp = Experiment(workspace=ws, name='mabouhypelocal')

Submit the script to be run. This should return almost immediately, and the value will be a run object.

In [None]:
run = exp.submit(est)
run

In [None]:
run.get_status()

The run object is displayed as a table with a link to the `Details Page` in the Azure Portal. That page lets you monitor the status of this run of the experiment and of its previous runs. By clicking on a particular run, you can see its details, the files output by the script, and the logs of the run, including the `driver.log` with the script's printouts.

Get an object associated with the latest run. Using this object, you can programmatically control the job. This object was the value returned by the `exp.submit(est)` call.

In [None]:
run = list(exp.get_runs())[0]

Wait for the run to complete. This returns a `dict` with detailed information about the run. Here, we see that the run has `Completed`. Other states include `Running` and `Failed`.

In [None]:
run_status = run.wait_for_completion()
run_status['status']
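
In a script, rather than inspecting the status by eye, you may want to stop when the run did not finish successfully. A minimal sketch, assuming only that the returned `dict` has a `status` key as described above; `require_completed` is a hypothetical helper, and the example dictionary is made up.

```python
def require_completed(run_details):
    """Return the status if it is 'Completed'; otherwise raise an error."""
    status = run_details.get('status')
    if status != 'Completed':
        raise RuntimeError('Run ended with status {!r}'.format(status))
    return status


# Stand-in for the dict returned by wait_for_completion (assumed shape).
example_details = {'status': 'Completed'}
result = require_completed(example_details)
print(result)
```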

We can also get the metrics logged by the script during its execution.

In [None]:
run.get_metrics()
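
The metrics come back as a dictionary keyed by metric name. Assuming a value is either a single number or a list (when the same name was logged more than once), a small summary could look like the sketch below. `summarize_metrics` and the example metric names and values are hypothetical, not output from this run.

```python
def summarize_metrics(metrics):
    """Return (name, last_value, count) tuples, sorted by metric name."""
    rows = []
    for name in sorted(metrics):
        value = metrics[name]
        if isinstance(value, (list, tuple)):
            # Repeated logging: report the most recent value and the count.
            last = value[-1] if value else None
            rows.append((name, last, len(value)))
        else:
            rows.append((name, value, 1))
    return rows


# Made-up metrics for illustration only.
example = {'accuracy': 0.91, 'loss': [0.7, 0.5, 0.4]}
for name, last, count in summarize_metrics(example):
    print('{:<10} last={} n={}'.format(name, last, count))
```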