# Train an ML.NET model in Azure ML

In this tutorial, you will train a regression model to predict house prices using the Azure ML SDK and an ML.NET executable.

You’ll use Azure Machine Learning to: 

- Initialize a workspace 
- Create a compute cluster
- Define a training environment
- Train a model remotely
- Register your model
- Download your model locally

## Prerequisites

- If you are using an Azure Machine Learning Notebook VM, your environment already meets these prerequisites. Otherwise, go through the [configuration notebook](../../../../../configuration.ipynb) to install the Azure Machine Learning Python SDK and [create an Azure ML Workspace](https://docs.microsoft.com/azure/machine-learning/how-to-manage-workspace#create-a-workspace).


In [24]:
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)


SDK version: 1.0.85


## Diagnostics

Opt in to diagnostics collection to help improve the experience, quality, and security of future releases.

In [25]:
from azureml.telemetry import set_diagnostics_collection

set_diagnostics_collection(send_diagnostics=True)

Turning diagnostics collection on. 


## Initialize a workspace

Initialize a [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) object from the existing workspace you created in the Prerequisites step. The [from_config()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py#from-config-path-none--auth-none---logger-none---file-name-none-) method creates the workspace object from the details stored in `config.json`.

In [26]:
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

Workspace name: gopalv-ws
Azure region: westus2
Subscription id: 15ae9cb6-95c1-483d-a0e3-b1a1a3b06324
Resource group: aifxdemo
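
For reference, here is a minimal sketch of what `config.json` contains. `Workspace.from_config()` expects the three fields shown below; the helper name `write_workspace_config` and the placeholder values are ours, for illustration only. In practice you would normally download the file from the Azure portal or call `ws.write_config()` rather than write it by hand.

```python
import json
import os
import tempfile


def write_workspace_config(folder, subscription_id, resource_group, workspace_name):
    """Write a config.json holding the three fields that identify a workspace."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = os.path.join(folder, "config.json")
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return path


# Demo in a throwaway directory. The values are placeholders, not real IDs.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = write_workspace_config(
        demo_dir, "<subscription-id>", "<resource-group>", "<workspace-name>")
    with open(demo_path) as f:
        demo_config = json.load(f)
    print(json.dumps(demo_config, indent=4))
```

Keep a real `config.json` out of source control if you would rather not publish your subscription ID.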


## Create or attach existing Azure ML Managed Compute

You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/concept-compute-target) for training your model. In this tutorial, we use [Azure ML managed compute](https://docs.microsoft.com/azure/machine-learning/how-to-set-up-training-targets#amlcompute) for our remote training compute resource. Specifically, the code below creates a `STANDARD_D2_V2` CPU cluster that autoscales from 0 to 4 nodes.

**Creation of Compute takes approximately 5 minutes.** If an Azure ML Compute target with that name already exists in your workspace, this code skips the creation process.

As with other Azure services, there are limits on certain resources associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/how-to-manage-quotas) on the default limits and how to request more quota.

> Note that the code below creates CPU compute. If you instead want to create GPU compute, provide a GPU VM size to the `vm_size` parameter, such as `STANDARD_NC6`.

In [27]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException


# choose a name for your cluster
cluster_name = 'cpu-cluster'

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', 
                                                           max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())

Found existing compute target.
{'currentNodeCount': 1, 'targetNodeCount': 0, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 1, 'preemptedNodeCount': 0}, 'allocationState': 'Resizing', 'allocationStateTransitionTime': '2020-03-18T21:07:55.464000+00:00', 'errors': None, 'creationTime': '2020-03-12T01:21:20.794053+00:00', 'modifiedTime': '2020-03-12T01:21:36.212615+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_D2_V2'}
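
The serialized status is dense, so here is a small sketch that condenses it into one line. It works on a plain dictionary like the one printed above; the helper names `idle_seconds` and `summarize_status` are ours and are not part of the SDK. The sample values are copied from the output above.

```python
import re


def idle_seconds(duration):
    """Convert a simple ISO 8601 duration such as 'PT120S' or 'PT2M' to seconds."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", duration)
    if match is None:
        raise ValueError("unsupported duration: " + duration)
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def summarize_status(status):
    """Build a one-line summary from a serialized compute status dictionary."""
    scale = status["scaleSettings"]
    return "{}: {} node(s) now, target {}, autoscale {}-{}, idle timeout {}s".format(
        status["allocationState"],
        status["currentNodeCount"],
        status["targetNodeCount"],
        scale["minNodeCount"],
        scale["maxNodeCount"],
        idle_seconds(scale["nodeIdleTimeBeforeScaleDown"]),
    )


# Subset of the status printed above.
sample_status = {
    "currentNodeCount": 1,
    "targetNodeCount": 0,
    "allocationState": "Resizing",
    "scaleSettings": {
        "minNodeCount": 0,
        "maxNodeCount": 4,
        "nodeIdleTimeBeforeScaleDown": "PT120S",
    },
}
print(summarize_status(sample_status))
```

Read this way, the output above shows a cluster that is scaling back down to its minimum of 0 nodes: one node is leaving, and the target node count is 0.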


## Define a training environment

### Create a project directory
Create a directory that will contain all the code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on.

In [28]:
#import os

project_folder = 'linux-x64'

# try:
#     os.makedirs(project_folder, exist_ok=False)
# except FileExistsError:
#     print('project folder exists, moving on...'.format(project_folder))
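
Because the project folder already exists here, the directory creation is commented out. Before submitting a run, it can still be worth confirming that the folder holds everything the run needs. The sketch below is one way to do that; `missing_files` is a hypothetical helper, and the file names are the ones used later in this tutorial (script, data file, and executable).

```python
import os
import tempfile


def missing_files(folder, required):
    """Return the names in `required` that are not regular files inside `folder`."""
    return [name for name in required
            if not os.path.isfile(os.path.join(folder, name))]


# Demo in a throwaway directory that lacks the executable on purpose.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("mlnet-script.py", "data.csv"):
        open(os.path.join(demo_dir, name), "w").close()
    demo_missing = missing_files(
        demo_dir, ["mlnet-script.py", "data.csv", "IowaHouse"])
    print("Missing:", demo_missing)
```

Against the real folder, you would call `missing_files(project_folder, [...])` and stop if the returned list is not empty.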

### Create an experiment

In [29]:
from azureml.core import Experiment

experiment_name = 'mlnet-train'
experiment = Experiment(ws, name=experiment_name)

### Specify dependencies with an environment

There are a number of ways to [use environments](https://docs.microsoft.com/azure/machine-learning/how-to-use-environments) for specifying dependencies during model training. In this case, we use the existing `AzureML-Minimal` environment from the workspace.

In [30]:
from azureml.core import Environment
envs = Environment.list(workspace=ws)
env = Environment.get(workspace=ws, name="AzureML-Minimal")




### Create a ScriptRunConfig

Use the [ScriptRunConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py) class to define your run. Specify the source directory, compute target, and environment.

In [31]:
from azureml.core import ScriptRunConfig

exe = 'IowaHouse'
data_file = 'data.csv'
script_file = 'mlnet-script.py'
output_dir = 'outputs'
model_file = 'model.zip'

script_args = [
    '--exe', exe,
    '--data_file', data_file,
    '--output_dir', output_dir,
    '--model_file', model_file,
]
# Add training script to run config
runconfig = ScriptRunConfig(
    source_directory=project_folder,
    script=script_file,
    arguments=script_args)

# Attach compute target to run config
runconfig.run_config.target = cluster_name

# Uncomment the line below if you want to try this locally first
# runconfig.run_config.target = "local"

# Attach environment to run config
runconfig.run_config.environment = env
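
The contents of `mlnet-script.py` are not shown in this notebook. As a rough sketch of what such a wrapper might look like, the code below parses the four arguments passed above and launches the executable with `subprocess`. The command-line layout of the executable (data path followed by model path) is an assumption made for illustration; adjust `build_command` to match how your ML.NET program actually reads its inputs.

```python
import argparse
import os
import subprocess


def parse_args(argv=None):
    """Parse the arguments that the run configuration passes to the script."""
    parser = argparse.ArgumentParser(description="Run an ML.NET training executable.")
    parser.add_argument("--exe", required=True, help="name of the executable")
    parser.add_argument("--data_file", required=True, help="training data file")
    parser.add_argument("--output_dir", default="outputs", help="folder for results")
    parser.add_argument("--model_file", default="model.zip", help="model file name")
    return parser.parse_args(argv)


def build_command(args):
    """Assumed CLI: <exe> <data path> <model output path>. Hypothetical layout."""
    model_path = os.path.join(args.output_dir, args.model_file)
    return [os.path.join(".", args.exe), args.data_file, model_path]


def main(argv=None):
    args = parse_args(argv)
    os.makedirs(args.output_dir, exist_ok=True)
    # check=True raises CalledProcessError when the executable exits with a
    # non-zero code, so a failed training surfaces as a failed script.
    subprocess.run(build_command(args), check=True)


# In the real script, call main() under an `if __name__ == "__main__":` guard.
demo_args = parse_args(["--exe", "IowaHouse", "--data_file", "data.csv",
                        "--output_dir", "outputs", "--model_file", "model.zip"])
print(build_command(demo_args))
```

Writing the model under `outputs` matters: files saved in that folder are uploaded with the run, which is why `outputs/model.zip` appears in the run's file list later in this tutorial.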

## Train remotely

### Submit your run

In [32]:
# Submit run 
run = experiment.submit(runconfig)

# to get more details of your run
print(run.get_details())



### Monitor your run

Use a widget to keep track of your run. You can also view the status of the run within the [Azure Machine Learning service portal](https://ml.azure.com).

In [33]:
from azureml.widgets import RunDetails

RunDetails(run).show()
run.wait_for_completion(show_output=True)

RunId: mlnet-train_1584565960_cdbd8d9b
Web View: https://ml.azure.com/experiments/mlnet-train/runs/mlnet-train_1584565960_cdbd8d9b?wsid=/subscriptions/15ae9cb6-95c1-483d-a0e3-b1a1a3b06324/resourcegroups/aifxdemo/workspaces/gopalv-ws

Streaming azureml-logs/55_azureml-execution-tvmps_4bf23342065c5fda68f8a9f1b37b010e80ff9eb651d510f375ddbfbf6d05bdd5_d.txt

2020-03-18T21:19:38Z Starting output-watcher...
2020-03-18T21:19:38Z IsDedicatedCompute == True, won't poll for Low Pri Preemption
Login Succeeded
Using default tag: latest
latest: Pulling from azureml/azureml_eb3a146896d6ce750b6d0565097198b9
a1298f4ce990: Pulling fs layer
04a3282d9c4b: Pulling fs layer
9b0d3db6dc03: Pulling fs layer
8269c605f3f1: Pulling fs layer
6504d449e70c: Pulling fs layer
4e38f320d0d4: Pulling fs layer
b0a763e8ee03: Pulling fs layer
11917a028ca4: Pulling fs layer
a6c378d11cbf: Pulling fs layer
6cc007ad9140: Pulling fs layer
6c1698a608f3: Pulling fs layer
a64bf7316c94: Pulling fs layer
d751131ab532: Pulling fs laye

{'runId': 'mlnet-train_1584565960_cdbd8d9b',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2020-03-18T21:19:31.367614Z',
 'endTimeUtc': '2020-03-18T21:21:38.659785Z',
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json',
  'ContentSnapshotId': '072fb72c-f3f1-499b-986f-60ba501644d5',
  'azureml.git.repository_uri': 'git@github.com:gvashishtha/mlnet-azureml.git',
  'mlflow.source.git.repoURL': 'git@github.com:gvashishtha/mlnet-azureml.git',
  'azureml.git.branch': 'master',
  'mlflow.source.git.branch': 'master',
  'azureml.git.commit': 'c59606a7270b607db805f3f548659015e5c211e4',
  'mlflow.source.git.commit': 'c59606a7270b607db805f3f548659015e5c211e4',
  'azureml.git.dirty': 'True'},
 'inputDatasets': [],
 'runDefinition': {'script': 'mlnet-script.py',
  'useAbsolutePath': False,
  'arguments': ['--exe',
   'IowaHouse',
   '--data_file',
   'data.c

## Test your model

Now that we are done training, let's see how well this model actually performs.

### Get your latest run
First, pull the latest run using `experiment.get_runs()`, which lists runs from `experiment` in reverse chronological order.

In [34]:
from azureml.core import Run

last_run = next(experiment.get_runs())

In [35]:
last_run.get_file_names()

['azureml-logs/55_azureml-execution-tvmps_4bf23342065c5fda68f8a9f1b37b010e80ff9eb651d510f375ddbfbf6d05bdd5_d.txt',
 'azureml-logs/65_job_prep-tvmps_4bf23342065c5fda68f8a9f1b37b010e80ff9eb651d510f375ddbfbf6d05bdd5_d.txt',
 'azureml-logs/70_driver_log.txt',
 'azureml-logs/75_job_post-tvmps_4bf23342065c5fda68f8a9f1b37b010e80ff9eb651d510f375ddbfbf6d05bdd5_d.txt',
 'azureml-logs/process_info.json',
 'azureml-logs/process_status.json',
 'logs/azureml/140_azureml.log',
 'logs/azureml/job_prep_azureml.log',
 'logs/azureml/job_release_azureml.log',
 'outputs/model.zip']

### Register your model
Next, [register the model](https://docs.microsoft.com/azure/machine-learning/concept-model-management-and-deployment#register-package-and-deploy-models-from-anywhere) from your run. Registering your model assigns it a version and helps you with auditability.

In [36]:
import os

model_name = 'regression'
last_run.register_model(model_name=model_name, model_path=os.path.join(output_dir, model_file))

Model(workspace=Workspace.create(name='gopalv-ws', subscription_id='15ae9cb6-95c1-483d-a0e3-b1a1a3b06324', resource_group='aifxdemo'), name=regression, id=regression:3, version=3, tags={}, properties={})
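
The file names listed for the run use forward slashes, whereas `os.path.join` produces backslashes on Windows. If you run this notebook on Windows and registration cannot find the model, the path separator is one thing to check. The sketch below builds the run-relative path with `posixpath` and confirms it against a file list; both helper names are hypothetical.

```python
import posixpath


def run_relative_path(output_dir, file_name):
    """Join path parts with forward slashes, as they appear in the run's file list."""
    return posixpath.join(output_dir, file_name)


def has_file(file_names, output_dir, file_name):
    """Check whether the run's file list contains the expected model path."""
    return run_relative_path(output_dir, file_name) in file_names


# A few entries copied from the file list shown earlier.
sample_files = [
    "azureml-logs/70_driver_log.txt",
    "logs/azureml/140_azureml.log",
    "outputs/model.zip",
]
print(run_relative_path("outputs", "model.zip"))
print(has_file(sample_files, "outputs", "model.zip"))
```

With a live run, you could pass `last_run.get_file_names()` as the first argument before calling `register_model`.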

### Download your model
Next, download this registered model. Notice how we can initialize the `Model` object with the name of the registered model, rather than a path to the file itself.

In [39]:
from azureml.core import Model

target_dir = os.path.join(project_folder, 'model')
#os.makedirs(target_dir, exist_ok=True)

model = Model(workspace=ws, name=model_name)
path = model.download(target_dir=target_dir, exist_ok=True)
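
As a quick sanity check after the download, you can confirm that the file is a readable archive. This sketch assumes the model was saved as a ZIP archive, which is the usual format for ML.NET models; `list_model_entries` is a hypothetical helper, and the demo uses a stand-in archive with a made-up entry rather than a real model.

```python
import os
import tempfile
import zipfile


def list_model_entries(model_path):
    """Return the sorted entry names of a ZIP archive, or raise if it is not one."""
    if not zipfile.is_zipfile(model_path):
        raise ValueError("not a ZIP archive: " + model_path)
    with zipfile.ZipFile(model_path) as archive:
        return sorted(archive.namelist())


# Demo with a stand-in archive; the entry name is invented for illustration.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_zip = os.path.join(demo_dir, "model.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("placeholder.txt", "not a real model")
    demo_entries = list_model_entries(demo_zip)
    print(demo_entries)
```

For the real model, pass the downloaded file's location instead, for example the `path` value returned by `model.download()` if it points at the file itself.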

## Next steps

Congratulations! You just trained an ML.NET regression model in Azure Machine Learning. As next steps, consider the following:
1. Learn more about this sample by checking out the [README](./README.md).
2. Try exporting your model to [ONNX](https://docs.microsoft.com/azure/machine-learning/concept-onnx) for accelerated inferencing.