# Train Detection Model

This notebook provides a basic introduction to submitting a detection model training run to AML using the azure_utils and tfod_utils packages in this repo.

Before executing the code, please make sure you have followed the setup in the wiki and have the following:
- AML Workspace
- Blob or fileshare containing images and label version files
- Pretrained model from TF model zoo in the same storage
- Built docker image registered to ACR
- Local conda environment with the requirements and packages installed

In [None]:
import os
import shutil

from azure_utils.azure import load_config
from azure_utils.experiment import AMLExperiment

## 1. Define Run Parameters

The cell below sets the run parameters, including the docker image, base model and datasets. Note that if your datasets follow the same date naming convention, you can use the "latest" keyword to automatically retrieve the latest version.

For more information around the different parameters visit the wiki documentation: "Model-Training.md".
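
As a rough illustration of what resolving the "latest" keyword could look like, the sketch below picks the most recent file from a list of date-stamped names. The `<prefix>_<YYYYMMDD>.csv` naming pattern and the `resolve_version` helper are assumptions made for this example; the actual convention and lookup logic live in the repo packages and may differ.

```python
import re
from datetime import datetime

# Hypothetical naming convention: <prefix>_<YYYYMMDD>.csv (an assumption for this sketch).
DATE_PATTERN = re.compile(r"_(\d{8})\.csv$")


def resolve_version(requested, available):
    """Return `requested` unchanged unless it is 'latest', in which case
    return the name in `available` carrying the most recent date stamp."""
    if requested != "latest":
        return requested
    dated = []
    for name in available:
        match = DATE_PATTERN.search(name)
        if match:
            dated.append((datetime.strptime(match.group(1), "%Y%m%d"), name))
    if not dated:
        raise ValueError("no date-stamped version files found")
    # Tuples compare on the date first, so max() yields the newest file.
    return max(dated)[1]


files = ["train_20210105.csv", "train_20210312.csv", "train_20201130.csv"]
print(resolve_version("latest", files))
```

Passing an explicit file name instead of "latest" returns that name untouched, which keeps a run pinned to a fixed dataset version.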

In [None]:
# Run params    
env_config_file = "dev_config.json"

# To train with TensorFlow 1, use the docker image built from tf_1: "csaddevamlacr.azurecr.io/tfod_tf1:test"
# To train with TensorFlow 2, use the docker image built from tf_2: "csaddevamlacr.azurecr.io/tfod_tf2:test"
docker_image = "add_name_here"

# To train with TF1, use "train.py"
# To train with TF2, use "train_tf2.py"
training_script_name = "train_tf2.py"

# Description
desc = "Add description here"
# Experiment name
experiment_name = "pothole"
    
# Training and test data selection
store_name = "test_data"
img_type = "pothole"
train_csv = "latest"
test_csv = "latest"

# Base model Selection
base_model = "faster_rcnn_inception_resnet_v2_1024x1024_coco17_tpu-8"

# Model Params
steps = 1000
eval_conf = 0.5
# If using TF1 use batch_size = 1
batch_size = 1

# Compute Params
cluster_name = "train-dev-2"
vm_type = "STANDARD_NC6"
nodes = 1
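
Because several of the parameters above depend on each other, a quick consistency check before submitting can save a failed run. The sketch below is a hypothetical helper, not part of azure_utils. The script names and the TF1 batch-size rule come from the comments in the cell above; the bounds on `steps` and `eval_conf` are assumptions.

```python
# Maps each training script to the TensorFlow major version it targets.
TF_SCRIPTS = {"train.py": 1, "train_tf2.py": 2}


def check_run_params(training_script_name, batch_size, steps, eval_conf):
    """Return a list of human-readable problems; an empty list means the
    parameters look mutually consistent."""
    problems = []
    tf_version = TF_SCRIPTS.get(training_script_name)
    if tf_version is None:
        problems.append(f"unknown training script: {training_script_name}")
    elif tf_version == 1 and batch_size != 1:
        problems.append("TF1 training expects batch_size = 1")
    if steps <= 0:
        problems.append("steps must be positive")
    # Assumption: eval_conf is a confidence threshold in [0, 1].
    if not 0.0 <= eval_conf <= 1.0:
        problems.append("eval_conf must lie in [0, 1]")
    return problems


print(check_run_params("train_tf2.py", batch_size=1, steps=1000, eval_conf=0.5))
print(check_run_params("train.py", batch_size=4, steps=1000, eval_conf=0.5))
```

Returning a list rather than raising on the first problem lets every issue be reported in one pass.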

## 2. Initialise Experiment Class

The cell below creates an instance of the experiment class in the azure_utils package. It takes a config file that points to a specific AML workspace and the experiment name, which is used to group runs by use case.

In [None]:
aml_exp = AMLExperiment(experiment_name, config_file=env_config_file)

## 3. Set AML Datastore reference

This package uses the older approach of mounting the entire datastore to access images, dataset files and base models. The cell below sets up the defined datastore to be mounted on execution.

In [None]:
aml_exp.set_datastore(store_name)
aml_exp.set_data_reference()

## 4. Create/Set Compute

The cell below checks whether a compute target with the provided name exists in the AML workspace and, if not, creates one based on the spec. It takes arguments for node count and VM type, with the base compute set to "STANDARD_NC" to provide GPU support.

In [None]:
aml_exp.set_compute(cluster_name, vm_type=vm_type, node_count=nodes)

## 5. Set script params and path

In [None]:
script_params = [
    '--desc', desc,
    '--data_dir', str(aml_exp.data_ref),
    '--image_type', img_type,
    '--train_csv', train_csv,
    '--test_csv', test_csv,
    '--base_model', base_model,
    '--steps', steps,
    '--eval_conf', eval_conf,
    '--batch_size', batch_size]

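# Copy the training script into the current (notebooks) directory.
# The path is built from components so it works on both Windows and Linux/macOS.
shutil.copy(os.path.join('..', 'src', 'training', 'scripts', training_script_name), '.')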
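
The script parameters are passed as a flat list of flag/value pairs, so the training script has to parse them on the other side. The sketch below shows one way that could be done with `argparse`. The flag names match the list above, but the parser itself, its defaults and the placeholder data path are assumptions and not the actual contents of `train.py` or `train_tf2.py`.

```python
import argparse


def build_parser():
    """Build a parser for the flags used in script_params (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--desc", type=str, default="")
    parser.add_argument("--data_dir", type=str, required=True)
    parser.add_argument("--image_type", type=str, required=True)
    parser.add_argument("--train_csv", type=str, default="latest")
    parser.add_argument("--test_csv", type=str, default="latest")
    parser.add_argument("--base_model", type=str, required=True)
    parser.add_argument("--steps", type=int, default=1000)
    parser.add_argument("--eval_conf", type=float, default=0.5)
    parser.add_argument("--batch_size", type=int, default=1)
    return parser


# Command-line values arrive as strings; argparse converts them to the declared types.
example_args = [
    "--desc", "Add description here",
    "--data_dir", "/mnt/datastore",  # placeholder for the mounted datastore path
    "--image_type", "pothole",
    "--base_model", "faster_rcnn_inception_resnet_v2_1024x1024_coco17_tpu-8",
    "--steps", "1000",
    "--eval_conf", "0.5",
    "--batch_size", "1",
]
args = build_parser().parse_args(example_args)
print(args.steps, args.eval_conf, args.batch_size)
```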

## 6. Create Run Config

The run config brings together the compute, script, params and docker image to form a script run configuration.

In [None]:
scripts = os.path.join('.')
aml_exp.set_runconfig(scripts,
                      training_script_name,
                      script_params,
                      docker_image=docker_image)

## 7. Submit

Finally, submit the configuration defined above to AML. The run can then be monitored from the AML studio.

In [None]:
aml_exp.submit_training()