Copyright (c) Microsoft Corporation.<br>
Licensed under the MIT License.

# 3. Train a Model in the Cloud with the Azure ML Python SDK

In this notebook we will:
- Use Azure ML Python SDK as a tool to train a TensorFlow object detection model on your own data

This notebook trains an SSD-MobileNet V2 model using the TensorFlow (v1) Object Detection API.  Here, we have created a custom dockerfile that sets up the Object Detection API library and dependencies for training the model in the cloud.

Note:  the first time the experiment is run with Azure ML, it could take up to 30 min to finish as it will need to build the TF Object Detection API (one-time process).  Subsequent runs will be faster.


## Prerequisites
- Azure ML Workspace - [Create in Azure Portal](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace?tabs=azure-portal)
- Data uploaded to the default DataStore
- COCO formatted labels file from the Azure ML Labeling Project, downloaded to this folder

## Imports

In [None]:
from azureml.core import Workspace
from azureml.core import Dataset
from azureml.core import Environment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core import ScriptRunConfig
from azureml.core import Experiment, Run
from azureml.core.conda_dependencies import CondaDependencies
from azureml.data.data_reference import DataReference

import os
import sys
from uuid import uuid4
import cv2
import matplotlib.pyplot as plt

from azureml.core import VERSION
print(VERSION)

In [None]:
# User defined (CHANGE AS NEEDED!)
num_classes = 3
num_epochs = 3000 # used as the number of training steps (num_steps) in the config below
# For training experiment (these are good to experiment with if you do not see good results)
batch_size = 6 # usually increased as the number of images grows
learning_rate = 0.0004 # usually decreased as the number of epochs/iterations over the data grows

######################################## KEEP REST IN CELL THE SAME ########################################
config_fname = 'project_files/ssdlite_mobilenet_retrained.config'
fine_tune_checkpoint = '"'+'/home/data/ssdlite_mobilenet_v2_coco_2018_05_09/model.ckpt' + '"'
input_path_train = '"' + './outputs/objects_train.record-00000-of-00001' + '"'
label_map_path = '"' + 'objects.pbtxt' + '"'
input_path_eval = '"' + './outputs/objects_val.record-00000-of-00001' + '"'
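
Despite its name, `num_epochs` is written into the `num_steps` field of the training config, so it counts training steps (one batch per step) rather than full passes over the data. The sketch below estimates the equivalent number of passes. It assumes a hypothetical dataset size, `num_images`, which is not defined anywhere in this notebook, and `approx_epochs` is an illustrative helper only.

```python
def approx_epochs(num_steps, batch_size, num_images):
    """Estimate how many passes over the dataset a run of num_steps makes."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return num_steps * batch_size / num_images


# Hypothetical example: 3000 steps, batch size 6, 300 labeled images
print(approx_epochs(3000, 6, 300))  # 60.0
```

If the estimate is very low for your dataset, raising `num_epochs` or `batch_size` may be worth trying.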

## Connect to the Azure ML Workspace

This step automatically looks for the `config.json` file in the base directory. You can download `config.json` from the Overview pane of the Azure ML Workspace resource in the Azure Portal, then drag and drop it from your local machine into the JupyterLab file explorer on the left.

The first time you run this cell, it will ask you to log in to Azure interactively in another browser window.

In [None]:
ws = Workspace.from_config()
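
If the connection fails, it can help to confirm that `config.json` contains the fields the SDK reads: `subscription_id`, `resource_group` and `workspace_name`. The following is a minimal sketch using only the standard library; `missing_workspace_keys` is a hypothetical helper, not part of the Azure ML SDK.

```python
import json

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_workspace_keys(config_path="config.json"):
    """Return the required keys that are absent from a workspace config file."""
    with open(config_path, "r") as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if key not in config]


# Usage (assumes config.json is in the current directory):
# print(missing_workspace_keys())
```

An empty list means all three fields are present.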

## Create project directory (continue on here _after_ you have labeled data)

This directory is special in that the contents are uploaded to the Azure ML Compute for training.

In [None]:
if not os.path.exists('project_files'):
    os.makedirs('project_files', exist_ok=True)
else:
    print('project_files directory already exists')

Download your COCO formatted annotations from the Azure ML Studio labeling project and drag and drop them into this new `project_files` folder.

## Create and/or reuse a GPU cluster for training

This cell creates the GPU cluster, or reuses an existing cluster with the same name.  You may wish to set `max_nodes` to limit the number of compute nodes that get used.  Clusters can use dedicated VMs or low-priority VMs; the latter cost less and suit runs where you are not in a hurry.

In [None]:
# Choose a name for your GPU cluster
gpu_cluster_name = "gpu-cluster"

# Reuse the cluster if it already exists; otherwise create it
try:
    gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           idle_seconds_before_scaledown=2400,
                                                           min_nodes=0,
                                                           max_nodes=4)
    gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)

gpu_cluster.wait_for_completion(show_output=True)

## Define an environment

Here, instead of using an Azure ML base image, we build our own and manage all of our own dependencies.  The next cell creates a Python Environment for our training experiment with the TensorFlow Object Detection API and other necessary components, such as a pretrained model.  See `TFObjectDetectionAPI.dockerfile` for details on how the image is built.  Note:  the first time this experiment is run, the training Docker image is built, so the experiment could take up to 30 minutes to complete.

In [None]:
tf_env = Environment("tfobjectdetection")
tf_env.docker.base_image = None
tf_env.docker.base_dockerfile = "./docker/TFObjectDetectionAPI.dockerfile"
tf_env.python.user_managed_dependencies = True

## Labels

Add more labels as needed in the format (1-indexed):

```
item {
  id: <number in sequence>
  name: 'object_name'
}
```
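
With many classes, the entries can be generated rather than typed by hand. A minimal sketch follows; `make_label_map` is a hypothetical helper (not part of the Object Detection API) that numbers the classes starting from 1 in the order given.

```python
def make_label_map(names):
    """Build label-map text with 1-indexed ids from a list of class names."""
    items = []
    for idx, name in enumerate(names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "\n".join(items)


print(make_label_map(['mouse', 'keyboard', 'headphones']))
```

The printed text could then be saved as `project_files/objects.pbtxt` in place of the hand-written cell.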


In [None]:
%%writefile project_files/objects.pbtxt
item {
  id: 1
  name: 'mouse'
}

item {
  id: 2
  name: 'keyboard'
}

item {
  id: 3
  name: 'headphones'
}

## Scripts to format data and train model

**Converting from COCO format annotations and images to TFRecord**

This is a utility script that converts COCO format annotations and images to TFRecord format.  We need the annotation file from the Azure ML labeling project and the images placed into the default Datastore as a Dataset (we should have made this when labeling).
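
The core of the conversion is the change of box format. The sketch below is a simplified illustration that assumes, as the conversion script does, that the exported `bbox` values are fractions of the image size in `[x, y, width, height]` order. Unlike the script, it skips the intermediate rounding to integer pixels, and `normalized_coco_to_corners` is a hypothetical name.

```python
def normalized_coco_to_corners(bbox):
    """Convert a normalized COCO box [x, y, width, height] to
    (ymin, xmin, ymax, xmax), or return None if the box is invalid."""
    x, y, width, height = bbox
    if width <= 0 or height <= 0:
        return None  # degenerate box
    if x + width > 1.0 or y + height > 1.0:
        return None  # box extends past the image edge
    return (y, x, y + height, x + width)


print(normalized_coco_to_corners([0.25, 0.5, 0.5, 0.25]))  # (0.5, 0.25, 0.75, 0.75)
```

Boxes that return `None` here correspond to the annotations the script counts as skipped.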

In [None]:
%%writefile project_files/tf_record_utils.py

import hashlib
import io
import json
import os
import contextlib2
import numpy as np
import PIL.Image

from pycocotools import mask
import tensorflow as tf

from object_detection.dataset_tools import tf_record_creation_util
from object_detection.utils import dataset_util
from object_detection.utils import label_map_util


def create_tf_example(image,
                      annotations_list,
                      image_dir,
                      category_index):
    """Converts image and annotations to a tf.Example proto.
    Args:
    image: dict with keys:
      [u'license', u'file_name', u'coco_url', u'height', u'width',
      u'date_captured', u'flickr_url', u'id']
    annotations_list:
      list of dicts with keys:
      [u'segmentation', u'area', u'iscrowd', u'image_id',
      u'bbox', u'category_id', u'id']
      Notice that bounding box coordinates in the official COCO dataset are
      given as [x, y, width, height] tuples using absolute coordinates where
      x, y represent the top-left (0-indexed) corner.  The code below assumes
      the bbox values in the annotation file are normalized to [0, 1] and
      scales them to pixels first.  This function converts to the format
      expected by the Tensorflow Object Detection API (which is
      [ymin, xmin, ymax, xmax] with coordinates normalized relative
      to image size).
    image_dir: directory containing the image files.
    category_index: a dict containing COCO category information keyed
      by the 'id' field of each category.  See the
      label_map_util.create_category_index function.
    Returns:
    key: SHA-256 hex digest of the encoded image bytes.
    example: The converted tf.Example
    num_annotations_skipped: Number of (invalid) annotations that were ignored.
    Raises:
    ValueError: if the image pointed to by data['filename'] is not a valid JPEG
    """
    image_height = 616 # use fixed from data collection notebook for Percept instead of image['height']
    image_width = 816 # use fixed from data collection notebook for Percept instead image['width']
    filename = image['file_name']
    # Remove first part of path because we don't need (name of dataset)
    filename = (os.sep).join(filename.split(os.sep)[1:])
    image_id = image['id']

    full_path = os.path.join(image_dir, filename)
    print(full_path)
    with tf.gfile.GFile(full_path, 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = PIL.Image.open(encoded_jpg_io)
    key = hashlib.sha256(encoded_jpg).hexdigest()

    xmin = []
    xmax = []
    ymin = []
    ymax = []
    is_crowd = []
    category_names = []
    category_ids = []
    area = []
    encoded_mask_png = []
    num_annotations_skipped = 0
    for object_annotations in annotations_list:
        (x, y, width, height) = tuple(object_annotations['bbox'])
        x, width = int(x * image_width), int(width * image_width)
        y, height = int(y * image_height), int(height * image_height)
        if width <= 0 or height <= 0:
            num_annotations_skipped += 1
            continue
        if x + width > image_width or y + height > image_height:
            num_annotations_skipped += 1
            continue
        xmin.append(float(x) / image_width)
        xmax.append(float(x + width) / image_width)
        ymin.append(float(y) / image_height)
        ymax.append(float(y + height) / image_height)
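        category_id = int(object_annotations['category_id'])
        category_ids.append(category_id)
        category_names.append(category_index[category_id]['name'].encode('utf8'))
        area.append(object_annotations['area'])
        # Keep is_crowd the same length as the box lists (0 if the key is absent)
        is_crowd.append(int(object_annotations.get('iscrowd', 0)))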

    feature_dict = {
      'image/height':
          dataset_util.int64_feature(image_height),
      'image/width':
          dataset_util.int64_feature(image_width),
      'image/filename':
          dataset_util.bytes_feature(filename.encode('utf8')),
      'image/source_id':
          dataset_util.bytes_feature(str(image_id).encode('utf8')),
      'image/key/sha256':
          dataset_util.bytes_feature(key.encode('utf8')),
      'image/encoded':
          dataset_util.bytes_feature(encoded_jpg),
      'image/format':
          dataset_util.bytes_feature('jpeg'.encode('utf8')),
      'image/object/bbox/xmin':
          dataset_util.float_list_feature(xmin),
      'image/object/bbox/xmax':
          dataset_util.float_list_feature(xmax),
      'image/object/bbox/ymin':
          dataset_util.float_list_feature(ymin),
      'image/object/bbox/ymax':
          dataset_util.float_list_feature(ymax),
      'image/object/class/text':
          dataset_util.bytes_list_feature(category_names),
      'image/object/is_crowd':
          dataset_util.int64_list_feature(is_crowd),
      'image/object/area':
          dataset_util.float_list_feature(area),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
    return key, example, num_annotations_skipped


def create_tf_record_from_coco_annotations(annotations_file,
                                           image_dir,
                                           output_path,
                                           num_shards=10):
    """Loads COCO annotation json files and converts to tf.Record format.
    Args:
    annotations_file: JSON file containing bounding box annotations.
    image_dir: Directory containing the image files.
    output_path: Path to output tf.Record file.
    num_shards: number of output file shards.
    """
    with contextlib2.ExitStack() as tf_record_close_stack, \
            tf.gfile.GFile(annotations_file, 'r') as fid:
        output_tfrecords = tf_record_creation_util.open_sharded_output_tfrecords(
            tf_record_close_stack, output_path, num_shards)
        groundtruth_data = json.load(fid)
        images = groundtruth_data['images']
        category_index = label_map_util.create_category_index(
            groundtruth_data['categories'])

        annotations_index = {}
        if 'annotations' in groundtruth_data:
            tf.logging.info(
                'Found groundtruth annotations. Building annotations index.')
            for annotation in groundtruth_data['annotations']:
                image_id = annotation['image_id']
                if image_id not in annotations_index:
                    annotations_index[image_id] = []
                annotations_index[image_id].append(annotation)
        missing_annotation_count = 0
        for image in images:
            image_id = image['id']
            if image_id not in annotations_index:
                missing_annotation_count += 1
                annotations_index[image_id] = []
        tf.logging.info('%d images are missing annotations.',
                        missing_annotation_count)

        total_num_annotations_skipped = 0
        for idx, image in enumerate(images):
            if idx % 100 == 0:
                tf.logging.info('On image %d of %d', idx, len(images))
            annotations_list = annotations_index[image['id']]
            _, tf_example, num_annotations_skipped = create_tf_example(
                image, annotations_list, image_dir, category_index)
            total_num_annotations_skipped += num_annotations_skipped
            shard_idx = idx % num_shards
            output_tfrecords[shard_idx].write(tf_example.SerializeToString())
        tf.logging.info('Finished writing, skipped %d annotations.',
                        total_num_annotations_skipped)


In [None]:
%%writefile project_files/__init__.py
"""
Created to be able to import the util function in our train.py script
"""
from .tf_record_utils import create_tf_record_from_coco_annotations

Prepare the model config variables, hyperparameters and file paths for the TensorFlow pipeline configuration file used to train the SSDLite-MobileNetV2 model.

IMPORTANT:  update `num_classes`, `batch_size` and `learning_rate` as needed.
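
To see how `learning_rate` evolves during training, the exponential decay schedule in the config can be reproduced in a few lines. This is a sketch under assumptions: it implements the standard formula `initial_lr * decay_factor ** (step / decay_steps)`, and whether the Object Detection API applies the staircase (discrete jump) variant by default is assumed here, so both variants are provided.

```python
def decayed_learning_rate(step, initial_lr, decay_steps,
                          decay_factor=0.95, staircase=True):
    """Exponential decay: initial_lr * decay_factor ** (step / decay_steps)."""
    exponent = step / decay_steps
    if staircase:
        exponent = step // decay_steps  # drop the rate in discrete jumps
    return initial_lr * decay_factor ** exponent


num_steps = 3000                    # value of num_epochs in this notebook
decay_steps = int(0.8 * num_steps)  # 2400, the value substituted for $decay_steps
for step in (0, 2399, 2400, 3000):
    print(step, decayed_learning_rate(step, 0.0004, decay_steps))
```

With these settings the rate changes very little over a run, so the initial `learning_rate` is the value that matters most.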

In [None]:
%%writefile $config_fname
model {
  ssd {
    num_classes: $num_classes
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 3
        use_depthwise: true
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2'
      min_depth: 16
      depth_multiplier: 1.0
      use_depthwise: true
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 3
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: $batch_size
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: $learning_rate
          decay_steps: $decay_steps
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: $fine_tune_checkpoint
  fine_tune_checkpoint_type:  "detection"
  num_steps: $num_epochs
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: $input_path_train
  }
  label_map_path: $label_map_path
}

eval_config: {
  num_examples: 100
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: $input_path_eval
  }
  label_map_path: $label_map_path
  shuffle: false
  num_readers: 1
}

In [None]:
# Update config file paths
with open(config_fname, 'r') as f:
    config_file = f.read()

config_file = config_file.replace('$fine_tune_checkpoint', fine_tune_checkpoint)
config_file = config_file.replace('$label_map_path', label_map_path)
config_file = config_file.replace('$input_path_train', input_path_train)
config_file = config_file.replace('$input_path_eval', input_path_eval)
config_file = config_file.replace('$num_classes', str(num_classes))
config_file = config_file.replace('$batch_size', str(batch_size))
config_file = config_file.replace('$learning_rate', str(learning_rate))
config_file = config_file.replace('$decay_steps', str(int(0.8*num_epochs)))
config_file = config_file.replace('$num_epochs', str(num_epochs))

with open(config_fname, 'w') as update_file:
    update_file.write(config_file)
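
Chained `str.replace` calls only work if no placeholder is a prefix of another one that is replaced later. The standard library's `string.Template` matches whole placeholder names instead. The following is a minimal sketch of that alternative, not the method used in this notebook; `fill_config` is a hypothetical helper, and any literal `$` in the template text would need to be written as `$$`.

```python
from string import Template


def fill_config(template_text, values):
    """Substitute $name placeholders; raises KeyError if a value is missing."""
    return Template(template_text).substitute(values)


sample = 'num_classes: $num_classes\ninput_path: $input_path_train\n'
print(fill_config(sample, {
    'num_classes': 3,
    'input_path_train': '"./outputs/objects_train.record-00000-of-00001"',
}))
```

Because `substitute` raises `KeyError` for a placeholder with no value, a forgotten variable is reported at once instead of being left in the config file.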

**Train script**

Write the training script, `train.py`, to the project folder, `project_files`, so that it is uploaded to the Azure ML Compute when the run is started.

In [None]:
%%writefile project_files/train.py
"""
Azure ML training script for TF object detection experiment
"""
from PIL import Image
import os
import argparse
import shutil
import glob
import urllib
import tarfile
import urllib.request
from uuid import uuid4
import matplotlib.pyplot as plt
import random
import numpy as np

from azureml.core import Datastore
from azureml.core.run import Run
from azureml.data.data_reference import DataReference

import tensorflow.compat.v1 as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

from tf_record_utils import create_tf_record_from_coco_annotations


# Test greeting
def greeting():
    """Run a test greeting and check on GPU availability"""
    print("Welcome to TF OD container!")
    print(tf.__version__)
    print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
greeting()

def get_args():
    """Parse arguments to script"""
    parser = argparse.ArgumentParser()
    parser.add_argument('--ds-path', type=str,
                        dest='datastore_path', help='The path to the data on the default datastore')
    parser.add_argument('--annot-file', type=str,
                        dest='annot_file', help='The COCO format annotation file from Azure ML labeling')
    parser.add_argument('--num-epochs', type=int, default=1000,
                        dest='num_epochs', help='Number of epochs to train')
    parser.add_argument('--num-classes', type=int, default=1,
                        dest='num_classes', help='Number of classes')
    
    args = parser.parse_args()
    return args

def get_data(datastore_path):
    """Get the data from the default datastore of the workspace
    and return the data folder where it is located on compute"""
    data_folder = os.sep + \
                  os.sep.join(os.getcwd().split(os.sep)[:-2]) + \
                  os.sep + datastore_path
    return data_folder

def prep_data(annot_file, data_folder):
    """Convert coco data format to TFRecord (with image files in the
    default datastore)"""
    # Look at data folder
    print('Data folder is at:', data_folder)
    print('List all files/folders in data folder: ', os.listdir(data_folder))
    
    # What is our current working directory?
    print("Current working directory: {}".format(os.getcwd()))
    print("Contents of directory: ")
    os.system("ls")
    print("Contents of /home: ")
    os.system("ls /home")
    
    # Outputs folder - visible in AzureML Studio
    os.makedirs('./outputs', exist_ok=True)

    train_output_path = os.path.join('./outputs', 'objects_train.record')
    val_output_path = os.path.join('./outputs', 'objects_val.record')

    # Create the TFRecords from the COCO annotations
    create_tf_record_from_coco_annotations(
      annot_file,
      data_folder,
      train_output_path,
      num_shards=1)
    create_tf_record_from_coco_annotations(
      annot_file,
      data_folder,
      val_output_path,
      num_shards=1)

def decompress_model(model_file):
    """Decompress archived model located in the /home/data folder on compute"""
    tar = tarfile.open(model_file)
    tar.extractall()
    tar.close()
    shutil.move('./ssdlite_mobilenet_v2_coco_2018_05_09',
                '/home/data/')

def move_config(config_fname):
    """Move the config file to the TF models repo"""
    shutil.copyfile('ssdlite_mobilenet_retrained.config',
                    config_fname)

    
def train(config_fname, num_epochs):
    """Run the os command to train the model using transfer learning
    and an SSDLite-MobilenetV2 base model"""
    
    
    cmd = "PYTHONPATH=$PYTHONPATH:/home/models/research:/home/models/research/slim \
          python /home/models/research/object_detection/model_main.py \
          --pipeline_config_path={} \
          --model_dir={} \
          --alsologtostderr \
          --num_train_steps={} \
          --num_eval_steps={}".format(config_fname, './outputs', num_epochs, num_epochs)
    
    os.system(cmd)

def export_frozen_graph(config_fname, model_path_prefix):
    """Export frozen graph from checkpoint"""
    
    cmd = "PYTHONPATH=$PYTHONPATH:/home/models/research:/home/models/research/slim \
            python /home/models/research/object_detection/export_inference_graph.py \
            --input_type=image_tensor \
            --pipeline_config_path={} \
            --output_directory=./outputs/ \
            --trained_checkpoint_prefix={}".format(config_fname, model_path_prefix)
    os.system(cmd)
    
def test_single_image(data_folder, num_classes):
    """Run inference with frozen graph on a single image from train dataset
    for sanity check"""
    # Path to the frozen graph:
    path_to_frozen_graph = './outputs/frozen_inference_graph.pb'

    # Path to the label map
    path_to_label_map = './objects.pbtxt'

    # Test image (first image in list)
    image_path = glob.glob(os.path.join(data_folder,'**','*.jpg'), 
                                         recursive=True)[0]

    # Minimum confidence value needed to display the bounding box on the image. In range [0.0, 1.0].
    min_threshold = 0.7

    # Read the frozen graph
    detection_graph = tf.Graph()
    with detection_graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(path_to_frozen_graph, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')

    label_map = label_map_util.load_labelmap(path_to_label_map)
    categories = label_map_util.convert_label_map_to_categories(label_map,
                                                                max_num_classes=num_classes,
                                                                use_display_name=True)
    category_index = label_map_util.create_category_index(categories)

    # Detection
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            # Read in the image
            image_np = plt.imread(image_path)

            # Expand dimensions since the model expects images to have shape: [1, None, None, 3] 
            image_np_expanded = np.expand_dims(image_np, axis=0)

            # Extract image tensor
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')

            # Extract detection boxes
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')

            # Extract detection scores
            scores = detection_graph.get_tensor_by_name('detection_scores:0')

            # Extract detection classes
            classes = detection_graph.get_tensor_by_name('detection_classes:0')

            # Extract number of detections
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')

            # Actual detection.
            boxes, scores, classes, num_detections = sess.run([boxes, scores, classes, num_detections], feed_dict={image_tensor: image_np_expanded})
            print(f"BOXES (shaped {boxes.shape}):\n{boxes}")
            print(f"SCORES (shaped {scores.shape}):\n{scores}")
            print(f"CLASSES (shaped {classes.shape}):\n{classes}")
            print(f"NDETECTIONS (shaped {num_detections.shape}):\n{num_detections}")

            # Visualization of the results of a detection.
            image_np_copy = image_np.copy()
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np_copy,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=3,
                min_score_thresh=min_threshold
                )
            plt.imsave('./outputs/test_detections.jpg', image_np_copy)
    

# Steps:
args = get_args()

config_fname = '/home/models/research/object_detection/samples/configs/ssdlite_mobilenet_retrained.config'
model_path_prefix = './outputs/model.ckpt-{}'.format(args.num_epochs)

# Download data
data_folder = get_data(args.datastore_path)
# Convert annotations and data to TFRecord format
prep_data(args.annot_file, data_folder)
# Decompress model from archive file
decompress_model(os.sep + os.path.join('home', 'data', 
                                       'ssdlite_mobilenet_v2_coco_2018_05_09.tar.gz'))
move_config(config_fname)
# Train
train(config_fname, args.num_epochs)
# Convert TF checkpoint to a TF frozen graph for inferencing
export_frozen_graph(config_fname, model_path_prefix)
# Test on a single image from our data for sanity check
test_single_image(data_folder, num_classes=args.num_classes)

## Run experiment

The `ScriptRunConfig` starts the training experiment in the Azure ML Workspace on our GPU cluster defined above as our `compute_target`.  The training script will:
- Convert the COCO annotations and images to TFRecord as our input data
- Train our model with the TensorFlow Object Detection API we built for our compute
- Convert the model checkpoint output into a frozen graph and place it in the `outputs` folder in the Workspace
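
The TFRecord files written in the first step carry a shard suffix, which is why the config points at `objects_train.record-00000-of-00001` although the script passes `objects_train.record` as the output path. The sketch below reproduces that naming pattern as inferred from the paths used in this notebook; `shard_filenames` is a hypothetical helper, not an Object Detection API function.

```python
def shard_filenames(base_path, num_shards):
    """List shard file names in the '<base>-<index>-of-<total>' pattern."""
    return ['{}-{:05d}-of-{:05d}'.format(base_path, idx, num_shards)
            for idx in range(num_shards)]


print(shard_filenames('./outputs/objects_train.record', 1))
# ['./outputs/objects_train.record-00000-of-00001']
```

If you change `num_shards` in `train.py`, the `input_path` values in the config need to match the new file names.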

In the following cell replace `<add-your-coco-annot-filename-here.json>` with the name of the COCO annotation file that you placed in your `project_files` folder.

IMPORTANT:  this cell will run until Azure ML has completed training the model (it is a blocking method).

In [None]:
# Training script args - take a look at training script to see how they are used
script_args = ['--ds-path', 'office_supplies', # may wish to change this name
               '--annot-file', '<add-your-coco-annot-filename-here.json>',
               '--num-epochs', num_epochs,
               '--num-classes', num_classes]

tf_config = ScriptRunConfig(source_directory='project_files',
                                script='train.py',
                                arguments=script_args,
                                compute_target=gpu_cluster,
                                environment=tf_env)

# Create experiment object and give experiment a name - change name as needed
exp = Experiment(ws, 'tf-od-custom-office-supplies')

# Submit experiment.  This is blocking - comment out "wait_for_completion" if you do not wish for this
run_dict = exp.submit(tf_config).wait_for_completion()
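
To check locally how the argument list maps onto the options that `train.py` expects, the parser can be exercised without submitting a run. This is a sketch that mirrors the options defined in `get_args`; `parse_train_args` is a hypothetical wrapper and `annotations.json` is a placeholder file name. Note that `argparse` expects strings when it is given a list directly, so the numeric values are passed as strings here.

```python
import argparse


def parse_train_args(argv):
    """Parse a list of command-line strings the way train.py's get_args does."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--ds-path', type=str, dest='datastore_path')
    parser.add_argument('--annot-file', type=str, dest='annot_file')
    parser.add_argument('--num-epochs', type=int, default=1000, dest='num_epochs')
    parser.add_argument('--num-classes', type=int, default=1, dest='num_classes')
    return parser.parse_args(argv)


# 'annotations.json' is a placeholder file name for illustration
args = parse_train_args(['--ds-path', 'office_supplies',
                         '--annot-file', 'annotations.json',
                         '--num-epochs', '3000',
                         '--num-classes', '3'])
print(args.datastore_path, args.num_epochs, args.num_classes)
# office_supplies 3000 3
```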

Now, go to the Azure Portal and Azure ML Studio to check on the progress of the Run.  You may move on in the notebook once the Run has finished.

https://ml.azure.com

## Register and download model

Run the following cell **when the run above has completed** in the Azure Portal.  The run will not be able to register the model files, otherwise.

Here, we register the model from the experiment and run above so that we can download it.  The cell below registers the frozen inference graph, `outputs/frozen_inference_graph.pb`.  Because a TensorFlow checkpoint consists of several files, you can instead pass a directory such as `outputs` as `model_path` to register all of its files (registering an entire directory is an option when you register models with Azure ML).

We then download the registered model files to `experiment_outputs`, a new local directory for the outputs of the Azure ML experiment.

In [None]:
# Get the correct run based on what was returned in experiment above
# If you do not have the "run_dict" object, then search directly by run id
# (which you can find in the Azure ML Studio from the Portal)
# wait_for_completion() returns the run details as a dict (key 'runId');
# without it, submit() returns a Run object with an `id` attribute
run_id = run_dict['runId'] if isinstance(run_dict, dict) else run_dict.id
# run_id = 'a specific run id'
print(run_id)

run = None
for run_in_list in exp.get_runs():
    if run_in_list.id == run_id:
        run = run_in_list
        break

print(run.id)

In [None]:
# Register model files from the outputs folder - tags can be any dictionary
model = run.register_model(model_name='tf_object_detection_api_custom',
                           tags={'task': 'object detection',
                                 'dataset': 'Office supplies',
                                 'framework': 'TensorFlow'},
                           model_path='outputs/frozen_inference_graph.pb')
print(model.name, model.id, model.version, sep='\t')

Now, let's download the registered model files locally (may take some time).  They will appear in a folder called `experiment_outputs`.

In [None]:
# Download model
if not os.path.exists('experiment_outputs'):
    os.makedirs('experiment_outputs', exist_ok=True)
model.download(target_dir='./experiment_outputs', exist_ok=True)

## Next steps

- Convert the model to OpenVINO format
