# UNDER CONSTRUCTION

This notebook is still a work in progress and may not work as intended. You have been warned!

# Boots 'n' Cats 2c: Modelling with a Custom MXNet Algorithm

In this notebook we'll try another approach to build our boots 'n' cats detector: a YOLOv3 implementation on SageMaker's [MXNet container](https://sagemaker.readthedocs.io/en/stable/using_mxnet.html).

SageMaker supports fully custom containers, but also offers pre-optimized environments for the major ML frameworks (TensorFlow, PyTorch, and MXNet), which streamline typical workflows.

The interface mechanisms (channels, endpoints, etc.) work the same as for the built-in algorithms, but now we're authoring a Python package that is loaded by the framework application inside the base container, so we need to understand the interfaces through which our code consumes inputs and exposes results and parameters.

**You'll need to** have gone through the first notebook in this series (*Intro and Data Preparation*) to complete this example.

## About the Algorithm: YOLOv3

YOLO ("You Only Look Once") is another highly successful object detection algorithm, alongside the SSD architecture implemented by the SageMaker built-in algorithm. Benchmarks are discussed on the project [website](https://pjreddie.com/darknet/yolo/), and the method is detailed in the [original paper](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf) (Redmon et al., 2016).

Both YOLO and SSD are "one-stage detectors", in contrast to the earlier R-CNN family of methods, which separately 1) propose and then 2) validate and adjust bounding boxes. Tackling the region proposal and classification/validation problems together gives these architectures significant speed benefits at comparable accuracy.

Whereas SSD creates convolutional "feature maps" at different scales and learns to predict the offset of "anchor boxes" relative to those, YOLO computes in parallel "class probabilities" on subdivided grid squares of the image and bounding box coordinates for likely objects, before correlating the two.
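
To make the grid idea concrete, here is a minimal sketch of how an object can be assigned to a grid square: in the original YOLO paper, the cell containing the centre of an object's bounding box is responsible for detecting it. The function name and the pixel-coordinate box layout are illustrative assumptions, not part of GluonCV's implementation.

```python
def responsible_cell(box, image_size, grid_size):
    """Return the (row, col) of the grid cell containing the centre of `box`.

    box: (x_min, y_min, x_max, y_max) in pixels
    image_size: (width, height) in pixels
    grid_size: number of cells along each side of the (square) grid
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2
    # Clamp so that a centre on the right/bottom edge stays in the last cell:
    col = min(int(center_x / width * grid_size), grid_size - 1)
    row = min(int(center_y / height * grid_size), grid_size - 1)
    return row, col
```

For example, `responsible_cell((100, 200, 300, 400), (416, 416), 13)` places the box centre (200, 300) in row 9, column 6 of a 13x13 grid.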

A nice comparison of the methods is presented [here](https://lilianweng.github.io/lil-log/2018/12/27/object-detection-part-4.html). The more recent YOLO releases (v2 and v3) achieve better accuracy than SSD at comparable model sizes and speeds in some benchmarks, as shown in the graph below, reproduced from the [GluonCV Model Zoo](https://gluon-cv.mxnet.io/model_zoo/detection.html):

<img src="BlogImages/GluonCVYOLOvsSSD.png"/>

## Step 0: Dependencies and configuration

As usual we'll start by loading libraries, defining configuration, and connecting to the AWS SDKs:

In [None]:
%load_ext autoreload
%autoreload 1

# Built-Ins:
import csv
import os
from collections import defaultdict
import json

# External Dependencies:
import boto3
import imageio
import numpy as np
import sagemaker
from sagemaker.mxnet import MXNet as SageMakerMXNet
from IPython.display import display, HTML

# Local Dependencies:
%aimport util

Next we re-load configuration from the intro & data processing notebook:

In [None]:
%store -r BUCKET_NAME
assert BUCKET_NAME, "BUCKET_NAME missing from IPython store"
%store -r CHECKPOINTS_PREFIX
assert CHECKPOINTS_PREFIX, "CHECKPOINTS_PREFIX missing from IPython store"
%store -r DATA_PREFIX
assert DATA_PREFIX, "DATA_PREFIX missing from IPython store"
%store -r MODELS_PREFIX
assert MODELS_PREFIX, "MODELS_PREFIX missing from IPython store"
%store -r CLASS_NAMES
assert CLASS_NAMES, "CLASS_NAMES missing from IPython store"
%store -r test_image_folder
assert test_image_folder, "test_image_folder missing from IPython store"

%store -r attribute_names
assert attribute_names, "attribute_names missing from IPython store"
%store -r n_samples_training
assert n_samples_training, "n_samples_training missing from IPython store"
%store -r n_samples_validation
assert n_samples_validation, "n_samples_validation missing from IPython store"

Here we just connect to the AWS SDKs we'll use, and validate the choice of S3 bucket:

In [None]:
role = sagemaker.get_execution_role()
session = boto3.session.Session()
region = session.region_name
s3 = session.resource("s3")
bucket = s3.Bucket(BUCKET_NAME)
smclient = session.client("sagemaker")

bucket_region = \
    session.client("s3").head_bucket(Bucket=BUCKET_NAME)["ResponseMetadata"]["HTTPHeaders"]["x-amz-bucket-region"]
assert (
    bucket_region == region
), f"Your S3 bucket {BUCKET_NAME} and this notebook need to be in the same region."

if (region != "us-east-1"):
    print("WARNING: Rekognition Custom Labels functionality is only available in us-east-1 at launch")
    
# Initialise some empty variables we need to exist:
predictor_std = None
predictor_hpo = None

## Step 1: Review our algorithm details

We'll use the YOLOv3 implementation from GluonCV (a computer vision toolkit built on MXNet), and run it on top of the SageMaker-provided MXNet container.

As detailed in the [SageMaker Python SDK docs](https://sagemaker.readthedocs.io/en/stable/using_mxnet.html), our job is to implement a Python file (or bundle of files with a designated entry point) that:

* When run as a script, performs model training and saves the resultant model artifacts
* When imported as a module, defines functions that the framework server application can call to load the model, perform inference, and deserialize/serialize the request and response data.
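
As a rough illustration of that dual role, the sketch below shows the overall shape such a file could take. It is not the contents of `yolo_train.py`: the JSON "model" is a stand-in so the example runs without MXNet, the `--epochs` argument and fallback paths are assumptions, and the `model_fn` / `transform_fn` signatures follow the SDK documentation linked above as we understand it.

```python
import argparse
import json
import os
import tempfile


def train(args):
    """Placeholder training routine: a real script would fit the network here."""
    os.makedirs(args.model_dir, exist_ok=True)
    # Stand-in "model artifact" so the sketch runs without MXNet installed:
    with open(os.path.join(args.model_dir, "model.json"), "w") as f:
        json.dump({"epochs": args.epochs}, f)


def model_fn(model_dir):
    """Called by the serving application to load the model from `model_dir`."""
    with open(os.path.join(model_dir, "model.json")) as f:
        return json.load(f)


def transform_fn(model, request_body, content_type, accept):
    """Called per request: deserialize the input, predict, serialize the output."""
    prediction = []  # A real implementation would decode the image and run the network
    return json.dumps({"prediction": prediction}), "application/json"


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command line arguments:
    parser.add_argument("--epochs", type=int, default=1)
    # SageMaker publishes container paths through environment variables;
    # the fallbacks here are only for running the sketch locally:
    parser.add_argument(
        "--model-dir",
        default=os.environ.get("SM_MODEL_DIR", os.path.join(tempfile.gettempdir(), "model")),
    )
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data/train"))
    # Ignore any extra arguments rather than failing on them:
    return parser.parse_known_args(argv)[0]


if __name__ == "__main__":
    # Only runs when executed as a script, not when imported for serving:
    train(parse_args())
```

The `if __name__ == "__main__":` guard is what lets one file serve both purposes: training code runs only when the file is executed directly, while the serving functions are available on import.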

In cases like this one, where extra dependencies (or newer versions) are required compared with the base container, there are two options:

* Define a custom container, and take on the effort of re-implementing (or inheriting) the framework server application code
* Perform a `pip install` in the code itself, executed when the file is loaded.

The latter option increases the billable execution time in training, and the latency for new container instances to spin up when a deployed endpoint auto-scales. For a small number of additional packages, though, these costs can be preferable to the complexity of fully customizing the container.
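
A minimal sketch of the in-code install approach is shown below. The helper name `ensure_package` is hypothetical, and this is one possible pattern rather than necessarily what our script does: it only calls pip when the module cannot already be imported, which avoids paying the install cost twice.

```python
import importlib.util
import subprocess
import sys


def ensure_package(module_name, pip_spec=None):
    """Install `pip_spec` with pip if `module_name` is not importable.

    Returns True if an install was run, False if the module was already present.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    # Use the running interpreter's pip so the package lands in the right environment:
    subprocess.check_call([sys.executable, "-m", "pip", "install", pip_spec or module_name])
    return True
```

Near the top of the entry point, a call such as `ensure_package("gluoncv")` would then run before the first `import gluoncv`. Passing a pinned requirement as `pip_spec` keeps training and inference containers on the same version.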

Take some time to look through our implementation at the location below in this repository:

In [None]:
entry_point="yolo_train.py"
source_dir="src"

As with built-in algorithms, the choices we make in implementation have consequences, including for example:

* Whether distributed training is supported
* Whether GPU-accelerated instances will provide any performance benefits
* What data formats are supported for training and inference
* How data is loaded into the container at training time

## Step 2: Set up input data channels

**TODO: Notes on how & why this differs from built-in algo**

In [None]:
train_channel = sagemaker.session.s3_input(
    f"s3://{BUCKET_NAME}/{DATA_PREFIX}/train.manifest",
    distribution="FullyReplicated",  # In case we want to try distributed training
    #content_type="application/x-recordio",
    #s3_data_type="ManifestFile",
    #record_wrapping="RecordIO",
    s3_data_type="S3Prefix",
    attribute_names=attribute_names  # In case the manifest contains other junk to ignore (it does!)
)
                                        
validation_channel = sagemaker.session.s3_input(
    f"s3://{BUCKET_NAME}/{DATA_PREFIX}/validation.manifest",
    distribution="FullyReplicated",
    #content_type="application/x-recordio",
    #record_wrapping="RecordIO",
    #s3_data_type="ManifestFile",
    s3_data_type="S3Prefix",
    attribute_names=attribute_names
)

image_channel = sagemaker.session.s3_input(
    f"s3://{BUCKET_NAME}/",
    s3_data_type="S3Prefix"
)
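
In File mode, SageMaker downloads each channel to a local folder in the container (`/opt/ml/input/data/<channel name>`) before the script starts, so the script itself has to read the manifests and locate the images. The sketch below shows one way that could look, assuming the manifests are JSON Lines records in the Ground Truth bounding box format and that the `images` channel mirrors the bucket's key structure. The function name and the `"labels"` attribute used in the usage example are hypothetical.

```python
import json
import os
from urllib.parse import urlparse


def parse_manifest_line(line, label_attribute, images_dir):
    """Map one JSON Lines manifest record to (local image path, list of boxes)."""
    record = json.loads(line)
    # "source-ref" is an S3 URI such as s3://bucket/path/to/image.jpg
    key = urlparse(record["source-ref"]).path.lstrip("/")
    image_path = os.path.join(images_dir, key)
    boxes = [
        # (class_id, x_min, y_min, x_max, y_max) in pixels
        (a["class_id"], a["left"], a["top"], a["left"] + a["width"], a["top"] + a["height"])
        for a in record[label_attribute]["annotations"]
    ]
    return image_path, boxes
```

Inside the container this might be called as `parse_manifest_line(line, "labels", "/opt/ml/input/data/images")` for each line of the downloaded manifest file.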

## Step 3: Configure the algorithm

**TODO: Notes on how & why this differs from built-in algo**

In [None]:
estimator = SageMakerMXNet(
    role=role,
    entry_point=entry_point,
    source_dir=source_dir,
    framework_version="1.4.1",
    py_version="py3",
    input_mode="File",
    train_instance_count=1,
    train_instance_type="ml.p3.8xlarge",
    train_max_run=5*60*60,
    train_use_spot_instances=True,
    train_max_wait=5*60*60,
    metric_definitions=[
        {'Name': 'validation:MeanAP', 'Regex': 'Validation: VOCMeanAP=(.*?) ;'},
        {'Name': 'train:MeanAP', 'Regex': 'Train: VOCMeanAP=(.*?) ;'},
    ],
    base_job_name="bootsncats-yolo",
    output_path=f"s3://{BUCKET_NAME}/{MODELS_PREFIX}",
    checkpoint_s3_uri=f"s3://{BUCKET_NAME}/{CHECKPOINTS_PREFIX}",
    hyperparameters={
        "epochs": 10,
        "num-workers": 4,
        "batch-size": 4,
        "num-gpus": 4,
        "data-shape": 300
    }
)
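
The `metric_definitions` above tell SageMaker how to scrape metrics from the training job's logs: each `Regex` is matched against the log output, and its first capture group becomes the metric value. The sketch below reproduces that matching locally with Python's `re` module, which is handy for checking a pattern before launching a job. The sample log line is an assumption about what the script prints, and `extract_metrics` is a hypothetical helper.

```python
import re

metric_definitions = [
    {"Name": "validation:MeanAP", "Regex": "Validation: VOCMeanAP=(.*?) ;"},
    {"Name": "train:MeanAP", "Regex": "Train: VOCMeanAP=(.*?) ;"},
]


def extract_metrics(log_line, definitions):
    """Return {metric name: float value} for every definition matching `log_line`."""
    metrics = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            metrics[definition["Name"]] = float(match.group(1))
    return metrics


sample = "[Epoch 3] Validation: VOCMeanAP=0.7312 ;"  # Assumed log format
print(extract_metrics(sample, metric_definitions))
```

Note that these patterns require the trailing ` ;` after the value: a log line that omits it will not produce a metric.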

## Step 4: Train the model

As with the built-in algorithms, we have the choice between fitting our model with the given set of hyperparameters or performing automatic hyperparameter tuning:

In [None]:
WITH_HPO = False  # TODO: True first, then False?

In [None]:
%%time
if (not WITH_HPO):
    estimator.fit(
        {
            "train": train_channel,
            "test": validation_channel,  # NOTE: the HPO branch below names this channel "validation"
            "images": image_channel
        },
        logs=True
    )
else:
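    # NOTE: These range names mirror the built-in algorithm's hyperparameters. Check them against
    # the arguments the entry point accepts (the estimator above uses names like "batch-size"):
    hyperparameter_ranges = {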
        "learning_rate": sagemaker.tuner.ContinuousParameter(0.0001, 0.1),
        "momentum": sagemaker.tuner.ContinuousParameter(0.0, 0.99),
        "weight_decay": sagemaker.tuner.ContinuousParameter(0.0, 0.99),
        "mini_batch_size": sagemaker.tuner.IntegerParameter(1, n_samples_validation),
        "optimizer": sagemaker.tuner.CategoricalParameter(['sgd', 'adam', 'rmsprop', 'adadelta'])
    }

    tuner = sagemaker.tuner.HyperparameterTuner(
        estimator,
        "validation:MeanAP",  # Objective metric: must match a Name in metric_definitions
        objective_type="Maximize",  # "Mean Average Precision" high = good
        hyperparameter_ranges=hyperparameter_ranges,
        base_tuning_job_name="bootsncats-yolo-hpo",
        # `max_jobs` obviously has cost implications, but the optimization can always be terminated:
        max_jobs=24,
        max_parallel_jobs=3  # Keep sensible for the configured max_jobs...
    )
    
    tuner.fit(
        {
            "train": train_channel,
            "validation": validation_channel,
            "images": image_channel
        },
        include_cls_metadata=False
    )

## Step 5: While the model(s) are training

**TODO: Training notes** (Go and finish off the other notebooks!)

## Step 6: Deploy the model

In [None]:
%%time
if (WITH_HPO):
    if (predictor_hpo):
        predictor_hpo.delete_endpoint()
    print("Deploying HPO model...")
    predictor_hpo = tuner.deploy(
        initial_instance_count=1,
        instance_type="ml.m4.xlarge"
    )
else:
    if (predictor_std):
        predictor_std.delete_endpoint()
    print("Deploying standard (non-HPO) model...")
    predictor_std = estimator.deploy(
        initial_instance_count=1,
        instance_type="ml.m4.xlarge"
    )

## Step 7: Run inference on test images

**TODO: Notes on confidence threshold**

In [None]:
# Change this if you want something different:
predictor = predictor_hpo if WITH_HPO else predictor_std

# This time confidence is 0-1, not 0-100:
confidence_threshold = 0.2

# Create the runtime client once, rather than on every loop iteration:
client = boto3.client("sagemaker-runtime")

for test_image in os.listdir(test_image_folder):
    test_image_path = f"{test_image_folder}/{test_image}"
    with open(test_image_path, "rb") as f:
        payload = bytearray(f.read())

    response = client.invoke_endpoint(
        EndpointName=predictor.endpoint,
        ContentType='application/x-image',
        Body=payload
    )

    result = response['Body'].read()
    result = json.loads(result)["prediction"]
    # result is a list of [class_ix, confidence, y1, y2, x1, x2] detections.
    display(HTML(f"<h4>{test_image}</h4>"))
    util.visualize_detection(
        test_image_path,
        result,
        CLASS_NAMES,
        thresh=confidence_threshold
    )
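
To see the effect of the confidence threshold without plotting, the detections can also be filtered directly. This is a minimal sketch that assumes each detection is a list whose first two elements are the class index and the confidence (as in the comment above) and that `CLASS_NAMES` is a list indexed by class. `filter_detections` is a hypothetical helper, and `util.visualize_detection` may apply its threshold slightly differently.

```python
def filter_detections(detections, class_names, thresh):
    """Keep detections scoring at least `thresh`, best first, with class names attached."""
    kept = [d for d in detections if d[1] >= thresh]
    kept.sort(key=lambda d: d[1], reverse=True)
    # Each result is (class name, confidence, remaining box coordinates):
    return [(class_names[int(d[0])], d[1], d[2:]) for d in kept]
```

Raising the threshold trades recall for precision: fewer spurious boxes are shown, at the risk of dropping genuine but low-confidence detections.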

## Clean up

Although training instances are ephemeral, the resources we allocated for real-time endpoints need to be cleaned up to avoid ongoing charges.

The code below will delete the *most recently deployed* endpoint for the HPO and non-HPO configurations, but note that if you deployed either more than once, you might end up with extra endpoints.

To be safe, it's best to still check through the SageMaker console for any left-over resources when cleaning up.

In [None]:
if (predictor_hpo):
    print("Deleting HPO-optimized predictor endpoint")
    predictor_hpo.delete_endpoint()
if (predictor_std):
    print("Deleting standard (non-HPO) predictor endpoint")
    predictor_std.delete_endpoint()
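
As a complement to checking the console, left-over endpoints can also be listed programmatically. The sketch below uses the SageMaker client's `list_endpoints` call with its `NameContains` filter, following `NextToken` to page through the results. `find_endpoints` is a hypothetical helper, and it assumes the endpoint names contain the base job name, which may not hold if endpoints were named explicitly.

```python
def find_endpoints(sm_client, name_contains):
    """Return the names of all SageMaker endpoints whose name contains `name_contains`."""
    names = []
    kwargs = {"NameContains": name_contains}
    while True:
        response = sm_client.list_endpoints(**kwargs)
        names.extend(e["EndpointName"] for e in response["Endpoints"])
        next_token = response.get("NextToken")
        if not next_token:
            return names
        # More results are available: request the next page
        kwargs["NextToken"] = next_token
```

With the `smclient` created earlier in this notebook, `find_endpoints(smclient, "bootsncats-yolo")` would return the candidate endpoint names to review before deleting anything.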

## Review TODO

**TODO: Review**