# End-to-end Pipeline for YOLOv4 Model Training

In this notebook, we use a PyTorch implementation of the YOLOv4 architecture. We are using a forked version of this repo (https://github.com/Tianxiaomo/pytorch-YOLOv4), which has been modified for training with custom datasets. We have also made some minor adjustments to the forked repository to accommodate Amazon SageMaker Training.

Before we train our model, we need to create our training container image and ensure our dataset is in the appropriate format for PyTorch YOLOv4 models.

## Create Docker training image

These steps are best executed on your local machine or somewhere that has Docker installed.

1. Navigate to this repository (https://github.com/kwwendt/sagemaker-yolov4-e2e-example) and follow the instructions to build, tag, and push the container image to Amazon Elastic Container Registry (Amazon ECR).
2. Once complete, return to the notebook for the remaining steps.

## Upload dataset to Amazon S3

For this step, we will upload our dataset to Amazon S3 so we can easily load the data into our container during model training.

This demo uses an open-source dataset provided by Roboflow: https://public.roboflow.com/object-detection/chess-full/24

Creating a Roboflow account is free, and the site offers an export option for PyTorch YOLOv4-compatible datasets.
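
Before uploading, it can help to sanity-check the exported annotations. The sketch below assumes each line of `_annotations.txt` follows the layout used by the pytorch-YOLOv4 repo (`image_name x1,y1,x2,y2,class_id ...`, with boxes separated by spaces and no spaces in the image name). `parse_annotation_line` and `check_class_ids` are hypothetical helpers written for this notebook, not part of the repo or of Roboflow's tooling.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int, int]  # x1, y1, x2, y2, class_id


def parse_annotation_line(line: str) -> Tuple[str, List[Box]]:
    """Split one annotation line into an image name and its boxes.

    Assumed layout: "<image> x1,y1,x2,y2,class_id x1,y1,x2,y2,class_id ..."
    """
    parts = line.strip().split(" ")
    image_name, raw_boxes = parts[0], parts[1:]
    boxes: List[Box] = []
    for raw in raw_boxes:
        if not raw:
            continue  # tolerate repeated spaces
        fields = raw.split(",")
        if len(fields) != 5:
            raise ValueError(f"expected 5 comma-separated values, got {raw!r}")
        x1, y1, x2, y2, class_id = (int(v) for v in fields)
        boxes.append((x1, y1, x2, y2, class_id))
    return image_name, boxes


def check_class_ids(boxes: List[Box], num_classes: int) -> None:
    """Raise if any class id falls outside 0 .. num_classes - 1."""
    for box in boxes:
        if not 0 <= box[4] < num_classes:
            raise ValueError(f"class id {box[4]} is outside 0..{num_classes - 1}")
```

Running both helpers over every line of the training and validation annotation files, with `num_classes` set to the value later passed as the `classes` hyperparameter, should surface malformed lines before a training job is started.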

Once you have your dataset, upload it to Amazon S3. It is best to separate your training, validation, and testing data into three folders in S3.
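
One way to perform the upload is sketched below, assuming the dataset was exported to a local folder with one subfolder per split. `plan_uploads` and `upload` are hypothetical helper names; `upload` relies on the boto3 S3 client's `upload_file` call and needs AWS credentials, while `plan_uploads` only builds the list of local paths and S3 keys so it can be inspected first.

```python
from pathlib import Path
from typing import List, Tuple


def plan_uploads(local_root: str, s3_prefix: str) -> List[Tuple[str, str]]:
    """Map every file under local_root to an S3 key under s3_prefix."""
    root = Path(local_root)
    prefix = s3_prefix.rstrip("/")
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = f"{prefix}/{path.relative_to(root).as_posix()}"
            plan.append((str(path), key))
    return plan


def upload(plan: List[Tuple[str, str]], bucket: str) -> None:
    """Upload each planned file to the bucket; requires AWS credentials."""
    import boto3  # imported here so plan_uploads works without AWS access

    s3 = boto3.client("s3")
    for local_path, key in plan:
        s3.upload_file(local_path, bucket, key)
```

For example, `upload(plan_uploads("chess/train", "yolov4_training_data/trainv2"), bucket)` would place a local `chess/train` folder (an assumed path) under the training prefix used later in this notebook.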

## Create the estimator

### Set up parameters & imports

In [None]:
import sys

!{sys.executable} -m pip install -q --upgrade sagemaker 

In [None]:
import boto3
import json
from sagemaker.estimator import Estimator
from sagemaker import get_execution_role
from sagemaker.utils import name_from_base
from sagemaker.session import Session

sess = Session()
region = sess.boto_region_name
bucket = sess.default_bucket()
role = get_execution_role()

In [None]:
# Image URI for your docker training image
docker_img_uri = "<enter the output from the build_and_push_container.sh script>"
training_job_name = name_from_base('torch-yolov4-model')

# Location where the trained model resides in S3
model_path = f"s3://{bucket}/{training_job_name}/output/model.tar.gz"

# Input shape (batch, channels, height, width) and layer name
input_shape = [1, 3, 608, 608]
input_layer_name = 'input0'
data_shape = json.dumps({input_layer_name: input_shape})

# Output location for the model compiled with SageMaker Neo
compiled_model_path = f"s3://{bucket}/{training_job_name}/models/compiled"
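
To make the `data_shape` value concrete, the short sketch below shows the JSON string that is later passed to Neo as `DataInputConfig` and decodes it again. The divisibility check is an added assumption-based sanity check (YOLOv4 downsamples its input by a factor of 32), not something the original notebook performs.

```python
import json

input_shape = [1, 3, 608, 608]  # batch, channels, height, width
input_layer_name = "input0"

# Serialize the mapping from input name to shape
data_shape = json.dumps({input_layer_name: input_shape})
print(data_shape)  # {"input0": [1, 3, 608, 608]}

# Decode it again and check the spatial dimensions
decoded = json.loads(data_shape)
batch, channels, height, width = decoded[input_layer_name]
# YOLOv4 downsamples by 32, so height and width should be multiples of 32
assert height % 32 == 0 and width % 32 == 0
```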

### Create & train the estimator

In [None]:
estimator = Estimator(
    image_uri=docker_img_uri,
    role=role,
    instance_type="ml.g4dn.2xlarge",
    volume_size=50,
    instance_count=1,
    max_run=6 * 60 * 60,  # 6 hours, in seconds
    hyperparameters={
        "pretrained": "yolov4.conv.137.pth",
        "classes": 13,
        "train_label": "_annotations.txt", # If your annotations file is named differently, please note the correct name here
        "val_label": "_annotations.txt", # If your annotations file is named differently, please note the correct name here
        "batch": 2,
        "subdivisions": 1,
        "learning_rate": 0.001,
        "gpu": "0",
        "epochs": 5
    }
)

In [None]:
estimator.fit(job_name=training_job_name, inputs={
    "train": f"s3://{bucket}/yolov4_training_data/trainv2/", # The location in S3 where your training data and training annotations are stored
    "val": f"s3://{bucket}/yolov4_training_data/validv2/" # The location in S3 where your validation data and validation annotations are stored
})

## Clone the YOLOv4 repo

We also need to clone the YOLOv4 repo so that we can trace our trained model. After cloning, move this notebook into the `pytorch-YOLOv4` directory so that `import models` in the cells below resolves to the repo's `models.py`.

In [None]:
!git clone https://github.com/roboflow-ai/pytorch-YOLOv4.git
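
If moving the notebook is inconvenient, a possible alternative is to put the cloned directory on Python's import path. This is a minimal sketch with a hypothetical helper, `add_repo_to_path`; it assumes the repo was cloned into `pytorch-YOLOv4` next to the notebook. Shell commands that run scripts from the repo would still need to be executed from inside that directory.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir: str = "pytorch-YOLOv4") -> str:
    """Put the cloned repo at the front of sys.path and return its absolute path."""
    repo_path = str(Path(repo_dir).resolve())
    if repo_path not in sys.path:
        sys.path.insert(0, repo_path)
    return repo_path
```

After calling `add_repo_to_path()`, `import models` should find the repo's module regardless of where the notebook lives.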

## Prepare trained model

In [None]:
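s3_client = boto3.client('s3')

# Download the training output (model.tar.gz) from S3
with open('model.tar.gz', 'wb') as data:
    s3_client.download_fileobj(Bucket=bucket, Key=f'{training_job_name}/output/model.tar.gz', Fileobj=data)

# Weights file expected inside model.tar.gz once it is extracted in the next cell
weightfile = 'yolov4-trained-model.pth'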

In [None]:
!tar -zxvf model.tar.gz
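
The same extraction can be done in Python, which also makes it easy to confirm that the expected weights file is in the archive. The helper below is a sketch, not part of the original notebook; `extract_archive` is a hypothetical name, and the path check is a precaution against archive members that point outside the destination directory.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(archive_path: str, dest_dir: str = ".") -> List[str]:
    """Extract a .tar.gz archive and return the names of its members."""
    dest = Path(dest_dir).resolve()
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        for name in names:
            # Refuse members that would be written outside dest_dir
            target = (dest / name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {name}")
        archive.extractall(dest)
    return names
```

Calling `extract_archive('model.tar.gz')` and checking that `weightfile` appears in the returned list gives an early signal if the training container saved the weights under a different name.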

In [None]:
import torch
import models

# n_classes must match the "classes" hyperparameter used for training
model = models.Yolov4(n_classes=13)
pretrained_dict = torch.load(weightfile, map_location=torch.device('cpu'))
model.load_state_dict(pretrained_dict)
model.eval()

In [None]:
# Trace the model with a dummy input of the same shape that is passed to Neo
input1 = torch.zeros(input_shape).float()

trace = torch.jit.trace(model.eval().float(), input1)
trace.save('model.pth')

In [None]:
!tar -czvf traced-yolov4-model.tar.gz model.pth

In [None]:
traced_model_path = sess.upload_data(path='traced-yolov4-model.tar.gz', key_prefix='models/traced')
print(traced_model_path)

## Create the compiled model with SageMaker Neo

In [None]:
# Framework information
framework = 'PYTORCH'
framework_version = '1.7'
compilation_job_name = f"{training_job_name}-5"

sm_client = boto3.client('sagemaker', region_name=region)

In [None]:
sm_client.create_compilation_job(
    CompilationJobName=compilation_job_name,
    RoleArn=role,
    InputConfig={
        'S3Uri': traced_model_path,
        'DataInputConfig': data_shape,
        'Framework': framework,
        'FrameworkVersion': framework_version
    },
    OutputConfig={
        'S3OutputLocation': compiled_model_path,
        'TargetDevice': 'ml_g4dn'
    },
    StoppingCondition={'MaxRuntimeInSeconds': 900}
)

import time

# Poll every 10 seconds until the job leaves the STARTING/INPROGRESS states
while True:
    resp = sm_client.describe_compilation_job(CompilationJobName=compilation_job_name)
    if resp['CompilationJobStatus'] in ['STARTING', 'INPROGRESS']:
        print('Running...')
    else:
        print(resp['CompilationJobStatus'], compilation_job_name)
        break
    time.sleep(10)
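
The loop above runs until the job reaches a terminal state and treats every terminal state the same way. A slightly more defensive variant is sketched below: it takes the describe call as a function argument (so it can be exercised without AWS), gives up after a timeout, and raises if the job ends in any state other than `COMPLETED`. `wait_for_compilation` is a hypothetical helper name; `CompilationJobStatus` and `FailureReason` are fields of the `describe_compilation_job` response.

```python
import time
from typing import Callable, Dict


def wait_for_compilation(
    describe: Callable[[], Dict],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 900.0,
) -> Dict:
    """Poll describe() until the job finishes, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        resp = describe()
        status = resp["CompilationJobStatus"]
        if status not in ("STARTING", "INPROGRESS"):
            if status != "COMPLETED":
                reason = resp.get("FailureReason", "no reason given")
                raise RuntimeError(f"compilation ended as {status}: {reason}")
            return resp
        if time.monotonic() >= deadline:
            raise TimeoutError(f"compilation still {status} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
```

In this notebook it could be called as `resp = wait_for_compilation(lambda: sm_client.describe_compilation_job(CompilationJobName=compilation_job_name))`.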

Now we need to upload the `inference.py` entry point script so we can use it to create our model endpoint. Create a new folder called `code` and then upload the inference file into the new directory.

In [None]:
!mkdir -p code

In [None]:
env_vars = {"COMPILEDMODEL": "True", "MMS_MAX_RESPONSE_SIZE": "100000000", "MMS_DEFAULT_RESPONSE_TIMEOUT": "120"}

In [None]:
from sagemaker.pytorch import PyTorchModel

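# Create the SageMaker model from the Neo-compiled artifact.
# The file name suffix depends on the compilation target; if this key does not exist,
# check resp['ModelArtifacts']['S3ModelArtifacts'] from the describe_compilation_job call above.
sm_pytorch_compiled_model = PyTorchModel(
    model_data=f"{compiled_model_path}/traced-yolov4-model-LINUX_X86_64.tar.gz",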
    role=role,
    entry_point='inference.py',
    source_dir='code',
    framework_version=framework_version,
    py_version='py3',
    env=env_vars
)

In [None]:
# Replace the example instance_type below with your preferred instance type
predictor = sm_pytorch_compiled_model.deploy(initial_instance_count=1, instance_type='ml.g4dn.xlarge')

# Print the name of the newly created endpoint
print(predictor.endpoint_name)

Now that our endpoint is deployed, you can test with a sample image from your test set.

In [None]:
client = boto3.client('sagemaker-runtime', region_name=region)

content_type = 'application/x-image'

img_name = "IMG_0293_JPG.rf.e208f5cdf5e993c552be7f96e86c4890.jpg" # Add your image here

with open(img_name, "rb") as f:
    payload = f.read()
    payload = bytearray(payload)

response = client.invoke_endpoint(EndpointName=predictor.endpoint_name, Body=payload, ContentType=content_type)
print(response)
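
Printing `response` shows only the response metadata; the predictions themselves are in the `Body` stream, which can be read once. The sketch below assumes that `inference.py` returns JSON, which this notebook does not confirm. `read_predictions` is a hypothetical helper, and the payload in the stand-in response is made up purely to show the call.

```python
import io
import json


def read_predictions(response: dict):
    """Decode the endpoint response body, assuming inference.py returns JSON."""
    body = response["Body"].read()  # the stream can only be read once
    return json.loads(body.decode("utf-8"))


# Stand-in for an invoke_endpoint response, with an invented payload
fake_response = {"Body": io.BytesIO(b'{"boxes": [[0.1, 0.2, 0.3, 0.4]]}')}
predictions = read_predictions(fake_response)
print(predictions)
```

With the real endpoint, `read_predictions(response)` would replace the stand-in; if the entry point returns a different content type, the decoding step needs to change accordingly.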

In [None]:
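# Run the repo's models.py locally for comparison: <num classes> <weights> <image> <class names file>
!python3 models.py 13 yolov4-trained-model.pth IMG_0293_JPG.rf.e208f5cdf5e993c552be7f96e86c4890.jpg dataset.names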

In [None]:
# Visualize the inference result
from IPython.display import Image
Image('predictions.jpg')