## Intro and refs

This is my first object detection project. I used the YOLOX framework and a pretrained model downloaded from the [YOLOX GitHub repo](https://github.com/Megvii-BaseDetection/YOLOX). I used Remek Kinas's [excellent notebook](https://www.kaggle.com/remekkinas/yolox-training-pipeline-cots-dataset-lb-0-507) as inspiration and as a reference for some parts, though I wrote most of what's here from scratch to encourage learning (anything copied is referenced inline). Coming into this with no object detection experience, there was a lot to learn, and hopefully this notebook will be helpful to someone else in the same boat.  

I used the YOLOX small model setting with COCO pretrained weights to perform detection, initially with a (very low) 320 x 320 image size to fit in my computer's 12 GB of VRAM. Later I experimented with 1280 and 2560 resolutions on a remote server with an A6000, which greatly increased performance (the 320 resolution model was almost useless on the LB dataset). So far I've only done limited hyperparameter tuning to get to 4.81 on the LB dataset. Overfitting the train and validation sets is a general issue, given that we are only training on 3 videos of data, so the number of epochs is important.  

**Full notebook with training**

This version of the notebook is for inference and competition submission only, to avoid cluttering the output directory with extra files created during cv fold generation. See the [full notebook](https://www.kaggle.com/max237/getting-started-with-yolox-training-and-inference) for the complete pipeline.  


**Some Terminology:**    

IoU - Intersection over union, a measure of how closely the predicted bounding box overlaps the actual box.

NMS - Non-maximum suppression, a technique to filter and deduplicate prediction boxes that overlap. It uses IoU to measure the overlap between boxes and keeps the highest-confidence box in each overlapping group.

conf - Confidence level/threshold for the prediction. Experimenting with this threshold is important.
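
To make the two terms above concrete, here is a minimal pure-Python sketch of IoU and a greedy NMS loop. It is only an illustration, not the implementation YOLOX uses internally; the function names `iou` and `greedy_nms` and the example boxes are made up for this sketch, and boxes are assumed to be in `(x_min, y_min, x_max, y_max)` form.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if any)
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes get zero intersection
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.3):
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical example: the first two boxes overlap heavily, the third is separate
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(iou(boxes[0], boxes[1]))    # 81 / 119, roughly 0.68
print(greedy_nms(boxes, scores))  # [0, 2]
```

The lower-scoring of the two overlapping boxes is dropped because its IoU with the kept box exceeds the threshold, which is the role the `--nms` value plays later in this notebook.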

## Install and load dependencies  

Install YOLOX and any other dependencies. Some of this can be skipped if running in a Kaggle notebook; I set this up so it can also be used on a remote SSH server for training.  

Some of these steps are commented out to allow the notebook to run without internet, and I'm using Remek Kinas's [yolox-cots-models](https://www.kaggle.com/remekkinas/yolox-cots-models) dataset instead.  

In [None]:
# Download the model repo
#! git clone https://github.com/Megvii-BaseDetection/YOLOX -q
    
#! cp -r /kaggle/input/yolox-cots-models/YOLOX/ /kaggle/working/

In [None]:
# Install the model
#%cd YOLOX

# Install yolox  
#!pip install -v -e .

# Reset filepath
#%cd ..

In [None]:
# Load pretrained weights to yolox_s.pth
#! wget https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.pth

In [None]:
import pandas as pd
import numpy as np
import torch
from torchvision import transforms
import math
import time
import os
import shutil
from skimage import io, transform
import PIL
import cv2
import IPython.display as display
import ast

In [None]:
from shutil import copyfile
from sklearn.model_selection import KFold
import random
from collections import defaultdict
import json

In [None]:
# Add yolox path and load dependencies from their file structure
import sys
sys.path.append("/kaggle/input/yolox-cots-models/YOLOX")
sys.path.append("./pycocotools-2.0.4")

In [None]:
# Unzip and Install pycocotools from a file
# This was necessary for inference using some of the YOLOX modules
! tar -xf ../input/pycocotools/pycocotools-2.0.4.tar

In [None]:
%cd pycocotools-2.0.4/
!python setup.py build_ext --inplace
%cd ..

In [None]:
# Add yolox path and pycocotools paths
import sys
sys.path.append("../input/yolox-cots-models/YOLOX")
sys.path.append("./pycocotools-2.0.4/")

In [None]:
# Install other YOLOX dependencies from Remek's dataset
! pip install loguru --no-index --find-links=file:///kaggle/input/yolox-cots-models/yolox-dep/
! pip install thop --no-index --find-links=file:///kaggle/input/yolox-cots-models/yolox-dep/

In [None]:
# Load yolox dependencies for inference later
from yolox.data.data_augment import ValTransform
from yolox.utils import postprocess

## Load and preprocess data

### Format annotations and get cv folds

I create 5 folders with COCO-formatted data, which makes it easy to feed into the YOLOX training script (which requires this format).  

In [None]:
data_dir = '../input/tensorflow-great-barrier-reef'

In [None]:
df = pd.read_csv(f'{data_dir}/train.csv')
df.head(5)

In [None]:
# Limit to annotated points only (a majority don't have annotations)
df_train = df[df['annotations'] != '[]'].copy(deep=True).reset_index(drop=True)
# Convert from string 
df_train['annotations'] = df_train['annotations'].apply(lambda x: ast.literal_eval(x))
df_train.head()
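
For readers who have not seen the COCO layout before, the sketch below shows roughly how parsed annotation rows map onto it. This is an illustrative assumption, not the fold-generation code from the full notebook: the function name `rows_to_coco`, the `file_name` key, the category name, and the 1280 x 720 image size are all chosen for this example, and rows are plain dicts rather than DataFrame rows so the block runs on its own.

```python
import ast


def rows_to_coco(rows, width=1280, height=720):
    """Build a COCO-style dict from rows with 'file_name' and 'annotations' keys.

    'annotations' is the string form used in train.csv, e.g.
    "[{'x': 10, 'y': 20, 'width': 30, 'height': 40}]".
    The default width/height are assumptions for this sketch.
    """
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": 1, "name": "starfish"}],  # single class
    }
    ann_id = 0
    for img_id, row in enumerate(rows):
        coco["images"].append({
            "id": img_id,
            "file_name": row["file_name"],
            "width": width,
            "height": height,
        })
        for box in ast.literal_eval(row["annotations"]):
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": 1,
                # COCO boxes are [x_min, y_min, width, height]
                "bbox": [box["x"], box["y"], box["width"], box["height"]],
                "area": box["width"] * box["height"],
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


# Hypothetical rows: one annotated frame and one empty frame
rows = [
    {"file_name": "video_0/9653.jpg",
     "annotations": "[{'x': 10, 'y': 20, 'width': 30, 'height': 40}]"},
    {"file_name": "video_0/9654.jpg", "annotations": "[]"},
]
coco = rows_to_coco(rows)
print(len(coco["images"]), len(coco["annotations"]))  # 2 1
```

Writing the result with `json.dump` would give a file shaped like the `train.json` / `valid.json` annotation files that the experiment config refers to.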

## Create experiment config

This matches the config used in training and allows us to load the model in preparation for adding the weights from training.  I upload my training weights from the [full notebook](https://www.kaggle.com/max237/getting-started-with-yolox-training-and-inference) as a dataset to use for inference.  
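
The experiment file written in the next cell defines three classes that differ only in their resolution. As a hedged alternative, the same source text could be generated from a template. The names `EXP_TEMPLATE` and `build_exp_source` are hypothetical, and the template leaves out the dataset paths and training settings for brevity, so treat this as a sketch rather than a drop-in replacement.

```python
import ast

# Template for one experiment class; {size} is the square input resolution
EXP_TEMPLATE = '''
class Exp{size}(MyExp):
    def __init__(self):
        super(Exp{size}, self).__init__()
        self.depth = 0.33  # yolox_s depth multiplier
        self.width = 0.50  # yolox_s width multiplier
        self.num_classes = 1
        self.input_size = ({size}, {size})
        self.test_size = ({size}, {size})
'''


def build_exp_source(sizes):
    """Return Python source defining one Exp<size> class per resolution."""
    header = "import os\nfrom yolox.exp import Exp as MyExp\n"
    return header + "".join(EXP_TEMPLATE.format(size=s) for s in sizes)


source = build_exp_source([320, 1280, 2560])

# Check that the generated text is syntactically valid Python
# (this parses the text only; it does not import yolox)
ast.parse(source)
print(source.count("class Exp"))  # 3
```

The generated string could then be written to `barrier_reef_exp.py` in the same way as the hand-written version.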

In [None]:
# Keep for inference

exp_file_path = './barrier_reef_exp.py'

with open(exp_file_path, mode='w') as outfile:
    outfile.write("""import os
from yolox.exp import Exp as MyExp

class Exp320(MyExp):
    def __init__(self):
        super(Exp320, self).__init__()
        self.depth = 0.33 # values for the yolox_s
        self.width = 0.50 # values for the yolox_s
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

        # Define yourself dataset path
        self.data_dir = "./fold_0"
        self.train_ann = "train.json"
        self.val_ann = "valid.json"

        self.num_classes = 1

        self.warmup_epochs = 4
        self.max_epoch = 15
        self.data_num_workers = 8

        self.print_interval = 40
        self.eval_interval = 1

        self.input_size = (320, 320)
        self.test_size = (320, 320)

class Exp1280(MyExp):
    def __init__(self):
        super(Exp1280, self).__init__()
        self.depth = 0.33 # values for the yolox_s
        self.width = 0.50 # values for the yolox_s
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

        # Define yourself dataset path
        self.data_dir = "./fold_0"
        self.train_ann = "train.json"
        self.val_ann = "valid.json"

        self.num_classes = 1

        self.warmup_epochs = 4
        self.max_epoch = 15
        self.data_num_workers = 8

        self.print_interval = 40
        self.eval_interval = 1

        self.input_size = (1280, 1280)
        self.test_size = (1280, 1280)
        
class Exp2560(MyExp):
    def __init__(self):
        super(Exp2560, self).__init__()
        self.depth = 0.33 # values for the yolox_s
        self.width = 0.50 # values for the yolox_s
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

        # Define yourself dataset path
        self.data_dir = "./fold_0"
        self.train_ann = "train.json"
        self.val_ann = "valid.json"

        self.num_classes = 1

        self.warmup_epochs = 4
        self.max_epoch = 15
        self.data_num_workers = 8

        self.print_interval = 40
        self.eval_interval = 1

        self.input_size = (2560, 2560)
        self.test_size = (2560, 2560)
""")

## Inference and results

### Visualize test predictions

To start, we can use the YOLOX demo script to run the model with the latest trained weights on some sample images, and then plot the bboxes on top of the images. The script also helpfully draws the box predictions onto the images and saves them, and I've implemented some code to draw the true boxes for comparison.  

Example of how to run the script here: https://github.com/Megvii-BaseDetection/YOLOX/blob/main/docs/quick_run.md  

In [None]:
from barrier_reef_exp import Exp320, Exp1280, Exp2560
experiment = Exp2560()
test_image_path = '../input/tensorflow-great-barrier-reef/train_images/video_0/9653.jpg'
# model_weights_path = 'YOLOX_outputs/barrier_reef_exp/best_ckpt.pth' # Local
model_weights_path = '../input/yolox-s-trained-weights/best_ckpt_2560_15.pth' # kaggle dataset

In [None]:
# Use the demo.py tool to run inference
# Currently not running this

#! python3 YOLOX/tools/demo.py image \
#    -f barrier_reef_exp.py \
#    -c {model_weights_path} \
#    --path {test_image_path} \
#    --conf 0.1 \
#    --nms 0.3 \
#    --tsize 320 \
#    --device gpu \
#    --save_result
    

In [None]:
# Visualize predicted boxes
# Copy the image path output from the previous step

#img_path = './YOLOX_outputs/barrier_reef_exp/vis_res/2022_02_02_00_24_37/0-9653.jpg'
#test_img = PIL.Image.open(img_path)
#display.display(test_img)

In [None]:
# Get the actual ground truth locations (use cv2 to draw in the boxes)

test_img = cv2.imread(test_image_path)
boxes = df_train[df_train['image_id'] == '0-' + test_image_path.split('/')[-1][:-4]]['annotations'].tolist()[0]

for box in boxes:
    upper_left = (int(box['x']), int(box['y']))
    lower_right = (int(box['x'] + box['width']), int(box['y'] + box['height']))
    color = (255, 0, 0)
    test_img = cv2.rectangle(test_img, upper_left, lower_right, color=color, thickness = 2)

test_img = cv2.cvtColor(test_img, cv2.COLOR_BGR2RGB)
test_img_pil = PIL.Image.fromarray(test_img)
display.display(test_img_pil)

### Run on the test dataset for submission

**Inference in notebook**

Since we are getting images directly from the api and not from a file, we unfortunately can't just use the YOLOX demo tool to get our box predictions.  Instead, we have to adapt pieces of that tool to fit our needs.  I adapt the [inference function from demo.py](https://github.com/Megvii-BaseDetection/YOLOX/blob/main/tools/demo.py#L132) script for this purpose.  

Using this also requires installing some new Python libraries, which I do offline from Remek's dataset. Instructions [here](https://www.kaggle.com/samuelepino/pip-installing-packages-with-no-internet).

**Load the model object**

Uses the get_model function from the [experiment base class](https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/exp/yolox_base.py).

In [None]:
def get_trained_model(experiment, weights):
    
    # Use the experiment built in function to generate the same model we trained with
    model = experiment.get_model()
    
    # Inference on gpu
    model.cuda()
    
    # Turn off training mode so the model won't try to calculate loss
    model.eval()
    model.head.training=False
    model.training=False
    
    # Load in the weights from training
    best_weights = torch.load(weights)
    model.load_state_dict(best_weights['model'])
    
    return model

In [None]:
# Extract the boxes from the processed predictions 

def get_boxes(outputs):
    output = outputs[0][0]
    
    if output is None:
        return {'bboxes': [], 'scores': []}
    # move to cpu
    output = output.cpu()
    
    img_info = outputs[1]
    
    bboxes = output[:, 0:4]/img_info['ratio']
    scores = output[:, 4] * output[:, 5]
    
    return {'bboxes': bboxes, 'scores': scores}

In [None]:
# Custom implementation of the inference function from demo.py 
# Takes in an image object instead of a filepath

def inference(img, model, experiment, device):
    
        test_size = experiment.test_size
        confthre = experiment.test_conf
        nmsthre = experiment.nmsthre
    
        img_info = {"id": 0}
        img_info["file_name"] = None

        height, width = img.shape[:2]
        img_info["height"] = height
        img_info["width"] = width
        img_info["raw_img"] = img

        ratio = min(test_size[0] / img.shape[0], test_size[1] / img.shape[1])
        img_info["ratio"] = ratio
        
        preproc = ValTransform(legacy=False)
        
        img, _ = preproc(img, None, test_size)
        img = torch.from_numpy(img).unsqueeze(0)
        img = img.float()
        if device == "gpu":
            img = img.cuda()

        with torch.no_grad():
            outputs = model(img)

            outputs = postprocess(
                outputs, 1, confthre,
                nmsthre, class_agnostic=True
            )
        return outputs, img_info
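
The `ratio` stored in `img_info` is what lets `get_boxes` map predictions back to the original image. The sketch below works through that arithmetic on its own. It assumes the resized image is placed at the top-left of the padded input, so no offset is needed, and it uses a 720 x 1280 frame as an example size. The helper names `resize_ratio` and `to_original_coords` are made up for this illustration.

```python
def resize_ratio(img_h, img_w, test_size):
    """Scale factor that fits the image inside test_size, keeping aspect ratio."""
    return min(test_size[0] / img_h, test_size[1] / img_w)


def to_original_coords(box, ratio):
    """Map a box from model-input pixels back to original-image pixels."""
    return [coord / ratio for coord in box]


# Example: a 720 x 1280 frame resized for a 2560 x 2560 model input
ratio = resize_ratio(720, 1280, (2560, 2560))
print(ratio)  # 2.0 (width is the limiting side: 2560 / 1280)

# A box predicted in the resized image, as (x_min, y_min, x_max, y_max)
print(to_original_coords([200.0, 100.0, 400.0, 300.0], ratio))
# [100.0, 50.0, 200.0, 150.0]
```

Dividing by the ratio is the same operation as `output[:, 0:4]/img_info['ratio']` in `get_boxes`, just applied to a single box.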

In [None]:
# Get box predictions for a single image and given thresholds

def barrier_reef_inference(exp_file, weights, test_image, 
                           conf_threshold=0.1, nms_threshold=0.3,
                           device='gpu'):
    
    # exp_file is an already-instantiated experiment object (e.g. Exp2560()), not a file path
    experiment = exp_file
    
    # Set up the model and weights (note: this reloads the weights on every call)
    model = get_trained_model(experiment, weights)
    
    # Set custom thresholds for inference
    experiment.test_conf = conf_threshold
    experiment.nmsthre = nms_threshold
    
    # Run the image through the model
    outputs = inference(test_image, model, experiment, device)
    
    return get_boxes(outputs)


In [None]:
# Try it out on the test image 

# Load as an array first
test_image = cv2.imread(test_image_path)
print(test_image.shape)

box_preds = barrier_reef_inference(experiment, model_weights_path, test_image,
                                   conf_threshold=0.1, nms_threshold=0.3)
print(box_preds)

In [None]:
# Sample submission string for this example
test_image = cv2.imread(test_image_path)

outputs = barrier_reef_inference(experiment, model_weights_path, test_image,
                                 conf_threshold=0.1, nms_threshold=0.3)

bboxes = outputs['bboxes']
scores = outputs['scores']

predictions = []

for i in range(len(bboxes)):
    box = bboxes[i]
    score = scores[i]

    x_min = int(box[0])
    y_min = int(box[1])
    x_max = int(box[2])
    y_max = int(box[3])

    bbox_width = x_max - x_min
    bbox_height = y_max - y_min

    predictions.append('{:.2f} {} {} {} {}'.format(score, x_min, y_min, bbox_width, bbox_height))

prediction_str = ' '.join(predictions)

print('Prediction:', prediction_str)
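
The box-to-string conversion above is repeated in the submission loop, so it could be pulled into a small helper. This is a sketch of one way to do that; the function name `format_prediction_string` is made up here, and it assumes boxes arrive as `(x_min, y_min, x_max, y_max)` as they do from `get_boxes`.

```python
def format_prediction_string(bboxes, scores):
    """Format boxes as 'score x_min y_min width height', space-separated."""
    parts = []
    for box, score in zip(bboxes, scores):
        # Truncate corner coordinates to whole pixels
        x_min, y_min, x_max, y_max = (int(v) for v in box[:4])
        parts.append('{:.2f} {} {} {} {}'.format(
            float(score), x_min, y_min, x_max - x_min, y_max - y_min))
    return ' '.join(parts)


# Hypothetical single prediction
print(format_prediction_string([[10.7, 20.2, 50.9, 80.1]], [0.8765]))
# 0.88 10 20 40 60

# No detections gives an empty string
print(repr(format_prediction_string([], [])))  # ''
```

Using `zip` also means the empty-list result returned by `get_boxes` when nothing is detected needs no special handling.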


In [None]:
def visualize_boxes(test_image, boxes, scores):
    
    test_image_boxed = test_image.copy()  # draw on a copy so the caller's array is not modified

    for box in boxes:
        
        x0 = int(box[0])
        y0 = int(box[1])
        x1 = int(box[2])
        y1 = int(box[3])

        color = (255, 0, 0)
        test_image_boxed = cv2.rectangle(test_image_boxed, (x0, y0), (x1, y1), color=color, thickness = 2)
    
    test_image_recolored = cv2.cvtColor(test_image_boxed, cv2.COLOR_BGR2RGB)
    test_image_pil = PIL.Image.fromarray(test_image_recolored)
    
    return test_image_pil

test_image = cv2.imread(test_image_path)
test_image_pil = visualize_boxes(test_image, box_preds['bboxes'], box_preds['scores'])
display.display(test_image_pil)

The boxes from our custom inference implementation match those from the YOLOX demo tool, which means inference is working as expected and we are good to go.  

### Call the barrier reef api and send predictions

This last section is also adapted from [Remek's notebook](https://www.kaggle.com/remekkinas/yolox-training-pipeline-cots-dataset-lb-0-507).  

In [None]:
import greatbarrierreef

env = greatbarrierreef.make_env()  # initialize the environment
iter_test = env.iter_test() 

In [None]:
# Get predicted boxes for each image returned by the api

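for (image_np, sample_prediction_df) in iter_test:

    # Reverse the channel order so the array matches the BGR layout of the cv2.imread images used above
    outputs = barrier_reef_inference(experiment, model_weights_path, image_np[:,:,::-1],
                                     conf_threshold=0.15, nms_threshold=0.3)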

    bboxes = outputs['bboxes']
    scores = outputs['scores']

    predictions = []

    for i in range(len(bboxes)):
        box = bboxes[i]
        score = scores[i]

        x_min = int(box[0])
        y_min = int(box[1])
        x_max = int(box[2])
        y_max = int(box[3])

        bbox_width = x_max - x_min
        bbox_height = y_max - y_min

        predictions.append('{:.2f} {} {} {} {}'.format(score, x_min, y_min, bbox_width, bbox_height))

    prediction_str = ' '.join(predictions)
    sample_prediction_df['annotations'] = prediction_str
    env.predict(sample_prediction_df)

    print('Prediction:', prediction_str)

In [None]:
# Check the dataframe output from the previous cell (this is what's used for submission)
sub_df = pd.read_csv('submission.csv')
sub_df.head()

In [None]:
# Test out image color transforms
test_image_recolored = cv2.cvtColor(image_np[:,:,::-1], cv2.COLOR_BGR2RGB)
test_image_pil = PIL.Image.fromarray(test_image_recolored)
display.display(test_image_pil)