## Intro and refs

This is my first object detection project. I used the YOLOX framework and a pretrained model downloaded from [this](https://github.com/Megvii-BaseDetection/YOLOX) GitHub repo.  I used Remek Kinas's [excellent notebook](https://www.kaggle.com/remekkinas/yolox-training-pipeline-cots-dataset-lb-0-507) as an inspiration and reference for some parts, though I wrote most of what's here from scratch to encourage learning (anything copied is referenced inline).  Coming into this with no object detection experience, there was a lot to learn, and hopefully this notebook will be helpful to someone else in the same boat.  

I used the YOLOX small model setting with COCO pretrained weights to perform detection, starting with a (very low) 320 x 320 image size to fit in my computer's 12 GB of VRAM.  Later I experimented with 1280 and 2560 resolutions on a remote server with an A6000, which greatly increased performance (the 320 resolution model was almost useless on the LB dataset).  So far I've only done limited hyperparameter tuning to get to 4.81 on the LB dataset, but overfitting to the train and validation sets is a general issue (we are only training on 3 videos of data), so the number of epochs is important.  

**Submitting**

This notebook creates a lot of extra files in the working directory, which interferes with submitting.  I have a simpler [inference only notebook](https://www.kaggle.com/max237/getting-started-with-yolox-inference-only) for submitting.


**Some Terminology:**    

IoU - Intersection over union, a measure of how closely the predicted bounding box overlaps with the actual box: the area of overlap divided by the area of the union.  
NMS - Non-maximum suppression, a technique to filter and dedup prediction boxes that overlap.  It keeps the highest-confidence box and uses IoU to decide which lower-confidence boxes overlap it enough to be dropped.  
conf - Confidence level/threshold for the prediction.  Experimenting with this threshold is important.  
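
To make the NMS definition concrete, here is a minimal pure-Python sketch of greedy NMS. It is only an illustration of the idea, not the implementation YOLOX uses internally; the function names, the example boxes, and the corner format `(x0, y0, x1, y1)` are assumptions made for this example.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x0, y0, x1, y1) corner coordinates."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thresh):
    """Return the indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already-kept box too much
        if all(iou_xyxy(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# Made-up example: boxes 0 and 1 overlap heavily, box 2 is far away
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.6, 0.8]
kept = greedy_nms(boxes, scores, iou_thresh=0.3)
print(kept)  # [0, 2]
```

Box 1 is dropped because it overlaps the higher-scoring box 0 with an IoU of about 0.82, well above the 0.3 threshold.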

## Install and load dependencies  

Install YOLOX and any other dependencies.  Some of this can be skipped when running in a Kaggle notebook; I also set this up to be used on a remote SSH server for training, which required different settings.  

Some of these are commented out to allow the notebook to run without internet, and I'm using Remek Kinas's [yolox-cots-models](https://www.kaggle.com/remekkinas/yolox-cots-models) dataset.  

In [None]:
# install kaggle api (not necessary if running in a kaggle notebook)
# %pip install --user kaggle

In [None]:
# Download the model repo
# I'm using a preloaded dataset of this instead to avoid redownloading and installing
#! git clone https://github.com/Megvii-BaseDetection/YOLOX -q
    
#! cp -r /kaggle/input/yolox-cots-models/YOLOX/ /kaggle/working/

In [None]:
# Install the model
#%cd YOLOX

# Install yolox  
#!pip install -v -e .

# Reset filepath
#%cd ..

In [None]:
# Load pretrained weights to yolox_s.pth
#! wget https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.pth

In [None]:
import pandas as pd
import numpy as np
import torch
from torchvision import transforms
import math
import time
import os
import shutil
from skimage import io, transform
import PIL
import cv2
import IPython.display as display
import ast

In [None]:
from shutil import copyfile
from sklearn.model_selection import KFold
import random
from collections import defaultdict
import json

In [None]:
# Add yolox path and load dependencies from their file structure
import sys
sys.path.append("/kaggle/input/yolox-cots-models/YOLOX")
sys.path.append("./pycocotools-2.0.4")

In [None]:
# Unzip and Install pycocotools from a file
# This was necessary for inference using some of the YOLOX modules
! tar -xf ../input/pycocotools/pycocotools-2.0.4.tar

In [None]:
%cd pycocotools-2.0.4/
!python setup.py build_ext --inplace
%cd ..

In [None]:
# Add yolox path and pycocotools paths (from kaggle datasets)
import sys
sys.path.append("../input/yolox-cots-models/YOLOX")
sys.path.append("./pycocotools-2.0.4/")

In [None]:
# Install other YOLOX dependencies from Remek's dataset
! pip install loguru --no-index --find-links=file:///kaggle/input/yolox-cots-models/yolox-dep/
! pip install thop --no-index --find-links=file:///kaggle/input/yolox-cots-models/yolox-dep/

In [None]:
# Load yolox dependencies for inference later
from yolox.data.data_augment import ValTransform
from yolox.utils import postprocess

## Load and preprocess data

### Format annotations and get cv folds

I create 5 folders with COCO-formatted data, which makes it easy to feed into the YOLOX training script (which requires this format).  

In [None]:
data_dir = '../input/tensorflow-great-barrier-reef'

In [None]:
df = pd.read_csv(f'{data_dir}/train.csv')
df.head(5)

In [None]:
# Limit to annotated points only (a majority don't have annotations) for training
df_train = df[df['annotations'] != '[]'].copy(deep=True).reset_index(drop=True)
# Convert from string 
df_train['annotations'] = df_train['annotations'].apply(lambda x: ast.literal_eval(x))
df_train.head()

**Get CV folds**

In [None]:
# Generate splits

folds = 5
rands = 10 # set seed
cv_shuffle = True

cv = KFold(n_splits=folds, random_state=rands, shuffle=cv_shuffle)

In [None]:
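# Copy files into new directories for each fold
# YOLOX requires the directories to be named train2017 and val2017

indicies = list(range(len(df_train)))
# Note: cv.split() only uses the number of samples and returns positional indices,
# so this shuffle does not change the folds (KFold's shuffle=True does the randomizing)
random.shuffle(indicies)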

# Save a dictionary of indicies for each fold
fold_indicies = defaultdict(dict)

# Generate files and names for each fold
for fold_num, (train_idx, valid_idx) in enumerate(cv.split(indicies)):
    
    # Fold directories (drop the folder to prevent mixups if re-running)
    
    train_path = f'./fold_{fold_num}/train2017'
    valid_path = f'./fold_{fold_num}/val2017'
    
    try: 
        shutil.rmtree(train_path)
    except(FileNotFoundError):
        pass
    finally:
        os.makedirs(train_path)
        
    try: 
        shutil.rmtree(valid_path)
    except(FileNotFoundError):
        pass
    finally:
        os.makedirs(valid_path)
    
    
    # Training data per fold
    for i in train_idx:
        img_num = df_train.iloc[i, 4].split('-')[-1]
        img_id = df_train.iloc[i, 4]
        vid_num = df_train.iloc[i, 0]
        copyfile(f'{data_dir}/train_images/video_{vid_num}/{img_num}.jpg', 
                 f'{train_path}/{img_id}.jpg')
    
    # Validation data per fold
    for i in valid_idx:
        img_num = df_train.iloc[i, 4].split('-')[-1]
        img_id = df_train.iloc[i, 4]
        vid_num = df_train.iloc[i, 0]
        copyfile(f'{data_dir}/train_images/video_{vid_num}/{img_num}.jpg', 
                 f'{valid_path}/{img_id}.jpg') # save under the unique image_id (video-frame) file name

    fold_indicies[fold_num]['train'] = train_idx
    fold_indicies[fold_num]['valid'] = valid_idx
    

In [None]:
fold_indicies.keys()

In [None]:
fold_indicies[0].keys()

### Create coco annotation files

This section is adapted from Remek's notebook: https://www.kaggle.com/remekkinas/yolox-training-pipeline-cots-dataset-lb-0-507  
Which is taken from Awsaf's notebook: https://www.kaggle.com/awsaf49/great-barrier-reef-yolov5-train
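
For orientation, the sketch below builds the smallest COCO-style dictionary with the same keys the conversion function in this section fills in: one category, one image, one box. The image id, file name, and box values are made-up illustrative values rather than rows from the dataset. COCO stores `bbox` as `[x, y, width, height]`, with `(x, y)` the top-left corner.

```python
import json

coco = {
    "categories": [{"id": 0, "name": "starfish", "supercategory": "none"}],
    "images": [{"id": 0, "file_name": "0-16.jpg", "height": 720, "width": 1280}],
    "annotations": [{
        "id": 0,            # unique per annotation
        "image_id": 0,      # links the box to an entry in "images"
        "category_id": 0,   # links the box to an entry in "categories"
        "bbox": [559, 213, 50, 32],
        "area": 50 * 32,
        "segmentation": [],
        "iscrowd": 0,
    }],
}

# Every annotation must point at an image that exists in the file
image_ids = {img["id"] for img in coco["images"]}
assert all(ann["image_id"] in image_ids for ann in coco["annotations"])

text = json.dumps(coco)
print(len(coco["annotations"]), "annotation(s) serialized")
```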

In [None]:
def save_annot_json(json_annotation, filename):
    with open(filename, 'w') as f:
        output_json = json.dumps(json_annotation)
        f.write(output_json)

In [None]:
def dataset2coco(df):
    
    annotion_id = 0
    
    annotations_json = {
        "info": [],
        "licenses": [],
        "categories": [],
        "images": [],
        "annotations": []
    }
    
    info = {
        "year": "2021",
        "version": "1",
        "description": "COTS dataset - COCO format",
        "contributor": "",
        "url": "https://kaggle.com",
        "date_created": "2022-01-29T15:01:26+00:00"
    }
    annotations_json["info"].append(info)
    
    lic = {
            "id": 1,
            "url": "",
            "name": "Unknown"
        }
    annotations_json["licenses"].append(lic)

    classes = {"id": 0, "name": "starfish", "supercategory": "none"}

    annotations_json["categories"].append(classes)

    
    for ann_row in df.itertuples():
            
        images = {
            "id": ann_row[0],
            "license": 1,
            "file_name": str(ann_row[5]) + '.jpg', # use the image id with video information
            "height": 720,
            "width": 1280,
            "date_captured": "2021-11-30T15:01:26+00:00"
        }
        
        annotations_json["images"].append(images)
        
        for bbox in ann_row.annotations:
            # Some boxes in COTS extend past the image edge, so clip the
            # width and height to stay inside the 1280 x 720 frame
            b_width = min(bbox['width'], 1280 - bbox['x'])
            b_height = min(bbox['height'], 720 - bbox['y'])
                
            image_annotations = {
                "id": annotion_id,
                "image_id": ann_row[0],
                "category_id": 0,
                "bbox": [bbox['x'], bbox['y'], b_width, b_height],
                "area": b_width * b_height,
                "segmentation": [],
                "iscrowd": 0
            }
            
            annotion_id += 1
            annotations_json["annotations"].append(image_annotations)
        
        
    print(f"Dataset COTS annotation to COCO json format completed! Files: {len(df)}")
    return annotations_json

In [None]:
# Add training and validation annotation files to each fold
# Use the indicies from the file moving step, which match the file names
for fold_num, tv_indicies in fold_indicies.items():
    train_annot_json = dataset2coco(df_train.iloc[tv_indicies['train'], :])
    valid_annot_json = dataset2coco(df_train.iloc[tv_indicies['valid'], :])

    if not os.path.exists(f'./fold_{fold_num}/annotations'):
        os.makedirs(f'./fold_{fold_num}/annotations')
    
    save_annot_json(train_annot_json, f'fold_{fold_num}/annotations/train.json')
    save_annot_json(valid_annot_json, f'fold_{fold_num}/annotations/valid.json')

In [None]:
df_train.iloc[fold_indicies[0]['train']].head(20)

## Train the model

To do this using the YOLOX author's provided scripts, we first need to create an Exp config file.  

Tutorial: https://github.com/Megvii-BaseDetection/YOLOX/blob/main/docs/train_custom_data.md  
Train pipeline and args: https://github.com/Megvii-BaseDetection/YOLOX/blob/main/tools/train.py  
Exp defaults: https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/exp/yolox_base.py  

I haven't really optimized the training process yet (I'm mostly using the author's defaults).  My first local runs used only 30 epochs and a small 320x320 input to fit in VRAM on my local GPU and train quickly; the per-fold configs written below are set to 18 epochs at 2560x2560.  

**Create the config file**

In [None]:
# Create separate experiment configs for each fold

for i in range(folds):
    print(i)
    exp_string = f"""import os
from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.depth = 0.33 # values for the yolox_s
        self.width = 0.50 # values for the yolox_s
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

        # Define your own dataset path
        self.data_dir = "./fold_{i}"
        self.train_ann = "train.json"
        self.val_ann = "valid.json"

        self.num_classes = 1

        self.warmup_epochs = 4
        self.max_epoch = 18
        self.data_num_workers = 8

        self.print_interval = 100
        self.eval_interval = 2

        self.input_size = (2560, 2560)
        self.test_size = (2560, 2560)
"""
    
    exp_file_path = f'./barrier_reef_exp_train_{i}.py'
    with open(exp_file_path, mode='w') as outfile:
        outfile.write(exp_string)

In [None]:
# Keep for inference - exp files for various resolutions

exp_file_path = './barrier_reef_exp.py'

with open(exp_file_path, mode='w') as outfile:
    outfile.write("""import os
from yolox.exp import Exp as MyExp

class Exp320(MyExp):
    def __init__(self):
        super(Exp320, self).__init__()
        self.depth = 0.33 # values for the yolox_s
        self.width = 0.50 # values for the yolox_s
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

        # Define your own dataset path
        self.data_dir = "./fold_0"
        self.train_ann = "train.json"
        self.val_ann = "valid.json"

        self.num_classes = 1

        self.warmup_epochs = 4
        self.max_epoch = 15
        self.data_num_workers = 8

        self.print_interval = 40
        self.eval_interval = 1

        self.input_size = (320, 320)
        self.test_size = (320, 320)

class Exp1280(MyExp):
    def __init__(self):
        super(Exp1280, self).__init__()
        self.depth = 0.33 # values for the yolox_s
        self.width = 0.50 # values for the yolox_s
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

        # Define your own dataset path
        self.data_dir = "./fold_0"
        self.train_ann = "train.json"
        self.val_ann = "valid.json"

        self.num_classes = 1

        self.warmup_epochs = 4
        self.max_epoch = 15
        self.data_num_workers = 8

        self.print_interval = 40
        self.eval_interval = 1

        self.input_size = (1280, 1280)
        self.test_size = (1280, 1280)
        

class Exp2560(MyExp):
    def __init__(self):
        super(Exp2560, self).__init__()
        self.depth = 0.33 # values for the yolox_s
        self.width = 0.50 # values for the yolox_s
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

        # Define your own dataset path
        self.data_dir = "./fold_0"
        self.train_ann = "train.json"
        self.val_ann = "valid.json"

        self.num_classes = 1

        self.warmup_epochs = 4
        self.max_epoch = 15
        self.data_num_workers = 8

        self.print_interval = 40
        self.eval_interval = 1

        self.input_size = (2560, 2560)
        self.test_size = (2560, 2560)
""")

### Run YOLOX training script

I have this commented out for this submission version, but uncommenting the command below runs the training process and saves the weights.  

To run inference in the Kaggle notebook, I've uploaded my trained weights as data 'best-ckpt_{resolution}'.
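
Since the config cell writes one experiment file per fold, training every fold means one `train.py` call per file. The sketch below only assembles the argument lists and launches nothing. It assumes 5 folds and the same flags as the commented command in the next cell; the helper name `build_train_command` is hypothetical.

```python
folds = 5  # same number of folds as the KFold split earlier in the notebook


def build_train_command(fold, batch_size=8, weights="yolox_s.pth"):
    """Assemble the train.py call for one fold as a list of arguments."""
    return [
        "python3", "YOLOX/tools/train.py",
        "-f", f"barrier_reef_exp_train_{fold}.py",  # per-fold experiment file
        "-d", "1",                # number of devices
        "-b", str(batch_size),    # batch size
        "--fp16",                 # mixed precision training
        "-o",                     # occupy GPU memory first
        "-c", weights,            # checkpoint with pretrained weights
        "--cache",                # kept from the original command
    ]


commands = [build_train_command(fold) for fold in range(folds)]
for cmd in commands:
    print(" ".join(cmd))
```

Each list can then be passed to `subprocess.run(cmd, check=True)` to launch that fold.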

In [None]:
# Run the training pipeline for one fold (fold 0 here)
# -f is the experiment file written above (one per fold)
# -d is devices
# -b is batch size
# --fp16 is mixed precision training
# -o is occupy GPU memory first for training
# -c is a checkpoint file (for loading pretrained weights)
# --cache caches images in RAM to speed up training

# ! python3 YOLOX/tools/train.py -f barrier_reef_exp_train_0.py -d 1 -b 8 --fp16 -o -c yolox_s.pth --cache

## Inference and results

### Visualize test predictions

To start, we can use the YOLOX demo script to run the model with the latest trained weights on some sample images, and then plot the bboxes on top of the images.  The script also helpfully draws the box predictions onto the images and saves them, and I've implemented some code to draw the true boxes for comparison.  

Example of how to run the script here: https://github.com/Megvii-BaseDetection/YOLOX/blob/main/docs/quick_run.md  

In [None]:
test_image_path = 'fold_0/val2017/0-9653.jpg'
# model_weights_path = 'YOLOX_outputs/barrier_reef_exp/best_ckpt.pth' # Local
model_weights_path = '../input/yolox-s-trained-weights/best_ckpt_1280.pth' # Kaggle notebook

In [None]:
# Use the demo.py tool to run inference
# Currently not running this

#! python3 YOLOX/tools/demo.py image \
#    -f barrier_reef_exp.py \
#    -c {model_weights_path} \
#    --path {test_image_path} \
#    --conf 0.1 \
#    --nms 0.3 \
#    --tsize 320 \
#    --device gpu \
#    --save_result
    

In [None]:
# Visualize predicted boxes
# Copy the image path output from the previous step

#img_path = './YOLOX_outputs/barrier_reef_exp/vis_res/2022_02_02_00_24_37/0-9653.jpg'
#test_img = PIL.Image.open(img_path)
#display.display(test_img)

In [None]:
# Get the actual ground truth locations (use cv2 to draw in the boxes)

test_img = cv2.imread(test_image_path)
boxes = df_train[df_train['image_id'] == test_image_path.split('/')[-1][:-4]]['annotations'].tolist()[0]

for box in boxes:
    upper_left = (int(box['x']), int(box['y']))
    lower_right = (int(box['x'] + box['width']), int(box['y'] + box['height']))
    color = (255, 0, 0)
    test_img = cv2.rectangle(test_img, upper_left, lower_right, color=color, thickness = 2)

test_img = cv2.cvtColor(test_img, cv2.COLOR_BGR2RGB)
test_img_pil = PIL.Image.fromarray(test_img)
display.display(test_img_pil)

### Run on the test dataset for submission

**Inference in notebook**

Since we are getting images directly from the API and not from a file, we unfortunately can't just use the YOLOX demo tool to get our box predictions.  Instead, we have to adapt pieces of that tool to fit our needs.  I adapted the [inference function from the demo.py script](https://github.com/Megvii-BaseDetection/YOLOX/blob/main/tools/demo.py#L132) for this purpose.  

Using this also requires installing some new Python libraries, which I do offline from Remek's dataset.  Instructions [here](https://www.kaggle.com/samuelepino/pip-installing-packages-with-no-internet).

In [None]:
from barrier_reef_exp import Exp1280

**Load the model object**

Uses the get_model function from the [experiment base class](https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/exp/yolox_base.py).

In [None]:
def get_trained_model(experiment, weights):
    
    # Use the experiment built in function to generate the same model we trained with
    model = experiment.get_model()
    
    # Inference on gpu
    model.cuda()
    
    # Turn off training mode so the model won't try to calculate loss
    model.eval()
    model.head.training=False
    model.training=False
    
    # Load in the weights from training
    best_weights = torch.load(weights)
    model.load_state_dict(best_weights['model'])
    
    return model

In [None]:
# Extract the boxes from the processed predictions 

def get_boxes(outputs):
    output = outputs[0][0]
    
    if output is None:
        return {'bboxes': [], 'scores': []}
    # move to cpu
    output = output.cpu()
    
    img_info = outputs[1]
    
    bboxes = output[:, 0:4]/img_info['ratio']
    scores = output[:, 4] * output[:, 5]
    
    return {'bboxes': bboxes, 'scores': scores}
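
The division by `img_info['ratio']` maps boxes from the resized model input back to original pixel coordinates. Below is a small worked sketch of that arithmetic for a 1280x720 frame, assuming the image is scaled by the single factor computed in the inference function. The helper names and the box values are made up for illustration.

```python
def resize_ratio(img_h, img_w, test_size):
    """Scale factor used to fit an image inside test_size = (height, width)."""
    return min(test_size[0] / img_h, test_size[1] / img_w)


def to_original_coords(box_xyxy, ratio):
    """Map a box from model-input pixels back to original-image pixels."""
    return [v / ratio for v in box_xyxy]


# At 1280x1280 the 1280-wide frame already fits, so nothing is rescaled
ratio = resize_ratio(720, 1280, (1280, 1280))
print(ratio)  # 1.0

# At 320x320 the frame shrinks by 4x, so predicted boxes are scaled up by 4x
ratio_small = resize_ratio(720, 1280, (320, 320))
print(ratio_small)  # 0.25
print(to_original_coords([40.0, 50.0, 60.0, 80.0], ratio_small))
# [160.0, 200.0, 240.0, 320.0]
```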

In [None]:
# Custom implementation of the inference function from demo.py 
# Takes in an image object instead of a filepath

def inference(img, model, experiment, device):

    test_size = experiment.test_size
    confthre = experiment.test_conf
    nmsthre = experiment.nmsthre

    img_info = {"id": 0}
    img_info["file_name"] = None

    height, width = img.shape[:2]
    img_info["height"] = height
    img_info["width"] = width
    img_info["raw_img"] = img

    # Scale factor used to fit the image inside test_size
    ratio = min(test_size[0] / img.shape[0], test_size[1] / img.shape[1])
    img_info["ratio"] = ratio

    preproc = ValTransform(legacy=False)

    img, _ = preproc(img, None, test_size)
    img = torch.from_numpy(img).unsqueeze(0)
    img = img.float()
    if device == "gpu":
        img = img.cuda()

    with torch.no_grad():
        outputs = model(img)

        # 1 is the number of classes (starfish only)
        outputs = postprocess(
            outputs, 1, confthre,
            nmsthre, class_agnostic=True
        )
    return outputs, img_info

In [None]:
# Get box predictions for a single image and given thresholds

def barrier_reef_inference(exp_file, weights, test_image, 
                           conf_threshold=0.1, nms_threshold=0.3,
                           device='gpu'):
    
    # exp_file is an instantiated experiment object (e.g. Exp1280()), not a path
    experiment = exp_file
    
    # Set up the model and weights
    # Note: this rebuilds the model and reloads the weights on every call
    model = get_trained_model(experiment, weights)
    
    # Set custom thresholds for inference
    experiment.test_conf = conf_threshold
    experiment.nmsthre = nms_threshold
    
    # Run the image through the model
    outputs = inference(test_image, model, experiment, device)
    
    return get_boxes(outputs)
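
`barrier_reef_inference` builds the model and loads the checkpoint on every call, which gets slow when it runs once per image. One possible way around that is to memoize the loader so each (experiment, weights) pair is only loaded once. The sketch below uses a stand-in `load_model` function (hypothetical, it just records its calls) in place of `get_trained_model`, so it runs without YOLOX or a GPU.

```python
from functools import lru_cache

load_calls = []  # records every real load, for demonstration only


@lru_cache(maxsize=None)
def load_model(exp_name, weights_path):
    """Stand-in for an expensive loader such as get_trained_model."""
    load_calls.append((exp_name, weights_path))
    return {"exp": exp_name, "weights": weights_path}


for _ in range(3):
    # Only the first call does the load; later calls return the cached object
    model = load_model("Exp1280", "best_ckpt_1280.pth")

print(len(load_calls))  # 1
```

`lru_cache` needs hashable arguments. Strings work as shown, and a plain Python object such as an experiment instance is hashed by identity unless its class overrides that.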


In [None]:
# Try it out on the test image 

# Load as an array first
test_image = cv2.imread(test_image_path)
print(test_image.shape)
experiment = Exp1280()

box_preds = barrier_reef_inference(experiment, model_weights_path, test_image,
                                   conf_threshold=0.1, nms_threshold=0.3)
print(box_preds)

In [None]:
def visualize_boxes(test_image, boxes, scores):

    for box in boxes:
        
        x0 = int(box[0])
        y0 = int(box[1])
        x1 = int(box[2])
        y1 = int(box[3])

        color = (255, 0, 0)
        test_image = cv2.rectangle(test_image, (x0, y0), (x1, y1), color=color, thickness = 2)
    
    test_image = cv2.cvtColor(test_image, cv2.COLOR_BGR2RGB)
    test_image_pil = PIL.Image.fromarray(test_image)
    
    return test_image_pil

test_image = cv2.imread(test_image_path)
test_image_pil = visualize_boxes(test_image, box_preds['bboxes'], box_preds['scores'])
display.display(test_image_pil)

Compared to the ground truth above, our model is correctly identifying all of the COTS in the test image from the validation set.  

### Predict boxes and calculate F2 metric on the validation data set

To get an idea of the optimal confidence score cutoff, we can calculate the evaluation metric (F2 score) for different confidence thresholds.  

This currently greatly overestimates the true score vs. the LB, since I'm looking at performance on all of the training data (to make sure we include empty images), but it gives an idea of how the score changes across confidence thresholds.  Regardless, we are only training on 3 videos, so calculating the F2 score on a validation dataset of randomly selected images from those videos would also greatly overstate performance.  An alternative would be to use all images from one of the videos as validation data, but that would cut down on training data significantly.  
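
The alternative mentioned above, holding out one whole video for validation, could look like the sketch below. The rows are made-up `video_id` / `image_id` pairs standing in for `df_train`, and the helper name `split_by_video` is hypothetical.

```python
def split_by_video(rows, holdout_video):
    """Put every frame of holdout_video in validation, the rest in training."""
    train = [r for r in rows if r["video_id"] != holdout_video]
    valid = [r for r in rows if r["video_id"] == holdout_video]
    return train, valid


# Made-up rows for illustration only
rows = [
    {"video_id": 0, "image_id": "0-16"},
    {"video_id": 0, "image_id": "0-17"},
    {"video_id": 1, "image_id": "1-4"},
    {"video_id": 2, "image_id": "2-9"},
]

train, valid = split_by_video(rows, holdout_video=2)
print(len(train), len(valid))  # 3 1
```

No frame from the held-out video reaches training, so near-duplicate neighboring frames cannot leak between the two sets. scikit-learn's `GroupKFold` performs the same kind of split for every group in turn.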

In [None]:
def IoU(pred_box, label_box):
    
    # Extract prediction boxes
    x0 = pred_box[0]
    y0 = pred_box[1]
    width = pred_box[2]
    height = pred_box[3]
    
    # Extract label boxes
    x0_l = label_box['x']
    y0_l = label_box['y']
    width_l = label_box['width']
    height_l = label_box['height']

    # Get the overlap in the x and y ranges
    # (image coordinates: y increases downward, so a box spans y0 to y0 + height)
    left_bound = max(x0, x0_l)
    right_bound = min(x0 + width, x0_l + width_l)
    upper_bound = max(y0, y0_l)
    lower_bound = min(y0 + height, y0_l + height_l)
    
    # Calculate metrics
    intersection = max(right_bound - left_bound, 0) * max(lower_bound - upper_bound, 0)
    area = width * height
    area_l = width_l * height_l
    union = area + area_l - intersection

    # Calculate IoU (guard against degenerate zero-area boxes)
    return intersection / union if union > 0 else 0.0
    
pred_box = [25, 40, 30, 25]
label_box = {'x': 30, 'y': 30, 'width': 25, 'height': 25}

IoU(pred_box, label_box)


**F2 Score**

Explanation of 'micro averaging': 
https://www.kaggle.com/enforcer007/what-is-micro-averaged-f1-score?scriptVersionId=36576991

Basically using all of the tp/fp/fn together with equal weights to calculate F2.  
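
Written out, with TP, FP, and FN summed over every image before the score is computed (that summing is the micro averaging), the general F-beta score and its beta = 2 case are:

```latex
F_\beta = \frac{(1+\beta^2)\,TP}{(1+\beta^2)\,TP + \beta^2\,FN + FP},
\qquad
F_2 = \frac{5\,TP}{5\,TP + 4\,FN + FP}
```

As a quick check with made-up counts TP = 8, FN = 2, FP = 4: F2 = 40 / (40 + 8 + 4) = 40 / 52, which is about 0.769. A missed starfish (FN) costs four times as much as a false alarm (FP).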

In [None]:
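def FScore(results_overall, beta=2):
    # F-beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP)
    numerator = ((1+beta**2) * results_overall['tp']) 
    denominator = ((1+beta**2) * results_overall['tp'] + beta**2 * results_overall['fn']
                   + results_overall['fp'])
    
    # No predictions and no labels: nothing to score
    if denominator == 0:
        return 0.0
    return numerator/denominator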

In [None]:
# Another (real) example IoU calc
pred_box1 = (158, 422, 56, 42)
label_box = {'x': 167, 'y': 425, 'width': 47, 'height': 31}

print(IoU(pred_box1, label_box))

In [None]:
# Get all training images (which include empty images)

df_all = pd.read_csv(f'{data_dir}/train.csv')

df_all['annotations'] = df_all['annotations'].apply(lambda x: ast.literal_eval(x))
df_all.head(5)

In [None]:
# Get image list (full paths, so they can be passed straight to cv2.imread)
image_list_val = []
for root, _, files in os.walk('fold_0/val2017'):
    for val_image in files:
        if val_image.endswith('.jpg'):
            image_list_val.append(os.path.join(root, val_image))

In [None]:
# Image list for all images 
image_list_all = []
for val_image_group in list(os.walk('../input/tensorflow-great-barrier-reef/train_images'))[1:]:
    for val_image in val_image_group[2]:
        image_list_all.append(val_image_group[0] + '/' + val_image)

In [None]:
# Iterate through validation images and calculate F2 score
# Use the original images that include examples with no starfish
# Change to batch to improve performance  

from collections import OrderedDict, defaultdict

def get_f2_score(image_list, iou_thresh, conf_threshold, nms_threshold,
                 experiment, model_weights_path, sample_limit=-1):

    # Track total tp, fp, fn across all images (micro averaging)
    results_overall = {'tp': 0, 'fp': 0, 'fn': 0}

    # Shuffle a copy so a limited sample is less biased towards early images
    # and the caller's list is left untouched
    image_list = list(image_list)
    random.shuffle(image_list)

    for counter, val_image_path in enumerate(image_list):
        val_image_arr = cv2.imread(val_image_path)

        # Get predicted boxes
        box_preds = barrier_reef_inference(experiment, model_weights_path, val_image_arr,
                                           conf_threshold=conf_threshold,
                                           nms_threshold=nms_threshold)

        # Convert predictions to (score, (x, y, width, height)) and sort by
        # descending score so the most confident boxes are matched first
        preds = []
        for box, score in zip(box_preds['bboxes'], box_preds['scores']):
            x0, y0, x1, y1 = (int(v) for v in box[:4])
            preds.append((float(score), (x0, y0, x1 - x0, y1 - y0)))
        preds.sort(key=lambda p: p[0], reverse=True)

        # Get the image name from the file path and extract label boxes
        split_path = val_image_path.split('/')
        if '-' in split_path[-1]:
            # Fold images are already named '{video_id}-{frame}.jpg'
            img_name = split_path[-1][:-4]
            label_boxes = df_train[df_train['image_id'] == img_name]['annotations'].tolist()[0]
        else:
            # Original images live in 'video_{id}/{frame}.jpg'
            img_name = split_path[-2][-1] + '-' + split_path[-1][:-4]
            label_boxes = df_all[df_all['image_id'] == img_name]['annotations'].tolist()[0]

        # Match each prediction (in score order) to at most one unmatched label,
        # so a single label can never be counted as a true positive twice
        matched = set()
        tp = 0
        for score, pred_box in preds:
            for i, label_box in enumerate(label_boxes):
                if i in matched:
                    continue
                if IoU(pred_box, label_box) >= iou_thresh:
                    matched.add(i)
                    tp += 1
                    # If a match is found, skip the remaining labels
                    break

        # Unmatched predictions are false positives, unmatched labels are false negatives
        results_overall['tp'] += tp
        results_overall['fp'] += len(preds) - tp
        results_overall['fn'] += len(label_boxes) - tp

        # Stop early once sample_limit images have been scored (-1 means no limit)
        if sample_limit != -1 and counter + 1 >= sample_limit:
            break
        
    return (results_overall, FScore(results_overall))



In [None]:
# Test
iou_thresh = 0.5
conf_threshold = 0.1
nms_threshold = 0.3
experiment = Exp1280()

results_overall, fscore = get_f2_score(image_list_all, iou_thresh, conf_threshold, nms_threshold, 
                                           experiment, model_weights_path, sample_limit=100)

print('Results overall: ', results_overall)
print('F2 Score: ', fscore)

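Note that this is calculated on a random sample of 100 images drawn from all of the training images, most of which the model saw during training, so it overstates what to expect on the LB.  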

In [None]:
# Check multiple confidence thresholds

iou_thresh = 0.5
nms_threshold = 0.3
experiment = Exp1280()

# model_weights_path = 'YOLOX_outputs/barrier_reef_exp/best_ckpt_1280.pth' 
# experiment = Exp1280()

# Increase the sample_limit for more accurate results
for conf_threshold in np.arange(0.05, 0.5, 0.05):
    results_overall, fscore = get_f2_score(image_list_all, iou_thresh, conf_threshold, nms_threshold, 
                                           experiment, model_weights_path, sample_limit=100)

    print('Confidence Threshold: ', conf_threshold)
    print('Results overall: ', results_overall)
    print('F2 Score: ', fscore)
    print('\n')

### Check the output format

In [None]:
for val_image_group in os.walk('fold_0/val2017'):
    for counter, val_image in enumerate(val_image_group[2]):
 
        val_image_path = f'fold_0/val2017/{val_image}'
        val_image_arr = cv2.imread(val_image_path)
        
        experiment = Exp1280()

        outputs = barrier_reef_inference(experiment, model_weights_path, val_image_arr,
                                         conf_threshold=0.1, nms_threshold=0.4)

        bboxes = outputs['bboxes']
        scores = outputs['scores']

        predictions = []

        for i in range(len(bboxes)):
            box = bboxes[i]
            score = scores[i]

            x_min = int(box[0])
            y_min = int(box[1])
            x_max = int(box[2])
            y_max = int(box[3])

            bbox_width = x_max - x_min
            bbox_height = y_max - y_min

            predictions.append('{:.2f} {} {} {} {}'.format(score, x_min, y_min, bbox_width, bbox_height))

        prediction_str = ' '.join(predictions)

        print('Prediction:', prediction_str)
        
        if counter == 10:
            break
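
The loop above builds one space-separated string per image, with five fields per box: confidence, x_min, y_min, width, height. Here is a standalone sketch of that conversion from corner-format boxes, using made-up values and a hypothetical helper name `format_predictions`:

```python
def format_predictions(boxes_xyxy, scores):
    """Turn (x0, y0, x1, y1) boxes and their scores into a prediction string."""
    parts = []
    for (x0, y0, x1, y1), score in zip(boxes_xyxy, scores):
        # Truncate to whole pixels, then convert corners to width and height
        x0, y0, x1, y1 = int(x0), int(y0), int(x1), int(y1)
        parts.append('{:.2f} {} {} {} {}'.format(score, x0, y0, x1 - x0, y1 - y0))
    return ' '.join(parts)


prediction_str = format_predictions(
    [(559.4, 213.9, 609.0, 245.2), (10, 20, 30, 60)],
    [0.873, 0.5],
)
print(prediction_str)  # 0.87 559 213 50 32 0.50 10 20 20 40
```

An image with no detections yields an empty string.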

### Call the barrier reef api and send predictions

This last section is also adapted from [Remek's notebook](https://www.kaggle.com/remekkinas/yolox-training-pipeline-cots-dataset-lb-0-507).  

In [None]:
import greatbarrierreef

env = greatbarrierreef.make_env()   # initialize the environment
iter_test = env.iter_test() 

In [None]:
submission_dict = {
    'id': [],
    'prediction_string': [],
}

for (image_np, sample_prediction_df) in iter_test:
 
    experiment = Exp1280()
    
    outputs = barrier_reef_inference(experiment, model_weights_path, image_np[:,:,::-1],
                                     conf_threshold=0.1, nms_threshold=0.4)

    bboxes = outputs['bboxes']
    scores = outputs['scores']

    predictions = []

    for i in range(len(bboxes)):
        box = bboxes[i]
        score = scores[i]

        x_min = int(box[0])
        y_min = int(box[1])
        x_max = int(box[2])
        y_max = int(box[3])

        bbox_width = x_max - x_min
        bbox_height = y_max - y_min

        predictions.append('{:.2f} {} {} {} {}'.format(score, x_min, y_min, bbox_width, bbox_height))

    prediction_str = ' '.join(predictions)
    sample_prediction_df['annotations'] = prediction_str
    env.predict(sample_prediction_df)

    print('Prediction:', prediction_str)

In [None]:
sub_df = pd.read_csv('submission.csv')
sub_df.head()