## Train YoloR on COTS dataset (PART 1 - TRAINING): a simple starting point for working with YoloR and extending this notebook
This notebook introduces YOLOR to Kaggle and the TensorFlow - Help Protect the Great Barrier Reef competition. It shows how to train a custom object detection model on the COTS dataset using YoloR, and it can serve as a starting point for building your own custom model based on the YoloR detector. The full GitHub repository is here: [YOLOR](https://github.com/WongKinYiu/yolor)

Steps covered in this notebook:

* Prepare COTS dataset for YoloR training
* Install YoloR (YoloR, MISH CUDA, pytorch_wavelets)
* Download Pre-Trained Weights for YoloR HUB
* Prepare configuration files (YoloR hyperparameters and dataset)
* Weights and Biases configuration for training logging
* YoloR training
* Run YoloR inference on test images

<div class="alert alert-warning">I found that there is no reference notebook on Kaggle for training a custom YoloR model, so this is my contribution to the competition. Feel free to use it and enjoy! I would really appreciate an upvote if you find this notebook useful. Thank you!</div>

<div class="alert alert-success" role="alert">
I also introduced YoloX in the TensorFlow - Help Protect the Great Barrier Reef competition. You can find those notebooks here:
    <ul>
        <li> <a href="https://www.kaggle.com/remekkinas/yolox-full-training-pipeline-for-cots-dataset">YoloX full training pipeline for COTS dataset</a></li>
        <li> <a href="https://www.kaggle.com/remekkinas/yolox-inference-on-kaggle-for-cots-lb-0-507">YoloX detections submission made on COTS dataset</a></li>
    </ul>
    
</div>

<div align="center"><img width="640" src="https://github.com/WongKinYiu/yolor/raw/main/figure/unifued_network.png"/></div>

<div align="center"><img width="640" src="https://github.com/WongKinYiu/yolor/raw/main/figure/performance.png"/></div>

## 0. IMPORT MODULES

In [None]:
import ast
import glob
import os
import yaml
import torch

import numpy as np
import pandas as pd


from IPython.display import Image, display
from IPython.core.magic import register_line_cell_magic
from shutil import copyfile
from tqdm import tqdm
tqdm.pandas()

In [None]:
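os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only the second physical GPU; adjust for your machine
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print('Device:', device)
if torch.cuda.is_available():
    # current_device() raises when no GPU is visible, so only query it on CUDA
    print('Current cuda device:', torch.cuda.current_device())
print('Count of using GPUs:', torch.cuda.device_count())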

In [None]:
HOME_DIR = './'
COTS_DATASET_PATH = './cots_dataset/train_images'

## 1. PREPARE DATASET

In [None]:
# I used the dataset split by @julian3833 - Reef - A CV strategy: subsequences!
# https://www.kaggle.com/julian3833/reef-a-cv-strategy-subsequences

df = pd.read_csv("./reef/train-validation-split/train-0.1.csv")
df.head(3)

In [None]:
def add_path(row):
    return f"{COTS_DATASET_PATH}/video_{row.video_id}/{row.video_frame}.jpg"

def num_boxes(annotations):
    annotations = ast.literal_eval(annotations)
    return len(annotations)

df['path'] = df.apply(add_path, axis=1)
df['num_bbox'] = df['annotations'].apply(num_boxes)
print("New path and annotations preprocessing completed")
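
The `annotations` column stores each frame's boxes as a string, which is why `ast.literal_eval` is needed before counting them. A minimal sketch of what that parsing yields, using a made-up annotation string (the values are illustrative, not taken from the dataset):

```python
import ast

# Hypothetical annotation string in the same shape as the `annotations` column.
raw = "[{'x': 559, 'y': 213, 'width': 50, 'height': 32}, {'x': 100, 'y': 40, 'width': 20, 'height': 10}]"

boxes = ast.literal_eval(raw)   # safely parses the literal into a list of dicts
print(len(boxes))               # number of boxes in this frame
print(boxes[0]["width"])        # fields are plain ints

empty = ast.literal_eval("[]")  # frames without starfish hold an empty list
print(len(empty))
```

Unlike `eval`, `ast.literal_eval` only accepts Python literals, so it is a safer choice for text read from a CSV file.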

In [None]:
df = df[df.num_bbox > 0]

print(f'Dataset images with annotations: {len(df)}')

In [None]:
def add_new_path(row):
    if row.is_train:
        return f"{HOME_DIR}/yolor_dataset/images/train/{row.image_id}.jpg"
    else: 
        return f"{HOME_DIR}/yolor_dataset/images/valid/{row.image_id}.jpg"
    

df['new_path'] = df.apply(lambda row: add_new_path(row), axis=1)
print("New image path for train/valid created")

In [None]:
df.head(3)

## 2. CREATE DATASET FILE STRUCTURE

In [None]:
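# exist_ok=True lets this cell be re-run without raising FileExistsError
os.makedirs(f"{HOME_DIR}/yolor_dataset/images/train", exist_ok=True)
os.makedirs(f"{HOME_DIR}/yolor_dataset/images/valid", exist_ok=True)
os.makedirs(f"{HOME_DIR}/yolor_dataset/labels/train", exist_ok=True)
os.makedirs(f"{HOME_DIR}/yolor_dataset/labels/valid", exist_ok=True)
print("Directory structure for YoloR created")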

In [None]:
def copy_file(row):
  copyfile(row.path, row.new_path)

_ = df.progress_apply(lambda row: copy_file(row), axis=1)

## 3. CREATE YoloR ANNOTATIONS

In [None]:
IMG_WIDTH, IMG_HEIGHT = 1280, 720

def get_yolo_format_bbox(img_w, img_h, box):
    """Convert a COTS box (top-left x, y, width, height in pixels) to normalized YOLO format."""
    w = box['width']
    h = box['height']

    # Clip boxes that extend past the right or bottom image border
    if box['x'] + box['width'] > img_w:
        w = img_w - box['x']
    if box['y'] + box['height'] > img_h:
        h = img_h - box['y']

    xc = box['x'] + int(np.round(w/2))
    yc = box['y'] + int(np.round(h/2))

    return [xc/img_w, yc/img_h, w/img_w, h/img_h]


for index, row in tqdm(df.iterrows(), total=len(df)):
    annotations = ast.literal_eval(row.annotations)
    bboxes = [get_yolo_format_bbox(IMG_WIDTH, IMG_HEIGHT, box) for box in annotations]

    # Label files mirror the image split: labels/train or labels/valid
    subset = 'train' if row.is_train else 'valid'
    file_name = f"{HOME_DIR}/yolor_dataset/labels/{subset}/{row.image_id}.txt"
    os.makedirs(os.path.dirname(file_name), exist_ok=True)

    with open(file_name, 'w') as f:
        for bbox in bboxes:
            label = 0  # single class: starfish
            f.write(' '.join(str(value) for value in [label] + bbox))
            f.write('\n')
                
print("Annotations in YoloR format for all images created.")
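
To make the conversion concrete: a YOLO label line is `class x_center y_center width height`, with the four box values divided by the image size. The sketch below is a standalone variant of the conversion (the helper name `to_yolo` is hypothetical, and it skips the integer rounding of the centre used in the cell above), applied to one made-up box that overhangs the right edge of a 1280x720 frame.

```python
def to_yolo(box, img_w=1280, img_h=720):
    """Convert a top-left pixel box to normalized YOLO (xc, yc, w, h)."""
    # Clip the box so it does not extend past the image border.
    w = min(box["width"], img_w - box["x"])
    h = min(box["height"], img_h - box["y"])
    xc = box["x"] + w / 2
    yc = box["y"] + h / 2
    return [xc / img_w, yc / img_h, w / img_w, h / img_h]

# Hypothetical box: 100 px wide, but only 80 px fit inside the frame.
box = {"x": 1200, "y": 360, "width": 100, "height": 72}
yolo = to_yolo(box)

# One label line: class id 0 followed by the four normalized values.
print(" ".join(str(v) for v in [0] + yolo))
```

All four values should land in the range 0 to 1; a value outside that range usually means the box was not clipped or the wrong image size was used.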

## 4. CREATE YoloR DATASET CONFIGURATION FILE

In [None]:
data_yaml = dict(
    train = f'{HOME_DIR}/yolor_dataset/images/train',
    val = f'{HOME_DIR}/yolor_dataset/images/valid',
    nc = 1,
    names = ['sf']
)


with open(f'{HOME_DIR}/YoloR-data.yaml', 'w') as outfile:
    yaml.dump(data_yaml, outfile, default_flow_style=True)

print(f'Dataset configuration file for YoloR created')
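
Before training, it can help to confirm that every image in the `train` and `val` folders has a label file. The sketch below assumes labels are matched to images by file name, with `images/` mirrored by `labels/` as in the folder layout created earlier; the helper name `missing_labels` is hypothetical, and the demo runs on a temporary directory rather than the real dataset.

```python
import tempfile
from pathlib import Path

def missing_labels(images_dir, labels_dir):
    """Return the stems of .jpg images that have no matching .txt label file."""
    image_stems = {p.stem for p in Path(images_dir).glob("*.jpg")}
    label_stems = {p.stem for p in Path(labels_dir).glob("*.txt")}
    return sorted(image_stems - label_stems)

# Demo on a throwaway directory: two images, only one label.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images" / "train"
    labels = Path(tmp) / "labels" / "train"
    images.mkdir(parents=True)
    labels.mkdir(parents=True)
    for stem in ("0-1", "0-2"):
        (images / f"{stem}.jpg").touch()
    (labels / "0-1.txt").write_text("0 0.5 0.5 0.1 0.1\n")
    result = missing_labels(images, labels)
    print(result)
```

On the real data the call would look like `missing_labels("./yolor_dataset/images/train", "./yolor_dataset/labels/train")`, and an empty list means every image has a label.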

## 4. INSTALL YoloR

### 4A. CLONE YoloR GIT REPOSITORY 

In [None]:
!git clone https://github.com/WongKinYiu/yolor

In [None]:
!pip install torchvision --upgrade -q
!pip install wandb --upgrade

In [None]:
%cd yolor
!pip install -qr requirements.txt

### 4B. INSTALL MISH CUDA

In [None]:
%cd ..
!git clone https://github.com/JunnYu/mish-cuda
%cd mish-cuda
!git reset --hard 6f38976064cbcc4782f4212d7c0c5f6dd5e315a8
!python setup.py build install
%cd ..

### 4C. INSTALL PYTORCH WAVELETS 

In [None]:
!git clone https://github.com/fbcotter/pytorch_wavelets
%cd pytorch_wavelets
!pip install .
%cd ..

### 4D. DOWNLOAD LATEST CHECKPOINT FROM YoloR MODEL HUB

In this notebook we use the P6 model (the goal is only to show how to train a YoloR model on Kaggle), but you can experiment with other YoloR models: https://github.com/WongKinYiu/yolor

In [None]:
%cd yolor
!bash scripts/get_pretrain.sh

### 4E. CONFIGURE WEIGHTS AND BIASES FOR EXPERIMENT LOGGING 

In [None]:
# more about Secrets -> https://www.kaggle.com/product-feedback/114053
import wandb
# from kaggle_secrets import UserSecretsClient

# user_secrets = UserSecretsClient()
# wandb_api = user_secrets.get_secret("wandb_api") 
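# Replace 'your_key' with your W&B API key (or the Kaggle secret loaded above).
wandb.login(key='your_key')
# Alternative without an account: anonymous logging. Keep only one of the two login calls.
wandb.login(anonymous='must')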

### 4F. CONFIGURE YoloR HYPERPARAMETERS 

In [None]:
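# Step 4D already changed into the yolor directory; only change again if that cell was skipped.
if os.path.basename(os.getcwd()) != 'yolor':
    os.chdir('yolor')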

In [None]:
@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))
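
The `writetemplate` magic passes the cell text through `str.format(**globals())`, so `{name}` placeholders are filled from notebook variables and literal braces must be doubled. A small sketch of that behaviour outside the notebook, using a plain dict in place of `globals()` (the variable names here are illustrative):

```python
# Stand-in for the notebook's globals(); the names are hypothetical.
variables = {"num_classes": 1, "class_name": "starfish"}

template = "nc: {num_classes}\nnames: ['{class_name}']\n"
rendered = template.format(**variables)
print(rendered)

# Literal braces (for example a YAML flow mapping) must be escaped as {{ and }}.
escaped = "box: {{x: 0, y: 0}}".format(**variables)
print(escaped)
```

The template cells in this notebook contain no braces, so they are written out unchanged; the escaping only matters if you add YAML flow mappings or placeholders of your own.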

In [None]:
%%writetemplate ./data/coco.yaml

nc: 1
names: ['starfish',]

In [None]:
%%writetemplate ./data/coco.names

starfish

In [None]:
%%writetemplate ./hyp-yolor.yaml

lr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.2  # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937  # SGD momentum/Adam beta1
weight_decay: 0.0005  # optimizer weight decay 5e-4
warmup_epochs: 3.0  # warmup epochs (fractions ok)
warmup_momentum: 0.8  # warmup initial momentum
warmup_bias_lr: 0.1  # warmup initial bias lr
box: 0.05  # box loss gain
cls: 0.5  # cls loss gain
cls_pw: 1.0  # cls BCELoss positive_weight
obj: 1.0  # obj loss gain (scale with pixels)
obj_pw: 1.0  # obj BCELoss positive_weight
iou_t: 0.20  # IoU training threshold
anchor_t: 4.0  # anchor-multiple threshold
# anchors: 3  # anchors per output layer (0 to ignore)
fl_gamma: 0.0  # focal loss gamma (efficientDet default gamma=1.5)
hsv_h: 0.0  # image HSV-Hue augmentation (fraction)
hsv_s: 0.0  # image HSV-Saturation augmentation (fraction)
hsv_v: 0.0  # image HSV-Value augmentation (fraction)
degrees: 0.0  # image rotation (+/- deg)
translate: 0.5  # image translation (+/- fraction)
scale: 0.0  # image scale (+/- gain)
shear: 0.0  # image shear (+/- deg)
perspective: 0.0  # image perspective (+/- fraction), range 0-0.001
flipud: 0.0  # image flip up-down (probability)
fliplr: 0.5  # image flip left-right (probability)
mosaic: 0.95  # image mosaic (probability)
mixup: 0.3  # image mixup (probability)
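
The `lrf` comment implies the learning rate ends at `lr0 * lrf`, that is 0.01 * 0.2 = 0.002 with the values above. The sketch below assumes a cosine decay between those two endpoints over the 300 epochs used in the training command, and it ignores warmup. The exact scheduler in the repository's `train.py` may differ, so treat this as an illustration of the arithmetic only.

```python
import math

lr0, lrf, epochs = 0.01, 0.2, 300  # values from the hyperparameter file and training command

def lr_at(epoch):
    """Cosine decay from lr0 at epoch 0 to lr0 * lrf at the final epoch (assumed schedule)."""
    factor = (1 + math.cos(math.pi * epoch / epochs)) / 2 * (1 - lrf) + lrf
    return lr0 * factor

# Learning rate at the start, midpoint, and end of training.
for epoch in (0, 150, 300):
    print(epoch, round(lr_at(epoch), 6))
```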

## 5. TRAIN YoloR

In [None]:
!python train.py \
 --batch-size 16 \
 --img 2560 1024 \
 --data '../YoloR-data.yaml' \
 --cfg './cfg/yolor_p6.cfg' \
 --weights './yolor_p6.pt' \
 --device 1 \
 --name yolor_p6 \
 --hyp './hyp-yolor.yaml' \
 --epochs 300

We got an error here, but it is related to the W&B integration. I am still looking for a solution.
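
One option that may sidestep the problem is to keep W&B from contacting the server through its documented `WANDB_MODE` environment variable. This has not been verified against the error above; it is only a sketch of the setting, meant to run in a cell before the training command so that the training process inherits it.

```python
import os

# "offline" keeps logs on disk without syncing; "disabled" turns W&B into a no-op.
os.environ["WANDB_MODE"] = "offline"
print(os.environ["WANDB_MODE"])
```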

## 6. INFERENCE USING YoloR 

In [None]:
%cd ..

In [None]:
%pwd

In [None]:
INFER_PATH = "./cots_dataset/infer"
os.makedirs(INFER_PATH, exist_ok=True)  # copyfile below fails if the folder is missing

df_infer = df.query("~is_train and num_bbox > 4").sample(n = 15)

def copy_file(row):
    new_location = INFER_PATH + '/' + row.image_id + '.jpg'
    copyfile(row.path, new_location)

_ = df_infer.progress_apply(lambda row: copy_file(row), axis=1)

In [None]:
%cd yolor

In [None]:
!python detect.py \
    --source {INFER_PATH} \
    --cfg ./cfg/yolor_p6.cfg \
    --weights './runs/train/yolor_p66/weights/best_overall.pt' \
    --conf 0.05 \
    --img-size 1280 \
    --device 1 

In [None]:
for img in glob.glob('./inference/output/*.jpg'): 
    display(Image(filename=img))
    print("\n")