![](https://i2.wp.com/sinicropispine.com/wp-content/uploads/2015/07/38897355_l.jpg?w=800&ssl=1)

<p style='text-align: center;'><span style="color: #000508; font-family: Segoe UI; font-size: 2.6em; font-weight: 300;">EfficientDet PyTorch Pipeline with CutMix + Mixup + KFold + Cosine Annealing</span></p>

<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Overview</span>

<p style='text-align: justify;'><span style="font-family: Segoe UI; font-size: 1.2em;">This notebook covers a PyTorch EfficientDet training and validation pipeline with augmentation, regularization and validation techniques such as MixUp and CutMix augmentation, K-fold validation and cosine annealing LR scheduling.</span></p>

<p style='text-align: justify;'><span style="font-family: Segoe UI; font-size: 1.2em;">I hope this saves you some time by collecting techniques that help toward a good score. This notebook uses the latest effdet version, 0.2.3 by @rwightman, with its new code changes, improvements and updates. Most kernels I found were still using older effdet versions.</span></p>

<p style='text-align: justify;'><span style="color: #000508; font-family: Segoe UI; font-size: 1.4em; font-weight: 300;">The dataset used in this notebook for offline installation of the latest EffDet package and its dependencies, along with the WBF-fused annotation CSV file, is registered as a public dataset.</span></p>



**EffDet 0.2.3 Latest + VinBigData WBF Fused**

DATASET LINK - https://www.kaggle.com/sreevishnudamodaran/effdet-latestvinbigdata-wbf-fused
 


### References:

**Thanks to @rwightman for the awesome EfficientDet implementation. Do check it out https://github.com/rwightman/efficientdet-pytorch**

**Thanks to @shonenkov, ultralytics (https://github.com/ultralytics/yolov5) and @nvnnghia for the base PyTorch pipeline, Mixup and CutMix implementations from which this notebook is adapted.**

**Thanks to the original authors of the papers.**

CutMix Paper: https://arxiv.org/abs/1905.04899

MixUp Paper: https://arxiv.org/pdf/1710.09412.pdf

**Please do check out their work**




[![Ask Me Anything !](https://img.shields.io/badge/Ask%20me-anything-1abc9c.svg?style=flat-square&logo=kaggle)](https://www.kaggle.com/sreevishnudamodaran)






<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Environment Setup</span>

Package installations for PyTorch 1.7.0, which is the PyTorch version in Kaggle kernels as of 21 Jan 2021.

efficientdet-pytorch runs into a conflict between NumPy 1.18+ and pycocotools 2.0. Either force-install numpy <= 1.17.5 or install pycocotools >= 2.0.2. This notebook installs pycocotools >= 2.0.2.
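
The rule above can be written as a small check. This is only a sketch: `parse_version` and `has_known_conflict` are hypothetical helper names, and treating every pycocotools release below 2.0.2 as affected is my assumption based on the note above.

```python
import re

def parse_version(text):
    """Turn a version string such as '1.18.5' into a tuple of ints, e.g. (1, 18, 5)."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])

def has_known_conflict(numpy_version, pycocotools_version):
    """True when NumPy is 1.18+ and pycocotools is older than 2.0.2 (assumed rule)."""
    numpy_is_new = parse_version(numpy_version) >= (1, 18)
    pycocotools_is_old = parse_version(pycocotools_version) < (2, 0, 2)
    return numpy_is_new and pycocotools_is_old

print(has_known_conflict("1.19.5", "2.0"))    # True: new NumPy with old pycocotools
print(has_known_conflict("1.19.5", "2.0.2"))  # False: pycocotools is new enough
print(has_known_conflict("1.17.5", "2.0"))    # False: NumPy is old enough
```

In a live session the two version strings would come from the installed packages rather than being typed in by hand.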

 <span style="color: #000508; font-family: Segoe UI; font-size: 2.0em; font-weight: 300;">Online installations</span>

In [None]:
## Quote the version specifiers so the shell does not treat '>' as a redirect
# !pip install "pycocotools>=2.0.2"
# !pip install "timm>=0.3.2"
# !pip install "omegaconf>=2.0"
# !pip install ensemble-boxes

 <span style="color: #000508; font-family: Segoe UI; font-size: 2.0em; font-weight: 300;">Offline installations</span>

In [None]:
## Install omegaconf - dependency for effdet
!pip install --no-deps ../input/effdet-latestvinbigdata-wbf-fused/omegaconf-2.0.6-py3-none-any.whl

## Install timm & ensemble_boxes
!pip install --no-deps ../input/effdet-latestvinbigdata-wbf-fused/timm-0.3.4-py3-none-any.whl
!pip install --no-deps ../input/effdet-latestvinbigdata-wbf-fused/ensemble_boxes-1.0.4-py3-none-any.whl

!cp -r ../input/effdet-latestvinbigdata-wbf-fused/pycocotools-2.0.2/ .
!cd ./pycocotools-2.0.2 && python setup.py install
!rm -r pycocotools-2.0.2

!cp -r ../input/effdet-latestvinbigdata-wbf-fused/efficientdet-pytorch/ .
!cd ./efficientdet-pytorch && python setup.py install
!rm -r efficientdet-pytorch

<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Update Environment & Release GPU Memory</span>

Please note that the kernel will stop execution here when using 'Run All'. This is because of the exit command, which restarts the session so that the packages installed in the previous steps are reloaded. Run up to this cell manually, then use 'Run After' with the next cell selected to run the rest in one go.


This applies to fresh environments or if packages in the environment got reset.

<p style='text-align: justify;'><span style="color: #001b2e; font-family: Segoe UI; font-size: 1.2em;">The cell below also helps clear GPU memory after crashes and errors. When you hit an OOM error, run the cell below manually and then run the rest in one go using 'Run After'.</span></p>



In [None]:
# Release GPU Memory.
from numba import cuda as CU
try:
    device = CU.get_current_device()
    device.reset()
except Exception as E:
    print("GPU not enabled. Nothing to clear and good to go.")


## Restart the session so the newly installed libraries are detected
!pip list | grep effdet
import os
os._exit(0)  # ends the kernel process; nothing after this line runs in this cell

<span style="color: #000508; font-family: Segoe UI; font-size: 2.0em; font-weight: 300;">Import Packages</span>

In [None]:
import sys
import torch
import os
from datetime import datetime
import time
import random
import cv2
import pandas as pd
import numpy as np
from tqdm.auto import tqdm
from glob import glob
import warnings
from collections import Counter

from ensemble_boxes import weighted_boxes_fusion
import albumentations as A
from albumentations.pytorch.transforms import ToTensorV2
import matplotlib.pyplot as plt

from sklearn.model_selection import StratifiedKFold
from torch.utils.data import Dataset,DataLoader
from torch.utils.data.sampler import SequentialSampler, RandomSampler
from torch.utils.data.dataloader import default_collate

from effdet import get_efficientdet_config, EfficientDet, DetBenchTrain
from effdet.efficientdet import HeadNet
from effdet import create_model, unwrap_bench, create_loader, create_dataset, create_evaluator, create_model_from_config
from effdet.data import resolve_input_config, SkipSubset
from effdet.anchors import Anchors, AnchorLabeler
from timm.models import resume_checkpoint, load_checkpoint
from timm.utils import *
from timm.optim import create_optimizer
from timm.scheduler import create_scheduler

<span style="color: #000508; font-family: Segoe UI; font-size: 2.0em; font-weight: 300;">Seed Everything</span>


In [None]:
SEED = 42

def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(SEED)

<span style="color: #000508; font-family: Segoe UI; font-size: 2.0em; font-weight: 300;">Load Data</span>

In [None]:
df_annotations = pd.read_csv('../input/vinbigdata-chest-xray-abnormalities-detection/train.csv')
df_annotations = df_annotations[df_annotations['class_id']!=14].reset_index(drop=True)
df_annotations['image_path'] = df_annotations['image_id'].map(lambda x:os.path.join('../input/vinbigdata-original-image-dataset/vinbigdata/train',
                                                                                    str(x)+'.jpg'))
df_annotations.sample(5)

In [None]:
image_paths = df_annotations['image_path'].unique()
print("Number of Images with abnormalities:",len(image_paths))
anno_count = df_annotations.shape[0]
print("Number of Annotations with abnormalities:", anno_count)

<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Loading Fused Annotations From Dataset</span>

I have covered the bounding box fusion techniques in another notebook.

https://www.kaggle.com/sreevishnudamodaran/vinbigdata-fusing-bboxes-coco-dataset

DATASET USED: **EffDet 0.2.3 Latest + VinBigData WBF Fused**

https://www.kaggle.com/sreevishnudamodaran/effdet-latestvinbigdata-wbf-fused

Reading the fused bbox annotations directly from the registered dataset. Please note that this dataset only has annotations for the images with bboxes. Along with the annotations, I have also included the latest EfficientDet-PyTorch and its dependencies for offline installation.
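
Weighted Boxes Fusion decides which boxes belong together by comparing their intersection-over-union (IoU) against a threshold. As a reminder of what that measure is, here is a minimal pure-Python sketch. `box_iou` is an illustrative helper, not a function from `ensemble_boxes`.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty)
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping by half: intersection 50, union 150
print(round(box_iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))  # 0.333
# Disjoint boxes never overlap
print(box_iou((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0
```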

In [None]:
df_annotations_wbf = pd.read_csv('../input/effdet-latestvinbigdata-wbf-fused/train_wbf_original.csv', index_col='Unnamed: 0')

#For Testing
# df_annotations_wbf = df_annotations_wbf.head(18000)

df_annotations_wbf.sample(5)

<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Visualize Original vs Fused</span>

<span style="color: #000508; font-family: Segoe UI; font-size: 2.0em; font-weight: 300;">Helper Functions</span>

In [None]:
label2color = [[59, 238, 119], [222, 21, 229], [94, 49, 164], [206, 221, 133], [117, 75, 3],
                 [210, 224, 119], [211, 176, 166], [63, 7, 197], [102, 65, 77], [194, 134, 175],
                 [209, 219, 50], [255, 44, 47], [89, 125, 149], [110, 27, 100]]

viz_labels =  ["Aortic_enlargement", "Atelectasis", "Calcification", "Cardiomegaly",
            "Consolidation", "ILD", "Infiltration", "Lung_Opacity", "Nodule/Mass",
            "Other_lesion", "Pleural_effusion", "Pleural_thickening", "Pneumothorax",
            "Pulmonary_fibrosis"]

def plot_img(img, size=(18, 18), is_rgb=True, title="", cmap=None):
    plt.figure(figsize=size)
    plt.imshow(img, cmap=cmap)
    plt.suptitle(title)
    plt.show()

def plot_imgs(imgs, cols=2, size=10, is_rgb=True, title="", cmap=None, img_size=None):
    rows = len(imgs)//cols + 1
    fig = plt.figure(figsize=(cols*size, rows*size))
    for i, img in enumerate(imgs):
        if img_size is not None:
            img = cv2.resize(img, img_size)
        fig.add_subplot(rows, cols, i+1)
        plt.imshow(img, cmap=cmap)
    plt.suptitle(title)
    return fig
    
def draw_bbox(image, box, label, color):   
    alpha = 0.4
    alpha_font = 0.6
    thickness = 4
    font_size = 2.0
    font_weight = 2
    overlay_bbox = image.copy()
    overlay_text = image.copy()
    output = image.copy()

    text_width, text_height = cv2.getTextSize(label.upper(), cv2.FONT_HERSHEY_SIMPLEX, font_size, font_weight)[0]
    cv2.rectangle(overlay_bbox, (box[0], box[1]), (box[2], box[3]),
                color, -1)
    cv2.addWeighted(overlay_bbox, alpha, output, 1 - alpha, 0, output)
    cv2.rectangle(overlay_text, (box[0], box[1]-18-text_height), (box[0]+text_width+8, box[1]),
                (0, 0, 0), -1)
    cv2.addWeighted(overlay_text, alpha_font, output, 1 - alpha_font, 0, output)
    cv2.rectangle(output, (box[0], box[1]), (box[2], box[3]),
                    color, thickness)
    cv2.putText(output, label.upper(), (box[0], box[1]-12),
            cv2.FONT_HERSHEY_SIMPLEX, font_size, (255, 255, 255), font_weight, cv2.LINE_AA)
    return output

def draw_bbox_small(image, box, label, color):   
    alpha = 0.4
    alpha_text = 0.4
    thickness = 1
    font_size = 0.4
    overlay_bbox = image.copy()
    overlay_text = image.copy()
    output = image.copy()

    text_width, text_height = cv2.getTextSize(label.upper(), cv2.FONT_HERSHEY_SIMPLEX, font_size, thickness)[0]
    cv2.rectangle(overlay_bbox, (box[0], box[1]), (box[2], box[3]),
                color, -1)
    cv2.addWeighted(overlay_bbox, alpha, output, 1 - alpha, 0, output)
    cv2.rectangle(overlay_text, (box[0], box[1]-7-text_height), (box[0]+text_width+2, box[1]),
                (0, 0, 0), -1)
    cv2.addWeighted(overlay_text, alpha_text, output, 1 - alpha_text, 0, output)
    cv2.rectangle(output, (box[0], box[1]), (box[2], box[3]),
                    color, thickness)
    cv2.putText(output, label.upper(), (box[0], box[1]-5),
            cv2.FONT_HERSHEY_SIMPLEX, font_size, (255, 255, 255), thickness, cv2.LINE_AA)
    return output

In [None]:
viz_images = []

for img_id in df_annotations_wbf['image_id'].unique()[:2]:
    img_path = df_annotations_wbf[df_annotations_wbf.image_id==img_id]['image_path'].iloc[0]
#     print(img_path)
    img_array  = cv2.imread(img_path)

    img_annotations = df_annotations[df_annotations.image_id==img_id]
    boxes_actual = img_annotations[['x_min', 'y_min', 'x_max', 'y_max']].to_numpy().tolist()
    labels_actual = img_annotations['class_id'].to_numpy().tolist()
    
    img_annotations_wbf = df_annotations_wbf[df_annotations_wbf.image_id==img_id]
    boxes_wbf = img_annotations_wbf[['x_min', 'y_min', 'x_max', 'y_max']].to_numpy().tolist()
    box_labels_wbf = img_annotations_wbf['class_id'].to_numpy().tolist()
    
    print("Bboxes before WBF:\n", boxes_actual)
    print("Labels before WBF:\n", labels_actual)
    
    ## Visualize Original Bboxes
    img_before = img_array.copy()
    for box, label in zip(boxes_actual, labels_actual):
        x_min, y_min, x_max, y_max = (box[0], box[1], box[2], box[3])
        color = label2color[int(label)]
        img_before = draw_bbox(img_before, list(np.int_(box)), viz_labels[label], color)
    viz_images.append(img_before)

    print("Bboxes after WBF:\n", boxes_wbf)
    print("Labels after WBF:\n", box_labels_wbf)
    
    ## Visualize Bboxes after operation
    img_after = img_array.copy()
    for box, label in zip(boxes_wbf, box_labels_wbf):
        color = label2color[int(label)]
        img_after = draw_bbox(img_after, list(np.int_(box)), viz_labels[label], color)
    viz_images.append(img_after)
    print()
        
plot_imgs(viz_images, cmap=None)
plt.figtext(0.3, 0.9,"Original Bboxes", va="top", ha="center", size=25)
plt.figtext(0.73, 0.9,"WBF", va="top", ha="center", size=25)
plt.savefig('wbf.png', bbox_inches='tight')
plt.show()

<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Augmentations & Transforms</span>

Using the Albumentations package to create an augmentation pipeline for training and validation.
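
The pipelines below pass `format='pascal_voc'`, which means boxes are absolute pixel coordinates in `[x_min, y_min, x_max, y_max]` order. A resize rescales the boxes together with the image. The sketch below shows that arithmetic in plain Python. `resize_pascal_voc_box` is an illustrative helper, not an Albumentations function, and the 2048x2048 source size is a made-up example.

```python
def resize_pascal_voc_box(box, src_size, dst_size=(512, 512)):
    """Scale an (x_min, y_min, x_max, y_max) pixel box from src_size to dst_size.

    Both sizes are (height, width) tuples.
    """
    src_h, src_w = src_size
    dst_h, dst_w = dst_size
    scale_x = dst_w / src_w
    scale_y = dst_h / src_h
    x_min, y_min, x_max, y_max = box
    return (x_min * scale_x, y_min * scale_y, x_max * scale_x, y_max * scale_y)

# A box on a hypothetical 2048x2048 image, resized to 512x512
print(resize_pascal_voc_box((512, 1024, 1536, 2048), src_size=(2048, 2048)))
# (128.0, 256.0, 384.0, 512.0)
```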

In [None]:
def get_train_transforms():
    return A.Compose(
        [
        ## RandomSizedCrop not working for some reason. I'll post a thread for this issue soon.
        ## Any help or suggestions are appreciated.
#         A.RandomSizedCrop(min_max_height=(300, 512), height=512, width=512, p=0.5),
#         A.RandomSizedCrop(min_max_height=(300, 1000), height=1000, width=1000, p=0.5),
        A.OneOf([
            A.HueSaturationValue(hue_shift_limit=0.2, sat_shift_limit= 0.2, 
                                 val_shift_limit=0.2, p=0.9),
            A.RandomBrightnessContrast(brightness_limit=0.2, 
                                       contrast_limit=0.2, p=0.9),
        ],p=0.9),
        A.JpegCompression(quality_lower=85, quality_upper=95, p=0.2),
        A.OneOf([
            A.Blur(blur_limit=3, p=1.0),
            A.MedianBlur(blur_limit=3, p=1.0)
            ],p=0.1),
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.5),
        A.RandomRotate90(p=0.5),
        A.Transpose(p=0.5),
        A.Resize(height=512, width=512, p=1),
        A.Cutout(num_holes=8, max_h_size=64, max_w_size=64, fill_value=0, p=0.5),
        ToTensorV2(p=1.0)
        ], 
        p=1.0, 
        bbox_params=A.BboxParams(
            format='pascal_voc',
            min_area=0, 
            min_visibility=0,
            label_fields=['labels']
        )
    )

def get_valid_transforms():
    return A.Compose(
        [
            A.Resize(height=512, width=512, p=1.0),
            ToTensorV2(p=1.0),
        ], 
        p=1.0, 
        bbox_params=A.BboxParams(
            format='pascal_voc',
            min_area=0, 
            min_visibility=0,
            label_fields=['labels']
        )
    )

<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Dataset Retrieval and Pre-processing</span>

Define the dataset class that retrieves images and applies the Albumentations-based transforms plus the custom CutMix and MixUp augmentations.
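
The `__getitem__` method in the class below picks one of three loading paths with two random draws. To see how often each path is taken, here is a small simulation of the same branch order. `choose_augmentation` is an illustrative function and is not part of the dataset class.

```python
import random
from collections import Counter

def choose_augmentation(rng, test=False):
    """Mirror the branch order in __getitem__: plain load, then CutMix, then MixUp."""
    if test or rng.random() > 0.33:
        return "plain"
    if rng.random() > 0.5:
        return "cutmix"
    return "mixup"

rng = random.Random(0)
n_draws = 100_000
counts = Counter(choose_augmentation(rng) for _ in range(n_draws))
for name in ("plain", "cutmix", "mixup"):
    print(name, round(counts[name] / n_draws, 3))
```

The fractions land near 0.67 for the plain path and roughly 0.165 each for CutMix and MixUp, so about one sample in three is mixed.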

In [None]:
TRAIN_ROOT_PATH = '../input/vinbigdata-original-image-dataset/vinbigdata/train'

class DatasetRetriever(Dataset):

    def __init__(self, marking, image_ids, transforms=None, test=False):
        super().__init__()
        self.image_ids = image_ids
        self.marking = marking
        self.transforms = transforms
        self.test = test
        
    def __getitem__(self, index: int):
        image_id = self.image_ids[index]

        ## Load the sample once: plain about 67% of the time, otherwise CutMix or MixUp with equal odds
        if self.test or random.random() > 0.33:
            image, boxes, labels = self.load_image_and_boxes(index)
        elif random.random() > 0.5:
            image, boxes, labels = self.load_cutmix_image_and_boxes(index)
        else:
            image, boxes, labels = self.load_mixup_image_and_boxes(index)
        
        ## To prevent ValueError: y_max is less than or equal to y_min for bbox from albumentations bbox_utils
        labels = np.array(labels, dtype=int).reshape(len(labels), 1)
        combined = np.hstack((boxes.astype(int), labels))
        combined = combined[np.logical_and(combined[:,2] > combined[:,0],
                                                          combined[:,3] > combined[:,1])]
        boxes = combined[:, :4]
        labels = combined[:, 4].tolist()
        
        target = {}
        target['boxes'] = boxes
        target['labels'] = torch.tensor(labels)
        target['image_id'] = torch.tensor([index])
        if self.transforms:
            for i in range(10):
                sample = self.transforms(**{
                    'image': image,
                    'bboxes': target['boxes'],
                    'labels': labels
                })
                if len(sample['bboxes']) > 0:
                    image = sample['image']
                    target['boxes'] = torch.stack(tuple(map(torch.tensor, zip(*sample['bboxes'])))).permute(1, 0)
                    target['boxes'][:,[0,1,2,3]] = target['boxes'][:,[1,0,3,2]]  ## ymin, xmin, ymax, xmax
                    break
            
            ## Handling case where no valid bboxes are present
            if len(target['boxes'])==0 or i==9:
                return None
            else:
                ## Handling case where augmentation and tensor conversion yields no valid annotations
                try:
                    assert torch.is_tensor(image), f"Invalid image type:{type(image)}"
                    assert torch.is_tensor(target['boxes']), f"Invalid target type:{type(target['boxes'])}"
                except Exception as E:
                    print("Image skipped:", E)
                    return None      

        return image, target, image_id

    def __len__(self) -> int:
        return self.image_ids.shape[0]
    
    def load_image_and_boxes(self, index):
        image_id = self.image_ids[index]
#         print(f'{TRAIN_ROOT_PATH}/{image_id}.jpg')
        image = cv2.imread(f'{TRAIN_ROOT_PATH}/{image_id}.jpg', cv2.IMREAD_COLOR).copy()
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32)
        image /= 255.0
        records = self.marking[self.marking['image_id'] == image_id]
        boxes = records[['x_min', 'y_min', 'x_max', 'y_max']].values
        labels = records['class_id'].tolist()
        resize_transform = A.Compose([A.Resize(height=512, width=512, p=1.0)], 
                                    p=1.0, 
                                    bbox_params=A.BboxParams(
                                        format='pascal_voc',
                                        min_area=0.1, 
                                        min_visibility=0.1,
                                        label_fields=['labels'])
                                    )

        resized = resize_transform(**{
                'image': image,
                'bboxes': boxes,
                'labels': labels
            })

        resized_bboxes = np.vstack([list(bx) for bx in resized['bboxes']])
        return resized['image'], resized_bboxes, resized['labels']
    
    def load_mixup_image_and_boxes(self, index):
        image, boxes, labels = self.load_image_and_boxes(index)
        r_image, r_boxes, r_labels = self.load_image_and_boxes(random.randint(0, self.image_ids.shape[0] - 1))
        return (image+r_image)/2, np.vstack((boxes, r_boxes)).astype(np.int32), np.concatenate((labels, r_labels))

    def load_cutmix_image_and_boxes(self, index, imsize=512):
        """ 
        This implementation of cutmix author:  https://www.kaggle.com/nvnnghia 
        Refactoring and adaptation: https://www.kaggle.com/shonenkov
        """
        w, h = imsize, imsize
        s = imsize // 2
    
        xc, yc = [int(random.uniform(imsize * 0.25, imsize * 0.75)) for _ in range(2)]  # center x, y
        indexes = [index] + [random.randint(0, self.image_ids.shape[0] - 1) for _ in range(3)]

        result_image = np.full((imsize, imsize, 3), 1, dtype=np.float32)
        result_boxes = []
        result_labels = np.array([], dtype=int)

        for i, index in enumerate(indexes):
            image, boxes, labels = self.load_image_and_boxes(index)
            if i == 0:
                x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
                x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
            elif i == 1:  # top right
                x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
                x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
            elif i == 2:  # bottom left
                x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
                x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, max(xc, w), min(y2a - y1a, h)
            elif i == 3:  # bottom right
                x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
                x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
            result_image[y1a:y2a, x1a:x2a] = image[y1b:y2b, x1b:x2b]
            padw = x1a - x1b
            padh = y1a - y1b

            boxes[:, 0] += padw
            boxes[:, 1] += padh
            boxes[:, 2] += padw
            boxes[:, 3] += padh

            result_boxes.append(boxes)
            result_labels = np.concatenate((result_labels, labels))

        result_boxes = np.concatenate(result_boxes, 0)
        np.clip(result_boxes[:, 0:], 0, 2 * s, out=result_boxes[:, 0:])
        result_boxes = result_boxes.astype(np.int32)
        index_to_use = np.where((result_boxes[:,2]-result_boxes[:,0])*(result_boxes[:,3]-result_boxes[:,1]) > 0)
        result_boxes = result_boxes[index_to_use]
        result_labels = result_labels[index_to_use]
        
        return result_image, result_boxes, result_labels

<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Visualize Mosaic and MixUp Techniques</span>

<span style="color: #000508; font-family: Segoe UI; font-size: 2.0em; font-weight: 300;">MixUp</span>

MixUp trains a neural network on convex combinations of pairs of examples and their labels, which regularizes the network to favor simple linear behavior in between training examples. The paper's experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. The authors also report that it reduces the memorization of corrupt labels, increases robustness to adversarial examples, and stabilizes the training of generative adversarial networks.

![](https://miro.medium.com/max/1580/1*XqyD5OE47AdqeR6KeMg9FQ.png)
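
In the paper, a pair is blended as `lam * x_i + (1 - lam) * x_j`, with `lam` drawn from a Beta(alpha, alpha) distribution. The `load_mixup_image_and_boxes` method in this notebook uses a fixed `lam = 0.5` and keeps the boxes of both images. Here is a minimal NumPy sketch of both ideas. `mixup_detection` and `sample_lambda` are illustrative names, and `alpha = 0.4` is just an example value.

```python
import numpy as np

def sample_lambda(alpha, rng):
    """Draw the mixing weight from Beta(alpha, alpha), as described in the MixUp paper."""
    return float(rng.beta(alpha, alpha))

def mixup_detection(image_a, boxes_a, labels_a, image_b, boxes_b, labels_b, lam=0.5):
    """Blend two images and keep the boxes and labels of both."""
    mixed = lam * image_a + (1.0 - lam) * image_b
    boxes = np.vstack((boxes_a, boxes_b))
    labels = np.concatenate((labels_a, labels_b))
    return mixed, boxes, labels

# Toy inputs: an all-black and an all-white 4x4 image with one box each
image_a = np.zeros((4, 4, 3), dtype=np.float32)
image_b = np.ones((4, 4, 3), dtype=np.float32)
boxes_a, labels_a = np.array([[0, 0, 2, 2]]), np.array([3])
boxes_b, labels_b = np.array([[1, 1, 3, 3]]), np.array([7])

mixed, boxes, labels = mixup_detection(image_a, boxes_a, labels_a,
                                       image_b, boxes_b, labels_b)
print(mixed[0, 0], boxes.shape, labels)  # every pixel is 0.5, two boxes, labels 3 and 7

rng = np.random.default_rng(0)
lam = sample_lambda(0.4, rng)  # a random weight in [0, 1] instead of the fixed 0.5
```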

In [None]:
warnings.filterwarnings("ignore")

viz_ids = df_annotations_wbf.sample(6).image_id.tolist()
viz_dataset = DatasetRetriever(
                    image_ids=np.array(viz_ids),
                    marking=df_annotations_wbf,
                    transforms=get_train_transforms(),
                    test=False,
                    )
viz_images = []
for idx, im_id in enumerate(viz_ids):
    image, boxes, labels = viz_dataset.load_mixup_image_and_boxes(idx)
    image_viz = image.copy()
#     print("image_viz.shape", image_viz.shape)
    for box, label in zip(boxes, labels):
        color = label2color[int(label)]
#         image_viz *= 255 
#         image_viz = image_viz.astype('uint8')
        image_viz = cv2.normalize(src=image_viz, dst=None, alpha=0, beta=255,
                                  norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)
        image_viz = draw_bbox_small(image_viz, list(np.int_(box)), viz_labels[label], color)
    viz_images.append(image_viz)

fig = plot_imgs(viz_images)
fig.suptitle("MIXUP VISUALIZED", x=0.125, y=0.91, ha='left',
             fontweight=100, fontfamily='Lato', size=36)
plt.savefig('mixup.png', bbox_inches='tight')
plt.show()

<span style="color: #000508; font-family: Segoe UI; font-size: 2.0em; font-weight: 300;">CutMix - Mosaic</span>

In CutMix augmentation, patches are cut and pasted among training images, and the ground-truth labels are mixed in proportion to the area of the patches. Because it makes efficient use of training pixels while retaining the regularization effect of regional dropout, CutMix is reported by its authors to consistently outperform the state-of-the-art augmentation strategies it was compared against.

The paper also reports lower validation error with CutMix. When the learning rate is low, training without CutMix overfits and the validation error increases, whereas training with CutMix shows a steady decrease in validation error with significantly less overfitting.
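
For reference, here is a minimal NumPy sketch of the original two-image CutMix as I understand it from the paper: the patch side scales with `sqrt(1 - lam)`, and the label weight is recomputed from the pasted area. This is not the Mosaic variant used in this notebook, and `rand_bbox` and `cutmix_pair` are illustrative names.

```python
import numpy as np

def rand_bbox(height, width, lam, rng):
    """Sample a patch whose area is roughly (1 - lam) of the image, clipped to the image."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = int(rng.integers(0, height)), int(rng.integers(0, width))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    return y1, y2, x1, x2

def cutmix_pair(image_a, image_b, lam, rng):
    """Paste a patch of image_b into image_a and return the area-adjusted weight."""
    height, width = image_a.shape[:2]
    y1, y2, x1, x2 = rand_bbox(height, width, lam, rng)
    mixed = image_a.copy()
    mixed[y1:y2, x1:x2] = image_b[y1:y2, x1:x2]
    # Clipping can shrink the patch, so recompute the weight from the real area
    lam_adjusted = 1.0 - (y2 - y1) * (x2 - x1) / (height * width)
    return mixed, lam_adjusted

rng = np.random.default_rng(0)
image_a = np.zeros((64, 64, 3), dtype=np.float32)
image_b = np.ones((64, 64, 3), dtype=np.float32)
mixed, lam_adjusted = cutmix_pair(image_a, image_b, lam=0.75, rng=rng)

# The share of pixels taken from image_b equals 1 - lam_adjusted
print(lam_adjusted, float(mixed[..., 0].mean()))
```

For classification, the mixed label would then be `lam_adjusted * y_a + (1 - lam_adjusted) * y_b`.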

**I have used the Mosaic CutMix implementation, which randomly mixes four images and their bboxes. Recent research and experiments suggest it yields the best performance.**


![](https://www.researchgate.net/publication/338474384/figure/fig4/AS:845276554227712@1578541043082/Result-of-a-mosaic-data-augmentation-example-from-four-input-images-best-viewed-in.ppm)
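
To make the layout concrete: `load_cutmix_image_and_boxes` samples a center point in the middle half of the canvas and gives each of the four images one quadrant around it. The sketch below reproduces only that geometry for an even `imsize`. `sample_center` and `mosaic_regions` are illustrative names and are not part of the dataset class.

```python
import random

def sample_center(rng, imsize=512):
    """Pick the mosaic center within the middle half of the canvas."""
    return tuple(int(rng.uniform(imsize * 0.25, imsize * 0.75)) for _ in range(2))

def mosaic_regions(xc, yc, imsize=512):
    """Destination rectangles (x_min, y_min, x_max, y_max) of the four tiles."""
    return {
        "top_left": (0, 0, xc, yc),
        "top_right": (xc, 0, imsize, yc),
        "bottom_left": (0, yc, xc, imsize),
        "bottom_right": (xc, yc, imsize, imsize),
    }

rng = random.Random(0)
xc, yc = sample_center(rng)
regions = mosaic_regions(xc, yc)
total_area = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in regions.values())
print(regions)
print(total_area == 512 * 512)  # True: the four tiles cover the canvas exactly
```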


In [None]:
viz_ids = df_annotations_wbf.sample(6).image_id.tolist()
viz_dataset = DatasetRetriever(
                    image_ids=np.array(viz_ids),
                    marking=df_annotations_wbf,
                    transforms=get_train_transforms(),
                    test=False,
                    )
viz_images = []
for idx, im_id in enumerate(viz_ids):
    image, boxes, labels = viz_dataset.load_cutmix_image_and_boxes(idx)
    image_viz = image.copy()
#     print("image_viz.shape", image_viz.shape)
    for box, label in zip(boxes, labels):
        color = label2color[int(label)]
#         image_viz *= 255 
#         image_viz = image_viz.astype('uint8')
        image_viz = cv2.normalize(src=image_viz, dst=None, alpha=0, beta=255,
                                  norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)
        image_viz = draw_bbox_small(image_viz, list(np.int_(box)), viz_labels[label], color)
    viz_images.append(image_viz)

fig = plot_imgs(viz_images)
fig.suptitle("CUTMIX VISUALIZED", x=0.125, y=0.91, ha='left',
             fontweight=100, fontfamily='Lato', size=36)
plt.savefig('cutmix.png', bbox_inches='tight')
plt.show()

<span style="color: #000508; font-family: Segoe UI; font-size: 2.0em; font-weight: 300;">Helper Functions</span>

Class to store the running average and the current value

In [None]:
class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count
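
The `n` argument makes the average sample-weighted, which matters when batches differ in size. Here is a small self-contained sketch of the same arithmetic. `RunningMean` is a hypothetical stand-in so that the example runs on its own.

```python
class RunningMean:
    """Stand-in with the same arithmetic as AverageMeter: a sample-weighted mean."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        # `value` is a per-batch mean, `n` the number of samples behind it
        self.total += value * n
        self.count += n

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0

meter = RunningMean()
meter.update(0.8, n=4)  # batch of 4 with mean loss 0.8
meter.update(0.2, n=2)  # smaller final batch with mean loss 0.2
print(f"{meter.avg:.3f}")  # 0.600, not the unweighted 0.500
```

Note that the `Fitter` below passes `self.config.batch_size` as `n` for every step, so a smaller final batch is weighted as if it were full.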

<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Build Training and Validation Loops</span>
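
The `Fitter` below calls `self.scheduler.step()` after every batch when `config.step_scheduler` is set. For the cosine annealing schedule named in the title, the learning rate follows a half cosine from its maximum to its minimum. Below is a minimal sketch of that closed form without restarts. `cosine_annealing_lr` is an illustrative function, and the values `2e-4` and `1000` are made up for the example rather than taken from this notebook's config.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate after `step` of `total_steps` on a half-cosine decay."""
    cos_term = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos_term)

total_steps = 1000  # hypothetical number of scheduler steps
for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_annealing_lr(step, total_steps, lr_max=2e-4):.2e}")
```

The rate starts at `lr_max`, passes the midpoint halfway through, and reaches `lr_min` at the last step.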

In [None]:
class Fitter:
    def __init__(self, model, device, config):
        self.config = config
        self.epoch = 0

        self.base_dir = f'./{config.folder}'
        
        if not os.path.exists(self.base_dir):
            os.makedirs(self.base_dir)
        
        self.log_path = f'{self.base_dir}/log.txt'
        self.best_summary_loss = 10**5
        self.model = model
        self.device = device

        param_optimizer = list(self.model.named_parameters())
        no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
        optimizer_grouped_parameters = [
            {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.001},
            {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
        ] 

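        ## NOTE: the optimizer is built from all model parameters;
        ## optimizer_grouped_parameters above is constructed but not passed to it.
        self.optimizer = config.OptimizerClass(self.model.parameters(), lr=config.lr)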
        self.scheduler = config.SchedulerClass(self.optimizer, **config.scheduler_params)
        self.log(f'Fitter prepared. Device is {self.device}')

    def fit(self, train_loader, validation_loader):
        history_dict = {}
        history_dict['epoch'] = []
        history_dict['train_loss'] = []
        history_dict['val_loss'] = []
        history_dict['train_lr'] = []
        
        for e in range(self.config.n_epochs):
            history_dict['epoch'].append(self.epoch)
            lr = self.optimizer.param_groups[0]['lr']
            timestamp = datetime.utcnow().isoformat()
            
            if self.config.verbose:
                self.log(f'\n{timestamp}\nLR: {lr}')

            t = time.time()
            summary_loss, loss_trend, lr_trend = self.train_epoch(train_loader)
            history_dict['train_loss'].append(loss_trend)
            history_dict['train_lr'].append(lr_trend)
            self.log(f'[RESULT]: Train. Epoch: {self.epoch}, summary_loss: {summary_loss.avg:.5f}, time: {(time.time() - t):.5f}')
            self.save(f'{self.base_dir}/last-checkpoint.bin')
            
            t = time.time()
            summary_loss, loss_trend = self.validation(validation_loader)
            history_dict['val_loss'].append(loss_trend)
            self.log(f'[RESULT]: Val. Epoch: {self.epoch}, summary_loss: {summary_loss.avg:.5f}, time: {(time.time() - t):.5f}')
            
            if summary_loss.avg < self.best_summary_loss:
                self.best_summary_loss = summary_loss.avg
                self.model.eval()
                self.save(f'{self.base_dir}/best-checkpoint-{str(self.epoch).zfill(3)}epoch.bin')
                
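                ## Remove the previous best checkpoint; `f` is not yet defined on the first improvement
                try:
                    os.remove(f)
                except (NameError, OSError):
                    pass
                f = f'{self.base_dir}/best-checkpoint-{str(self.epoch).zfill(3)}epoch.bin'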

            if self.config.validation_scheduler:
                self.scheduler.step(metrics=summary_loss.avg)

            self.epoch += 1
        return history_dict

    def train_epoch(self, train_loader):
        self.model.train()
        summary_loss = AverageMeter()
        t = time.time()
        loss_trend = []
        lr_trend = []
        for step, (images, targets, image_ids) in tqdm(enumerate(train_loader), total=len(train_loader)):
            if self.config.verbose:
                if step % self.config.verbose_step == 0:
                    print(
                        f'Train Step {step}/{len(train_loader)}, ' + \
                        f'summary_loss: {summary_loss.avg:.5f}, ' + \
                        f'time: {(time.time() - t):.5f}', end='\r'
                    )            
            
            images = torch.stack(images)
            images = images.to(self.device).float()
            
            target_res = {}
            boxes = [target['boxes'].to(self.device).float() for target in targets]
            labels = [target['labels'].to(self.device).float() for target in targets]
            target_res['bbox'] = boxes
            target_res['cls'] = labels
            self.optimizer.zero_grad()
            output = self.model(images, target_res)

            loss = output['loss']
            loss.backward()
            summary_loss.update(loss.detach().item(), self.config.batch_size)
            self.optimizer.step()

            if self.config.step_scheduler:
                self.scheduler.step()

            
            lr = self.optimizer.param_groups[0]['lr']
            loss_trend.append(summary_loss.avg)
            lr_trend.append(lr)
        return summary_loss, loss_trend, lr_trend
    
    def validation(self, val_loader):
        self.model.eval()
        summary_loss = AverageMeter()
        t = time.time()
        loss_trend = []
#         lr_trend = []
        
        for step, (images, targets, image_ids) in tqdm(enumerate(val_loader), total=len(val_loader)):
            if self.config.verbose:
                if step % self.config.verbose_step == 0:
                    print(
                        f'Val Step {step}/{len(val_loader)}, ' + \
                        f'summary_loss: {summary_loss.avg:.5f}, ' + \
                        f'time: {(time.time() - t):.5f}', end='\r'
                    )
            with torch.no_grad():
                images = torch.stack(images)
                images = images.to(self.device).float()
                target_res = {}
                boxes = [target['boxes'].to(self.device).float() for target in targets]
                labels = [target['labels'].to(self.device).float() for target in targets]
                target_res['bbox'] = boxes
                target_res['cls'] = labels 
                # Use the actual batch length: the last validation batch can be smaller
                # than config.batch_size because val_loader does not drop it.
                n_images = images.shape[0]
                target_res["img_scale"] = torch.tensor([1.0] * n_images,
                                                       dtype=torch.float).to(self.device)
                target_res["img_size"] = torch.tensor([images[0].shape[-2:]] * n_images,
                                                      dtype=torch.float).to(self.device)
                
                output = self.model(images, target_res)
            
                loss = output['loss']
                summary_loss.update(loss.detach().item(), n_images)

                loss_trend.append(summary_loss.avg)
        return summary_loss, loss_trend[-1]
    
    
    def save(self, path):
        self.model.eval()
        torch.save({
            'model_state_dict': self.model.model.state_dict(),
            'optimizer_state_dict': self.optimizer.state_dict(),
            'scheduler_state_dict': self.scheduler.state_dict(),
            'best_summary_loss': self.best_summary_loss,
            'epoch': self.epoch,
        }, path)

    def load(self, path):
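        # map_location lets a checkpoint saved on GPU be restored on the current device
        checkpoint = torch.load(path, map_location=self.device)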
        self.model.model.load_state_dict(checkpoint['model_state_dict'])
        self.optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        self.scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
        self.best_summary_loss = checkpoint['best_summary_loss']
        self.epoch = checkpoint['epoch'] + 1
        
    def log(self, message):
        if self.config.verbose:
            print(message)
        with open(self.log_path, 'a+') as logger:
            logger.write(f'{message}\n')
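
Both `train_epoch` and `validation` append `summary_loss.avg` to `loss_trend`, so the curves plotted later are running averages over the epoch so far, not raw per-batch losses. The sketch below illustrates that behaviour with a hypothetical `RunningMean` class. It is an assumption about how `AverageMeter` (defined earlier in this notebook) behaves, not a copy of it.

```python
class RunningMean:
    """Hypothetical stand-in for AverageMeter: a weighted running mean."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        # `n` is the weight of this value, e.g. the number of images in the batch
        self.total += value * n
        self.count += n

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0


meter = RunningMean()
trend = []
for batch_loss in [4.0, 2.0, 3.0]:
    meter.update(batch_loss, n=4)
    trend.append(meter.avg)

print(trend)  # [4.0, 3.0, 3.0]
```

Because each point averages everything seen so far, a spike in a single batch shows up only as a small bump in the trend.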

<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Define Training Configuration</span>

In [None]:
class TrainGlobalConfig:
    def __init__(self):
        self.num_classes = 14
        self.num_workers = 2
        self.batch_size = 4 
        self.n_epochs = 2
        self.lr = 0.0002
        self.model_name = 'tf_efficientdet_d1'
        self.folder = 'training_job'
        self.verbose = True
        self.verbose_step = 1
        self.step_scheduler = True
        self.validation_scheduler = False
        self.n_img_count = len(df_annotations_wbf.image_id.unique())
        self.OptimizerClass = torch.optim.AdamW
        self.SchedulerClass = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts
        self.scheduler_params = dict(
                            T_0=50,
                            T_mult=1,
                            eta_min=0.0001,
                            last_epoch=-1,
                            verbose=False
                            )
        self.kfold = 3
    
    def reset(self):
        self.OptimizerClass = torch.optim.AdamW
        self.SchedulerClass = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts

train_config = TrainGlobalConfig()
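
With `step_scheduler = True`, `train_epoch` calls `scheduler.step()` after every batch, so `T_0 = 50` is measured in batches rather than epochs. A rough sketch of what that implies is below. The image count of 4000 is a made-up placeholder for `n_img_count`, and the fold sizes are only approximate because `GroupKFold` does not guarantee perfectly equal folds.

```python
def restarts_per_epoch(n_images, kfold, batch_size, t_0):
    """Estimate batches per epoch and how many LR cycles fit into one epoch."""
    # Each fold trains on roughly (kfold - 1) / kfold of the images
    n_train = n_images * (kfold - 1) // kfold
    # drop_last=True in the train loader discards the final partial batch
    steps = n_train // batch_size
    return steps, steps // t_0


# 4000 is a hypothetical image count, used for illustration only
steps, cycles = restarts_per_epoch(n_images=4000, kfold=3, batch_size=4, t_0=50)
print(steps, cycles)  # 666 13
```

If one cycle per epoch is wanted instead, `T_0` could be set close to the number of batches per epoch.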

<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Creating Validation Folds</span>

In [None]:
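from sklearn.model_selection import GroupKFold

# Group by image_id so that all boxes of one image land in the same fold
df_annotations_wbf['fold'] = -1
group_kfold = GroupKFold(n_splits=train_config.kfold)
splits = group_kfold.split(df_annotations_wbf, groups=df_annotations_wbf.image_id.tolist())
for fold, (train_index, val_index) in enumerate(splits):
    # split() yields positional indices, so map them to index labels before using .loc
    df_annotations_wbf.loc[df_annotations_wbf.index[val_index], 'fold'] = fold
df_annotations_wbf.sample(5)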
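
Grouping by `image_id` matters because one image usually has several annotation rows, and a plain row-wise split could put boxes of the same image in both training and validation. Below is a minimal check of that property, written against plain lists so it runs standalone. The function name `folds_per_image` and the toy data are illustrative and not part of the notebook.

```python
from collections import defaultdict


def folds_per_image(image_ids, folds):
    """Map each image_id to the set of folds its annotation rows were assigned to."""
    seen = defaultdict(set)
    for image_id, fold in zip(image_ids, folds):
        seen[image_id].add(fold)
    return dict(seen)


# Toy annotation rows: image "a" has three boxes, "b" two, "c" one
image_ids = ["a", "a", "b", "c", "b", "a"]
folds = [0, 0, 1, 2, 1, 0]

assignment = folds_per_image(image_ids, folds)
leaky = [image_id for image_id, f in assignment.items() if len(f) > 1]
print(leaky)  # an empty list means no image is split across folds
```

On the real data the same check could be run as `folds_per_image(df_annotations_wbf.image_id, df_annotations_wbf.fold)`.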

<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Cosine Annealing Warm Restarts LR Scheduler</span>

Cosine annealing is a learning rate schedule in which the learning rate decays along a cosine curve to a minimum value and is then reset abruptly to its initial value. Each reset acts like a restart of the learning process, and the weights reached at the end of successive cycles can be treated as a sequence of models, which is the idea behind the snapshot ensembling illustrated in the figure below. The sudden jump in learning rate can also help the weights hop out of a poor local minimum and continue towards a better one.

Because each cycle starts from the weights the model has already learned, this is called a "warm restart", in contrast to a "cold restart", where training starts again from a fresh set of small random weights.

Cyclic schedules such as cosine annealing with warm restarts have been reported to converge faster, and often to better solutions, than monotonically decaying schedules, although the gain depends on the model and the dataset.
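
For reference, within one cycle the schedule follows $\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})(1 + \cos(\pi T_{cur} / T_0))$, where $T_{cur}$ counts scheduler steps since the last restart. The sketch below evaluates that closed form in plain Python for a constant cycle length (`T_mult = 1`). The function name is illustrative, and the code is meant to show the shape of the curve rather than replace `CosineAnnealingWarmRestarts`.

```python
import math


def cosine_warm_restart_lr(step, base_lr, eta_min, t_0):
    """Learning rate at `step` for cosine annealing with a restart every `t_0` steps."""
    t_cur = step % t_0  # steps since the last restart
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_0))


# Values mirror the training configuration: lr=0.0002, eta_min=0.0001, T_0=50
for step in (0, 25, 49, 50):
    print(step, round(cosine_warm_restart_lr(step, 0.0002, 0.0001, 50), 6))
```

The value at step 50 equals the value at step 0, which is the warm restart: the learning rate jumps back to `base_lr` while the model weights are kept.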


![](https://ruder.io/content/images/size/w2000/2017/12/snapshot_ensembles.png)


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker
from matplotlib import rcParams
sns.set(rc={"font.size":18,"axes.titlesize":30,"axes.labelsize":18,
            "axes.titlepad":22, "axes.labelpad":18, "legend.fontsize":15,
            "legend.title_fontsize":15, "figure.titlesize":35})

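# Build a throwaway optimizer and scheduler from the settings in train_config
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=train_config.lr)
scheduler = train_config.SchedulerClass(optimizer, **train_config.scheduler_params)

lrs = []
# Approximate batches per epoch: each fold trains on (kfold - 1) / kfold of the images
n_train_images = train_config.n_img_count * (train_config.kfold - 1) // train_config.kfold
sample_steps = n_train_images // train_config.batch_size
for e in range(train_config.n_epochs):
    for i in range(sample_steps):
        optimizer.step()  # no gradients here; keeps the optimizer/scheduler call order valid
        scheduler.step()
        lrs.append(
            optimizer.param_groups[0]["lr"]
        )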
fig = plt.figure(figsize=(22,8))
fig.suptitle("PROJECTED LR TREND - COSINE ANNEALING WARM RESTARTS", x=0.125, y=1.00, ha='left',
             fontweight=100, fontfamily='Lato', size=34)
plt.plot(lrs)
plt.savefig('cosanneal.png', bbox_inches='tight')
plt.show()

<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Initiate Training</span>

In [None]:
## Training will resume if the checkpoint path is specified below
checkpoint_path = None

device = torch.device('cuda:0' if torch.cuda.is_available() else "cpu")

## Drops invalid (None) items returned by the dataset, then regroups the remaining
## (image, target, image_id) samples into per-field tuples
def collate_fn(batch):
    batch = [sample for sample in batch if sample is not None]
    return tuple(zip(*batch))

fold_history = []
for val_fold in range(train_config.kfold):
    print(f'Fold {val_fold+1}/{train_config.kfold}')
    
    train_ids = df_annotations_wbf[df_annotations_wbf['fold'] != val_fold].image_id.unique()
    val_ids = df_annotations_wbf[df_annotations_wbf['fold'] == val_fold].image_id.unique()
    
    train_dataset = DatasetRetriever(
                        image_ids=train_ids,
                        marking=df_annotations_wbf,
                        transforms=get_train_transforms(),
                        test=False,
                        )

    validation_dataset = DatasetRetriever(
                            image_ids=val_ids,
                            marking=df_annotations_wbf,
                            transforms=get_valid_transforms(),
                            test=True,
                            )

    train_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=train_config.batch_size,
        sampler=RandomSampler(train_dataset),
        pin_memory=False,
        drop_last=True,
        num_workers=train_config.num_workers,
        collate_fn=collate_fn,
    )
    val_loader = torch.utils.data.DataLoader(
        validation_dataset, 
        batch_size=train_config.batch_size,
        num_workers=train_config.num_workers,
        shuffle=False,
        sampler=SequentialSampler(validation_dataset),
        pin_memory=False,
        collate_fn=collate_fn,
    )
    
    base_config = get_efficientdet_config(train_config.model_name)
    base_config.image_size = (512, 512)

    if checkpoint_path:
        print(f'Resuming from checkpoint: {checkpoint_path}')        
        model = create_model_from_config(base_config, bench_task='train', bench_labeler=True,
                                 num_classes=train_config.num_classes,
                                 pretrained=False)
        model.to(device)
        
        fitter = Fitter(model=model, device=device, config=train_config)
        fitter.load(checkpoint_path)
    
    else:
        model = create_model_from_config(base_config, bench_task='train', bench_labeler=True,
                                     pretrained=True,
                                     num_classes=train_config.num_classes)
        model.to(device)
    
        fitter = Fitter(model=model, device=device, config=train_config)  
        
    model_config = model.config
    history_dict = fitter.fit(train_loader, val_loader)
    fold_history.append(history_dict)
    
    ## Reset Optimizer and LR Scheduler
    train_config.reset()
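
To make the role of `collate_fn` concrete: the dataset yields `(image, target, image_id)` samples, and the loops in `Fitter` unpack each batch into three parallel sequences. Below is a small standalone sketch with strings standing in for tensors. The sample values are invented.

```python
def collate_fn(batch):
    # Drop samples the dataset returned as None, then transpose
    # [(img, tgt, id), ...] into (imgs, tgts, ids)
    batch = [sample for sample in batch if sample is not None]
    return tuple(zip(*batch))


batch = [("img0", {"boxes": []}, "id0"), None, ("img2", {"boxes": []}, "id2")]
images, targets, image_ids = collate_fn(batch)
print(images)     # ('img0', 'img2')
print(image_ids)  # ('id0', 'id2')

# If every sample in a batch is None the result is an empty tuple,
# and the three-way unpacking used in the training loop would fail.
print(collate_fn([None, None]))  # ()
```

Filtering also means a batch can be shorter than `batch_size`, which is worth keeping in mind wherever the batch size is assumed to be fixed.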
    

<span style="color: #0087e4; font-family: Segoe UI; font-size: 2.3em; font-weight: 300;">Visualize Training Job & Model Metrics</span>

In [None]:
fold_history_c = fold_history.copy()

train_loss_all = []
val_loss_all = []

for fold, fold_dict in enumerate(fold_history_c):
    train_losses = [item for sublist in fold_dict['train_loss'] for item in sublist]
    val_losses = list(fold_dict['val_loss'])
    train_lrs = [item for sublist in fold_dict['train_lr'] for item in sublist]
    train_loss_all.append(np.array(train_losses))
    
    # Assumes one validation loss per epoch: repeat each value across that epoch's
    # training steps so both curves share the same x-axis, whatever the batch size.
    n_steps_epoch = len(train_losses) // max(len(val_losses), 1)
    val_losses = np.repeat(val_losses, n_steps_epoch).tolist()
    val_loss_all.append(np.array(val_losses))
    
    fig = plt.figure(figsize=(22,8))
    fig.suptitle(f'FOLD{fold+1} - TRAIN LOSS & VAL LOSSES', x=0.125, y=1.00, ha='left',
                 fontweight=100, fontfamily='Lato', size=36)
    plt.plot(train_losses, color='red', label='train_loss', linewidth=1)
    plt.plot(val_losses, color='green', label='val_loss', linewidth=1)
    plt.legend() 
    plt.savefig(f'fold{fold+1}_loss_trend.png', bbox_inches='tight')
    plt.show()
    
    fig = plt.figure(figsize=(22,8))
    fig.suptitle(f'FOLD{fold+1} - LEARNING RATE TREND', x=0.125, y=1.00, ha='left',
                 fontweight=100, fontfamily='Lato', size=36)
    plt.plot(train_lrs, color='blue', label='lr', linewidth=1)
    plt.legend() 
    plt.savefig(f'fold{fold+1}_lr_trend.png', bbox_inches='tight')
    plt.show()

In [None]:
train_df = pd.DataFrame([np.array(i) for i in train_loss_all]).transpose()
train_df['average'] = train_df.mean(axis=1)
val_df = pd.DataFrame([np.array(i) for i in val_loss_all]).transpose()
val_df['average'] = val_df.mean(axis=1)

fig = plt.figure(figsize=(22,8))
fig.suptitle('AVERAGE TRAIN LOSS & VAL LOSSES', x=0.125, y=1.00, ha='left',
             fontweight=100, fontfamily='Lato', size=36)
plt.plot(train_df['average'], color='red', label='train_loss', linewidth=1)
plt.plot(val_df['average'], color='green', label='val_loss', linewidth=1)
plt.legend() 
plt.savefig('avg_loss_trend.png', bbox_inches='tight')
plt.show()

In [None]:
!zip -r training_job_full_2epochs_30_jan.zip ./training_job/*

<p style='text-align: center;'><span style="color: #000508; font-family: Segoe UI; font-size: 1.3em; font-weight: 300;">Let me know if you have any suggestions or improvements.</span></p>

<p style='text-align: center;'><span style="color: #000508; font-family: Segoe UI; font-size: 1.8em; font-weight: 300;">THANK YOU! PLEASE UPVOTE</span></p>

<p style='text-align: center;'><span style="color: #000508; font-family: Segoe UI; font-size: 2.5em; font-weight: 300;">HOPE IT WAS USEFUL</span></p>

<p style='text-align: center;'><span style="color: #0087e4; font-family: Segoe UI; font-size: 1.4em; font-weight: 300;">Tuning and updates in progress.</span></p>