<p style='text-align: center;'><span style="color: #000508; font-family: Segoe UI; font-size: 2.5em; font-weight: 350;"> [TRAIN] EfficientDet + augmentation + Stratified folding + mixed precision</span></p>

<br><br>
<span style="color: #027fc1; font-family: Segoe UI; font-size: 1.9em; font-weight: 350;">Highlights</span><br>

<p style='text-align: left;'><span style="color: #000508; font-family: Segoe UI; font-size: 1.3em;font-weight: 350;"> 👋 Big thanks to: </span></p>


>  [Alex Shonenkov for his great notebook](https://www.kaggle.com/shonenkov/training-efficientdet)<br>
>  [Aleksandr for his help](https://www.kaggle.com/vodan37)<br>


<br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.3em; font-weight: 400;">I adapted his notebook to the SIIM-FISABIO-RSNA COVID-19 competition. Here is what you will find in this notebook:</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 350;">1. A step-by-step <b>EfficientDet</b> training pipeline (you can reuse it for other detection tasks).</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 350;">2. Mixed precision training (the notebook installs NVIDIA <b>Apex</b>, but the training loop uses PyTorch's native torch.cuda.amp). I will explain this great tool later.</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 350;">3. Splitting the data into folds with <b>StratifiedKFold</b>.</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 350;">4. A good augmentation strategy.</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 350;">5. The EfficientDet model family, to match different detection tasks and GPU budgets.</span><br>
<br>

<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 350;">OK, let's go! 🚀</span><br>

<span style="color: #027fc1; font-family: Segoe UI; font-size: 1.9em; font-weight: 350;">Data preparation</span>

<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 350;">Before we start, we need to prepare the data.</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 350;">I prepared the images in advance because it takes a long time (about 2 hours) and there is no need to repeat it on every run. I uploaded them to the public dataset <b>siimcov19eqhistorig</b>; you can find it in the Input panel on the right.</span><br>


<span style="color: #000508; font-family: Segoe UI; font-size: 1.3em; font-weight: 350;">Main features:</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 350;">1. Converted from DICOM (DCM) to PNG</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 350;">2. Images are not resized (original resolution)</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 350;">3. Histogram equalization applied to the images (it gave better results)</span><br>
<br>
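<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">The preprocessing script itself is not part of this notebook. Below is a minimal sketch of the same three steps, assuming <b>pydicom</b> and <b>opencv-python</b> are installed; the function names are hypothetical and this is not necessarily the exact code used to build the dataset. Histogram equalization is written out in NumPy so the arithmetic is visible (it follows the same idea as cv2.equalizeHist).</span><br>

```python
import numpy as np


def to_uint8(pixels: np.ndarray, monochrome1: bool = False) -> np.ndarray:
    """Min-max scale raw DICOM pixel values to 0..255 (uint8)."""
    pixels = pixels.astype(np.float32)
    pixels -= pixels.min()
    if pixels.max() > 0:
        pixels /= pixels.max()
    if monochrome1:
        # MONOCHROME1 stores bright structures as low values, so invert it
        pixels = 1.0 - pixels
    return (pixels * 255).astype(np.uint8)


def equalize_hist(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization of a uint8 grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value of the darkest intensity present
    total = cdf[-1]
    if total == cdf_min:  # constant image: nothing to equalize
        return img.copy()
    # Map each intensity through the normalized CDF (a 256-entry lookup table)
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]


def dicom_to_png(dcm_path: str, png_path: str) -> None:
    """Hypothetical helper: read one DICOM, keep its size, equalize, save as PNG."""
    import cv2  # assumption: opencv-python is installed
    import pydicom  # assumption: pydicom and the needed pixel-data handlers are installed

    ds = pydicom.dcmread(dcm_path)
    img = to_uint8(ds.pixel_array, ds.PhotometricInterpretation == "MONOCHROME1")
    cv2.imwrite(png_path, equalize_hist(img))  # no resizing: original resolution
```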

<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 350;">So the data is ready, and we can move on.</span><br>

<span style="color: #027fc1; font-family: Segoe UI; font-size: 1.9em; font-weight: 350;">Mixed precision</span>

<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">You can read more about mixed precision here: </span>[Introducing native PyTorch automatic mixed precision for faster training on NVIDIA GPUs](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision)

<p style='text-align: left;'><span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">From the official PyTorch blog post:</span></p>

>  Most deep learning frameworks, including PyTorch, train with 32-bit floating point (FP32) arithmetic by default. However this is not essential to achieve full accuracy for many deep learning models. In 2017, NVIDIA researchers developed a methodology for mixed-precision training, which combined single-precision (FP32) with half-precision (e.g. FP16) format when training a network, and achieved the same accuracy as FP32 training using the same hyperparameters, with additional performance benefits on NVIDIA GPUs: <br>1. Shorter training time;<br>2. Lower memory requirements, enabling larger batch sizes, larger models, or larger inputs.

<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">In other words, mixed precision lets you reduce GPU memory usage and speed up training, typically without losing accuracy.<br>GPU memory is a precious resource if, like me, you only have Google Colab with a 16 GB GPU.</span><br><br>
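<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">Why does mixed precision need a gradient "scaler"? Very small gradients underflow to zero in FP16, so the loss is multiplied by a large factor before the backward pass and the gradients are divided by it before the optimizer step. The NumPy sketch below shows the underflow and a simplified dynamic scaler. The default arguments (start at 2**16, halve on inf/NaN, double after 2000 good steps) mirror the defaults of torch.cuda.amp.GradScaler, but the class itself is a hypothetical illustration, not PyTorch's implementation.</span><br>

```python
import numpy as np

# 1) Underflow: a tiny FP32 gradient becomes exactly 0 when cast to FP16
grad = np.float32(1e-8)
print(np.float16(grad))  # 0.0

# 2) Loss scaling: scale up before the FP16 cast, unscale in FP32 afterwards
scale = np.float32(2.0 ** 16)
grad_fp16 = np.float16(grad * scale)       # about 6.55e-4, representable in FP16
recovered = np.float32(grad_fp16) / scale  # close to the original 1e-8
print(recovered)


class ToyDynamicScaler:
    """Simplified dynamic loss scaling (illustration only, not PyTorch's code)."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def step(self, scaled_grads):
        """Return unscaled grads, or None if the step must be skipped (inf/NaN found)."""
        grads = np.asarray(scaled_grads, dtype=np.float32) / self.scale
        if not np.all(np.isfinite(grads)):
            self.scale *= self.backoff_factor  # overflow: shrink the scale, skip this step
            self._good_steps = 0
            return None
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            self.scale *= self.growth_factor  # long run of finite grads: try a larger scale
            self._good_steps = 0
        return grads
```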

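<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">Using mixed precision with PyTorch is very simple. The cells below install the NVIDIA <b>Apex</b> library. Note, however, that the training loop later in this notebook uses PyTorch's native <b>torch.cuda.amp</b> module (built into PyTorch since version 1.6), so Apex is not strictly required.</span><br><br>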

In [None]:
# installing NVIDIA apex (optional: the training loop below uses PyTorch's native torch.cuda.amp)
!git clone https://github.com/NVIDIA/apex
!pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex

In [None]:
import apex

<span style="color: #027fc1; font-family: Segoe UI; font-size: 1.9em; font-weight: 350;">Install dependencies</span>

In [None]:
!pip install albumentations==0.4.6
!pip install effdet
!pip install timm
!pip install pycocotools

In [None]:
# check what GPU we have
!nvidia-smi

In [None]:
import sys
import torch
import os
from datetime import datetime
import time
import random
import cv2
import pandas as pd
import numpy as np
import albumentations as A
import matplotlib.pyplot as plt
from albumentations.pytorch.transforms import ToTensorV2
from sklearn.model_selection import StratifiedKFold
from torch.utils.data import Dataset,DataLoader
from torch.utils.data.sampler import SequentialSampler, RandomSampler
from glob import glob
from effdet import get_efficientdet_config, EfficientDet, DetBenchTrain
from effdet.efficientdet import HeadNet
from effdet import create_model
from pathlib import Path
from tqdm import tqdm

SEED = 42

def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
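    # benchmark=True speeds up convolutions but may make runs non-reproducible;
    # set it to False if you need exactly reproducible results
    torch.backends.cudnn.benchmark = True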

seed_everything(SEED)

<span style="color: #027fc1; font-family: Segoe UI; font-size: 1.9em; font-weight: 350;">Main constants</span>

In [None]:
# image size used during training; it must be divisible by 128.
# A larger image size can give better results but needs more GPU memory.
# 768 works for this notebook on a 16 GB GPU.
img_size = 768 

# index of the fold used for validation during training
fold_number = 0

# format of our images
image_ext = 'png'

# path to images directory
TRAIN_ROOT_PATH = '/kaggle/input/siimcov19eqhistorig/train_eq_hist_orig_png'
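<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">EfficientDet builds feature levels P3 to P7, and P7 has a stride of 2**7 = 128 pixels, which is why the image side must be divisible by 128. A tiny helper (hypothetical, not part of effdet) to snap a desired size to a valid one:</span><br>

```python
def snap_image_size(size: int, stride: int = 128) -> int:
    """Round `size` to the nearest multiple of `stride` (at least one stride)."""
    return max(stride, round(size / stride) * stride)


for wanted in (768, 800, 1000, 1280):
    print(wanted, "->", snap_image_size(wanted))
```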

<span style="color: #027fc1; font-family: Segoe UI; font-size: 1.9em; font-weight: 350;">Prepare labels</span>
<br>
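<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">Convert the bounding-box annotations into a simple CSV: one row per image with <b>image_name</b>, <b>BoxesString</b> (boxes as "x1 y1 x2 y2" separated by ";", or "no_box") and <b>domain</b>.</span><br><br>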

In [None]:
def get_all_files_in_folder(folder, types):
    files_grabbed = []
    for t in types:
        files_grabbed.extend(folder.rglob(t))
    files_grabbed = sorted(files_grabbed, key=lambda x: x)
    return files_grabbed

def prepare_csv_for_efdet(input_filepath, output_filepath):
    train_image_df = pd.read_csv(input_filepath)
    train_image_df['id'] = train_image_df['id'].str.split('_', expand=True)[0]

    image_ids = train_image_df['id'].tolist()
    labels_raw = train_image_df['label'].tolist()
    boxes_raw = train_image_df['boxes'].tolist()

    images = get_all_files_in_folder(Path(TRAIN_ROOT_PATH), ['*.png'])

    result = []
    for image_path in tqdm(images, colour = '#00ff00'):
        s = image_path.stem + ','

        boxes = []
        for image_id, label in zip(image_ids, labels_raw):
            if image_id == image_path.stem:

                label_split = label.split('opacity')
                if len(label_split) > 1:

                    for l in label_split:
                        if l != '':
                            box = l.split(' ')
                            x1 = (float(box[2]))
                            if x1 < 0: x1 = 0.0
                            y1 = (float(box[3]))
                            if y1 < 0: y1 = 0.0
                            x2 = (float(box[4]))
                            y2 = (float(box[5]))

                            boxes.append([x1, y1, x2, y2])

        boxes_str = ''
        if len(boxes):
            for box in boxes:
                boxes_str += str(box[0]) + ' ' + str(box[1]) + ' ' + str(box[2]) + ' ' + str(box[3]) + ';'

            s += boxes_str[:-1]
        else:
            boxes_str = 'no_box'
            s += boxes_str

        s += ',0'
        result.append(s)

    with open(output_filepath, 'w') as f:
        f.write('image_name,BoxesString,domain\n')
        for item in result:
            f.write("%s\n" % item)
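
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">prepare_csv_for_efdet compares every image with every label row, which is quadratic in the number of images. A possible restructuring (a sketch; the helper names are hypothetical) parses each label string once and looks the boxes up by image id in a dict. It assumes the label is a sequence of 6-token groups "class confidence x_min y_min x_max y_max", which is how the opacity labels are parsed above, and it clips only x1 and y1, as the original function does.</span><br>

```python
def parse_opacity_boxes(label: str):
    """'opacity 1 789.3 582.4 1815.9 2499.7 opacity 1 ...' -> [[x1, y1, x2, y2], ...].
    A 'none 1 0 0 1 1' label (no opacity) gives an empty list."""
    tokens = label.split()
    boxes = []
    # Each object takes 6 tokens: class name, confidence, x_min, y_min, x_max, y_max
    for i in range(0, len(tokens), 6):
        name, _conf, x1, y1, x2, y2 = tokens[i:i + 6]
        if name != "opacity":
            continue
        # Clip negative top-left coordinates to 0, like the notebook does
        boxes.append([max(float(x1), 0.0), max(float(y1), 0.0), float(x2), float(y2)])
    return boxes


def build_box_lookup(image_ids, labels):
    """Collect the boxes of all label rows that belong to the same image id."""
    lookup = {}
    for image_id, label in zip(image_ids, labels):
        lookup.setdefault(image_id, []).extend(parse_opacity_boxes(label))
    return lookup


def boxes_to_string(boxes):
    """Same BoxesString format as above: 'x1 y1 x2 y2;...' or 'no_box'."""
    if not boxes:
        return "no_box"
    return ";".join(" ".join(str(v) for v in box) for box in boxes)
```

With this, the inner loop becomes `boxes_to_string(lookup.get(image_path.stem, []))` per image.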

In [None]:
# extract all labels and save them to CSV file
input_filepath = '/kaggle/input/siim-covid19-detection/train_image_level.csv'
output_filepath = '/kaggle/working/images_train.csv'
prepare_csv_for_efdet(input_filepath, output_filepath)

In [None]:
# read our new CSV
df = pd.read_csv('/kaggle/working/images_train.csv')

# remove samples without bboxes
df.drop(df[df.BoxesString == 'no_box'].index, inplace=True)

In [None]:
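# split into folds
# note: 'domain' is 0 for every image, so this stratification is effectively a shuffled KFold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
df_folds = df[['image_name']].copy()
# use a separate loop variable so the global fold_number chosen above is not overwritten
for fold, (train_index, val_index) in enumerate(skf.split(X=df.image_name, y=df['domain'])):
    df_folds.loc[df_folds.iloc[val_index].index, 'fold'] = fold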
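<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">Because prepare_csv_for_efdet writes domain = 0 for every image, StratifiedKFold has only one class to stratify on. If you want folds stratified by something meaningful, one option (my suggestion, not what this notebook does) is the number of boxes per image, capped so that rare counts are grouped together. The sketch keeps the original index labels, which DatasetRetriever uses to look rows up in df.</span><br>

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold


def make_box_count_folds(df: pd.DataFrame, n_splits: int = 5, seed: int = 42, cap: int = 3) -> pd.DataFrame:
    """Return a frame with the same index as `df` and a `fold` column, stratified by box count."""
    n_boxes = df["BoxesString"].apply(lambda s: 0 if s == "no_box" else len(s.split(";")))
    strata = n_boxes.clip(upper=cap)  # images with `cap` or more boxes form one group
    folds = df[["image_name"]].copy()
    folds["fold"] = -1
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fold, (_, val_idx) in enumerate(skf.split(folds, strata)):
        # map positional indices back to the original index labels
        folds.loc[folds.index[val_idx], "fold"] = fold
    return folds
```

Usage in this notebook would be `df_folds = make_box_count_folds(df)`.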

<span style="color: #027fc1; font-family: Segoe UI; font-size: 1.9em; font-weight: 350;">Augmentations</span>
<br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">I used Alex Shonenkov's original augmentations with a few changes:</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">1. Removed random crops: our images have different resolutions, and cropping can leave boxes at the image edge or cut them.</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">2. Removed Cutout: it helps when a sample has many boxes or boxes near the image edge, but in our dataset the boxes are mostly in the central part of the image and there are usually only one or two per image.</span><br><br>

In [None]:
def get_train_transforms():
    return A.Compose([
                        A.OneOf([
                            A.HueSaturationValue(hue_shift_limit=0.2, sat_shift_limit= 0.2, 
                                                val_shift_limit=0.2, p=0.9),
                            A.RandomBrightnessContrast(brightness_limit=0.2, 
                                                    contrast_limit=0.2, p=0.9),
                        ],p=0.9),
                        A.ToGray(p=0.01),
                        A.HorizontalFlip(p=0.5),
                        A.VerticalFlip(p=0.5),
                        A.RandomRotate90(p=0.5),
                        A.Transpose(p=0.5),
                        A.ImageCompression(quality_lower=85, quality_upper=95, p=0.5),
                        A.OneOf([
                            A.Blur(blur_limit=3, p=1.0), 
                            A.MedianBlur(blur_limit=3, p=1.0),
                            A.MotionBlur(p=1)], p=0.5),
                        A.Resize(height=img_size, width=img_size, p=1),
                        # A.Cutout(num_holes=8, max_h_size=64, max_w_size=64, fill_value=0, p=0.5), 
                        ToTensorV2(p=1.0), 
                        ],
                        p=1.0,
                        bbox_params=A.BboxParams(
                                                format="pascal_voc",
                                                min_area=0, 
                                                min_visibility=0,
                                                label_fields=['labels']
                    )
    )


def get_valid_transforms():
    return A.Compose(
        [
            A.Resize(height=img_size, width=img_size, p=1.0),
            ToTensorV2(p=1.0),
        ], 
        p=1.0, 
        bbox_params=A.BboxParams(
            format='pascal_voc',
            min_area=0, 
            min_visibility=0,
            label_fields=['labels']
        )
    )
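<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">Passing bbox_params with format="pascal_voc" makes albumentations move the [x_min, y_min, x_max, y_max] boxes together with the image. For intuition, here is the arithmetic for two of the transforms used above, horizontal flip and resize, written by hand in NumPy (an illustration, not albumentations' internal code). Since most source images are not square, the resize to img_size x img_size scales x and y by different factors.</span><br>

```python
import numpy as np


def hflip_boxes(boxes, width):
    """Pascal VOC boxes after a horizontal flip of an image `width` pixels wide."""
    boxes = np.asarray(boxes, dtype=np.float32)
    out = boxes.copy()
    out[:, 0] = width - boxes[:, 2]  # new x_min comes from the old x_max
    out[:, 2] = width - boxes[:, 0]  # new x_max comes from the old x_min
    return out


def resize_boxes(boxes, src_hw, dst_hw):
    """Pascal VOC boxes after resizing an image from (h, w) = src_hw to dst_hw."""
    boxes = np.asarray(boxes, dtype=np.float32)
    sy = dst_hw[0] / src_hw[0]
    sx = dst_hw[1] / src_hw[1]
    return boxes * np.array([sx, sy, sx, sy], dtype=np.float32)


print(hflip_boxes([[10, 20, 30, 40]], width=100))  # [[70. 20. 90. 40.]]
print(resize_boxes([[100, 50, 300, 150]], src_hw=(400, 800), dst_hw=(200, 200)))  # [[25. 25. 75. 75.]]
```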

<span style="color: #027fc1; font-family: Segoe UI; font-size: 1.9em; font-weight: 350;">Dataset</span>
<br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">Alex implemented a great custom CutMix augmentation, but I trained without it because I think it gives worse results on this dataset. Feel free to check it yourself 🙂</span><br>

In [None]:
class DatasetRetriever(Dataset):

    def __init__(self, marking, image_ids, transforms=None, test=False):
        super().__init__()

        self.image_ids = image_ids
        self.marking = marking
        self.transforms = transforms
        self.test = test

    def __getitem__(self, index: int):
        index = self.image_ids[index]
        image_name = self.marking.loc[index]['image_name']

        # if self.test or random.random() > 0.3:
        image, boxes = self.load_image_and_boxes(index)
        # else:
            # image, boxes = self.load_cutmix_image_and_boxes(index)


        # there is only one class
        labels = torch.ones((boxes.shape[0],), dtype=torch.int64)
        target = {'boxes': boxes, 'labels': labels, 'image_id': torch.tensor([index])}
        
        if self.transforms:
            for i in range(10):
                sample = self.transforms(**{
                    'image': image,
                    'bboxes': target['boxes'],
                    'labels': labels
                })
                if len(sample['bboxes']) > 0:
                    image = sample['image']
                    target['boxes'] = torch.stack(tuple(map(torch.tensor, zip(*sample['bboxes'])))).permute(1, 0)
                    target['boxes'][:, [0, 1, 2, 3]] = target['boxes'][:, [1, 0, 3, 2]]  # xyxy -> yxyx: effdet expects boxes in yxyx order
                    target['labels'] = target['labels'][:len(target['boxes'])]
                    break

        return image, target, image_name

    def __len__(self) -> int:
        return self.image_ids.shape[0]

    def load_image_and_boxes(self, index):
        image_name = self.marking['image_name'][index]
        image = cv2.imread(TRAIN_ROOT_PATH+'/'+image_name + '.' + image_ext, cv2.IMREAD_COLOR)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32)
        image /= 255.0
        row = self.marking.loc[index]

        bboxes = []
        if row['BoxesString'] != 'no_box':
            for bbox in row['BoxesString'].split(';'):
                bboxes.append(list(map(float, bbox.split(' '))))
        return image, np.array(bboxes)

#     def load_cutmix_image_and_boxes(self, index, imsize=1024):
#         """
#         This implementation of cutmix author:  https://www.kaggle.com/nvnnghia
#         Refactoring and adaptation: https://www.kaggle.com/shonenkov
#         """
#         w, h = imsize, imsize
#         s = imsize // 2

#         xc, yc = [int(random.uniform(imsize * 0.25, imsize * 0.75)) for _ in range(2)]  # center x, y
#         indexes = [index] + [random.randint(0, self.image_ids.shape[0] - 1) for _ in range(3)]

#         result_image = np.full((imsize, imsize, 3), 1, dtype=np.float32)
#         result_boxes = []

#         for i, index in enumerate(indexes):
#             image, boxes = self.load_image_and_boxes(index)
#             if i == 0:
#                 x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
#                 x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
#             elif i == 1:  # top right
#                 x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
#                 x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
#             elif i == 2:  # bottom left
#                 x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
#                 x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, max(xc, w), min(y2a - y1a, h)
#             elif i == 3:  # bottom right
#                 x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
#                 x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
#             result_image[y1a:y2a, x1a:x2a] = image[y1b:y2b, x1b:x2b]
#             padw = x1a - x1b
#             padh = y1a - y1b

#             boxes[:, 0] += padw
#             boxes[:, 1] += padh
#             boxes[:, 2] += padw
#             boxes[:, 3] += padh

#             result_boxes.append(boxes)

#         result_boxes = np.concatenate(result_boxes, 0)
#         np.clip(result_boxes[:, 0:], 0, 2 * s, out=result_boxes[:, 0:])
#         result_boxes = result_boxes.astype(np.int32)
#         result_boxes = result_boxes[
#             np.where((result_boxes[:, 2] - result_boxes[:, 0]) * (result_boxes[:, 3] - result_boxes[:, 1]) > 0)]
#         return result_image, result_boxes

In [None]:
# create datasets using fold_number
train_dataset = DatasetRetriever(
    image_ids=df_folds[df_folds['fold'] != fold_number].index.values,
    marking=df,
    transforms=get_train_transforms(),
    test=False,
)

validation_dataset = DatasetRetriever(
    image_ids=df_folds[df_folds['fold'] == fold_number].index.values,
    marking=df,
    transforms=get_valid_transforms(),
    test=True,
)

In [None]:
# Let's look at one sample

image, target, image_id = train_dataset[1]

boxes = target['boxes'].cpu().numpy().astype(np.int32)
labels = target['labels'].cpu().numpy().astype(np.int32)

numpy_image = image.permute(1,2,0).cpu().numpy()
numpy_image_box = numpy_image.copy()
fig, ax = plt.subplots(1, 1, figsize=(16, 8))

for box in boxes:
    cv2.rectangle(numpy_image_box, (int(box[1]), int(box[0])), (int(box[3]),  int(box[2])), (0, 1, 0), 2)

ax.set_axis_off()
ax.imshow(numpy_image_box)

<span style="color: #027fc1; font-family: Segoe UI; font-size: 1.9em; font-weight: 350;">Fitter</span>
<br>

In [None]:
class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

<span style="color: #027fc1; font-family: Segoe UI; font-size: 1.9em; font-weight: 350;">Training</span>
<br>
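<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">Below is the whole training pipeline. I marked the mixed-precision parts with an #apex comment; note that they use PyTorch's native AMP (torch.cuda.amp.autocast and torch.cuda.amp.GradScaler) rather than the Apex library itself.</span><br>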

In [None]:
import warnings

warnings.filterwarnings("ignore")

class Fitter:
    
    def __init__(self, model, device, config):
        self.config = config
        self.epoch = 0

        self.base_dir = f'./{config.folder}'
        if not os.path.exists(self.base_dir):
            os.makedirs(self.base_dir)
        
        self.log_path = f'{self.base_dir}/log.txt'
        self.best_summary_loss = 10**5

        self.model = model
        self.device = device

        param_optimizer = list(self.model.named_parameters())
        no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
        optimizer_grouped_parameters = [
            {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.001},
            {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
        ] 

        # use the parameter groups above so that bias parameters get no weight decay
        self.optimizer = torch.optim.AdamW(optimizer_grouped_parameters, lr=config.lr)
        self.scheduler = config.SchedulerClass(self.optimizer, **config.scheduler_params)
        self.log(f'Fitter prepared. Device is {self.device}')

    def fit(self, train_loader, validation_loader):
        for e in range(self.config.n_epochs):
            if self.config.verbose:
                lr = self.optimizer.param_groups[0]['lr']
                timestamp = datetime.utcnow().isoformat()
                self.log(f'\n{timestamp}\nLR: {lr}')

            t = time.time()
            summary_loss = self.train_one_epoch(train_loader)
            self.log(f'[RESULT]: Train. Epoch: {self.epoch}, summary_loss: {summary_loss.avg:.5f}, time: {(time.time() - t):.5f}')
            self.save(f'{self.base_dir}/last-checkpoint.bin')

            t = time.time()
            summary_loss = self.validation(validation_loader)
            self.log(f'[RESULT]: Val. Epoch: {self.epoch}, summary_loss: {summary_loss.avg:.5f}, time: {(time.time() - t):.5f}')
            if summary_loss.avg < self.best_summary_loss:
                self.best_summary_loss = summary_loss.avg
                self.model.eval()
                self.save(f'{self.base_dir}/best-checkpoint-{str(self.epoch).zfill(3)}epoch.bin')
                for path in sorted(glob(f'{self.base_dir}/best-checkpoint-*epoch.bin'))[:-3]:
                    os.remove(path)

            if self.config.validation_scheduler:
                self.scheduler.step(metrics=summary_loss.avg)

            self.epoch += 1

    def validation(self, val_loader):
        self.model.eval()
        summary_loss = AverageMeter()
        t = time.time()
        for step, (images, targets, image_ids) in enumerate(val_loader):
            if self.config.verbose and step % self.config.verbose_step == 0:
                print(
                    f'Val Step {step}/{len(val_loader)}, ' + \
                    f'summary_loss: {summary_loss.avg:.5f}, ' + \
                    f'time: {(time.time() - t):.5f}', end='\r'
                )
            with torch.no_grad():
                images = torch.stack(images)
                batch_size = images.shape[0]
                images = images.to(self.device).float()
                boxes = [target['boxes'].to(self.device).float() for target in targets]
                labels = [target['labels'].to(self.device).float() for target in targets]

                target_res = {}
                target_res['bbox'] = boxes
                target_res['cls'] = labels 
                target_res["img_scale"] = torch.tensor([1.0] * batch_size, dtype=torch.float).to(self.device)
                target_res["img_size"] = torch.tensor([images[0].shape[-2:]] * batch_size, dtype=torch.float).to(self.device)

                outputs = self.model(images, target_res)
                loss = outputs['loss']
                summary_loss.update(loss.detach().item(), batch_size)

        return summary_loss

    def train_one_epoch(self, train_loader):
        self.model.train()
        summary_loss = AverageMeter()
        t = time.time()
        
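        #apex: create the GradScaler once and reuse it in later epochs,
        # so the loss scale it has adapted is not reset every epoch
        if not hasattr(self, 'scaler'):
            self.scaler = torch.cuda.amp.GradScaler()
        scaler = self.scaler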


        for step, (images, targets, image_ids) in enumerate(train_loader):
            if self.config.verbose and step % self.config.verbose_step == 0:
                print(
                    f'Train Step {step}/{len(train_loader)}, ' + \
                    f'summary_loss: {summary_loss.avg:.5f}, ' + \
                    f'time: {(time.time() - t):.5f}', end='\r'
                )
            
            images = torch.stack(images)
            images = images.to(self.device).float()
            batch_size = images.shape[0]
            
            target_res = {}

            boxes = [target['boxes'].to(self.device).float() for target in targets]
            labels = [target['labels'].to(self.device).float() for target in targets]

            target_res['bbox'] = boxes
            target_res['cls'] = labels 

            
            self.optimizer.zero_grad()

            #apex
            with torch.cuda.amp.autocast():
                outputs = self.model(images, target_res)
            
            loss = outputs['loss']

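            #apex
            scaler.scale(loss).backward()

            summary_loss.update(loss.detach().item(), batch_size)

            #apex: step() unscales the gradients and skips the optimizer step if they contain inf/NaN;
            # update() then adjusts the loss scale for the next iteration
            scaler.step(self.optimizer)
            scaler.update()

            if self.config.step_scheduler:
                self.scheduler.step()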

        return summary_loss
    
    def save(self, path):
        self.model.eval()
        torch.save({
            'model_state_dict': self.model.model.state_dict(),
            'optimizer_state_dict': self.optimizer.state_dict(),
            'scheduler_state_dict': self.scheduler.state_dict(),
            'best_summary_loss': self.best_summary_loss,
            'epoch': self.epoch,
        }, path)

    def load(self, path):
        checkpoint = torch.load(path)
        self.model.model.load_state_dict(checkpoint['model_state_dict'])
        self.optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        self.scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
        self.best_summary_loss = checkpoint['best_summary_loss']
        self.epoch = checkpoint['epoch'] + 1
        
    def log(self, message):
        if self.config.verbose:
            print(message)
        with open(self.log_path, 'a+') as logger:
            logger.write(f'{message}\n')

In [None]:
class TrainGlobalConfig:
    num_workers = 2
    batch_size = 2
    n_epochs = 1
    lr = 0.0002

    # folder (relative to the working directory) where the weights are saved
    folder = 'output/fold0'

    verbose = True
    verbose_step = 1
    
    step_scheduler = False  # do scheduler.step after optimizer.step
    validation_scheduler = True  # do scheduler.step after validation stage loss

#     SchedulerClass = torch.optim.lr_scheduler.OneCycleLR
#     scheduler_params = dict(
#         max_lr=0.001,
#         epochs=n_epochs,
#         steps_per_epoch=int(len(train_dataset) / batch_size),
#         pct_start=0.1,
#         anneal_strategy='cos', 
#         final_div_factor=10**5
#     )
    
    SchedulerClass = torch.optim.lr_scheduler.ReduceLROnPlateau
    scheduler_params = dict(
        mode='min',
        factor=0.5,
        patience=5,
        verbose=True, 
        threshold=0.0001,
        threshold_mode='abs',
        cooldown=0, 
        min_lr=1e-8,
        eps=1e-08
    )
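<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">With validation_scheduler = True, ReduceLROnPlateau is stepped once per epoch with the validation loss. With these settings the LR is halved only after the loss has failed to improve by more than threshold=0.0001 (absolute) for more than patience=5 consecutive epochs, so with n_epochs = 1 it never changes the LR. Below is a simplified re-implementation of that rule in plain Python to check the arithmetic (not PyTorch's code; cooldown is 0 here and the eps check is omitted).</span><br>

```python
def simulate_plateau(losses, lr=2e-4, factor=0.5, patience=5, threshold=1e-4, min_lr=1e-8):
    """LR in effect after each epoch's scheduler step (mode='min', threshold_mode='abs')."""
    best = float("inf")
    num_bad = 0
    lrs = []
    for loss in losses:
        if loss < best - threshold:  # 'abs' mode: must beat the best by more than threshold
            best = loss
            num_bad = 0
        else:
            num_bad += 1
        if num_bad > patience:  # strictly more than `patience` bad epochs
            lr = max(lr * factor, min_lr)
            num_bad = 0
        lrs.append(lr)
    return lrs


print(simulate_plateau([1.0] * 7))  # six epochs at 0.0002, then 0.0001
```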

In [None]:
def collate_fn(batch):
    return tuple(zip(*batch))

def run_training():
    device = torch.device('cuda:0')
    net.to(device)

    train_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=TrainGlobalConfig.batch_size,
        sampler=RandomSampler(train_dataset),
        pin_memory=False,
        drop_last=False,
        num_workers=TrainGlobalConfig.num_workers,
        collate_fn=collate_fn,
    )
    val_loader = torch.utils.data.DataLoader(
        validation_dataset, 
        batch_size=TrainGlobalConfig.batch_size,
        num_workers=TrainGlobalConfig.num_workers,
        shuffle=False,
        sampler=SequentialSampler(validation_dataset),
        pin_memory=False,
        collate_fn=collate_fn,
    )

    fitter = Fitter(model=net, device=device, config=TrainGlobalConfig)
    
    # Attention! Uncomment this line to continue training from your saved weights
    # fitter.load(f'{fitter.base_dir}/last-checkpoint.bin')
    fitter.fit(train_loader, val_loader)
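<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">Each image has its own number of boxes, so the default collate function cannot stack the targets into one tensor. collate_fn simply transposes the batch: a list of (image, target, image_name) samples becomes a tuple of images, a tuple of targets and a tuple of names, and the images are stacked later with torch.stack inside Fitter. A plain-Python check with dummy values:</span><br>

```python
def collate_fn(batch):
    return tuple(zip(*batch))


batch = [
    ("img_a", {"boxes": [[0, 0, 10, 10]]}, "a"),
    ("img_b", {"boxes": [[1, 1, 5, 5], [2, 2, 8, 8]]}, "b"),  # two boxes
]
images, targets, names = collate_fn(batch)
print(images)  # ('img_a', 'img_b')
print(names)   # ('a', 'b')
print([len(t["boxes"]) for t in targets])  # [1, 2]
```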

<span style="color: #027fc1; font-family: Segoe UI; font-size: 1.9em; font-weight: 350;">Model</span>
<br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">I chose tf_efficientdet_d6 because it is the largest model that fits on a 16 GB GPU with this image size and batch size. If you have more GPU memory, you can use tf_efficientdet_d7. I uploaded pretrained weights for all models.</span><br>

In [None]:
from effdet import create_model_from_config, get_efficientdet_config
device = 'cuda'
def get_net():
    config = get_efficientdet_config('tf_efficientdet_d6')

    config.image_size = [img_size,img_size]
    config.norm_kwargs=dict(eps=.001, momentum=.01)

    net = EfficientDet(config, pretrained_backbone=False)
    checkpoint = torch.load('/kaggle/input/efficientdet-init-weights/efficientdet_d6-51cb0132.pth')
    net.load_state_dict(checkpoint)

    # we have only one class - opacity
    net.reset_head(num_classes=1)
    net.class_net = HeadNet(config, num_outputs=config.num_classes)

    return DetBenchTrain(net, config)

net = get_net()
net.to(device)

In [None]:
# start training
run_training()

<span style="color: #027fc1; font-family: Segoe UI; font-size: 1.9em; font-weight: 350;">How to improve:</span>
<br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">1. Increase image size</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">2. Experiment with augmentations and try adding CutMix</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">3. Choose a larger model, for example tf_efficientdet_d7</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">4. Increase batch size</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">5. Experiment with hyperparameters</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">6. Train all 5 folds and ensemble the models</span><br>
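<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">For the fold ensemble, one simple option is to concatenate the boxes that the 5 fold models predict for an image and run non-maximum suppression (NMS); weighted boxes fusion (for example from the ensemble_boxes package) is a popular alternative. Below is a plain NumPy NMS sketch with made-up boxes; the function names are hypothetical and the IoU threshold of 0.5 is just an example value.</span><br>

```python
import numpy as np


def iou(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def ensemble_nms(boxes_per_model, scores_per_model, iou_thr=0.5):
    """Merge the predictions of several models for ONE image with greedy NMS."""
    boxes = np.concatenate(boxes_per_model).astype(np.float32)
    scores = np.concatenate(scores_per_model).astype(np.float32)
    order = np.argsort(-scores)  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        # drop the remaining boxes that overlap the kept box too much
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thr]
    return boxes[keep], scores[keep]


fold_a = np.array([[10, 10, 50, 50], [100, 100, 150, 150]])
fold_b = np.array([[12, 12, 52, 52]])  # overlaps fold_a's first box (IoU about 0.82)
merged_boxes, merged_scores = ensemble_nms([fold_a, fold_b], [np.array([0.9, 0.6]), np.array([0.8])])
print(merged_boxes, merged_scores)
```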
<br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em;">Thank you and good luck! 😃</span><br>

<center><img border="0" alt="Ask Me Something" src="https://img.shields.io/badge/Please-Upvote%20If%20you%20like%20this-07b3c8?style=for-the-badge&logo=kaggle" width="260" height="20"></center>