Hey everybody, I wanted to share one of the approaches I got some decent results with. It's based on the [Fully Convolutional One-Stage Object Detection (FCOS) paper](https://arxiv.org/pdf/1904.01355.pdf). What's neat about this approach is that it doesn't rely on "anchor boxes" the way YOLO, EfficientDet, and Faster R-CNN do. This means there are fewer hyperparameters to tune, and as you might have noticed, LB scores can be very sensitive to hyperparameter choices.
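
To make "no anchor boxes" a bit more concrete: as I read the paper, each location that falls inside a ground-truth box regresses its distances to the four sides of that box (l, t, r, b), and a "centerness" score down-weights locations far from the box center. Here's a rough pure-Python sketch of those two pieces. The function names are mine, not mmdetection's, and it leaves out the per-FPN-level size ranges the paper also uses.

```python
import math


def fcos_targets(px, py, box):
    """Distances (l, t, r, b) from the point (px, py) to the sides of box.

    box is (x_min, y_min, x_max, y_max). Returns None when the point is
    not strictly inside the box, i.e. it would not be a positive sample.
    """
    x_min, y_min, x_max, y_max = box
    l, t = px - x_min, py - y_min
    r, b = x_max - px, y_max - py
    if min(l, t, r, b) <= 0:
        return None
    return l, t, r, b


def centerness(l, t, r, b):
    """Centerness from the FCOS paper: 1.0 at the box center, near 0 at the edges."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


box = (100, 100, 200, 180)
at_center = fcos_targets(150, 140, box)    # (50, 40, 50, 40)
near_corner = fcos_targets(110, 110, box)  # (10, 10, 90, 70)

print(centerness(*at_center))    # 1.0
print(centerness(*near_corner))  # sqrt(1/63), roughly 0.126
```

The only box-related thing being predicted is four distances per location, so there are no anchor sizes or aspect ratios to choose.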

This notebook contains all the code you should need for training and inference. I'm not doing any TTA or pseudo labeling here and I didn't spend much time tuning parameters. So there's likely a lot of room for improvement. 

I'm using the [mmdetection framework](https://github.com/open-mmlab/mmdetection) here, which uses an approved license. It's a little different in that it's entirely config based. So, instead of digging through multiple class files to tweak settings and parameters, you just modify the config. It also lets you see everything you can potentially tweak. Mixing and matching different backbones and heads is really easy, and it does a great job keeping track of past configs and training histories.

The one downside is that it requires a lot of slow boilerplate to get running on Kaggle:

In [None]:
!pip install ../input/mmcvwhl/addict-2.2.1-py3-none-any.whl
!pip install ../input/mmdetection20-5-13/mmcv-0.5.1-cp37-cp37m-linux_x86_64.whl
!pip install ../input/mmdetection20-5-13/terminal-0.4.0-py3-none-any.whl
!pip install ../input/mmdetection20-5-13/terminaltables-3.1.0-py3-none-any.whl

In [None]:
!cp -r ../input/mmdetection20-5-13/mmdetection/mmdetection .

In [None]:
cd mmdetection

In [None]:
!cp -r ../../input/mmdetection20-5-13/cocoapi/cocoapi .

In [None]:
cd cocoapi/PythonAPI

In [None]:
!make

In [None]:
!make install

In [None]:
!python setup.py install

In [None]:
import pycocotools

In [None]:
cd ../..

In [None]:
!pip install -v -e .

In [None]:
cd ../

In [None]:
import sys
sys.path.append('mmdetection')

Huge thanks to [@superkevingit](https://www.kaggle.com/superkevingit) and [@cdeotte](https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/113195) who I borrowed the boilerplate from. 

Here's what the full mmdetection config looks like:

In [None]:
dataset_type = 'CocoDataset'
data_root = 'data/global_wheat'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Resize',
        img_scale=[(1333, 640), (1333, 800)],
        multiscale_mode='value',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=6,
    workers_per_gpu=6,
    train=dict(
        type='CocoDataset',
        ann_file='data/global_wheat/annotations/train_fold1.json',
        img_prefix='data/global_wheat/train',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(
                type='Albu',
                transforms=[
                    dict(type='ToFloat', max_value=255.0),
                    dict(
                        type='RandomSizedCrop',
                        min_max_height=(650, 1024),
                        height=1024,
                        width=1024,
                        p=0.5),
                    dict(
                        type='HueSaturationValue',
                        hue_shift_limit=0.68,
                        sat_shift_limit=0.68,
                        val_shift_limit=0.1,
                        p=0.75),
                    dict(
                        type='RandomBrightnessContrast',
                        brightness_limit=0.1,
                        contrast_limit=0.1,
                        p=0.33),
                    dict(type='RandomRotate90', p=0.5),
                    dict(
                        type='Cutout',
                        num_holes=20,
                        max_h_size=32,
                        max_w_size=32,
                        fill_value=0.0,
                        p=0.25),
                    dict(type='FromFloat', max_value=255.0, dtype='uint8')
                ],
                bbox_params=dict(
                    type='BboxParams',
                    format='pascal_voc',
                    label_fields=['gt_labels'],
                    min_visibility=0.0,
                    min_area=0,
                    filter_lost_elements=True),
                keymap=dict(img='image', gt_bboxes='bboxes'),
                update_pad_shape=False,
                skip_img_without_anno=False),
            dict(
                type='Resize',
                img_scale=[(1333, 640), (1333, 800)],
                multiscale_mode='value',
                keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5, direction='horizontal'),
            dict(type='RandomFlip', flip_ratio=0.5, direction='vertical'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        classes=('wheat', )),
    val=dict(
        type='CocoDataset',
        ann_file='data/global_wheat/annotations/val_fold1.json',
        img_prefix='data/global_wheat/train',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        classes=('wheat', )),
    test=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='bbox')
optimizer = dict(type='AdamW', lr=0.0003)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='constant',
    warmup_iters=500,
    warmup_ratio=0.3333333333333333,
    step=[15, 25, 38])
total_epochs = 40
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
model = dict(
    type='FCOS',
    pretrained='open-mmlab://resnext101_64x4d',
    backbone=dict(
        type='ResNeXt',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        groups=64,
        base_width=4),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=512,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='FCOSHead',
        num_classes=1,
        in_channels=512,
        stacked_convs=4,
        feat_channels=512,
        strides=[8, 16, 32, 64, 128],
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='IoULoss', loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
train_cfg = dict(
    assigner=dict(
        type='MaxIoUAssigner',
        pos_iou_thr=0.5,
        neg_iou_thr=0.4,
        min_pos_iou=0,
        ignore_iof_thr=-1),
    allowed_border=-1,
    pos_weight=-1,
    debug=False)
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms', iou_thr=0.5),
    max_per_img=100)
classes = ('wheat', )
albu_train_transforms = [
    dict(type='ToFloat', max_value=255.0),
    dict(
        type='RandomSizedCrop',
        min_max_height=(650, 1024),
        height=1024,
        width=1024,
        p=0.5),
    dict(
        type='HueSaturationValue',
        hue_shift_limit=0.68,
        sat_shift_limit=0.68,
        val_shift_limit=0.1,
        p=0.75),
    dict(
        type='RandomBrightnessContrast',
        brightness_limit=0.1,
        contrast_limit=0.1,
        p=0.33),
    dict(type='RandomRotate90', p=0.5),
    dict(
        type='Cutout',
        num_holes=20,
        max_h_size=32,
        max_w_size=32,
        fill_value=0.0,
        p=0.25),
    dict(type='FromFloat', max_value=255.0, dtype='uint8')
]
work_dir = './work_dirs/fcos_wheat2'
gpu_ids = [0]


In [None]:
config_txt = """
dataset_type = 'CocoDataset'
data_root = 'data/global_wheat'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Resize',
        img_scale=[(1333, 640), (1333, 800)],
        multiscale_mode='value',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=6,
    workers_per_gpu=6,
    train=dict(
        type='CocoDataset',
        ann_file='data/global_wheat/annotations/train_fold1.json',
        img_prefix='data/global_wheat/train',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(
                type='Albu',
                transforms=[
                    dict(type='ToFloat', max_value=255.0),
                    dict(
                        type='RandomSizedCrop',
                        min_max_height=(750, 1024),
                        height=1024,
                        width=1024,
                        p=0.5),
                    dict(
                        type='HueSaturationValue',
                        hue_shift_limit=0.68,
                        sat_shift_limit=0.68,
                        val_shift_limit=0.1,
                        p=0.75),
                    dict(
                        type='RandomBrightnessContrast',
                        brightness_limit=0.1,
                        contrast_limit=0.1,
                        p=0.33),
                    dict(type='RandomRotate90', p=0.5),
                    dict(
                        type='Cutout',
                        num_holes=20,
                        max_h_size=32,
                        max_w_size=32,
                        fill_value=0.0,
                        p=0.25),
                    dict(type='FromFloat', max_value=255.0, dtype='uint8')
                ],
                bbox_params=dict(
                    type='BboxParams',
                    format='pascal_voc',
                    label_fields=['gt_labels'],
                    min_visibility=0.0,
                    min_area=0,
                    filter_lost_elements=True),
                keymap=dict(img='image', gt_bboxes='bboxes'),
                update_pad_shape=False,
                skip_img_without_anno=False),
            dict(
                type='Resize',
                img_scale=[(1333, 640), (1333, 800)],
                multiscale_mode='value',
                keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5, direction='horizontal'),
            dict(type='RandomFlip', flip_ratio=0.5, direction='vertical'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        classes=('wheat', )),
    val=dict(
        type='CocoDataset',
        ann_file='data/global_wheat/annotations/val_fold1.json',
        img_prefix='data/global_wheat/train',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        classes=('wheat', )),
    test=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='bbox')
optimizer = dict(type='AdamW', lr=0.0003)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='constant',
    warmup_iters=500,
    warmup_ratio=0.3333333333333333,
    step=[15, 24, 32, 38])
total_epochs = 40
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
model = dict(
    type='FCOS',
    pretrained='open-mmlab://resnext101_64x4d',
    backbone=dict(
        type='ResNeXt',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        groups=64,
        base_width=4),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=512,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='FCOSHead',
        num_classes=1,
        in_channels=512,
        stacked_convs=4,
        feat_channels=512,
        strides=[8, 16, 32, 64, 128],
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='IoULoss', loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
train_cfg = dict(
    assigner=dict(
        type='MaxIoUAssigner',
        pos_iou_thr=0.7,
        neg_iou_thr=0.5,
        min_pos_iou=0,
        ignore_iof_thr=-1),
    allowed_border=-1,
    pos_weight=-1,
    debug=False)
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms', iou_thr=0.45),
    max_per_img=200)
classes = ('wheat', )
albu_train_transforms = [
    dict(type='ToFloat', max_value=255.0),
    dict(
        type='RandomSizedCrop',
        min_max_height=(750, 1024),
        height=1024,
        width=1024,
        p=0.5),
    dict(
        type='HueSaturationValue',
        hue_shift_limit=0.68,
        sat_shift_limit=0.68,
        val_shift_limit=0.1,
        p=0.75),
    dict(
        type='RandomBrightnessContrast',
        brightness_limit=0.1,
        contrast_limit=0.1,
        p=0.33),
    dict(type='RandomRotate90', p=0.5),
    dict(
        type='Cutout',
        num_holes=20,
        max_h_size=32,
        max_w_size=32,
        fill_value=0.0,
        p=0.25),
    dict(type='FromFloat', max_value=255.0, dtype='uint8')
]
work_dir = './work_dirs/fcos_iou'
gpu_ids = [0]
"""
# Write the config to disk so init_detector can load it by path.
with open("/kaggle/working/mmdetection/config.py", "w") as config_file:
    config_file.write(config_txt)
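
If the `lr_config` in that config looks cryptic: with `policy='step'`, the learning rate is multiplied by a decay factor each time training reaches one of the epochs listed in `step`. Here's a small sketch of that schedule using the values from the config written above. The helper name is made up, `gamma=0.1` is my assumption for the decay factor since the config doesn't set it, and the 500-iteration warmup is ignored.

```python
def step_lr(epoch, base_lr=0.0003, steps=(15, 24, 32, 38), gamma=0.1):
    """Learning rate at a 0-indexed epoch under a step decay schedule.

    gamma is an assumed decay factor; it is not set in the config.
    """
    drops = sum(1 for step_epoch in steps if epoch >= step_epoch)
    return base_lr * gamma ** drops


for epoch in (0, 14, 15, 24, 32, 38):
    print(epoch, step_lr(epoch))
```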

In [None]:
def format_prediction_string(boxes, scores):
    """Build the submission string: 'score x y w h' for each box, space-separated."""
    pred_strings = []
    for score, box in zip(scores, boxes):
        pred_strings.append("{0:.4f} {1} {2} {3} {4}".format(score, box[0], box[1], box[2], box[3]))
    return " ".join(pred_strings)
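
Here's a quick look at what that helper produces on a couple of made-up boxes. The function is repeated so the snippet runs on its own, and the box values are invented for illustration.

```python
def format_prediction_string(boxes, scores):
    # Same logic as the helper above, repeated so this snippet is self-contained.
    pred_strings = []
    for score, box in zip(scores, boxes):
        pred_strings.append("{0:.4f} {1} {2} {3} {4}".format(score, box[0], box[1], box[2], box[3]))
    return " ".join(pred_strings)


# Two invented boxes in [x_min, y_min, width, height] form.
example_boxes = [[10, 20, 30, 40], [100, 150, 60, 80]]
example_scores = [0.9, 0.31234]

print(format_prediction_string(example_boxes, example_scores))
# 0.9000 10 20 30 40 0.3123 100 150 60 80
```

An image with no detections just gets an empty string.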


In a non-Kaggle environment, the configs support inheritance, so you don't need to fully specify every detail.
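
In mmdetection 2.x, a child config points at its parent with a `_base_ = '...'` line and then lists only the keys it wants to change. Nested dicts are merged rather than replaced wholesale. The sketch below mimics that merge with plain Python dicts so you can see the effect. `merge_config` is a hypothetical helper of mine, not the mmcv implementation, and the base values are just for illustration.

```python
import copy


def merge_config(base, override):
    """Recursively merge override into a copy of base (hypothetical helper)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for a parent config file.
base_cfg = dict(
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9),
    test_cfg=dict(
        score_thr=0.05,
        nms=dict(type='nms', iou_thr=0.5),
        max_per_img=100),
)

# Stand-in for a child config: only the settings that differ.
child_cfg = dict(
    test_cfg=dict(nms=dict(iou_thr=0.45), max_per_img=200),
)

cfg = merge_config(base_cfg, child_cfg)
print(cfg['test_cfg'])
```

The child only had to name the two values it changed; everything else comes from the base.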

Using the model specified above and the weights I trained on my local machine, we'll make our predictions below. Most of the code should look pretty familiar.

In [None]:
from mmdet.apis import init_detector, inference_detector
import pandas as pd
import numpy as np

checkpoint_path = '../input/resnest3fcos1iouatseven/epoch_40.pth'
config_path = '/kaggle/working/mmdetection/config.py'

model = init_detector(config_path, checkpoint_path, device='cuda:0')

val_df = pd.read_csv('../input/global-wheat-detection/sample_submission.csv')
all_image_ids = set(val_df['image_id'].unique())
pred_threshold = 0.25

pred_results = []
for image_id in all_image_ids:
    img = '../input/global-wheat-detection/test/' + image_id + '.jpg'
    result = inference_detector(model, img)
    
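    # result[0] holds the detections for the single 'wheat' class as
    # [x_min, y_min, x_max, y_max, score] rows.
    boxes = result[0][:, 0:4]
    scores = result[0][:, 4]

    # Convert corner format to [x_min, y_min, width, height].
    boxes[:, 2] = boxes[:, 2] - boxes[:, 0]
    boxes[:, 3] = boxes[:, 3] - boxes[:, 1]
    
    # Keep only confident detections.
    boxes = boxes[scores >= pred_threshold].astype(np.int32)
    scores = scores[scores >= pred_threshold]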

    result = {
        'image_id': image_id,
        'PredictionString': format_prediction_string(boxes, scores)
    }

    pred_results.append(result)
    

test_df = pd.DataFrame(pred_results, columns=['image_id', 'PredictionString'])
test_df.to_csv('submission.csv', index=False)
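
The only fiddly part of the loop above is the box post-processing: the detector returns corner coordinates plus a score, and the loop converts them to x, y, width, height and drops low-confidence rows. Here's the same idea in plain Python on made-up numbers, in case you want to sanity check it without a GPU. `xyxy_to_xywh` is a name I picked for illustration, and it truncates to integers the same way `astype(np.int32)` does.

```python
def xyxy_to_xywh(detections, score_threshold):
    """Convert [x_min, y_min, x_max, y_max, score] rows to integer
    [x_min, y_min, width, height] boxes, dropping low-score rows."""
    boxes, scores = [], []
    for x_min, y_min, x_max, y_max, score in detections:
        if score < score_threshold:
            continue
        boxes.append([int(x_min), int(y_min), int(x_max - x_min), int(y_max - y_min)])
        scores.append(score)
    return boxes, scores


# Invented detections: one confident, one below the 0.25 threshold.
detections = [
    [10.6, 20.2, 110.9, 80.7, 0.92],
    [300.0, 40.0, 350.0, 90.0, 0.10],
]

boxes, scores = xyxy_to_xywh(detections, score_threshold=0.25)
print(boxes, scores)
# [[10, 20, 100, 60]] [0.92]
```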

Finally, we can't submit if we leave any files other than our submission in the working dir, so we'll clean up some of the boilerplate we copied over earlier:

In [None]:
!rm -rf mmdetection/

Thanks for reading! Let me know in the comments if you have any questions.

-Ryan