# Frameworks for Object Detection

Implementing object detection models from scratch is fairly involved: for a region-proposal-based detector such as Faster R-CNN, we would need to build a [Feature Pyramid Network](https://arxiv.org/abs/1612.03144) combined with a [Region Proposal Network](https://medium.com/@nabil.madali/demystifying-region-proposal-network-rpn-faa5a8fb8fce). To understand these models, it is worth building them from scratch at least once. Nevertheless, object detection frameworks come to the rescue when we want to train a model on a custom dataset without dwelling on implementation details.

Here you can find an incomplete list of popular object detection frameworks:

- Detectron2 -> [repository](https://github.com/facebookresearch/detectron2)
- MMDetection -> [paper](https://arxiv.org/pdf/1906.07155.pdf) -> [repository](https://github.com/open-mmlab/mmdetection) -> [tutorial](https://medium.com/@ravisingh93362/object-detection-using-mmdetection-c7f0eb26a2c9)
- SimpleDet -> [paper](https://arxiv.org/pdf/1903.05831.pdf) -> [repository](https://github.com/TuSimple/simpledet)
- TorchVision -> [repository](https://github.com/pytorch/vision)
- TensorFlow Object Detection -> [repository](https://github.com/tensorflow/models/tree/master/research/object_detection)

Such frameworks usually lag behind the state of the art, as developers need time to integrate the most recent algorithms. If you want to use a very recent implementation, consider the repositories that are usually published together with the papers. For the most recent state-of-the-art models and their implementations, visit [Papers with Code](https://paperswithcode.com/task/object-detection). Two examples of standalone implementations:

- YOLOv5 -> [repository](https://github.com/ultralytics/yolov5)
- EfficientDet -> [paper](https://arxiv.org/pdf/1911.09070.pdf) -> [repository](https://github.com/rwightman/efficientdet-pytorch)

# Detectron2 for Object Detection


In this notebook, we are going to use the [Detectron2](https://github.com/facebookresearch/detectron2) framework from Facebook Artificial Intelligence Research (FAIR) to train a deep neural network model for object detection. Of course, you can use any other implementation of object detection algorithms. However, there are several benefits associated with using such a framework:
 - A variety of pre-trained models available in [the model zoo](https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md).
 - Unified configuration file for specifying models 
 - Standardized way of handling labels and annotations
 - Efficient implementation based on PyTorch deep learning library
 - Extensible framework that lets you build new models without rewriting boilerplate code for dataset handling, statistics collection, etc.

## High Level Structure of Detectron2

Below is a short overview of the Detectron2 directory structure:


<a><img align="center" src="https://docs.google.com/uc?export=download&id=1fhIQ0vIVNn0H_8I0DYeEi9B7-g7fxEm5" width="500"></a>



The framework was developed to support the complete object detection pipeline. There are multiple ways to use Detectron2; we list three of them in order of increasing complexity:

1. Run inference on existing pre-trained models. Check the [Detectron2 Beginner's Tutorial](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5) on Colab.
2. Configure and train existing models. This step is the focus of this notebook. We are going to answer the following questions: 
    - How to change the configuration file?
    - How to load pre-trained weights?
    - How to adapt custom data and annotations to the Detectron2 format?
    - How to save the model?
    - How to run inference?
    - How to visualize results?
3. Use Detectron2 framework for building your own models and architectures. Check [Projects](https://github.com/facebookresearch/detectron2/tree/master/projects) folder.

## Install and load necessary packages

**Important Note**: This competition does not allow internet access in the notebook for a submission to be valid! Therefore **do not** activate the internet in the notebook settings. Below we present an installation process that is specific to this competition: we have added all required packages and model weights as datasets (`Add data` button in the top right corner). If you want to install Detectron2 locally, follow the official [installation guide](https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md).

In [None]:
!pip install '/kaggle/input/torch-15/torch-1.5.0cu101-cp37-cp37m-linux_x86_64.whl'
!pip install '/kaggle/input/torch-15/torchvision-0.6.0cu101-cp37-cp37m-linux_x86_64.whl'
!pip install '/kaggle/input/torch-15/yacs-0.1.7-py3-none-any.whl'
!pip install '/kaggle/input/torch-15/fvcore-0.1.1.post200513-py3-none-any.whl'
!pip install '/kaggle/input/pycocotools/pycocotools-2.0-cp37-cp37m-linux_x86_64.whl'
!pip install '/kaggle/input/detectron2/detectron2-0.1.3cu101-cp37-cp37m-linux_x86_64.whl'

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from tqdm.notebook import tqdm

import gc
import os
import copy
from glob import glob

import cv2
from PIL import Image

import random

from collections import deque, defaultdict
from multiprocessing import Pool, Process
from functools import partial

import torch
import torch.nn as nn

import pycocotools
import detectron2
from detectron2.config import get_cfg
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor, DefaultTrainer
from detectron2.utils.visualizer import Visualizer, ColorMode
from detectron2.structures import BoxMode
from detectron2.data import datasets, DatasetCatalog, MetadataCatalog, build_detection_train_loader, build_detection_test_loader
from detectron2.data import transforms as T
from detectron2.data import detection_utils as utils
from detectron2.evaluation import COCOEvaluator, verify_results
from detectron2.modeling import GeneralizedRCNNWithTTA
from detectron2.data.transforms import TransformGen
from detectron2.utils.logger import setup_logger
setup_logger()

from fvcore.transforms.transform import TransformList, Transform, NoOpTransform
from contextlib import contextmanager

# Main Configuration

To use a different model, go to the [model zoo](https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md), select a configuration file, and download the corresponding model weights. You can upload the weights with `+Add data` in the top right corner. Afterwards, you only need to specify the paths to the configuration and the weights by adding a new entry to the dictionary below. We have added four models for you to try.

In [None]:
MAIN_PATH = '/kaggle/input/global-wheat-detection'
TRAIN_IMAGE_PATH = os.path.join(MAIN_PATH, 'train/')
TEST_IMAGE_PATH = os.path.join(MAIN_PATH, 'test/')
TRAIN_PATH = os.path.join(MAIN_PATH, 'train.csv')
SUB_PATH = os.path.join(MAIN_PATH, 'sample_submission.csv')

models = {
    'rpn_r50_fpn': {
        'model_path': 'COCO-Detection/rpn_R_50_FPN_1x.yaml',
        'weights_path': '/kaggle/input/rpn-r-50-fpn-1x/model_final_02ce48.pkl'
    },
    'faster_rcnn_50': {
        'model_path': 'COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml',
        'weights_path': '/kaggle/input/faster-r-cnn-r50-fpn-x3/model_final_280758.pkl'
    },
    'faster_rcnn_101': {
        'model_path': 'COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml',
        'weights_path': '/kaggle/input/detectron2-faster-rcnn-101/model_final_f6e8b1.pkl'
    },
    'retinanet': {
        'model_path': 'COCO-Detection/retinanet_R_101_FPN_3x.yaml',
        'weights_path': '/kaggle/input/detectron2-faster-rcnn-101/model_final_971ab9.pkl'
    }
}

# Uncomment the model you would like to use
# MODEL_USE = 'rpn_r50_fpn'
# MODEL_USE = 'faster_rcnn_50'
# MODEL_USE = 'faster_rcnn_101'
MODEL_USE = 'retinanet'

# Load the model config and its pre-trained weights
MODEL_PATH = models[MODEL_USE]['model_path']
WEIGHT_PATH = models[MODEL_USE]['weights_path']

The models and datasets can be fully configured with a configuration file. [Here](https://github.com/facebookresearch/detectron2/tree/master/configs) you can find the list of all base configuration files. The folders in this directory correspond to specific models pre-trained on a particular dataset. If we look at one of them more closely, we see that it still inherits from a base configuration via `_BASE_: "../Base-RCNN-FPN.yaml"`.

<a><img align="center" src="https://docs.google.com/uc?export=download&id=1AzebzK-YBwSmhfxl05YId_HbzYycFNHf" width="600"></a>


Every time you work with Detectron2, you override an already existing configuration. This is what makes Detectron2 easy to use: you never start from scratch, and most of the configurations work out of the box.
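
The idea of inheriting from a base configuration and overriding only a few fields can be illustrated with plain dictionaries. The following is a minimal sketch of the concept, not Detectron2's actual implementation: `merge_config`, `base_cfg`, and `model_cfg` are hypothetical names, and the values are shortened stand-ins chosen only for illustration.

```python
import copy


def merge_config(base, override):
    """Return a new dict: `base` with the values of `override` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key
            merged[key] = merge_config(merged[key], value)
        else:
            # Leaf value (or new key): the override wins
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical, heavily shortened stand-ins for a base config and a model config
base_cfg = {
    "MODEL": {"META_ARCHITECTURE": "GeneralizedRCNN", "RESNETS": {"DEPTH": 50}},
    "SOLVER": {"BASE_LR": 0.02, "MAX_ITER": 90000},
}
model_cfg = {
    "MODEL": {"RESNETS": {"DEPTH": 101}},
    "SOLVER": {"MAX_ITER": 270000},
}

cfg_sketch = merge_config(base_cfg, model_cfg)
print(cfg_sketch)
```

Fields that the model config does not mention (here `META_ARCHITECTURE` and `BASE_LR`) keep their base values, which is why a derived configuration file can stay very short.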

#### Some context for the configuration files
You can read more about different configurations on the [model zoo](https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md#common-settings-for-coco-models) page.

For Faster/Mask R-CNN, Detectron2 provides baselines based on three different backbone combinations:
- FPN: Use a ResNet+FPN backbone with standard conv and FC heads for mask and box prediction, respectively. It obtains the best speed/accuracy tradeoff, but the other two are still useful for research.
- C4: Use a ResNet conv4 backbone with conv5 head. The original baseline in the [Faster R-CNN paper](https://arxiv.org/abs/1506.01497).
- DC5 (Dilated-C5): Use a ResNet conv5 backbone with dilations in conv5, and standard conv and FC heads for mask and box prediction, respectively. This is used by the [Deformable ConvNet paper](https://arxiv.org/abs/1703.06211).



## Configuration file
The configuration file consists of several main components. Let's load a configuration file and look at the most important ones. You can find a more detailed explanation [here](https://detectron2.readthedocs.io/modules/config.html).


In [None]:
# Loads the default configuration setup
cfg = get_cfg()
cfg.keys()

- 'VERSION' - the version of the configuration format (used to upgrade old configs), not of the Detectron2 package itself
- 'OUTPUT_DIR' - contains folder where output of the training is going to be saved
- 'SEED' - random seed, set to negative to fully randomize everything
- 'CUDNN_BENCHMARK' - if true cudnn algorithms are going to be benchmarked
- 'VIS_PERIOD' - the period (in terms of steps) for minibatch visualization at train time. Set to 0 to disable.

### 'MODEL'
This part of the configuration file is responsible for setting the model parameters. Each key in this dictionary corresponds to a different component of the model. Not all of the components are used at the same time; however, they all need to be present in the configuration for backward compatibility.

In [None]:
cfg['MODEL'].keys()

#### 'RESNETS' - Backbone ResNet settings

In [None]:
from pprint import pprint
pprint(cfg['MODEL']['RESNETS'])

- 'NUM_GROUPS' - Number of groups to use; 1 ==> [ResNet](https://arxiv.org/abs/1512.03385); > 1 ==> [ResNeXt](https://arxiv.org/abs/1611.05431)
- 'NORM' - Options: "FrozenBN", "GN", "SyncBN", "BN"
- 'WIDTH_PER_GROUP' - Scaling this parameter will scale the width of all bottleneck layers.
- 'STRIDE_IN_1X1' - Place the stride 2 conv on the 1x1 filter. Use True only for the original MSRA ResNet; use False for C2 and Torch models
- 'RES2_OUT_CHANNELS' - Output width of res2. Scaling this parameter will scale the width of all 1x1 convs in ResNet

#### 'RPN' - Region Proposal Network settings

In [None]:
pprint(cfg['MODEL']['RPN'])

#### 'ROI_HEADS' - Region of Interest Heads

In [None]:
pprint(cfg['MODEL']['ROI_HEADS'])

#### 'RETINANET' - [RetinaNet](https://arxiv.org/abs/1708.02002)

In [None]:
pprint(cfg['MODEL']['RETINANET'])

## 'DATASETS'
In this part of the configuration we are going to indicate what dataset to use for training and testing.

In [None]:
pprint(cfg['DATASETS'])

## 'SOLVER'
The solver section contains the configuration for the learning rate, the number of iterations, and other gradient descent settings.

In [None]:
pprint(cfg['SOLVER'])

- 'BASE_LR' - base learning rate
- 'IMS_PER_BATCH' - Number of images per batch across all machines. If we have 16 GPUs and IMS_PER_BATCH = 32, each GPU will see 2 images per batch.
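
The arithmetic behind `IMS_PER_BATCH` can be written down in a couple of lines. This is only a sketch: `images_per_gpu` is a hypothetical helper, and it assumes the total batch size must divide evenly across the GPUs.

```python
def images_per_gpu(ims_per_batch, num_gpus):
    """Split the global batch size evenly across all GPUs."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if ims_per_batch % num_gpus != 0:
        # Assumption of this sketch: the global batch must divide evenly
        raise ValueError("IMS_PER_BATCH must be divisible by the number of GPUs")
    return ims_per_batch // num_gpus


print(images_per_gpu(32, 16))  # the example from the text: 2 images per GPU
print(images_per_gpu(4, 1))    # a single GPU sees the whole batch
```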

# Data

In [None]:
train_img = glob(f'{TRAIN_IMAGE_PATH}/*.jpg')
test_img = glob(f'{TEST_IMAGE_PATH}/*.jpg')

print('Number of train images: {}, test images: {}'.format(len(train_img), len(test_img)))

In [None]:
train_df = pd.read_csv(TRAIN_PATH)
train_df.head()

## Display Images with Bounding Boxes

In [None]:
def display_images(df, folder, num_img=1, bb_color=(0, 50, 255)):
    
    fig, axs = plt.subplots(nrows=1, ncols=num_img,figsize=(15, 15), squeeze=False)
    axs = axs.flatten()
    for i, ax in enumerate(axs):
        # randomly pick an image
        img_random = random.choice(df['image_id'].unique())
        assert (img_random + '.jpg') in os.listdir(folder)
        
        img_df = df[df['image_id']==img_random]
        img_df.reset_index(drop=True, inplace=True)
        
        img = cv2.imread(os.path.join(folder, img_random + '.jpg'))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
                   
        for row in range(len(img_df)):
            source = img_df.loc[row, 'source']
            box = img_df.loc[row, 'bbox'][1:-1]
            box = list(map(float, box.split(', ')))
            x, y, w, h = list(map(int, box))
            cv2.rectangle(img, (x, y), (x+w, y+h), bb_color, 2)
                
        ax.set_title(f'{img_random} has {len(img_df)} bounding boxes')
        ax.imshow(img)

    # tight_layout must run before show() to affect the rendered figure
    plt.tight_layout()
    plt.show()
    
display_images(train_df, TRAIN_IMAGE_PATH, 1)

## Image sources
The dataset contains images from a number of sources.
Let's visualize the distribution of the sources.

In [None]:
def display_feature(df, feature):
    
    plt.figure(figsize=(15,8))
    ax = sns.countplot(y=feature, data=df, order=df[feature].value_counts().index)

    for p in ax.patches:
        ax.annotate('{:.2f}%'.format(100*p.get_width()/df.shape[0]), (p.get_x() + p.get_width() + 0.02, p.get_y() + p.get_height()/2))

    plt.title(f'Distribution of {feature}', size=25, color='b')    
    plt.show()

list_source = train_df['source'].unique().tolist()
print(list_source)
display_feature(train_df, 'source')

### Number of bounding boxes per image

In [None]:
num_box = train_df.groupby('image_id')['bbox'].count().reset_index().add_prefix('Number_').sort_values('Number_bbox', ascending=False)
# Average over the count column only; 'Number_image_id' holds strings
print('Average number of bounding boxes per image: {:.2f}'.format(num_box['Number_bbox'].mean()))
num_box.head()

## Check if the number of images in the training file is the same as in the training folder.

In [None]:
image_unique = train_df['image_id'].unique()
train_files = set(os.listdir(TRAIN_IMAGE_PATH))  # list the folder once instead of once per image
image_unique_in_train_path = [i for i in image_unique if i + '.jpg' in train_files]

print(f'Number unique images: {len(image_unique)}, in train path: {len(image_unique_in_train_path)}')

del image_unique, image_unique_in_train_path
_ = gc.collect()

# Example Submission
We will use the sample submission file to build our submission by overriding the `PredictionString` column.
We will also use it as the input for our test dataset.

In [None]:
sub_df = pd.read_csv(SUB_PATH)
sub_df.tail()

# Building a Dataset

In order to use our dataset in the Detectron2 framework, we need to preprocess the images and their annotations.

Currently, each line in our `train_df` corresponds to a bounding box, and each bounding box is associated with an image. The first thing we need to do is represent our data as a list of dictionaries. Every element in that list will correspond to one image with multiple attributes. Let's have a closer look.

## Dataset Dictionary

Kaggle provides data in this format:

In [None]:
train_df.head(5)

We would need to preprocess these data so that for every image we have the following entry:
```python
{'file_name': '/kaggle/input/global-wheat-detection/train/b6ab77fd7.jpg',
 'image_id': 0,
 'height': 1024,
 'width': 1024,
 'annotations': [
     {'bbox': (834, 222, 890, 258),
      'bbox_mode': <BoxMode.XYXY_ABS: 0>,
      'category_id': 0},
     {'bbox': (226, 549, 356, 606),
      'bbox_mode': <BoxMode.XYXY_ABS: 0>,
      'category_id': 0},
     ...
 ]
}
```
Each entry in our dataset should be a unique image with properties `file_name`, `image_id`, `height`, `width`. The `annotations` property should contain a list of all bounding box annotations.
Each annotation is characterised by a bounding box and a category. Next, we are going to write a function that transforms our annotations into a suitable format.

In [None]:
def wheat_dataset(df, folder, is_train):
    unique_img_names = df["image_id"].unique().tolist()  # Take unique image names
    df_group = df.groupby("image_id")  # Group the training by the image ids
    dataset_dicts = []

    for img_id, img_name in enumerate(tqdm(unique_img_names)):
        img_group = df_group.get_group(img_name)  # Take all annotations for an image
        img_path = os.path.join(folder, img_name + ".jpg")  # Create path for an image
        if is_train:  # Using training set, where we have multiple bounding boxes per image

            record = dict()  # Create a record dictionary

            # Add image properties to the record
            record["file_name"] = img_path
            record["image_id"] = img_id
            record["height"] = int(img_group["height"].values[0])
            record["width"] = int(img_group["width"].values[0])

            annots = []  # Create annotation list
            for _, ant in img_group.iterrows():
                # bounding box is a string, so remove square brackets
                box = ant.bbox[1:-1]
                # Split the bbox values and convert it to float
                box = list(map(float, box.split(", ")))
                # Convert to int
                x, y, w, h = list(map(int, box))

                # Create annotation dictionary
                annot = {
                    "bbox": (
                        x,
                        y,
                        x + w,
                        y + h,
                    ),  # change to XYXY format. Original was in XYWH
                    "bbox_mode": BoxMode.XYXY_ABS,
                    "category_id": 0,  # only one category is present in this dataset
                }

                # Append each annotation to the list of annotation
                annots.append(dict(annot))

            record["annotations"] = list(annots)

        else:  # Using submission file, where each line is an image

            img = cv2.imread(img_path)
            h, w = img.shape[:2]

            record = dict()
            record["file_name"] = img_path
            record["image_id"] = img_id
            record["height"] = int(h)
            record["width"] = int(w)

        dataset_dicts.append(record)

    return dataset_dicts


In [None]:
%%time
img_uniques = list(zip(range(train_df['image_id'].nunique()), train_df['image_id'].unique()))
dataset_dicts = wheat_dataset(train_df, TRAIN_IMAGE_PATH, True)

In [None]:
from pprint import pprint
# display the first entry to check for errors
print('Dataset:')
pprint(dataset_dicts[0], depth=1)
print('Annotation:')
pprint(dataset_dicts[0]['annotations'][0])
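
Beyond eyeballing the first entry, a small programmatic check can flag boxes that are inverted, empty, or outside the image. The sketch below is an optional sanity check under the assumption that boxes are stored in XYXY format, as in the record shown earlier; `find_invalid_boxes` and `example_record` are hypothetical names and the record is made up for illustration.

```python
def find_invalid_boxes(record):
    """Return (annotation index, reason) pairs for suspicious XYXY boxes."""
    problems = []
    for i, annot in enumerate(record.get("annotations", [])):
        x0, y0, x1, y1 = annot["bbox"]
        if x1 <= x0 or y1 <= y0:
            problems.append((i, "empty or inverted box"))
        elif x0 < 0 or y0 < 0 or x1 > record["width"] or y1 > record["height"]:
            problems.append((i, "box outside image"))
    return problems


# Hypothetical record: the first box is valid, the other two are not
example_record = {
    "height": 1024,
    "width": 1024,
    "annotations": [
        {"bbox": (834, 222, 890, 258), "category_id": 0},
        {"bbox": (500, 500, 500, 520), "category_id": 0},    # zero width
        {"bbox": (1000, 900, 1100, 950), "category_id": 0},  # exceeds image width
    ],
}

print(find_invalid_boxes(example_record))
```

Running such a check over every entry of `dataset_dicts` would be a cheap way to catch conversion mistakes before training.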

### [CutMix](https://arxiv.org/pdf/1905.04899.pdf) Data Augmentation

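Here we define a CutMix-style data augmentation class. Note that this simplified version only blanks out random square patches (which is closer to Cutout) and does not paste patches from other images or mix labels. You can use it as a reference for implementing custom data augmentations.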

In [None]:
class CutMix(Transform):
    def __init__(self, box_size=50, prob_cutmix=0.5):
        super().__init__()

        self.box_size = box_size
        self.prob_cutmix = prob_cutmix

    def apply_image(self, img):

        if random.random() < self.prob_cutmix:

            h, w = img.shape[:2]
            num_rand = np.random.randint(10, 20)
            for num_cut in range(num_rand):
                x_rand = random.randint(0, w - self.box_size)
                y_rand = random.randint(0, h - self.box_size)
                # Images are indexed as [row (y), column (x)]
                img[y_rand: y_rand + self.box_size, x_rand: x_rand + self.box_size, :] = 0

        return np.asarray(img)

    def apply_coords(self, coords):
        return coords.astype(np.float32)

In [None]:
# visualize cutmix augmentation
img = CutMix(prob_cutmix=1.).apply_image(cv2.imread(dataset_dicts[0]['file_name']))
plt.figure(figsize=(10, 10))
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()

## Dataset Mapper
Next, we need a class that loads our images and annotations in a format compatible with Detectron2. We have already done half of the work by transforming our data into the dictionary format. Below is a modified version of the default [DatasetMapper](https://detectron2.readthedocs.io/_modules/detectron2/data/dataset_mapper.html).

In [None]:
import logging  # used below when cropping is enabled


class DatasetMapper:
    """
    A callable which takes a dataset dict in Detectron2 Dataset format,
    and map it into a format used by the model.

    This is the default callable to be used to map your dataset dict into training data.
    You may need to follow it to implement your own one for customized logic,
    such as a different way to read or transform images.
    See :doc:`/tutorials/data_loading` for details.

    The callable currently does the following:

    1. Read the image from "file_name"
    2. Applies cropping/geometric transforms to the image and annotations
    3. Prepare data and annotations to Tensor and :class:`Instances`
    """

    def __init__(self, cfg, is_train=True):
        if cfg.INPUT.CROP.ENABLED and is_train:
            self.crop_gen = T.RandomCrop(cfg.INPUT.CROP.TYPE, cfg.INPUT.CROP.SIZE)
            logging.getLogger(__name__).info(
                "CropGen used in training: " + str(self.crop_gen)
            )
        else:
            self.crop_gen = None

        #       Data Augmentations
        self.tfm_gens = [
            T.RandomBrightness(0.1, 1.6),
            T.RandomContrast(0.1, 3),
            T.RandomSaturation(0.1, 2),
            T.RandomRotation(angle=[90, 90]),
            T.RandomFlip(prob=0.4, horizontal=False, vertical=True),
            T.RandomCrop("relative_range", (0.4, 0.6)),
            CutMix(), # custom augmentation!
        ]

        self.img_format = cfg.INPUT.FORMAT
        self.mask_on = cfg.MODEL.MASK_ON
        self.mask_format = cfg.INPUT.MASK_FORMAT
        self.keypoint_on = cfg.MODEL.KEYPOINT_ON
        self.load_proposals = cfg.MODEL.LOAD_PROPOSALS

        if self.keypoint_on and is_train:
            # Flip only makes sense in training
            self.keypoint_hflip_indices = utils.create_keypoint_hflip_indices(
                cfg.DATASETS.TRAIN
            )
        else:
            self.keypoint_hflip_indices = None

        if self.load_proposals:
            self.min_box_side_len = cfg.MODEL.PROPOSAL_GENERATOR.MIN_SIZE
            self.proposal_topk = (
                cfg.DATASETS.PRECOMPUTED_PROPOSAL_TOPK_TRAIN
                if is_train
                else cfg.DATASETS.PRECOMPUTED_PROPOSAL_TOPK_TEST
            )
        self.is_train = is_train

    def __call__(self, dataset_dict):
        """
        Args:
            dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format.

        Returns:
            dict: a format that builtin models in detectron2 accept
        """
        dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by code below
        # USER: Write your own image loading if it's not from a file
        image = utils.read_image(dataset_dict["file_name"], format=self.img_format)
        utils.check_image_size(dataset_dict, image)

        if "annotations" not in dataset_dict:
            image, transforms = T.apply_transform_gens(
                ([self.crop_gen] if self.crop_gen else []) + self.tfm_gens, image
            )
        else:
            # Crop around an instance if there are instances in the image.
            # USER: Remove if you don't use cropping
            if self.crop_gen:
                crop_tfm = utils.gen_crop_transform_with_instance(
                    self.crop_gen.get_crop_size(image.shape[:2]),
                    image.shape[:2],
                    np.random.choice(dataset_dict["annotations"]),
                )
                image = crop_tfm.apply_image(image)
            image, transforms = T.apply_transform_gens(self.tfm_gens, image)
            if self.crop_gen:
                transforms = crop_tfm + transforms

        image_shape = image.shape[:2]  # h, w

        # Pytorch's dataloader is efficient on torch.Tensor due to shared-memory,
        # but not efficient on large generic data structures due to the use of pickle & mp.Queue.
        # Therefore it's important to use torch.Tensor.
        dataset_dict["image"] = torch.as_tensor(
            np.ascontiguousarray(image.transpose(2, 0, 1))
        )

        if not self.is_train:
            # USER: Modify this if you want to keep them for some reason.
            dataset_dict.pop("annotations", None)
            dataset_dict.pop("sem_seg_file_name", None)
            return dataset_dict

        if "annotations" in dataset_dict:
            # USER: Modify this if you want to keep them for some reason.
            for anno in dataset_dict["annotations"]:
                if not self.mask_on:
                    anno.pop("segmentation", None)
                if not self.keypoint_on:
                    anno.pop("keypoints", None)

            # USER: Implement additional transformations if you have other types of data
            annos = [
                utils.transform_instance_annotations(
                    obj,
                    transforms,
                    image_shape,
                    keypoint_hflip_indices=self.keypoint_hflip_indices,
                )
                for obj in dataset_dict.pop("annotations")
                if obj.get("iscrowd", 0) == 0
            ]
            instances = utils.annotations_to_instances(
                annos, image_shape, mask_format=self.mask_format
            )
            # Create a tight bounding box from masks, useful when image is cropped
            if self.crop_gen and instances.has("gt_masks"):
                instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
            dataset_dict["instances"] = utils.filter_empty_instances(instances)

        return dataset_dict
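
The reason the annotations are passed through the same transforms as the image is that geometric augmentations move pixels, so the boxes have to move with them. As a minimal sketch of that idea (not Detectron2's implementation), here is what a vertical flip does to a box in XYXY format; `vflip_box_xyxy` is a hypothetical helper.

```python
def vflip_box_xyxy(box, image_height):
    """Mirror an (x0, y0, x1, y1) box across the horizontal axis of the image."""
    x0, y0, x1, y1 = box
    # The old bottom edge becomes the new top edge and vice versa
    return (x0, image_height - y1, x1, image_height - y0)


box = (834, 222, 890, 258)
flipped = vflip_box_xyxy(box, image_height=1024)
print(flipped)                        # x coordinates are unchanged
print(vflip_box_xyxy(flipped, 1024))  # flipping twice restores the original box
```

Photometric augmentations such as brightness or contrast changes leave the coordinates untouched, which is why a custom transform of that kind can simply return the coordinates unchanged.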


Once we have defined our dataset function and DatasetMapper, we need to register the dataset. Each time you register a dataset, a unique name must be provided.

In [None]:
data_set_prefix = 'wheat'
for d in ["train", "test"]:
    DatasetCatalog.register(
        f"{data_set_prefix}_{d}",
        lambda d=d: wheat_dataset(
            train_df if d == "train" else sub_df,
            TRAIN_IMAGE_PATH if d == "train" else TEST_IMAGE_PATH,
            True if d == "train" else False,
        ),
    )

micro_metadata = MetadataCatalog.get("wheat_train")
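
The `lambda d=d:` default argument in the registration loop matters because the registered function is called later, not at registration time. The sketch below reproduces the effect with a plain dictionary standing in for the catalog; `build_registry` and the key names are hypothetical and only illustrate Python's closure behaviour.

```python
def build_registry():
    """Register two variants of a loader per split to compare their behaviour."""
    registry = {}
    for split in ["train", "test"]:
        # Closes over the loop variable: sees whatever value it has when called
        registry[f"late_{split}"] = lambda: split
        # Default argument: captures the value at definition time
        registry[f"bound_{split}"] = lambda split=split: split
    return registry


registry = build_registry()
print(registry["late_train"]())   # 'test'  (the loop variable's final value)
print(registry["bound_train"]())  # 'train' (the value captured at registration)
```

Without the default argument, both registered datasets would end up loading the last split of the loop.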

Let's visualize our dataset.

In [None]:
%%time
def visualize_training_set(dataset, n_sampler=1):
    for sample in random.sample(dataset, n_sampler):
        img = cv2.imread(sample['file_name'])
        v = Visualizer(img[:, :, ::-1], metadata=micro_metadata, scale=0.5)
        v = v.draw_dataset_dict(sample)
        plt.figure(figsize = (14, 10))
        plt.imshow(v.get_image())  # the Visualizer was given an RGB image, so its output is already RGB
        plt.show()
        
# dataset_dicts = wheat_dataset(train_df, TRAIN_IMAGE_PATH, True) # we have already performed this action before        
visualize_training_set(dataset_dicts)

# Modeling

We have already reviewed the structure of the configuration file. Now we will modify it:
1. We start with the default configuration from `get_cfg()`.
2. We override some fields of the default config with a config file provided in the [Model Zoo](https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md).
3. We further modify the configuration (if needed) and specify the train and test datasets.

Let's look at the pre-defined configuration we are using in this example.

In [None]:
from yaml import load, FullLoader
from pprint import pprint
# Use a context manager so the file handle is closed after reading
with open(model_zoo.get_config_file(MODEL_PATH)) as f:
    pprint(load(f, Loader=FullLoader))

### Create the Model Using Configuration File

## Configuring Model for Inference

In [None]:
def cfg_test():
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(MODEL_PATH))
    # To load weights produced by training in this notebook instead:
    # cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, 'model_final.pth')
    # Here you could specify a custom path to the saved model
    cfg.MODEL.WEIGHTS = '../input/wheat-detection-retinanet-faster-rcnn-101/model_final.pth'


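    cfg.DATASETS.TEST = ('wheat_test',)
    # These two settings apply to RetinaNet only. For the Faster R-CNN models, set
    # cfg.MODEL.ROI_HEADS.NUM_CLASSES and cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST instead.
    cfg.MODEL.RETINANET.NUM_CLASSES = 1
    cfg.MODEL.RETINANET.SCORE_THRESH_TEST = 0.45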
    
    return cfg

cfg = cfg_test()
predict = DefaultPredictor(cfg)
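
The effect of `SCORE_THRESH_TEST` is that low-confidence detections are dropped before they reach the output. A minimal sketch of that filtering step follows; `filter_by_score` and the detections are hypothetical, and the strict `>` comparison at the boundary is an assumption of this sketch.

```python
def filter_by_score(detections, score_thresh=0.45):
    """Keep only detections whose confidence exceeds the threshold."""
    return [det for det in detections if det["score"] > score_thresh]


# Hypothetical raw detections for one image
raw_detections = [
    {"bbox": (834, 222, 890, 258), "score": 0.91},
    {"bbox": (226, 549, 356, 606), "score": 0.47},
    {"bbox": (10, 10, 40, 40), "score": 0.12},
]

kept = filter_by_score(raw_detections, score_thresh=0.45)
print(len(kept), [det["score"] for det in kept])
```

Raising the threshold trades recall for precision: fewer false positives survive, but faint wheat heads are more likely to be missed.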

### Visualize predictions

In [None]:
%%time

def visual_predict(dataset):
    for sample in dataset:
        img = cv2.imread(sample['file_name'])
        output = predict(img)
        
        v = Visualizer(img[:, :, ::-1], metadata=micro_metadata, scale=0.5)
        v = v.draw_instance_predictions(output['instances'].to('cpu'))
        plt.figure(figsize = (14, 10))
        plt.imshow(v.get_image())  # the Visualizer was given an RGB image, so its output is already RGB
        plt.show()

test_dataset = wheat_dataset(sub_df, TEST_IMAGE_PATH, False)
visual_predict(test_dataset)

# Submit!

In [None]:
def submit():
    for idx, row in tqdm(sub_df.iterrows(), total=len(sub_df)):
        img_path = os.path.join(TEST_IMAGE_PATH, row.image_id+'.jpg')
        img = cv2.imread(img_path)
        outputs = predict(img)['instances']
        boxes = [i.cpu().detach().numpy() for i in outputs.pred_boxes]
        scores = outputs.scores.cpu().detach().numpy()
        list_str = []
        for box, score in zip(boxes, scores):
            box[3] -= box[1]
            box[2] -= box[0]
            box = list(map(int, box))
            score = round(score, 4)
            list_str.append(score) 
            list_str.extend(box)
        sub_df.loc[idx, 'PredictionString'] = ' '.join(map(str, list_str))
    
    return sub_df

sub_df = submit()    
sub_df.to_csv('submission.csv', index=False)
sub_df
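
The string assembled inside `submit()` lists, for every detection, the score followed by the box in XYWH format, all separated by spaces. The conversion can be checked in isolation with the sketch below; `format_prediction_string` is a hypothetical helper that mirrors the logic of the loop above on plain Python numbers.

```python
def format_prediction_string(boxes_xyxy, scores):
    """Build 'score x y w h score x y w h ...' from XYXY boxes and their scores."""
    parts = []
    for (x0, y0, x1, y1), score in zip(boxes_xyxy, scores):
        # Convert XYXY to XYWH and truncate to integers, as in submit()
        x, y, w, h = int(x0), int(y0), int(x1 - x0), int(y1 - y0)
        parts.append(f"{round(float(score), 4)} {x} {y} {w} {h}")
    return " ".join(parts)


# Hypothetical predictions for one image
example_boxes = [(834.0, 222.0, 890.0, 258.0), (226.0, 549.0, 356.0, 606.0)]
example_scores = [0.91234, 0.5]
print(format_prediction_string(example_boxes, example_scores))
```

An image without detections yields an empty string, which keeps the row present in the submission file.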