<a href="https://colab.research.google.com/github/ZwwWayne/mmdetection/blob/add-colab-tutorial/demo/MMDet_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MMDetection Tutorial

Welcome to MMDetection! This is the official Colab tutorial for MMDetection. In this tutorial, you will learn how to:
- Perform inference with an MMDet detector.
- Train a new detector on a new dataset.

Let's start!


## Install MMDetection

In [1]:
!nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243


In [2]:
# install dependencies: (use cu101 because colab has CUDA 10.1)
!pip install -U torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
!pip install cython mmcv==0.6.2
# OpenMMLab maintains a fork of cocoapi
!pip install -U 'git+https://github.com/open-mmlab/cocoapi.git#subdirectory=pycocotools'
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())
!gcc --version

Looking in links: https://download.pytorch.org/whl/torch_stable.html
Requirement already up-to-date: torch==1.5.1+cu101 in /usr/local/lib/python3.6/dist-packages (1.5.1+cu101)
Requirement already up-to-date: torchvision==0.6.1+cu101 in /usr/local/lib/python3.6/dist-packages (0.6.1+cu101)
Collecting git+https://github.com/open-mmlab/cocoapi.git#subdirectory=pycocotools
  Cloning https://github.com/open-mmlab/cocoapi.git to /tmp/pip-req-build-bquc_2pr
  Running command git clone -q https://github.com/open-mmlab/cocoapi.git /tmp/pip-req-build-bquc_2pr
Building wheels for collected packages: pycocotools
  Building wheel for pycocotools (setup.py) ... done
  Created wheel for pycocotools: filename=pycocotools-12.0-cp36-cp36m-linux_x86_64.whl size=267287 sha256=918b50f3bacd417fcd9bcb51175ed4f87d51ef3280f01934630a31b17f7b8adc
  Stored in directory: /tmp/pip-ephem-wheel-cache-5p4gfor6/wheels/cd/f6/de/018ccc2d175046c612e93b42a169cd1ab7563d61581cfba8df
Successfully built pycocotools


In [None]:
# Install MMDetection from source (switch to the pip package once it is available)
!rm -rf mmdetection
!git clone https://github.com/open-mmlab/mmdetection.git
%cd mmdetection
!pip install -e .

Cloning into 'mmdetection'...
remote: Enumerating objects: 37, done.

## Perform inference with an MMDet detector
MMDetection already provides high-level APIs for inference and training.

In [None]:
from mmdet.apis import inference_detector, init_detector, show_result_pyplot

In [None]:
config = 'configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco.py'
checkpoint = 'https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'
model = init_detector(config, checkpoint, device='cuda:0')

In [None]:
img = 'demo/demo.jpg'
result = inference_detector(model, img)

In [None]:
show_result_pyplot(model, img, result, score_thr=0.3)
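
To work with `result` programmatically rather than only plotting it, it helps to know its layout. The sketch below assumes the MMDetection 2.x convention: a detector with a mask head returns a `(bbox_result, segm_result)` tuple, and `bbox_result` is a list with one `(n, 5)` array per class whose rows are `[x1, y1, x2, y2, score]`. The arrays are synthetic stand-ins, and `filter_by_score` is a hypothetical helper, not part of the MMDetection API.

```python
import numpy as np


def filter_by_score(bbox_result, score_thr=0.3):
    """Keep only boxes whose score (last column) is at least score_thr."""
    return [dets[dets[:, 4] >= score_thr] for dets in bbox_result]


# Synthetic stand-in for `result` with two classes (assumed layout).
bbox_result = [
    np.array([[10., 20., 50., 80., 0.9],
              [12., 22., 48., 78., 0.1]], dtype=np.float32),
    np.zeros((0, 5), dtype=np.float32),  # no detections for this class
]
segm_result = [[], []]  # per-class masks, unused in this sketch
result = (bbox_result, segm_result)

# Detectors without a mask head return only the per-class box list.
bboxes = result[0] if isinstance(result, tuple) else result
kept = filter_by_score(bboxes, score_thr=0.3)
for class_id, dets in enumerate(kept):
    print(f'class {class_id}: {dets.shape[0]} box(es) kept')
```

The same threshold is what `score_thr` controls when the result is drawn.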

## Train a detector on a customized dataset

To train a new detector, there are usually three things to do:
1. Support a new dataset
2. Modify the config
3. Train a new detector



### Support a new dataset

There are three ways to support a new dataset in MMDetection: (1) reorganize the dataset into COCO format, (2) reorganize the dataset into the middle format, or (3) implement a new dataset class. We recommend the first two, which are usually easier than the third.
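
As a rough illustration of the COCO option, the sketch below builds a minimal COCO-style annotation structure in plain Python. COCO stores boxes as `[x, y, width, height]`, so corner-style boxes need converting first. The file name, image size, and box are made-up values, and `xyxy_to_xywh` is a hypothetical helper.

```python
import json


def xyxy_to_xywh(box):
    """Convert [x1, y1, x2, y2] corners to COCO's [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


# Illustrative values only; the file name, size, and box are made up.
categories = [{'id': 1, 'name': 'Car'}]
images = [{'id': 0, 'file_name': 'example.jpeg', 'width': 1242, 'height': 375}]
box = xyxy_to_xywh([100.0, 120.0, 300.0, 220.0])
annotations = [{
    'id': 0,
    'image_id': 0,        # refers to images[0]['id']
    'category_id': 1,     # refers to categories[0]['id']
    'bbox': box,
    'area': box[2] * box[3],
    'iscrowd': 0,
}]

coco = {'images': images, 'annotations': annotations, 'categories': categories}
coco_json = json.dumps(coco)  # this string is what would be written to disk
```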

In this tutorial, we give an example of converting the data into the middle format. Other methods and more advanced usage can be found in the [documentation](https://mmdetection.readthedocs.io/en/latest/tutorials/new_dataset.html#).
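
For reference, the middle format is a list with one dict per image. The sketch below builds a single record by hand, using the same keys as the conversion function in this tutorial. The file name, image size, and box are made-up values, and `check_record` is a hypothetical helper for sanity-checking shapes.

```python
import numpy as np

record = dict(
    filename='example.jpeg',
    width=1242,
    height=375,
    ann=dict(
        # (n, 4) float32 boxes as [x1, y1, x2, y2]
        bboxes=np.array([[100, 120, 300, 220]], dtype=np.float32),
        # (n,) integer class indices
        labels=np.array([0], dtype=np.int64),
        # boxes to ignore during training, same layout as `bboxes`
        bboxes_ignore=np.zeros((0, 4), dtype=np.float32),
        labels_ignore=np.zeros((0,), dtype=np.int64),
    ),
)


def check_record(rec):
    """Return True if the box and label arrays have consistent shapes."""
    ann = rec['ann']
    same_count = ann['bboxes'].shape == (len(ann['labels']), 4)
    ignore_ok = ann['bboxes_ignore'].ndim == 2 and ann['bboxes_ignore'].shape[1] == 4
    return same_count and ignore_ok


data_infos = [record]  # the full annotation is a list of such records
```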

First, let's download a tiny dataset derived from [KITTI](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d). We select the first 75 images and their annotations from the 3D object detection dataset (the same data as the 2D object detection dataset, but with 3D annotations). To reduce the size of the dataset, we convert the original images from PNG to JPEG at 80% quality.

In [None]:
# download, decompress the data
!wget https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmdetection/data/kitti_tiny.zip
!unzip kitti_tiny.zip > /dev/null
!ls kitti_tiny

After downloading the data, we need to implement a function that converts the KITTI annotation format into the middle format.
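
Before reading the full conversion function, it may help to see how a single label line is parsed. In a KITTI object label file each line describes one object as space-separated fields: the object type comes first, and fields 4 to 7 (zero-based) hold the 2D box as left, top, right, bottom. The two lines below are made-up examples in that layout, and `parse_kitti_line` is a hypothetical helper.

```python
CLASSES = ('Car', 'Pedestrian', 'Cyclist')
cat2label = {name: i for i, name in enumerate(CLASSES)}

# Made-up label lines in the KITTI layout:
# type truncated occluded alpha left top right bottom h w l x y z rotation_y
lines = [
    'Car 0.00 0 -1.57 100.00 120.00 300.00 220.00 1.50 1.60 3.70 1.00 1.70 20.00 -1.50',
    'DontCare -1 -1 -10 500.00 150.00 550.00 180.00 -1 -1 -1 -1000 -1000 -1000 -10',
]


def parse_kitti_line(line):
    """Return (object type, [x1, y1, x2, y2]) for one label line."""
    fields = line.strip().split(' ')
    return fields[0], [float(v) for v in fields[4:8]]


kept, ignored = [], []
for line in lines:
    name, bbox = parse_kitti_line(line)
    if name in cat2label:
        kept.append((cat2label[name], bbox))  # trainable object
    else:
        ignored.append(bbox)                  # e.g. 'DontCare' regions
```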



In [None]:
import copy
import os.path as osp

import mmcv
import numpy as np

from mmdet.apis import (inference_detector, set_random_seed,
                        show_result_pyplot, train_detector)
from mmdet.models import build_detector
from mmdet.utils import get_root_logger


def load_kitti2middle(data_root,
                      list_path,
                      cat2label,
                      img_prefix='training/image_2'):
    # load image list from file; CustomDataset has already joined
    # `data_root` onto `ann_file`, so `list_path` is used as given
    image_list = mmcv.list_from_file(list_path)

    data_infos = []
    # convert annotations to middle format
    for image_id in image_list:
        # the tiny dataset ships JPEG images (converted from the PNG originals)
        filename = f'{img_prefix}/{image_id}.jpeg'
        image = mmcv.imread(filename)
        height, width = image.shape[:2]

        data_info = dict(filename=filename, width=width, height=height)

        # load annotations
        label_prefix = img_prefix.replace('image_2', 'label_2')
        lines = mmcv.list_from_file(osp.join(label_prefix, f'{image_id}.txt'))

        content = [line.strip().split(' ') for line in lines]
        bbox_names = [x[0] for x in content]
        bboxes = [[float(info) for info in x[4:8]] for x in content]

        gt_bboxes = []
        gt_labels = []
        gt_bboxes_ignore = []
        gt_labels_ignore = []

        # filter 'DontCare'
        for bbox_name, bbox in zip(bbox_names, bboxes):
            if bbox_name in cat2label:
                gt_labels.append(cat2label[bbox_name])
                gt_bboxes.append(bbox)
            else:
                gt_labels_ignore.append(-1)
                gt_bboxes_ignore.append(bbox)

        # np.int64 is used instead of the alias np.long, which newer NumPy
        # releases no longer provide
        data_anno = dict(
            bboxes=np.array(gt_bboxes, dtype=np.float32).reshape(-1, 4),
            labels=np.array(gt_labels, dtype=np.int64),
            bboxes_ignore=np.array(gt_bboxes_ignore,
                                   dtype=np.float32).reshape(-1, 4),
            labels_ignore=np.array(gt_labels_ignore, dtype=np.int64))

        data_info.update(ann=data_anno)
        data_infos.append(data_info)

    return data_infos

Users can call this function once to convert the annotations and save them to a pickle file, so that the conversion does not have to be repeated for every training run. They can then use `CustomDataset` directly by specifying `classes` in the config. In this tutorial, we instead perform the conversion inside the `load_annotations` method of a newly implemented `KittiTinyDataset`.
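
A minimal sketch of the pickle route, using only the standard library and NumPy: the list of middle-format records is written once and read back later. The record contents and the file name `kitti_tiny_train.pkl` are made up for illustration; in practice the list would come from the conversion function.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for the list returned by the conversion function.
data_infos = [
    dict(
        filename='example.jpeg',
        width=1242,
        height=375,
        ann=dict(
            bboxes=np.array([[100, 120, 300, 220]], dtype=np.float32),
            labels=np.array([0], dtype=np.int64)),
    )
]

# Write the converted annotations once ...
ann_path = os.path.join(tempfile.mkdtemp(), 'kitti_tiny_train.pkl')
with open(ann_path, 'wb') as f:
    pickle.dump(data_infos, f)

# ... and load them back without repeating the conversion.
with open(ann_path, 'rb') as f:
    loaded = pickle.load(f)
```

The resulting file path is what would be passed as `ann_file` when using `CustomDataset` directly.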

In [None]:
from mmdet.datasets import build_dataset
from mmdet.datasets.builder import DATASETS
from mmdet.datasets.custom import CustomDataset

@DATASETS.register_module()
class KittiTinyDataset(CustomDataset):

    CLASSES = ('Car', 'Pedestrian', 'Cyclist')

    def load_annotations(self, ann_file):
        cat2label = {k: i for i, k in enumerate(self.CLASSES)}
        data_infos = load_kitti2middle(
            self.data_root,
            self.ann_file,
            cat2label,
            img_prefix=self.img_prefix)

        return data_infos


### Modify the config

Next, we modify the config for training.
To speed up training, we fine-tune from a pre-trained detector.

In [None]:
from mmcv import Config
cfg = Config.fromfile(
    './configs/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco.py'
)

Given a config that trains a Faster R-CNN on the COCO dataset, we need to modify some values to train Faster R-CNN on the KITTI dataset.

In [None]:
# Modify dataset type and path
cfg.dataset_type = 'KittiTinyDataset'
cfg.data_root = 'kitti_tiny/'

# `ann_file` and `img_prefix` are given relative to `data_root`;
# CustomDataset prepends `data_root` to both of them.
cfg.data.test.type = 'KittiTinyDataset'
cfg.data.test.data_root = 'kitti_tiny/'
cfg.data.test.ann_file = 'train.txt'
cfg.data.test.img_prefix = 'training/image_2'

cfg.data.train.type = 'KittiTinyDataset'
cfg.data.train.data_root = 'kitti_tiny/'
cfg.data.train.ann_file = 'train.txt'
cfg.data.train.img_prefix = 'training/image_2'

cfg.data.val.type = 'KittiTinyDataset'
cfg.data.val.data_root = 'kitti_tiny/'
cfg.data.val.ann_file = 'val.txt'
cfg.data.val.img_prefix = 'training/image_2'

# modify num classes of the model in box head
cfg.model.roi_head.bbox_head.num_classes = 3
# We can still use the pre-trained Mask R-CNN model even though we do not
# need the mask branch. No local copy of the checkpoint was downloaded, so
# we reuse the `checkpoint` URL defined in the inference section.
cfg.load_from = checkpoint

# Set up working dir to save files and logs.
cfg.work_dir = './tutorial_exps'

# The original learning rate (LR) is set for 8-GPU training.
# We divide it by 8 since we only use one GPU.
cfg.optimizer.lr = 0.02 / 8
cfg.lr_config.warmup = None
cfg.log_config.interval = 10

# Change the evaluation metric since we use a customized dataset.
cfg.evaluation.metric = 'mAP'
# Evaluate only every 12 epochs to reduce the time spent on evaluation
cfg.evaluation.interval = 12

# Set the seed so that the results are more reproducible
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)


# We can initialize the logger for training and have a look
# at the final config used for training
logger = get_root_logger()
logger.info(f'Config:\n{cfg.pretty_text}')
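
The division by 8 in the config above follows the linear scaling rule: the learning rate is scaled in proportion to the total batch size. The sketch below assumes the base learning rate of 0.02 corresponds to 8 GPUs with 2 images per GPU (a total batch size of 16), which is the setting the config comment refers to. `scale_lr` is a hypothetical helper, not an MMDetection function.

```python
import math


def scale_lr(base_lr, base_batch_size, num_gpus, samples_per_gpu):
    """Scale the learning rate linearly with the total batch size."""
    total_batch_size = num_gpus * samples_per_gpu
    return base_lr * total_batch_size / base_batch_size


# One GPU with 2 images per GPU, against a base of 8 GPUs x 2 images (assumed).
lr = scale_lr(0.02, base_batch_size=16, num_gpus=1, samples_per_gpu=2)
print(math.isclose(lr, 0.02 / 8))
```

If the number of images per GPU is changed as well, the same formula gives the corresponding learning rate.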


### Train a new detector

Finally, let's initialize the dataset and detector, then train a new detector!

In [None]:
datasets = [build_dataset(cfg.data.train)]
if len(cfg.workflow) == 2:
    val_dataset = copy.deepcopy(cfg.data.val)
    val_dataset.pipeline = cfg.data.train.pipeline
    datasets.append(build_dataset(val_dataset))

In [None]:
model = build_detector(
    cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)
# add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES

In [None]:
# create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_detector(model, datasets, cfg, distributed=False, validate=True)

## Test the trained detector



In [None]:
img = mmcv.imread(
    'kitti_tiny/training/image_2/000068.jpeg'
)

model.cfg = cfg
result = inference_detector(model, img)
show_result_pyplot(model, img, result)
