# Detectron2 Beginner's Tutorial

<img src="https://dl.fbaipublicfiles.com/detectron2/Detectron2-Logo-Horz.png" width="500">

Welcome to detectron2! This is the official Colab tutorial of detectron2. Here, we will go through some basic usage of detectron2, including the following:
* Run inference on images or videos, with an existing detectron2 model
* Train a detectron2 model on a new dataset

You can make a copy of this tutorial by "File -> Open in playground mode" and make changes there. __DO NOT__ request access to this tutorial.


# Install detectron2

In [None]:
 !pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

In [None]:
# install dependencies: 
!pip install pyyaml==5.1
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())
!gcc --version
# opencv is pre-installed on colab

Collecting pyyaml==5.1
  Downloading https://files.pythonhosted.org/packages/9f/2c/9417b5c774792634834e730932745bc09a7d36754ca00acf1ccd1ac2594d/PyYAML-5.1.tar.gz (274kB)

In [None]:
# install detectron2: (Colab has CUDA 10.1 + torch 1.8)
# See https://detectron2.readthedocs.io/tutorials/install.html for instructions
import torch
assert torch.__version__.startswith("1.8")
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
# exit(0)  # After installation, you need to "restart runtime" in Colab. This line can also restart runtime

Looking in links: https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
Collecting detectron2
  Downloading https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/detectron2-0.4%2Bcu101-cp37-cp37m-linux_x86_64.whl (6.2MB)
Collecting iopath>=0.1.2
  Downloading https://files.pythonhosted.org/packages/21/d0/22104caed16fa41382702fed959f4a9b088b2f905e7a82e4483180a2ec2a/iopath-0.1.8-py3-none-any.whl
Collecting omegaconf>=2
  Downloading https://files.pythonhosted.org/packages/d0/eb/9d63ce09dd8aa85767c65668d5414958ea29648a0eec80a4a7d311ec2684/omegaconf-2.0.6-py3-none-any.whl
Collecting fvcore<0.1.4,>=0.1.3
  Downloading https://files.pythonhosted.org/packages/6b/68/2bacb80e13c4084dfc37fec8f17706a1de4c248157561ff33e463399c4f5/fvcore-0.1.3.post20210317.tar.gz (47kB)
Collecting yacs>=0.1.6
  Downloading https://files.pythonhoste

In [None]:
# Some basic setup:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import os, json, cv2, random
from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog

# Train on a custom dataset

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!cp drive/MyDrive/detectron_model_v2.pth detectron_model.pth

In [None]:
!cp output/model_final.pth drive/MyDrive/detectron_model_v2.pth

In this section, we show how to train an existing detectron2 model on a custom dataset in a new format.

The text follows [the balloon segmentation dataset](https://github.com/matterport/Mask_RCNN/tree/master/samples/balloon) example,
which has only one class: balloon. The code cells below apply the same steps to a single-class dataset registered under the name `banner`.
We'll train a segmentation model starting from an existing model pre-trained on the COCO dataset, available in detectron2's model zoo.

Note that the COCO dataset does not have this category. We'll be able to recognize the new class after a few minutes of training.

## Prepare the dataset

Register the dataset with detectron2, following the [detectron2 custom dataset tutorial](https://detectron2.readthedocs.io/tutorials/datasets.html).
Here, the dataset is in its own custom format, so we write a function that parses it and converts it into detectron2's standard format. You should write such a function whenever you use a dataset in a custom format. See the tutorial for more details.
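
As a rough illustration of the target format, the sketch below converts one polygon, given as VIA-style `all_points_x` / `all_points_y` lists, into the `bbox` and `segmentation` fields of one annotation. The helper name `polygon_to_annotation` and the sample coordinates are made up for this example. The `bbox_mode` field is left out so the snippet runs without detectron2 installed; a real annotation also needs it (the parser in this notebook uses `BoxMode.XYXY_ABS`).

```python
def polygon_to_annotation(px, py, category_id=0):
    """Build the bbox and segmentation fields from polygon vertex lists.

    Hypothetical helper, for illustration only.
    """
    if len(px) != len(py) or len(px) < 3:
        raise ValueError("a polygon needs at least 3 (x, y) vertices")
    # Shift each vertex by 0.5 (as the notebook's parser does), then flatten
    # [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...].
    poly = [coord for x, y in zip(px, py) for coord in (x + 0.5, y + 0.5)]
    return {
        # Absolute [x_min, y_min, x_max, y_max] box around the polygon.
        "bbox": [min(px), min(py), max(px), max(py)],
        # A list of polygons; each polygon is one flat coordinate list.
        "segmentation": [poly],
        "category_id": category_id,
    }


anno = polygon_to_annotation([10, 20, 15], [5, 5, 12])
print(anno["bbox"])          # [10, 5, 20, 12]
print(anno["segmentation"])  # [[10.5, 5.5, 20.5, 5.5, 15.5, 12.5]]
```

One image's record is then a dict holding `file_name`, `image_id`, `height`, `width`, and a list of such annotations.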


In [None]:
!mkdir -p detectron_test_data
!mkdir -p detectron_train_data
!mkdir -p test_output
!mkdir -p output

In [None]:
from detectron2.structures import BoxMode

def get_banner_dicts(main_dir):
    dataset_dicts = []
    img_dir = main_dir
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)


    for idx, v in enumerate(imgs_anns.values()):
        record = {}
        
        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]
        
        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = height
        record["width"] = width
      
        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]

            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts

# Register the training set under the name "banner" and attach its class list.
DatasetCatalog.register("banner", lambda: get_banner_dicts("detectron_train_data/"))
MetadataCatalog.get("banner").set(thing_classes=["banner"])
banners_metadata = MetadataCatalog.get("banner")

To verify that the data loads correctly, let's visualize the annotations of a randomly selected sample from the training set and save the result to `output.jpg`:



In [None]:
dataset_dicts = get_banner_dicts("detectron_train_data/")
for d in random.sample(dataset_dicts, 1):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=banners_metadata, scale=0.5)
    out = visualizer.draw_dataset_dict(d)
    out.save("output.jpg")
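
Visual inspection covers only the samples you happen to draw, so a quick structural check over every record can be a useful complement. The sketch below is one possible check, not part of detectron2: the function name `check_record` is hypothetical, it assumes boxes are absolute `[x_min, y_min, x_max, y_max]` values, and the two sample records are invented.

```python
def check_record(record):
    """Return a list of problems found in one dataset dict (empty if none)."""
    problems = []
    width, height = record["width"], record["height"]
    for i, anno in enumerate(record.get("annotations", [])):
        x0, y0, x1, y1 = anno["bbox"]  # assumed to be absolute XYXY
        if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
            problems.append(f"annotation {i}: bbox outside image or empty")
        for poly in anno["segmentation"]:
            # A flat [x0, y0, x1, y1, ...] list: even length, at least 3 points.
            if len(poly) % 2 != 0 or len(poly) < 6:
                problems.append(f"annotation {i}: malformed polygon")
    return problems


good = {"width": 100, "height": 80, "annotations": [
    {"bbox": [10, 5, 20, 12], "segmentation": [[10.5, 5.5, 20.5, 5.5, 15.5, 12.5]]},
]}
bad = {"width": 100, "height": 80, "annotations": [
    {"bbox": [10, 5, 120, 12], "segmentation": [[10.5, 5.5, 20.5, 5.5]]},
]}
print(check_record(good))  # []
print(check_record(bad))
```

Running a check like this over the output of `get_banner_dicts` before training can surface annotation problems early.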

## Train!

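Now, let's fine-tune a COCO-pretrained R50-FPN Mask R-CNN model on the dataset registered above. Training 300 iterations takes ~6 minutes on Colab's K80 GPU, or ~2 minutes on a P100 GPU. Note that the configuration below sets `cfg.SOLVER.MAX_ITER = 1300`, so this run takes correspondingly longer.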


In [None]:
from detectron2.engine import HookBase

class HelloHook(HookBase):
  def after_step(self):
    if self.trainer.iter % 1 == 0:
      print(f"Hello at iteration {self.trainer.iter}!")
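
To make the role of `after_step` clearer, here is a toy, detectron2-free sketch of when hook callbacks fire in a training loop. `MiniHook`, `MiniTrainer` and `EveryN` are stand-ins written for this example and are much simpler than detectron2's own classes; only the callback names (`before_train`, `before_step`, `after_step`, `after_train`) and the `self.trainer.iter` attribute mirror the real `HookBase` interface.

```python
class MiniHook:
    """Stand-in for a hook: every callback is a no-op by default."""

    def before_train(self):
        pass

    def before_step(self):
        pass

    def after_step(self):
        pass

    def after_train(self):
        pass


class MiniTrainer:
    """Toy loop (not detectron2's trainer) that only shows when hooks fire."""

    def __init__(self, max_iter):
        self.iter = 0
        self.max_iter = max_iter
        self.hooks = []

    def register_hooks(self, hooks):
        for hook in hooks:
            hook.trainer = self  # hooks read loop state through self.trainer
            self.hooks.append(hook)

    def train(self):
        for hook in self.hooks:
            hook.before_train()
        for self.iter in range(self.max_iter):
            for hook in self.hooks:
                hook.before_step()
            # A real trainer would run one optimization step here.
            for hook in self.hooks:
                hook.after_step()
        for hook in self.hooks:
            hook.after_train()


class EveryN(MiniHook):
    """Record the iteration number once every `period` steps."""

    def __init__(self, period):
        self.period = period
        self.seen = []

    def after_step(self):
        if self.trainer.iter % self.period == 0:
            self.seen.append(self.trainer.iter)


hook = EveryN(period=20)
trainer = MiniTrainer(max_iter=50)
trainer.register_hooks([hook])
trainer.train()
print(hook.seen)  # [0, 20, 40]
```

The `HelloHook` above uses `% 1`, so it prints on every iteration; a larger period, as in `EveryN`, keeps the log shorter.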

In [None]:
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("banner",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 1300
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
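
Setting `NUM_CLASSES = 1` changes the size of the box predictor's final layers, which is why the checkpoint loader reports "Skip loading parameter" for them during training. The sketch below reproduces the shapes from those messages. It assumes one extra background class for classification and four box deltas per class for regression; the function name `box_head_shapes` is hypothetical.

```python
def box_head_shapes(num_classes, feature_dim=1024):
    """Weight shapes of the two box-predictor layers.

    Assumes class-specific box regression (4 deltas per class), which is
    consistent with the shapes printed in the loader messages.
    """
    return {
        "cls_score.weight": (num_classes + 1, feature_dim),  # +1 for background
        "bbox_pred.weight": (num_classes * 4, feature_dim),
    }


print(box_head_shapes(80))  # COCO checkpoint: (81, 1024) and (320, 1024)
print(box_head_shapes(1))   # this model:      (2, 1024) and (4, 1024)
```

Because the shapes differ, these layers start from fresh initialization while the rest of the network reuses the COCO weights, so the warnings are expected here.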

In [None]:
# train
trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.register_hooks(
    [HelloHook()]
)
trainer.train()

[04/11 23:11:08 d2.engine.defaults]: Model:
GeneralizedRCNN(
  (backbone): FPN(
    (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (top_block): LastLevelMaxPool()
    (bottom_up): ResNet(
      (stem): BasicStem(
        (conv1): Conv2d(
          3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
      )
 

model_final_f10217.pkl: 178MB [00:12, 14.7MB/s]                           
Skip loading parameter 'roi_heads.box_predictor.cls_score.weight' to the model due to incompatible shapes: (81, 1024) in the checkpoint but (2, 1024) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.box_predictor.cls_score.bias' to the model due to incompatible shapes: (81,) in the checkpoint but (2,) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.box_predictor.bbox_pred.weight' to the model due to incompatible shapes: (320, 1024) in the checkpoint but (4, 1024) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.box_predictor.bbox_pred.bias' to the model due to incompatible shapes: (320,) in the checkpoint but (4,) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.mask_head.predictor.weight' to the model due to i

[04/11 23:11:41 d2.engine.train_loop]: Starting training from iteration 0
Hello at iteration 0!
Hello at iteration 1!
Hello at iteration 2!
Hello at iteration 3!
Hello at iteration 4!
Hello at iteration 5!
Hello at iteration 6!
Hello at iteration 7!
Hello at iteration 8!
Hello at iteration 9!
Hello at iteration 10!
Hello at iteration 11!
Hello at iteration 12!
Hello at iteration 13!
Hello at iteration 14!
Hello at iteration 15!
Hello at iteration 16!
Hello at iteration 17!
Hello at iteration 18!
[04/11 23:11:51 d2.utils.events]:  eta: 0:10:07  iter: 19  total_loss: 1.766  loss_cls: 0.7693  loss_box_reg: 0.1571  loss_mask: 0.69  loss_rpn_cls: 0.04836  loss_rpn_loc: 0.01251  time: 0.4966  data_time: 0.0355  lr: 4.9953e-06  max_mem: 2759M
Hello at iteration 19!
Hello at iteration 20!
Hello at iteration 21!
Hello at iteration 22!
Hello at iteration 23!
Hello at iteration 24!
Hello at iteration 25!
Hello at iteration 26!
Hello at iteration 27!
Hello at iteration 28!
Hello 
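
The log above reports `lr: 4.9953e-06` at iteration 19, far below `BASE_LR = 0.00025`. That is consistent with a linear learning-rate warmup. The sketch below assumes a warmup that starts at 0.001 × the base rate and ramps up over 1000 iterations; these two values are assumptions (they are not set anywhere in this notebook), and the function `warmup_lr` is written for this example rather than taken from detectron2. It also ignores any later step decay.

```python
def warmup_lr(iteration, base_lr=0.00025, warmup_iters=1000, warmup_factor=0.001):
    """Linear warmup: scale base_lr from warmup_factor up to 1.0.

    warmup_iters and warmup_factor are assumed values, not settings
    taken from this notebook.
    """
    if iteration >= warmup_iters:
        return base_lr
    alpha = iteration / warmup_iters
    return base_lr * (warmup_factor * (1 - alpha) + alpha)


for it in (0, 19, 500, 1000):
    print(it, f"{warmup_lr(it):.4e}")
```

Under these assumptions the value at iteration 19 agrees with the logged one to within rounding, and the full base rate is reached only at iteration 1000 of the 1300 configured here.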

In [None]:
# cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # use the weights from the training run above
cfg.MODEL.WEIGHTS = "detectron_model.pth"  # use the checkpoint copied from Google Drive earlier
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a custom testing threshold
predictor = DefaultPredictor(cfg)
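
`SCORE_THRESH_TEST` controls which detections the predictor returns: only those scoring above the threshold are kept. A simplified picture of that filter follows. The helper `keep_above_threshold` is hypothetical, the first three scores are the ones printed at the end of this notebook, and the fourth is invented to show a rejected detection.

```python
def keep_above_threshold(scores, threshold=0.7):
    """Return the indices of detections whose score exceeds the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


scores = [0.9929, 0.9904, 0.7695, 0.41]  # the last value is hypothetical
print(keep_above_threshold(scores))       # [0, 1, 2]
print(keep_above_threshold(scores, 0.8))  # [0, 1]
```

Raising the threshold trades recall for precision: fewer false positives are returned, but low-confidence true detections are dropped as well.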

In [None]:
import os
files = os.listdir("detectron_test_data/")
files = list(filter(lambda x: not x.endswith('csv'), files))
print(files)

for file in files:
    print(file)
    im = cv2.imread("detectron_test_data/%s" % file)
    outputs = predictor(im)
    # Move the predictions to the CPU once and reuse them below.
    instances = outputs["instances"].to("cpu")
    v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
    out = v.draw_instance_predictions(instances)
    out.save("test_output/%s" % file)
    # Save the masks and confidence scores as .npy files named after the image.
    stem = file.split('.')[0]
    np.save("test_output/%s" % stem, instances.pred_masks.numpy())
    np.save("test_output/scores_%s" % stem, instances.scores.numpy())

['397.jpg', '957.jpg', '18.jpg', '888.jpg', 'synthetic_3.png', '771.jpg', '566.jpg', 'synthetic_22.png', 'synthetic_9.png', 'synthetic_13.png', '688.jpg', '389.jpg', '159.jpg', '127.jpg', '951.jpg', '832.jpg', '778.jpg', '872.jpg', 'synthetic_26.png', 'synthetic_11.png', 'synthetic_14.png', '811.jpg', '339.jpg', 'synthetic_4.png', '312.jpg', 'synthetic_20.png', 'synthetic_7.png', '651.jpg', 'synthetic_28.png', '522.jpg', '245.jpg', '458.jpg', '781.jpg', 'synthetic_16.png', '545.jpg', 'synthetic_8.png', '570.jpg', '163.jpg', '444.jpg', 'synthetic_29.png', 'synthetic_25.png', '889.jpg', '602.jpg', '108.jpg', '196.jpg', 'synthetic_0.png', '51.jpg', '953.jpg', '218.jpg', '613.jpg', '290.jpg', '954.jpg', '234.jpg', '475.jpg', 'synthetic_27.png', '275.jpg', '897.jpg', '714.jpg', '685.jpg', 'synthetic_1.png', '743.jpg', 'synthetic_24.png', '304.jpg', '882.jpg', 'synthetic_12.png', '864.jpg', 'synthetic_17.png', '583.jpg', '291.jpg', 'synthetic_5.png', 'synthetic_2.png', 'synthetic_10.png', 's
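
Each input image produces three files in `test_output/`. The sketch below spells out the naming rule used by the loop, relying on the fact that `np.save` appends `.npy` when the given name lacks that extension. The helper `output_paths` is hypothetical and exists only to make the mapping explicit.

```python
def output_paths(file_name, out_dir="test_output"):
    """Map an input image name to the three files written for it."""
    stem = file_name.split(".")[0]  # same rule as the loop above
    return {
        "image": f"{out_dir}/{file_name}",
        "masks": f"{out_dir}/{stem}.npy",
        "scores": f"{out_dir}/scores_{stem}.npy",
    }


print(output_paths("synthetic_3.png"))
print(output_paths("397.jpg"))
```

Note that the stem is everything before the first dot, so two inputs such as `1.jpg` and `1.png` would write to the same `.npy` files.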

In [None]:
!tar -cvf test_output.tar test_output

test_output/
test_output/scores_889.npy
test_output/scores_synthetic_12.npy
test_output/397.jpg
test_output/957.jpg
test_output/18.jpg
test_output/scores_864.npy
test_output/synthetic_14.npy
test_output/scores_339.npy
test_output/888.jpg
test_output/882.npy
test_output/synthetic_3.png
test_output/952.npy
test_output/234.npy
test_output/scores_714.npy
test_output/771.jpg
test_output/566.jpg
test_output/159.npy
test_output/scores_957.npy
test_output/312.npy
test_output/scores_synthetic_25.npy
test_output/613.npy
test_output/scores_389.npy
test_output/scores_888.npy
test_output/synthetic_22.png
test_output/synthetic_9.png
test_output/scores_127.npy
test_output/714.npy
test_output/scores_198.npy
test_output/synthetic_13.png
test_output/545.npy
test_output/688.jpg
test_output/389.jpg
test_output/475.npy
test_output/scores_291.npy
test_output/scores_synthetic_19.npy
test_output/scores_129.npy
test_output/scores_479.npy
test_output/scores_872.npy
test_output/159.jpg
test_output/127.jpg
test_o

In [None]:
!cp test_output.tar drive/MyDrive/test_output.tar

In [None]:
# Run the predictor on a single image and save its masks to masks.npy.
file = "314.jpg"
im = cv2.imread("detectron_test_data/input/%s" % file)
outputs = predictor(im)
masks = outputs["instances"].to("cpu").pred_masks
print(str(masks))
np.save('masks', masks.numpy())
# v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
# out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
# out.save("output.jpg")

tensor([0.9929, 0.9904, 0.7695])
