# Detectron2 Beginner's Tutorial

<img src="https://dl.fbaipublicfiles.com/detectron2/Detectron2-Logo-Horz.png" width="500">

Welcome to detectron2! This is the official colab tutorial of detectron2. Here, we will go through some basic usage of detectron2, including the following:
* Run inference on images or videos, with an existing detectron2 model
* Train a detectron2 model on a new dataset

You can make a copy of this tutorial by "File -> Open in playground mode" and make changes there. __DO NOT__ request access to this tutorial.


# Install detectron2

In [None]:
!pip install pyyaml==5.1
# Pin the pytorch version this tutorial was written against. Adjust this line if Colab changes its default pytorch version
!pip install torch==1.9.0+cu102 torchvision==0.10.0+cu102 -f https://download.pytorch.org/whl/torch_stable.html

# Install detectron2 that matches the above pytorch version
# See https://detectron2.readthedocs.io/tutorials/install.html for instructions
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html
exit(0)  # After installation, you need to "restart runtime" in Colab. This line can also restart runtime


In [2]:
# check pytorch installation: 
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())
assert torch.__version__.startswith("1.9")   # please manually install torch 1.9 if Colab changes its default version

1.9.0+cu102 True


In [3]:
torch.__version__

'1.9.0+cu102'

In [4]:
# Some basic setup:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import os, json, cv2, random
from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog

# Train on a custom dataset

In this section, we show how to train an existing detectron2 model on a custom dataset in a new format.

The original tutorial uses [the balloon segmentation dataset](https://github.com/matterport/Mask_RCNN/tree/master/samples/balloon),
which has only one class: balloon. This notebook uses the TACO litter dataset instead, with its supercategories mapped to two classes: driveable and non-driveable.
We'll train a segmentation model from an existing model pre-trained on the COCO dataset, available in detectron2's model zoo.

Note that the COCO dataset does not have these two classes. After fine-tuning, the model will be able to recognize them.
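
For orientation, detectron2's standard dataset format is a list of dicts, one per image. The sketch below shows one such record and a small checker for the keys this notebook fills in. The file path, the numbers, and the `missing_keys` helper are hypothetical, and the string `"XYWH_ABS"` is a placeholder for `BoxMode.XYWH_ABS` so that the snippet runs without detectron2 installed.

```python
REQUIRED_RECORD_KEYS = ("file_name", "image_id", "height", "width", "annotations")
REQUIRED_ANNOTATION_KEYS = ("bbox", "bbox_mode", "category_id")

# Hypothetical record for one image with a single annotated object.
record = {
    "file_name": "data/batch_1/000001.jpg",  # made-up path
    "image_id": 0,
    "height": 480,
    "width": 640,
    "annotations": [
        {
            "bbox": [100.0, 120.0, 50.0, 80.0],  # x, y, width, height in pixels
            "bbox_mode": "XYWH_ABS",              # placeholder for BoxMode.XYWH_ABS
            # one polygon as a flat list: x1, y1, x2, y2, ...
            "segmentation": [[100.0, 120.0, 150.0, 120.0, 150.0, 200.0, 100.0, 200.0]],
            "category_id": 1,
        }
    ],
}


def missing_keys(rec):
    """Return the expected keys absent from a record or from any of its annotations."""
    missing = [k for k in REQUIRED_RECORD_KEYS if k not in rec]
    for i, anno in enumerate(rec.get("annotations", [])):
        missing += [f"annotations[{i}].{k}" for k in REQUIRED_ANNOTATION_KEYS if k not in anno]
    return missing


# An empty list means every expected key is present.
print(missing_keys(record))
```

Running a check like this on a few records before registering the dataset can catch malformed entries earlier than a failed training run would.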

## Prepare the dataset

### TACO dataset
#### - either download it from zip
#### - or mount from google drive

In [None]:
!git clone https://github.com/visriv/TACO.git
from google.colab import drive
drive.mount('/content/gdrive')

dataset_path = "/content/gdrive/MyDrive/data"
!python ./TACO/detector/split_dataset.py --dataset_dir ./gdrive/MyDrive/data --nr_trials 1


'''
Need to check if paths are correct/stitched correctly

zip_path = './data/taco_datset.zip'
!curl https://zenodo.org/record/3587843/files/TACO.zip --output $zip_path
import zipfile
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(zip_path.replace('.zip', '/'))
dataset_path = './data/taco_datset/TACO/data/'
'''



In [None]:
from detectron2.structures import BoxMode
import datetime  # needed by myconverter below
import pandas as pd
import json

def myconverter(obj):
    # Convert numpy / datetime values into plain Python types that json can serialize
    if isinstance(obj, np.integer):
        return int(obj)
    elif isinstance(obj, np.floating):
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    elif isinstance(obj, datetime.datetime):
        return obj.__str__()

def get_taco_dicts(input_json_path,
                   filter_type = 'default',
                   n_top_categories=28,
                   label_transfer = None):
    # load data using Python JSON module
    dataset_path = '/'.join(input_json_path.split('/')[0:-1])
    with open(input_json_path,'r') as json_in:
      data_json = json.loads(json_in.read())
      global json_df_final_category

              
      json_df_image = pd.json_normalize(data_json, record_path =['images'])
      json_df_anno = pd.json_normalize(data_json, record_path =['annotations'])
      json_df_anno_u = json_df_anno.rename(columns = {'id': 'record_id', 'image_id': 'id'}, inplace = False)
      json_df_anno_u['segmentation'] = json_df_anno_u['segmentation'].apply(lambda x: x[0])

      json_df_cat = pd.json_normalize(data_json, record_path =['categories'])
      json_df_cat_new = json_df_cat.rename(columns= {'id':'category_id'})
      json_df_image = pd.merge(json_df_image, json_df_anno_u, on="id")
      
      json_df_final_category = pd.merge(json_df_image, json_df_cat_new, on="category_id")
      json_df_final_category.sort_values('id', inplace=True)
      #json_df_final_category = json_df_final_category.set_index('id')
      json_df_final_category['image_count']=json_df_final_category.groupby('supercategory')['id'].transform('count')
      json_df_final_category['category_rank'] = json_df_final_category['image_count'].rank(method='dense', ascending=False).astype('int32')
      json_df_final_category = json_df_final_category.astype({'id':'int32', 
                                                        'width':'int32', 
                                                        'height':'int32', 
                                                        'record_id':'int32', 
                                                        'category_id':'int32', 
                                                        'image_count':'int32',
                                                        'iscrowd':'int32'})
      print('total image count', len(data_json['images']),
            'total anno count', len(data_json['annotations']),
            'total joined row count', json_df_final_category.shape[0])
      
      if filter_type == 'label_transfer':
        # .copy() avoids pandas' SettingWithCopyWarning when 'super_id' is added below
        json_df_final_category = json_df_final_category[json_df_final_category['supercategory'].isin(list(label_transfer.keys()))].copy()
        json_df_final_category['super_id'] = json_df_final_category['supercategory'].apply(lambda x: label_transfer[x])
      else:
        json_df_final_category = json_df_final_category[json_df_final_category['category_rank'] <= n_top_categories].copy()
        json_df_final_category['super_id'] = json_df_final_category.groupby(['supercategory']).ngroup().astype('int32')
      
      # After the join there is one row per annotation, so the row count is the annotation count
      print('selected image count', json_df_final_category['id'].nunique(),
            'selected anno count', json_df_final_category.shape[0])
      print(json_df_final_category)

    

    dataset_dicts = []
    n_rows = len(json_df_final_category.index)
    idx = 0
    while idx < n_rows:
        row = json_df_final_category.iloc[idx]
        record = {}
        record["file_name"] = dataset_path + '/' + row['file_name']
        # Cast to plain Python ints so the records can be serialized to JSON later
        record["image_id"] = int(row['id'])
        record["height"] = int(row['height'])
        record["width"] = int(row['width'])

        # Rows are sorted by image id, so all annotations of one image are contiguous
        i = 0
        objs = []
        while idx + i < n_rows and row['id'] == json_df_final_category.iloc[idx + i]['id']:
            anno = json_df_final_category.iloc[idx + i]
            obj = {
                "bbox": anno['bbox'],
                "bbox_mode": BoxMode.XYWH_ABS,
                "segmentation": [anno['segmentation']],
                "category_id": int(anno['super_id']),
            }
            objs.append(obj)
            i = i + 1

        record["annotations"] = objs
        dataset_dicts.append(record)
        # idx + i is already the first row of the next image; adding 1 more would skip a row
        idx = idx + i
    return dataset_dicts

'''
label_transfer = {'Plastic bag & wrapper':0,
                  'Cigarette':1,
                  'Unlabeled litter' :2,
                  'Aluminium foil':3,
                  'Battery':4,
                  'Blister pack':5,
                  'Bottle':6,
                  'Can':7,
                  'Carton':8,
                  'Broken glass':9,
                  'Bottle cap': 10, 'Cup':11,
                  'Straw':12,
                  'Other plastic':13,
                  'Paper':14,
                  'Paper bag':15,
                  'Styrofoam piece':16,
                  'Pop tab': 17,
                  'Lid':18,
                  'Plastic container':19}
'''
#for drivable (0)/non-drivable (1)
label_transfer = {'Plastic bag & wrapper':0, 'Cigarette':0, 'Unlabeled litter' :1, 'Aluminium foil':0, 'Battery':1, 
                 'Blister pack':0, 'Bottle':1, 'Can':1, 'Carton':0, 'Broken glass':1, 'Bottle cap': 0, 'Cup':0, 'Straw':0,
                  'Other plastic':1, 'Paper':0, 'Paper bag':0, 'Styrofoam piece':0, 'Pop tab': 0, 'Lid':0, 'Plastic container':0, 'Glass jar':1}




dataset_dicts = get_taco_dicts(dataset_path + "/annotations_0_train.json", 
                               filter_type = 'label_transfer',
                               label_transfer = label_transfer)
run_id = 19

         

In [48]:
for d in ["train" + str(run_id), "val"+ str(run_id)]:
    DatasetCatalog.register("taco_" + d, 
                            lambda d=d: get_taco_dicts(dataset_path + "/annotations_0_" + d[:-1*len(str(run_id))] + '.json', 
                            filter_type = 'label_transfer',
                            label_transfer = label_transfer))
    #supercategory_names = list(set(json_df_final_category["supercategory"]))
    supercategory_names = ['Driveable', 'Non-driveable']
    #supercategory_names.sort()
    MetadataCatalog.get("taco_" + d).set(thing_classes=supercategory_names)


taco_metadata = MetadataCatalog.get("taco_train" + str(run_id))


To verify the data loading is correct, let's visualize the annotations of randomly selected samples in the training set:
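
A numeric check can complement the visual one. The following is a minimal sketch, assuming records shaped like the ones `get_taco_dicts` returns; the `class_histogram` helper and the toy records are hypothetical and only illustrate counting annotations per class.

```python
from collections import Counter


def class_histogram(dataset_dicts, class_names):
    """Count annotations per class name across all records."""
    counts = Counter(
        anno["category_id"]
        for record in dataset_dicts
        for anno in record["annotations"]
    )
    # Report every class, including those with zero annotations.
    return {name: counts.get(i, 0) for i, name in enumerate(class_names)}


# Toy records standing in for the real dataset dicts.
toy_dicts = [
    {"annotations": [{"category_id": 0}, {"category_id": 1}]},
    {"annotations": [{"category_id": 0}]},
    {"annotations": []},
]
print(class_histogram(toy_dicts, ["Driveable", "Non-driveable"]))
```

A class with zero or very few annotations is worth investigating before training, since it usually points to a problem in the label mapping.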



In [None]:
dataset_dicts = get_taco_dicts(dataset_path + "/annotations_0_train.json", filter_type = 'label_transfer', label_transfer = label_transfer)
from detectron2.utils.visualizer import ColorMode

for d in random.sample(dataset_dicts, 3):
    print(d, '\n')
    img = cv2.imread(d["file_name"])
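    # Pass the metadata so labels are drawn as class names instead of integer ids
    visualizer = Visualizer(img[:, :, ::-1], metadata=taco_metadata, scale=0.18, instance_mode=ColorMode.IMAGE_BW)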
    out = visualizer.draw_dataset_dict(d)
    cv2_imshow(out.get_image()[:, :, ::-1])
    #print(json_df_final_category.loc[d['annotations'][0]['category_id']])

## Train!

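Now, let's fine-tune a COCO-pretrained R50-FPN Mask R-CNN model on the TACO dataset. The original tutorial reports ~2 minutes for 300 iterations on a P100 GPU; the configuration below runs 5000 iterations, so expect training to take considerably longer.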
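
To get a feel for the schedule, the arithmetic can be sketched as below. It uses the values from the training cell (`MAX_ITER = 5000`, `IMS_PER_BATCH = 2`) and the ~2 minutes per 300 iterations quoted above. The number of training images is a hypothetical placeholder, and the time estimate is only a linear extrapolation that ignores image size and hardware differences.

```python
def epochs_seen(max_iter, ims_per_batch, num_images):
    """Approximate number of passes over the training set."""
    return max_iter * ims_per_batch / num_images


def extrapolate_minutes(ref_minutes, ref_iters, target_iters):
    """Linearly scale a reference timing to a different iteration count."""
    return ref_minutes * target_iters / ref_iters


num_train_images = 1200  # hypothetical; use len(dataset_dicts) for the real value

print("epochs:", round(epochs_seen(5000, 2, num_train_images), 2))
print("estimated minutes:", round(extrapolate_minutes(2, 300, 5000), 1))
```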


In [None]:
from detectron2.engine import DefaultTrainer
model_prefix = "COCO-InstanceSegmentation" #"COCO-Detection" check on detectron2 github
model_name = 'mask_rcnn_R_50_FPN_3x'  # 'faster_rcnn_R_50_FPN_3x' 
suffix = '_taco2'
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(model_prefix + "/" + model_name + ".yaml"))
cfg.DATASETS.TRAIN = ("taco_train" + str(run_id),)  # a tuple of dataset names, hence the trailing comma
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(model_prefix + "/" + model_name + ".yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 5000  # 300 iterations sufficed for the original toy dataset; this run trains for 5000
cfg.SOLVER.STEPS = []        # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster than the default of 512
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2  # driveable / non-driveable, matching the values in label_transfer (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.
cfg.OUTPUT_DIR = './output/' + model_name + suffix

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()

In [None]:
# Look at training curves in tensorboard:
%load_ext tensorboard
%tensorboard --logdir {cfg.OUTPUT_DIR}
%reload_ext tensorboard

In [None]:
!tensorboard dev upload --logdir './output' --name "taco" --description "finetune pretrained faster/mask rcnn on taco_x dataset"


In [6]:
#save to output path in google drive
os.makedirs('gdrive/MyDrive/trash_detection/results/' + suffix + '/', exist_ok=True)
%cp -r {cfg.OUTPUT_DIR} gdrive/MyDrive/trash_detection/results/$suffix/

## Inference & evaluation using the trained model
Now, let's run inference with the trained model on the taco validation dataset. First, let's create a predictor using the model we just trained:



Then, we randomly select several samples to visualize the prediction results.
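
When picking random samples, a fixed seed makes the selection repeatable across runs. This is a minimal sketch using only the standard library; the `pick_samples` helper and its `seed` parameter are hypothetical and not part of detectron2.

```python
import random


def pick_samples(items, k, seed=0):
    """Return up to k randomly chosen items, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator; leaves the global random state untouched
    return rng.sample(items, min(k, len(items)))


# Stand-in for a list of dataset dicts.
toy_items = list(range(10))
print(pick_samples(toy_items, 3))
```

The same call could be applied to the validation dicts so that the visualized images stay the same between runs.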

In [None]:
print(outputs["instances"].pred_classes)

In [None]:
'''
temp
to upload and restore the  checkpoint
'''
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(model_prefix + "/" + model_name + ".yaml"))
cfg.DATASETS.TRAIN = ("taco_train" + str(run_id),)  # a tuple of dataset names, hence the trailing comma
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
#cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 5000  # 300 iterations sufficed for the original toy dataset; this run trains for 5000
cfg.SOLVER.STEPS = []        # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster than the default of 512
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 20  # must match the checkpoint being restored: the training cell above used 2, the commented-out 20-class label_transfer needs 20 (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)



In [None]:
# Inference should use the config with parameters that are used in training
# cfg now already contains everything we've set previously. We changed it a little bit for inference:
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.50   # set a custom testing threshold
predictor = DefaultPredictor(cfg)

from detectron2.utils.visualizer import ColorMode
import time

dataset_dicts_val = get_taco_dicts(dataset_path + "/annotations_0_test.json", filter_type = 'label_transfer', label_transfer = label_transfer)
os.makedirs("gdrive/MyDrive/trash_detection/Inference/" + suffix + "/" + model_name + "/", exist_ok = True)
for d in dataset_dicts_val:    
    im = cv2.imread(d["file_name"])
    #record start time of inference
    start_time = time.time()

    outputs = predictor(im)  # format is documented at https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
    
    print("inference time", time.time() - start_time)
    
    '''
    out_classes = list(outputs["instances"].pred_classes.to("cpu").numpy())
    true_classes = [d['annotations'][i]['category_id'] for i in range(len(d['annotations']))]

    common_pred = list(set(out_classes) & set(true_classes))
    '''
    v = Visualizer(im[:, :, ::-1], metadata=taco_metadata, scale=0.5)
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    out_image = out.get_image()[:, :, ::-1]
    #cv2_imshow(out_image)
    img_file_name = d["file_name"].split('/')[-1]
    cv2.imwrite("gdrive/MyDrive/trash_detection/Inference/" + suffix + "/" + model_name + "/" + img_file_name + ".jpg", out_image)
    print(d["file_name"])

We can also evaluate its performance using the AP metric implemented in the COCO API.
The original balloon tutorial reports an AP of ~70; the value for this dataset and class mapping will differ.
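
COCO-style AP is built on intersection over union (IoU): a prediction counts as a match when its IoU with a ground-truth object exceeds a threshold, and AP is averaged over thresholds from 0.50 to 0.95. The sketch below computes IoU for two boxes in the `XYWH` layout used by the annotations above. The `iou_xywh` helper is hypothetical and is not the COCO API implementation, which also handles masks, crowd regions, and matching.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes, the second shifted right by half a box width.
print(iou_xywh((0, 0, 10, 10), (5, 0, 10, 10)))
```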

In [None]:
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader
evaluator = COCOEvaluator("taco_val" + str(run_id), output_dir=cfg.OUTPUT_DIR)
val_loader = build_detection_test_loader(cfg, "taco_val" + str(run_id))
print(inference_on_dataset(predictor.model, val_loader, evaluator))
# another equivalent way to evaluate the model is to use `trainer.test`

In [29]:
from fvcore.nn import parameter_count
from detectron2.modeling import build_model
model = build_model(cfg)  # returns a torch.nn.Module
cnt = parameter_count(model)  # maps parameter-name prefixes to parameter counts
print(cnt)