Code Author: Ha Eungyeom (eungyeom_ha@yonsei.ac.kr)
This code is developed for training and evaluating a Faster-RCNN model on the HOD dataset.

### Training Code - Normal Cases
#### Paper Section: 3.1 Environment Setup

In [1]:
# Displaying the current working directory
!pwd

/home/oks/people/egha/ob_code_faster_rcnn


In [2]:
# Changing to the parent directory
cd ..

/home/oks/people/egha


In [3]:
# Installing necessary library for file downloading
!pip install gdown

# Downloading the dataset using Google Drive link
!gdown --id 1NEQWK062dMREwDSbHOPPMx-99iUVebrN -O faster_rcnn_dataset.zip

# Creating a directory for the dataset and extracting the dataset there
!mkdir faster_rcnn_data
!unzip faster_rcnn_dataset.zip -d faster_rcnn_data

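Re-running the download cell fails at `mkdir` if the directory already exists. The following is a minimal sketch of an extraction step that can be re-run safely, using only the standard library. The helper name `extract_once` is hypothetical and not part of the original notebook.

```python
import os
import zipfile


def extract_once(zip_path, target_dir):
    """Extract zip_path into target_dir unless target_dir already has content.

    Hypothetical helper: returns True if files were extracted, False if skipped.
    """
    os.makedirs(target_dir, exist_ok=True)  # no error if the directory exists
    if os.listdir(target_dir):
        return False  # something is already there, so skip the extraction
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return True
```

With the file names used above, the call would be `extract_once('faster_rcnn_dataset.zip', 'faster_rcnn_data')`.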

In [4]:
!pwd

/home/oks/people/egha


In [5]:
# Setting up the name for normal case training
name = '/rcnn_normal/'

In [None]:
# Importing required libraries
import os
import xml.etree.ElementTree as ET

# Specifying the directory path for annotations
directory_path = './faster_rcnn_data' + name  + 'Annotations/'

### Function to remove spaces from tags in XML files
#### Paper Section: 3.2 Data Preprocessing

In [None]:
def remove_spaces_from_tags(dir_path):
    for filename in os.listdir(dir_path):
        if filename.lower().endswith('.xml'):
            filepath = os.path.join(dir_path, filename)
            
            # Parsing the XML file
            tree = ET.parse(filepath)
            root = tree.getroot()

            # Removing spaces from <filename> and <path> tags
            for tag in ['filename', 'path']:
                element = root.find(tag)
                # Skip missing tags and tags without text
                if element is not None and element.text:
                    # Remove spaces
                    element.text = element.text.replace(" ", "")
            
            # Saving the changes back to the XML file
            tree.write(filepath)
            print(f"Changed file: {filename}")

# Executing the function to preprocess XML annotations
remove_spaces_from_tags(directory_path)
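To see the effect of this preprocessing step without touching the dataset, the same idea can be tried on an in-memory annotation. This is a small sketch: the helper `strip_tag_spaces` and the sample file name are made up for illustration.

```python
import xml.etree.ElementTree as ET


def strip_tag_spaces(xml_text, tags=("filename", "path")):
    """Return xml_text with spaces removed from the text of the given top-level tags."""
    root = ET.fromstring(xml_text)
    for tag in tags:
        element = root.find(tag)
        if element is not None and element.text:
            element.text = element.text.replace(" ", "")
    return ET.tostring(root, encoding="unicode")


# Hypothetical annotation with spaces in the file name and path
sample = (
    "<annotation>"
    "<filename>img hod 000001.jpg</filename>"
    "<path>/data/JPEG Images/img hod 000001.jpg</path>"
    "</annotation>"
)
cleaned = strip_tag_spaces(sample)
print(cleaned)
```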

### Installation of MMDetection
#### Paper Section: 3.3 Framework Preparation
Steps for setting up MMDetection, including handling version compatibility between PyTorch, mmcv-full, and MMDetection.


In [7]:
# Checking the version of PyTorch
import torch
print(torch.__version__)

1.13.0+cu116


In [8]:
# Downgrading PyTorch to 1.13.0+cu116 for compatibility with MMDetection
!pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 --extra-index-url https://download.pytorch.org/whl/cu116

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu116

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m23.2.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [9]:
# Installing mmcv-full
!pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.13/index.html

Looking in links: https://download.openmmlab.com/mmcv/dist/cu116/torch1.13/index.html

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m23.2.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [10]:
# Cloning and installing MMDetection (version 2.x)
!git clone --branch 2.x https://github.com/open-mmlab/mmdetection.git
!cd mmdetection; python setup.py install

fatal: destination path 'mmdetection' already exists and is not an empty directory.
running install
running bdist_egg
running egg_info
writing mmdet.egg-info/PKG-INFO
writing dependency_links to mmdet.egg-info/dependency_links.txt
writing requirements to mmdet.egg-info/requires.txt
writing top-level names to mmdet.egg-info/top_level.txt
reading manifest file 'mmdet.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'mmdet.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/mmdet
copying build/lib/mmdet/__init__.py -> build/bdist.linux-x86_64/egg/mmdet
creating build/bdist.linux-x86_64/egg/mmdet/datasets
copying build/lib/mmdet/datasets/coco_occluded.py -> build/bdist.linux-x86_64/egg/mmdet/datasets
copying build/lib/mmdet/datasets/coco_panoptic.py -> build/bdist.linux-x86_64/egg/mmdet/datasets
copying build/lib/mmdet/data

In [11]:
from mmdet.apis import init_detector, inference_detector
import mmcv
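Since the setup depends on matching versions, it can help to print what is actually installed before going further. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The package names in the default tuple are assumptions about how the distributions are registered in this environment, and the helper name is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "mmcv-full", "mmdet")):
    """Map each package name to its installed version, or None if it is not found."""
    versions = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


for package, version in installed_versions().items():
    print(f"{package}: {version or 'not installed'}")
```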



### Conversion of PASCAL VOC dataset to MS-COCO format
#### Paper Section: 3.4 Dataset Conversion
Steps and code snippets for converting the dataset from PASCAL VOC format to MS-COCO format using the voc2coco utility.


In [12]:
!pwd

/home/oks/people/egha


In [13]:
# Cloning the voc2coco utility
!git clone https://github.com/yukkyo/voc2coco.git # voc -> coco

fatal: destination path 'voc2coco' already exists and is not an empty directory.


In [14]:
!pwd

/home/oks/people/egha


In [15]:
!cat ./faster_rcnn_data/labels.txt

alcohol
insulting_gesture
blood
cigarette
gun
knife


In [16]:
cd ./voc2coco/

/home/oks/people/egha/voc2coco


In [17]:
# Converting VOC to COCO format for train, validation, and test sets
!python voc2coco.py --ann_dir ../faster_rcnn_data/rcnn_normal/Annotations \
--ann_ids ../faster_rcnn_data/rcnn_normal/ImageSets/Main/train.txt \
--labels ../faster_rcnn_data/labels.txt \
--output ../faster_rcnn_data/rcnn_normal/train.json \
--ext xml

!python voc2coco.py --ann_dir ../faster_rcnn_data/rcnn_normal/Annotations \
--ann_ids ../faster_rcnn_data/rcnn_normal/ImageSets/Main/validation.txt \
--labels ../faster_rcnn_data/labels.txt \
--output ../faster_rcnn_data/rcnn_normal/val.json \
--ext xml

!python voc2coco.py --ann_dir ../faster_rcnn_data/rcnn_normal/Annotations \
--ann_ids ../faster_rcnn_data/rcnn_normal/ImageSets/Main/test.txt \
--labels ../faster_rcnn_data/labels.txt \
--output ../faster_rcnn_data/rcnn_normal/test.json \
--ext xml

Start converting !
100%|████████████████████████████████████| 4646/4646 [00:00<00:00, 15152.23it/s]
Start converting !
100%|██████████████████████████████████████| 552/552 [00:00<00:00, 14630.83it/s]
Start converting !
100%|██████████████████████████████████████| 270/270 [00:00<00:00, 14180.24it/s]
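The core of the format change is the box representation: VOC stores corner coordinates (`xmin`, `ymin`, `xmax`, `ymax`), while COCO stores `[x, y, width, height]`. The sketch below illustrates that mapping for a single annotation. It is a simplified illustration, not the voc2coco implementation: it ignores any pixel-offset convention the utility may apply, the category ids starting at 1 are an assumption, and the sample XML is made up.

```python
import xml.etree.ElementTree as ET


def voc_objects_to_coco(xml_text, label_to_id):
    """Convert the <object> entries of one VOC annotation into COCO-style dicts."""
    root = ET.fromstring(xml_text)
    annotations = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax")
        )
        width, height = xmax - xmin, ymax - ymin
        annotations.append({
            "category_id": label_to_id[obj.findtext("name")],
            "bbox": [xmin, ymin, width, height],  # COCO stores [x, y, w, h]
            "area": width * height,
            "iscrowd": 0,
        })
    return annotations


# Class names from labels.txt; ids starting at 1 are an assumption
labels = ["alcohol", "insulting_gesture", "blood", "cigarette", "gun", "knife"]
label_to_id = {name: index for index, name in enumerate(labels, start=1)}

# Hypothetical VOC annotation with a single object
sample = (
    "<annotation><object><name>knife</name>"
    "<bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>"
    "</object></annotation>"
)
converted = voc_objects_to_coco(sample, label_to_id)
print(converted)
```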


In [18]:
!cat ../faster_rcnn_data/rcnn_normal/train.json

{"images": [{"file_name": "img_hod_001565.jpg", "height": 303, "width": 455, "id": "img_hod_001565"}, {"file_name": "img_hod_002040.jpg", "height": 612, "width": 408, "id": "img_hod_002040"}, {"file_name": "img_hod_001836.jpg", "height": 408, "width": 612, "id": "img_hod_001836"}, {"file_name": "img_hod_001920.jpg", "height": 408, "width": 612, "id": "img_hod_001920"}, {"file_name": "img_hod_009630.jpg", "height": 626, "width": 418, "id": "img_hod_009630"}, {"file_name": "img_hod_001456.jpg", "height": 612, "width": 567, "id": "img_hod_001456"}, {"file_name": "img_hod_002027.jpg", "height": 408, "width": 612, "id": "img_hod_002027"}, {"file_name": "img_hod_001986.jpg", "height": 330, "width": 612, "id": "img_hod_001986"}, {"file_name": "img_hod_001512.jpg", "height": 324, "width": 395, "id": "img_hod_001512"}, {"file_name": "img_hod_001480.jpg", "height": 586, "width": 612, "id": "img_hod_001480"}, {"file_name": "img_hod_001560.jpg", "height": 359, "width": 239, "id": "img_hod_001560"}
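Printing the whole JSON file is hard to read. A short summary of the image, annotation, and per-category counts is often enough to check the conversion. The helper below is a sketch that assumes the standard COCO keys `images`, `annotations`, and `categories`; the tiny in-memory example is made up.

```python
import json
from collections import Counter


def summarize_coco(coco):
    """Return image/annotation counts and per-category annotation counts."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}
    per_category = Counter(
        id_to_name.get(ann["category_id"], ann["category_id"])
        for ann in coco.get("annotations", [])
    )
    return {
        "images": len(coco.get("images", [])),
        "annotations": len(coco.get("annotations", [])),
        "per_category": dict(per_category),
    }


# Hypothetical miniature COCO file, parsed from a JSON string
example = json.loads(
    '{"images": [{"id": "a"}, {"id": "b"}],'
    ' "annotations": [{"category_id": 1}, {"category_id": 2}, {"category_id": 2}],'
    ' "categories": [{"id": 1, "name": "gun"}, {"id": 2, "name": "knife"}]}'
)
summary = summarize_coco(example)
print(summary)
```

For the real file, load it with `json.load(open('../faster_rcnn_data/rcnn_normal/train.json'))` and pass the result to `summarize_coco`.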

In [19]:
!pwd

/home/oks/people/egha/voc2coco


In [20]:
cd ..

/home/oks/people/egha


### Configuration Setup and Model Training
#### Paper Section: 4.1 Training Procedure
Detailed code snippets for configuring the training setup, defining the custom dataset class, and initiating the training process.

In [21]:
# Configuring the dataset, model, and training parameters
from mmcv import Config
from mmdet.datasets.builder import DATASETS
from mmdet.datasets.coco import CocoDataset
from mmdet.apis import set_random_seed, train_detector
from mmdet.models import build_detector

# Defining the custom dataset class
@DATASETS.register_module(force=True)
class HOD(CocoDataset):
    CLASSES = ('alcohol', 'insulting_gesture', 'blood', 'cigarette', 'gun', 'knife') 
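The names in `CLASSES` should agree with `labels.txt`, which was used for the COCO conversion. A mismatch in names or order can lead to mislabeled predictions. The check below is a small sketch; both helper names are hypothetical.

```python
def read_labels(path):
    """Read one class name per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return tuple(line.strip() for line in handle if line.strip())


def check_classes(label_path, classes):
    """Raise ValueError if the label file and the CLASSES tuple disagree."""
    labels = read_labels(label_path)
    if labels != tuple(classes):
        raise ValueError(f"label file has {labels}, but CLASSES is {tuple(classes)}")
    return labels
```

In this notebook the call would be `check_classes('./faster_rcnn_data/labels.txt', HOD.CLASSES)`.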

In [22]:
# Load the configuration file
config_file = './mmdetection/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = './mmdetection/checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'

In [23]:
!pwd

/home/oks/people/egha


In [24]:
!cd ./mmdetection; mkdir checkpoints
!wget -O ./mmdetection/checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth

mkdir: cannot create directory 'checkpoints': File exists
--2023-10-12 23:06:50--  http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
Resolving download.openmmlab.com (download.openmmlab.com)... 163.181.22.138, 163.181.22.139, 163.181.22.142, ...
Connecting to download.openmmlab.com (download.openmmlab.com)|163.181.22.138|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 167287506 (160M) [application/octet-stream]
Saving to: ‘./mmdetection/checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth’


2023-10-12 23:07:04 (11.4 MB/s) - ‘./mmdetection/checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth’ saved [167287506/167287506]



In [25]:
!ls -lia ./mmdetection/checkpoints

total 163380
3919874 drwxrwxr-x  2 oks oks      4096 Oct 12 17:34 .
3608728 drwxrwxr-x 19 oks oks      4096 Oct 12 17:34 ..
3914259 -rw-rw-r--  1 oks oks 167287506 Nov  3  2021 faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth


In [26]:
cfg = Config.fromfile(config_file)
print(cfg.pretty_text)

model = dict(
    type='FasterRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.

In [27]:
name

'/rcnn_normal/'

In [28]:
!pwd

/home/oks/people/egha


In [29]:
from mmcv.runner import HOOKS, Hook

# Adding the SaveBestCheckpointHook class
# This class is designed to save the best model checkpoint based on a specified metric (e.g., bbox_mAP).
@HOOKS.register_module()
class SaveBestCheckpointHook(Hook):
    def __init__(self, out_dir, metric='bbox_mAP', save_optimizer=True):
        self.out_dir = out_dir  # directory where the best checkpoint will be saved
        self.metric = metric  # metric name to monitor and determine the best model
        self.save_optimizer = save_optimizer  # flag to decide whether to save optimizer state or not
        self.best_score = 0.0  # initialize the best score to 0

    def after_train_epoch(self, runner):
        # This method is called after each training epoch
        # It checks if the current epoch score is better than the best recorded so far and saves the model checkpoint if so
        if not self.every_n_epochs(runner, 1):
            return
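        import os.path as osp
        from mmcv.runner import save_checkpoint
        score = runner.log_buffer.output.get(self.metric, 0)
        if score > self.best_score:
            self.best_score = score
            # save_checkpoint expects a file path and an optimizer object (or None), not a directory and a flag
            optimizer = runner.optimizer if self.save_optimizer else None
            save_checkpoint(runner.model, osp.join(self.out_dir, f'best_{self.metric}.pth'), optimizer=optimizer)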

# Updating environment parameters for the dataset
cfg.dataset_type = 'HOD'  # Dataset type is set to 'HOD'
cfg.data_root = './faster_rcnn_data' + name  # Root directory path for data

# Updating type, data_root, ann_file, img_prefix environment parameters for train, val, and test datasets
cfg.data.train.type = 'HOD'
cfg.data.train.data_root = './faster_rcnn_data'+ name
cfg.data.train.ann_file = 'train.json'
cfg.data.train.img_prefix = 'JPEGImages'

cfg.data.val.type = 'HOD'
cfg.data.val.data_root = './faster_rcnn_data' + name
cfg.data.val.ann_file = 'val.json'
cfg.data.val.img_prefix = 'JPEGImages'

cfg.data.test.type = 'HOD'
cfg.data.test.data_root = './faster_rcnn_data' + name
cfg.data.test.ann_file = 'test.json'
cfg.data.test.img_prefix = 'JPEGImages'

# Updating the number of classes
cfg.model.roi_head.bbox_head.num_classes = 6  # Number of classes is set to 6

# Loading the pretrained model
cfg.load_from = './mmdetection/checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'  # Path to the pretrained model

# Setting the directory to save the training weight files
cfg.work_dir = './tutorial_exps_normal'  # Directory to save training logs and weight files

# Updating the learning rate environment parameter
cfg.optimizer.lr = 0.02 / 8  # Learning rate is set to 0.02 / 8
cfg.lr_config.warmup = None  # Warmup is disabled
cfg.log_config.interval = 2000  # Logging interval is set to 2000

# For CocoDataset, the metric should be set to 'bbox' (not mAP). Setting it to 'bbox' calculates mAP over a range of IoU thresholds (0.5 to 0.95)
cfg.evaluation.metric = 'bbox'
cfg.evaluation.classwise = True  # Additional setting for label-wise mAP

cfg.evaluation.interval = 2000  # Evaluation interval is set to 2000
cfg.checkpoint_config.interval = 5  # Checkpoint saving interval is set to 5

# Adding a setting to save the best performing model
# Adding a custom hook to the cfg setting
cfg.custom_hooks = [dict(type='SaveBestCheckpointHook', out_dir=cfg.work_dir, metric='bbox_mAP', save_optimizer=True)]

# If the config is loaded twice, the lr_config's policy disappears. So, it's set here again.
cfg.lr_config.policy='step'  # Setting the learning rate policy to 'step'

# Setting seed for reproducibility
cfg.seed = 0  # Seed is set to 0
set_random_seed(0, deterministic=False)  # Setting random seed with deterministic set to False
cfg.gpu_ids = range(1)  # Setting GPU IDs

# Setting the device used for training
cfg.device = 'cuda'  # Setting device to cuda

cfg.runner.max_epochs = 150  # Setting max epochs to 150 for training
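The value `0.02 / 8` follows the linear scaling rule: the base learning rate of 0.02 in the MMDetection config is meant for 8 GPUs with 2 images each (batch size 16), so a single GPU with 2 images per step uses one eighth of it. The sketch below works through that arithmetic; the function name is hypothetical, and the per-GPU batch size of 2 is assumed from the base config.

```python
def scale_lr(base_lr, base_batch_size, batch_size):
    """Scale the learning rate linearly with the total batch size."""
    return base_lr * batch_size / base_batch_size


# Base config: 8 GPUs x 2 images per GPU; this notebook: 1 GPU x 2 images per GPU
lr = scale_lr(base_lr=0.02, base_batch_size=8 * 2, batch_size=1 * 2)
print(f"{lr:.3e}")
```

The result should agree with the `lr: 2.500e-03` value reported in the training log.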


In [30]:
print(cfg.pretty_text)

model = dict(
    type='FasterRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.

In [31]:
from mmdet.datasets import build_dataset
from mmdet.models import build_detector
from mmdet.apis import train_detector

# Create a dataset for training
datasets = [build_dataset(cfg.data.train)]

loading annotations into memory...
Done (t=0.03s)
creating index...
index created!


In [32]:
# Print the first dataset to check its content
print(datasets[0])

# Using datasets[0].__dict__ to view all the self variables' keys and values.
datasets[0].__dict__.keys()


HOD Train dataset with number of images 4646, and instance counts: 
+-------------+-------+-----------------------+-------+-----------+-------+---------------+-------+----------+-------+
| category    | count | category              | count | category  | count | category      | count | category | count |
+-------------+-------+-----------------------+-------+-----------+-------+---------------+-------+----------+-------+
| 0 [alcohol] | 453   | 1 [insulting_gesture] | 396   | 2 [blood] | 470   | 3 [cigarette] | 467   | 4 [gun]  | 849   |
|             |       |                       |       |           |       |               |       |          |       |
| 5 [knife]   | 2011  |                       |       |           |       |               |       |          |       |
+-------------+-------+-----------------------+-------+-----------+-------+---------------+-------+----------+-------+


dict_keys(['ann_file', 'data_root', 'img_prefix', 'seg_prefix', 'seg_suffix', 'proposal_file', 'test_mode', 'filter_empty_gt', 'file_client', 'CLASSES', 'coco', 'cat_ids', 'cat2label', 'img_ids', 'data_infos', 'proposals', 'flag', 'pipeline'])

In [33]:
datasets[0].data_infos

[{'file_name': 'img_hod_001565.jpg',
  'height': 303,
  'width': 455,
  'id': 'img_hod_001565',
  'filename': 'img_hod_001565.jpg'},
 {'file_name': 'img_hod_002040.jpg',
  'height': 612,
  'width': 408,
  'id': 'img_hod_002040',
  'filename': 'img_hod_002040.jpg'},
 {'file_name': 'img_hod_001836.jpg',
  'height': 408,
  'width': 612,
  'id': 'img_hod_001836',
  'filename': 'img_hod_001836.jpg'},
 {'file_name': 'img_hod_001920.jpg',
  'height': 408,
  'width': 612,
  'id': 'img_hod_001920',
  'filename': 'img_hod_001920.jpg'},
 {'file_name': 'img_hod_009630.jpg',
  'height': 626,
  'width': 418,
  'id': 'img_hod_009630',
  'filename': 'img_hod_009630.jpg'},
 {'file_name': 'img_hod_001456.jpg',
  'height': 612,
  'width': 567,
  'id': 'img_hod_001456',
  'filename': 'img_hod_001456.jpg'},
 {'file_name': 'img_hod_002027.jpg',
  'height': 408,
  'width': 612,
  'id': 'img_hod_002027',
  'filename': 'img_hod_002027.jpg'},
 {'file_name': 'img_hod_001986.jpg',
  'height': 330,
  'width': 612,

In [34]:
datasets[0].pipeline

Compose(
    LoadImageFromFile(to_float32=False, color_type='color', channel_order='bgr', file_client_args={'backend': 'disk'})
    LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})
    Resize(img_scale=[(1333, 800)], multiscale_mode=range, ratio_range=None, keep_ratio=True, bbox_clip_border=True)
    RandomFlip(flip_ratio=0.5)
    Normalize(mean=[123.675 116.28  103.53 ], std=[58.395 57.12  57.375], to_rgb=True)
    Pad(size=None, size_divisor=32, pad_to_square=False, pad_val={'img': 0, 'masks': 0, 'seg': 255})
    DefaultFormatBundle(img_to_float=True)
    Collect(keys=['img', 'gt_bboxes', 'gt_labels'], meta_keys=('filename', 'ori_filename', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'img_norm_cfg'))
)

In [35]:
model = build_detector(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
model.CLASSES = datasets[0].CLASSES
print(model.CLASSES)

('alcohol', 'insulting_gesture', 'blood', 'cigarette', 'gun', 'knife')


In [36]:
import os.path as osp
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_detector(model, datasets, cfg, distributed=False, validate=True) 

2023-10-12 23:07:08,355 - mmdet - INFO - Automatic scaling of learning rate (LR) has been disabled.
2023-10-12 23:07:08,365 - mmdet - INFO - load checkpoint from local path: ./mmdetection/checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth

size mismatch for roi_head.bbox_head.fc_cls.weight: copying a param with shape torch.Size([81, 1024]) from checkpoint, the shape in current model is torch.Size([7, 1024]).
size mismatch for roi_head.bbox_head.fc_cls.bias: copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([7]).
size mismatch for roi_head.bbox_head.fc_reg.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([24, 1024]).
size mismatch for roi_head.bbox_head.fc_reg.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([24]).
2023-10-12 23:07:08,501 - mmdet - INFO - Start running, host: oks@smart3, work_dir: /

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!


2023-10-12 23:17:32,837 - mmdet - INFO - Epoch [1][2000/2324]	lr: 2.500e-03, eta: 0:01:41, time: 0.312, data_time: 0.007, memory: 3778, loss_rpn_cls: 0.0072, loss_rpn_bbox: 0.0133, loss_cls: 0.0851, acc: 97.4375, loss_bbox: 0.0805, loss: 0.1861
2023-10-12 23:19:13,310 - mmdet - INFO - Saving checkpoint at 1 epochs
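The `size mismatch` messages in the log are expected when fine-tuning a COCO checkpoint on a dataset with a different number of classes. The classification layer has one output per class plus one for background, and the regression layer has four box coordinates per class. The sketch below reproduces the shapes from the log; the function name is hypothetical, and class-specific regression (`reg_class_agnostic=False`) is assumed.

```python
def bbox_head_shapes(num_classes, reg_class_agnostic=False):
    """Return the output sizes of the classification and regression layers."""
    cls_out = num_classes + 1  # one extra output for the background class
    reg_out = 4 if reg_class_agnostic else 4 * num_classes  # 4 coordinates per class
    return cls_out, reg_out


print(bbox_head_shapes(80))  # COCO checkpoint → (81, 320)
print(bbox_head_shapes(6))   # HOD model → (7, 24)
```

Only these two layers are re-initialized; the remaining weights are loaded from the checkpoint.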


In [37]:
!pwd

/home/oks/people/egha


### Inference and Result Visualization
#### Paper Section: 4.3 Testing Procedure

In [None]:
import cv2
from mmdet.apis import inference_detector, init_detector, show_result_pyplot

# Loading a sample image for inference
img = cv2.imread('./faster_rcnn_data/rcnn_normal/JPEGImages/img_hod_001544.jpg')

# Setting the configuration for the model
model.cfg = cfg

# Performing inference on the sample image
result = inference_detector(model, img)

# Visualizing the inference results
show_result_pyplot(model, img, result)
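For a box-only detector in MMDetection 2.x, `result` is a list with one array per class, where each row is `[x1, y1, x2, y2, score]`. The sketch below turns such a result into a flat list of detections above a score threshold. The helper name, the threshold of 0.5, and the sample values are assumptions for illustration. Plain lists are used here so that the example runs without a trained model.

```python
def filter_detections(result, class_names, score_thr=0.5):
    """Return (class_name, score, box) tuples with score >= score_thr, best first."""
    kept = []
    for class_name, boxes in zip(class_names, result):
        for x1, y1, x2, y2, score in boxes:
            if score >= score_thr:
                box = (float(x1), float(y1), float(x2), float(y2))
                kept.append((class_name, float(score), box))
    return sorted(kept, key=lambda detection: detection[1], reverse=True)


# Hypothetical per-class result for the six HOD classes
class_names = ("alcohol", "insulting_gesture", "blood", "cigarette", "gun", "knife")
fake_result = [
    [],                                   # alcohol
    [],                                   # insulting_gesture
    [],                                   # blood
    [[5, 5, 20, 20, 0.30]],               # cigarette (below the threshold)
    [[10, 10, 50, 60, 0.75]],             # gun
    [[30, 40, 90, 120, 0.92]],            # knife
]
detections = filter_detections(fake_result, class_names)
for class_name, score, box in detections:
    print(class_name, round(score, 2), box)
```

With the trained model, the call would be `filter_detections(result, model.CLASSES)`.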

#### This marks the end of the code for training a Faster-RCNN model on the HOD dataset for normal cases.