# Environment Setup

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os

root = '/content/drive/MyDrive/ITM_AI/CD-DETR'
os.chdir(root)

In [None]:
!chmod -R 777 .
!pip install -r requirements.txt

Collecting submitit (from -r requirements.txt (line 3))
  Downloading submitit-1.5.1-py3-none-any.whl (74 kB)
Collecting addict (from -r requirements.txt (line 8))
  Downloading addict-2.4.0-py3-none-any.whl (3.8 kB)
Collecting yapf (from -r requirements.txt (line 9))
  Downloading yapf-0.40.2-py3-none-any.whl (254 kB)
Collecting timm (from -r requirements.txt (line 10))
  Downloading timm-0.9.12-py3-none-any.whl (2.2 MB)
Installing collected packages: addict, submitit, yapf, timm
Successfully installed addict-2.4.0 submitit-1.5.1 timm-0.9.12 yapf-0.40.2


# Motivation

<img src='https://drive.google.com/uc?id=148hxhxhibMlL_YW56t35DrmPFOwjYIQn' height=400>

### **Project Motivation**
* Overcome deep learning's inability to adapt to new, emerging classes not
present in the initial dataset.
* Enhance real-world utility as data constantly evolves.

### **Problem Statement**
- Implementing class incremental learning in object detection, a vital part of computer vision, presents unique challenges.
- A single image can contain multiple classes, which makes incremental learning harder for object detection than for classification.


### **Project Aim**
- Introduce a buffer training strategy optimized for object detection tasks.
- Improve replay methods to make incremental learning more suitable for real-world applications.


# A description of the data

## COCO Dataset (Common Objects in Context)


We chose the MS COCO 2017 (Microsoft Common Objects in Context) dataset
- One of the standard benchmarks in object detection
   
**Why COCO 2017?**
- Scale and diversity of categories reflect the complexity of real-world data
- Uneven distribution of images across classes mirrors real-world variety and imbalance.
   
**Dataset Details:**
- Consists of over 200,000 images from various everyday scenes
- Annotated with object bounding boxes and class labels across 80 diverse object categories

### Dataset Details

<img src='https://drive.google.com/uc?id=1Serx-VakSQVrPEdmddeltIUmKjLQudEs' height=300 >

- The class distribution is quite unbalanced
- This is normal for object detection tasks!

<img src='https://drive.google.com/uc?id=1bBf8fwdPev2Cl3vizpXmjFeR_FqunWiu' height=300 >

- Sample images from the dataset
- Composed of 'common', everyday objects


<img src='https://drive.google.com/uc?id=1CdTIuloNzSY1J_1cbm8my8n6jFDP2Izm' height = 300>

- Annotation format for object detection
- Object detection needs *BOTH* the class (category) id and the bbox coordinates!
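
As a rough illustration of the point above, a single COCO instance annotation pairs a `category_id` with a `bbox` stored as `[x, y, width, height]`. The values below are made up for illustration, and `xywh_to_xyxy` is a hypothetical helper, not part of the project code.

```python
# A made-up annotation in the COCO instances format (values are illustrative).
annotation = {
    "image_id": 1,
    "category_id": 1,                  # class id (1 is "person" in COCO)
    "bbox": [10.0, 20.0, 30.0, 40.0],  # [x_min, y_min, width, height] in pixels
}


def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, w, h] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


corners = xywh_to_xyxy(annotation["bbox"])
print(annotation["category_id"], corners)
```

Many detection pipelines convert to corner or normalized center format like this before computing box losses.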


We used coco.sh to download the whole dataset locally!
> Because the dataset is so large (~30GB), we downloaded it locally and then moved it to Google Drive.


```
mkdir COCODIR
cd COCODIR

wget -c http://images.cocodataset.org/zips/train2017.zip
echo "Extracting train2017.zip"
unzip -qq train2017.zip
rm train2017.zip

wget -c http://images.cocodataset.org/zips/val2017.zip
# ...(same extract-and-remove steps as above)...

wget -c http://images.cocodataset.org/annotations/annotations_trainval2017.zip
# ...(same extract-and-remove steps as above)...

cd annotations
find . -type f \
    -not -wholename ./instances_train2017.json \
    -not -wholename ./instances_val2017.json \
    -delete

echo "DONE."
```



In [None]:
# COCO Dataset - benchmark of Object Detection
!bash /content/drive/MyDrive/ITM_AI/CD-DETR/coco.sh

mkdir: cannot create directory ‘COCODIR’: File exists
--2023-12-10 14:20:56--  http://images.cocodataset.org/zips/train2017.zip
Resolving images.cocodataset.org (images.cocodataset.org)... 52.217.12.36, 3.5.28.199, 52.216.221.129, ...
Connecting to images.cocodataset.org (images.cocodataset.org)|52.217.12.36|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19336861798 (18G) [application/zip]
Saving to: ‘train2017.zip’


2023-12-10 14:45:50 (12.3 MB/s) - ‘train2017.zip’ saved [19336861798/19336861798]

Extracting train2017.zip
replace train2017/000000147328.jpg? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

# Hyperparameter and architecture choices that were explored

### Selected Baseline Architecture - Deformable DETR

<img src='https://drive.google.com/uc?id=1PRx1SB55yMnqNmE6GF7BTfFmuyWTJvY1' height=300 >

- **Object Detector exploiting *Transformer***
  - Earlier object detection architectures were mostly built on CNNs

<br/>

- **Deformable Attention**
  - Enhances spatial awareness and improves localization accuracy, particularly for deformed or irregular object shapes

<br/>

- DETR (DEtection TRansformer) => Increased **Stability** and **Performance**!
- We selected it for its *clear structure and stability*.

### Replay Management - for Incremental Object Detection

<br/>

<img src='https://drive.google.com/uc?id=1mskrvJgBJw4duszykDBgl9tEcJBFRAOc' height=200>

**iCaRL** (Rebuffi et al. 2017)
- First presented the replay method
- Replay: collect some *samples* of each class in a *buffer*
- Reuse the samples in the buffer when a new dataset is trained
- Uses the mean of features to select samples for the replay buffer
- **Targeted at classification!**
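
For reference, the mean-of-feature selection used by iCaRL (often called herding) can be sketched roughly as follows. This is a simplified NumPy illustration, not the original implementation; `herding_select` and the toy 1-D features are hypothetical.

```python
import numpy as np


def herding_select(features, m):
    """Greedily pick up to m rows whose running mean stays close to the class mean."""
    features = np.asarray(features, dtype=float)
    class_mean = features.mean(axis=0)
    selected = []
    running_sum = np.zeros_like(class_mean)
    for k in range(1, min(m, len(features)) + 1):
        # Mean of the exemplar set if each candidate were added next.
        candidate_means = (running_sum + features) / k
        dists = np.linalg.norm(candidate_means - class_mean, axis=1)
        if selected:
            dists[selected] = np.inf  # never pick the same sample twice
        best = int(np.argmin(dists))
        selected.append(best)
        running_sum += features[best]
    return selected


# Toy features for one class: five samples with a single feature each.
toy_features = [[0.0], [1.0], [2.0], [3.0], [4.0]]
exemplars = herding_select(toy_features, 3)
print(exemplars)
```

The selection works on one feature vector per image, which is why it maps naturally to classification but not directly to images containing several object classes.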

<br/>

Then, what about object detection?

<img src='https://drive.google.com/uc?id=1d_DwVJAtnc5_6lWeY5XIZREKl_uFQuJc' height=300 >

- Newly emerging classes (datasets)
- Multiple classes in one image => the main difference from classification
- How to collect the buffer is the main issue.


### Effective Buffer Management Strategy that Handles Class Imbalance

<img src='https://drive.google.com/uc?id=1nfc66uhmi-bayybVOPxp5Xi7MQvJOCHi' height=300 >

To tailor the replay method to the object detection domain, we established *two* criteria.
- **Number of unique labels** in a scene
- **Train-loss based strategy**: more efficient retrieval

Also, consider the possibility that sparse classes will not be collected in the buffer
- **GM (Guarantee Minimum)**
  - Determines the minimum number of images that should be included per class
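
One possible reading of the two criteria plus GM can be sketched as below. The ranking direction (more unique labels first, then lower training loss), the two-pass structure, and the names `build_buffer`, `limit`, and `min_per_class` are assumptions made for illustration; the actual selection rule in the project may differ.

```python
from collections import Counter


def build_buffer(samples, limit, min_per_class):
    """samples: dicts with 'id', 'labels' (set of class ids), and 'loss' (float)."""
    # Assumed ranking: more unique labels first, then lower training loss.
    ranked = sorted(samples, key=lambda s: (-len(s["labels"]), s["loss"]))
    buffer, chosen, counts = [], set(), Counter()

    def add(sample):
        buffer.append(sample["id"])
        chosen.add(sample["id"])
        counts.update(sample["labels"])

    # Pass 1 (Guarantee Minimum): give every class its minimum number of images.
    for cls in sorted({c for s in samples for c in s["labels"]}):
        for s in ranked:
            if counts[cls] >= min_per_class or len(buffer) >= limit:
                break
            if s["id"] not in chosen and cls in s["labels"]:
                add(s)

    # Pass 2: fill the remaining slots in ranked order.
    for s in ranked:
        if len(buffer) >= limit:
            break
        if s["id"] not in chosen:
            add(s)
    return buffer


samples = [
    {"id": "a", "labels": {1, 2}, "loss": 0.5},
    {"id": "b", "labels": {1}, "loss": 0.1},
    {"id": "c", "labels": {3}, "loss": 0.9},  # the only image of sparse class 3
    {"id": "d", "labels": {1, 2}, "loss": 0.2},
    {"id": "e", "labels": {1}, "loss": 0.3},
]
buffer = build_buffer(samples, limit=3, min_per_class=1)
print(buffer)
```

In this toy run, image `c` is kept even though it ranks last, because it is the only image containing class 3. Without the first pass the buffer would hold `d`, `a`, and `b`, and class 3 would be missing.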


### Exploiting the Buffer to the Max!

<img src='https://drive.google.com/uc?id=1wJMktMN4WgVspSJ3Qhd2CvXRSJ3MPJFp' height=300 >

The buffer is much smaller than the newly emerging dataset, so we devised a way to use it fully.

- **CER (Circular Experience Replay) Training Strategy**
  - Repeatedly feed buffer images into the new training session.
  > We combine Experience Replay (ER) training with circular training, where older samples are replaced over time to avoid fixation on outdated information and overfitting.
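
The circular part of this idea can be sketched as follows: because the buffer is much smaller than the new dataset, buffer batches are cycled so that every new-data batch is paired with a replayed one. This is a minimal illustration under that assumption, not the project's training loop; `circular_replay_batches` and the string placeholders for batches are hypothetical, and the buffer is assumed to be non-empty.

```python
from itertools import cycle


def circular_replay_batches(new_batches, buffer_batches):
    """Pair every new-data batch with the next buffer batch, wrapping around."""
    buffer_iter = cycle(buffer_batches)  # restarts the buffer once it is exhausted
    for new_batch in new_batches:
        yield new_batch, next(buffer_iter)


new_batches = ["new0", "new1", "new2", "new3", "new4"]
buffer_batches = ["buf0", "buf1"]
schedule = list(circular_replay_batches(new_batches, buffer_batches))
for new_batch, replay_batch in schedule:
    print(new_batch, replay_batch)
```

In a real loop each pair would be passed through the detector, with the replayed batch contributing to the same loss as the new batch.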

### Selected Hyperparameters

1. **Task**.   
Current dataset + how many newly emerging datasets?
  - We selected 2
  - Simplest setting

2. **Guarantee Minimum.**  
Restricting the number of images in the buffer
  - Limit: 1200 images
  - Minimum: 12 images per class (1% of the limit)

3. **Learning rate scheduler.**  
For effective learning.  
  - Step LR
  - lr:2e-4

<img src='https://drive.google.com/uc?id=1UVa2FmtM_CPU6Tk_-Vdyf-O9DNNNPLIO' height=200 >
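
The step schedule can be written out as a small formula. The base rate `2e-4` and the drop epoch `lr_drop=40` (out of `epochs=50`) match the settings printed in the evaluation logs in this notebook. The decay factor `gamma=0.1` is an assumption (it is the default of PyTorch's `StepLR`), and `step_lr` is a hypothetical helper, not the project's scheduler.

```python
def step_lr(epoch, base_lr=2e-4, lr_drop=40, gamma=0.1):
    """Learning rate at a given epoch under a step schedule."""
    # gamma=0.1 is an assumed decay factor; base_lr and lr_drop come from the logs.
    return base_lr * gamma ** (epoch // lr_drop)


for epoch in (0, 39, 40, 49):
    print(epoch, step_lr(epoch))
```

Under these assumptions the rate stays at 2e-4 for epochs 0 to 39 and drops to roughly 2e-5 for the last ten epochs.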

# Results


BTW, we ran our code on our own server due to memory issues and GPU allocation problems on GCP ...🤯

## We compared our results with

* Upper Bound
  - maximum potential performance achieved by joint training across all classes.
* Under Bound
  - Training without any incremental method, which suffers significant forgetting.
* Mosaic
  - Random buffer sampling + Buffer Expansion through Mosaic Augmentation.  
  <img src='https://drive.google.com/uc?id=1LrPuImSoUpo1hyKLCr6SDKqSGIMJ-31q' height=200 >
  - Large buffer capacity of more than 10%.

* Ours
  - Small buffer limit, but still performs well
  - 1% buffer relative to the COCO dataset (1200 images)
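
The Mosaic baseline expands the buffer by stitching images together. A minimal sketch of how bounding boxes might be remapped when four equally sized images are tiled into a 2x2 grid is shown below. The helper name `mosaic_boxes`, the tile size, and the absence of rescaling or cropping are assumptions for illustration, not details of the compared method.

```python
def mosaic_boxes(tiles, tile_w, tile_h):
    """Shift per-image boxes into the coordinate frame of a 2x2 mosaic.

    tiles: four lists of [x_min, y_min, x_max, y_max] boxes, in row-major order
    (top-left, top-right, bottom-left, bottom-right).
    """
    offsets = [(0, 0), (tile_w, 0), (0, tile_h), (tile_w, tile_h)]
    merged = []
    for boxes, (dx, dy) in zip(tiles, offsets):
        for x0, y0, x1, y1 in boxes:
            merged.append([x0 + dx, y0 + dy, x1 + dx, y1 + dy])
    return merged


tiles = [
    [[0, 0, 10, 10]],   # top-left image: one box
    [],                 # top-right image: no objects
    [[5, 5, 20, 20]],   # bottom-left image
    [[1, 2, 3, 4]],     # bottom-right image
]
mosaic = mosaic_boxes(tiles, tile_w=100, tile_h=50)
print(mosaic)
```

One mosaic image then carries the boxes of four buffer images.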

<img src='https://drive.google.com/uc?id=1HPF7oEBaF75mrYGVCZh7G3oVkeKzh6ps' >

<br/>
<br/>

### Ours with Ablation Study
  - Compare performance by adjusting the number of circular replay and normal replay epochs
    - (CER/ER)
    - 48/2
    - 47/3
    - 45/5

<img src='https://drive.google.com/uc?id=1gY8jIjvWYKDcAIqhQXpPlRnUdllEP9zv' height=300>

## Our Performance + Ablation

In [None]:
!python main.py \
    --output_dir exps/deform \
    --coco_path ../COCODIR \
    --batch_size 12 \
    --resume ./pth/cd-detr-v3.pth \
    --with_box_refine \
    --eval

Not using distributed mode
git:
  sha: 11169a60c33333af00a4849f1808023eba96a931, status: has uncommited changes, branch: main

Namespace(lr=0.0002, lr_backbone_names=['backbone.0'], lr_backbone=2e-05, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_linear_proj_mult=0.1, batch_size=12, weight_decay=0.0001, epochs=50, lr_drop=40, lr_drop_epochs=None, clip_max_norm=0.1, sgd=False, with_box_refine=True, two_stage=False, frozen_weights=None, backbone='resnet50', dilation=False, position_embedding='sine', position_embedding_scale=6.283185307179586, num_feature_levels=4, enc_layers=6, dec_layers=6, dim_feedforward=1024, hidden_dim=256, dropout=0.1, nheads=8, num_queries=300, dec_n_points=4, enc_n_points=4, masks=False, aux_loss=True, set_cost_class=2, set_cost_bbox=5, set_cost_giou=2, mask_loss_coef=1, dice_loss_coef=1, cls_loss_coef=2, bbox_loss_coef=5, giou_loss_coef=2, focal_alpha=0.25, dataset_file='coco', coco_path='../COCODIR', coco_panoptic_path=None, remove_difficult

In [None]:
!python main.py \
    --output_dir exps/deform \
    --coco_path ../COCODIR \
    --batch_size 12 \
    --resume ./pth/cd-detr-v1.pth \
    --with_box_refine \
    --eval

Not using distributed mode
git:
  sha: 11169a60c33333af00a4849f1808023eba96a931, status: has uncommited changes, branch: main

Namespace(lr=0.0002, lr_backbone_names=['backbone.0'], lr_backbone=2e-05, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_linear_proj_mult=0.1, batch_size=12, weight_decay=0.0001, epochs=50, lr_drop=40, lr_drop_epochs=None, clip_max_norm=0.1, sgd=False, with_box_refine=True, two_stage=False, frozen_weights=None, backbone='resnet50', dilation=False, position_embedding='sine', position_embedding_scale=6.283185307179586, num_feature_levels=4, enc_layers=6, dec_layers=6, dim_feedforward=1024, hidden_dim=256, dropout=0.1, nheads=8, num_queries=300, dec_n_points=4, enc_n_points=4, masks=False, aux_loss=True, set_cost_class=2, set_cost_bbox=5, set_cost_giou=2, mask_loss_coef=1, dice_loss_coef=1, cls_loss_coef=2, bbox_loss_coef=5, giou_loss_coef=2, focal_alpha=0.25, dataset_file='coco', coco_path='../COCODIR', coco_panoptic_path=None, remove_difficult

In [None]:
!python main.py \
    --output_dir exps/deform \
    --coco_path ../COCODIR \
    --batch_size 12 \
    --resume ./pth/cd-detr-v2.pth \
    --with_box_refine \
    --eval

Not using distributed mode
git:
  sha: 11169a60c33333af00a4849f1808023eba96a931, status: has uncommited changes, branch: main

Namespace(lr=0.0002, lr_backbone_names=['backbone.0'], lr_backbone=2e-05, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_linear_proj_mult=0.1, batch_size=12, weight_decay=0.0001, epochs=50, lr_drop=40, lr_drop_epochs=None, clip_max_norm=0.1, sgd=False, with_box_refine=True, two_stage=False, frozen_weights=None, backbone='resnet50', dilation=False, position_embedding='sine', position_embedding_scale=6.283185307179586, num_feature_levels=4, enc_layers=6, dec_layers=6, dim_feedforward=1024, hidden_dim=256, dropout=0.1, nheads=8, num_queries=300, dec_n_points=4, enc_n_points=4, masks=False, aux_loss=True, set_cost_class=2, set_cost_bbox=5, set_cost_giou=2, mask_loss_coef=1, dice_loss_coef=1, cls_loss_coef=2, bbox_loss_coef=5, giou_loss_coef=2, focal_alpha=0.25, dataset_file='coco', coco_path='../COCODIR', coco_panoptic_path=None, remove_difficult

## Comparison with Others

In [None]:
!python main.py \
    --output_dir exps/deform \
    --coco_path ../COCODIR \
    --batch_size 12 \
    --resume ./pth/underbound.pth \
    --with_box_refine \
    --eval

Not using distributed mode
git:
  sha: 11169a60c33333af00a4849f1808023eba96a931, status: has uncommited changes, branch: main

Namespace(lr=0.0002, lr_backbone_names=['backbone.0'], lr_backbone=2e-05, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_linear_proj_mult=0.1, batch_size=12, weight_decay=0.0001, epochs=50, lr_drop=40, lr_drop_epochs=None, clip_max_norm=0.1, sgd=False, with_box_refine=True, two_stage=False, frozen_weights=None, backbone='resnet50', dilation=False, position_embedding='sine', position_embedding_scale=6.283185307179586, num_feature_levels=4, enc_layers=6, dec_layers=6, dim_feedforward=1024, hidden_dim=256, dropout=0.1, nheads=8, num_queries=300, dec_n_points=4, enc_n_points=4, masks=False, aux_loss=True, set_cost_class=2, set_cost_bbox=5, set_cost_giou=2, mask_loss_coef=1, dice_loss_coef=1, cls_loss_coef=2, bbox_loss_coef=5, giou_loss_coef=2, focal_alpha=0.25, dataset_file='coco', coco_path='../COCODIR', coco_panoptic_path=None, remove_difficult

In [None]:
!python main.py \
    --output_dir exps/deform \
    --coco_path ../COCODIR \
    --batch_size 12 \
    --resume ./pth/upperbound.pth \
    --with_box_refine \
    --eval

Not using distributed mode
git:
  sha: 11169a60c33333af00a4849f1808023eba96a931, status: has uncommited changes, branch: main

Namespace(lr=0.0002, lr_backbone_names=['backbone.0'], lr_backbone=2e-05, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_linear_proj_mult=0.1, batch_size=12, weight_decay=0.0001, epochs=50, lr_drop=40, lr_drop_epochs=None, clip_max_norm=0.1, sgd=False, with_box_refine=True, two_stage=False, frozen_weights=None, backbone='resnet50', dilation=False, position_embedding='sine', position_embedding_scale=6.283185307179586, num_feature_levels=4, enc_layers=6, dec_layers=6, dim_feedforward=1024, hidden_dim=256, dropout=0.1, nheads=8, num_queries=300, dec_n_points=4, enc_n_points=4, masks=False, aux_loss=True, set_cost_class=2, set_cost_bbox=5, set_cost_giou=2, mask_loss_coef=1, dice_loss_coef=1, cls_loss_coef=2, bbox_loss_coef=5, giou_loss_coef=2, focal_alpha=0.25, dataset_file='coco', coco_path='../COCODIR', coco_panoptic_path=None, remove_difficult

In [None]:
!python main.py \
    --output_dir exps/deform \
    --coco_path ../COCODIR \
    --batch_size 12 \
    --resume ./pth/mosaic.pth \
    --with_box_refine \
    --eval

Not using distributed mode
git:
  sha: 11169a60c33333af00a4849f1808023eba96a931, status: has uncommited changes, branch: main

Namespace(lr=0.0002, lr_backbone_names=['backbone.0'], lr_backbone=2e-05, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_linear_proj_mult=0.1, batch_size=12, weight_decay=0.0001, epochs=50, lr_drop=40, lr_drop_epochs=None, clip_max_norm=0.1, sgd=False, with_box_refine=True, two_stage=False, frozen_weights=None, backbone='resnet50', dilation=False, position_embedding='sine', position_embedding_scale=6.283185307179586, num_feature_levels=4, enc_layers=6, dec_layers=6, dim_feedforward=1024, hidden_dim=256, dropout=0.1, nheads=8, num_queries=300, dec_n_points=4, enc_n_points=4, masks=False, aux_loss=True, set_cost_class=2, set_cost_bbox=5, set_cost_giou=2, mask_loss_coef=1, dice_loss_coef=1, cls_loss_coef=2, bbox_loss_coef=5, giou_loss_coef=2, focal_alpha=0.25, dataset_file='coco', coco_path='../COCODIR', coco_panoptic_path=None, remove_difficult

# Insights and discussions


### Contribution
- The project pioneers a replay scheme in class-incremental learning, specifically tailored for object detection, filling a significant gap in current research.

- Our project arranges the replay buffer based on training loss, enhancing the efficiency of knowledge retrieval and potentially inspiring future machine learning developments.

- The proposed circular training strategy addresses the data quantity asymmetry between new samples and the replay buffer, offering a solution to class imbalance.

### Limitation
- More detailed experiments could be performed.
  - For example, Task 1 with 70 classes and Task 2 with 10 classes


### Future Research
- Implement this approach in *open-world settings*
  - The dataset is not divided into tasks; new samples are added subsequently.
  - Could have a substantial impact in areas like autonomous driving and robotics.


# Reference

- https://www.researchgate.net/figure/Distribution-of-the-numbers-of-individual-categories-in-the-MS-COCO-dataset-The-dataset_fig1_368788377
- https://www.researchgate.net/figure/Sample-images-from-the-COCO-dataset_fig3_344601010
- https://github.com/fcdl94/MMA
- Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. iCaRL: Incremental classifier and representation learning. In CVPR, 2017
- Jeng-Lun Shieh, Qazi Mazhar ul Haq, Muhamad Amirul Haq, Said Karam, Peter Chondro, De-Qin Gao, and Shanq-Jang Ruan. Continual learning strategy in one-stage object detection framework based on experience replay for autonomous driving vehicle. Sensors, 2020
- Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M Ni, and Lei Zhang. DN-DETR: Accelerate DETR training by introducing query denoising. In CVPR, 2022
- Konstantin Shmelkov, Cordelia Schmid, and Karteek Alahari. Incremental learning of object detectors without catastrophic forgetting. In ICCV, 2017
- Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. Deformable DETR: Deformable transformers for end-to-end object detection. ICLR, 2020
- Manoj Acharya, Tyler L Hayes, and Christopher Kanan. RODEO: Replay for online object detection. BMVC, 2020
- Xialei Liu, Hao Yang, Avinash Ravichandran, Rahul Bhotika, and Stefano Soatto. Multi-task incremental learning for object detection. arXiv, 2022
