Real-world datasets follow an imbalanced distribution, which poses significant challenges for rare-category object detection. Recent studies tackle this problem with re-weighting and re-sampling methods that utilise the class frequencies of the dataset. However, these techniques focus solely on frequency statistics and ignore how the classes are distributed in image space, missing important information. In contrast, we propose FRActal CALibration (FRACAL): a novel post-calibration method for long-tailed object detection. FRACAL devises a logit-adjustment method that uses the fractal dimension to estimate how uniformly classes are distributed in image space. During inference, it uses the fractal dimension to inversely down-weight the probabilities of uniformly spaced class predictions, achieving balance along two axes: between frequent and rare categories, and between uniformly spaced and sparsely spaced classes. FRACAL is a post-processing method that requires no training, and it can be combined with many off-the-shelf models such as one-stage sigmoid detectors and two-stage instance segmentation models. FRACAL boosts rare-class performance by up to 8.6% and surpasses all previous methods on the LVIS dataset, while generalising well to other datasets such as COCO, V3Det and OpenImages.
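The calibration idea can be sketched in a few lines. This is an illustrative approximation, not the paper's exact formula: the `fracal_calibrate` name and the `alpha` strength parameter are hypothetical, and the actual implementation lives in the repository's bbox head.

```python
import numpy as np

def fracal_calibrate(probs, fractal_dims, alpha=1.0):
    """Down-weight uniformly spaced classes at inference time (sketch).

    probs:        (C,) vector of predicted class probabilities.
    fractal_dims: (C,) per-class fractal dimensions -- near 2 for classes
                  spread uniformly over image space, lower for sparse ones.
    alpha:        hypothetical strength parameter, not from the paper.
    """
    # inverse weighting: the more uniformly a class fills image space,
    # the more its probability is suppressed
    adjusted = probs / (fractal_dims ** alpha + 1e-12)
    return adjusted / adjusted.sum()
```

A class with fractal dimension near 2 (uniformly spread) is suppressed relative to a sparsely spaced class, rebalancing the final prediction.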
- Training code.
- Evaluation code.
- Provide instance segmentation checkpoint models.
- Create and activate a conda environment:
```shell
conda create --name fracal python=3.11 -y
conda activate fracal
```
- Install dependency packages:
```shell
conda install pytorch torchvision -c pytorch
```
- Install MMDetection and clone the repository:
```shell
pip install -U openmim
mim install mmengine
mim install "mmcv==2.1.0"
git clone https://github.com/kostas1515/FRACAL.git
```
- Create a data directory, download the COCO 2017 datasets from https://cocodataset.org/#download (2017 Train images [118K/18GB], 2017 Val images [5K/1GB], 2017 Train/Val annotations [241MB]) and extract the zip files:
```shell
mkdir data
cd data
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
# download and unzip the LVIS annotations
wget https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip
wget https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_val.json.zip
```
- Modify mmdetection/configs/_base_/datasets/lvis_v1_instance.py and make sure the data_root variable points to the above data directory, e.g.:
```python
data_root = '<user_path>'
```
- To train a model run:
```shell
./tools/dist_train.sh ./configs/<folder>/<model.py> <#GPUs>
```
To test the baseline MaskRCNN ResNet50 RFS with Normalised Mask and Carafe on 8 GPUs run:
```shell
./tools/dist_test.sh ./experiments/r50_rfs_cos_lr_norm_4x4_2x_softmax_carafe/r50_rfs_cos_lr_norm_4x4_2x_softmax_carafe.py ./experiments/r50_rfs_cos_lr_norm_4x4_2x_softmax_carafe/epoch_24.pth 8
```
To test FRACAL-MaskRCNN ResNet50 RFS with Normalised Mask and Carafe on 8 GPUs run:
```shell
./tools/dist_test.sh ./experiments/r50_rfs_cos_lr_norm_4x4_2x_softmax_carafe/fracal_r50_rfs_cos_lr_norm_4x4_2x_softmax_carafe.py ./experiments/r50_rfs_cos_lr_norm_4x4_2x_softmax_carafe/epoch_24.pth 8
```
- Run get_statistics.py inside the ./stat_files/ folder:
```shell
python get_statistics.py --dset_name lvis --path ../../../datasets/coco/annotations/lvis_v1_train.json --output ./lvis_v1_train_stats.csv
```
This will create a CSV containing various bounding-box statistics such as the class, width, height, location, etc.
- Compute the fractal dimension based on those statistics:
```shell
python calculate_fractality.py --dset_name lvisv1 --path ./lvis_v1_train_stats.csv --output lvis_v1_train_fractal_dim.csv
```
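A fractal dimension can be estimated with a box-counting procedure: cover the image plane with grids of increasing resolution, count how many cells contain at least one box centre of the class, and fit the slope of the resulting log-log curve. The sketch below illustrates the idea; the function name and grid sizes are assumptions, and calculate_fractality.py may implement the estimate differently.

```python
import numpy as np

def box_counting_dimension(centers, grid_sizes=(2, 4, 8, 16)):
    """Box-counting estimate of how uniformly points fill image space.

    centers: (N, 2) array of normalised (x, y) box centres in [0, 1].
    Returns the slope of log(occupied cells) vs log(grid size):
    close to 2 for uniformly spread classes, lower for sparse ones.
    """
    centers = np.clip(np.asarray(centers, dtype=float), 0.0, 1.0 - 1e-9)
    counts = []
    for g in grid_sizes:
        # count grid cells containing at least one centre
        occupied = np.unique((centers * g).astype(int), axis=0)
        counts.append(len(occupied))
    slope, _ = np.polyfit(np.log(grid_sizes), np.log(counts), 1)
    return slope
```

Uniformly scattered centres yield a slope near 2, while centres piled in one spot yield a slope near 0.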
To generate the frequency weights run:
```shell
python get_frequency.py --path ../../../datasets/coco/annotations/lvis_v1_train.json --output freq_lvis_v1_train.csv
```
This will create a CSV containing various frequency weights based on instance frequency or image frequency, using various link functions. The lvis_v1_train_fractal_dim.csv and freq_lvis_v1_train.csv files are used inside the mmdet/models/roi_heads/bbox_heads/fracal_bbox_head.py script.
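For illustration, frequency weights under different link functions could look like the sketch below. The function name and the particular set of links are assumptions; see get_frequency.py for the actual options the repo supports.

```python
import numpy as np

def frequency_weights(counts, link="log"):
    """Per-class weights from instance (or image) counts (sketch).

    counts: (C,) array of per-class counts.
    link:   maps the raw class frequency to a weight; the set of
            link functions here is hypothetical.
    """
    freq = counts / counts.sum()
    if link == "identity":
        return 1.0 / freq
    if link == "sqrt":
        return 1.0 / np.sqrt(freq)
    if link == "log":
        return -np.log(freq)  # rare classes receive larger weights
    raise ValueError(f"unknown link function: {link}")
```

Whatever the link, rarer classes end up with larger weights; the links differ only in how aggressively they compensate for the imbalance.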
The statistical-calculation scripts support the COCO, LVISv1, LVISv0.5, V3Det and OpenImages datasets.
| Method | AP | APr | APc | APf | APb | Model |
|---|---|---|---|---|---|---|
| FRACAL-MaskRCNN-R50 | 28.5 | 23.0 | 28.1 | 31.5 | 28.4 | weights |
| FRACAL-MaskRCNN-R101 | 29.9 | 24.6 | 29.3 | 32.8 | 29.8 | weights |
| FRACAL-MaskRCNN-Swin-B | 38.5 | 35.5 | 39.5 | 38.7 | 39.4 | weights |
If you find this work useful, please cite:
```bibtex
@article{alexandridis2024fractal,
  title={Fractal Calibration for long-tailed object detection},
  author={Alexandridis, Konstantinos Panagiotis and Elezi, Ismail and Deng, Jiankang and Nguyen, Anh and Luo, Shan},
  journal={arXiv preprint arXiv:2410.11774},
  year={2024}
}
```

This code uses PyTorch and the MMDetection framework. Thank you for your wonderful work!
