YOLOFT: An Extremely Small Video Object Detection Benchmark Baseline
This is the official implementation of the baseline model for the XS-VID benchmark.
- CUDA 11.7
- Python 3.8
- PyTorch 1.12.1 (cu116)
- TorchVision 0.13.1 (cu116)
- numpy 1.24.4
Our work is based on the large-scale extremely small video object detection benchmark XS-VID. Download the dataset from the corresponding links below.
- [Google Drive]: annotations; images (0-3); images (4-5)
- [BaiduNetDisk]: annotations and images
Please choose a download method to download the annotations and all images. Make sure all the split archive files (e.g., `images.zip`, `images.z01`, `images.z02`, etc.) are in the same directory, then use the following commands to extract them:
unzip images.zip
unzip annotations.zip
We have released annotations in several formats to facilitate subsequent research and use, including COCO, COCOVID, and YOLO.
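For example, the COCO-format annotations can be inspected with pycocotools. This is a minimal sketch; the annotation file name below is an assumption, so substitute the file you actually downloaded:

```python
# Minimal sketch: inspect the COCO-format annotations with pycocotools.
# "annotations/train.json" is an assumed path; use whichever split you downloaded.
from pycocotools.coco import COCO

coco = COCO("annotations/train.json")          # load the annotation file
img_ids = coco.getImgIds()                     # all image ids
print(f"{len(img_ids)} images, {len(coco.getAnnIds())} annotations")

# Look at the boxes of the first image (COCO boxes are [x, y, w, h] in pixels)
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[:1]))
for ann in anns:
    print(ann["category_id"], ann["bbox"])
```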
This repository is built on Ultralytics 8.0.143 and can be installed by running the following commands. Please ensure that all dependencies listed above are satisfied before setting up the environment.
conda create --name yoloft python=3.8
conda activate yoloft
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
git clone https://github.com/gjhhust/YOLOFT
cd YOLOFT
pip install -r requirements.txt
pip install -e .
cd ./ultralytics/nn/modules/ops_dcnv3
python setup.py build install
cd ../alt_cuda_corr_sparse
python setup.py build install
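After building the extensions, a quick sanity check of the environment can help catch version mismatches early. This is a minimal sketch; the expected versions follow the requirements and install commands above:

```python
# Quick sanity check that the installed versions match the requirements above.
import torch
import torchvision

print("torch:", torch.__version__)               # expect 1.12.1 (CUDA build)
print("torchvision:", torchvision.__version__)   # expect 0.13.1 (CUDA build)
print("CUDA available:", torch.cuda.is_available())
```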
If you want to train or test on a custom video dataset, convert its annotations to YOLO format and organize the dataset files as follows:
data_root_dir/ # Root data directory
├── test.txt # List of test data files, each line contains a relative path to an image file
├── train.txt # List of training data files, each line contains a relative path to an image file
├── images/ # Directory containing image files
│ ├── video1/ # Directory for image files of the first video
│ │ ├── 0000000.png # First frame image file of the first video
│ │ └── 0000001.png # Second frame image file of the first video
│ ├── video2/ # Directory for image files of the second video
│ │ └── ... # More image files
│ └── ... # More video directories
└── labels/ # Directory containing label files
├── video1/ # Directory for label files of the first video
│ ├── 0000000.txt # Label file for the first frame of the first video (matches the image file)
│ └── 0000001.txt # Label file for the second frame of the first video (matches the image file)
├── video2/ # Directory for label files of the second video
│ └── ... # More label files
└── ... # More video directories
Note: Each image and its YOLO-format label must share the same name, formatted as a zero-padded frame number, e.g. "0000001.png" and "0000001.txt".
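As a reference, here is a hedged sketch of writing one YOLO-format label file. The frame size, box, and paths are illustrative only; each label line follows the standard YOLO convention `class x_center y_center width height` with values normalized to [0, 1]:

```python
# Minimal sketch of writing one YOLO-format label file for a frame.
# The paths, frame size, and example box are illustrative only.
import os

def box_to_yolo_line(cls, x_min, y_min, box_w, box_h, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, w, h) to a YOLO label line."""
    xc = (x_min + box_w / 2) / img_w   # normalized box center x
    yc = (y_min + box_h / 2) / img_h   # normalized box center y
    return f"{cls} {xc:.6f} {yc:.6f} {box_w / img_w:.6f} {box_h / img_h:.6f}"

# e.g. one object of class 0 in frame 0000001 of video1 (assuming a 1920x1080 frame)
line = box_to_yolo_line(0, 100, 200, 30, 40, 1920, 1080)
os.makedirs("data_root_dir/labels/video1", exist_ok=True)
with open("data_root_dir/labels/video1/0000001.txt", "w") as f:
    f.write(line + "\n")
```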
python tools/train_yoloft.py
For training on multiple GPUs, change the `device` setting accordingly.
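For example, assuming the training script follows the standard Ultralytics training API that this repository builds on (the YAML and weight paths below are placeholders), a multi-GPU run might look like the following sketch:

```python
# Minimal sketch of a multi-GPU run using the Ultralytics API this repo is built on.
# The model/dataset YAML paths are placeholders; adapt tools/train_yoloft.py as needed.
from ultralytics import YOLO

model = YOLO("path/to/yoloft_model.yaml")   # model config (placeholder path)
model.train(
    data="path/to/dataset.yaml",            # dataset config (placeholder path)
    epochs=100,
    imgsz=1024,
    device=[0, 1],                          # list of GPU ids for multi-GPU training
)
```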
python tools/XSVID/yoloft_baseline.py
Parameters:
- repeats: Number of training repetitions
- model_config_path: Path to the model YAML file
- pretrain_model: Path to the pre-trained weights
- dataset_config_path: Path to the dataset configuration file
- training_config_path: Path to the training hyperparameter configuration
After all runs finish, you will get a log file containing the results of every repeated experiment. Analyze the log file to find the best result and the location where it is saved:
python tools/XSVID/analy_log.py path/to/xxxx.log
python tools/XSVID/yoloft_conpresion.py
Parameters:
- repeats: Number of training repetitions for a single model configuration file
- model_config_dir: Directory containing the model configuration files to be compared
Eventually, you will get the log files of all comparison experiments and a CSV of all experimental results in runs/logs, which can be further analyzed to find the optimal model configuration:
python tools/XSVID/analy_csv.py path/to/xxxx.csv
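If you prefer to analyze the CSV yourself, a pandas sketch like the one below is a starting point. The column names `model_config` and `mAP` are assumptions; check the header of the CSV your run produces and adjust accordingly:

```python
# Minimal sketch of picking the best run from the results CSV in runs/logs.
# The CSV path and the column names ("model_config", "mAP") are assumptions.
import pandas as pd

df = pd.read_csv("runs/logs/results.csv")               # path is illustrative
best = df.sort_values("mAP", ascending=False).iloc[0]   # row of the highest-mAP experiment
print(best["model_config"], best["mAP"])
```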
Here's a brief usage guide for running the script:
python tools/test_yoloft.py
When `save_json=True`, COCO-format evaluation results are produced during training and testing; otherwise, only Ultralytics' built-in evaluation results are produced.
To evaluate the performance of other models, you can use the eval tool.
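For instance, the COCO-format predictions saved with `save_json=True` can be scored with pycocotools. This is a minimal sketch; both file paths are placeholders for your ground-truth annotation file and the prediction JSON written by the test script:

```python
# Minimal sketch of a COCO-style evaluation on predictions saved with save_json=True.
# Both file paths are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/test.json")             # ground-truth annotations (placeholder)
coco_dt = coco_gt.loadRes("runs/predictions.json")  # detections in COCO format (placeholder)

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()                               # prints AP / AP50 / AP75 ...
```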
This script processes images or videos by loading data from the specified directories, applying a model for predictions, and saving the results and videos in the desired output directory.
- `image_dir`: (Required) Path to the directory containing images or subdirectories of images. If `--mode` is set to `'one'`, this should be a directory containing all images. If set to `'muti'`, this should be a directory containing subdirectories, where each subdirectory represents a video.
- `checkpoint`: (Required) Path to the model checkpoint that will be used for predictions.
- `--save_dir`: (Required) Path to the directory where prediction results and generated videos will be saved.
- `--mode`: (Required) Mode of operation. Choices are:
  - `'one'`: The `image_dir` contains all images to be processed.
  - `'muti'`: The `image_dir` contains multiple subdirectories, with each subdirectory corresponding to a different video.
- `--eval_json`: (Optional) Path to the evaluation JSON file for model evaluation.
- Mode: `one`
  - Description: Use this mode when all image files are directly inside a single directory.
  - File structure example:

    /path/to/video_name/ # Directory to be set as image_dir
    ├── 000001.png
    ├── 000002.png
    ├── 000003.png
    └── ...

  - Usage: Set `mode=one` and `image_dir=/path/to/video_name`.
python tools/predict_yoloft.py /path/to/video_name yoloft-L.pt --save_dir /path/to/save --mode one
- Mode: `muti`
  - Description: Use this mode when the directory contains multiple subdirectories, each representing a different video.
  - File structure example:

    /path/to/videos_dir/ # Directory to be set as image_dir
    ├── video_name1/ # Subdirectory for the first video
    │   ├── 000001.png
    │   ├── 000002.png
    │   └── ...
    ├── video_name2/ # Subdirectory for the second video
    │   ├── 000001.png
    │   ├── 000002.png
    │   └── ...
    └── ...

  - Usage: Set `mode=muti` and `image_dir=/path/to/videos_dir`.
python tools/predict_yoloft.py /path/to/videos_dir yoloft-L.pt --save_dir /path/to/save --mode muti
yolo export model=./YOLOFT-L.pt imgsz=1024,1024 format=onnx opset=12
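The exported model can then be run with ONNX Runtime, for example. This is a minimal sketch: preprocessing and postprocessing are simplified, and the output file name is assumed to match the exported weights; the input layout (1, 3, 1024, 1024) follows the `imgsz` used in the export command above:

```python
# Minimal sketch of running the exported ONNX model with ONNX Runtime.
# "YOLOFT-L.onnx" is assumed to be the file produced by the export command above.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("YOLOFT-L.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

dummy = np.random.rand(1, 3, 1024, 1024).astype(np.float32)  # replace with a real, normalized frame
outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])                            # raw detection head outputs
```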
Method | Schedule | Backbone | AP | AP50 | AP75 | AP_es | AP_rs | AP_gs | Inference (ms)
---|---|---|---|---|---|---|---|---|---
DFF | 1x | R50 | 9.4 | 15 | 10.2 | 0.0 | 0.3 | 3.0 | 20.0 |
DFF | 1x | x101 | 9.6 | 16.9 | 9.9 | 0.0 | 0.5 | 4.5 | 25.5 |
FGFA | 1x | R50 | 7.8 | 18.8 | 5.0 | 1.1 | 2.0 | 6.1 | 151.0 |
FGFA | 1x | x101 | 12.3 | 18.0 | 14.1 | 0.2 | 1.1 | 6.4 | 181.8 |
SELSA | 1x | R50 | 13.6 | 18.1 | 15.5 | 0.0 | 2.2 | 8.1 | 88.5 |
SELSA | 1x | x101 | 13.6 | 18.8 | 15.8 | 0.0 | 1.7 | 8.3 | 110.0 |
TROI | 1x | R50 | 12.3 | 16.9 | 14.0 | 0.0 | 1.3 | 5.6 | 232.0 |
TROI | 1x | x101 | 12.8 | 18.5 | 14.7 | 0.0 | 1.3 | 7.6 | 285.7 |
MEGA | 1x | R101 | 7.8 | 18.8 | 5.0 | 1.1 | 2.0 | 6.1 | nan |
DiffusionVID | 50e | R101 | 10.6 | 24.3 | 8.2 | 2.7 | 5.6 | 9.4 | nan |
TransVOD | 50e | R50 | 21.8 | 39.6 | 21.1 | 8.8 | 13.6 | 20.5 | 136.0 |
StreamYOLO | 1x | YOLOX | 33.4 | 47.3 | 37.5 | 18.7 | 26.7 | 33.6 | 47.5 |
FCOS | 1x | R50 | 24.9 | 41.3 | 24.8 | 7.7 | 17.3 | 22.6 | 31.8 |
ATSS | 1x | R50 | 26.9 | 43.3 | 26.8 | 8.4 | 19.2 | 23.9 | 34.9 |
YOLOX-S | 50e | YOLOX | 29.1 | 44.0 | 30.4 | 15.0 | 20.0 | 25.6 | 24.0 |
YOLOX-L | 50e | YOLOX | 31.0 | 44.9 | 33.8 | 17.4 | 21.7 | 25.6 | 37.4 |
DyHead | 1x | R50 | 23.7 | 39.6 | 22.7 | 7.0 | 15.9 | 20.5 | 98.0 |
RepPoints | 1x | R50 | 23.7 | 41.7 | 22.8 | 9.1 | 18.6 | 23.9 | 37.8 |
Deformable-DETR | 1x | R50 | 21.3 | 38.0 | 21.3 | 11.3 | 13.7 | 18.7 | 52.3 |
Sparse RCNN | 1x | R50 | 21.0 | 34.2 | 21.8 | 9.0 | 13.9 | 17.5 | 41.8 |
Cascade RPN | 1x | R50 | 27.0 | 44.5 | 26.6 | 13.5 | 19.4 | 22.1 | 45.3 |
CESCE | 15e | nan | 22.6 | 40.1 | 21.5 | 10.3 | 16.2 | 21.3 | 31.0 |
CFINet | 1x | R50 | 29.5 | 48.8 | 31.0 | 16.6 | 21.8 | 25.1 | 47.1 |
Yolov8-s | 2x | YOLOv8 | 30.0 | 45.3 | 32.1 | 17.8 | 24.1 | 27.0 | 14.0 |
Yolov8-L | 2x | YOLOv8 | 33.6 | 48.8 | 36.9 | 21.3 | 27.4 | 32.7 | 26.0 |
Yolov9-C | 2x | nan | 31.6 | 47.0 | 34.3 | 18.4 | 24.6 | 31.2 | 22.0 |
YOLOFT-S | 2x | YOLOv8 | 32.9 | 49.2 | 36.5 | 21.4 | 26.5 | 34.2 | 16.0 |
YOLOFT-L | 2x | YOLOv8 | 36.4 | 52.9 | 41.2 | 24.7 | 28.9 | 33.4 | 36.0 |
Method | Schedule | Backbone | AP | AP50 | AP75 | AP_es | AP_rs | AP_gs | | |
---|---|---|---|---|---|---|---|---|---|---|
DFF | 1x | R50 | 5.8 | 12.2 | 4.9 | 0.0 | 0.2 | 1.1 | 6.9 | 12.4 |
DFF | 1x | x101 | 10.3 | 20.8 | 9.1 | 0.0 | 0.1 | 3.4 | 13.6 | 21.8 |
FGFA | 1x | R50 | 7.5 | 14.5 | 7.1 | 0.0 | 0.2 | 1.5 | 9.6 | 17.0 |
FGFA | 1x | x101 | 13.6 | 29.2 | 10.5 | 0.0 | 0.9 | 6.3 | 17.8 | 28.5 |
SELSA | 1x | R50 | 6.7 | 12.7 | 6.4 | 0.0 | 0.2 | 1.2 | 8.6 | 15.0 |
SELSA | 1x | x101 | 11.8 | 23.0 | 11.1 | 0.0 | 0.5 | 2.7 | 14.3 | 30.2 |
TROI | 1x | R50 | 7.9 | 15.9 | 7.0 | 0.0 | 0.2 | 1.5 | 10.3 | 16.3 |
TROI | 1x | x101 | 12.0 | 23.9 | 10.4 | 0.0 | 0.1 | 4.8 | 16.6 | 24.7 |
TransVOD | 50e | R50 | 9.7 | 21.1 | 8.0 | 1.0 | 3.2 | 4.9 | 11.5 | 23.8 |
StreamYOLO | 1x | YOLOX | 18.0 | 35.0 | 16.7 | 1.6 | 5.1 | 10.6 | 22.3 | 33.9 |
FCOS | 1x | R50 | 12.4 | 24.6 | 11.5 | 1.3 | 3.1 | 4.8 | 13.8 | 30.6 |
ATSS | 1x | R50 | 13.7 | 28.2 | 11.9 | 1.5 | 4.6 | 7.2 | 16.2 | 29.9 |
YOLOX-S | 50e | YOLOX | 7.8 | 17.0 | 6.4 | 1.6 | 3.5 | 5.6 | 10.4 | 12.8 |
DyHead | 1x | R50 | 9.3 | 19.3 | 8.0 | 1.4 | 3.5 | 5.0 | 10.7 | 20.7 |
RepPoints | 1x | R50 | 13.6 | 28.3 | 11.7 | 0.7 | 3.9 | 5.4 | 16.3 | 29.0 |
Deformable-DETR | 1x | R50 | 9.8 | 20.2 | 8.4 | 2.5 | 3.7 | 5.1 | 11.9 | 19.5 |
Sparse RCNN | 1x | R50 | 8.1 | 16.6 | 7.1 | 1.0 | 2.9 | 4.5 | 9.5 | 16.0 |
Cascade RPN | 1x | R50 | 12.5 | 25.0 | 11.3 | 0.9 | 3.9 | 6.2 | 15.1 | 25.3 |
CESCE | 15e | nan | 11.0 | 23.4 | 9.3 | 1.7 | 3.5 | 4.4 | 13.0 | 23.8 |
CFINet | 1x | R50 | 12.2 | 25.8 | 10.0 | 1.0 | 3.3 | 6.3 | 15.1 | 25.8 |
Yolov8-s | 2x | YOLOv8 | 13.2 | 26.1 | 12.1 | 3.9 | 5.0 | 10.1 | 16.1 | 22.9 |
Yolov8-L | 2x | YOLOv8 | 16.0 | 31.2 | 15.2 | 3.6 | 5.1 | 9.9 | 19.7 | 27.3 |
Yolov9-C | 2x | nan | 15.5 | 30.3 | 14.3 | 1.8 | 5.8 | 9.8 | 19.1 | 33.4 |
YOLOFT-S | 2x | YOLOv8 | 14.8 | 29.4 | 13.6 | 4.4 | 6.1 | 10.8 | 16.4 | 26.2 |
YOLOFT-L | 2x | YOLOv8 | 15.8 | 31.4 | 14.4 | 4.9 | 6.5 | 11.8 | 19.4 | 25.8 |
Model | Params (M) | FLOPs (G) | Inference (ms) | Dataset | Checkpoint |
---|---|---|---|---|---|
YOLOFT-L | 45.16 | 230.14 | 36 | XS-VID | yoloft-L.pt |
YOLOFT-S | 53.58 | 13.02 | 16 | XS-VID | yoloft-S.pt |
If you have any problems with this repo or the XS-VID benchmark, please feel free to contact us at gjh_hust@hust.edu.cn 😉