DeFB: Decomposed Feature Learning for Real-Time Multi-Person Eyeblink Detection in Untrimmed In-the-Wild Videos
Jinfang Gan¹, Wenzheng Zeng¹,²*, Yang Xiao¹†, Xintao Zhang¹, Chaoyang Zheng¹, Ran Zhao¹, Ran Wang³,⁴, Min Du⁵, Zhiguo Cao¹

¹Huazhong University of Science and Technology, ²National University of Singapore, ³School of Journalism and Information Communication, HUST, ⁴School of Future Technology, HUST, ⁵ByteDance
DeFB achieves a superior accuracy-efficiency balance compared to other SOTA methods.
This repository contains the official implementation of the AAAI 2026 paper "DeFB: Decomposed Feature Learning for Real-Time Multi-Person Eyeblink Detection in Untrimmed In-the-Wild Videos".
- 🔍 Rethinking Unified Models: We identify two critical limitations in existing unified multi-person eyeblink detection models: (1) a feature granularity conflict between face localization and eyeblink detection, and (2) unstable face-eye feature learning during joint training.
- 🧩 Decomposed Feature Learning: We propose DeFB, which models faces and eyes in granularity-specific feature spaces, enabling fine-grained spatio-temporal modeling for eyeblink detection while keeping face localization efficient (see the conceptual sketch after this list).
- ⚡ Asynchronous Training Strategy: We adopt an asynchronous learning mechanism where eye feature learning refines well-trained coarse face features, significantly improving training stability and convergence.
- 🏆 State-of-the-Art Performance: DeFB more than doubles the previous SOTA's Blink-AP (24.65% vs. 10.11%) while improving efficiency by nearly 35%.
- 🔌 Plug-and-Play Capability: DeFB can be integrated as a plug-in to substantially augment the eyeblink detection capabilities of general action detectors.
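To make the decomposed-feature and asynchronous-training ideas above concrete, here is a minimal, illustrative PyTorch sketch. It is **not** the DeFB implementation: every module, tensor shape, and layer choice below is a placeholder, chosen only to show coarse face features feeding a fine-grained eye branch that is trained while the face branch stays frozen.

```python
# Illustrative sketch only -- NOT the actual DeFB architecture. Modules, shapes,
# and layers are placeholders for (a) granularity-specific feature spaces and
# (b) asynchronous training on top of a frozen, already-trained face branch.
import torch
import torch.nn as nn

class CoarseFaceBranch(nn.Module):
    """Coarse spatial features, sufficient for efficient face localization."""
    def __init__(self, c_in=3, c_out=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, frames):            # (B*T, 3, H, W)
        return self.net(frames)           # (B*T, C, H/8, W/8)

class FineEyeBranch(nn.Module):
    """Fine-grained spatio-temporal features for per-frame blink scoring."""
    def __init__(self, c_in=64, c_out=64):
        super().__init__()
        self.spatial = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.temporal = nn.Conv1d(c_out, c_out, 3, padding=1)
        self.head = nn.Linear(c_out, 1)
    def forward(self, face_feats, batch, clip_len):
        x = torch.relu(self.spatial(face_feats))            # (B*T, C, h, w)
        x = x.mean(dim=(2, 3)).view(batch, clip_len, -1)    # (B, T, C)
        x = torch.relu(self.temporal(x.transpose(1, 2)))    # (B, C, T)
        return self.head(x.transpose(1, 2)).squeeze(-1)     # (B, T) logits

face = CoarseFaceBranch()
eye = FineEyeBranch()

# ... phase 1: train `face` with the face localization objective ...

for p in face.parameters():       # phase 2: freeze the coarse face features
    p.requires_grad_(False)

opt = torch.optim.AdamW(eye.parameters(), lr=1e-4)
B, T = 2, 16
clip = torch.randn(B * T, 3, 256, 256)            # dummy video clip
blink_labels = torch.randint(0, 2, (B, T)).float()

with torch.no_grad():
    feats = face(clip)                            # stable coarse face features
scores = eye(feats, B, T)                         # fine-grained blink logits
loss = nn.functional.binary_cross_entropy_with_logits(scores, blink_labels)
loss.backward()
opt.step()
```

The key point is the two-phase schedule: the face branch is optimized first and then frozen, so eye feature learning refines a stable coarse representation instead of competing with it during joint training.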
- Create a new conda environment:

  ```bash
  conda create -n defb python=3.9
  conda activate defb
  ```
- Install PyTorch (2.0.1+ is recommended):

  ```bash
  pip install "torch>=2.0.1" "torchvision>=0.15.2"
  ```
- Install other dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Download the MPEblink dataset from Zenodo.
- Organize the dataset as follows:

  ```
  data/
  └── mpeblink/
      ├── videos/
      │   ├── train/
      │   └── val/
      ├── annotations/
      │   ├── train.json
      │   └── val.json
      └── raw_frames/   # Generated in next step
  ```
- Convert videos to raw frames:

  ```bash
  python tools/mpeblink_build_raw_frames_dataset.py --root $YOUR_DATA_PATH
  ```
- Update the dataset path in `configs/dataset/mpeblink.yml` (a quick sanity check is sketched below).
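As an optional sanity check before training, you can dump the config and confirm that the expected MPEblink folders exist. This is only a sketch, not part of the official tooling: it assumes the config is plain YAML (PyYAML required) and makes no assumption about its key names, so inspect the printed output rather than relying on specific fields.

```python
# Optional sanity check -- a sketch only, not part of the official tooling.
# Assumes configs/dataset/mpeblink.yml is plain YAML and that the dataset
# lives under data/mpeblink/ as organized above.
from pathlib import Path
import yaml

with open("configs/dataset/mpeblink.yml") as f:
    cfg = yaml.safe_load(f)
print(yaml.dump(cfg, default_flow_style=False))  # inspect the configured dataset paths

# Confirm the expected MPEblink layout exists before launching training.
for p in [
    "data/mpeblink/videos/train",
    "data/mpeblink/annotations/train.json",
    "data/mpeblink/raw_frames",
]:
    print(f"{p}: {'OK' if Path(p).exists() else 'MISSING'}")
```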
We provide a video introduction of our work:
We provide a complete pipeline script `run_mpeblinkv1.sh` that includes all stages:

```bash
bash run_mpeblinkv1.sh
```

The pipeline consists of the following stages:
Stage 1: Facial Modeling Training

```bash
# First phase training (blink_len=10)
torchrun --nproc_per_node=2 tools/train.py \
    -c configs/rtdetrv2/rtdetrv2_r50vd_mpeblink_trainval.yml \
    --use-amp \
    --seed=0

# Second phase training (blink_len=30)
torchrun --nproc_per_node=2 tools/train.py \
    -c configs/rtdetrv2/rtdetrv2_r50vd_mpeblink_trainval_30.yml \
    --use-amp \
    --seed=0 \
    -r output/rtdetrv2_r50vd_mpeblink_trainval/checkpoint.pth
```

Stage 2: Inference on Training Set
```bash
# Inference on validation set
python test.py -c configs/rtdetrv2/rtdetrv2_r50vd_mpeblink_trainval_30.yml \
    -r output/rtdetrv2_r50vd_mpeblink_trainval_30/checkpoint.pth

# Inference on training set for blink module
python infer_trainset.py -c configs/rtdetrv2/rtdetrv2_r50vd_mpeblink_trainval_30.yml \
    -r output/rtdetrv2_r50vd_mpeblink_trainval_30/checkpoint.pth
```

Stage 3: Blink Module Training
```bash
# Split dataset for blink detection
python BlinkModel/split_dataset.py

# Train blink detection module
python BlinkModel/train_blink_detector.py \
    -c configs/BlinkModule/blink_module.yml
```

Stage 4: Evaluation
```bash
# Full model testing
python BlinkModel/test_eval.py \
    -c configs/BlinkModule/blink_module.yml \
    --track_result output/rtdetrv2_r50vd_mpeblink_trainval_30/val_results.json

# Convert results with threshold
python tools/instblink_plus_result_convertor_args.py \
    --input output/blink_results.json \
    --output output/final_results.json \
    --threshold 0.07

# Evaluate on MPEblink
python tools/eval_mpeblink.py \
    --pred output/final_results.json \
    --gt data/mpeblink/annotations/val.json
```

Comparison with state-of-the-art methods on MPEblink:

| Type | Method | Blink-AP | Blink-AP0.5 | Blink-AP0.75 | Blink-AP0.95 | Inst-AP |
|---|---|---|---|---|---|---|
| Multi-stage | BlinkFormer | 4.69 | 19.95 | 0.54 | 0.00 | 56.70 |
| Unified | InstBlink | 10.11 | 27.19 | 7.16 | 0.62 | 67.89 |
| Unified | DeFB (Ours) | 24.65 | 44.17 | 24.62 | 4.40 | 76.07 |
Efficiency comparison:

| Method | Time per image (ms) |
|---|---|
| Multi-stage methods | T (=9.3ms) + latency × #faces |
| InstBlink | 8.9 + D (=2.6ms) |
| DeFB (Ours) | 6.1 + D (=2.6ms) |
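Reading the table: a multi-stage pipeline pays the 9.3 ms face-detection cost plus an extra per-face latency for every person in the frame, so its runtime grows with the number of faces, whereas DeFB's cost stays at roughly 6.1 + 2.6 = 8.7 ms per image regardless of how many people are present.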
This code is built upon RT-DETRv2 and InstBlink. We thank the authors for their excellent work.
If you find our work useful in your research, please consider citing our paper:
```bibtex
@inproceedings{gan2026defb,
  title={DeFB: Decomposed Feature Learning for Real-Time Multi-Person Eyeblink Detection in Untrimmed In-the-Wild Videos},
  author={Gan, Jinfang and Zeng, Wenzheng and Xiao, Yang and Zhang, Xintao and Zheng, Chaoyang and Zhao, Ran and Wang, Ran and Du, Min and Cao, Zhiguo},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}
```

If you use the MPEblink dataset, please also cite:
```bibtex
@inproceedings{zeng2023real,
  title={Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed Video},
  author={Zeng, Wenzheng and Xiao, Yang and Wei, Sicheng and Gan, Jinfang and Zhang, Xintao and Cao, Zhiguo and Fang, Zhiwen and Zhou, Joey Tianyi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={13854--13863},
  year={2023}
}
```

This project is released under the Apache 2.0 license.
For questions and suggestions, please open an issue or contact Jinfang Gan (jinfangan@hust.edu.cn).

