Authors: Qing Zhou, Tao Yang, Bingxuan Zhao, Hongyuan Zhang, Junyu Gao, Qi Wang
RePB achieves lossless dynamic pruning by resolving score drift and gradient bias, requiring only 3 lines of code to integrate.
Supported pruning methods: Random, InfoBatch, SeTa, and RePB.
```shell
pip install git+https://github.com/mrazhou/RePB
```

Or you can clone this repo and install it locally:
```shell
git clone https://github.com/mrazhou/RePB
cd RePB
pip install -e .
```

To adapt your code to RePB/SeTa/InfoBatch, change just the following three lines:
```python
from repb import prune

# 1. Wrap the dataset
train_data = prune(train_data, args)  # args: prune_type, epochs, ...

# 2. Pass the sampler to the DataLoader
train_loader = DataLoader(train_data, sampler=train_data.sampler)

for epoch in range(args.epochs):
    for batch in train_loader:
        ...
        # 3. Update the loss
        loss = train_data.update(loss)
```

In this repository, we provide two examples to demonstrate the usage of RePB: CIFAR10/CIFAR100 (supporting resnet18/50/101) and ImageNet (supporting various CNN/Transformer/Mamba models).
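For intuition, here is a minimal self-contained sketch of how a score-based dynamic-pruning wrapper of this shape might work internally. The class name `PruneWrapper`, the `keep_ratio` argument, and the per-sample `update(index, loss)` signature are illustrative assumptions for this sketch only, not the actual RePB implementation or API.

```python
# Illustrative sketch of a dynamic-pruning dataset wrapper.
# NOT the real RePB code: names and signatures are hypothetical.
import random


class PruneWrapper:
    """Wraps a dataset and exposes a score-based subset each epoch."""

    def __init__(self, data, keep_ratio=0.6):
        self.data = list(data)
        self.keep_ratio = keep_ratio
        # One score per sample; start uniform so every sample may be kept.
        self.scores = [1.0] * len(self.data)

    @property
    def sampler(self):
        """Return shuffled indices of the kept (unpruned) subset."""
        k = max(1, int(len(self.data) * self.keep_ratio))
        ranked = sorted(range(len(self.data)),
                        key=lambda i: self.scores[i], reverse=True)
        kept = ranked[:k]          # keep the highest-scoring samples
        random.shuffle(kept)
        return kept

    def update(self, index, loss):
        """Record the latest per-sample loss as its pruning score."""
        self.scores[index] = loss
        return loss


# Usage: iterate the kept subset, train, and feed losses back.
data = [(x, x % 2) for x in range(10)]        # toy (input, label) pairs
train_data = PruneWrapper(data, keep_ratio=0.5)
for index in train_data.sampler:
    x, y = train_data.data[index]
    loss = abs(x - y) / 10.0                  # stand-in for a real loss
    train_data.update(index, loss)
```

The sketch only conveys the control flow (wrap, sample, update); the actual library additionally has to correct the score-drift and gradient-bias effects described above to make pruning lossless.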
- CIFAR10/CIFAR100 (for exploratory research):

```shell
# Lossless pruning with RePB
bash scripts/cifar.sh RePB 0.6
# With SeTa
bash scripts/cifar.sh SeTa 0.1
# With InfoBatch
bash scripts/cifar.sh InfoBatch 0.5
# Static random pruning
bash scripts/cifar.sh Static
```

- ImageNet (for large-scale, cross-architecture validation):
```shell
# Lossless pruning
bash scripts/imagenet.sh
# CNNs
bash scripts/imagenet.sh RePB mobilenetv3_small_050
# Transformers
bash scripts/imagenet.sh RePB vit_tiny_patch16_224
# Vim: refer to https://github.com/hustvl/Vim for more details
```

If you find this repository helpful, please consider citing our paper:
```bibtex
@inproceedings{zhou2026inconsistency,
  title={Inconsistency Biases in Dynamic Data Pruning},
  author={Qing Zhou and Tao Yang and Bingxuan Zhao and Hongyuan Zhang and Junyu Gao and Qi Wang},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
```

and the SeTa and InfoBatch papers:
```bibtex
@inproceedings{zhou2025scale,
  title={Scale Efficient Training for Large Datasets},
  author={Zhou, Qing and Gao, Junyu and Wang, Qi},
  booktitle={CVPR},
  pages={20458--20467},
  year={2025}
}

@inproceedings{qin2024infobatch,
  title={InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning},
  author={Qin, Ziheng and Wang, Kai and Zheng, Zangwei and Gu, Jianyang and Peng, Xiangyu and Xu, Zhaopan and Zhou, Daquan and Shang, Lei and Sun, Baigui and Xie, Xuansong and You, Yang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}
```

Many thanks to InfoBatch for providing the basic three-lines-of-code framework for dynamic pruning.

