Weiheng Zhao¹ · Zilong Huang² · Xinggang Wang¹ · Jiashi Feng²
¹HUST Vision Lab & ²ByteDance
Why you should try SuperCLIP: it delivers significant gains with only a 0.077% increase in FLOPs and requires no extra annotated data. It substantially alleviates the performance drop of CLIP-style models under small-batch training, is fully compatible with modern CLIP variants (e.g., SigLIP, FLIP), and also yields clear improvements when integrated into multi-modal LLM frameworks such as LLaVA.
- 2025-09-19: Accepted to NeurIPS 2025. [✔]
- 2025-11-06: Code release. [✔]
# Clone the repository
git clone https://github.com/hustvl/SuperCLIP.git
cd SuperCLIP
# Install dependencies
pip install -r requirements.txt
Download the following datasets:
- Datacomp-1B: https://github.com/mlfoundations/datacomp
- ImageNet-1K: https://www.image-net.org/download.php
Update the paths in the training script to point to your local datasets:
- Set DATA_PATH to the Datacomp-1B root.
- Set VAL_DATA_PATH to the ImageNet-1K validation set.
File to edit: train.sh
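A minimal sketch of what these variables might look like inside train.sh (the variable names follow the list above; the paths are placeholders for your local setup):

```bash
# Hypothetical paths -- replace with your local dataset locations.
DATA_PATH=/path/to/datacomp-1b            # Datacomp-1B root
VAL_DATA_PATH=/path/to/imagenet-1k/val    # ImageNet-1K validation set
```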
Start SuperCLIP training with:
bash train.sh <config_path> superclip
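For example, with a hypothetical config path (substitute a config file actually provided in this repository):

```bash
# Hypothetical invocation; replace the config path with one shipped in this repo.
bash train.sh configs/superclip_vit_b16.json superclip
```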
Our codebase is built upon:
- OpenCLIP: https://github.com/mlfoundations/open_clip
- SuperClass: https://github.com/x-cls/superclass
We thank the OpenCLIP and SuperClass teams for contributing such impressive code and models to the community.
