PCoreSet: Effective Active Learning through Knowledge Distillation from Vision-Language Models

Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Dong Sup Kim, Sung Ju Hwang

[Project Page] [Paper]

  • ✨ Introduces ActiveKD, a framework for active learning under knowledge distillation, together with the PCoreSet selection strategy
  • 🎯 Enables efficient knowledge transfer from Vision-Language Models to smaller student models
  • 🔄 Combines active learning strategies with knowledge distillation for more effective sample selection

🔥 News

  • [2025.06.03] 📦 Code released!
  • [2025.06.01] 📄 ActiveKD paper released on arXiv.

🛠️ Code

🚀 Training ActiveKD

We provide convenient script files to run all experiments:

  • scripts_others_zeroshot.sh: Run zero-shot experiments on other datasets
  • scripts_others_fewshot.sh: Run few-shot experiments on other datasets
  • scripts_imgnet_zeroshot.sh: Run all ImageNet zero-shot experiments
  • scripts_imgnet_fewshot.sh: Run all ImageNet few-shot experiments

These scripts run multiple experiments in batch; the individual training commands below can be used to run specific experiments one at a time.
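For example, assuming the scripts are executable shell scripts at the repository root (a minimal sketch; script arguments, if any, are not documented):

bash scripts_others_zeroshot.sh   # all zero-shot runs on the non-ImageNet datasets
bash scripts_imgnet_fewshot.sh    # all ImageNet few-shot runs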

ImageNet

Zero-shot teacher distillation:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=29500 train_imgnet.py \
    --dataset imagenet \
    --shots 1 \
    --teacher_type zs \
    --batch_size 256 \
    --train_epoch 10 \
    --lr 0.001 \
    | tee ./logs/imagenet/imgnet_zeroshot.log
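Note that each training command pipes its output through tee, which does not create missing directories; create the log directory first (a minimal sketch, assuming the ./logs layout used in this README):

mkdir -p ./logs/imagenet ./logs/caltech101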

Few-shot teacher (CLAP) distillation:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=29500 train_imgnet.py \
    --dataset imagenet \
    --shots 1 \
    --teacher_type fs \
    --teacher_ckpt ./ckpt/fewshot_teacher/imagenet/tip_adapter_1shot/best_tip_adapter_F_1shots_round0.pt \
    --batch_size 256 \
    --train_epoch 10 \
    --lr 0.001 \
    | tee ./logs/imagenet/imgnet_fewshot.log

Main parameters explained:

  • --shots: Initial number of samples per class to train with (default: 1, options: 1, 2, 4, 8, 16)
  • --teacher_type: Type of teacher model
    • zs: Zero-shot CLIP teacher (no fine-tuning)
    • fs: Few-shot teacher (requires checkpoint path)
  • --teacher_ckpt: Path to the few-shot teacher checkpoint
  • --root_path: Path to the dataset directory
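To illustrate how these flags combine, a hypothetical single-GPU, 16-shot run with the zero-shot teacher might look as follows (a sketch only; the flags mirror the commands above, and the log file name is a placeholder):

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29500 train_imgnet.py \
    --dataset imagenet \
    --shots 16 \
    --teacher_type zs \
    --batch_size 256 \
    --train_epoch 10 \
    --lr 0.001 \
    | tee ./logs/imagenet/imgnet_zeroshot_16shot.log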

Other Datasets (few-shot setting)

Zero-shot teacher distillation:

CUDA_VISIBLE_DEVICES=0 python train_others.py \
    --dataset caltech101 \
    --shots 1 \
    --teacher_type zs \
    --student_model res18 \
    --batch_size 64 \
    --train_epoch 10 \
    --lr 0.001 \
    --root_path ./data \
    | tee ./logs/caltech101/others_zeroshot.log

Few-shot teacher distillation:

CUDA_VISIBLE_DEVICES=1 python train_others.py \
    --dataset caltech101 \
    --shots 1 \
    --teacher_type fs \
    --teacher_ckpt ./ckpt/fewshot_teacher/caltech101/tip_adapter_1shot/best_tip_adapter_F_1shots_round0.pt \
    --student_model res18 \
    --batch_size 64 \
    --train_epoch 10 \
    --lr 0.001 \
    --root_path ./data \
    | tee ./logs/caltech101/others_fewshot.log

Parameters for other datasets:

  • --dataset: Name of the dataset (e.g., caltech101, oxford_pets, stanford_cars, etc.)
  • --shots: Initial number of samples per class to train with (default: 1, options: 1, 2, 4, 8, 16)
  • --teacher_type: Type of teacher model
    • zs: Zero-shot CLIP teacher (no fine-tuning)
    • fs: Few-shot teacher (requires checkpoint path)
  • --teacher_ckpt: Path to the few-shot teacher checkpoint
  • --student_model: Student architecture
    • res18: ResNet-18 (lightweight option)
    • mobilenet: MobileNet
  • --root_path: Path to the dataset directory
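For instance, switching to the MobileNet student on another dataset changes only two flags relative to the Caltech101 command above (a hedged sketch; oxford_pets and mobilenet are taken from the option lists above, and the log path is a placeholder):

CUDA_VISIBLE_DEVICES=0 python train_others.py \
    --dataset oxford_pets \
    --shots 1 \
    --teacher_type zs \
    --student_model mobilenet \
    --batch_size 64 \
    --train_epoch 10 \
    --lr 0.001 \
    --root_path ./data \
    | tee ./logs/oxford_pets/others_zeroshot.log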

Note: For few-shot teacher distillation on other datasets, ensure that the --teacher_ckpt path points to the correct pre-trained few-shot teacher model for your specific dataset and shot setting.
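Judging from the two example checkpoint paths above, the few-shot teacher checkpoints appear to follow the naming convention below (an inferred pattern, not documented by the authors; verify it against your ./ckpt directory):

# Inferred layout: ./ckpt/fewshot_teacher/<dataset>/tip_adapter_<K>shot/best_tip_adapter_F_<K>shots_round0.pt
DATASET=oxford_pets   # hypothetical dataset name
SHOTS=4               # hypothetical shot setting
CKPT=./ckpt/fewshot_teacher/${DATASET}/tip_adapter_${SHOTS}shot/best_tip_adapter_F_${SHOTS}shots_round0.pt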

📝 Citation

@article{kang2025pcoreset,
  title={PCoreSet: Effective Active Learning through Knowledge Distillation from Vision-Language Models},
  author={Kang, Seongjae and Lee, Dong Bok and Jang, Hyungjoon and Kim, Dongseop and Hwang, Sung Ju},
  journal={arXiv preprint arXiv:2506.00910},
  year={2025}
}

🙏 Acknowledgement

We appreciate the open-source implementations from DHO, Tip-Adapter, CLIP, and CLAP.
