The official pytorch implementation of KDFS.
Paper link: Filter Pruning for Efficient CNNs via Knowledge-driven Differential Filter Sampler
The illustration of KDFS. Its key components lie in differential filter sampler, masked filter modeling between teacher output and student decoder, and FLOPs regularization term
If you find KDFS useful in your research, please consider citing:
@article{lin2023filter,
title={Filter Pruning for Efficient CNNs via Knowledge-driven Differential Filter Sampler},
author={Lin, Shaohui and Huang, Wenxuan and Xie, Jiao and Zhang, Baochang and Shen, Yunhang and Yu, Zhou and Han, Jungong and Doermann, David},
journal={arXiv preprint arXiv:2307.00198},
year={2023}
}
- Pytorch 1.10
- tensorboard 2.11.0
- nvidia-dali-cuda110 1.23.0 (optional)
First you need to download the pre-trained model, you can get the pre-trained model we used at the following link:
Baidu Wangpan (password: zshb)
The performance of the pre-trained models are shown below:
model | FLOPs | Params | TOP-1 Acc. |
---|---|---|---|
resnet_56_cifar10 | 125.49M | 0.85M | 93.26% |
resnet_110_cifar10 | 252.89M | 1.72M | 93.50% |
resnet_56_cifar100 | 125.49M | 0.85M | 71.33% |
resnet_50_imagenet | 4134M | 25.50M | 76.15% |
Place the pre-trained model in folder teacher_dir
.
We provide training scripts for the models at different compression rates, they are in folder run
.
You can train directly using the following script, for example, to train resnet56 on cifar10:
bash run/run_resnet56_cifar10/run_resnet56_cifar10_prune1.sh
The details of the script are as follows, you need to change the path and such parameters:
run/run_resnet56_cifar10/run_resnet56_cifar10_prune1.sh:
arch=resnet_56 # Model name
result_dir=result/run_resnet56_cifar10_prune1 # The path where you want to go to save the results
dataset_dir=dataset_cifar10 # dataset path
dataset_type=cifar10 # dataset type
teacher_ckpt_path=teacher_dir/resnet_56.pt # path of the pre-trained model
device=0 # gpu id
CUDA_VISIBLE_DEVICES=$device python main.py \
--phase train \
--dataset_dir $dataset_dir \
--dataset_type $dataset_type \
--num_workers 8 \
--pin_memory \
--device cuda \
--arch $arch \
--seed 3407 \
--result_dir $result_dir \
--teacher_ckpt_path $teacher_ckpt_path \
--num_epochs 350 \
--lr 1e-2 \
--warmup_steps 20 \
--warmup_start_lr 1e-4 \
--lr_decay_T_max 350 \
--lr_decay_eta_min 1e-4 \
--weight_decay 1e-4 \
--train_batch_size 256 \
--eval_batch_size 256 \
--target_temperature 3 \
--gumbel_start_temperature 1 \
--gumbel_end_temperature 0.1 \
--coef_kdloss 0.05 \
--coef_rcloss 1000 \
--coef_maskloss 10000 \
--compress_rate 0.57 \
&& \
CUDA_VISIBLE_DEVICES=$device python main.py \
--phase finetune \
--dataset_dir $dataset_dir \
--dataset_type $dataset_type \
--num_workers 8 \
--pin_memory \
--device cuda \
--arch $arch \
--seed 3407 \
--result_dir $result_dir \
--finetune_student_ckpt_path $result_dir"/student_model/"$arch"_sparse_last.pt" \
--finetune_num_epochs 50 \
--finetune_lr 1e-4 \
--finetune_warmup_steps 10 \
--finetune_warmup_start_lr 1e-6 \
--finetune_lr_decay_T_max 50 \
--finetune_lr_decay_eta_min 1e-6 \
--finetune_weight_decay 1e-4 \
--finetune_train_batch_size 256 \
--finetune_eval_batch_size 256 \
--sparsed_student_ckpt_path $result_dir"/student_model/finetune_"$arch"_sparse_best.pt" \
The performance corresponding to these scripts is as follows:
cifar10:
script | baseline FLOPs | FLOPs (FLOPs reduction) | TOP-1 Acc. |
---|---|---|---|
run_resnet56_cifar10_prune1.sh | 125.49M | 74.22M (40.85%) | 93.78% |
run_resnet56_cifar10_prune2.sh | 125.49M | 61.25M (51.19%) | 93.58% |
run_resnet56_cifar10_prune3.sh | 125.49M | 51.24M (59.17%) | 93.19% |
run_resnet110_cifar10_prune1.sh | 252.89M | 122.61M (51.52%) | 94.23% |
run_resnet110_cifar10_prune2.sh | 252.89M | 98.80M (60.93%) | 93.65% |
cifar100:
script | baseline FLOPs | FLOPs (FLOPs reduction) | TOP-1 Acc. |
---|---|---|---|
run_resnet56_cifar100_prune1.sh | 125.49M | 60.26M (51.98%) | 71.65% |
imagenet:
script | baseline FLOPs | FLOPs (FLOPs reduction) | TOP-1 Acc. | TOP-5 Acc. |
---|---|---|---|---|
run_resnet50_imagenet_prune1.sh | 4134M | 2384M (42.32%) | 76.26% | 93.07% |
run_resnet50_imagenet_prune2.sh | 4134M | 1845M (55.36%) | 75.80% | 92.66% |
We provide trained weights and they can be accessed from the following links:
Baidu Wangpan (password: 9qyz)
You can use the test scripts we provide in folder run
to get the performance of the trained weights, for example, to test resnet56 on cifar10:
bash run/run_resnet56_cifar10/test_resnet56_cifar10.sh
The details of the script are as follows, you need to change the path and such parameters:
run/run_resnet56_cifar10/test_resnet56_cifar10.sh:
arch=resnet_56 # Model name
dataset_dir=dataset_cifar10 # dataset path
dataset_type=cifar10 # dataset type
ckpt_path=ckpt_path # The weight path you want to test
device=0 # gpu id
CUDA_VISIBLE_DEVICES=$device python main.py \
--phase test \
--dataset_dir $dataset_dir \
--dataset_type $dataset_type \
--num_workers 8 \
--pin_memory \
--device cuda \
--arch $arch \
--test_batch_size 256 \
--sparsed_student_ckpt_path $ckpt_path \
If you find any problems, please feel free to contact to the authors (osilly0616@gmail.com).