April 1, 2024
- Our preprint is available on arXiv
February 27, 2024
- Our paper has been accepted to CVPR 2024!
Quantized neural networks employ reduced precision representations for both weights and activations. This quantization process significantly reduces the memory requirements and computational complexity of the network. Binary Neural Networks (BNNs) are the extreme quantization case, representing values with just one bit. Since the sign function is typically used to map real values to binary values, smooth approximations are introduced to mimic the gradients during error backpropagation. Thus, the mismatch between the forward and backward models corrupts the direction of the gradient, causing training inconsistency problems and performance degradation. In contrast to current BNN approaches, we propose to employ a binary periodic (BiPer) function during binarization. Specifically, we use a square wave for the forward pass to obtain the binary values and employ the trigonometric sine function with the same period as the square wave as a differentiable surrogate during the backward pass. We demonstrate that this approach can control the quantization error through the frequency of the periodic function and that it improves network performance. Extensive experiments validate the effectiveness of BiPer on benchmark datasets and network architectures, with improvements of up to 1% and 0.69% with respect to state-of-the-art methods in the classification task over CIFAR-10 and ImageNet, respectively.
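The forward/backward pairing described above can be illustrated with a minimal, framework-free sketch. It is not taken from the repository: the names `biper_forward` and `biper_backward` are hypothetical, and the sketch assumes the square wave is written as sign(sin(freq * w)), so that both passes share the period 2π/freq.

```python
import math


def biper_forward(w: float, freq: float) -> float:
    """Forward pass: square wave with period 2*pi/freq, output in {-1, +1}."""
    # Assumption: the square wave is sign(sin(freq * w)); ties at 0 map to +1.
    return 1.0 if math.sin(freq * w) >= 0.0 else -1.0


def biper_backward(w: float, freq: float, grad_output: float = 1.0) -> float:
    """Backward pass: gradient of the sine surrogate sin(freq * w) w.r.t. w."""
    return grad_output * freq * math.cos(freq * w)


# Hypothetical latent weights. freq=20 mirrors the --freq value in the
# training commands; that the flag is used exactly this way is an assumption.
weights = [-0.20, -0.05, 0.03, 0.10]
binary = [biper_forward(w, 20.0) for w in weights]
grads = [biper_backward(w, 20.0) for w in weights]
```

In a PyTorch setting, a pairing like this would typically be wrapped in a custom `torch.autograd.Function`, with the square wave in `forward` and the cosine-scaled gradient in `backward`.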
Clone our repo to your local machine using the following command:
git clone https://github.com/edmav4/BiPer.git
cd BiPer
Create a new conda environment using the provided environment.yml file.
conda env create --prefix ./venv -f environment.yml
conda activate ./venv
BiPer was trained on the CIFAR-10 and ImageNet datasets. You can download the datasets using the following commands:
- CIFAR-10
python cifar10/dataset/download.py --dataset cifar10 --data_path cifar10/data/CIFAR10
- ImageNet
See ImageNet for more details.
Our approach consists of a two-stage training strategy. In the first stage, the network is trained with real weights and binary features. Then, in the second stage, a warm weight initialization is employed based on the binary representation of the output weights from the first stage, and the model is fully trained to binarize the weights. Thus, the problem is split into two subproblems: weight and feature binarization.
To train stage 1, use a command similar to the following:
# Example for BiPer-ResNet18 model
python -u main.py \
--gpus 0 \
--model resnet18_1w1a \
--results_dir ./result/stage1 \
--dataset cifar10 \
--epochs 600 \
--lr 0.021 \
-b 256 \
-bt 128 \
--lr_type cos \
--warm_up \
--weight_decay 0.0016 \
--tau 0.037 \
--freq 20
See this example in run_stage1.sh, and run it with bash run_stage1.sh.
After training the first stage, you can train the second stage using the following command:
# Example for BiPer-ResNet18 model
python -u main_stage2.py \
--gpus 0 \
--model resnet18_1w1a \
--results_dir ./result/stage2 \
--dataset cifar10 \
--epochs 300 \
--lr 0.0037 \
-b 256 \
-bt 128 \
--lr_type cos \
--warm_up \
--weight_decay 0.00016 \
--tau 0.0468 \
--load_ckpt_stage1 ./result/stage1/model_best.pth.tar
Note that --load_ckpt_stage1 should be specified to load the pretrained model from the first stage. See this example in run_stage2.sh, and run it with bash run_stage2.sh.
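The two CIFAR-10 commands above can also be chained from Python so that the stage 1 checkpoint path is passed to stage 2 consistently. This is only a convenience sketch: the helper names (`stage1_cmd`, `stage2_cmd`, `run_two_stage`) are hypothetical and not part of the repository, the flags are copied from the commands above, and it assumes the script is run from the directory that contains main.py and main_stage2.py.

```python
import subprocess
from typing import List


def stage1_cmd(results_dir: str = "./result/stage1") -> List[str]:
    """Argument list for the stage 1 CIFAR-10 run (real weights, binary features)."""
    return [
        "python", "-u", "main.py",
        "--gpus", "0",
        "--model", "resnet18_1w1a",
        "--results_dir", results_dir,
        "--dataset", "cifar10",
        "--epochs", "600",
        "--lr", "0.021",
        "-b", "256",
        "-bt", "128",
        "--lr_type", "cos",
        "--warm_up",
        "--weight_decay", "0.0016",
        "--tau", "0.037",
        "--freq", "20",
    ]


def stage2_cmd(stage1_dir: str = "./result/stage1",
               results_dir: str = "./result/stage2") -> List[str]:
    """Argument list for the stage 2 run, warm-started from the stage 1 checkpoint."""
    checkpoint = stage1_dir.rstrip("/") + "/model_best.pth.tar"
    return [
        "python", "-u", "main_stage2.py",
        "--gpus", "0",
        "--model", "resnet18_1w1a",
        "--results_dir", results_dir,
        "--dataset", "cifar10",
        "--epochs", "300",
        "--lr", "0.0037",
        "-b", "256",
        "-bt", "128",
        "--lr_type", "cos",
        "--warm_up",
        "--weight_decay", "0.00016",
        "--tau", "0.0468",
        "--load_ckpt_stage1", checkpoint,
    ]


def run_two_stage() -> None:
    """Run stage 1, then stage 2; check=True stops the chain if a stage fails."""
    for cmd in (stage1_cmd(), stage2_cmd()):
        subprocess.run(cmd, check=True)
```

Calling `run_two_stage()` launches both trainings back to back; the provided run_stage1.sh and run_stage2.sh scripts remain the reference way to run them.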
To evaluate a pretrained model, you can use the following command:
# see eval.sh
python main_stage2.py \
--gpus 0 \
-e {checkpoint_path} \
--model {model_arch} \
--dataset cifar10 \
-bt 128
For example, using the pretrained BiPer-ResNet18 model:
# example ResNet18
python main_stage2.py \
--gpus 0 \
-e ./pretrained_models/biper_cifar10_resnet18_stage2/model_best.pth.tar \
--model resnet18_1w1a \
--dataset cifar10 \
-bt 128
To compute the quantization error, you can use the following command:
python compute_QE.py
Please specify the model and data path in the script.
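For readers unfamiliar with the metric, the sketch below shows one generic way to measure the quantization error of a set of values: the mean squared difference between the values and their scaled one-bit approximation. This is an assumption for illustration only; it is not taken from compute_QE.py, the function name `quantization_error` is hypothetical, and the script and paper may use a different definition.

```python
from typing import Sequence


def quantization_error(values: Sequence[float]) -> float:
    """Mean squared error between values and alpha * sign(values)."""
    if not values:
        raise ValueError("values must be non-empty")
    # Scaling factor: the mean absolute value, which minimizes this squared error.
    alpha = sum(abs(v) for v in values) / len(values)
    # Assumption: zero is mapped to +1 when taking the sign.
    signs = [1.0 if v >= 0.0 else -1.0 for v in values]
    return sum((v - alpha * s) ** 2 for v, s in zip(values, signs)) / len(values)


# Values that are already binary (up to scale) have zero error.
already_binary = quantization_error([1.0, -1.0, 1.0])
# Values with unequal magnitudes cannot be matched by a single scale.
spread_out = quantization_error([0.5, -1.5])
```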
Quantized Model | Dataset | Params (M) | Top-1 (%) | Config | Download |
---|---|---|---|---|---|
BiPer-ResNet18 | CIFAR-10 | 11.01 | 93.75 | Config File | Model, Log |
BiPer-ResNet20 | CIFAR-10 | 0.27 | 86.98 | Config File | Model, Log |
BiPer-VGG-Small | CIFAR-10 | 4.66 | 92.46 | Config File | Model, Log |
As with CIFAR-10, the training process for ImageNet is described below.
To train stage 1, use a command similar to the following:
# example BiPer-ResNet18
python main.py \
--gpus 0,1,2,3 \
--model resnet18_1w1a \
--data_path data \
--dataset imagenet \
--epochs 200 \
--lr 0.1 \
--weight_decay 1e-4 \
-b 512 \
-bt 256 \
--lr_type cos \
--freq 20 \
--warm_up \
--tau_min 0.85 \
--tau_max 0.99 \
--print_freq 250 \
--use_dali
See this example in run_stage1.sh, and run it with bash run_stage1.sh.
After training the first stage, you can train the second stage using a command similar to the following:
python main_stage2.py \
--gpus 0 \
--model resnet18_1w1a \
--data_path data \
--dataset imagenet \
--epochs 100 \
--lr 0.01 \
-b 512 \
-bt 256 \
--lr_type cos \
--weight_decay 1e-4 \
--tau_min 0.0 \
--tau_max 0.0 \
--freq 20 \
--load_ckpt_stage1 ./result/stage1/model_best.pth.tar \
--use_dali \
# --resume
See this example in run_stage2.sh, and run it with bash run_stage2.sh.
To evaluate a pretrained model, you can use the following command:
# see eval.sh
python main_stage2.py \
--gpus 0 \
-e {checkpoint_path} \
--model {model_arch} \
--dataset imagenet \
-bt 256
For example, using the pretrained BiPer-ResNet18 model:
# example BiPer-ResNet18
python main_stage2.py \
--gpus 0 \
-e pretrained_models/biper_imagenet_resnet18_stage2/model_best.pth.tar \
--model resnet18_1w1a \
--dataset imagenet \
-bt 256
Quantized Model | Dataset | Params (M) | Top-1 (%) | Config | Download |
---|---|---|---|---|---|
BiPer-ResNet18 | ImageNet1K | 11.69 | 61.40 | Config File | Model, Log |
BiPer-ResNet34 | ImageNet1K | 21.81 | 65.73 | Config File | Model, Log |
If you use the code or models from this project in your research, please cite our work as follows:
@article{vargas2024biper,
title={BiPer: Binary Neural Networks using a Periodic Function},
author={Vargas, Edwin and Correa, Claudia and Hinojosa, Carlos and Arguello, Henry},
journal={arXiv preprint arXiv:2404.01278},
year={2024}
}
BiPer is distributed under the MIT License. See LICENSE for more information.
- Edwin Vargas
  - LinkedIn: https://www.linkedin.com/in/edwin-vargas-80ab7873/
  - Twitter: @edmav47
  - Email: edwin.vargas@rice.edu
  - Webpage: https://www.researchgate.net/profile/Edwin-Vargas-13
- Carlos Hinojosa
  - LinkedIn: https://www.linkedin.com/in/phdcarloshinojosa/
  - Twitter: @CarlosH_93
  - Email: carlos.hinojosamontero@kaust.edu.sa
  - Webpage: https://carloshinojosa.me/
- Our code is based on the ReCU repository: https://github.com/z-hXu/ReCU. We thank the authors for making their code publicly available.
- This work was supported by the Vicerrectoría de Investigación y Extensión of Universidad Industrial de Santander (UIS), Colombia, under the research project VIE-3735.