This is an unofficial implementation of the papers *BitNet: Scaling 1-bit Transformers for Large Language Models* and *The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits*.
`BitLinear` and `BitLinear1_58` are candle modules for training and inference with binarized linear layers, the key building blocks of the BitNet and BitNet 1.58 architectures. These layers can effectively reduce the memory footprint and energy consumption of large language models while maintaining competitive performance compared to full-precision models.
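Concretely, the 1.58-bit variant constrains each weight to one of {-1, 0, +1} using the absmean quantization described in the BitNet 1.58 paper. The snippet below is a minimal, illustrative sketch of that scheme written against candle; the function name and return shape are assumptions for illustration and do not reflect this crate's internal implementation.

```rust
use candle_core::{Device, Result, Tensor};

/// Illustrative sketch (not the crate's internal code): absmean ternary
/// quantization from the BitNet 1.58 paper. Weights are divided by the mean
/// of their absolute values, rounded, and clamped to {-1, 0, +1}.
fn absmean_quantize(w: &Tensor, eps: f64) -> Result<(Tensor, f64)> {
    // gamma: mean absolute value of the weights (the per-tensor scale).
    let gamma = w.abs()?.mean_all()?.to_scalar::<f32>()? as f64;
    // RoundClip(W / (gamma + eps), -1, 1): ternary values in {-1, 0, +1}.
    let ternary = w
        .affine(1.0 / (gamma + eps), 0.0)?
        .round()?
        .clamp(-1.0, 1.0)?;
    Ok((ternary, gamma))
}

fn main() -> Result<()> {
    let w = Tensor::randn(0f32, 1f32, (4, 8), &Device::Cpu)?;
    let (q, scale) = absmean_quantize(&w, 1e-5)?;
    println!("scale = {scale:.4}\n{q}");
    Ok(())
}
```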
Key features:

- Quantization of weights and activations to a specified bit-width
- Support for bias addition and optional use of bias
- Option to apply quantization before or after the non-linear activation
- Configurable parameters for bit-width, epsilon value, and quantization method
To use the layer in your own code:

- Import the necessary modules:

  ```rust
  use candle_core::{Result, Tensor};
  use candle_nn::{init, VarBuilder};
  use bitnet_rs::bitlinear::{bitlinear, BitLinearConfig};
  ```
- Create a `BitLinearConfig` with your desired settings:

  ```rust
  let config = BitLinearConfig {
      bit_width: 8,
      eps: 1e-5,
      bias: true,
      use_before_nonlinear: false,
  };
  ```
- Create a `BitLinear` layer using the `bitlinear` function:

  ```rust
  let bitlinear = bitlinear(input_dim, output_dim, config, var_builder)?;
  ```
- Run a forward pass through the `BitLinear` layer (a complete end-to-end sketch follows these steps):

  ```rust
  let output = bitlinear.forward(&input_tensor)?;
  ```
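The steps above assume a `var_builder` and an `input_tensor` already exist. The sketch below puts everything together; the use of candle's `VarMap`/`VarBuilder::from_varmap` for parameter storage, the assumption that the layer exposes `forward` via candle's `Module` trait, and the layer sizes are illustrative choices, not guarantees about this crate's API.

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::{Module, VarBuilder, VarMap};
use bitnet_rs::bitlinear::{bitlinear, BitLinearConfig};

fn main() -> Result<()> {
    let device = Device::Cpu;

    // A VarMap collects the layer's trainable variables; the VarBuilder
    // hands them to the layer constructor.
    let varmap = VarMap::new();
    let var_builder = VarBuilder::from_varmap(&varmap, DType::F32, &device);

    let config = BitLinearConfig {
        bit_width: 8,
        eps: 1e-5,
        bias: true,
        use_before_nonlinear: false,
    };

    // 16 input features -> 32 output features (hypothetical sizes).
    let layer = bitlinear(16, 32, config, var_builder)?;

    // A batch of 4 random input vectors.
    let input = Tensor::randn(0f32, 1f32, (4, 16), &device)?;
    let output = layer.forward(&input)?;
    println!("output shape: {:?}", output.shape());
    Ok(())
}
```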
The `BitLinearConfig` struct allows you to customize the behavior of the `BitLinear` layer (see the sketch after this list for how `bit_width` and `eps` enter the quantization formula):

- `bit_width`: The number of bits to use for quantization (default: 8).
- `eps`: A small value used to clamp the quantized values (default: 1e-5).
- `bias`: Whether to include a bias term (default: true).
- `use_before_nonlinear`: Whether to apply quantization before or after the non-linear activation (default: false).
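For reference, `bit_width` and `eps` correspond to b and ε in the absmax activation quantization of the BitNet paper: Quant(x) = Clip(x · Q_b / γ, −Q_b + ε, Q_b − ε), where Q_b = 2^(b−1) and γ = max|x| (the paper uses a shifted, non-negative variant for activations that feed non-linear functions, which is presumably what `use_before_nonlinear` toggles). The sketch below illustrates that formula in candle; it is an assumption about how these parameters are used, not the crate's internal code.

```rust
use candle_core::{Result, Tensor};

/// Illustrative sketch (an assumption, not code from bitnet_rs): the absmax
/// activation quantization from the BitNet paper, showing where `bit_width`
/// and `eps` enter the formula.
fn absmax_quantize(x: &Tensor, bit_width: u32, eps: f64) -> Result<Tensor> {
    // Q_b = 2^(b - 1): the quantization range for b-bit activations.
    let q_b = 2f64.powi(bit_width as i32 - 1);
    // gamma = max |x| over the whole tensor (guarded against zero).
    let gamma = (x.abs()?.flatten_all()?.max(0)?.to_scalar::<f32>()? as f64).max(eps);
    // Clip(x * Q_b / gamma, -Q_b + eps, Q_b - eps).
    x.affine(q_b / gamma, 0.0)?.clamp(-q_b + eps, q_b - eps)
}
```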
You can find examples of using `BitLinear` and `BitLinear1_58` in the examples directory.
Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
If you use `BitLinear` in your research, please cite the following papers:
BitNet: Scaling 1-bit Transformers for Large Language Models

```bibtex
@article{wang2023bitnet,
  title={BitNet: Scaling 1-bit Transformers for Large Language Models},
  author={Wang, Hongyu and Ma, Shuming and Dong, Li and Huang, Shaohan and Wang, Huaijie and Ma, Lingxiao and Yang, Fan and Wang, Ruiping and Wu, Yi and Wei, Furu},
  journal={arXiv preprint arXiv:2310.11453},
  year={2023}
}
```
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

```bibtex
@article{ma2024era,
  title={The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits},
  author={Ma, Shuming and Wang, Hongyu and Ma, Lingxiao and Wang, Lei and Wang, Wenhui and Huang, Shaohan and Dong, Li and Wang, Ruiping and Xue, Jilong and Wei, Furu},
  journal={arXiv preprint arXiv:2402.17764},
  year={2024}
}
```