StableQAT

This repository is the PyTorch implementation of StableQAT, a simple and effective quantization-aware training (QAT) framework that stabilizes training in ultra-low-bit settings. Its core is a lightweight, theoretically grounded surrogate gradient for backpropagation, derived from a discrete Fourier analysis of the rounding operator. StableQAT strictly generalizes the straight-through estimator (STE): STE arises as a special case of the more expressive surrogate family, whose gradients are smooth, bounded, and inexpensive to compute, which improves training performance and stability across a range of hyperparameter choices. In experiments at 2-4 bits on Llama and ViT models, StableQAT trains more stably and outperforms other QAT techniques with negligible training overhead.
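To give a feel for how a Fourier view of rounding can yield a surrogate gradient that contains STE as a special case, here is a small sketch. It is not the surrogate implemented in this repository (see the paper and the models folder for that). It only assumes the textbook Fourier series of the sawtooth x - round(x), and the names fourier_round, fourier_round_grad, num_terms, and amplitude are hypothetical.

```python
import math


def fourier_round(x: float, num_terms: int) -> float:
    """Truncated Fourier approximation of round(x).

    Uses x - round(x) = sum_{k>=1} (-1)^(k+1) * sin(2*pi*k*x) / (pi*k),
    the standard series of a period-1 sawtooth (illustrative assumption).
    """
    sawtooth = sum(
        (-1) ** (k + 1) * math.sin(2.0 * math.pi * k * x) / (math.pi * k)
        for k in range(1, num_terms + 1)
    )
    return x - sawtooth


def fourier_round_grad(x: float, num_terms: int = 1, amplitude: float = 1.0) -> float:
    """Derivative of the truncated series, usable as a surrogate gradient.

    `amplitude` is a hypothetical damping knob on the oscillating terms.
    With num_terms=0 or amplitude=0 the result is the constant 1,
    which is exactly the straight-through estimator.
    """
    grad = 1.0
    for k in range(1, num_terms + 1):
        grad -= amplitude * 2.0 * (-1) ** (k + 1) * math.cos(2.0 * math.pi * k * x)
    return grad


ste_grad = fourier_round_grad(0.3, num_terms=0)       # constant 1, i.e. STE
boundary_grad = fourier_round_grad(0.5, num_terms=1)  # largest at a rounding boundary
integer_grad = fourier_round_grad(0.0, num_terms=1)   # smallest at an integer
```

In this sketch the gradient is a finite sum of cosines, so it is smooth and bounded for any fixed num_terms, and it costs one trigonometric evaluation per term. It peaks at the rounding boundaries (x = 0.5, 1.5, ...), where the true rounding function jumps.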

(The current release supports models of the Llama-3.2-1B series.)

Setup

conda create -yn stableqat python=3.11
conda activate stableqat

pip install -r requirements.txt

Train

Step 1: Sample data

Sample from SlimPajama

python sample_data/sample_slim_pajama.py

Sample from FineWeb-Edu

python sample_data/sample_fineweb_edu.py

Step 2: Pre-tokenize

python pre_tokenize.py \
--tokenize_config_file $tokenize_config_file \
--train_data_dir $train_data_dir \
--train_data_file $train_data_file \
--tokenized_data_dir $tokenized_data_dir

The $tokenize_config_file argument refers to a config file in the tokenize_configs folder.

Step 3: Train

torchrun --nnodes=2 --nproc_per_node=8 train.py \
--train_config_file $train_config_file \
--train_data_dir $train_data_dir \
--tokenized_data_name $tokenized_data_dir \
--output_dir $output_dir

The $train_config_file argument refers to a config file in the train_configs folder.
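With --nnodes=2 and --nproc_per_node=8, torchrun starts 16 worker processes in total and gives each one its identity through the RANK, LOCAL_RANK, and WORLD_SIZE environment variables. How train.py consumes them is not shown here, so the helper below (distributed_context, a hypothetical name) is only a sketch of how a script typically reads them, which can help when adapting the command to a different number of nodes or GPUs.

```python
import os


def distributed_context(env=None) -> dict:
    """Read the per-process identity that torchrun exports.

    Falls back to single-process defaults when the variables are absent,
    e.g. when the script is started with plain `python`.
    """
    env = os.environ if env is None else env
    rank = int(env.get("RANK", 0))              # global index of this process
    local_rank = int(env.get("LOCAL_RANK", 0))  # index within this node, usually the GPU id
    world_size = int(env.get("WORLD_SIZE", 1))  # nnodes * nproc_per_node
    return {
        "rank": rank,
        "local_rank": local_rank,
        "world_size": world_size,
        "is_main": rank == 0,  # common convention for logging and checkpointing
    }


# Simulated environment of one worker on the second node of a 2 x 8 launch.
ctx = distributed_context({"RANK": "9", "LOCAL_RANK": "1", "WORLD_SIZE": "16"})
```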

Evaluation

python -m lm_eval \
--model hf \
--tasks piqa,winogrande,arc_challenge,hellaswag,arc_easy,sciq,openbookqa,boolq \
--batch_size auto \
--model_args pretrained=$output_dir/last_checkpoint \
--device cuda:0 \
--output_path $output_dir/evaluation_results
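Once the run finishes, the per-task scores can be collected from the JSON report. The sketch below assumes the report has a top-level "results" mapping from task name to metrics keyed like "acc,none", as recent versions of lm-evaluation-harness write it. The exact file name and location under --output_path differ between versions, so the path is passed in explicitly. load_report and summarize_accuracy are hypothetical helper names.

```python
import json
from pathlib import Path


def load_report(path) -> dict:
    """Load one JSON report written by the evaluation run."""
    with Path(path).open() as f:
        return json.load(f)


def summarize_accuracy(report: dict, metric: str = "acc,none"):
    """Return per-task scores for `metric` and their unweighted mean.

    Tasks that do not report `metric` are skipped (assumed report layout:
    report["results"][task][metric]).
    """
    scores = {
        task: metrics[metric]
        for task, metrics in report.get("results", {}).items()
        if metric in metrics
    }
    average = sum(scores.values()) / len(scores) if scores else float("nan")
    return scores, average


# Placeholder report with dummy values, only to show the expected shape.
dummy_report = {
    "results": {
        "piqa": {"acc,none": 0.5, "alias": "piqa"},
        "boolq": {"acc,none": 0.75, "alias": "boolq"},
    }
}
scores, average = summarize_accuracy(dummy_report)
```

For a real run, replace dummy_report with load_report(path_to_json), where path_to_json points at the report file found under $output_dir/evaluation_results.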

Efficiency Comparison

To compare the time cost of StableQAT, DSQ, and ParetoQ on Llama-3.2-1B, run:

python efficiency_benchmark/compare_model_time_cost.py

Citation

If you find this work useful, please cite our paper:

@article{chen2026stableqat,
  title={StableQAT: Stable Quantization-Aware Training at Ultra-Low Bitwidths},
  author={Chen, Tianyi and Chen, Sihan and Qu, Xiaoyi and Zhao, Dan and Yan, Ruomei and Ko, Jongwoo and Liang, Luming and Cameron, Pashmina},
  journal={arXiv preprint arXiv:2601.19320},
  year={2026}
}
