This repository contains the official implementation, pretraining code, and evaluation scripts for Avey-B, as presented in the paper "Avey-B" (ICLR 2026).
Abstract: Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention’s ability to deliver high-quality bidirectional contextualization with sequence-level parallelism, as popularized by BERT-style architectures. Recently, Avey was introduced as an autoregressive, attention-free alternative that naturally admits an encoder-only adaptation. In this paper, we reformulate Avey for the encoder-only paradigm and propose several innovations to its architecture, including decoupled static and dynamic parameterizations, stability-oriented normalization, and neural compression. Results show that this reformulated architecture compares favorably to four widely used Transformer-based encoders, consistently outperforming them on standard token-classification and information-retrieval benchmarks while scaling more efficiently to long contexts.
```
.
├── avey_b/              # Core implementation of the Avey-B model architecture
├── EncodEval/           # Evaluation framework (SC, TC, QA, IR benchmarks)
│   └── neobert/         # Custom implementations for NeoBERT baseline comparisons
├── bench_latency.py     # Script for benchmarking inference latency
├── bench_throughput.py  # Script for benchmarking training/inference throughput
├── setup.sh             # Environment setup script
├── train.sh             # Training launcher script
├── train_mlm.py         # Masked Language Modeling (MLM) pretraining script
└── pyproject.toml       # Dependency management via uv
```
The codebase is tested on Ubuntu 22.04 using NVIDIA A100 and H100 GPUs. Python environments are managed using uv for strict reproducibility.
1. Clone the repository:

   ```bash
   git clone https://github.com/rimads/avey-b
   cd avey-b
   ```

2. Initialize the environment: the provided `setup.sh` script installs system dependencies (including `awscli`), installs `uv`, and syncs the Python environment defined in `pyproject.toml`.

   ```bash
   bash setup.sh
   ```

3. Activate the environment:

   ```bash
   source .venv/bin/activate
   ```
We provide scripts to pretrain Avey-B from scratch using the Masked Language Modeling (MLM) objective. Note that running pretraining will download the dataset specified in `dataloader.py` (`sample-10BT` from `HuggingFaceFW/fineweb` by default).
1. Configuration:
   - Either log in to Weights & Biases with `wandb login` or disable it with `wandb disabled`.
   - Logging in to Hugging Face with `hf auth login` is recommended, to prevent rate-limit errors while downloading the datasets.

2. Model config: adjust the model hyperparameters inside `train_mlm.py` (approx. line 242) if needed.

3. Launch training: use `train.sh` to automatically detect available GPUs and launch the training run. You can control the per-device batch size via environment variables.

   ```bash
   # Example: set batch size to 16 (fits on 80GB VRAM)
   export BATCH_SIZE=16
   bash train.sh
   ```

Note: `train.sh` handles single-node multi-GPU setups. For multi-node training, please invoke `torchrun` manually with the appropriate rendezvous arguments.
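As noted above, multi-node training requires invoking `torchrun` directly. The sketch below shows the general shape of such a launch; the node count, GPU count per node, rank, rendezvous host/port, and job ID are all placeholders, and the assumption that `train_mlm.py` is the entry point mirrors the single-node setup.

```shell
# Illustrative two-node launch; every value below is a placeholder.
# Run this on node 0, and the same command with --node_rank=1 on node 1.
torchrun \
  --nnodes=2 \
  --nproc_per_node=8 \
  --node_rank=0 \
  --rdzv_backend=c10d \
  --rdzv_endpoint=node0.example.com:29500 \
  --rdzv_id=avey-b-mlm \
  train_mlm.py
```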
Our evaluation framework is adapted from EncodEval. Navigate to the `EncodEval` directory (`cd EncodEval`) to run evals.

If you intend to run long-range Needle-In-A-Haystack (NIAH) benchmarks, first generate the data:

```bash
python gen_niah.py
```

1. Open `EncodEval/run.py` and specify:
   - `model_name`: local path or HuggingFace ID (e.g., `google-bert/bert-base-uncased`).
   - `learning_rates`: list of learning rates to sweep.
   - Benchmarks and random seeds.

2. Ensure YAML configurations for your chosen benchmarks, learning rates, and seeds exist in `EncodEval/configs` (configs for the values specified in `run.py` are already provided).

3. Run the evaluation:

   ```bash
   python run.py
   ```

   `run.py` will automatically schedule the benchmarks across all GPUs on the machine as they become available.

4. Print the results:

   ```bash
   python print_results.py
   ```

   The model name and learning rates must be specified inside `print_results.py`. The script prints results in a format that can be pasted into Google Sheets.
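For orientation, the settings mentioned above might look like the following inside `run.py`. This is a hypothetical sketch: the README only names `model_name` and `learning_rates`, so the remaining variable names and all values here are illustrative.

```python
# Hypothetical sketch of the knobs referenced in EncodEval/run.py.
# Variable names beyond model_name/learning_rates, and all values, are illustrative.
model_name = "google-bert/bert-base-uncased"  # local path or HuggingFace ID
learning_rates = [1e-5, 3e-5, 5e-5]           # learning rates to sweep
seeds = [0, 1, 2]                             # random seeds
benchmarks = ["TC", "IR"]                     # task families to evaluate
```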
If evaluating NeoBERT, specific token-classification implementations are required:

1. Download the NeoBERT model.
2. Move the files from `EncodEval/neobert/` (in this repo) into your downloaded NeoBERT model directory.
3. Point `model_name` in `EncodEval/run.py` to this local directory.
To reproduce the efficiency plots (throughput and latency) found in the paper:

```bash
# Generate throughput data
python bench_throughput.py

# Generate latency data
python bench_latency.py
```

To run the unoptimized version of Avey-B, remove the `@torch.compile` decorator from the implementation. To test the optimized versions of the other models, flash-attention must be installed.
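For intuition, latency benchmarks of this kind typically follow a warmup-then-median pattern. The sketch below illustrates that pattern only; the actual internals of `bench_latency.py` may differ, and `forward` here is a stand-in for a real model forward pass.

```python
import time

def forward():                      # stand-in for a model forward pass
    return sum(i * i for i in range(10_000))

for _ in range(3):                  # warmup (e.g., triggers compilation/caching)
    forward()

samples = []
for _ in range(10):                 # timed runs
    t0 = time.perf_counter()
    forward()
    samples.append(time.perf_counter() - t0)

latency_ms = 1000 * sorted(samples)[len(samples) // 2]  # median, in ms
print(f"median latency: {latency_ms:.3f} ms")
```

Reporting the median rather than the mean makes the measurement robust to occasional scheduling hiccups.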
Note on NeoBERT efficiency testing: to test NeoBERT beyond its training window (solely for efficiency measurements), you must manually override its config:

1. Download the NeoBERT checkpoint.
2. Modify `config.json`: set `max_length` to a large value (e.g., 100000).
3. Update the benchmarking scripts to point to this modified local checkpoint.
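The `config.json` override above can also be scripted. This is an illustrative sketch, not part of the repo: the checkpoint directory name is a placeholder, and the stub-creation step exists only so the example runs standalone (with a real checkpoint, the file already exists).

```python
import json
from pathlib import Path

# Placeholder path; point this at your downloaded NeoBERT checkpoint.
ckpt = Path("neobert-checkpoint")
ckpt.mkdir(exist_ok=True)
cfg_file = ckpt / "config.json"
if not cfg_file.exists():                 # demo-only: stub a minimal config
    cfg_file.write_text(json.dumps({"max_length": 4096}))

cfg = json.loads(cfg_file.read_text())
cfg["max_length"] = 100_000               # override per the README example
cfg_file.write_text(json.dumps(cfg, indent=2))
print(cfg["max_length"])
```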
If you use Avey-B or this codebase in your research, please cite our paper:
```bibtex
@inproceedings{2026aveyb,
  title={Avey-B},
  author={Acharya, Devang and Hammoud, Mohammad},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
```