This repository contains the official implementation of ReFINE, introduced in the paper:
Reinforced Fast Weights with Next Sequence Prediction
Hee Seung Hwang*, Xindi Wu*, Sanghyuk Chun, Olga Russakovsky
Fast weight architectures (e.g., LaCT, DeltaNet, GatedDeltaNet) are typically pre-trained with next-token prediction (NTP), which provides only token-level supervision. ReFINE addresses this limitation by optimizing for Next-Sequence Prediction (NSP) via reinforcement learning.
ReFINE is phase-agnostic and can be applied during:
- Mid-training - Training on long-context corpora
- Post-training - Task-specific fine-tuning
- Test-time training - Adaptation at inference time
ReFINE improves sequence-level understanding through four key steps:
1. Entropy-Based Token Selection → Select informative positions based on NTP entropy
2. Rollout Generation → Generate multi-token continuations from truncated prefixes
3. Reward Assignment → Compute sequence-level rewards using cosine similarity (or exact match)
4. Optimization with RL → Optimize NSP using GRPO, combined with the standard NTP loss
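The repository implements these steps inside the verl trainer; the snippet below is only a minimal sketch of steps 1 and 3, assuming per-position logits from the fast-weight model and sentence embeddings for the rollout and its reference continuation. All function names and the `rl_weight` coefficient are illustrative, not the repository's actual API.

```python
# Illustrative sketch of entropy-based selection (step 1) and the
# cosine-similarity reward (step 3); names are placeholders.
import torch
import torch.nn.functional as F

def select_positions_by_entropy(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Pick the k positions whose next-token distribution has the highest
    entropy, i.e. where plain NTP supervision is least informative."""
    log_probs = F.log_softmax(logits, dim=-1)             # (seq_len, vocab)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # (seq_len,)
    return torch.topk(entropy, k).indices                 # prefix cut points

def cosine_similarity_reward(rollout_emb: torch.Tensor,
                             reference_emb: torch.Tensor) -> torch.Tensor:
    """Sequence-level reward: cosine similarity between embeddings of the
    generated continuation and the ground-truth continuation."""
    return F.cosine_similarity(rollout_emb, reference_emb, dim=-1)

# Step 2 truncates the context at each selected position and samples a group
# of multi-token continuations; step 4 feeds the (rollout, reward) groups to
# GRPO and combines it with the usual NTP loss, e.g.
#   total_loss = ntp_loss + rl_weight * grpo_loss
```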
ReFINE consistently improves long-context performance over supervised fine-tuning (SFT), evaluated on 12 tasks with up to 16K context length. See the paper for detailed tables and ablations.
- Create conda environment:

  ```bash
  conda create -n refine python=3.12 -y
  conda activate refine
  ```

- Install verl dependencies:

  ```bash
  bash ./verl/scripts/install_refine.sh
  ```

- Install additional dependencies:

  ```bash
  pip install -r requirements.txt
  ```

Download the pre-trained fast weight models:
| Model | Parameters | Code | Checkpoints |
|---|---|---|---|
| LaCT | 760M | GitHub | HuggingFace |
| DeltaNet-1.3B | 1.3B | GitHub | HuggingFace |
Train ReFINE on long-context data:
- Prepare Dataset: The original Long-Data-Collections dataset is no longer available. We recommend using the SlimPajama-6B dataset instead:
  - Download the parquet files.
  - Filter for samples with at least 16K tokens (training data only); a filtering sketch follows these steps.
- Configure Script: Update the variables in `verl/examples/refine_trainer/demo/run_midtrain_demo.sh`
- Run Training:

  ```bash
  cd verl/examples/refine_trainer/demo
  bash run_midtrain_demo.sh
  ```
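A minimal sketch of the filtering step, assuming locally downloaded SlimPajama-6B parquet files with a `text` column; the file paths, the tokenizer, and the 16 * 1024 token threshold are placeholders to adapt to your setup.

```python
# Sketch: keep only SlimPajama-6B training samples with at least 16K tokens.
# File paths, the tokenizer, and the "text" column name are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer

train = load_dataset("parquet",
                     data_files="slimpajama6b/train/*.parquet",
                     split="train")

def long_enough(example):
    return len(tokenizer(example["text"]).input_ids) >= 16 * 1024

train = train.filter(long_enough, num_proc=8)
train.to_parquet("slimpajama6b_train_16k.parquet")
```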
Fine-tune on task-specific long-context data:
- Use Provided Datasets: Post-training datasets are available in `data/ruler/`
- Configure Script: Update the variables in `verl/examples/refine_trainer/demo/run_posttrain_demo.sh`
- Run Training:

  ```bash
  cd verl/examples/refine_trainer/demo
  bash run_posttrain_demo.sh
  ```
Adapt the model at test time for specific tasks:
- Use Provided Dataset: The LongBench dataset (filtered for <16K tokens) is included; a sketch of how such a subset can be rebuilt follows these steps.
  - Raw dataset: LongBench on HuggingFace
- Configure Script: Update the variables in `verl/examples/refine_trainer/demo/run_testtimetrain_demo.sh`
- Run Training:

  ```bash
  cd verl/examples/refine_trainer/demo
  bash run_testtimetrain_demo.sh
  ```
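The filtered LongBench data already ships with the repository; the sketch below only illustrates how such a <16K-token subset could be reproduced from the raw HuggingFace release. The subset name, tokenizer, and field names are assumptions.

```python
# Sketch: rebuild a <16K-token LongBench subset from the raw release.
# The "hotpotqa" subset, tokenizer, and field names are assumptions; the
# repository already includes the filtered data. Depending on your datasets
# version, loading may require trust_remote_code=True.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer

raw = load_dataset("THUDM/LongBench", "hotpotqa", split="test")

def under_16k(example):
    prompt = example["context"] + "\n" + example["input"]
    return len(tokenizer(prompt).input_ids) < 16 * 1024

filtered = raw.filter(under_16k)
filtered.to_json("longbench_hotpotqa_under16k.jsonl")
```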
We recommend using the demo scripts for validation (e.g., RULER SQuADQA, HotpotQA, LongBench). For evaluation with LM-Eval-Harness (e.g., RULER NIAH), please follow the instructions here.
If you find this work helpful, please cite our paper:
```bibtex
@article{refine2026,
  title={Reinforced Fast Weights with Next Sequence Prediction},
  author={TBD},
  journal={TBD},
  year={2026}
}
```

(BibTeX will be updated upon publication.)
This project builds upon verl for distributed RL training infrastructure.
- Zhang, Tianyuan, et al. "Test-Time Training Done Right." arXiv preprint arXiv:2505.23884 (2025).
- Yang, Songlin, et al. "Parallelizing Linear Transformers with the Delta Rule over Sequence Length." Advances in Neural Information Processing Systems 37 (2024): 115491-115522.
- Yang, Songlin, Jan Kautz, and Ali Hatamizadeh. "Gated Delta Networks: Improving Mamba2 with Delta Rule." arXiv preprint arXiv:2412.06464 (2024).
- Gao, Leo, et al. "The Pile: An 800GB Dataset of Diverse Text for Language Modeling." arXiv preprint arXiv:2101.00027 (2020).
- Hsieh, Cheng-Ping, et al. "RULER: What's the Real Context Size of Your Long-Context Language Models?" arXiv preprint arXiv:2404.06654 (2024).
- Bai, Yushi, et al. "LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding." arXiv preprint arXiv:2308.14508 (2023).



