Tokens are the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling and operate primarily at the coarse sequence level. In this paper, we introduce the Length Value Model (LenVM), a token-level framework that models the remaining generation length at each decoding step. By formulating length modeling as a value estimation problem and assigning a constant negative reward to each generated token, LenVM predicts a bounded, discounted return that serves as a proxy for the remaining generation horizon.
- Token-level value prediction — Fine-grained length modeling beyond coarse sequence-level objectives
- Annotation-free pretraining — Scalable supervision from generated trajectories without manual labels
- Cross-modal support — Works seamlessly across language-only and vision-language models
- Inference-time control — Dynamic length adjustment and performance-efficiency trade-offs
- Rich visualization tools — Interactive demos and value inspection utilities
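The inference-time control bullet can be sketched in terms of the value formulation: invert the predicted discounted value (constant per-token reward of -1, discount gamma) into a remaining-length estimate, then bias the EOS logit when that estimate exceeds the remaining token budget. The helper names, default gamma, and the bias rule below are illustrative assumptions, not the repository's actual guided-decoding API:

```python
import math

def estimate_remaining_tokens(value: float, gamma: float = 0.99) -> float:
    """Invert value = -(1 - gamma**L) / (1 - gamma) to recover L.

    Valid for values in (-1/(1 - gamma), 0]; gamma and the helper
    name are illustrative assumptions.
    """
    return math.log(1.0 + value * (1.0 - gamma)) / math.log(gamma)

def eos_logit_bias(value: float, budget_left: int, strength: float = 0.1,
                   gamma: float = 0.99) -> float:
    """Additive bias for the EOS logit: push toward stopping in proportion
    to how far the predicted remaining horizon overshoots the budget
    (a hypothetical control rule, not the repo's implementation)."""
    overshoot = estimate_remaining_tokens(value, gamma) - budget_left
    return strength * max(0.0, overshoot)
```

A value of -1 corresponds to one remaining token, so a generous budget yields zero bias, while a strongly negative value (long predicted horizon) against a small budget yields a positive push toward EOS.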
Length-Value-Model/
├── data_generation/ # Data generation pipeline and sampling scripts
├── LlamaFactory-LenVM/ # LenVM training framework (LlamaFactory fork)
├── inference/ # SGLang serving and guided decoding
├── sglang-LenVM/ # LenVM-enabled SGLang runtime
└── tools/ # Visualization and demo utilities
The data generation pipeline is located in ./data_generation/.
Complete pipeline: data_generation/run_data_generation.sh
Pipeline steps:
- Downloads and prepares datasets (deepmath-103k, OpenCodeReasoning-2, wildchat, R1-Onevision)
- Launches an SGLang server for trajectory sampling
- Generates training/test data for math, code, chat, and VLM tasks
- Groups samples by prompt index for LenVM training
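The final grouping step above can be sketched as follows; the record layout (dicts carrying a prompt_idx field) is an assumed schema for illustration, not the pipeline's actual data format:

```python
from collections import defaultdict

def group_by_prompt(samples):
    """Group sampled trajectories by the prompt they were drawn from,
    so multiple completions of one prompt land in the same training group
    (the field name 'prompt_idx' is a hypothetical schema)."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["prompt_idx"]].append(sample)
    return dict(groups)
```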
Data management:
Generated data can be downloaded or uploaded using hf.sh.
LenVM training is built on the customized LlamaFactory-LenVM/ fork.
Training configuration:
Example configs are available in Length-Value-Model/LlamaFactory-LenVM/examples/train_lenvm/
Launch training:
cd Length-Value-Model
llamafactory-cli train \
  LlamaFactory-LenVM/examples/train_lenvm/base-qwen2.5-7b-instruct-lenvm-qwen2.5-1.5b-instruct.yaml

More examples:
See Length-Value-Model/train_lf.sh for additional training configurations.
LenVM-enabled inference scripts are in Length-Value-Model/inference/.
Launch SGLang server: inference/sglang_server.sh
Supported models:
- Qwen3-30B-A3B-Instruct-2507
- Qwen2.5-3B-Instruct
- Qwen2.5-7B-Instruct
- Qwen2.5-VL-7B-Instruct
Quick testing:
For visualization and sanity checks: inference/test_sglang_lvm.sh
Build a standalone interactive demo from logged model outputs:
cd Length-Value-Model
python tools/build_lenvm_hover_demo.py

This generates an HTML demo for inspecting token-level values and generation dynamics.
If you find this work useful, please cite:
