Scaling Long-Horizon LLM Agent via Context-Folding

Paper: https://arxiv.org/pdf/2510.11967

Training

Coming soon

Evaluation

Start Search Server

cd envs && python search_server.py \
  --model Qwen/Qwen3-Embedding-8B \
  --corpus Tevatron/browsecomp-plus-corpus \
  --corpus-embedding-dataset miaolu3/browsecomp-plus \
  --host 0.0.0.0 \
  --port 8000
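
The search server may take some time to load the embedding model and corpus before it accepts connections. As a rough sketch, the helper below polls the TCP port until it opens. It only checks that something is listening on the port; it makes no assumption about the server's HTTP routes, which are not documented here. The function name `wait_for_port` and its parameters are hypothetical and not part of this repository.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    Hypothetical helper: it only tests TCP reachability, not the search API itself.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens yet.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_port("localhost", 8000, timeout_s=600)` would block for up to ten minutes on the port passed via `--port` above, and return True as soon as the port is open.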

Evaluate on BrowseComp

Download and decompress: https://drive.google.com/file/d/1aX5xXAN5R-gLKd8A0AY-troxXJRawyAM/view?usp=sharing
Fold Agent: workflow=search_branch

export OPENAI_API_KEY='your-key'

python scripts/eval_bc.py \
  --data_path data/bc_test.parquet \
  --model_name gpt-5-nano \
  --num_workers 150 \
  --workflow search_branch \
  --prompt_length 16384 \
  --response_length 32768 \
  --max_turn 200 \
  --val_max_turn 200 \
  --max_session 10 \
  --val_max_session 10 \
  --local_search_url http://localhost:8000 \
  --output_dir results

Output:

Evaluating: 100%|█████████████| 150/150 [32:52<00:00, 13.15s/item, avg_score=0.407, id=122]

============================================================
Overall - Avg Score: 0.4067, Success: 150/150

By Data Source:
  bc_test_easy: 0.8200 (50 items)
  bc_test_hard: 0.0400 (50 items)
  bc_test_meduim: 0.3600 (50 items)
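
The overall score in this sample output is consistent with an item-weighted mean of the per-source scores. The short check below reproduces the arithmetic from the numbers printed above. That the script aggregates in exactly this way is an assumption; `weighted_average` is an illustrative helper, not a function from `scripts/eval_bc.py`.

```python
# Per-source (score, item count) pairs copied from the sample output above.
# The key "bc_test_meduim" is spelled as it appears in that output.
scores = {
    "bc_test_easy": (0.8200, 50),
    "bc_test_hard": (0.0400, 50),
    "bc_test_meduim": (0.3600, 50),
}


def weighted_average(per_source):
    """Item-weighted mean of the per-source scores (assumed aggregation)."""
    total_items = sum(n for _, n in per_source.values())
    return sum(s * n for s, n in per_source.values()) / total_items


overall = weighted_average(scores)
print(f"{overall:.4f}")  # 0.4067, i.e. 61 correct out of 150
```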

ReAct Agent: workflow=search

python scripts/eval_bc.py --workflow search [...]

Summary Agent: workflow=search, enable_summary

python scripts/eval_bc.py --workflow search --enable_summary [...]

Using vLLM

# Start vLLM server
vllm serve ByteDance-Seed/Seed-OSS-36B-Instruct --port 8001 --max-model-len 131072

# Run evaluation
export OPENAI_API_KEY='dummy'
export OPENAI_BASE_URL='http://localhost:8001/v1'

python scripts/eval_bc.py \
  --model_name ByteDance-Seed/Seed-OSS-36B-Instruct \
  --workflow search_branch \
  --num_workers 32 \
  --prompt_length 16384 \
  --response_length 32768 \
  --max_turn 100 \
  --val_max_turn 100 \
  --max_session 10 \
  --val_max_session 10 \
  --output_dir results
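
Before launching a long evaluation against vLLM, it can help to confirm that the server is up and serving the expected model name. The sketch below queries the OpenAI-compatible `/v1/models` endpoint that vLLM exposes and extracts the model ids. The helper names `parse_model_ids` and `fetch_model_ids` are hypothetical, and the sketch assumes the server was started without a custom served model name, so the id matches the path given to `vllm serve`.

```python
import json
import urllib.request


def parse_model_ids(body: str) -> list[str]:
    """Extract model ids from an OpenAI-style `GET /v1/models` response body."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str, timeout_s: float = 10.0) -> list[str]:
    """Query `<base_url>/models` and return the ids of the served models.

    `base_url` is expected to end in `/v1`, like OPENAI_BASE_URL above.
    """
    url = base_url.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))
```

With the server from the command above running, `fetch_model_ids("http://localhost:8001/v1")` would be expected to include `ByteDance-Seed/Seed-OSS-36B-Instruct`, the value passed as `--model_name`.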

Evaluation on SWE-Bench Verified

Coming soon

Cite

@article{sun2025scaling,
  title   = {Scaling Long-Horizon LLM Agent via Context-Folding},
  author  = {Sun, Weiwei and Lu, Miao and Ling, Zhan and Liu, Kang and Yao, Xuesong and Yang, Yiming and Chen, Jiecao},
  journal = {arXiv preprint arXiv:2510.11967},
  year    = {2025},
}

Repository: sunnweiwei/FoldAgent

Folders and files

Latest commit

History

Repository files navigation

Scaling Long-Horizon LLM Agent via Context-Folding

Training

Evaluation

Start Search Server

Evaluate on BrowseComp

Using vLLM

Evaluation on SWE-Bench Verified

Cite

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages