simplexai-labs/LiteResearcher

LiteResearcher

A Low-Cost, Scalable Agentic RL Training Framework for Deep Research Agent

If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.

News

2026-06

2026-04

  • 🎯 RL model weights released (LiteResearcher-4B)
  • 📈 Evaluation code & project page released

LiteResearcher-4B is a 4B deep research agent that matches frontier systems at a fraction of the size — trained with $0 marginal API cost by replacing live-web interaction with a stable local search/browse environment that mirrors real-world search dynamics.

Highlights

  • Open-source SOTA: 71.3% GAIA / 78.0% Xbench-DS, beating 30B open-source agents and surpassing Claude-4.5-Sonnet on GAIA and GPT-5-high on Xbench-DS.
  • +15.7 GAIA points from RL: SFT 55.6% → RL 71.3%, vs. only +3.8 for AgentCPM-Explore when training with live web interaction.
  • $0 marginal API cost: 73.2M local tool calls during RL; the same volume would cost $59K–$243K via live search/browse APIs.
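As a quick sanity check on the cost figures above, the sketch below derives the per-call price implied by the quoted totals. It uses only the numbers stated in the highlights; the helper name `implied_price_per_1k` is ours, and no vendor pricing is assumed.

```python
# Figures quoted in the highlights above.
TOOL_CALLS = 73_200_000      # local tool calls made during RL
API_COST_LOW = 59_000        # USD, low estimate for the same volume via live APIs
API_COST_HIGH = 243_000      # USD, high estimate


def implied_price_per_1k(total_cost_usd: float, calls: int) -> float:
    """Return the price per 1,000 calls implied by a total cost."""
    return total_cost_usd / calls * 1000


low = implied_price_per_1k(API_COST_LOW, TOOL_CALLS)
high = implied_price_per_1k(API_COST_HIGH, TOOL_CALLS)
print(f"implied live-API price: ${low:.2f} to ${high:.2f} per 1,000 calls")

# The RL gain is the difference between the SFT and RL GAIA scores.
rl_gain = round(71.3 - 55.6, 1)
print(f"GAIA gain from RL: +{rl_gain} points")
```

The quoted range works out to roughly $0.81 to $3.32 per 1,000 tool calls, which is the cost the local environment avoids.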

Left: Xbench-DeepSearch accuracy vs. model size — our 4B model reaches 78.0%, matching/surpassing 100×+ larger systems. Right: Average rollout time and cost per turn — LiteResearcher is the fastest and cheapest.

Results

Comparison across commercial models and open-source deep research agents on eight benchmarks. Best score among ≤8B models is in bold; LiteResearcher-4B leads on 6 of 8.

LiteResearcher-4B is the best ≤8B agent on 6 of the 8 benchmarks; Mirothinker-8B leads on the remaining two, BrowseComp and BrowseComp-ZH. Full numbers are in the paper and training/README.md.

Method Overview

Three pillars enable low-cost, scalable Agentic RL:

  1. Co-construct Training Data & Corpus: scale up information sources with a simple but effective synthesis pipeline, then co-evolve the training QA pairs and the local webpage corpus.
  2. Stable Local Tool Environment: build a local search engine (Milvus + BGE-M3) and a local browse tool (PostgreSQL) from ~32M real webpages, so the RL stage runs fully locally with no API consumption, a 10–46× speedup, and zero marginal tool cost.
  3. Difficulty-Aware Curriculum RL: a multi-stage curriculum with on-policy GRPO that filters tasks by pass@8 difficulty to sustain monotonic improvement.
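To make the pass@8 filtering in the third pillar concrete, here is a minimal sketch. It assumes each task has 8 sampled rollouts scored as correct or incorrect, and keeps tasks whose solve rate falls inside a band. The band limits (`min_rate`, `max_rate`) and the function names are illustrative assumptions, not the thresholds or code used in training/.

```python
K = 8  # rollouts sampled per task, matching "pass@8" above


def solve_rate(outcomes: list[bool]) -> float:
    """Fraction of the K rollouts that answered the task correctly."""
    if len(outcomes) != K:
        raise ValueError(f"expected {K} rollouts, got {len(outcomes)}")
    return sum(outcomes) / K


def filter_by_difficulty(
    rollouts: dict[str, list[bool]],
    min_rate: float = 1 / K,        # assumed: drop tasks the policy never solves
    max_rate: float = (K - 1) / K,  # assumed: drop tasks the policy always solves
) -> list[str]:
    """Return the ids of tasks whose solve rate lies in [min_rate, max_rate]."""
    return [
        task_id
        for task_id, outcomes in rollouts.items()
        if min_rate <= solve_rate(outcomes) <= max_rate
    ]


# Toy data: one trivially easy task, one unsolved task, one in between.
example = {
    "easy": [True] * 8,
    "hard": [False] * 8,
    "medium": [True, False, False, True, False, True, False, False],
}
kept = filter_by_difficulty(example)
print(kept)
```

One plausible motivation for such a band is that tasks solved in all or none of the rollouts give identical rewards within a group, which leaves a group-relative method like GRPO with no learning signal for them.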

Trajectory Cases

We release 15 hand-audited rollout trajectories from LiteResearcher-4B across 8 deep-research benchmarks (GAIA, Xbench-DS, Frames, HLE, Seal-0, WebWalker, BrowseComp, BrowseComp-ZH).

🔎 Live viewer: https://simplexai-labs.github.io/LiteResearcher/cases/

Each trajectory renders 40–170 steps showing the model's think → search → visit → answer chain, with tool queries, visited URLs, and tool responses inline. Source data lives under docs/cases/.
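For readers new to this kind of rollout, the sketch below shows the general shape of a think, search, visit, answer loop. Everything in it is hypothetical: `run_episode`, `scripted_policy`, the stub tools, and the record fields are illustrative names and do not reflect the schema of the files under docs/cases/ or the repository's agent loop.

```python
from typing import Callable

Tool = Callable[[str], str]


def run_episode(policy, tools: dict[str, Tool], question: str, max_steps: int = 170):
    """Run one rollout and return the list of recorded steps."""
    trajectory = []
    observation = question
    for _ in range(max_steps):
        action, argument = policy(observation, trajectory)
        if action == "answer":
            trajectory.append({"action": "answer", "input": argument, "response": ""})
            break
        # "think" steps call no tool; "search" and "visit" return a tool response.
        response = tools[action](argument) if action in tools else ""
        trajectory.append({"action": action, "input": argument, "response": response})
        observation = response or observation
    return trajectory


# A scripted stand-in for the model, so the loop can run without any LLM.
SCRIPT = [
    ("think", "I should look up the source first."),
    ("search", "example query"),
    ("visit", "https://example.com/page"),
    ("answer", "example answer"),
]


def scripted_policy(observation: str, trajectory: list) -> tuple[str, str]:
    return SCRIPT[len(trajectory)]


# Stub tools standing in for the search and browse services.
stub_tools = {
    "search": lambda query: f"results for: {query}",
    "visit": lambda url: f"page text from: {url}",
}

steps = run_episode(scripted_policy, stub_tools, "example question")
print([step["action"] for step in steps])
```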

Repository Structure

├── inference/              # Inference & evaluation (released)
├── training/               # RL training — GRPO + curriculum (released)
├── datagen/                # Data synthesis (released)
├── environment/            # Local search/browse environment (released)
└── docs/                   # Project page

Quick Start — Evaluation

cd inference
pip install -r requirements.txt
cp .env.example .env
# Edit .env: set MODEL, SERPER_KEY_ID (browser uses Jina Reader by default; set SCRAPEDO_API_KEY only if using BROWSER_PROVIDER=scrapedo)

# Start model server (SGLang/vLLM)
bash scripts/start_sglang.sh

# Run evaluation
bash scripts/run_all.sh
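Before starting the model server, it can help to confirm that .env contains the keys mentioned above. The following is a minimal sketch of such a preflight check. The key names come from the comment in the commands above; the parser handles only plain `KEY=value` lines, and the helper names are ours rather than part of the repository.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty."""
    required = ["MODEL", "SERPER_KEY_ID"]
    # SCRAPEDO_API_KEY is only needed when the Scrape.do browser is selected.
    if env.get("BROWSER_PROVIDER", "").lower() == "scrapedo":
        required.append("SCRAPEDO_API_KEY")
    return [key for key in required if not env.get(key)]


# Demo on an inline sample; for a real check, pass Path(".env").read_text().
sample = """
# placeholder values for illustration
MODEL=/path/to/model
SERPER_KEY_ID=
BROWSER_PROVIDER=scrapedo
"""
print("missing:", missing_keys(parse_env(sample)))
```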

See inference/README.md for detailed configuration and usage.

Quick Start — Training

The full two-stage RL training pipeline (GRPO + TIS + difficulty-aware curriculum) is in training/, and the training data is hosted on 🤗 LiteResearcher-Data.
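For orientation, the sketch below shows the group-relative advantage at the core of GRPO in its standard textbook form: each rollout's reward is normalized against the other rollouts sampled for the same prompt. It is an illustration only. The variant implemented in training/, including its TIS component and any clipping or length normalization, may differ, and `group_relative_advantages` is our own name.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward by the mean and std of its own rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(reward - mu) / (sigma + eps) for reward in rewards]


# Eight rollouts for one prompt, two of them correct.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
print([round(a, 3) for a in mixed])

# If every rollout gets the same reward, all advantages are zero,
# so the group contributes no gradient signal.
uniform = group_relative_advantages([1.0] * 8)
print(uniform)
```

Correct rollouts in a mostly failing group receive a large positive advantage, while a group with identical rewards contributes nothing, which is why task difficulty matters for this style of training.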

Prerequisites

  • GPU — Stage 1: 8×H20 (1 node); Stage 2: 16×H20 (2 nodes).
  • Local tool backend — RL runs against the local search/browse environment, not live web. Bring up the search service (Milvus + Redis) and the browse service (PostgreSQL) before training. See environment/ for the search backend and examples/sglang_multiturn/search_browser/tool_backend/ for the browse backend.
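Since training depends on these services being reachable, a small connectivity check can save a failed launch. The sketch below assumes all three services run on localhost at their upstream default ports (Milvus 19530, Redis 6379, PostgreSQL 5432); your deployment may use different hosts or ports, so adjust `SERVICES` accordingly. It only tests that a TCP port accepts connections, not that the data is loaded.

```python
import socket

# Assumed endpoints: localhost with each service's upstream default port.
SERVICES = {
    "Milvus (search service)": ("127.0.0.1", 19530),
    "Redis (search service)": ("127.0.0.1", 6379),
    "PostgreSQL (browse service)": ("127.0.0.1", 5432),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services(SERVICES).items():
        print(f"{'up  ' if ok else 'DOWN'}  {name}")
```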

1. Install

cd training
pip install -e ".[sglang]"                 # verl-based training stack (quoted so zsh does not glob the brackets)

2. Configure the tool backend

cp examples/sglang_multiturn/search_browser/tool_backend/.env.example \
   examples/sglang_multiturn/search_browser/tool_backend/.env
# Edit .env: PG_* (browse DB), SUMMARY_API_*, LLM_JUDGE_API_*, optional SCRAPEDO_API_KEY

# Start the browse service (reads the .env above)
bash examples/sglang_multiturn/search_browser/tool_backend/start_browse.sh

3. Download the training data

hf download simplex-ai-inc/LiteResearcher-Data --repo-type dataset \
            --local-dir ./literesearcher_data    # 28K prompts, 19 MB

4. Stage 1 — RAG-only warmup (8×H20, 32K ctx)

export TRAIN_DATA=./literesearcher_data/stage1/train.parquet
export VAL_DATA="$TRAIN_DATA"     # no separate val bundled; verl needs a non-empty val_files
export MODEL_PATH=$(hf download simplex-ai-inc/LiteResearcher-4B-SFT \
                                --local-dir ./literesearcher_sft)
bash examples/sglang_multiturn/search_browser/stage1_rag_only.sh

5. Stage 2 — mixed curriculum (16×H20, 48K ctx)

Resume from a Stage-1 checkpoint (around step 220).

export TRAIN_DATA=./literesearcher_data/stage2/train.parquet
export VAL_DATA="$TRAIN_DATA"
export MODEL_PATH=/path/to/stage1-ckpt/global_step_220
bash examples/sglang_multiturn/search_browser/stage_2_mix_rag_on_policy_48k.sh
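The two stages differ only in hardware, context length, data file, starting checkpoint, and launch script. The sketch below collects those settings in one place and builds the command and environment for each stage without running anything. The values are taken from the steps above; the `Stage` class and `launch_plan` helper are hypothetical, and the Stage 1 model path assumes the SFT weights were downloaded to ./literesearcher_sft.

```python
import os
from dataclasses import dataclass

SCRIPT_DIR = "examples/sglang_multiturn/search_browser"


@dataclass(frozen=True)
class Stage:
    name: str
    gpus: int
    nodes: int
    context: str      # context length label as written in the README
    train_data: str
    model_path: str
    script: str


STAGES = [
    Stage("stage1", 8, 1, "32K",
          "./literesearcher_data/stage1/train.parquet",
          "./literesearcher_sft",                      # assumed download location
          f"{SCRIPT_DIR}/stage1_rag_only.sh"),
    Stage("stage2", 16, 2, "48K",
          "./literesearcher_data/stage2/train.parquet",
          "/path/to/stage1-ckpt/global_step_220",      # placeholder from the README
          f"{SCRIPT_DIR}/stage_2_mix_rag_on_policy_48k.sh"),
]


def launch_plan(stage: Stage, base_env=None):
    """Return (command, environment) for a stage; nothing is executed."""
    env = dict(os.environ if base_env is None else base_env)
    # VAL_DATA reuses the training file, as in the steps above.
    env.update(TRAIN_DATA=stage.train_data, VAL_DATA=stage.train_data,
               MODEL_PATH=stage.model_path)
    return ["bash", stage.script], env


for stage in STAGES:
    command, _ = launch_plan(stage)
    print(f"{stage.name}: {stage.gpus} GPUs on {stage.nodes} node(s), "
          f"{stage.context} ctx -> {' '.join(command)}")
```

The returned pair can be handed to `subprocess.run(command, env=env, check=True)` from the training/ directory once the tool backend is up.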

See training/README.md for the full reproduction recipe (including the SFT cold-start prerequisite, environment variables, and config knobs) and the dataset card for the data schema and curriculum design.

Release Plan

Acknowledgements

LiteResearcher's training stack is built on verl, ByteDance's RL training library, which we fork and extend with the multi-turn search/browse agent loop, difficulty-aware curriculum, and local-tool reward pipeline. We also build on SGLang for rollout serving, Qwen3 as the base model, and Milvus + BGE-M3 for the local search environment. We thank these projects and their communities.

Contributing

Contributions are welcome — see CONTRIBUTING.md for development setup, pull-request guidelines, and our Code of Conduct.

Powered By

LiteResearcher is the engine behind lev8, Simplex AI's parallel agentic search platform — frontier-grade deep research, fast and cheap enough to run hundreds of agents per query. Explore → lev8.com

Citation

@article{li2026literesearcher,
  title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
  author={Li, Wanli and Qu, Bince and Pan, Bo and Zhang, Jianyu and Liu, Zheng and Zhang, Pan and Chen, Wei and Zhang, Bo},
  journal={arXiv preprint arXiv:2604.17931},
  year={2026}
}

License

Released under the Apache License 2.0.
