LiteResearcher

A Low-Cost, Scalable Agentic RL Training Framework for Deep Research Agent

If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.

News

2026-06

🚀 Training code released — GRPO + difficulty-aware curriculum (training/)
📊 Training data released — Stage-1 & Stage-2 prompts (LiteResearcher-Data)
🧊 SFT cold-start checkpoint released (LiteResearcher-4B-SFT)
🛠️ Data synthesis pipeline released (datagen/)
🌐 Local search/browse environment released (environment/)
📚 32M-record search corpus released (LiteResearcher-Corpus)

2026-04

🎯 RL model weights released (LiteResearcher-4B)
📈 Evaluation code & project page released

LiteResearcher-4B is a 4B deep research agent that matches frontier systems at a fraction of the size — trained with $0 marginal API cost by replacing live-web interaction with a stable local search/browse environment that mirrors real-world search dynamics.

Highlights

Open-source SOTA — 71.3% GAIA / 78.0% Xbench-DS, beating 30B open-source agents and surpassing Claude-4.5-Sonnet on GAIA and GPT-5-high on Xbench-DS.
+15.7 GAIA points from RL — SFT 55.6% → RL 71.3%, vs. only +3.8 for AgentCPM-Explore when training with live web interaction.
$0 marginal API cost — 73.2M local tool calls during RL; the same volume would cost $59K–$243K via live search/browse APIs.

Left: Xbench-DeepSearch accuracy vs. model size — our 4B model reaches 78.0%, matching/surpassing 100×+ larger systems. Right: Average rollout time and cost per turn — LiteResearcher is the fastest and cheapest.
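
The cost and accuracy figures in the highlights can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable and function names are illustrative and do not come from the repository.

```python
# Back-of-the-envelope check using only figures quoted in the highlights.
TOOL_CALLS = 73.2e6                              # local tool calls made during RL
API_COST_LOW, API_COST_HIGH = 59_000, 243_000    # USD, quoted live-API estimate


def cost_per_thousand_calls(total_usd: float, calls: float) -> float:
    """Implied price in USD per 1,000 tool calls."""
    return total_usd / calls * 1_000


low = cost_per_thousand_calls(API_COST_LOW, TOOL_CALLS)
high = cost_per_thousand_calls(API_COST_HIGH, TOOL_CALLS)
rl_gain = round(71.3 - 55.6, 1)                  # GAIA points gained from SFT to RL

print(f"implied live-API price: ${low:.2f} to ${high:.2f} per 1,000 calls")
print(f"GAIA gain from RL: +{rl_gain} points")
```

In other words, the quoted range corresponds to roughly $0.81 to $3.32 per 1,000 live search/browse calls.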

Results

Comparison across commercial models and open-source deep research agents on eight benchmarks, with the best score among ≤8B models shown in bold. LiteResearcher-4B is the best ≤8B agent on 6 of the 8; Mirothinker-8B leads on the other two, BrowseComp and BrowseComp-ZH. Full numbers are in the paper and training/README.md.

Method Overview

Three pillars enable low-cost, scalable Agentic RL:

Co-construct Training Data & Corpus — Scale up information sources with a simple-but-effective synthesis pipeline, then co-evolve training QA pairs and the local webpage corpus.
Stable Local Tool Environment — Build local search engine (Milvus + BGE-M3) and local browse tool (PostgreSQL) from ~32M real webpages, enabling the RL stage to run fully locally with no API consumption, 10–46× speedup, and zero marginal tool cost.
Difficulty-Aware Curriculum RL — Multi-stage curriculum with on-policy GRPO, filtering tasks by pass@8 difficulty to sustain monotonic improvement.
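
To make the difficulty filter in the third pillar concrete, here is a minimal sketch. It assumes each task is scored by how many of its 8 sampled rollouts were judged correct, and that tasks that are always or never solved are dropped. The thresholds, task IDs, and function names are hypothetical and are not taken from training/.

```python
from typing import Dict, List


def pass_rate(outcomes: List[bool]) -> float:
    """Fraction of sampled rollouts judged correct for one task."""
    return sum(outcomes) / len(outcomes)


def filter_by_difficulty(
    rollouts: Dict[str, List[bool]],
    min_rate: float = 0.125,   # hypothetical: at least 1 of 8 rollouts correct
    max_rate: float = 0.75,    # hypothetical: at most 6 of 8 rollouts correct
) -> List[str]:
    """Keep tasks whose pass rate falls in [min_rate, max_rate]."""
    return [
        task_id
        for task_id, outcomes in rollouts.items()
        if min_rate <= pass_rate(outcomes) <= max_rate
    ]


# Toy data: 8 rollout outcomes per task.
rollouts = {
    "too_easy": [True] * 8,
    "too_hard": [False] * 8,
    "useful": [True, False, False, True, False, False, True, False],
}
kept = filter_by_difficulty(rollouts)
print(kept)
```

Only the task with a mixed outcome survives the filter. A multi-stage curriculum could re-run a filter like this against the current policy at each stage, so that the retained tasks stay within the model's reach.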

Trajectory Cases

We release 15 hand-audited rollout trajectories from LiteResearcher-4B across 8 deep-research benchmarks (GAIA, Xbench-DS, Frames, HLE, Seal-0, WebWalker, BrowseComp, BrowseComp-ZH).

🔎 Live viewer: https://simplexai-labs.github.io/LiteResearcher/cases/

Each trajectory renders 40–170 steps showing the model's think → search → visit → answer chain, with tool queries, visited URLs, and tool responses inline. Source data lives under docs/cases/.

Repository Structure

├── inference/              # Inference & evaluation (released)
├── training/               # RL training — GRPO + curriculum (released)
├── datagen/                # Data synthesis (released)
├── environment/            # Local search/browse environment (released)
└── docs/                   # Project page

Quick Start — Evaluation

cd inference
pip install -r requirements.txt
cp .env.example .env
# Edit .env: set MODEL, SERPER_KEY_ID (browser uses Jina Reader by default; set SCRAPEDO_API_KEY only if using BROWSER_PROVIDER=scrapedo)

# Start model server (SGLang/vLLM)
bash scripts/start_sglang.sh

# Run evaluation
bash scripts/run_all.sh

See inference/README.md for detailed configuration and usage.
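
Before starting the model server, it can help to confirm that .env defines the keys named in the comment above. The sketch below assumes .env uses plain KEY=VALUE lines; the helper names are hypothetical and this script is not part of the repository.

```python
from pathlib import Path
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: Dict[str, str]) -> List[str]:
    """Return required keys that are absent or empty."""
    required = ["MODEL", "SERPER_KEY_ID"]
    # SCRAPEDO_API_KEY is only needed when the scrapedo browser is selected.
    if env.get("BROWSER_PROVIDER") == "scrapedo":
        required.append("SCRAPEDO_API_KEY")
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    print("missing:", missing_keys(env) or "none")
```

Run it from inference/ after editing .env; an empty result means the keys mentioned above are all set.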

Quick Start — Training

The full two-stage RL training pipeline (GRPO + TIS + difficulty-aware curriculum) is in training/, and the training data is hosted on 🤗 LiteResearcher-Data.
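
For readers new to GRPO, the core idea is that each rollout's reward is normalised against the other rollouts sampled for the same prompt. The sketch below shows that group-relative advantage in plain Python. It illustrates the general technique only: it is not the code in training/, and it leaves out TIS, the clipped policy objective, and the curriculum. The function name and the epsilon value are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalise each reward by the mean and std of its own rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Eight rollouts for one prompt, rewarded 1 if the final answer was judged correct.
mixed = group_relative_advantages([1, 0, 0, 1, 0, 0, 0, 0])
uniform = group_relative_advantages([1, 1, 1, 1, 1, 1, 1, 1])

print([round(a, 2) for a in mixed])
print(uniform)
```

A group in which every rollout receives the same reward produces all-zero advantages and therefore no learning signal, which is one reason to filter out tasks that are always or never solved.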

Prerequisites

GPU — Stage 1: 8×H20 (1 node); Stage 2: 16×H20 (2 nodes).
Local tool backend — RL runs against the local search/browse environment, not live web. Bring up the search service (Milvus + Redis) and the browse service (PostgreSQL) before training. See environment/ for the search backend and examples/sglang_multiturn/search_browser/tool_backend/ for the browse backend.
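
A quick reachability check can confirm the services are up before a long training run starts. The sketch below assumes each service listens on localhost at its upstream default port (Milvus 19530, Redis 6379, PostgreSQL 5432). The actual hosts and ports depend on your environment/ and tool_backend configuration, and the helper name is hypothetical.

```python
import socket

# Assumed defaults; replace with the hosts/ports from your own configuration.
SERVICES = {
    "milvus": ("127.0.0.1", 19530),
    "redis": ("127.0.0.1", 6379),
    "postgresql": ("127.0.0.1", 5432),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for name, (host, port) in SERVICES.items():
    status = "up" if is_reachable(host, port) else "DOWN"
    print(f"{name:<11} {host}:{port} {status}")
```

This only verifies that something accepts TCP connections on each port; it does not check that the corpus is loaded or that credentials are valid.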

1. Install

cd training
pip install -e ".[sglang]"                 # verl-based training stack; quotes keep zsh from globbing the brackets

2. Configure the tool backend

cp examples/sglang_multiturn/search_browser/tool_backend/.env.example \
   examples/sglang_multiturn/search_browser/tool_backend/.env
# Edit .env: PG_* (browse DB), SUMMARY_API_*, LLM_JUDGE_API_*, optional SCRAPEDO_API_KEY

# Start the browse service (reads the .env above)
bash examples/sglang_multiturn/search_browser/tool_backend/start_browse.sh

3. Download the training data

hf download simplex-ai-inc/LiteResearcher-Data --repo-type dataset \
            --local-dir ./literesearcher_data    # 28K prompts, 19 MB

4. Stage 1 — RAG-only warmup (8×H20, 32K ctx)

export TRAIN_DATA=./literesearcher_data/stage1/train.parquet
export VAL_DATA="$TRAIN_DATA"     # no separate val bundled; verl needs a non-empty val_files
export MODEL_PATH=$(hf download simplex-ai-inc/LiteResearcher-4B-SFT \
                                --local-dir ./literesearcher_sft)
bash examples/sglang_multiturn/search_browser/stage1_rag_only.sh

5. Stage 2 — mixed curriculum (16×H20, 48K ctx)

Resume from a Stage-1 checkpoint (around step 220).

export TRAIN_DATA=./literesearcher_data/stage2/train.parquet
export VAL_DATA="$TRAIN_DATA"
export MODEL_PATH=/path/to/stage1-ckpt/global_step_220
bash examples/sglang_multiturn/search_browser/stage_2_mix_rag_on_policy_48k.sh

See training/README.md for the full reproduction recipe (including the SFT cold-start prerequisite, environment variables, and config knobs) and the dataset card for the data schema and curriculum design.

Release Plan

Evaluation code
Project page
Model weights — RL (LiteResearcher-4B)
Model weights — SFT cold-start (LiteResearcher-4B-SFT, built on Qwen3-4B-Thinking-2507) 🆕
Local search/browse environment setup (environment/)
Search corpus — 32M records (LiteResearcher-Corpus)
Training code — GRPO + curriculum RL (training/)
Training data — Stage-1 & Stage-2 prompts (LiteResearcher-Data)
Data synthesis pipeline (datagen/)

Acknowledgements

LiteResearcher's training stack is built on verl, ByteDance's RL training library, which we fork and extend with the multi-turn search/browse agent loop, difficulty-aware curriculum, and local-tool reward pipeline. We also build on SGLang for rollout serving, Qwen3 as the base model, and Milvus + BGE-M3 for the local search environment. We thank these projects and their communities.

Contributing

Contributions are welcome — see CONTRIBUTING.md for development setup, pull-request guidelines, and our Code of Conduct.

Powered By

LiteResearcher is the engine behind lev8, Simplex AI's parallel agentic search platform — frontier-grade deep research, fast and cheap enough to run hundreds of agents per query. Explore → lev8.com

Citation

@article{li2026literesearcher,
  title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
  author={Li, Wanli and Qu, Bince and Pan, Bo and Zhang, Jianyu and Liu, Zheng and Zhang, Pan and Chen, Wei and Zhang, Bo},
  journal={arXiv preprint arXiv:2604.17931},
  year={2026}
}

License

Released under the Apache License 2.0.
