If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.
2026-06
- 🚀 Training code released: GRPO + difficulty-aware curriculum (training/)
- 📊 Training data released: Stage-1 & Stage-2 prompts (LiteResearcher-Data)
- 🧊 SFT cold-start checkpoint released (LiteResearcher-4B-SFT)
- 🛠️ Data synthesis pipeline released (datagen/)
- 🌐 Local search/browse environment released (environment/)
- 📚 32M-record search corpus released (LiteResearcher-Corpus)
2026-04
- 🎯 RL model weights released (LiteResearcher-4B)
- 📈 Evaluation code & project page released
LiteResearcher-4B is a 4B deep research agent that matches frontier systems at a fraction of the size — trained with $0 marginal API cost by replacing live-web interaction with a stable local search/browse environment that mirrors real-world search dynamics.
Highlights
- Open-source SOTA — 71.3% GAIA / 78.0% Xbench-DS, beating 30B open-source agents and surpassing Claude-4.5-Sonnet on GAIA and GPT-5-high on Xbench-DS.
- +15.7 GAIA points from RL — SFT 55.6% → RL 71.3%, vs. only +3.8 for AgentCPM-Explore when training with live web interaction.
- $0 marginal API cost — 73.2M local tool calls during RL; the same volume would cost $59K–$243K via live search/browse APIs.
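As a rough sanity check on the cost highlight, the quoted range can be converted into an implied average price per call. This is only a back-of-the-envelope sketch: the README does not state which per-call API prices underlie the $59K–$243K estimate, so the function name and the per-1,000-calls framing are illustrative assumptions.

```python
# Figures taken from the highlight above.
TOOL_CALLS = 73_200_000   # local tool calls made during RL
COST_LOW_USD = 59_000     # low end of the quoted live-API estimate
COST_HIGH_USD = 243_000   # high end of the quoted live-API estimate


def implied_price_per_1k(total_usd: float, calls: int) -> float:
    """Average price per 1,000 calls implied by a total cost estimate."""
    return total_usd / calls * 1_000


low = implied_price_per_1k(COST_LOW_USD, TOOL_CALLS)
high = implied_price_per_1k(COST_HIGH_USD, TOOL_CALLS)

# Prints: Implied live-API price: $0.81 to $3.32 per 1,000 calls
print(f"Implied live-API price: ${low:.2f} to ${high:.2f} per 1,000 calls")
```

In other words, the estimate corresponds to somewhere between roughly 0.08 and 0.33 cents per search or browse call.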
Left: Xbench-DeepSearch accuracy vs. model size — our 4B model reaches 78.0%, matching/surpassing 100×+ larger systems. Right: Average rollout time and cost per turn — LiteResearcher is the fastest and cheapest.
Comparison across commercial models and open-source deep research agents on eight benchmarks. Best score among ≤8B models is in bold; LiteResearcher-4B leads on 6 of 8.
Across all 8 benchmarks, LiteResearcher-4B is the best ≤8B agent on 6 — Mirothinker-8B leads on BrowseComp and BrowseComp-ZH. Full numbers are also in the paper and training/README.md.
Three pillars enable low-cost, scalable Agentic RL:
- Co-construct Training Data & Corpus: scale up information sources with a simple but effective synthesis pipeline, then co-evolve the training QA pairs and the local webpage corpus.
- Stable Local Tool Environment: build a local search engine (Milvus + BGE-M3) and a local browse tool (PostgreSQL) from ~32M real webpages, so the RL stage runs fully locally with no API consumption, a 10–46× speedup, and zero marginal tool cost.
- Difficulty-Aware Curriculum RL: a multi-stage curriculum with on-policy GRPO that filters tasks by pass@8 difficulty to sustain monotonic improvement.
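The third pillar can be illustrated with a small sketch. It is not the code in training/. It assumes the difficulty signal is the fraction of 8 sampled rollouts that succeed, and it uses one common form of the GRPO group-normalized advantage. The function names and the thresholds `low` and `high` are hypothetical.

```python
from statistics import mean, pstdev


def pass_rate(outcomes: list[bool]) -> float:
    """Fraction of sampled rollouts that solved the task."""
    return sum(outcomes) / len(outcomes)


def filter_by_difficulty(
    rollouts: dict[str, list[bool]], low: float = 0.0, high: float = 1.0
) -> list[str]:
    """Keep tasks whose pass rate lies strictly between `low` and `high`.

    With the defaults this drops tasks that are always solved or never solved.
    (Thresholds are illustrative assumptions, not the project's settings.)
    """
    return [task for task, outs in rollouts.items() if low < pass_rate(outs) < high]


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each reward normalized within its rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical outcomes of 8 rollouts per task.
rollouts = {
    "easy": [True] * 8,
    "medium": [True, False, True, False, False, True, False, False],
    "impossible": [False] * 8,
}

kept = filter_by_difficulty(rollouts)
print(kept)  # ['medium']

# A group with identical rewards yields all-zero advantages, hence no learning signal.
print(group_relative_advantages([1.0] * 8))
```

The filter matters because group normalization gives zero advantage whenever all rollouts of a task receive the same reward, so uniformly solved or uniformly failed tasks consume rollout budget without contributing gradient.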
We release 15 hand-audited rollout trajectories from LiteResearcher-4B across 8 deep-research benchmarks (GAIA, Xbench-DS, Frames, HLE, Seal-0, WebWalker, BrowseComp, BrowseComp-ZH).
🔎 Live viewer: https://simplexai-labs.github.io/LiteResearcher/cases/
Each trajectory renders 40–170 steps showing the model's think → search → visit → answer chain, with tool queries, visited URLs, and tool responses inline. Source data lives under docs/cases/.
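To make the think → search → visit → answer chain concrete, here is a toy loop that produces a trajectory of that shape. It is a sketch under loud assumptions: the `Step` record, `run_episode`, the scripted policy, and the stub tools are all hypothetical and do not reflect the actual schema under docs/cases/ or the agent loop in the repository.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str           # "think", "search", "visit", or "answer"
    content: str        # reasoning text, search query, URL, or final answer
    response: str = ""  # tool output; stays empty for think/answer steps


def run_episode(policy, tools, question, max_steps=200):
    """Roll out one episode; `max_steps` is an arbitrary safety cap."""
    trajectory = []
    for _ in range(max_steps):
        kind, content = policy(question, trajectory)
        # Only "search" and "visit" call a tool in this toy setup.
        response = tools[kind](content) if kind in tools else ""
        trajectory.append(Step(kind, content, response))
        if kind == "answer":
            break
    return trajectory


# A scripted stand-in for the model: it replays a fixed plan.
SCRIPT = [
    ("think", "I should look this up."),
    ("search", "example query"),
    ("visit", "https://example.com/page"),
    ("answer", "example answer"),
]


def scripted_policy(question, trajectory):
    return SCRIPT[len(trajectory)]


# Stub tools standing in for the local search and browse services.
stub_tools = {
    "search": lambda query: f"results for {query!r}",
    "visit": lambda url: f"contents of {url}",
}

trajectory = run_episode(scripted_policy, stub_tools, "example question")
for step in trajectory:
    print(step.kind, "|", step.content, "|", step.response)
```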
├── inference/ # Inference & evaluation (released)
├── training/ # RL training — GRPO + curriculum (released)
├── datagen/ # Data synthesis (released)
├── environment/ # Local search/browse environment (released)
└── docs/ # Project page
cd inference
pip install -r requirements.txt
cp .env.example .env
# Edit .env: set MODEL, SERPER_KEY_ID (browser uses Jina Reader by default; set SCRAPEDO_API_KEY only if using BROWSER_PROVIDER=scrapedo)
# Start model server (SGLang/vLLM)
bash scripts/start_sglang.sh
# Run evaluation
bash scripts/run_all.sh

See inference/README.md for detailed configuration and usage.
The full two-stage RL training pipeline (GRPO + TIS + difficulty-aware curriculum)
is in training/, and the training data is hosted on
🤗 LiteResearcher-Data.
- GPU — Stage 1: 8×H20 (1 node); Stage 2: 16×H20 (2 nodes).
- Local tool backend — RL runs against the local search/browse environment,
not live web. Bring up the search service (Milvus + Redis) and the browse
service (PostgreSQL) before training. See
environment/ for the search backend and examples/sglang_multiturn/search_browser/tool_backend/ for the browse backend.
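Before launching a run, it can help to confirm that the three backing services accept connections. The sketch below is a hypothetical pre-flight check, not part of the repository. It assumes everything runs on localhost at each service's default port (Milvus 19530, Redis 6379, PostgreSQL 5432), so adjust `SERVICES` to match your deployment.

```python
import socket

# Assumed hosts and default ports; change these to match your own setup.
SERVICES = {
    "milvus": ("localhost", 19530),
    "redis": ("localhost", 6379),
    "postgresql": ("localhost", 5432),
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Map each service name to whether its port is reachable."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services(SERVICES).items():
        print(f"{name:<12} {'reachable' if ok else 'NOT reachable'}")
```

An open port only shows that something is listening; it does not verify that the corpus is loaded or that credentials in `.env` are correct.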
cd training
pip install -e .[sglang] # verl-based training stack

cp examples/sglang_multiturn/search_browser/tool_backend/.env.example \
examples/sglang_multiturn/search_browser/tool_backend/.env
# Edit .env: PG_* (browse DB), SUMMARY_API_*, LLM_JUDGE_API_*, optional SCRAPEDO_API_KEY
# Start the browse service (reads the .env above)
bash examples/sglang_multiturn/search_browser/tool_backend/start_browse.sh

hf download simplex-ai-inc/LiteResearcher-Data --repo-type dataset \
--local-dir ./literesearcher_data # 28K prompts, 19 MB

export TRAIN_DATA=./literesearcher_data/stage1/train.parquet
export VAL_DATA="$TRAIN_DATA" # no separate val bundled; verl needs a non-empty val_files
export MODEL_PATH=$(hf download simplex-ai-inc/LiteResearcher-4B-SFT \
--local-dir ./literesearcher_sft)
bash examples/sglang_multiturn/search_browser/stage1_rag_only.sh

For Stage 2, resume from a Stage-1 checkpoint (around step 220).
export TRAIN_DATA=./literesearcher_data/stage2/train.parquet
export VAL_DATA="$TRAIN_DATA"
export MODEL_PATH=/path/to/stage1-ckpt/global_step_220
bash examples/sglang_multiturn/search_browser/stage_2_mix_rag_on_policy_48k.sh

See training/README.md for the full reproduction recipe (including the SFT cold-start prerequisite, environment variables, and config knobs) and the dataset card for the data schema and curriculum design.
- Evaluation code
- Project page
- Model weights: RL (LiteResearcher-4B)
- Model weights: SFT cold-start (LiteResearcher-4B-SFT, built on Qwen3-4B-Thinking-2507) 🆕
- Local search/browse environment setup (environment/)
- Search corpus: 32M records (LiteResearcher-Corpus)
- Training code: GRPO + curriculum RL (training/)
- Training data: Stage-1 & Stage-2 prompts (LiteResearcher-Data)
- Data synthesis pipeline (datagen/)
LiteResearcher's training stack is built on verl, ByteDance's RL training library, which we fork and extend with the multi-turn search/browse agent loop, difficulty-aware curriculum, and local-tool reward pipeline. We also build on SGLang for rollout serving, Qwen3 as the base model, and Milvus + BGE-M3 for the local search environment. We thank these projects and their communities.
Contributions are welcome — see CONTRIBUTING.md for development setup, pull-request guidelines, and our Code of Conduct.
LiteResearcher is the engine behind lev8, Simplex AI's parallel agentic search platform — frontier-grade deep research, fast and cheap enough to run hundreds of agents per query. Explore → lev8.com
@article{li2026literesearcher,
title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
author={Li, Wanli and Qu, Bince and Pan, Bo and Zhang, Jianyu and Liu, Zheng and Zhang, Pan and Chen, Wei and Zhang, Bo},
journal={arXiv preprint arXiv:2604.17931},
year={2026}
}

Released under the Apache License 2.0.

