# CoSearch: Joint Training of Reasoning and Document Ranking via Reinforcement Learning for Agentic Search
CoSearch jointly trains a multi-step reasoning agent and a generative document ranker via GRPO for agentic search. The main agent issues sub-queries; the ranker reorders candidate documents from a fixed dense retriever before the agent observes them — both are optimized end-to-end from answer correctness.
Two technical contributions make this work:
- Semantic grouping: clusters sub-queries by token-level F1 similarity to form valid GRPO groups for the ranker, improving sampling efficiency without additional rollouts.
- Composite reward: combines a ranking quality signal (Hit@k) with trajectory-level answer correctness to give the ranker both immediate and long-term learning signals.
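A minimal sketch of these two ideas in Python. The similarity threshold, the greedy clustering scheme, and the mixing weight `alpha` are illustrative assumptions, not the exact implementation:

```python
from collections import Counter


def token_f1(a: str, b: str) -> float:
    """Token-level F1 overlap between two sub-queries."""
    ta, tb = a.lower().split(), b.lower().split()
    common = sum((Counter(ta) & Counter(tb)).values())
    if common == 0:
        return 0.0
    p, r = common / len(ta), common / len(tb)
    return 2 * p * r / (p + r)


def group_subqueries(queries, threshold=0.5):
    """Greedy semantic grouping sketch: a sub-query joins the first group
    whose seed query has token-level F1 >= threshold, else starts a new group."""
    groups = []
    for q in queries:
        for g in groups:
            if token_f1(q, g[0]) >= threshold:
                g.append(q)
                break
        else:
            groups.append([q])
    return groups


def composite_reward(hit_at_k: float, answer_correct: float, alpha: float = 0.5) -> float:
    """Mix the immediate ranking signal (Hit@k) with trajectory-level answer
    correctness; alpha is an assumed weighting, not the paper's value."""
    return alpha * hit_at_k + (1.0 - alpha) * answer_correct
```

Grouped sub-queries share a GRPO baseline, which is why the clustering step matters: it manufactures comparable samples for the ranker without extra rollouts.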
```bash
bash conda_setup/setup_conda_env.sh
conda activate search-llm
```

See conda_setup/README.md for optional flags (CUDA version, force-recreate, skip flash-attn).
We use e5-base-v2 as the retriever. The retrieval server setup follows Search-R1, which is already cloned at Search-R1/. You can create the conda environment with:
```bash
conda create -n retriever python=3.10
conda activate retriever
# Install torch with conda (needed for faiss-gpu)
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers datasets
# faiss-gpu for efficient retrieval
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
# FastAPI server
pip install uvicorn fastapi
```

The retriever uses a Wikipedia passage index (e5-base-v2 embeddings). Download from Search-R1:
```bash
save_path=/your/data/path
cd Search-R1
python scripts/download.py --save_path $save_path
# Merge split index files
cat $save_path/part_* > $save_path/e5_Flat.index
# Decompress corpus
gzip -d $save_path/wiki-18.jsonl.gz
```

The retrieval server must be running before training starts. It exposes a /retrieve endpoint on port 8000.
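Once the server is up, you can sanity-check it with a small client. This is a sketch: the request schema (the queries, topk, and return_scores fields) is assumed to follow Search-R1's retrieval server and should be verified against the code in Search-R1/ if anything differs:

```python
import json
import urllib.request


def build_payload(queries, topk=3):
    """Build the JSON body for a POST to /retrieve (assumed Search-R1-style schema)."""
    return {"queries": queries, "topk": topk, "return_scores": True}


def retrieve(queries, url="http://localhost:8000/retrieve", topk=3):
    """POST the queries to the retrieval server and return the parsed JSON response."""
    body = json.dumps(build_payload(queries, topk)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `retrieve(["who wrote the iliad"], url="http://gpu013:8000/retrieve")` should return the top-k passages for that query once the server is reachable.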
Edit scripts/launch_retriever_server.sh and set your data paths:
```bash
INDEX_FILE="/path/to/e5_Flat.index"
CORPUS_FILE="/path/to/wiki-18.jsonl"
RETRIEVER_MODEL="intfloat/e5-base-v2"
TOPK=50
```

Then submit:

```bash
sbatch scripts/launch_retriever_server.sh
```

The server logs its hostname to sbatch_out/retriever_<jobid>.out. Note the hostname (e.g., gpu013); you will need it for training.
Tip: You can also launch the server interactively (without sbatch):

```bash
conda activate retriever
bash scripts/launch_retriever_server.sh
```
Before launching training, download the train/eval parquet files to the local data/ path expected by the training pipeline:
```bash
python scripts/download_from_hf.py
```

By default, this pulls from:

- hzeng/co-search-train
- hzeng/co-search-eval
If needed, override the dataset repos:
```bash
python scripts/download_from_hf.py --train-repo <train_repo> --eval-repo <eval_repo>
```

Pass the retriever URL as an environment variable at submission time:

```bash
RETRIEVAL_SERVICE_URL="http://<retriever-hostname>:8000/retrieve" sbatch scripts/train_co_search_grpo.sh
```

The retriever hostname comes from the sbatch_out/retriever_<jobid>.out log from Step 3.
The script generates the tool config at runtime with this URL injected — no manual file editing needed. The default is http://localhost:8000/retrieve if RETRIEVAL_SERVICE_URL is not set.
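The injection amounts to reading the environment variable with a fallback. A minimal sketch of that logic; the config layout and key names here are hypothetical, only the RETRIEVAL_SERVICE_URL variable and its default come from the script:

```python
import json
import os


def build_tool_config() -> dict:
    """Resolve the retriever URL (defaulting to localhost) and embed it in a
    hypothetical tool-config structure."""
    url = os.environ.get("RETRIEVAL_SERVICE_URL", "http://localhost:8000/retrieve")
    return {"tools": [{"name": "retrieve", "url": url}]}


def write_tool_config(path: str) -> None:
    """Write the generated config to disk, as the training script does at runtime."""
    with open(path, "w") as f:
        json.dump(build_tool_config(), f, indent=2)
```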
If you use CoSearch in your research, please cite:

```bibtex
@article{zeng2026cosearch,
  title={CoSearch: Joint Training of Reasoning and Document Ranking via Reinforcement Learning for Agentic Search},
  author={Hansi Zeng and Liam Collins and Bhuvesh Kumar and Neil Shah and Hamed Zamani},
  journal={arXiv preprint arXiv:2604.17555},
  year={2026},
  doi={10.48550/arXiv.2604.17555}
}
```

This repository contains sample code developed as part of a collaboration between Snap Inc. and the University of Massachusetts Amherst. Rights to the sample code remain with the original author(s) and are licensed under the terms described in the LICENSE file.
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). See the LICENSE file for details.