ldilab/R2U


Relevance to Utility: Process-Supervised Rewrite for RAG

💡 Overview

TL;DR:
We propose a new bridging method for RAG that rewrites retrieved documents to maximize answer generation utility, using LLM-guided process supervision and scalable distillation. Our method outperforms existing baselines across multiple QA benchmarks.

🔧 Installation

1. Environment Setup

# Create conda environment
conda create -n rtou python=3.9
conda activate rtou

# Install requirements
pip install -r requirements.txt

🏃 Quick Start

Data Preparation

Use the code in notebook/{dataset_name}.ipynb to preprocess each dataset into our standardized JSON format. We use the following datasets:

  • Multi-hop QA: HotpotQA, 2WikiMultihopQA, MuSiQue
  • Disambiguation QA: AmbigQA
  • Web corpus
    • Single-hop QA: MS MARCO
    • Comprehensive QA: CRAG

Custom tasks:
For other generation tasks (e.g., QA, math, code), format your data as follows:

{
  "Question": "your question here",
  "answer": ["answer1", "answer2"]
}
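As a rough illustration of the format above, the sketch below builds and saves records with the "Question" and "answer" keys. The helper names make_record and dump_records are hypothetical and not part of this repository, and storing all records as a single JSON list is an assumption; check notebook/{dataset_name}.ipynb for the layout the pipeline actually expects.

```python
import json


def make_record(question, answers):
    """Build one record with the "Question" and "answer" keys.

    Hypothetical helper: accepts a single answer string or a list of answers.
    """
    if isinstance(answers, str):
        answers = [answers]
    # Drop empty answers and surrounding whitespace.
    cleaned = [str(a).strip() for a in answers if str(a).strip()]
    if not question.strip():
        raise ValueError("question must be non-empty")
    if not cleaned:
        raise ValueError("at least one answer is required")
    return {"Question": question.strip(), "answer": cleaned}


def dump_records(records, path):
    """Write records as one JSON list (assumed layout, not confirmed by the repo)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    records = [
        make_record("What is 2 + 2?", "4"),
        make_record("Who wrote Hamlet?", ["William Shakespeare", "Shakespeare"]),
    ]
    dump_records(records, "custom_task.json")
```

Keeping "answer" as a list even for a single gold answer means downstream code can treat every task the same way.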

Also modify these scripts to support your task:

  • scripts/evaluate.py: for task-specific evaluation
  • scripts/prompts.py: to customize prompts for your task
  • scripts/run_xxx_xxx.py: to define the end-to-end pipeline
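For task-specific evaluation, one common choice for QA is exact match against any of the gold answers. The sketch below is an assumption about what such a metric could look like, not the contents of scripts/evaluate.py; the normalize and exact_match names and the SQuAD-style normalization (lowercasing, stripping punctuation and English articles) are illustrative.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list) -> bool:
    """Return True if the prediction matches any gold answer after normalization."""
    pred = normalize(prediction)
    return any(pred == normalize(gold) for gold in gold_answers)


if __name__ == "__main__":
    record = {"Question": "Which tower is in Paris?", "answer": ["Eiffel Tower", "Tour Eiffel"]}
    print(exact_match("The Eiffel Tower.", record["answer"]))
```

Other tasks (math, code) would likely need a different comparison, such as numeric equality or running test cases.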

Model Inference

Please make sure the required file paths are correct before running the script.

  1. RAG
  • Set the correct search_cache_name before running.
bash runs/run_naive_rag.sh

Training

  1. Generating Bridging Document Distribution
bash runs/run_rewrite_docs.sh
  2. Training Student Model
bash runs/convert_cache_to_train.sh
cd train
conda activate test
bash train/bash/run_train_rewriting.sh
cd ..
conda activate rtou
bash runs/convert_train_to_cache.sh
  3. Preference Learning
cd dpo-train
bash run.sh
cd ..
conda activate rtou
bash runs/convert_train_to_cache.sh
  4. Using the Trained Model to Rewrite Docs
bash runs/run_rewrite_docs_fromtrain.sh

Baselines

  • Baseline implementations are in reranker/ and runs/baselines/.

Acknowledgement

This repository is based on Search-o1.

About

ACL 2026 findings
