A benchmark framework for evaluating multiple Retrieval-Augmented Generation (RAG) and GraphRAG methods across QA and summarization tasks.
| Method | Scripts |
|---|---|
| RAG (standard vector-based) | retrieval.py, retrieval_multiple.py |
| RaptorRAG | raptor_retrieval.py, raptor_retrieval_multiple.py |
| KG-GraphRAG (Triplets / Triplets+Text) | graph_retrieval.py, graph_retrieval_multiple.py |
| Community-GraphRAG (Local) | graphrag_local.py, graphrag_local_multiple.py |
| Community-GraphRAG (Global) | graphrag_global.py, graphrag_global_multiple.py |
| HippoRAG2 | hippo_retrieval.py, hippo_retrieval_multiple.py |
| Type | Datasets |
|---|---|
| Full-document (single index) | news (MultiHop-RAG), hotpot, Story, Meeting |
| Sub-document (per-document index) | NovelQA, NQ, SQU, QM |
| QA task | news (MultiHop-RAG), hotpot, NovelQA, NQ |
| Summarization task | Story, Meeting, SQU, QM |
1. Indexing → 2. Retrieval → 3. QA / Summarization → 4. Evaluation
Indexing is handled automatically inside each retrieval script via dataset.py. The index is cached on disk and reused on subsequent runs.
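The load-or-build pattern looks roughly like the following. This is a minimal sketch assuming LlamaIndex's on-disk persistence; corpus_dir and persist_dir are illustrative names, and the actual logic in dataset.py may differ.

```python
import os
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

def get_index(corpus_dir: str, persist_dir: str) -> VectorStoreIndex:
    """Return a cached index if one exists on disk, otherwise build and persist it."""
    if os.path.isdir(persist_dir):
        # Reuse the index written by a previous run.
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        return load_index_from_storage(storage_context)
    # First run: embed the corpus and cache the result for later runs.
    documents = SimpleDirectoryReader(corpus_dir).load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=persist_dir)
    return index
```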
Use retrieval.py for full-document datasets and retrieval_multiple.py for sub-document datasets.
Basic retrieval:
python retrieval.py --dataset news --topk 10
python retrieval_multiple.py --dataset NovelQA --topk 10 --subdocs
With IRCoT (iterative retrieval; sketched below):
python retrieval.py --dataset news --topk 10 --ircot
python retrieval_multiple.py --dataset NovelQA --topk 10 --subdocs --ircot
With reranking:
python retrieval.py --dataset news --topk 20 --rerank
python retrieval_multiple.py --dataset NovelQA --topk 20 --subdocs --rerank
Other method-specific retrieval scripts follow the same argument conventions. For KG-GraphRAG, pass --withtext to include source text alongside triplets.
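For reference, --ircot interleaves retrieval with LLM reasoning. A minimal sketch of that loop follows; retrieve and generate_thought are hypothetical placeholders, and the real prompts, stopping criterion, and step limit may differ.

```python
def ircot_retrieve(question, retrieve, generate_thought, max_steps=4, topk=10):
    """Interleave retrieval and reasoning: each generated thought becomes the next query."""
    collected, query = [], question
    for _ in range(max_steps):
        docs = retrieve(query, topk=topk)                 # retrieve with the current query
        collected.extend(d for d in docs if d not in collected)
        thought = generate_thought(question, collected)   # LLM reasons over evidence so far
        if "so the answer is" in thought.lower():         # stop once the model commits to an answer
            break
        query = thought                                   # use the new thought as the next query
    return collected
```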
For QA datasets (news, hotpot, NovelQA, NQ), use qa/qa_batch.py:
cd qa
python qa_batch.py --dataset news --graphrag_local
python qa_batch.py --dataset hotpot --rag
For summarization datasets (SQU, QM, Story, Meeting), use qa/summarization.py:
cd qa
python summarization.py --dataset Story --graphrag_local
python summarization.py --dataset Meeting --rag
Method flags: --rag, --raptor_rag, --hippo_rag, --graphrag_local, --graphrag (global).
| Dataset | Script |
|---|---|
| news (MultiHop-RAG) | evaluation/qa_evaluation.py |
| hotpot, NQ | evaluation/hotpot_evaluation.py |
| NovelQA | evaluation/NovelQA_evaluation.py |
| SQU, QM, Story, Meeting | evaluation/summarization_evaluation.py |
Examples:
cd evaluation
# news
python qa_evaluation.py --dataset news --graphrag_local
# hotpot / NQ
python hotpot_evaluation.py --dataset NQ --subdocs --graphrag_local
# NovelQA
python NovelQA_evaluation.py --dataset NovelQA --graphrag_local
# Summarization
python summarization_evaluation.py --dataset Meeting --graphrag_local
| Argument | Description |
|---|---|
| --dataset | Dataset name (e.g., news, hotpot, NovelQA, NQ, Story, Meeting, SQU, QM) |
| --topk | Number of retrieved documents |
| --subdocs | Use per-document (sub-document) indices |
| --rerank | Enable reranking with BAAI/bge-reranker-large (see the sketch below) |
| --ircot | Enable IRCoT iterative retrieval |
| --model_size | LLM size for IRCoT (default: 8B) |
| --embed_model | Embedding model (default: text-embedding-ada-002) |
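For reference, --rerank rescores retrieved candidates with BAAI/bge-reranker-large. A minimal sketch, assuming the sentence-transformers CrossEncoder loader; the scripts may load and apply the model differently.

```python
from sentence_transformers import CrossEncoder

def rerank(query: str, passages: list[str], final_k: int = 10) -> list[str]:
    """Score (query, passage) pairs with the cross-encoder and keep the top-scoring passages."""
    reranker = CrossEncoder("BAAI/bge-reranker-large")
    scores = reranker.predict([(query, p) for p in passages])
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return [p for p, _ in ranked[:final_k]]
```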
- LlamaIndex
- vLLM
- HippoRAG
- RAPTOR
- Microsoft GraphRAG
- OpenAI API (for embeddings and LLM calls)