haoyuhan1/RAGvsGraphRAG


# RAG Methods Benchmark — KDD

A benchmark framework for evaluating multiple Retrieval-Augmented Generation (RAG) and GraphRAG methods across QA and summarization tasks.

## Supported Methods

| Method | Scripts |
|---|---|
| RAG (standard vector-based) | `retrieval.py`, `retrieval_multiple.py` |
| RaptorRAG | `raptor_retrieval.py`, `raptor_retrieval_multiple.py` |
| KG-GraphRAG (Triplets / Triplets+Text) | `graph_retrieval.py`, `graph_retrieval_multiple.py` |
| Community-GraphRAG (Local) | `graphrag_local.py`, `graphrag_local_multiple.py` |
| Community-GraphRAG (Global) | `graphrag_global.py`, `graphrag_global_multiple.py` |
| HippoRAG2 | `hippo_retrieval.py`, `hippo_retrieval_multiple.py` |

## Datasets

| Type | Datasets |
|---|---|
| Full-document (single index) | news (MultiHop-RAG), hotpot, Story, Meeting |
| Sub-document (per-document index) | NovelQA, NQ, SQU, QM |
| QA task | news (MultiHop-RAG), hotpot, NovelQA, NQ |
| Summarization task | Story, Meeting, SQU, QM |

## Pipeline Overview

```
1. Indexing  →  2. Retrieval  →  3. QA / Summarization  →  4. Evaluation
```

### Step 1 — Indexing

Indexing is handled automatically inside each retrieval script via `dataset.py`. The index is cached on disk and reused on subsequent runs.
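The build-once, reuse-later behavior follows a standard cache-on-disk pattern. The sketch below is illustrative only — `build_index`, `load_or_build_index`, and the cache path are hypothetical names, not the repo's actual API:

```python
import os
import pickle

def build_index(docs):
    # Placeholder for the real (expensive) embedding/index construction.
    return {i: doc for i, doc in enumerate(docs)}

def load_or_build_index(docs, cache_path):
    """Return the cached index if it exists; otherwise build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    index = build_index(docs)
    with open(cache_path, "wb") as f:
        pickle.dump(index, f)
    return index
```

On the first run the index is built and written to `cache_path`; every later run loads it from disk instead of rebuilding.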


### Step 2 — Retrieval

Use `retrieval.py` for full-document datasets and `retrieval_multiple.py` for sub-document datasets.

Basic retrieval:

```bash
python retrieval.py --dataset news --topk 10
python retrieval_multiple.py --dataset NovelQA --topk 10 --subdocs
```

With IRCoT (iterative retrieval):

```bash
python retrieval.py --dataset news --topk 10 --ircot
python retrieval_multiple.py --dataset NovelQA --topk 10 --subdocs --ircot
```
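IRCoT interleaves retrieval with chain-of-thought reasoning: each intermediate reasoning step becomes the query for the next retrieval round. A minimal sketch of that loop, with `retrieve` and `generate_thought` as stand-in callables (not the repo's implementation):

```python
def ircot_retrieve(question, retrieve, generate_thought, max_steps=3):
    """Iterative retrieval: each reasoning step seeds the next query."""
    retrieved = []
    query = question
    for _ in range(max_steps):
        docs = retrieve(query)
        new = [d for d in docs if d not in retrieved]
        if not new:
            break  # no new evidence found; stop early
        retrieved.extend(new)
        # The LLM produces an intermediate thought conditioned on
        # everything retrieved so far; it becomes the next query.
        query = generate_thought(question, retrieved)
    return retrieved
```

The real pipeline uses an LLM for `generate_thought` (see `--model_size` below); the loop structure is what distinguishes IRCoT from single-shot retrieval.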

With reranking:

```bash
python retrieval.py --dataset news --topk 20 --rerank
python retrieval_multiple.py --dataset NovelQA --topk 20 --subdocs --rerank
```

Other method-specific retrieval scripts follow the same argument conventions. For KG-GraphRAG, pass `--withtext` to include the source text alongside the triplets.
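At its core, `--topk` vector retrieval ranks chunk embeddings by cosine similarity to the query embedding. A self-contained sketch of that ranking step (illustrative; the actual scripts use a real embedding model such as text-embedding-ada-002):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

With `--rerank`, a larger fetch (e.g. `--topk 20`) is re-scored by a cross-encoder (BAAI/bge-reranker-large) before the final cut, which is why the reranking examples above use a bigger `topk`.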


### Step 3 — QA / Summarization

QA datasets (news, hotpot, NovelQA, NQ) — use `qa/qa_batch.py`:

```bash
cd qa
python qa_batch.py --dataset news --graphrag_local
python qa_batch.py --dataset hotpot --rag
```

Summarization datasets (SQU, QM, Story, Meeting) — use `qa/summarization.py`:

```bash
cd qa
python summarization.py --dataset Story --graphrag_local
python summarization.py --dataset Meeting --rag
```

Method flags: `--rag`, `--raptor_rag`, `--hippo_rag`, `--graphrag_local`, `--graphrag` (global).


### Step 4 — Evaluation

| Dataset | Script |
|---|---|
| news (MultiHop-RAG) | `evaluation/qa_evaluation.py` |
| hotpot, NQ | `evaluation/hotpot_evaluation.py` |
| NovelQA | `evaluation/NovelQA_evaluation.py` |
| SQU, QM, Story, Meeting | `evaluation/summarization_evaluation.py` |

Examples:

```bash
cd evaluation

# news
python qa_evaluation.py --dataset news --graphrag_local

# hotpot / NQ
python hotpot_evaluation.py --dataset NQ --subdocs --graphrag_local

# NovelQA
python NovelQA_evaluation.py --dataset NovelQA --graphrag_local

# Summarization
python summarization_evaluation.py --dataset Meeting --graphrag_local
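QA evaluation scripts in this space typically report exact match and token-level F1 over normalized answers (SQuAD-style). A generic sketch of that metric — not necessarily what these particular scripts compute:

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def f1_score(prediction, gold):
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Summarization evaluation usually relies on reference-based metrics (e.g. ROUGE) or LLM judges instead of exact token overlap.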

## Common Arguments

| Argument | Description |
|---|---|
| `--dataset` | Dataset name (e.g., news, hotpot, NovelQA, NQ, Story, Meeting, SQU, QM) |
| `--topk` | Number of retrieved documents |
| `--subdocs` | Use per-document (sub-document) indices |
| `--rerank` | Enable reranking with BAAI/bge-reranker-large |
| `--ircot` | Enable IRCoT iterative retrieval |
| `--model_size` | LLM size for IRCoT (default: 8B) |
| `--embed_model` | Embedding model (default: text-embedding-ada-002) |

## Dependencies
