In this tutorial, we evaluate the performance of naive RAG and GraphRAG on a [multi-hop RAG task](https://github.com/yixuantt/MultiHop-RAG).

## Setup
Install the necessary dependencies by running the following command:

In [None]:
!pip install ragas nest_asyncio datasets

Import the necessary libraries, and set your OpenAI API key if needed:

In [21]:
import json
import logging
import os
import sys

# os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"

# Add the directory two levels up to the import path
sys.path.append("../..")

import nest_asyncio
# Patch asyncio so nested event loops are allowed inside the notebook
nest_asyncio.apply()

logging.basicConfig(level=logging.WARNING)
logging.getLogger("nano-graphrag").setLevel(logging.INFO)

from nano_graphrag import GraphRAG, QueryParam
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_correctness,
    answer_similarity,
)

Download the dataset from the [GitHub repo](https://github.com/yixuantt/MultiHop-RAG/tree/main/dataset).
It should contain two files:
- `MultiHopRAG.json`
- `corpus.json`

After downloading the dataset, replace the paths below with the paths on your machine.

In [3]:

multi_hop_rag_file = "./fixtures/MultiHopRAG.json"
multi_hop_corpus_file = "./fixtures/corpus.json"

## Preprocess

In [4]:

with open(multi_hop_rag_file) as f:
    multi_hop_rag_dataset = json.load(f)
with open(multi_hop_corpus_file) as f:
    multi_hop_corpus = json.load(f)

corups_url_refernces = {}
for cor in multi_hop_corpus:
    corups_url_refernces[cor['url']] = cor

We use only the first 100 queries for evaluation, and keep only the articles that those queries cite as evidence.

In [5]:
multi_hop_rag_dataset = multi_hop_rag_dataset[:100]
print("Queries have types:", set([q['question_type'] for q in multi_hop_rag_dataset]))
total_urls = set()
for q in multi_hop_rag_dataset:
    total_urls.update([up['url'] for up in q['evidence_list']])
corups_url_refernces = {k:v for k, v in corups_url_refernces.items() if k in total_urls}

total_corups = [f"## {cor['title']}\nAuthor: {cor['author']}, {cor['source']}\nCategory: {cor['category']}\nPublised: {cor['published_at']}\n{cor['body']}" for cor in corups_url_refernces.values()]

print(f"We will need {len(total_corups)} articles:")
print(total_corups[0][:200], "...")

Queries have types: {'inference_query', 'comparison_query', 'null_query', 'temporal_query'}
We will need 139 articles:
## ASX set to drop as Wall Street’s September slump deepens
Author: Stan Choe, The Sydney Morning Herald
Category: business
Publised: 2023-09-26T19:11:30+00:00
ETF provider Betashares, which manages $ ...
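
Before indexing, it can be worth checking that every evidence URL referenced by the selected queries actually exists in the corpus. The following is a minimal sketch of such a check; the helper name `find_missing_evidence_urls` and the toy records are hypothetical and mirror only the `evidence_list` and `url` fields used above.

```python
def find_missing_evidence_urls(queries, corpus_by_url):
    """Return evidence URLs referenced by the queries but absent from the corpus."""
    missing = set()
    for q in queries:
        for evidence in q["evidence_list"]:
            if evidence["url"] not in corpus_by_url:
                missing.add(evidence["url"])
    return missing


# Toy records (hypothetical), shaped like the fields accessed in the cell above.
toy_queries = [
    {"evidence_list": [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}]},
    {"evidence_list": []},  # a query that cites no evidence
]
toy_corpus_by_url = {"https://example.com/a": {"title": "A"}}

print(find_missing_evidence_urls(toy_queries, toy_corpus_by_url))
# {'https://example.com/b'}
```

In this notebook the call would be `find_missing_evidence_urls(multi_hop_rag_dataset, corups_url_refernces)`; an empty set means every cited article is available for indexing.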


Index `total_corups` for both naive RAG and GraphRAG:

In [6]:
# The first indexing run takes a long time, roughly 15-20 minutes
graphrag_func = GraphRAG(working_dir="nano_graphrag_cache_multi_hop_rag_test", enable_naive_rag=True,
                         embedding_func_max_async=4)
graphrag_func.insert(total_corups)

INFO:nano-graphrag:Load KV full_docs with 139 data
INFO:nano-graphrag:Load KV text_chunks with 408 data
INFO:nano-graphrag:Load KV llm_response_cache with 1634 data
INFO:nano-graphrag:Load KV community_reports with 794 data
INFO:nano-graphrag:Loaded graph from nano_graphrag_cache_multi_hop_rag_test/graph_chunk_entity_relation.graphml with 6181 nodes, 5423 edges
INFO:nano-graphrag:Writing graph with 6181 nodes, 5423 edges


Compare the responses of the two RAG methods on the first query:

In [24]:
response_format = "Single phrase or sentence, concise and no redundant explanation needed. If you don't have the answer in context, Just response 'Insufficient information'"
naive_rag_query_param = QueryParam(mode='naive', response_type=response_format)
naive_rag_query_only_context_param = QueryParam(mode='naive', only_need_context=True)
local_graphrag_query_param = QueryParam(mode='local', response_type=response_format)
local_graphrag_only_context_param = QueryParam(mode='local', only_need_context=True)
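
The two `only_need_context=True` parameter objects are defined above but are not used in the rest of the notebook. One way they might be used is to peek at what each mode retrieves before an answer is generated. The sketch below assumes that a context-only query returns something that can be converted to text; `preview_context` and `fake_query` are hypothetical names, and `fake_query` stands in for `graphrag_func.query` so the snippet runs on its own.

```python
def preview_context(query_fn, question, param, max_chars=500):
    """Run a context-only query and return at most max_chars of the result."""
    context = str(query_fn(question, param=param))
    if len(context) <= max_chars:
        return context
    return context[:max_chars] + " ..."


# Hypothetical stand-in for graphrag_func.query (same call shape: question, param=...).
def fake_query(question, param=None):
    return f"[context for: {question}] " + "x" * 1000


preview = preview_context(fake_query, "Who is on trial?", param=None, max_chars=60)
print(preview)
```

With the real index, the call would look like `preview_context(graphrag_func.query, query['query'], naive_rag_query_only_context_param)`.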

In [8]:
query = multi_hop_rag_dataset[0]
print("Question:", query['query'])
print("GroundTruth Answer:", query['answer'])

Question: Who is the individual associated with the cryptocurrency industry facing a criminal trial on fraud and conspiracy charges, as reported by both The Verge and TechCrunch, and is accused by prosecutors of committing fraud for personal gain?
GroundTruth Answer: Sam Bankman-Fried


In [9]:
print("NaiveRAG Answer:", graphrag_func.query(query['query'], param=naive_rag_query_param))

INFO:nano-graphrag:Truncate 20 to 12 chunks


NaiveRAG Answer: Sam Bankman-Fried


In [10]:
print("Local GraphRAG Answer:", graphrag_func.query(query['query'], param=local_graphrag_query_param))

INFO:nano-graphrag:Using 20 entites, 3 communities, 124 relations, 3 text units


Local GraphRAG Answer: Sam Bankman-Fried


Great! Now we're ready to evaluate more detailed metrics. We will use [ragas](https://docs.ragas.io/en/stable/) to evaluate the quality of the answers.

In [11]:
questions = [q['query'] for q in multi_hop_rag_dataset]
labels = [q['answer'] for q in multi_hop_rag_dataset]

In [12]:
from tqdm import tqdm
logging.getLogger("nano-graphrag").setLevel(logging.WARNING)

naive_rag_answers = [
    graphrag_func.query(q, param=naive_rag_query_param) for q in tqdm(questions)
]

100%|██████████| 100/100 [03:53<00:00,  2.33s/it]


In [14]:
local_graphrag_answers = [
    graphrag_func.query(q, param=local_graphrag_query_param) for q in tqdm(questions)
]

100%|██████████| 100/100 [09:10<00:00,  5.50s/it]
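
Because the response format instruction asks the model to reply 'Insufficient information' when the context lacks the answer, it may be informative to count how often each method abstains. This is a rough sketch, not part of the original evaluation; `abstention_rate` is a hypothetical helper, and the substring match is a simplifying assumption.

```python
def abstention_rate(answers, marker="insufficient information"):
    """Fraction of answers containing the abstention marker (case-insensitive)."""
    if not answers:
        return 0.0
    hits = sum(marker in a.lower() for a in answers)
    return hits / len(answers)


# Toy answers (hypothetical) to show the call shape.
toy_answers = ["Sam Bankman-Fried", "Insufficient information", "Insufficient information."]
print(abstention_rate(toy_answers))
```

In this notebook, `abstention_rate(naive_rag_answers)` and `abstention_rate(local_graphrag_answers)` could be compared with the share of `null_query` items in the dataset.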


In [34]:
naive_results = evaluate(
    Dataset.from_dict({
        "question": questions,
        "ground_truth": labels,
        "answer": naive_rag_answers,
    }),
    metrics=[
        # answer_relevancy,
        answer_correctness,
        answer_similarity,
    ],
)

Evaluating: 100%|██████████| 200/200 [00:32<00:00,  6.19it/s]


In [36]:
local_graphrag_results = evaluate(
    Dataset.from_dict({
        "question": questions,
        "ground_truth": labels,
        "answer": local_graphrag_answers,
    }),
    metrics=[
        # answer_relevancy,
        answer_correctness,
        answer_similarity,
    ],
)

Evaluating: 100%|██████████| 200/200 [00:23<00:00,  8.59it/s]
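
Since the answers are requested as a single phrase or sentence, a normalized exact-match rate can serve as a cheap sanity check alongside the ragas metrics. The sketch below is an additional, simplified metric that the original notebook does not compute; `normalize` and `exact_match_rate` are hypothetical helpers.

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match_rate(predictions, references):
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    matches = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return matches / len(predictions)


# Toy data (hypothetical): two of the three pairs match after normalization.
toy_predictions = ["Sam Bankman-Fried.", "Yes", "no"]
toy_references = ["sam bankman-fried", "yes", "Yes"]
print(exact_match_rate(toy_predictions, toy_references))
```

Here the call would be `exact_match_rate(naive_rag_answers, labels)`. Exact match penalizes correct paraphrases, so it should be read as a lower bound on correctness and not as a replacement for `answer_correctness`.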


In [39]:
print("Naive RAG results", naive_results)
print("Local GraphRAG results", local_graphrag_results)

Naive RAG results {'answer_correctness': 0.5896, 'answer_similarity': 0.8935}
Local GraphRAG results {'answer_correctness': 0.7380, 'answer_similarity': 0.8619}
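
The aggregate scores hide how each method behaves on the four query types printed earlier. A per-type breakdown could be sketched as follows. `mean_score_by_type` is a hypothetical helper that takes plain lists; obtaining per-sample scores from the ragas result (for example via `naive_results.to_pandas()["answer_correctness"]`) is an assumption about the installed ragas version and should be verified.

```python
from collections import defaultdict


def mean_score_by_type(question_types, scores):
    """Average the per-sample scores within each question type."""
    if len(question_types) != len(scores):
        raise ValueError("question_types and scores must have the same length")
    buckets = defaultdict(list)
    for qtype, score in zip(question_types, scores):
        buckets[qtype].append(score)
    return {qtype: sum(vals) / len(vals) for qtype, vals in buckets.items()}


# Toy inputs (hypothetical scores, not results from this notebook).
toy_types = ["inference_query", "comparison_query", "inference_query"]
toy_scores = [1.0, 0.5, 0.0]
print(mean_score_by_type(toy_types, toy_scores))
# {'inference_query': 0.5, 'comparison_query': 0.5}
```

The question types for this notebook are available as `[q['question_type'] for q in multi_hop_rag_dataset]`.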
