## Initial setup and utility function definitions


In [71]:
import nest_asyncio
from dotenv import load_dotenv

load_dotenv("./.env")
nest_asyncio.apply()

def pretty_print_response(response):
    """Print the answer text, then the ID, text, and score of each source node."""
    print("Response:")
    print("---------")
    print(f"Main response text: {response.response}\n")
    
    print("Source Nodes:")
    print("-------------")
    for i, node in enumerate(response.source_nodes, 1):
        print(f"\nNode {i}:")
        print(f"ID: {node.node.id_}")
        print(f"Text snippet: {node.node.text}")
        print(f"Score: {node.score}")
        print("-" * 50)
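
The helper above prints each node's text in full, which can get long for graph nodes that carry many extracted triples. As a minimal sketch, a variant that returns a string and truncates long snippets could look like the following. The name `format_response`, the `max_chars` parameter, and the `SimpleNamespace` stand-in are hypothetical; only the attribute names (`response`, `source_nodes`, `node.id_`, `node.text`, `score`) come from the helper above.

```python
from types import SimpleNamespace


def format_response(response, max_chars=200):
    """Return the answer and its source nodes as one string, truncating long snippets."""
    lines = [f"Main response text: {response.response}", "Source Nodes:"]
    for i, node in enumerate(response.source_nodes, 1):
        text = node.node.text
        # Keep at most max_chars characters and mark the cut with an ellipsis.
        snippet = text if len(text) <= max_chars else text[:max_chars] + "..."
        lines.append(f"Node {i}: ID={node.node.id_} score={node.score}")
        lines.append(f"  {snippet}")
    return "\n".join(lines)


# Stand-in object for illustration only; it is not a real query-engine response.
fake_response = SimpleNamespace(
    response="Sam Altman",
    source_nodes=[
        SimpleNamespace(node=SimpleNamespace(id_="node-1", text="x" * 300), score=1.0)
    ],
)
print(format_response(fake_response, max_chars=20))
```

Returning a string rather than printing makes the output easy to log or compare across the two query engines.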

## Parse and load corpus of articles

In [74]:
from llama_index.readers.json import JSONReader

reader = JSONReader()

documents = reader.load_data(input_file="./data/corpus.json")
# documents = reader.load_data(input_file="./data/corpus_subset.json")

## GraphRAG index construction

In [75]:
from llama_index.core import PropertyGraphIndex, StorageContext, load_index_from_storage
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
import os

GRAPH_RAG_PERSIST_DIR = "./storage/graph_rag"
# GRAPH_RAG_PERSIST_DIR = "./storage/graph_rag_subset"

if not os.path.exists(GRAPH_RAG_PERSIST_DIR):
    os.makedirs(GRAPH_RAG_PERSIST_DIR, exist_ok=True)
    index = PropertyGraphIndex.from_documents(
        documents,
        llm=OpenAI(model="gpt-4o-mini"),
        embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
        show_progress=True,
    )
    index.storage_context.persist(persist_dir=GRAPH_RAG_PERSIST_DIR)
else:
    index = load_index_from_storage(
        StorageContext.from_defaults(persist_dir=GRAPH_RAG_PERSIST_DIR)
    )

query_engine_KnowledgeGraph = index.as_query_engine(
    include_text=True,
)

Parsing nodes: 100%|██████████| 1/1 [00:00<00:00,  6.18it/s]
Extracting paths from text: 100%|██████████| 136/136 [01:28<00:00,  1.54it/s]
Extracting implicit paths: 100%|██████████| 136/136 [00:00<00:00, 66990.65it/s]
Generating embeddings: 100%|██████████| 2/2 [00:01<00:00,  1.04it/s]
Generating embeddings: 100%|██████████| 27/27 [00:02<00:00,  9.43it/s]
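
The cell above builds the index once, persists it, and reloads it from disk on later runs. The same build-or-load pattern can be sketched as a small reusable function. Everything here is a hypothetical illustration: `load_or_build` and the three callables are not part of any library, and the stand-ins write a plain text file instead of a real index. Unlike the cell above, this sketch treats an empty directory as "not built yet", so an interrupted first run does not leave a directory that later fails to load.

```python
import os
import tempfile


def load_or_build(persist_dir, build, persist, load):
    """Load from persist_dir if it already holds files; otherwise build and persist."""
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load(persist_dir)
    obj = build()
    # Create the directory only after the expensive build has succeeded.
    os.makedirs(persist_dir, exist_ok=True)
    persist(obj, persist_dir)
    return obj


# Stand-ins that record which path was taken (illustration only).
calls = []


def build():
    calls.append("build")
    return "expensive index"


def persist(obj, persist_dir):
    with open(os.path.join(persist_dir, "index.txt"), "w") as f:
        f.write(obj)


def load(persist_dir):
    calls.append("load")
    with open(os.path.join(persist_dir, "index.txt")) as f:
        return f.read()


with tempfile.TemporaryDirectory() as root:
    persist_dir = os.path.join(root, "graph_rag")
    first = load_or_build(persist_dir, build, persist, load)   # builds and persists
    second = load_or_build(persist_dir, build, persist, load)  # loads from disk

print(calls)  # ['build', 'load']
```

In this notebook, `build` would wrap the `PropertyGraphIndex.from_documents(...)` call, `persist` would call `index.storage_context.persist(persist_dir=...)`, and `load` would wrap `load_index_from_storage(StorageContext.from_defaults(persist_dir=...))`.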


### Visualize the created knowledge graph

In [76]:
index.property_graph_store.save_networkx_graph(name="./knowledge_graph.html")

Open the new file `knowledge_graph.html` in your browser to visualize the graph.

## Traditional Vector RAG index construction

In [22]:
from llama_index.core import VectorStoreIndex

VECTOR_RAG_PERSIST_DIR = "./storage/vector_rag"

if not os.path.exists(VECTOR_RAG_PERSIST_DIR):
    vector_index = VectorStoreIndex.from_documents(documents, show_progress=True)
    vector_index.storage_context.persist(persist_dir=VECTOR_RAG_PERSIST_DIR)
else:
    vector_index = load_index_from_storage(
        StorageContext.from_defaults(persist_dir=VECTOR_RAG_PERSIST_DIR)
    )

# A separate name keeps the GraphRAG `index` from being overwritten.
query_engine_vector_RAG = vector_index.as_query_engine()

# Comparing GraphRAG vs Vector RAG Performance
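
Each question in this section is sent to both query engines by hand. A minimal sketch of running the same comparison in a loop follows. The names `compare_engines` and `EchoEngine` are hypothetical; the only assumption about a real engine is that it exposes `.query(text)` and that the result has a `.response` attribute, as used by `pretty_print_response`.

```python
def compare_engines(engines, questions, suffix=""):
    """Ask every question of every engine; return {engine_name: [answer_text, ...]}."""
    results = {name: [] for name in engines}
    for question in questions:
        prompt = f"{question} {suffix}".strip()
        for name, engine in engines.items():
            response = engine.query(prompt)
            # Real responses carry the text in `.response`; plain strings pass through.
            answer = getattr(response, "response", response)
            results[name].append(str(answer))
    return results


class EchoEngine:
    """Stand-in engine for illustration; it only echoes the prompt it receives."""

    def __init__(self, label):
        self.label = label

    def query(self, text):
        return f"{self.label}: {text}"


demo = compare_engines(
    {"GraphRAG": EchoEngine("graph"), "Vector RAG": EchoEngine("vector")},
    ["Who is the CEO?"],
    suffix="Short answer.",
)
print(demo)
```

With the notebook's objects, the `engines` dictionary would map the two labels to `query_engine_KnowledgeGraph` and `query_engine_vector_RAG`. The `suffix` argument mirrors the "Short answer." text appended to Questions 1 and 3.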

## **Question 1:** Who became a prominent figure in generative AI technology, notably with ChatGPT, and was recently the subject of controversy involving his departure from OpenAI, as discussed in both Fortune and TechCrunch articles?

- **Ground truth answer:** Sam Altman
- **Question type:** Inference Query


### GraphRAG Query

In [59]:
response = query_engine_KnowledgeGraph.query("Who became a prominent figure in generative AI technology, notably with ChatGPT, and was recently the subject of controversy involving his departure from OpenAI, as discussed in both Fortune and TechCrunch articles? Short answer.")
pretty_print_response(response)

Response:
---------
Main response text: Sam Altman

Source Nodes:
-------------

Node 1:
ID: 6967b73f-96cc-49f0-a9a8-c78164ce98cf
Text snippet: Here are some facts extracted from the provided text:

Openai -> Experienced -> Whirlwind four days
Openai -> Worth -> Just under $30 billion
Openai -> Ousted -> Ceo sam altman
Openai -> Partners with...
Score: 1.0
--------------------------------------------------

Node 2:
ID: 33b28936-9151-442f-b209-ebf98445e02e
Text snippet: Here are some facts extracted from the provided text:

Chatgpt -> Outpaces -> All other mobile apps
Chatgpt -> Runs on -> Gpt-3.5
Chatgpt -> Changed -> The world
Chatgpt -> Can -> Complete code
Chatgp...
Score: 1.0
--------------------------------------------------

Node 3:
ID: ff27b533-c1a5-4d9b-bf92-9f2864973982
Text snippet: Here are some facts extracted from the provided text:

Looking glass -> Utilizes -> Chatgpt

ChatGPT can engage with a range of topics, including programming, TV scripts and scientific concepts. W

### Vector RAG Query

In [60]:
response = query_engine_vector_RAG.query("Who became a prominent figure in generative AI technology, notably with ChatGPT, and was recently the subject of controversy involving his departure from OpenAI, as discussed in both Fortune and TechCrunch articles? Short answer.")
pretty_print_response(response)

Response:
---------
Main response text: Dave Willner

Source Nodes:
-------------

Node 1:
ID: 1179fc20-d8b6-46d2-a94a-97aa7489a40e
Text snippet: The judge was not amused.\n\nBy June, a little bit of ChatGPT’s shine had started to wear off. Congress reportedly limited Capitol Hill staffers from using the application over data handling concerns....
Score: 0.8771951947295035
--------------------------------------------------

Node 2:
ID: 4016fb04-06c8-4646-8de3-d8faeb48afaa
Text snippet: Her remarks (which will no doubt suggest the need for international harmony in AI regulation, with the U.S. modestly taking the lead) will be streamed on November 1, and you should be able to tune in ...
Score: 0.8686533145033473
--------------------------------------------------


## **Question 2:** Does 'The Verge' article suggest that Sam Bankman-Fried set withdrawal permissions based on FTX's total trading revenue, while 'Fortune' and 'TechCrunch' articles focus on the jury's determination of his truthfulness and allegations of committing fraud for personal gain, respectively, without mentioning specific operational practices like withdrawal permissions?

- **Ground truth answer:** Yes
- **Question type:** Comparison Query

### GraphRAG Query

In [65]:
response = query_engine_KnowledgeGraph.query("Does 'The Verge' article suggest that Sam Bankman-Fried set withdrawal permissions based on FTX's total trading revenue, while 'Fortune' and 'TechCrunch' articles focus on the jury's determination of his truthfulness and allegations of committing fraud for personal gain, respectively, without mentioning specific operational practices like withdrawal permissions?")
pretty_print_response(response)

Response:
---------
Main response text: Yes, the 'The Verge' article does suggest that Sam Bankman-Fried set withdrawal permissions based on FTX's total trading revenue. On the other hand, the 'Fortune' and 'TechCrunch' articles focus on the jury's determination of his truthfulness and allegations of committing fraud for personal gain, respectively, without mentioning specific operational practices like withdrawal permissions.

Source Nodes:
-------------

Node 1:
ID: 7d9243b8-700a-41de-80dc-940dcf9879ff
Text snippet: Here are some facts extracted from the provided text:

Sam bankman-fried -> Is charged with -> Seven counts
Sam bankman-fried -> Defrauded -> Thousands of customers
Sam bankman-fried -> Raised -> $2 b...
Score: 1.0
--------------------------------------------------

Node 2:
ID: b73715c0-2c79-4fe4-8b7a-5879f511b467
Text snippet: Here are some facts extracted from the provided text:

John j. ray iii -> Is leading -> Ftx

Depending on what evidence is introduced during the t

### Vector RAG Query

In [66]:
response = query_engine_vector_RAG.query("Does 'The Verge' article suggest that Sam Bankman-Fried set withdrawal permissions based on FTX's total trading revenue, while 'Fortune' and 'TechCrunch' articles focus on the jury's determination of his truthfulness and allegations of committing fraud for personal gain, respectively, without mentioning specific operational practices like withdrawal permissions?")
pretty_print_response(response)

Response:
---------
Main response text: The 'The Verge' article does not suggest that Sam Bankman-Fried set withdrawal permissions based on FTX's total trading revenue. 'Fortune' and 'TechCrunch' articles focus on the jury's determination of his truthfulness and allegations of committing fraud for personal gain, respectively, without mentioning specific operational practices like withdrawal permissions.

Source Nodes:
-------------

Node 1:
ID: a5b5b633-a0ab-4b53-9821-1d16b87532f2
Text snippet: Though FTX hadn’t been in the business as long as competing exchanges such as Coinbase, Kraken, or Gemini, Bankman-Fried positioned himself as an important, boyish face for crypto. (At one point, Bank...
Score: 0.8762131696278546
--------------------------------------------------

Node 2:
ID: df6cb509-db36-4ad0-b800-b61ba23343af
Text snippet: Jane Rosenberg | Reuters\n\nTwo of Sam Bankman-Fried's former friends from MIT, who also worked at crypto exchange FTX while living with the company's foun

## **Question 3:** Which entity is currently engaged with Amazon to address competition concerns, facilitating dialogue with consumer groups against Meta, deploying staff within its AI Office for future regulations, and has previously focused on illegal content and disinformation issues related to the Israel-Hamas war, as reported by TechCrunch?

- **Ground truth answer:** The European Commission
- **Question type:** Inference Query

### GraphRAG Query

In [72]:
response = query_engine_KnowledgeGraph.query("Which entity is currently engaged with Amazon to address competition concerns, facilitating dialogue with consumer groups against Meta, deploying staff within its AI Office for future regulations, and has previously focused on illegal content and disinformation issues related to the Israel-Hamas war, as reported by TechCrunch? Short answer.")
pretty_print_response(response)

Response:
---------
Main response text: The European Commission.

Source Nodes:
-------------

Node 1:
ID: 03fd1aa2-299c-4d0c-accd-933a78a14326
Text snippet: Here are some facts extracted from the provided text:

Meta -> Added -> New defaults and features
Meta -> Monitors -> Platforms for violations
Meta -> Noted -> 20% of 12-year-olds used instagram daily
Meta -> Didn’t bother -> Attempting a smart display
Meta -> Allows -> Users choice
Meta -> Documented -> 4 million people under 13 on instagram in 2015
Meta -> Internally tracks -> Under-13s
Meta -> Operates without -> Legal basis
Meta -> Protects -> Walled gardens
Meta -> Taking feedback from -> Local partners
Meta -> Switch to -> Tracking users
Meta -> Flouted -> Datatilsynet order
Meta -> Supports -> Federal legislation
Meta -> Made -> Led light non-disableable
Meta -> Began regulating -> Itself
Meta -> Continues serving -> People in eu
Meta -> Proposed -> Consent-based model
Meta -> Takes risks -> Open sourcing
Meta -> Rolled out

### Vector RAG Query

In [70]:
response = query_engine_vector_RAG.query("Which entity is currently engaged with Amazon to address competition concerns, facilitating dialogue with consumer groups against Meta, deploying staff within its AI Office for future regulations, and has previously focused on illegal content and disinformation issues related to the Israel-Hamas war, as reported by TechCrunch? Short answer.")
pretty_print_response(response)

Response:
---------
Main response text: Microsoft

Source Nodes:
-------------

Node 1:
ID: f633fd8f-3ea6-4d5c-9233-cbbeddfef12f
Text snippet: We’re prioritizing livestream reports related to this crisis, above and beyond our existing prioritization of Live videos,” Meta wrote, highlighting measure it took in the wake of the 2019 Christchurc...
Score: 0.8402097497102254
--------------------------------------------------

Node 2:
ID: fba4da45-125a-4638-b5fb-6c559d32dee6
Text snippet: To help address misinformation, Google has also announced that it will soon be integrating new innovations in watermarking, metadata, and other techniques into its latest generative models.\n\n“Google...
Score: 0.8336075377108088
--------------------------------------------------


## Results

### GraphRAG

| Question | Answer | Ground truth | Correct? |
| -------- | ------ | ------------ | -------- |
| 1 | Sam Altman | Sam Altman | Yes |
| 2 | Yes | Yes | Yes |
| 3 | The European Commission | The European Commission | Yes |

### Vector RAG

| Question | Answer | Ground truth | Correct? |
| -------- | ------ | ------------ | -------- |
| 1 | Dave Willner | Sam Altman | No |
| 2 | No | Yes | No |
| 3 | Microsoft | The European Commission | No |
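
The "Correct?" column above was filled in by reading each answer. As a rough sketch, the same check can be automated by testing whether the normalized ground truth appears as whole words in the answer. The function names are hypothetical, the answer strings are copied from the tables above, and this simple containment rule is an assumption that would misjudge paraphrased or negated answers.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def contains_answer(response_text, ground_truth):
    """True if the normalized ground truth occurs as whole words in the response."""
    haystack = f" {normalize(response_text)} "
    needle = f" {normalize(ground_truth)} "
    return needle in haystack


ground_truths = ["Sam Altman", "Yes", "The European Commission"]
answers = {
    "GraphRAG": ["Sam Altman", "Yes", "The European Commission"],
    "Vector RAG": ["Dave Willner", "No", "Microsoft"],
}

scores = {
    name: sum(contains_answer(a, g) for a, g in zip(given, ground_truths))
    for name, given in answers.items()
}
for name, correct in scores.items():
    print(f"{name}: {correct}/{len(ground_truths)} correct")
# GraphRAG: 3/3 correct
# Vector RAG: 0/3 correct
```

Padding both strings with spaces makes the match word-based, so a ground truth of "Yes" does not match inside a longer word such as "eyes".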

