# GraphRAG in Memgraph

In this tutorial, we will build GraphRAG using the Memgraph ecosystem and
OpenAI. This example is based on a portion of a fixed Game of Thrones dataset,
which will be enriched with unstructured data to create a knowledge graph. 

To search for relevant information, in this example we will use vector search on
node embeddings to find schematically relevant data. Following this, the
structured data will be extracted from the graph and passed to LLM to answer the
question. 

## Prerequisites

To begin with this tutorial, you will need Docker, Python and an OpenAI API key.
With a few small tweaks, you can adapt this setup to run on your local Ollama
environment. 

First, we need to start Memgraph with the vector search capabilities. You can do
this by running the following command: 


```
docker run -p 7687:7687 -p 7444:7444 memgraph/memgraph-mage:1.22-memgraph-2.22 \
  --log-level=TRACE \
  --also-log-to-stderr \
  --telemetry-enabled=False \
  --experimental-enabled=vector-search \
  --experimental-config='{"vector-search": {"got_index": {"label": "Entity", "property": "embedding", "dimension": 384, "capacity": 1000, "metric": "cos"}}}'
```



You can run this command outside of this notebook. 

Once Memgraph is running in the background, make sure to load the initial Game
of Thrones dataset:  


```
cat ./data/memgraph-export-got.cypherl | docker run -i memgraph/mgconsole --host=localhost
echo "MATCH (n), ()-[r]->() RETURN count(DISTINCT n) AS node_count, count(DISTINCT r) AS relationship_count;" | docker run -i memgraph/mgconsole --host=host.docker.internal
```
```
+--------------------+--------------------+
| node_count         | relationship_count |
+--------------------+--------------------+
| 2677               | 11967              |
+--------------------+--------------------+
```

After the dataset is ingested, install a few Python packages needed to run the demo:  

In [None]:

%pip install neo4j                   # for driver and connection to Memgraph
%pip install sentence-transformers   # for calculating sentence embeddings
%pip install openai                  # for access to LLM
%pip install python-dotenv           # for environment variables


Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


## Enrich knowledge graph with the embeddings 

In GraphRAG, you are not writing actual Cypher queries, but you are
asking the questions about your domain knowledge graph in plain English. From your question, you want to
retrieve relevant parts of the knowledge graph. 

To achieve this, you can encode the semantic meaning into the graph so you can locate
the semantically similar parts of the graph based on the question you have provided.

There are a several approaches to consider: embedding the node labels and
properties, embedding the triplets related to a node or embedding specific paths
a node can take. Adding more data into embeddings requires a vector with more
dimensions, which can be costly in terms of memory and performance. 

However, embedding triplets or paths, will yield better results, and you can
locate semantically similar parts of the graph with greater accuracy. This means
that for longer questions, semantic search is more likely to find the right part
of the graph. 

If the semantic search misses relevant parts of the graph, the LLM will not be
able to answer the question correctly. 

To illustrate a basic example, here is a function that calculates embeddings
based on the node labels an properties: 


In [27]:
def compute_embeddings(driver, model):
    with driver.session() as session:

        # Retrieve all nodes
        result = session.run("MATCH (n) RETURN n")

        for record in result:
            node = record["n"]
            # Combine node labels and properties into a single string
            node_data = (
                " ".join(node.labels)
                + " "
                + " ".join(f"{k}: {v}" for k, v in node.items())
            )

            # Compute the embedding for the node
            node_embedding = model.encode(node_data)

            # Store the embedding back into the node
            session.run(
                f"MATCH (n) WHERE id(n) = {node.element_id} SET n.embedding = {node_embedding.tolist()}"
            )

        # Set the label to Entity for all nodes
        session.run("MATCH (n) SET n:Entity")

If we have a node `:Character {name:"Viserys Targaryen"}` in the graph, the
encoded embedding will include the label `:Charater` and the property
`name:Viserys Targaryen`.

Asking the question `Who is Viserys Targaryen?` will yield a very similar
embedding, allowing you to locate that node in the graph. However, if you ask a
longer question like, `To whom was Viserys Targaryen Loyal in season 1 of Game
of Thrones?`, there is a chance that this question might not locate the `Viserys
Targaryen` node in the graph due to its length and complexity. 

Embedding a triplet on the node will yield a better result in this case. 

In the end, each node will get a `Entity` label so you can perform the vector search on top of all the nodes in the database. 

## Finding the relevant part of the graph

Once embeddings are calculated in your graph, you can perform a search based on
these embeddings by using a vector search. 

Memgraph supports vector search starting from version 2.22.  

The goal is to find the most similar node that resembles your question and to
extract the relevant knowledge from it. The function takes the question's
embedding and compares it to the embeddings stored on the nodes.

In [28]:
def find_most_similar_node(driver, question_embedding):

    with driver.session() as session:
        # Perform the vector search on all nodes based on the question embedding
        result = session.run(
            f"CALL vector_search.search('got_index', 10, {question_embedding.tolist()}) YIELD * RETURN *;"
        )
        nodes_data = []
        
        # Retrieve all similar nodes and print them
        for record in result:
            node = record["node"]
            properties = {k: v for k, v in node.items() if k != "embedding"}
            node_data = {
                "distance": record["distance"],
                "id": node.element_id,
                "labels": list(node.labels),
                "properties": properties,
            }
            nodes_data.append(node_data)
        print("All similar nodes:")
        for node in nodes_data:
            print(node)

        # Return the most similar node
        return nodes_data[0] if nodes_data else None

Based on the similarity between the question embeddings and node embeddings, we
get the most similar node. This node serves as a pivot point from which we can
pull relevant data. For example, if we are searching for information about
`Viserys Targaryen`, we would pull data surrounding that node, making it our
pivot node. 

## Getting the relevant data

Once we have the pivot node, we can begin retrieving the relevant structured
data around it. The most straightforward approach is to perform multiple hops
from the pivot node. 

Here is the function that fetches the data around pivot node, a specified number
of `hops` away from the pivot node.  


In [29]:
def get_relevant_data(driver, node, hops):
    with driver.session() as session:
        # Retrieve the paths from the node to other nodes that are 'hops' away
        query = (
            f"MATCH path=((n)-[r*..{hops}]-(m)) WHERE id(n) = {node['id']} RETURN path"
        )
        result = session.run(query)

        paths = []
        for record in result:
            path_data = []
            for segment in record["path"]:

                # Process start node without 'embedding' property
                start_node_data = {
                    k: v for k, v in segment.start_node.items() if k != "embedding"
                }

                # Process relationship data
                relationship_data = {
                    "type": segment.type,
                    "properties": segment.get("properties", {}),
                }

                # Process end node without 'embedding' property
                end_node_data = {
                    k: v for k, v in segment.end_node.items() if k != "embedding"
                }

                # Add to path_data as a tuple (start_node, relationship, end_node)
                path_data.append((start_node_data, relationship_data, end_node_data))

            paths.append(path_data)

        # Return all paths
        return paths

TODO: Insert a picture showing this. 

To avoid overloading the LLM's limited context with non-relevant data, we drop
the embedding property from the nodes. Embeddings contain a lot of data that
isn't particularly relevant to the LLM. 

## Helper functions 

For the LLM to understand its task, we need specific prompts. The `RAG_prompt`
describes how the LLM should answer the question, while the `question_prompt` is
optimized for calculating question embeddings by extracting only the key pices
of information to improve embedding accuracy. For example, if you ask, `Who is
Viserys Targaryen?`, only the `Viserys Targaryen` will be extracted from the
question. Ultimately, the LLM will receive the full question back in the
`RAG_prompt`.

In [30]:
def RAG_prompt(question, relevance_expansion_data):
    prompt = f"""
    You are an AI language model. I will provide you with a question and a set of data obtained through a relevance expansion process in a graph database. The relevance expansion process finds nodes connected to a target node within a specified number of hops and includes the relationships between these nodes.

    Question: {question}

    Relevance Expansion Data:
    {relevance_expansion_data}

    Based on the provided data, please answer the question, make sure to base your answers only based on the provided data. Add a context on what data did you base your answer on.
    """
    return prompt


def question_prompt(question):
    prompt = f"""
    You are an AI language model. I will provide you with a question. 
    Extract the key information from the questions. The key information is important information that is required to answer the question.

    Question: {question}

    The output format should be like this: 
    Key Information: [key information 1], [key information 2], ...
    """
    return prompt


async def get_response(client, prompt):
    response = await client.chat.completions.create(
        model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content


## Running the graphRAG

Now, it all comes together in the `main` function: 

1. Connect to the database 
2. Load the .env file with the `OPENAI_API_KEY=` defined
3. Compute and store the node embeddings 
4. Compute the question embedding based on key information 
5. Perform the vector search to find the most semantically similar node
6. Get the relevant data that is a few hops away from the pivot node
7. Ask LLM the question with the relevant data 

In [32]:
## Get all dependencies 
from sentence_transformers import SentenceTransformer
from dotenv import load_dotenv
import neo4j
import asyncio
from openai import AsyncOpenAI
import os
import json
from collections import Counter
from pathlib import Path
import nest_asyncio

# Create a driver
driver = neo4j.GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))
# Load the SentenceTransformer model
model = SentenceTransformer("paraphrase-MiniLM-L6-v2")
compute_embeddings(driver, model)

In [33]:
# Ask a question  (feel free to change the question) 
question = "In which episode was Viserys Targaryen killed?"

def ask_question(driver, question, model):
    nest_asyncio.apply()

    # Load .env file
    load_dotenv()
    os.environ["TOKENIZERS_PARALLELISM"] = "false"

    client = AsyncOpenAI()

    # Key information from the question 
    prompt = question_prompt(question)
    response = asyncio.run(get_response(client, prompt))
    print(response)
    key_information = response.split("Key Information: ")[1].strip()

    # Compute the embedding for the key information
    question_embedding = model.encode(key_information)

    # Find the most similar node to the question embedding
    node = find_most_similar_node(driver, question_embedding)
    if node:
        print("The most similar node is:")
        print(node)

    # Get the relevant data based on the most similar node
    relevant_data = get_relevant_data(driver, node, hops=2)

    # Show the relevant data
    print("The relevant data is:")
    print(relevant_data)

    # LLM answers the question based on the relevant data
    prompt = RAG_prompt(question, relevant_data)
    response = asyncio.run(get_response(client, prompt))
    print("The response is:")
    print(response)

ask_question(driver, question, model)

Key Information: Viserys Targaryen, killed, episode
All similar nodes:
{'distance': 0.7424314022064209, 'id': '387', 'labels': ['Episode', 'Entity'], 'properties': {'imdb_rating': 9.1, 'name': 'Book of the Stranger', 'number': 4}}
{'distance': 0.7443174123764038, 'id': '412', 'labels': ['Episode', 'Entity'], 'properties': {'imdb_rating': 9.5, 'name': 'The Dance of Dragons', 'number': 9}}
{'distance': 0.7458397150039673, 'id': '382', 'labels': ['Episode', 'Entity'], 'properties': {'imdb_rating': 8.6, 'name': 'Dark Wings Dark Words', 'number': 2}}
{'distance': 0.7462430000305176, 'id': '398', 'labels': ['Episode', 'Entity'], 'properties': {'imdb_rating': 8.5, 'name': 'High Sparrow', 'number': 3}}
{'distance': 0.7473532557487488, 'id': '430', 'labels': ['Episode', 'Entity'], 'properties': {'imdb_rating': 9.9, 'name': 'The Winds of Winter', 'number': 10}}
{'distance': 0.7474308013916016, 'id': '442', 'labels': ['Episode', 'Entity'], 'properties': {'imdb_rating': 8.4, 'name': 'No One', 'num

Exception ignored in: <coroutine object get_response at 0x308b706d0>
Traceback (most recent call last):
  File "<string>", line 1, in <lambda>
KeyError: '__import__'
Exception ignored in: <coroutine object get_response at 0x308b706d0>
Traceback (most recent call last):
  File "<string>", line 1, in <lambda>
KeyError: '__import__'


BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 16385 tokens. However, your messages resulted in 36302 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}


Here are a few examples of questions and answers: 

**To whom was Viserys Targaryen loyal to?**

The response is:

>Based on the provided data, Viserys Targaryen was loyal to House Targaryen.
>This information is derived from the relationships indicating that Viserys
>Targaryen was loyal to House Targaryen and the connections between them

**Who killed Viserys Targaryen in Game of thrones?**

The response is:

>Based on the provided relevance expansion data, Khal Drogo killed Viserys
>Targaryen in "Game of Thrones." This information is inferred from the
>relationship where Khal Drogo is linked to Viserys Targaryen with the action of
>being "KILLED" by Khal Drogo. The data does not show any other character
>directly killing Viserys Targaryen.

**"What was the weapon used to kill Viserys Targaryen in Game of Thrones?"**

The response is: 

>Based on the provided data, the weapon used to kill Viserys Targaryen in Game
>of Thrones was not explicitly mentioned. The data only shows that Khal Drogo
>was involved in the killing of Viserys Targaryen. There is no specific mention
>of the weapon used in the relevance expansion data. Therefore, I do not have
>enough information to answer the question about the weapon used to kill Viserys
>Targaryen.

This response is wrong, there is a method mentioned, not weapon, but LLM didn't catch the context due to different naming. 

**"Who betrayed Viserys Targaryen in Game of Thrones?"**

The response is:

>Based on the provided data, Khal Drogo betrayed Viserys Targaryen in Game of
>Thrones by killing him. This conclusion is drawn from the relationship between
>Khal Drogo and Viserys Targaryen where it is stated that Khal Drogo killed
>Viserys Targaryen.

This response is based on the killing relationship, but betrayal could have different consequences. 

Let's expand this knowledge. 

## Expanding the knowledge

Let's say that now we want to expand our existing knowledge graph with
additional information to enrich the dataset, provide more context and retrieve
more relevant data. 

In this example, we will take unstructured data, such as the
character description summary provided below, extract entities from that
summary, generate triplets to build the knowledge graph create queries and
eventually execute those queries in Memgraph to incorporate with the existing
graph. 

In [34]:
# Sample text summary for processing
summary = """
    Viserys Targaryen is the last living son of the former king, Aerys II Targaryen (the 'Mad King').
    As one of the last known Targaryen heirs, Viserys Targaryen is obsessed with reclaiming the Iron Throne and 
    restoring his family’s rule over Westeros. Ambitious and arrogant, he often treats his younger sister, Daenerys Targaryen, 
    as a pawn, seeing her only as a means to gain power. His ruthless ambition leads him to make a marriage alliance with 
    Khal Drogo, a powerful Dothraki warlord, hoping Khal Drogo will give him the army he needs. 
    However, Viserys Targaryen’s impatience and disrespect toward the Dothraki culture lead to his downfall;
    he is ultimately killed by Khal Drogo in a brutal display of 'a crown for a king' – having molten gold poured over his head. 
    """

## Entity extraction

TODO: add links 

The first step in the process is to extract entities from the summary using
SpaCy’s LLM.

To begin, we need to install SpaCy and the specific model we wll be using.

In [35]:
%pip install spacy
%pip install spacy_llm
!python -m spacy download en_core_web_md

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Collecting en-core-web-md==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.8.0/en_core_web_md-3.8.0-py3-none-any.whl (33.5 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m33.5/33.5 MB[0m [31m92.6 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25h[38;5;2m✔ Download and installation successful[0m
You can now load the package via spacy.load('en_core_web_md')


The goal of extracting entities from the text is to preprocess the data before
sending it to the GPT model, ensuring more accurate and relevant results. By
using SpaCy, we can identify key entities such as characters and locations
for a better understanding of the context of the text.

This is particularly useful because SpaCy is specifically trained to recognize
linguistic patterns and relationships in text, which helps to isolate and
highlight the most important pieces of information. By preprocessing the text
this way, we ensure that the GPT model receives a more structured input, helps
reduce noise and irrelevant data, leading to more precise and context-aware
outputs. 

In [36]:
import os
import spacy
from spacy_llm.util import assemble
import json
from collections import Counter
from pathlib import Path

# Split document into sentences
def split_document_sent(text, nlp):
    doc = nlp(text)
    return [sent.text.strip() for sent in doc.sents]


def process_text(text, nlp, verbose=False):
    doc = nlp(text)
    if verbose:
        print(f"Text: {doc.text}")
        print(f"Entities: {[(ent.text, ent.label_) for ent in doc.ents]}")
    return doc


# Pipeline to run entity extraction
def extract_entities(text, nlp, verbose=False):
    processed_data = []
    entity_counts = Counter()

    sentences = split_document_sent(text, nlp)
    for sent in sentences:
        doc = process_text(sent, nlp, verbose)
        entities = [(ent.text, ent.label_) for ent in doc.ents]

        # Store processed data for each sentence
        processed_data.append({"text": doc.text, "entities": entities})

        # Update counters
        entity_counts.update([ent[1] for ent in entities])

    # Export to JSON
    with open("processed_data.json", "w") as f:
        json.dump(processed_data, f)



## Extract node and relationship parameters

Now that we have extracted entities from the text, we have a better
understanding of the data and a more structured context to send to GPT model
we'll be using. The next step is to provide the extracted JSON file to the GPT
prompt, along with clear instructions on how to extract nodes and relationships
from those entities. These instructions will guide the model in identifying key
connections between the entities, which can then be used to build a knowledge
graph. In this example, we will be using the GPT-4 model. 

In [None]:
    
def enrich_graph_data(driver, summary, nlp):
    nest_asyncio.apply()
    
    load_dotenv()
    os.environ["TOKENIZERS_PARALLELISM"] = "false"

    client = AsyncOpenAI()

    # Load the spaCy model
    nlp = spacy.load("en_core_web_md")

    # Sample text summary for processing
    summary = """
        Viserys Targaryen is the last living son of the former king, Aerys II Targaryen (the 'Mad King').
        As one of the last known Targaryen heirs, Viserys Targaryen is obsessed with reclaiming the Iron Throne and 
        restoring his family’s rule over Westeros. Ambitious and arrogant, he often treats his younger sister, Daenerys Targaryen, 
        as a pawn, seeing her only as a means to gain power. His ruthless ambition leads him to make a marriage alliance with 
        Khal Drogo, a powerful Dothraki warlord, hoping Khal Drogo will give him the army he needs. 
        However, Viserys Targaryen’s impatience and disrespect toward the Dothraki culture lead to his downfall;
        he is ultimately killed by Khal Drogo in a brutal display of 'a crown for a king' – having molten gold poured over his head. 
    """

    extract_entities(summary, nlp)

    # Load processed data from JSON
    json_path = Path("processed_data.json")
    with open(json_path, "r") as f:
        processed_data = json.load(f)

    # Prepare nodes and relationships
    nodes = []
    relationships = []

    # Formulate a prompt for GPT-4
    prompt = (
        "Extract entities and relationships from the following JSON data. For each entry in data['entities'], "
        "create a 'node' dictionary with fields 'id' (unique identifier), 'name' (entity text), and 'type' (entity label). "
        "For entities that have meaningful connections, define 'relationships' as dictionaries with 'source' (source node id), "
        "'target' (target node id), and 'relationship' (type of connection). Create max 30 nodes, format relationships in the format of capital letters and _ inbetween words and format the entire response in the JSON output containing only variables nodes and relationships without any text inbetween. Use following labels for nodes: Character, Title, Location, House, Death, Event, Allegiance and following relationship types: HAPPENED_IN, SIBLING_OF, PARENT_OF, MARRIED_TO, HEALED_BY, RULES, KILLED, LOYAL_TO, BETRAYED_BY. Make sure the entire JSON file fits in the output"
        "JSON data:\n"
        f"{json.dumps(processed_data)}"
    )

    response = asyncio.run(get_response(client, prompt))

    structured_data = json.loads(response)  # Assuming GPT-4 outputs structured JSON

    # Populate nodes and relationships lists
    nodes.extend(structured_data.get("nodes", []))
    relationships.extend(structured_data.get("relationships", []))

    cypher_queries = generate_cypher_queries(nodes, relationships)
    with driver.session() as session:
        for query in cypher_queries:
            try:
                session.run(query)
                print(f"Executed query: {query}")
            except Exception as e:
                print(f"Error executing query: {query}. Error: {e}")


enrich_graph_data(driver, summary, nlp)

driver.close()

Error executing query: 
        MERGE (n:Character:Entity {name: 'Viserys Targaryen'}) 
        ON CREATE SET n.id=n1 
        ON MATCH SET n.id=n1
        . Error: {code: Memgraph.ClientError.MemgraphError.MemgraphError} {message: Unbound variable: n1.}
Error executing query: 
        MERGE (n:Character:Entity {name: 'Aerys II Targaryen'}) 
        ON CREATE SET n.id=n2 
        ON MATCH SET n.id=n2
        . Error: {code: Memgraph.ClientError.MemgraphError.MemgraphError} {message: Unbound variable: n2.}
Error executing query: 
        MERGE (n:Title:Entity {name: 'Targaryen'}) 
        ON CREATE SET n.id=n3 
        ON MATCH SET n.id=n3
        . Error: {code: Memgraph.ClientError.MemgraphError.MemgraphError} {message: Unbound variable: n3.}
Error executing query: 
        MERGE (n:Title:Entity {name: 'the Iron Throne'}) 
        ON CREATE SET n.id=n4 
        ON MATCH SET n.id=n4
        . Error: {code: Memgraph.ClientError.MemgraphError.MemgraphError} {message: Unbound variable: n4

## Generate queries

Now that GPT has provided us with the structured data for the nodes and
relationships, the next step is to generate the Cypher queries that we will use
to execute in Memgraph.

In [23]:
def generate_cypher_queries(nodes, relationships):
    queries = []

    # Create nodes
    for node in nodes:
        query = f"""
        MERGE (n:{node['type']}:Entity {{name: '{node['name']}'}}) 
        ON CREATE SET n.id={node['id']} 
        ON MATCH SET n.id={node['id']}
        """
        queries.append(query)

    # Create relationships
    for rel in relationships:
        query = f"MATCH (a {{id: {rel['source']}}}), (b {{id: {rel['target']}}}) " \
                f"CREATE (a)-[:{rel['relationship']}]->(b)"
        queries.append(query)

    return queries

cypher_queries = generate_cypher_queries(nodes, relationships)

## Execute queries

The final step is to execute those queries in Memgraph, enriching your graph
with the newly created context. 

In [None]:
with driver.session() as session:
    for query in cypher_queries:
        try:
            session.run(query)
            print(f"Executed query: {query}")
        except Exception as e:
            print(f"Error executing query: {query}. Error: {e}")

The dataset now has additional knowledge: 

```
MATCH (a {id: 1}), (b {id: 2}) CREATE (a)-[:PARENT_OF]->(b)
MATCH (a {id: 1}), (b {id: 3}) CREATE (a)-[:SIBLING_OF]->(b)
MATCH (a {id: 1}), (b {id: 4}) CREATE (a)-[:LOYAL_TO]->(b)
MATCH (a {id: 1}), (b {id: 5}) CREATE (a)-[:HAPPENED_IN]->(b)
MATCH (a {id: 1}), (b {id: 6}) CREATE (a)-[:HAPPENED_IN]->(b)
MATCH (a {id: 1}), (b {id: 7}) CREATE (a)-[:SIBLING_OF]->(b)
MATCH (a {id: 1}), (b {id: 8}) CREATE (a)-[:MARRIED_TO]->(b)
MATCH (a {id: 1}), (b {id: 9}) CREATE (a)-[:HEALED_BY]->(b)
MATCH (a {id: 8}), (b {id: 9}) CREATE (a)-[:RULES]->(b)
MATCH (a {id: 8}), (b {id: 1}) CREATE (a)-[:BETRAYED_BY]->(b)

```

As described earlier, before enriching the graph with more data: 

**"Who betrayed Viserys Targaryen in Game of Thrones?"**

The response was:

>Based on the provided data, Khal Drogo betrayed Viserys Targaryen in Game of
>Thrones by killing him. This conclusion is drawn from the relationship between
>Khal Drogo and Viserys Targaryen where it is stated that Khal Drogo killed
>Viserys Targaryen.

This response is based on the killing relationship, but betrayal could have
different consequences. So in a sense LLM made that conclusion based on the
wrong relationship. 

Now again asking the same question yields a correct answer: **"Who betrayed
Viserys Targaryen in Game of Thrones?"**

>Based on the provided data, Khal Drogo betrayed Viserys Targaryen in Game of
>Thrones. This conclusion is derived from the relationship between Viserys
>Targaryen and Khal Drogo, where Khal Drogo is connected to Viserys Targaryen
>through the 'BETRAYED_BY' relationship, indicating that Khal Drogo betrayed
>Viserys Targaryen.

