# GraphRAG Implementation with LlamaIndex - Experiment 3

[GraphRAG - LlamaIndex](https://docs.llamaindex.ai/en/stable/examples/cookbooks/GraphRAG_v2/)

# Setting up the environment

In [1]:
from llama_index.graph_stores.neo4j import Neo4jPGStore

username = "neo4j"
password = "admin123"
url = "bolt://localhost:7687"

graph_store = Neo4jPGStore(
    username=username,
    password=password,
    url=url,
)

In [2]:
import os
from config import Config

os.environ["OPENAI_API_KEY"] = Config.OPENAI_API_KEY

In [9]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o")
embed_model = OpenAIEmbedding(model_name="text-embedding-3-small")

# Loading Documents

In [4]:
from loader import load_epubs_from_dir

documents = load_epubs_from_dir("./data")
print(f"Loaded {len(documents)} documents")

Loaded 7 documents


In [5]:
from llama_index.core import Document

# Rebuild each document as a plain `Document` that keeps only its text (metadata is dropped)
new_documents = [Document(text=document.text) for document in documents]

# Splitting

In [6]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(
    chunk_size=1024,
    chunk_overlap=20,
)
nodes = splitter.get_nodes_from_documents(documents)  # splits the original `documents`; `new_documents` above is not used
print(f"Total number of nodes: {len(nodes)}")

Total number of nodes: 542
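`SentenceSplitter` packs text into chunks of roughly `chunk_size` tokens and carries `chunk_overlap` tokens from the end of one chunk into the start of the next, so context that straddles a boundary is not lost. The sketch below illustrates only that sliding-window idea. The `chunk_words` helper is hypothetical: it counts whitespace-separated words instead of tokens and ignores sentence boundaries, so its chunk counts will not match LlamaIndex's.

```python
def chunk_words(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of `chunk_size` words that overlap by `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window already reached the end
            break
    return chunks


text = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(text, chunk_size=4, chunk_overlap=1):
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```

Each chunk repeats the last word of the previous one, which is the role `chunk_overlap=20` plays (in tokens) in the cell above.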


# Building Knowledge Graph

In [7]:
KG_TRIPLET_EXTRACT_TMPL = """
-Goal-
Given a text document, identify all entities and their entity types from the text and all relationships among the identified entities.
Given the text, extract up to {max_knowledge_triplets} entity-relation triplets.

-Steps-
1. Identify all entities. For each identified entity, extract the following information:
- entity_name: Name of the entity, capitalized
- entity_type: Type of the entity
- entity_description: Comprehensive description of the entity's attributes and activities
Format each entity as ("entity"$$$$"<entity_name>"$$$$"<entity_type>"$$$$"<entity_description>")

2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
For each pair of related entities, extract the following information:
- source_entity: name of the source entity, as identified in step 1
- target_entity: name of the target entity, as identified in step 1
- relation: relationship between source_entity and target_entity
- relationship_description: explanation as to why you think the source entity and the target entity are related to each other

Format each relationship as ("relationship"$$$$"<source_entity>"$$$$"<target_entity>"$$$$"<relation>"$$$$"<relationship_description>")

3. When finished, output.

-Real Data-
######################
text: {text}
######################
output:"""

In [10]:
import re
from graph_rag_extractor import GraphRAGExtractor
from typing import Any

entity_pattern = r'\("entity"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\)'
relationship_pattern = r'\("relationship"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\)'


def parse_fn(response_str: str) -> Any:
    entities = re.findall(entity_pattern, response_str)
    relationships = re.findall(relationship_pattern, response_str)
    return entities, relationships


kg_extractor = GraphRAGExtractor(
    llm=llm,
    extract_prompt=KG_TRIPLET_EXTRACT_TMPL,
    max_paths_per_chunk=2,
    parse_fn=parse_fn,
)
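To see what `parse_fn` expects back from the LLM, the sketch below runs the same two regular expressions on a hand-written sample response in the `$$$$`-delimited format the prompt requests. The sample text is made up for illustration. Real model output can drift from the format; `re.findall` silently skips any line that does not match, so malformed entities or relationships are dropped rather than raising an error.

```python
import re
from typing import List, Tuple

entity_pattern = r'\("entity"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\)'
relationship_pattern = r'\("relationship"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\$\$\$\$"(.+?)"\)'


def parse_fn(response_str: str) -> Tuple[List[tuple], List[tuple]]:
    # Each match is a tuple of the captured fields, in the order the prompt lists them
    entities = re.findall(entity_pattern, response_str)
    relationships = re.findall(relationship_pattern, response_str)
    return entities, relationships


# Hand-written sample in the format requested by KG_TRIPLET_EXTRACT_TMPL
sample_response = "\n".join([
    '("entity"$$$$"SUHRAVARDI"$$$$"PERSON"$$$$"Persian philosopher who founded the Illuminationist school")',
    '("entity"$$$$"ILLUMINATIONISM"$$$$"SCHOOL OF THOUGHT"$$$$"School of Islamic philosophy")',
    '("relationship"$$$$"SUHRAVARDI"$$$$"ILLUMINATIONISM"$$$$"FOUNDED"$$$$"Suhravardi is credited as the founder of Illuminationism")',
])

entities, relationships = parse_fn(sample_response)
for name, entity_type, description in entities:
    print(f"{name} [{entity_type}]: {description}")
for source, target, relation, description in relationships:
    print(f"{source} -[{relation}]-> {target}")
```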

In [11]:
from llama_index.core import PropertyGraphIndex

index = PropertyGraphIndex(
    nodes=nodes,
    kg_extractors=[kg_extractor],
    property_graph_store=graph_store,
    llm=llm,
    embed_model=embed_model,
    show_progress=True,
)

Extracting paths from text: 100%|██████████| 542/542 [16:06<00:00,  1.78s/it]
Generating embeddings: 100%|██████████| 6/6 [00:05<00:00,  1.16it/s]
Generating embeddings: 100%|██████████| 10/10 [00:04<00:00,  2.46it/s]


# Entity Deduplication

In [12]:
graph_store.structured_query("""
CREATE VECTOR INDEX entity IF NOT EXISTS
FOR (m:`__Entity__`)
ON m.embedding
OPTIONS {indexConfig: {
 `vector.dimensions`: 1536,
 `vector.similarity_function`: 'cosine'
}}
""")

[]
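The deduplication queries that follow compare each entity's `embedding` with its nearest neighbours through this index and keep pairs whose score exceeds `similarity_threshold = 0.9`. The sketch below computes plain cosine similarity alongside a rescaled score. It assumes that Neo4j, whose vector indexes are backed by Lucene, reports cosine similarity as `(1 + cos) / 2` in the range [0, 1]; under that assumption a cutoff of 0.9 corresponds to a raw cosine of 0.8. Verify the mapping against the Neo4j vector index documentation for your version before relying on it.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Raw cosine similarity in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def index_score(u: Sequence[float], v: Sequence[float]) -> float:
    # Assumption: the vector index rescales cosine from [-1, 1] to [0, 1] as (1 + cos) / 2
    return (1 + cosine_similarity(u, v)) / 2


a, b = [1.0, 0.0], [0.8, 0.6]
print(round(cosine_similarity(a, b), 6), round(index_score(a, b), 6))  # 0.8 0.9
```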

In [13]:
# Inspection only: list candidate duplicate groups without modifying the graph
similarity_threshold = 0.9
word_edit_distance = 5
data = graph_store.structured_query("""
MATCH (e:__Entity__)
CALL {
  WITH e
  CALL db.index.vector.queryNodes('entity', 10, e.embedding)
  YIELD node, score
  WITH node, score
  WHERE score > toFloat($cutoff)
      AND (toLower(node.name) CONTAINS toLower(e.name) OR toLower(e.name) CONTAINS toLower(node.name)
           OR apoc.text.distance(toLower(node.name), toLower(e.name)) < $distance)
      AND labels(e) = labels(node)
  WITH node, score
  ORDER BY node.name
  RETURN collect(node) AS nodes
}
WITH distinct nodes
WHERE size(nodes) > 1
WITH collect([n in nodes | n.name]) AS results
UNWIND range(0, size(results)-1, 1) as index
WITH results, index, results[index] as result
WITH apoc.coll.sort(reduce(acc = result, index2 IN range(0, size(results)-1, 1) |
        CASE WHEN index <> index2 AND
            size(apoc.coll.intersection(acc, results[index2])) > 0
            THEN apoc.coll.union(acc, results[index2])
            ELSE acc
        END
)) as combinedResult
WITH distinct(combinedResult) as combinedResult
// extra filtering
WITH collect(combinedResult) as allCombinedResults
UNWIND range(0, size(allCombinedResults)-1, 1) as combinedResultIndex
WITH allCombinedResults[combinedResultIndex] as combinedResult, combinedResultIndex, allCombinedResults
WHERE NOT any(x IN range(0,size(allCombinedResults)-1,1)
    WHERE x <> combinedResultIndex
    AND apoc.coll.containsAll(allCombinedResults[x], combinedResult)
)
RETURN combinedResult
""", param_map={'cutoff': similarity_threshold, 'distance': word_edit_distance})
for row in data:
    print(row)

{'combinedResult': ['Holy Qur’an', 'The Qur’an']}
{'combinedResult': ["GLORIOUS QUR'AN", "Glorious Qur'an"]}
{'combinedResult': ['Prophet', 'Prophet (S)', 'Prophet (SA)', 'Prophet Jesus', 'THE PROPHET', 'The Most Noble Prophet', 'The Prophet (S)']}
{'combinedResult': ['Philosophia', 'Philosophy']}
{'combinedResult': ['Shaykh Shihab ad-Din Suhravardi', 'Suhravardi']}
{'combinedResult': ['Ahl al-Bayt (a) World Assembly', 'Ahl al-Bayt World Assembly']}
{'combinedResult': ['Imam Al-Sadiq', 'Imam al-Sadiq']}
{'combinedResult': ['Imam Mahdi', 'Imam al-Mahdi']}
{'combinedResult': ['Imam of the Age', "Imam of the Age ('atfs)"]}
{'combinedResult': ['Imam Husayn', 'Imam al-Husayn']}
{'combinedResult': ['Imams', "Imams ('a)"]}
{'combinedResult': ['Imam Zayn al-‘Abidin', 'Imam Zayn al-‘Abidin (_‘a_)']}
{'combinedResult': ["'Abd al-Rahman ibn Muljim", 'Ibn Muljim']}
{'combinedResult': ['IMAM ALI', 'Imam']}
{'combinedResult': ['Self-Knowledge', 'Self-Knowledge, Second Edition']}
{'combinedResult': [
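The inspection query keeps a neighbour as a duplicate candidate when its vector score exceeds `$cutoff`, one lowercased name contains the other or their edit distance is below `$distance`, and both nodes carry the same labels; it then unions overlapping candidate lists into groups. The pure-Python sketch below mirrors the score and name checks (it omits the label check) and assumes `apoc.text.distance` is the Levenshtein distance, as the APOC documentation describes. The hypothetical `merge_overlapping` helper computes full connected components, whereas the Cypher reaches a similar result with a single `reduce` pass followed by dropping groups contained in larger ones.

```python
from typing import Iterable, List, Set


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def is_duplicate_candidate(name_a: str, name_b: str, score: float,
                           cutoff: float = 0.9, distance: int = 5) -> bool:
    # Same tests as the WHERE clause, minus the labels(e) = labels(node) check
    a, b = name_a.lower(), name_b.lower()
    names_match = a in b or b in a or levenshtein(a, b) < distance
    return score > cutoff and names_match


def merge_overlapping(groups: Iterable[Iterable[str]]) -> List[List[str]]:
    """Union groups that share any name (connected components over names)."""
    merged: List[Set[str]] = []
    for group in groups:
        combined = set(group)
        disjoint = []
        for existing in merged:
            if existing & combined:
                combined |= existing  # overlaps: absorb it
            else:
                disjoint.append(existing)
        merged = disjoint + [combined]
    return sorted(sorted(g) for g in merged)


print(is_duplicate_candidate("Imam Mahdi", "Imam al-Mahdi", score=0.95))  # True
print(merge_overlapping([["Prophet", "Prophet Jesus"], ["Prophet", "Prophet (S)"]]))
# [['Prophet', 'Prophet (S)', 'Prophet Jesus']]
```

Because substring matches chain through short names, a group can absorb distinct entities: in the output above, 'Prophet Jesus' ends up in the same group as 'Prophet (S)', presumably linked through the bare 'Prophet'.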

In [14]:
graph_store.structured_query("""
MATCH (e:__Entity__)
CALL {
  WITH e
  CALL db.index.vector.queryNodes('entity', 10, e.embedding)
  YIELD node, score
  WITH node, score
  WHERE score > toFloat($cutoff)
      AND (toLower(node.name) CONTAINS toLower(e.name) OR toLower(e.name) CONTAINS toLower(node.name)
           OR apoc.text.distance(toLower(node.name), toLower(e.name)) < $distance)
      AND labels(e) = labels(node)
  WITH node, score
  ORDER BY node.name
  RETURN collect(node) AS nodes
}
WITH distinct nodes
WHERE size(nodes) > 1
WITH collect([n in nodes | n.name]) AS results
UNWIND range(0, size(results)-1, 1) as index
WITH results, index, results[index] as result
WITH apoc.coll.sort(reduce(acc = result, index2 IN range(0, size(results)-1, 1) |
        CASE WHEN index <> index2 AND
            size(apoc.coll.intersection(acc, results[index2])) > 0
            THEN apoc.coll.union(acc, results[index2])
            ELSE acc
        END
)) as combinedResult
WITH distinct(combinedResult) as combinedResult
// extra filtering
WITH collect(combinedResult) as allCombinedResults
UNWIND range(0, size(allCombinedResults)-1, 1) as combinedResultIndex
WITH allCombinedResults[combinedResultIndex] as combinedResult, combinedResultIndex, allCombinedResults
WHERE NOT any(x IN range(0,size(allCombinedResults)-1,1)
    WHERE x <> combinedResultIndex
    AND apoc.coll.containsAll(allCombinedResults[x], combinedResult)
)
CALL {
  WITH combinedResult
  UNWIND combinedResult AS name
  MATCH (e:__Entity__ {name:name})
  WITH e
  ORDER BY size(e.name) DESC // prefer longer names to remain after merging
  RETURN collect(e) AS nodes
}
CALL apoc.refactor.mergeNodes(nodes, {properties: {
    `.*`: 'discard'
}})
YIELD node
RETURN count(*)
""", param_map={'cutoff': similarity_threshold, 'distance': word_edit_distance})

[{'count(*)': 20}]
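`apoc.refactor.mergeNodes` collapses each group into its first node and moves the other nodes' relationships onto it; with `'.*': 'discard'`, the first node's property values are kept, which is why the inner subquery orders each group by descending name length. The sketch below reproduces that outcome on an in-memory edge list. The helpers `pick_survivor` and `merge_entities` are hypothetical, and the alphabetical tie-break is an addition of this sketch, since the Cypher leaves the order of equal-length names unspecified.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (source name, relation, target name)


def pick_survivor(group: Iterable[str]) -> str:
    # Longest name wins; equal lengths fall back to alphabetical order
    return min(group, key=lambda name: (-len(name), name))


def merge_entities(groups: Iterable[List[str]], edges: Iterable[Edge]) -> List[Edge]:
    """Rewrite every edge endpoint to its group's surviving name."""
    canonical: Dict[str, str] = {}
    for group in groups:
        survivor = pick_survivor(group)
        for name in group:
            canonical[name] = survivor
    # Names outside any group map to themselves
    return [(canonical.get(src, src), rel, canonical.get(tgt, tgt)) for src, rel, tgt in edges]


groups = [["Imam Mahdi", "Imam al-Mahdi"]]
edges = [("Imam Mahdi", "MENTIONED_IN", "Self-Knowledge")]  # made-up sample edge
print(merge_entities(groups, edges))
# [('Imam al-Mahdi', 'MENTIONED_IN', 'Self-Knowledge')]
```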

In [29]:
# Running it again
similarity_threshold = 0.9
word_edit_distance = 5
data = graph_store.structured_query("""
MATCH (e:__Entity__)
CALL {
  WITH e
  CALL db.index.vector.queryNodes('entity', 10, e.embedding)
  YIELD node, score
  WITH node, score
  WHERE score > toFloat($cutoff)
      AND (toLower(node.name) CONTAINS toLower(e.name) OR toLower(e.name) CONTAINS toLower(node.name)
           OR apoc.text.distance(toLower(node.name), toLower(e.name)) < $distance)
      AND labels(e) = labels(node)
  WITH node, score
  ORDER BY node.name
  RETURN collect(node) AS nodes
}
WITH distinct nodes
WHERE size(nodes) > 1
WITH collect([n in nodes | n.name]) AS results
UNWIND range(0, size(results)-1, 1) as index
WITH results, index, results[index] as result
WITH apoc.coll.sort(reduce(acc = result, index2 IN range(0, size(results)-1, 1) |
        CASE WHEN index <> index2 AND
            size(apoc.coll.intersection(acc, results[index2])) > 0
            THEN apoc.coll.union(acc, results[index2])
            ELSE acc
        END
)) as combinedResult
WITH distinct(combinedResult) as combinedResult
// extra filtering
WITH collect(combinedResult) as allCombinedResults
UNWIND range(0, size(allCombinedResults)-1, 1) as combinedResultIndex
WITH allCombinedResults[combinedResultIndex] as combinedResult, combinedResultIndex, allCombinedResults
WHERE NOT any(x IN range(0,size(allCombinedResults)-1,1)
    WHERE x <> combinedResultIndex
    AND apoc.coll.containsAll(allCombinedResults[x], combinedResult)
)
RETURN combinedResult
""", param_map={'cutoff': similarity_threshold, 'distance': word_edit_distance})
for row in data:
    print(row)

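Because the query now returns no rows, no remaining entity pairs meet the duplicate criteria (vector score above 0.9 plus a substring match or an edit distance below 5), so the merge pass is complete. This does not rule out duplicates that fall outside these thresholds.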

# Loading the Graph for Querying

In [17]:
from graph_rag_store import GraphRAGStore

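# `GraphRAGStore` (local module) takes the place of the plain `Neo4jPGStore` used above,
# so that the stored graph can also be clustered into communities
graph_store_reader = GraphRAGStore(
    username=username, password=password, url=url
)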

In [18]:
from llama_index.core import PropertyGraphIndex

# `from_existing` wraps the graph already stored in Neo4j without re-running extraction,
# so the same graph can be reused with a different store class or configuration
index_2 = PropertyGraphIndex.from_existing(
    property_graph_store=graph_store_reader,
    llm=llm,
    embed_model=embed_model,
    show_progress=True,
)

# Build Communities

In [20]:
index_2.property_graph_store.build_communities()
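`build_communities()` is defined in the local `graph_rag_store` module, which is not shown here. In the LlamaIndex GraphRAG cookbook this notebook links to, it clusters the entity graph with hierarchical Leiden (from `graspologic`) and then asks the LLM to summarize each cluster from its relationship descriptions. The sketch below swaps in plain connected components (via union-find), a much coarser grouping than Leiden, only to show the shape of the second step: collecting one block of relationship text per community to hand to a summarizer. The helpers and the `source -> target -> relation -> description` line format are illustrative assumptions modeled loosely on the cookbook.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (source, relation, target, relationship_description), as produced by the extractor prompt
Relationship = Tuple[str, str, str, str]


def connected_components(relationships: List[Relationship]) -> List[List[str]]:
    """Group entity names linked by any path (union-find with path halving)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for source, _, target, _ in relationships:
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_t] = root_s

    components: Dict[str, List[str]] = defaultdict(list)
    for name in list(parent):
        components[find(name)].append(name)
    return sorted(sorted(members) for members in components.values())


def community_texts(relationships: List[Relationship]) -> Dict[int, List[str]]:
    """One list of relationship lines per community, ready to pass to a summarizer."""
    membership = {
        name: community_id
        for community_id, members in enumerate(connected_components(relationships))
        for name in members
    }
    texts: Dict[int, List[str]] = defaultdict(list)
    for source, relation, target, description in relationships:
        texts[membership[source]].append(f"{source} -> {target} -> {relation} -> {description}")
    return dict(texts)


rels = [  # made-up sample relationships
    ("SUHRAVARDI", "FOUNDED", "ILLUMINATIONISM", "Founder of the school"),
    ("ILLUMINATIONISM", "PART_OF", "ISLAMIC PHILOSOPHY", "A school within it"),
    ("HOLY QUR'AN", "REVEALED_TO", "THE PROPHET", "Revelation received by the Prophet"),
]
for community_id, lines in community_texts(rels).items():
    print(community_id, lines)
```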

# Create Query Engine

In [21]:
from graph_query_engine import GraphRAGQueryEngine

query_engine = GraphRAGQueryEngine(
    graph_store=index_2.property_graph_store,
    llm=llm,
    embed_model=embed_model,
    index=index_2,
    similarity_top_k=5,
)
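`GraphRAGQueryEngine` also comes from a local module. Assuming it follows the LlamaIndex cookbook, a query retrieves the `similarity_top_k` entities most similar to the question, looks up the communities those entities belong to, asks the LLM for one partial answer per community summary, and finally asks it to merge the partial answers. The sketch below shows only that map-reduce control flow, with a caller-supplied `llm` callable standing in for the real model; the function name, prompt wording, and data layout are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set


def answer_from_communities(
    query: str,
    retrieved_entities: Iterable[str],
    entity_communities: Dict[str, Set[int]],
    community_summaries: Dict[int, str],
    llm: Callable[[str], str],
) -> str:
    # 1. Collect every community that contains at least one retrieved entity
    community_ids: Set[int] = set()
    for entity in retrieved_entities:
        community_ids |= entity_communities.get(entity, set())

    # 2. Map: one partial answer per relevant community summary
    partial_answers = [
        llm(f"Community summary:\n{community_summaries[cid]}\n\nQuestion: {query}\nAnswer:")
        for cid in sorted(community_ids)
        if cid in community_summaries
    ]

    # 3. Reduce: merge the partial answers into a single response
    joined = "\n\n".join(partial_answers)
    return llm(f"Combine these answers into one concise response:\n{joined}")


prompts = []

def fake_llm(prompt: str) -> str:
    # Stand-in model that records each prompt and returns a numbered placeholder
    prompts.append(prompt)
    return f"answer {len(prompts)}"


result = answer_from_communities(
    "What is Illuminationism?",
    retrieved_entities=["SUHRAVARDI", "ILLUMINATIONISM"],
    entity_communities={"SUHRAVARDI": {1}, "ILLUMINATIONISM": {1, 2}},
    community_summaries={1: "Summary of community 1", 2: "Summary of community 2"},
    llm=fake_llm,
)
print(result)        # answer 3
print(len(prompts))  # 3 (two map calls plus one reduce call)
```

One LLM call per relevant community plus a final merge call means query latency grows with the number of communities the retrieved entities touch.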

# Querying

In [22]:
from IPython.display import Markdown, display

In [23]:
response = query_engine.query(
    "What are scholarly arguments that prove god's existence?"
)
display(Markdown(f"{response.response}"))

The community summary highlights various Islamic beliefs and philosophical perspectives that indirectly support the existence of God, though it does not explicitly present scholarly arguments. Key arguments often discussed in Islamic theology and broader philosophical discourse include:

1. **Cosmological Argument**: This posits that everything that exists has a cause, and the universe's existence points to an uncaused cause, identified as God.

2. **Teleological Argument (Design Argument)**: The complexity and order in the universe suggest a purposeful design, implying an intelligent designer, God.

3. **Ontological Argument**: This philosophical argument asserts that the concept of a perfect being (God) necessitates existence, as existence is a requisite of perfection.

4. **Moral Argument**: The existence of objective moral values is best explained by a moral lawgiver, which is God.

5. **Experiential Argument**: Personal experiences of the divine are cited as evidence of God's existence.

These arguments, while rooted in Islamic thought, are part of broader philosophical discussions and are used to rationalize belief in God. The summary emphasizes the role of divine revelation, spiritual fulfillment, and the pursuit of knowledge in understanding God's existence within Islam.

In [24]:
response = query_engine.query(
    "What are some fallacious arguments given to prove god's existence? Explain the fallacies"
)
display(Markdown(f"{response.response}"))

The community summary does not directly address fallacious arguments used to prove God's existence, focusing instead on themes of divine guidance and spiritual relationships. However, common fallacies often critiqued in philosophical discussions about God's existence include:

1. **Argument from Ignorance (Ad Ignorantiam)**: Claiming something is true simply because it hasn't been proven false, such as asserting God's existence because it hasn't been disproven.

2. **Circular Reasoning (Begging the Question)**: Assuming the conclusion within the premises, like stating "God exists because the Bible says so, and the Bible is true because it is the word of God."

3. **Appeal to Authority (Argumentum ad Verecundiam)**: Relying on authority figures or texts as evidence without further justification, such as claiming God's existence based solely on religious leaders' assertions.

4. **False Dichotomy (False Dilemma)**: Presenting two options as the only possibilities when others exist, such as claiming either God exists or life has no meaning.

5. **Straw Man Argument**: Misrepresenting an opponent's position to make it easier to attack, like oversimplifying atheistic arguments.

6. **Appeal to Emotion**: Using emotional appeals rather than logical reasoning, such as arguing that God must exist because a godless universe is frightening.

These fallacies highlight the importance of sound reasoning and evidence in discussions about the existence of God.

In [25]:
response = query_engine.query(
    "List ten actions I can perform to get closer to Allah?  Do not include the basic wajib actions"
)
display(Markdown(f"{response.response}"))

To deepen your spiritual connection with Allah beyond obligatory actions, consider these ten practices:

1. **Engage in Dhikr:** Regularly recite and reflect on Allah's names and attributes to strengthen your spiritual bond.
2. **Study the Qur'an:** Dedicate time to understanding and reflecting on its teachings and messages.
3. **Seek Knowledge:** Pursue religious and spiritual learning to deepen your faith.
4. **Practice Gratitude:** Regularly express gratitude for Allah's blessings, fostering a humble mindset.
5. **Perform Voluntary Fasting:** Engage in fasting outside of Ramadan to cultivate self-discipline.
6. **Give Charity:** Regularly help those in need, fostering compassion and fulfilling Islamic teachings.
7. **Engage in Night Prayers:** Perform additional prayers like Tahajjud to seek closeness to Allah.
8. **Reflect on Creation:** Contemplate the beauty of Allah’s creation to enhance your appreciation and connection.
9. **Make Dua:** Communicate with Allah through personal prayers, sharing your hopes and gratitude.
10. **Serve Others:** Engage in acts of kindness and service, embodying prophetic teachings and fostering community.

These actions can enhance your spiritual growth and bring you closer to Allah.

In [26]:
response = query_engine.query(
    "What are actions one should perform at the end of the day?"
)
display(Markdown(f"{response.response}"))

At the end of the day, one can strengthen their connection with Allah and reflect on their faith through various spiritual practices. These include engaging in supplication and prayer to seek guidance and express gratitude, reflecting on the day's actions to ensure alignment with Islamic teachings, and seeking forgiveness for any shortcomings. Additionally, one should engage in devotional practices such as Dhikr and reading the Qur'an to maintain spiritual awareness and gratitude. Expressing humility and servitude, invoking blessings on the Prophet Muhammad and his progeny, and planning for spiritual growth are also important. These actions emphasize maintaining a strong spiritual connection, seeking divine mercy, and fostering personal and spiritual growth.

In [27]:
response = query_engine.query(
    "What is the islamic understanding of freedom, contrasted with the western understanding?"
)
display(Markdown(f"{response.response}"))

The Islamic understanding of freedom is deeply rooted in spiritual growth and alignment with divine will, emphasizing submission to Allah as the path to true liberation. This perspective views freedom as liberation from worldly desires and achieving spiritual fulfillment through adherence to divine guidance and moral principles. In contrast, the Western understanding of freedom often emphasizes individual autonomy, personal rights, and the ability to make choices without external constraints, focusing on personal liberty and self-expression. While Islamic freedom prioritizes spiritual and moral alignment with divine will, Western freedom centers on individual autonomy and personal choice.

In [28]:
response = query_engine.query(
    "How does the islamic concept of freedom relate to divine justice and why there is suffering in the world?"
)
display(Markdown(f"{response.response}"))

The Islamic concept of freedom is intricately linked to divine justice and the existence of suffering in the world. In Islam, freedom is understood as the ability to choose between right and wrong, guided by Allah's teachings. This freedom is a test of faith, allowing individuals to demonstrate their moral responsibility and accountability, which are essential for divine justice. Suffering is seen as a test or trial that serves a greater purpose, providing opportunities for spiritual growth, patience, and resilience. It is a reminder of the transient nature of worldly life and encourages believers to seek spiritual fulfillment and align their actions with divine will. Ultimately, true freedom in Islam is found in submission to Allah, leading to spiritual enlightenment and fulfillment, while divine justice ensures that every action is accounted for, with ultimate justice served in the hereafter.