#### Limitations of vector RAG
1. **Themes and relationships** - Document embeddings capture the semantic meaning of individual chunks but struggle to capture corpus-wide themes and the relationships between entities.
2. **Scalability** - As the vector database grows, retrieval can become less efficient because the computational load increases with the size of the search space.
3. **Diverse data** - Structured and heterogeneous data are harder to embed than plain prose.
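
To make the first limitation concrete, here is a minimal sketch. It uses a toy word-overlap score as a stand-in for embedding similarity (an assumption; real embeddings behave differently) and invented placeholder entities. The question needs two linked facts: ranking isolated chunks by similarity pushes the chunk holding the answer to the bottom, while following edges in a small graph reaches it in two hops.

```python
# Toy corpus: each chunk states one fact. All names are hypothetical placeholders.
chunks = [
    "Acme Labs developed the Foo model",
    "Jane Doe founded Acme Labs",
    "The Foo model uses a transformer architecture",
]

def overlap_score(question: str, chunk: str) -> int:
    """Count shared lowercase words; a crude stand-in for embedding similarity."""
    return len(set(question.lower().split()) & set(chunk.lower().split()))

question = "Who founded the organization that developed the Foo model"
ranked = sorted(chunks, key=lambda c: overlap_score(question, c), reverse=True)

# The same facts stored as (source, relationship, target) triples.
triples = [
    ("Acme Labs", "DEVELOPED", "Foo"),
    ("Jane Doe", "FOUNDED", "Acme Labs"),
    ("Foo", "USES", "Transformer"),
]

def sources(rel: str, target: str) -> list:
    """Return every source node linked to `target` by relationship `rel`."""
    return [s for s, r, t in triples if r == rel and t == target]

# Two hops: Foo <-DEVELOPED- organization <-FOUNDED- person
orgs = sources("DEVELOPED", "Foo")
founders = [person for org in orgs for person in sources("FOUNDED", org)]

print(ranked[:2])  # the two most similar chunks; neither names the founder
print(founders)    # the graph traversal reaches the founder
```

The toy is deliberately small; the point is only that the answer lives in the relationship between chunks, not inside whichever chunk looks most similar to the question.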

In [3]:
import time
from langchain_community.document_loaders import WikipediaLoader
from langchain_text_splitters import TokenTextSplitter

# Function to load Wikipedia data with retry mechanism
def load_wikipedia_data(query, retries=3, delay=5):
    for attempt in range(retries):
        try:
            loader = WikipediaLoader(query=query)
            raw_documents = loader.load()
            return raw_documents
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < retries - 1:
                time.sleep(delay)
            else:
                raise

# Load Wikipedia data with retry mechanism
query = "Large language model"
raw_documents = load_wikipedia_data(query)

# Split the documents
text_splitter = TokenTextSplitter(chunk_size=100, chunk_overlap=20)
documents = text_splitter.split_documents(raw_documents[:3])

# Print the first document
print(documents[0])

page_content='A large language model (LLM) is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.
The largest and most capable LLMs are artificial neural networks built with a decoder-only transformer-based architecture, enabling efficient processing and generation of large-scale text data. Modern models can be fine-tun' metadata={'title': 'Large language model', 'summary': 'A large language model (LLM) is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.\nThe largest and most capable LLMs are artificial neural networks built with a
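
The splitter above cuts each document into windows of `chunk_size` tokens, with consecutive windows sharing `chunk_overlap` tokens. A rough sketch of that sliding-window logic follows. It treats list items as tokens (an assumption; `TokenTextSplitter` counts tokenizer tokens, so real chunk boundaries will differ), and `sliding_window_chunks` is a hypothetical helper, not part of LangChain.

```python
def sliding_window_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks

# 250 placeholder tokens, split with the same settings as the cell above
words = [f"w{i}" for i in range(250)]
chunks = sliding_window_chunks(words, chunk_size=100, chunk_overlap=20)

print([len(c) for c in chunks])
print(chunks[0][-20:] == chunks[1][:20])  # the overlap is shared verbatim
```

The overlap is why the tail of one chunk reappears at the head of the next: it reduces the chance that a sentence split across a boundary loses its context.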

In [13]:
import time
from langchain_community.document_loaders import WikipediaLoader
from langchain_text_splitters import TokenTextSplitter

# Load Wikipedia data directly (no retry wrapper in this cell)
query = "Large language model"
raw_documents = WikipediaLoader(query=query).load()

# Split the documents
text_splitter = TokenTextSplitter(chunk_size=100, chunk_overlap=20)
documents = text_splitter.split_documents(raw_documents)

# Print the first document
print(documents[0])

page_content='A large language model (LLM) is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.
The largest and most capable LLMs are artificial neural networks built with a decoder-only transformer-based architecture, enabling efficient processing and generation of large-scale text data. Modern models can be fine-tun' metadata={'title': 'Large language model', 'summary': 'A large language model (LLM) is a type of computational model designed for natural language processing tasks such as language generation. As language models, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a self-supervised and semi-supervised training process.\nThe largest and most capable LLMs are artificial neural networks built with a

#### Document to graph

#### Create the nodes and edges data structure

In [5]:
import os

from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

# Read the API key from the environment rather than hard-coding it
openai_api_key = os.getenv("OPENAI_API_KEY")

# temperature=0 keeps the extraction as repeatable as possible
llm = ChatOpenAI(api_key=openai_api_key, temperature=0, model_name="gpt-4o-mini")
llm_transformer = LLMGraphTransformer(llm=llm)

graph_documents = llm_transformer.convert_to_graph_documents(documents)
print(graph_documents)

[GraphDocument(nodes=[Node(id='Large Language Model', type='Concept'), Node(id='Natural Language Processing', type='Concept'), Node(id='Language Generation', type='Concept'), Node(id='Artificial Neural Networks', type='Concept'), Node(id='Decoder-Only Transformer-Based Architecture', type='Concept'), Node(id='Text Data', type='Concept')], relationships=[Relationship(source=Node(id='Large Language Model', type='Concept'), target=Node(id='Natural Language Processing', type='Concept'), type='DESIGNED_FOR'), Relationship(source=Node(id='Large Language Model', type='Concept'), target=Node(id='Language Generation', type='Concept'), type='DESIGNED_FOR'), Relationship(source=Node(id='Large Language Model', type='Concept'), target=Node(id='Artificial Neural Networks', type='Concept'), type='IS_A_TYPE_OF'), Relationship(source=Node(id='Artificial Neural Networks', type='Concept'), target=Node(id='Decoder-Only Transformer-Based Architecture', type='Concept'), type='BUILT_WITH'), Relationship(sour
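
The printed output is truncated, but its shape is visible: each graph document holds a list of nodes and a list of relationships that point at those nodes. The sketch below mirrors that shape with plain dataclasses (stand-ins written for illustration, not the LangChain classes) and reuses two relationships from the output to group outgoing edges by source node.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    id: str
    type: str

@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str

# Nodes and relationships copied from the visible part of the output above
llm_node = Node("Large Language Model", "Concept")
nlp_node = Node("Natural Language Processing", "Concept")
ann_node = Node("Artificial Neural Networks", "Concept")

relationships = [
    Relationship(llm_node, nlp_node, "DESIGNED_FOR"),
    Relationship(llm_node, ann_node, "IS_A_TYPE_OF"),
]

# Group outgoing edges by source id to get an adjacency list
outgoing = defaultdict(list)
for rel in relationships:
    outgoing[rel.source.id].append((rel.type, rel.target.id))

print(dict(outgoing))
```

Each relationship is directed: it has a distinct source and target, which matters later when writing Cypher patterns.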

#### Render the graph using Neo4j

In [6]:
!pip install neo4j



In [None]:
import os 
from langchain_community.graphs import Neo4jGraph
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

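openai_api_key = os.getenv("OPENAI_API_KEY")
# Read the Neo4j password from the environment instead of hard-coding it
# (NEO4J_PASSWORD is an assumed variable name; set it before running)
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password=os.getenv("NEO4J_PASSWORD"))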

# Generate graph documents from the text using the LLM
llm = ChatOpenAI(api_key=openai_api_key, temperature=0, model_name="gpt-4o-mini")
llm_transformer = LLMGraphTransformer(llm=llm)
graph_documents = llm_transformer.convert_to_graph_documents(documents[:3])

# Add the graph documents to the Neo4j database
graph.add_graph_documents(
    graph_documents=graph_documents,
    include_source=True,   # also store each source Document and link it to its nodes
    baseEntityLabel=True   # give every node an extra shared base label
)
# Reload the schema so it reflects the newly added nodes and relationships
graph.refresh_schema()

#### Querying the graph

![](images/cipher.png)

In [22]:
# Print the schema, then query the graph
schema = graph.get_schema
print(schema)
results = graph.query("""
MATCH (gpt3:Model {id:"Gpt-3"})-[:DEVELOPED_BY]->(org:Organization)
RETURN org
""")
print(results)

Node properties:
Document {id: STRING, text: STRING, summary: STRING, source: STRING, title: STRING}
Concept {id: STRING}
Date {id: STRING}
Model {id: STRING}
Timeperiod {id: STRING}
Dataset {id: STRING}
Organization {id: STRING}
Event {id: STRING}
Technology {id: STRING}
Paper {id: STRING}
Person {id: STRING}
Leaderboard {id: STRING}
Architecture {id: STRING}
Ai model {id: STRING}
Entity {id: STRING}
Ai assistant {id: STRING}
Technique {id: STRING}
Language model {id: STRING}
Company {id: STRING}
Software {id: STRING}
Product {id: STRING}
Measurement {id: STRING}
Equation {id: STRING}
Unknown {id: STRING}
Variable {id: STRING}
Constant {id: STRING}
Mathematical expression {id: STRING}
Function {id: STRING}
Parameter {id: STRING}
Year {id: STRING}
Process {id: STRING}
Version {id: STRING}
Unit {id: STRING}
License {id: STRING}
Mathematicalexpression {id: STRING}
Platform {id: STRING}
Service {id: STRING}
Feature {id: STRING}
Quantity {id: STRING}
Relationship properties:

The relations

```cypher
MATCH (p:Person)-[:KNOWN_FOR]->(c:Concept {id: 'Theory Of Relativity'})
RETURN p
```
**Explanation:**
- `(p:Person)`: Matches nodes labeled `Person` and binds them to `p`.
- `-[:KNOWN_FOR]->`: Matches a `KNOWN_FOR` relationship directed from the person to the concept.
- `(c:Concept {id: 'Theory Of Relativity'})`: Matches nodes labeled `Concept` whose `id` property equals "Theory Of Relativity". The comparison is case-sensitive.
- `RETURN p`: Returns the `Person` node(s) that satisfy the pattern.
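
To see what the pattern is doing, the sketch below evaluates the same match over an in-memory list of edges. The sample data is illustrative (an assumption, not taken from this notebook's graph) and `match_known_for` is a hypothetical helper; Neo4j evaluates patterns with its own query planner and indexes rather than a linear scan.

```python
# Each edge: (source_label, source_id, relationship_type, target_label, target_id)
edges = [
    ("Person", "Albert Einstein", "KNOWN_FOR", "Concept", "Theory Of Relativity"),
    ("Person", "Isaac Newton", "KNOWN_FOR", "Concept", "Universal Gravitation"),
    ("Organization", "Some Institute", "STUDIES", "Concept", "Theory Of Relativity"),
]

def match_known_for(concept_id):
    """Mimic MATCH (p:Person)-[:KNOWN_FOR]->(c:Concept {id: concept_id}) RETURN p."""
    return [
        src_id
        for src_label, src_id, rel, tgt_label, tgt_id in edges
        if src_label == "Person"
        and rel == "KNOWN_FOR"
        and tgt_label == "Concept"
        and tgt_id == concept_id  # exact, case-sensitive comparison
    ]

print(match_known_for("Theory Of Relativity"))
print(match_known_for("theory of relativity"))  # different casing matches nothing
```

Every part of the pattern acts as a filter: the labels, the relationship type, and the property value must all hold for a row to be returned.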

**Code for finding the scientist known for the Theory of Relativity**
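```python
# Print the graph schema
print(graph.get_schema)

# Query the graph. The pattern is written right-to-left here,
# but the edge still points from Person to Concept.
results = graph.query("""
MATCH (relativity:Concept {id: "Theory Of Relativity"})<-[:KNOWN_FOR]-(scientist:Person)
RETURN scientist
""")

# graph.query returns a list of records; guard against an empty result
print(results[0] if results else "No match found")
```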


![](images/graphical-rag-arch.png)
![](images/graph-cypher-qa-chain.png)

In [28]:
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain

# The chain asks the LLM to write a Cypher query, runs it, and answers from the result
chain = GraphCypherQAChain.from_llm(
    llm=ChatOpenAI(api_key=openai_api_key, temperature=0, model_name="gpt-4o-mini"),
    graph=graph,
    verbose=True,
)
result = chain.invoke({"query": "Model Gpt-3 was developed by which organization?"})
print(result)

# Hand-written query for comparison:
# MATCH (gpt3:Model {id:"Gpt-3"})-[:DEVELOPED_BY]->(org:Organization)
# RETURN org



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Model {id: 'Gpt-3'})<-[:DEVELOPED_BY]-(o:Organization) RETURN o
Full Context:
[]

> Finished chain.
{'query': 'Model Gpt-3 was developed by which organization?', 'result': "I don't know the answer."}
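
The empty `Full Context` is worth a closer look. The generated Cypher points `DEVELOPED_BY` from the organization to the model, the reverse of the hand-written query used earlier. If the graph stores the edge from `Model` to `Organization` (an assumption; the relationship listing in the schema output above is cut off), the reversed pattern cannot match anything. A minimal sketch with a single placeholder edge shows the effect; `match` is a hypothetical helper and the organization id is assumed.

```python
# One directed edge stored as (source_id, relationship_type, target_id).
# "Openai" is a placeholder id; the notebook does not show the stored value.
stored_edges = {("Gpt-3", "DEVELOPED_BY", "Openai")}

def match(source_id=None, rel=None, target_id=None):
    """Return edges matching the given fields; None acts as a wildcard."""
    return [
        (s, r, t)
        for s, r, t in stored_edges
        if source_id in (None, s) and rel in (None, r) and target_id in (None, t)
    ]

# (m {id: 'Gpt-3'})-[:DEVELOPED_BY]->(o): the model is the source
forward = match(source_id="Gpt-3", rel="DEVELOPED_BY")
# (m {id: 'Gpt-3'})<-[:DEVELOPED_BY]-(o): the model is the target
backward = match(rel="DEVELOPED_BY", target_id="Gpt-3")

print(forward)   # one edge found
print(backward)  # empty, like the chain's Full Context
```

One option to try is the `validate_cypher=True` argument of `GraphCypherQAChain.from_llm`, which is intended to correct relationship directions against the graph schema; whether it fixes this particular query has not been verified here.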
