# RAG Knowledge Graph

In [1]:
%pip install llama-index-core
%pip install llama-index-graph-stores-neo4j
%pip install llama-index-llms-mistralai
%pip install llama-index-embeddings-mistralai
%pip install llama-index-llms-openai
%pip install --upgrade llama-index-embeddings-openai
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.
Collecting llama-index-core<0.12.0,>=0.11.0 (from llama-index-graph-stores-neo4j)
  Using cached llama_index_core-0.11.16-py3-none-any.whl.metadata (2.4 kB)
Using cached llama_index_core-0.11.16-py3-none-any.whl (1.6 MB)
Installing collected packages: llama-index-core
  Attempting uninstall: llama-index-core
    Found existing installation: llama-index-core 0.10.68.post1
    Uninstalling llama-index-core-0.10.68.post1:
      Successfully uninstalled llama-index-core-0.10.68.post1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llama-index-vector-stores-pinecone 0.1.7 requires llama-index-core<0.11.0,>=0.10.11.post1, but you have llama-index-core 0.11.16 which is incompatible.
llama-index-readers-s3 0.1.8 requires llama-index-core<0.11.0,>=0.10.37.post1, but you have llama-index-core 0.11.16 

## Docker Setup

In [None]:
!docker run -d \
    -p 7474:7474 -p 7687:7687 \
    -v $PWD/data:/data -v $PWD/plugins:/plugins \
    --name neo4j-apoc \
    -e NEO4J_apoc_export_file_enabled=true \
    -e NEO4J_apoc_import_file_enabled=true \
    -e NEO4J_apoc_import_file_use__neo4j__config=true \
    -e NEO4J_PLUGINS=\[\"apoc\"\] \
    neo4j:latest

Unable to find image 'neo4j:latest' locally
latest: Pulling from library/neo4j

## Setup

In [None]:
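import os

import nest_asyncio
from dotenv import load_dotenv
from IPython.display import Markdown, display

# Load NEO4J_USERNAME / NEO4J_PASSWORD (and any API keys) from a local .env file.
load_dotenv()
# Allow nested event loops so LlamaIndex's async calls work inside Jupyter.
nest_asyncio.apply()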
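
`load_dotenv()` fails silently when a variable is absent from `.env`, and the error then surfaces later as a `KeyError` or an authentication failure. A minimal sketch of an early check follows. `NEO4J_USERNAME` and `NEO4J_PASSWORD` are the names read further down in this notebook; `OPENAI_API_KEY` is an assumption about how the OpenAI client is configured here, and `missing_env_vars` is a hypothetical helper, not part of any library.

```python
import os

# NEO4J_* names are read later in this notebook; OPENAI_API_KEY is an
# assumption about how the OpenAI client picks up its credentials.
REQUIRED_VARS = ("NEO4J_USERNAME", "NEO4J_PASSWORD", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv()` points at the variable that is missing before any connection is attempted.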


In [None]:
# from llama_index.embeddings.mistralai import MistralAIEmbedding
# from llama_index.llms.mistralai import MistralAI
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings


llm = OpenAI(model='gpt-4o-mini')
embed_model = OpenAIEmbedding(model="text-embedding-3-small")

## Download Data 

In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

--2024-10-02 15:36:02--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’


2024-10-02 15:36:02 (1.12 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



## Load Data 

In [None]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader('data/paul_graham').load_data()

## Index Construction 

In [None]:
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore

# Note: used to be `Neo4jPGStore`
graph_store = Neo4jPropertyGraphStore(
    username=os.environ['NEO4J_USERNAME'],
    password=os.environ['NEO4J_PASSWORD'],
    url="bolt://localhost:7687",
)
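
The graph store above connects to Neo4j over Bolt on `localhost:7687`, the port published by the Docker cell. A freshly started container can take a little while to accept connections, so a connection error at this step may just mean the database is still starting. The sketch below polls the port using only the standard library. `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that the credentials are valid.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect is enough; close the socket immediately.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("localhost", 7687)` before constructing `Neo4jPropertyGraphStore` gives a clearer signal than a driver stack trace when the container is not up yet.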



In [None]:
from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SimpleLLMPathExtractor

# Extract (subject, relation, object) paths with the LLM and store them,
# along with embeddings, in the Neo4j graph store created above.
index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=embed_model,
    kg_extractors=[SimpleLLMPathExtractor(llm=llm)],
    property_graph_store=graph_store,
    show_progress=True,
)
    

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Extracting paths from text: 100%|██████████| 22/22 [00:17<00:00,  1.28it/s]
Generating embeddings: 100%|██████████| 1/1 [00:02<00:00,  2.37s/it]
Generating embeddings: 100%|██████████| 5/5 [00:05<00:00,  1.12s/it]
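
Extraction calls the LLM once per chunk, so rerunning the cell above on every kernel restart repeats that cost. When the graph is already stored in Neo4j, the index can instead be rebuilt from the store with `PropertyGraphIndex.from_existing`. A minimal sketch, assuming the `graph_store`, `llm`, and `embed_model` objects defined earlier; `load_existing_index` is a hypothetical wrapper name.

```python
def load_existing_index(graph_store, llm, embed_model):
    """Rebuild a PropertyGraphIndex on top of an already-populated graph store."""
    # Imported inside the function so that defining this helper does not
    # itself require llama-index to be importable.
    from llama_index.core import PropertyGraphIndex

    return PropertyGraphIndex.from_existing(
        property_graph_store=graph_store,
        llm=llm,
        embed_model=embed_model,
    )
```

In a later session, `index = load_existing_index(graph_store, llm, embed_model)` would replace the `from_documents` call, skipping path extraction and embedding generation.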


In [None]:
from llama_index.core import Settings

# Set global defaults so the retrievers and query engine below use the same models.
Settings.llm = llm
Settings.embed_model = embed_model

## Retrievers

In [None]:
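from llama_index.core.indices.property_graph import (
    LLMSynonymRetriever,
    VectorContextRetriever,
)

# Asks the LLM for keywords and synonyms from the query, then matches graph nodes.
llm_synonym_retriever = LLMSynonymRetriever(
    index.property_graph_store,
    llm=llm,
    include_text=False,  # return only the matched paths, not the source chunk text
)

# Embeds the query and retrieves nodes by vector similarity, plus their neighbors.
vector_context_retriever = VectorContextRetriever(
    index.property_graph_store,
    embed_model=embed_model,
    include_text=False,
)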

## Querying

In [None]:
# Combine both sub-retrievers; their results are merged into one node list.
retriever = index.as_retriever(
    sub_retrievers=[
        llm_synonym_retriever,
        vector_context_retriever
    ]
)

nodes = retriever.retrieve("What did the author do at Viaweb?")

for node in nodes:
    print(node.text)

Company -> Put -> Art galleries online
I -> Didn't want to run -> Company
Viaweb -> Seemed -> Lame
Viaweb -> Charged -> $300 a month
Company -> Called -> Viaweb
Viaweb -> Was -> Inexpensive
Viaweb -> Charged -> $100 a month
I -> Decided to start -> Company
Viaweb -> Was -> Growing rapidly
Company -> Was -> At mercy of investors
Viaweb -> Was -> Profitable
Code editor -> Was in -> Viaweb
Yahoo -> Bought -> Viaweb
Sam -> Wanted to start -> Startup
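
Each line of the output above is a `subject -> relation -> object` path. To work with the results as data rather than printed text, the lines can be split back into tuples and grouped by subject. A minimal sketch, assuming the ` -> ` separator seen in the printed output (taken from that output, not from a documented contract); `parse_triple` and `group_by_subject` are hypothetical helper names.

```python
from collections import defaultdict


def parse_triple(text):
    """Split 'subject -> relation -> object' into a 3-tuple, or return None."""
    parts = [part.strip() for part in text.split(" -> ")]
    return tuple(parts) if len(parts) == 3 else None


def group_by_subject(lines):
    """Map each subject to the (relation, object) pairs found for it."""
    grouped = defaultdict(list)
    for line in lines:
        triple = parse_triple(line)
        if triple is not None:  # skip anything that is not a 3-part path
            subject, relation, obj = triple
            grouped[subject].append((relation, obj))
    return dict(grouped)


# Sample lines copied from the retriever output above.
sample = [
    "Yahoo -> Bought -> Viaweb",
    "Viaweb -> Was -> Profitable",
    "Viaweb -> Charged -> $100 a month",
]
print(group_by_subject(sample))
```

With the retriever results, `group_by_subject(node.text for node in nodes)` would collect every relation found for `Viaweb` under one key, which makes duplicates and conflicting facts (such as the two different monthly prices) easier to spot.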


## Query Engine

In [None]:
# include_text=True passes the source chunk text, not just the paths, to the LLM.
query_engine = index.as_query_engine(include_text=True)

response = query_engine.query("What did the author do at Viaweb?")

display(Markdown(response.response))

At Viaweb, the author was involved in developing an online store builder. He worked on the application builder while collaborating with others on network infrastructure and services like images and phone calls. He also engaged in building stores for users, which helped him learn about retail and user experience. Additionally, he played a significant role in the overall management and direction of the company, navigating the challenges of running a startup during the Internet Bubble.