# Kùzu Graph Store

This notebook walks through configuring `Kùzu` as the graph storage backend in LlamaIndex.

In [None]:
%pip install llama-index-llms-openai
%pip install llama-index-graph-stores-kuzu

In [None]:
# Set your OpenAI API key
import os

os.environ["OPENAI_API_KEY"] = "API_KEY_HERE"

In [None]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)

## Prepare for Kùzu

In [None]:
# Clean up all the directories used in this notebook
import shutil

shutil.rmtree("./test1", ignore_errors=True)
shutil.rmtree("./test2", ignore_errors=True)
shutil.rmtree("./test3", ignore_errors=True)

In [None]:
%pip install kuzu
import kuzu

db = kuzu.Database("test1")



## Using Knowledge Graph with KuzuGraphStore

In [None]:
from llama_index.graph_stores.kuzu import KuzuGraphStore

graph_store = KuzuGraphStore(db)

#### Building the Knowledge Graph

In [None]:
from llama_index.core import SimpleDirectoryReader, KnowledgeGraphIndex
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from IPython.display import Markdown, display
import kuzu

In [None]:
documents = SimpleDirectoryReader(
    "../../../examples/data/paul_graham"
).load_data()

In [None]:
# define LLM

llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.llm = llm
Settings.chunk_size = 512

In [None]:
from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(graph_store=graph_store)

# NOTE: can take a while!
index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
)
# # To reload from an existing graph store without recomputing each time, use:
# index = KnowledgeGraphIndex(nodes=[], storage_context=storage_context)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

#### Querying the Knowledge Graph

First, we can query and send only the triplets to the LLM.

In [None]:
query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [None]:
display(Markdown(f"<b>{response}</b>"))

<b>Interleaf was involved in making software, added a scripting language, was inspired by Emacs, taught what not to do, built impressive technology, and made software that became obsolete. Additionally, Interleaf made software that was replaced by a service, got crushed by Moore's law, and was affected by rapid change. The software made by Interleaf could launch as soon as it was done and was made with a certain technology.</b>

For more detailed answers, we can also send the text from which the retrieved triplets were extracted.

In [None]:
query_engine = index.as_query_engine(
    include_text=True, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: 8eb19ceb-9f3c-4135-9733-1c244e37d61e: [7] Technically the apartment wasn't rent-controlled but rent-stabilized, but...
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: 79786d26-ff72-4928-9d9d-89e9c27c3a85: less influenced by custom) will have an advantage in fields affected by rapid...
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: 928ff319-0234-46cc-9ce8-4e6a082cb08a: (If you're curious why my site looks so old-fashioned, it's because it's stil...
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: e67fb600-460b-4bdf-a0ac-49606c76e079: This name didn't last long before it was replaced by "software as a service,"...
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: a01a39af-2eee-4826-8ae6-b7a7260fd8c8: In ev

In [None]:
display(Markdown(f"<b>{response}</b>"))

<b>Interleaf was a company that made software for creating documents. They added a scripting language inspired by Emacs, which was a dialect of Lisp. The company had smart people and built impressive technology but ultimately got crushed by Moore's Law in the 1990s due to the exponential growth in the power of commodity processors. Interleaf's software could be launched as soon as it was done and was affected by rapid changes in the industry. Additionally, working at Interleaf taught valuable lessons about what not to do in software development.</b>
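
To make the difference between the two modes concrete, here is a minimal, library-free sketch of how the context handed to the LLM might change when `include_text` is switched on. It is a conceptual illustration only, not how LlamaIndex implements retrieval: `TRIPLETS`, `SOURCE_TEXT`, and `build_context` are hypothetical names, and the source sentence is a made-up placeholder for a real chunk.

```python
# Conceptual sketch only: this is NOT LlamaIndex's implementation.
# TRIPLETS, SOURCE_TEXT and build_context are hypothetical names.

TRIPLETS = [
    ("Interleaf", "made software for", "creating documents"),
    ("Interleaf", "added", "scripting language"),
]

# Placeholder for the text chunk the triplets were extracted from (made up).
SOURCE_TEXT = {
    "Interleaf": "Interleaf made software for creating documents "
    "and added a scripting language.",
}


def build_context(keyword: str, include_text: bool) -> str:
    """Assemble the context string that would be passed to the LLM."""
    # Always include the triplets whose subject matches the keyword.
    matches = [t for t in TRIPLETS if t[0] == keyword]
    lines = [f"({s}, {r}, {o})" for s, r, o in matches]
    # Optionally append the source text the triplets came from.
    if include_text and keyword in SOURCE_TEXT:
        lines.append(SOURCE_TEXT[keyword])
    return "\n".join(lines)


triplets_only = build_context("Interleaf", include_text=False)
with_text = build_context("Interleaf", include_text=True)
print(triplets_only)
print(with_text)
```

With `include_text=False` the LLM sees only the terse triplets, which is consistent with the first answer above reading like a list of facts. With the source text attached, it has enough material to produce fuller sentences.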

#### Query with embeddings

In [None]:
# NOTE: can take a while!
db = kuzu.Database("test2")
graph_store = KuzuGraphStore(db)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
new_index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
    include_embeddings=True,
)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST

In [None]:
# query the embedding-enabled index using the top 5 triplets by similarity
# plus keyword matches (duplicate triplets are removed)
query_engine = new_index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=5,
)
response = query_engine.query(
    "Tell me more about what the author worked on at Interleaf",
)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: 8eb19ceb-9f3c-4135-9733-1c244e37d61e: [7] Technically the apartment wasn't rent-controlled but rent-stabilized, but...
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: 79786d26-ff72-4928-9d9d-89e9c27c3a85: less influenced by custom) will have an advantage in fields affected by rapid...
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: 928ff319-0234-46cc-9ce8-4e6a082cb08a: (If you're curious why my site looks so old-fashioned, it's because it's stil...
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: e67fb600-460b-4bdf-a0ac-49606c76e079: This name didn't last long before it was replaced by "software as a service,"...
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: a01a39af-2eee-4826-8ae6-b7a7260fd8c8: In ev

In [None]:
display(Markdown(f"<b>{response}</b>"))

<b>The author worked on software at Interleaf, where they added a scripting language inspired by Emacs. They mentioned that their experience at Interleaf taught them what not to do. Additionally, the author mentioned that the software they worked on at Interleaf could be launched as soon as it was done and was affected by rapid change.</b>
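
The hybrid query cell above notes that duplicate triplets are removed when keyword and embedding results are combined. As a rough sketch of that idea (not LlamaIndex's actual retriever), the snippet below merges keyword matches with the top-k triplets by cosine similarity and deduplicates them while preserving order. The 2-d embeddings, `TRIPLET_EMBEDDINGS`, and `hybrid_retrieve` are all invented for illustration.

```python
import math

# Hypothetical toy data: each triplet is paired with a made-up 2-d embedding.
TRIPLET_EMBEDDINGS = {
    ("Interleaf", "added", "scripting language"): (0.9, 0.1),
    ("Interleaf", "made software for", "creating documents"): (0.8, 0.3),
    ("author", "worked on", "programming"): (0.2, 0.9),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hybrid_retrieve(keywords, query_embedding, top_k):
    """Union of keyword matches and the top_k most similar triplets."""
    # Keyword side: triplets whose subject is one of the query keywords.
    keyword_hits = [t for t in TRIPLET_EMBEDDINGS if t[0] in keywords]
    # Embedding side: rank all triplets by similarity to the query vector.
    ranked = sorted(
        TRIPLET_EMBEDDINGS,
        key=lambda t: cosine(TRIPLET_EMBEDDINGS[t], query_embedding),
        reverse=True,
    )
    embedding_hits = ranked[:top_k]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(keyword_hits + embedding_hits))


results = hybrid_retrieve({"Interleaf"}, (0.1, 1.0), top_k=2)
for triplet in results:
    print(triplet)
```

Here the embedding side contributes one triplet that the keyword match alone would miss, while the triplet found by both routes appears only once.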

#### Visualizing the Graph

In [None]:
%pip install pyvis

In [None]:
# Create the graph and render it with pyvis
from pyvis.network import Network

g = index.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.show("kuzugraph_draw.html")

kuzugraph_draw.html


#### [Optional] Try building the graph and manually adding triplets!

In [None]:
from llama_index.core.node_parser import SentenceSplitter

In [None]:
node_parser = SentenceSplitter()

In [None]:
nodes = node_parser.get_nodes_from_documents(documents)

In [None]:
# initialize an empty database
db = kuzu.Database("test3")
graph_store = KuzuGraphStore(db)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
index = KnowledgeGraphIndex(
    [],
    storage_context=storage_context,
)

In [None]:
# add keyword mappings and nodes manually
# add triplets (subject, relationship, object)

# for node 0
node_0_tups = [
    ("author", "worked on", "writing"),
    ("author", "worked on", "programming"),
]
for tup in node_0_tups:
    index.upsert_triplet_and_node(tup, nodes[0])

# for node 1
node_1_tups = [
    ("Interleaf", "made software for", "creating documents"),
    ("Interleaf", "added", "scripting language"),
    ("software", "generate", "web sites"),
]
for tup in node_1_tups:
    index.upsert_triplet_and_node(tup, nodes[1])
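
Conceptually, each upsert records a (subject, relationship, object) edge that can later be looked up by subject. The sketch below mimics that behavior with a plain dictionary. `TinyGraphStore` is a hypothetical stand-in written for this example and is not the `KuzuGraphStore` API.

```python
from collections import defaultdict


class TinyGraphStore:
    """Hypothetical in-memory stand-in for a graph store (not the Kùzu API)."""

    def __init__(self):
        # Maps each subject to its list of (relationship, object) pairs.
        self._rel_map = defaultdict(list)

    def upsert_triplet(self, subj, rel, obj):
        """Add the edge unless an identical one is already stored."""
        if (rel, obj) not in self._rel_map[subj]:
            self._rel_map[subj].append((rel, obj))

    def get(self, subj):
        """Return a copy of all (relationship, object) pairs for a subject."""
        return list(self._rel_map.get(subj, []))


store = TinyGraphStore()
for tup in [
    ("Interleaf", "made software for", "creating documents"),
    ("Interleaf", "added", "scripting language"),
    ("software", "generate", "web sites"),
]:
    store.upsert_triplet(*tup)

# Upserting the same triplet again leaves the store unchanged.
store.upsert_triplet("Interleaf", "added", "scripting language")

interleaf_facts = store.get("Interleaf")
print(interleaf_facts)
```

Looking up `"Interleaf"` returns only the edges attached to that subject, which is the kind of information a triplets-only query can draw on.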

In [None]:
query_engine = index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
    "Tell me more about Interleaf",
)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [None]:
str(response)

'Interleaf was involved in creating documents and also added a scripting language to its software.'