<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/query_engine/knowledge_graph_query_engine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Knowledge Graph Query Engine

Building a Knowledge Graph usually involves specialized, complex tasks. However, by combining an LLM with LlamaIndex's `KnowledgeGraphIndex` and `GraphStore`, we can build a reasonably effective Knowledge Graph from any data source supported by [Llama Hub](https://llamahub.ai/).

Querying a Knowledge Graph also tends to require knowledge specific to the storage system, such as the Cypher query language. With the help of an LLM and the LlamaIndex `KnowledgeGraphQueryEngine`, the same can be done in natural language.

In this demonstration, we will guide you through the steps to:

- Extract and set up a Knowledge Graph using LlamaIndex
- Query a Knowledge Graph using Cypher
- Query a Knowledge Graph using Natural Language

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [14]:
# %pip install llama-index-readers-wikipedia
# %pip install llama-index-llms-azure-openai
# %pip install llama-index-graph-stores-nebula
# %pip install llama-index-llms-openai
# %pip install llama-index-embeddings-azure-openai
# %pip install llama-index-embeddings-openai

First, let's set up the basics for LlamaIndex.

### OpenAI

In [15]:
# For OpenAI

import os

os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"  # placeholder: never commit a real key

import logging
import sys

logging.basicConfig(
    stream=sys.stdout, level=logging.INFO
)  # logging.DEBUG for more verbose output


# define LLM
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

Settings.llm = OpenAI(temperature=0, model="gpt-4o-mini")
Settings.chunk_size = 512
# Settings.chunk_overlap =32
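Hard-coding an API key in a notebook makes it easy to leak. As a minimal sketch, assuming the key has been exported in the shell before the notebook starts, a small helper can read it and fail early with a clear message. The helper name `require_env` and the demo mapping are hypothetical and are not part of LlamaIndex.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable or fail with a clear message."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it in your shell before starting the notebook."
        )
    return value


# Hypothetical usage: this mapping stands in for os.environ in the example.
demo_env = {"OPENAI_API_KEY": "sk-example-not-a-real-key"}
api_key = require_env("OPENAI_API_KEY", env=demo_env)
print(api_key[:3] + "...")  # print only a short prefix, never the full key
```

In the notebook itself you would call `require_env("OPENAI_API_KEY")` without the `env` argument so that the real environment is used.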

### Azure

In [16]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# For OpenAI: reuse the key set in the environment above instead of hard-coding it
api_key = os.environ["OPENAI_API_KEY"]

llm = OpenAI(model="gpt-4o-mini", api_key=api_key)

# With Azure OpenAI you would deploy your own embedding and chat completion models;
# the standard OpenAI models are used here instead.
embed_model = OpenAIEmbedding(model="text-embedding-3-large", api_key=api_key)


In [17]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model
# Settings.chunk_size = 512
# Settings.chunk_overlap =32

## Prepare for NebulaGraph

Before creating the Knowledge Graph, let's make sure we have a running NebulaGraph cluster with the data schema defined.

In [18]:
# Create a NebulaGraph (version 3.5.0 or newer) cluster with:
# Option 0 for machines with Docker: `curl -fsSL nebula-up.siwei.io/install.sh | bash`
# Option 1 for Desktop: NebulaGraph Docker Extension https://hub.docker.com/extensions/weygu/nebulagraph-dd-ext

# If the space does not exist yet, create it from NebulaGraph's console (the space name must match `space_name` below):
# CREATE SPACE llamaindex(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);
# :sleep 10;
# USE llamaindex;
# CREATE TAG entity(name string);
# CREATE EDGE relationship(relationship string);
# :sleep 10;
# CREATE TAG INDEX entity_index ON entity(name(256));

# %pip install ipython-ngql nebula3-python

os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula"  # default is "nebula"
os.environ[
    "NEBULA_ADDRESS"
] = "127.0.0.1:9669"  # assumes NebulaGraph is installed locally

space_name = "tvpl_graph"
edge_types, rel_prop_names = ["relationship"], [
    "relationship"
]  # defaults; can be omitted when creating from an empty KG
tags = ["entity"]  # default; can be omitted when creating from an empty KG
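The console commands in the comments create a space called `llamaindex`, while `space_name` is set to `tvpl_graph`, so the two are easy to get out of sync. As a rough sketch, the same statements can be generated from `space_name`. The helper name `nebula_schema_statements` is hypothetical, and the statements simply mirror the ones in the comments above; paste the output into the NebulaGraph console.

```python
def nebula_schema_statements(space_name, vid_length=256):
    """Build the console statements that create a space with the default schema."""
    # Guard against names that would not be safe to splice into a statement.
    if not space_name.isidentifier():
        raise ValueError(f"unexpected space name: {space_name!r}")
    return [
        f"CREATE SPACE {space_name}"
        f"(vid_type=FIXED_STRING({vid_length}), partition_num=1, replica_factor=1);",
        ":sleep 10;",
        f"USE {space_name};",
        "CREATE TAG entity(name string);",
        "CREATE EDGE relationship(relationship string);",
        ":sleep 10;",
        "CREATE TAG INDEX entity_index ON entity(name(256));",
    ]


for statement in nebula_schema_statements("tvpl_graph"):
    print(statement)
```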

Prepare the `StorageContext` with a `NebulaGraphStore` as its `graph_store`:

In [19]:
from llama_index.core import StorageContext
from llama_index.graph_stores.nebula import NebulaGraphStore
graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

## (Optional) Build the Knowledge Graph with LlamaIndex

With LlamaIndex and the LLM defined above, we can build a Knowledge Graph from the given documents.

If a Knowledge Graph already exists in the NebulaGraphStore, this step can be skipped.

### Step 1, load data from Wikipedia for "Guardians of the Galaxy Vol. 3"

In [20]:
# from llama_index.core import download_loader

# from llama_index.readers.wikipedia import WikipediaReader

# loader = WikipediaReader()

# documents = loader.load_data(
#     pages=["Guardians of the Galaxy Vol. 3"], auto_suggest=False
# )

In [21]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir=r"C:\Users\My_Pc\Desktop\sample_test")
documents = reader.load_data(num_workers=4)

In [22]:
from llama_index.core import KnowledgeGraphIndex

# Create the KnowledgeGraphIndex with translated, bidirectional edges
kg_index = KnowledgeGraphIndex.from_documents(
    documents=documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    space_name=space_name,
    edge_types=edge_types,  # use bidirectional edges
    rel_prop_names=rel_prop_names,  # use translated relationships
    tags=tags,
    include_embeddings=True,
)

# Print the resulting index to check it
print(kg_index)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

In [70]:
query_engine = kg_index.as_query_engine(
    include_text=False, response_mode="tree_summarize"
)
response1 = query_engine.query(
    "Người khuyết tật đặc biệt nặng",  # "Persons with exceptionally severe disabilities"
)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Now we have a Knowledge Graph in the NebulaGraph space named by `space_name` (`tvpl_graph` here), built from the documents loaded above. Let's play with it a little.

In [71]:
from IPython.display import Markdown, display

response1.metadata

{'cfe5b90b-63ef-439c-943c-a7d98de9e1f2': {'kg_rel_texts': ["('Người khuyết tật đặc biệt nặng', 'Được kết luận', 'Không còn khả năng tự phục vụ')",
   "('Người khuyết tật đặc biệt nặng', 'Suy giảm khả năng lao động', 'Từ 81% trở lên')"],
  'kg_rel_map': {},
  'kg_schema': {'schema': "Node properties: [{'tag': 'entity', 'properties': [('name', 'string')]}]\nEdge properties: [{'edge': 'relationship', 'properties': [('relationship', 'string')]}]\nRelationships: ['(:entity)-[:relationship]->(:entity)']\n"}}}
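The `kg_rel_texts` entries in the metadata above are strings that look like Python tuples. As a small sketch, assuming every entry is the `repr` of a 3-tuple as in this output, they can be turned back into `(subject, predicate, object)` triplets. The helper name `parse_rel_texts` is made up for this example.

```python
import ast


def parse_rel_texts(rel_texts):
    """Turn strings such as "('s', 'p', 'o')" into (s, p, o) tuples."""
    triplets = []
    for text in rel_texts:
        value = ast.literal_eval(text)  # safe parsing of Python literals only
        if isinstance(value, tuple) and len(value) == 3:
            triplets.append(value)
    return triplets


# Copied from the metadata printed above.
kg_rel_texts = [
    "('Người khuyết tật đặc biệt nặng', 'Được kết luận', 'Không còn khả năng tự phục vụ')",
    "('Người khuyết tật đặc biệt nặng', 'Suy giảm khả năng lao động', 'Từ 81% trở lên')",
]
for subj, pred, obj in parse_rel_texts(kg_rel_texts):
    print(f"{subj} -[{pred}]-> {obj}")
```

Entries that are not 3-tuples are skipped rather than raising, which keeps the helper usable if the metadata shape differs from what is shown here.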

In [7]:
# Install related packages and connect; replace <nebula> with your password (the default is "nebula")
%pip install ipython-ngql networkx pyvis
%load_ext ngql
%ngql --address 127.0.0.1 --port 9669 --user root --password <nebula>

Note: you may need to restart the kernel to use updated packages.
[OK] Connection Pool Created
INFO:nebula3.logger:Get connection to ('127.0.0.1', 9669)


| | Name |
|---|---|
| 0 | quocminhdb |
| 1 | tvpl_ccpls |
| 2 | tvpl_graph |
| 3 | tvpl_graph_final |


In [8]:
# Query some random Relationships with Cypher
%ngql USE tvpl_graph;
%ngql MATCH ()-[e]->() RETURN e LIMIT 10

INFO:nebula3.logger:Get connection to ('127.0.0.1', 9669)
INFO:nebula3.logger:Get connection to ('127.0.0.1', 9669)


| | e |
|---|---|
| 0 | `("Bộ lao động - thương binh và xã hội")-[:relationship@6674116680599171975{relationship: "Hướng dẫn về kinh phí"}]->("Quy định tại khoản 5 điều này")` |
| 1 | `("Bộ tài chính")-[:relationship@6674116680599171975{relationship: "Hướng dẫn về kinh phí"}]->("Quy định tại khoản 5 điều này")` |
| 2 | `("Chính phủ")-[:relationship@2430196491825356717{relationship: "Quy định"}]->("Mức trợ cấp xã hội hàng tháng")` |
| 3 | `("Con bạn")-[:relationship@8320799937296828072{relationship: "Thuộc diện khuyết tật nặng"}]->("Được hưởng bhyt theo quy định")` |
| 4 | `("Cấp thẻ bảo hiểm y tế")-[:relationship@2430196491825356717{relationship: "Quy định"}]->("Pháp luật về bảo hiểm y tế")` |
| 5 | `("Gia đình có người khuyết tật đặc biệt nặng")-[:relationship@1996260917724520936{relationship: "Được hỗ trợ"}]->("Kinh phí chăm sóc hàng tháng")` |
| 6 | `("Giấy xác nhận khuyết tật")-[:relationship@-7744571359366948951{relationship: "Cấp"}]->("Theo quy định của ngân sách nhà nước")` |
| 7 | `("Hội đồng")-[:relationship@508125737386925613{relationship: "Thực hiện"}]->("Quan sát trực tiếp người khuyết tật")` |
| 8 | `("Hội đồng")-[:relationship@508125737386925613{relationship: "Thực hiện"}]->("Sử dụng phương pháp khác theo quy định")` |
| 9 | `("Hội đồng")-[:relationship@2105189237134172629{relationship: "Quan sát người khuyết tật"}]->("Thông qua thực hiện hoạt động đơn giản phục vụ nhu cầu sinh hoạt cá nhân hàng ngày")` |
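Each edge in the result above is printed as one string in the form `("source")-[:relationship@rank{relationship: "label"}]->("target")`. As a rough sketch, assuming that printed format stays exactly as shown, a regular expression can pull the triplet back out for further processing in Python. The names `EDGE_PATTERN` and `parse_edge` are hypothetical.

```python
import re

# Matches the printed edge format shown in the query result above.
EDGE_PATTERN = re.compile(
    r'^\("(?P<src>.*?)"\)'
    r'-\[:(?P<edge_type>\w+)@(?P<rank>-?\d+)'
    r'\{relationship: "(?P<rel>.*?)"\}\]'
    r'->\("(?P<dst>.*?)"\)$'
)


def parse_edge(text):
    """Return (src, rel, dst) for one printed edge, or None if it does not match."""
    match = EDGE_PATTERN.match(text)
    if match is None:
        return None
    return match["src"], match["rel"], match["dst"]


edge = (
    '("Chính phủ")-[:relationship@2430196491825356717'
    '{relationship: "Quy định"}]->("Mức trợ cấp xã hội hàng tháng")'
)
print(parse_edge(edge))
```

String parsing like this is fragile. If you control the query, returning the properties as separate columns is usually the sturdier option.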


## Asking the Knowledge Graph

Finally, let's demonstrate how to query the Knowledge Graph in natural language!

Here, we will leverage the `KnowledgeGraphQueryEngine`, with `NebulaGraphStore` as the `storage_context.graph_store`.

In [9]:
# Re-import here so the cell also runs after a kernel restart
from llama_index.core import KnowledgeGraphIndex, StorageContext
from llama_index.graph_stores.nebula import NebulaGraphStore

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)

storage_context = StorageContext.from_defaults(graph_store=graph_store)

# NOTE: can take a while!
index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)

In [155]:
query_engine = index.as_query_engine()

# "How is the monthly pension calculated?"
response = query_engine.query("Mức lương hưu hằng tháng được tính như thế nào")

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:llama_index.core.indices.knowledge_graph.retrievers:> No relationships found, returning nodes found by keywords.
INFO:llama_index.core.indices.knowledge_graph.retrievers:> No nodes found by keywords, returning empty response.
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [50]:
# NOTE: can take a while!
new_index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=10,
    include_embeddings=True,
)



INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

In [77]:
# query using the top 5 triplets by embedding similarity plus keyword matches (duplicate triplets are removed)
query_engine = new_index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=5,
)
response = query_engine.query(
    "Người khuyết tật đặc biệt nặng",  # "Persons with exceptionally severe disabilities"
)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: 699e537b-7c44-428f-9fa8-06e6369da132: - Hội đồng giám định y khoa xác định, kết luận về dạng tật và mức độ khuyết t...
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: d374e6f3-1c6e-426a-a53d-f5b5ec39bd5c: Có phải người khuyết tật nào cũng được hưởng trợ cấp xã hội hàng tháng?
Căn ...
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: 70fe1e33-68da-484a-81c7-a4be3d9c9ddb: Người khuyết tật có được cấp thẻ bảo hiểm y tế miễn phí?
Căn cứ Điều 9 Nghị ...
INFO:llama_index.core.indices.knowledge_graph.retrievers:> Querying with idx: 94f6b128-4630-4d1b-98b0-e6b3c0433d64: - Trường hợp văn bản của Hội đồng giám định y khoa trước ngày Nghị định này c...
INFO:httpx:HTTP Request: POST https://
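With `embedding_mode="hybrid"`, the comment in the cell describes retrieval as embedding-ranked triplets plus keyword matches, with duplicates removed. The sketch below is a toy illustration of that idea only. It is not LlamaIndex's implementation, and the function names, triplets, and two-dimensional embeddings are all made up.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_keywords, query_embedding, triplets, embeddings, top_k):
    """Union of keyword matches and the top_k most similar triplets, without duplicates."""
    # Keyword side: keep triplets whose subject is one of the query keywords.
    keyword_hits = [t for t in triplets if t[0] in query_keywords]
    # Embedding side: rank all triplets by similarity to the query embedding.
    ranked = sorted(
        triplets,
        key=lambda t: cosine(query_embedding, embeddings[t]),
        reverse=True,
    )
    embedding_hits = ranked[:top_k]
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(keyword_hits + embedding_hits))


# Hypothetical toy data.
t0, t1, t2 = ("A", "r1", "B"), ("A", "r2", "C"), ("D", "r3", "E")
toy_embeddings = {t0: [1.0, 0.0], t1: [0.0, 1.0], t2: [0.9, 0.1]}
result = hybrid_retrieve({"A"}, [1.0, 0.0], [t0, t1, t2], toy_embeddings, top_k=2)
print(result)
```

Here `t0` is found by both strategies but appears only once in the result, which is the deduplication the comment refers to.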

In [78]:
response

Response(response='Người khuyết tật đặc biệt nặng là những người được Hội đồng giám định y khoa kết luận không còn khả năng tự phục vụ hoặc có mức suy giảm khả năng lao động từ 81% trở lên. Họ được hưởng trợ cấp xã hội hàng tháng và có quyền được cấp thẻ bảo hiểm y tế miễn phí.', source_nodes=[NodeWithScore(node=TextNode(id_='699e537b-7c44-428f-9fa8-06e6369da132', embedding=None, metadata={'file_path': 'C:\\Users\\My_Pc\\Desktop\\sample_test\\THPL_MHLH_PhapLuat_8084_3.txt', 'file_name': 'THPL_MHLH_PhapLuat_8084_3.txt', 'file_type': 'text/plain', 'file_size': 3211, 'creation_date': '2024-10-30', 'last_modified_date': '2024-10-30'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='7cb14352-b022-4d04-b1f1-9db88

In [80]:
response1.metadata

{'cfe5b90b-63ef-439c-943c-a7d98de9e1f2': {'kg_rel_texts': ["('Người khuyết tật đặc biệt nặng', 'Được kết luận', 'Không còn khả năng tự phục vụ')",
   "('Người khuyết tật đặc biệt nặng', 'Suy giảm khả năng lao động', 'Từ 81% trở lên')"],
  'kg_rel_map': {},
  'kg_schema': {'schema': "Node properties: [{'tag': 'entity', 'properties': [('name', 'string')]}]\nEdge properties: [{'edge': 'relationship', 'properties': [('relationship', 'string')]}]\nRelationships: ['(:entity)-[:relationship]->(:entity)']\n"}}}

In [81]:
response.metadata

{'699e537b-7c44-428f-9fa8-06e6369da132': {'file_path': 'C:\\Users\\My_Pc\\Desktop\\sample_test\\THPL_MHLH_PhapLuat_8084_3.txt',
  'file_name': 'THPL_MHLH_PhapLuat_8084_3.txt',
  'file_type': 'text/plain',
  'file_size': 3211,
  'creation_date': '2024-10-30',
  'last_modified_date': '2024-10-30'},
 'd374e6f3-1c6e-426a-a53d-f5b5ec39bd5c': {'file_path': 'C:\\Users\\My_Pc\\Desktop\\sample_test\\THPL_MHLH_PhapLuat_8084_1.txt',
  'file_name': 'THPL_MHLH_PhapLuat_8084_1.txt',
  'file_type': 'text/plain',
  'file_size': 1461,
  'creation_date': '2024-10-30',
  'last_modified_date': '2024-10-30'},
 '70fe1e33-68da-484a-81c7-a4be3d9c9ddb': {'file_path': 'C:\\Users\\My_Pc\\Desktop\\sample_test\\THPL_MHLH_PhapLuat_8084_2.txt',
  'file_name': 'THPL_MHLH_PhapLuat_8084_2.txt',
  'file_type': 'text/plain',
  'file_size': 951,
  'creation_date': '2024-10-30',
  'last_modified_date': '2024-10-30'},
 '94f6b128-4630-4d1b-98b0-e6b3c0433d64': {'file_path': 'C:\\Users\\My_Pc\\Desktop\\sample_test\\THPL_MHLH_P
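When `include_text=True`, the metadata is keyed by node ID and carries file information for each source chunk. As a small sketch, assuming each entry is a dict that may contain a `file_name` key as in the output above, the contributing files can be counted. The helper name `source_files` and the sample dict are made up for this example.

```python
from collections import Counter


def source_files(metadata):
    """Count how many retrieved nodes came from each source file."""
    return Counter(
        entry["file_name"] for entry in metadata.values() if "file_name" in entry
    )


# Hypothetical sample shaped like the output above.
sample_metadata = {
    "node-1": {"file_name": "a.txt", "file_size": 10},
    "node-2": {"file_name": "b.txt", "file_size": 20},
    "node-3": {"file_name": "a.txt", "file_size": 10},
    "node-4": {"kg_rel_texts": []},  # entries without file info are ignored
}
print(source_files(sample_metadata))
```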

In [157]:
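from llama_index.core.query_engine import KnowledgeGraphQueryEngine

# `generate_query` is a method of KnowledgeGraphQueryEngine; the RetrieverQueryEngine
# returned by `as_query_engine()` does not provide it.
kg_query_engine = KnowledgeGraphQueryEngine(
    storage_context=storage_context,
    llm=llm,
    verbose=True,
)

graph_query = kg_query_engine.generate_query(
    "Tell me about Peter Quill?",
)

graph_query = graph_query.replace("WHERE", "\n  WHERE").replace(
    "RETURN", "\nRETURN"
)

display(
    Markdown(
        f"""
```cypher
{graph_query}
```
"""
    )
)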

We can see that it generates the graph query:

```cypher
MATCH (p:`entity`)-[:relationship]->(e:`entity`)
  WHERE p.`entity`.`name` == 'Peter Quill'
RETURN e.`entity`.`name`;
```
And it synthesizes the answer from the query result:

```json
{'e2.entity.name': ['grandfather', 'alternate version of Gamora', 'Guardians of the Galaxy']}
```

Of course, we can still run the query ourselves. This query engine can also double as a handy tutor for learning the graph query language :).

In [None]:
%%ngql
MATCH (p:`entity`)-[e:relationship]->(m:`entity`)
  WHERE p.`entity`.`name` == 'Peter Quill'
RETURN p.`entity`.`name`, e.relationship, m.`entity`.`name`;

INFO:nebula3.logger:Get connection to ('127.0.0.1', 9669)


| | p.entity.name | e.relationship | m.entity.name |
|---|---|---|---|
| 0 | Peter Quill | would return to the MCU | May 2021 |
| 1 | Peter Quill | was abducted from Earth | as a child |
| 2 | Peter Quill | is leader of | Guardians of the Galaxy |
| 3 | Peter Quill | was raised by | a group of alien thieves and smugglers |
| 4 | Peter Quill | is half-human | half-Celestial |
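The query above can be templated so that it works for any entity name. This is a minimal sketch: the helper name `entity_relationship_query` is hypothetical, and it assumes that backslash-escaping is enough for quotes inside an nGQL string literal, which you should verify against your NebulaGraph version before passing untrusted input.

```python
def entity_relationship_query(name):
    """Build the MATCH statement used above for an arbitrary entity name."""
    # Assumption: backslash-escaping covers quotes inside an nGQL string literal.
    escaped = name.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "MATCH (p:`entity`)-[e:relationship]->(m:`entity`)\n"
        f"  WHERE p.`entity`.`name` == '{escaped}'\n"
        "RETURN p.`entity`.`name`, e.relationship, m.`entity`.`name`;"
    )


print(entity_relationship_query("Peter Quill"))
```

The printed statement can be pasted into a `%%ngql` cell as-is.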


Next, change the query so that its result can be rendered as a graph:

In [None]:
%%ngql
MATCH (p:`entity`)-[e:relationship]->(m:`entity`)
  WHERE p.`entity`.`name` == 'Peter Quill'
RETURN p, e, m;

INFO:nebula3.logger:Get connection to ('127.0.0.1', 9669)


| | p | e | m |
|---|---|---|---|
| 0 | `("Peter Quill" :entity{name: "Peter Quill"})` | `("Peter Quill")-[:relationship@-84437522554765...` | `("May 2021" :entity{name: "May 2021"})` |
| 1 | `("Peter Quill" :entity{name: "Peter Quill"})` | `("Peter Quill")-[:relationship@-11770408155938...` | `("as a child" :entity{name: "as a child"})` |
| 2 | `("Peter Quill" :entity{name: "Peter Quill"})` | `("Peter Quill")-[:relationship@-79394488349732...` | `("Guardians of the Galaxy" :entity{name: "Guar...` |
| 3 | `("Peter Quill" :entity{name: "Peter Quill"})` | `("Peter Quill")-[:relationship@325695233021653...` | `("a group of alien thieves and smugglers" :ent...` |
| 4 | `("Peter Quill" :entity{name: "Peter Quill"})` | `("Peter Quill")-[:relationship@555553046209276...` | `("half-Celestial" :entity{name: "half-Celestia...` |


In [None]:
%ng_draw

nebulagraph_draw.html


The rendered graph makes the result of this knowledge-fetching query easy to read.