
feat: add graph_stores, impl Simple KG & Nebula KG #2581

Merged: 10 commits from the external_kg branch merged into run-llama:main on Jun 7, 2023

Conversation

@wey-gu (Contributor) commented May 6, 2023:

Draft for RFC #1318.

  • add graph_stores
  • implement the Simple KG and NebulaGraph KG graph stores (a usage sketch follows below)

WIP:

  • get comments from Jerry
  • continue to rebase onto the 0.6.0 changes (introduce graph_store into the storage context)
  • implement load_from_disk and save_to_disk (not needed after 0.6.x)
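For orientation, a rough sketch of how the proposed graph_stores abstraction might be used once rebased onto 0.6.x. Class names, import paths, and parameters are assumptions based on this PR's description, not necessarily the merged API.

# Hedged sketch: build a knowledge graph index with a graph store bundled
# into the storage context (names assumed, not the final API).
from llama_index import KnowledgeGraphIndex, SimpleDirectoryReader, StorageContext
from llama_index.graph_stores import SimpleGraphStore

documents = SimpleDirectoryReader("./data").load_data()

# The graph store sits in the storage context alongside the docstore,
# index store, and vector store.
graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)

index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=2,  # assumed parameter name
)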

@wey-gu force-pushed the external_kg branch 3 times, most recently from 0e6b231 to 38c32be, on May 6, 2023 09:51
@wey-gu mentioned this pull request on May 6, 2023
@jerryjliu (Collaborator) commented:

@wey-gu this is cool, thanks for the PR! I'll try to take some time today to review and offer suggestions on how to rebase.

@Disiok (Collaborator) commented May 8, 2023:

Hey @wey-gu this is great! As @jerryjliu noted, we made some fairly significant changes to how we handle storage (see https://gpt-index.readthedocs.io/en/latest/how_to/storage.html)

@Disiok (Collaborator) commented May 8, 2023:

Some specific notes:

  • For stores backed by an external connection (e.g. NebulaGraph here), we no longer try to save the configuration; instead we ask the user to reconstruct the connection (a sketch of this pattern follows after this list).
  • We created a storage context that bundles the docstore, index store, and vector store. The question here is whether we should add another graph store object into it. It would be great if you could take a look at 0.6.0 and let us know your thoughts.
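A minimal sketch of the "reconstruct the connection" pattern described above, assuming a NebulaGraphStore class whose connection details are supplied by user code rather than persisted to disk. Environment variable names, parameters, and values are illustrative assumptions, not the final API.

# Hedged sketch: connection-backed stores are re-created on load instead of
# being restored from persisted configuration.
import os

from llama_index import StorageContext, load_index_from_storage
from llama_index.graph_stores import NebulaGraphStore

# Connection details live in user code / the environment, not in saved config.
os.environ["NEBULA_USER"] = "root"              # assumed env var names
os.environ["NEBULA_PASSWORD"] = "nebula"
os.environ["NEBULA_ADDRESS"] = "127.0.0.1:9669"

graph_store = NebulaGraphStore(
    space_name="llamaindex",                    # assumed space / edge / tag setup
    edge_types=["relationship"],
    rel_prop_names=["relationship"],
    tags=["entity"],
)

# The rest of the storage context can still be loaded from disk.
storage_context = StorageContext.from_defaults(
    persist_dir="./storage", graph_store=graph_store
)
index = load_index_from_storage(storage_context)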

@Disiok (Collaborator) commented May 8, 2023:

Happy to help and answer any questions you might have about 0.6.0

@wey-gu (Contributor, Author) commented May 8, 2023:

> Hey @wey-gu this is great! As @jerryjliu noted, we made some fairly significant changes to how we handle storage (see https://gpt-index.readthedocs.io/en/latest/how_to/storage.html)

Dear @Disiok

Got it! I'll make changes based on the new storage design and ask for further comments.

Thanks a lot!
Cheers// Wey

@jerryjliu (Collaborator) commented:

Awesome! Yeah @wey-gu, adding on to what @Disiok said: a new graph store abstraction would probably be used mostly by specific indices, like our knowledge graph index. It can be an optional part of our StorageContext that's None by default.
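A toy sketch of what that suggestion could look like, for illustration only (the merged PR ends up defaulting graph_store to a SimpleGraphStore rather than None, per the review discussion further down):

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class StorageContextSketch:
    """Toy storage context with an optional graph store."""

    docstore: Any
    index_store: Any
    vector_store: Any
    # None by default; only graph-based indices such as the knowledge graph
    # index would read it.
    graph_store: Optional[Any] = None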

@wey-gu force-pushed the external_kg branch 3 times, most recently from bfb3e79 to 2e1f9af, on May 25, 2023 09:27
@wey-gu (Contributor, Author) commented May 25, 2023:

Dear @Disiok

> Some specific notes:
>
> • for storage based on external connection (e.g. NebulaGraph) here, we no longer try to save the configuration, instead we ask user to reconstruct the connection

wey: Got it, now I don't have to implement that :)

> • we created a storage context that bundles docstore, index store, and vector store. The question here is whether we should add another graph store object into it. Would be great if you could take a look at 0.6.0 and let us know your thoughts.

wey: This new abstraction is awesome. Is it possible to enable a chainable storage context? The knowledge graph index (now with graph_store added) comes with embedding support, which is memory based; could that embedding storage inside the knowledge graph index also consume the storage context in the future?

Dear @jerryjliu

> awesome yeah @wey-gu adding on to what @Disiok said, it's possible a new graph store abstraction would be mostly used for specific indices, like our knowledge graph index. can be an optional part of our StorageContext that's none by default

wey: now graph_store has been introduced into StorageContext, and the knowledge graph index has been adapted to be based on SimpleGraphStore or NebulaGraphStore :)

What do you think of this change? It's now storage-context based :)

I will be working on more typical stories/demos in the docs to help users understand how it works and how it helps:

  • better consuming global/cross-node context
  • consuming an existing knowledge graph in a custom retriever
  • a composable index co-existing with other indexes
  • building a knowledge graph by simply dropping docs in (with the help of LlamaHub loaders for different sources)

in a separate docs PR + blogs after this is merged.

Thanks again! I am super excited about this change :D

BR//Wey

@wey-gu (Contributor, Author) commented May 29, 2023:

Sorry, will fix the lint and UT issue.

@wey-gu force-pushed the external_kg branch 2 times, most recently from ed9fcc3 to abd3690, on May 29, 2023 08:14
@wey-gu (Contributor, Author) commented May 29, 2023:

Linting and UT pass locally now, thanks! ❤
cc @Disiok @jerryjliu

@jerryjliu (Collaborator) commented:

hey @wey-gu thanks for the changes - will take an action item to review this :)

@wey-gu (Contributor, Author) commented Jun 5, 2023:

Pushed another version to address the conflicts from the rename of GPTKnowledgeGraphIndex to KnowledgeGraphIndex by @Disiok.

@logan-markewich self-requested a review on June 5, 2023 21:43
@wey-gu (Contributor, Author) commented Jun 6, 2023:

Thanks @logan-markewich for helping with the review!

Also, another rebase to resolve the lint error.

@logan-markewich (Collaborator) left a review:

This looks super good, and thanks for tackling such a complicated PR!

A few minor nits, but my main worry is backwards compatibility (both with previously saved knowledge graph indexes and with previously saved indexes in general).

The storage context code likely needs a bit of TLC to be more robust.

Lastly, and maybe not in this PR, but some docs on using Nebula would be cool. (I actually still need to try setting it up lol)

Review comments were left on:

  • docs/examples/index_structs/knowledge_graph/example.html
  • llama_index/__init__.py
  • llama_index/data_structs/data_structs.py
  • llama_index/indices/knowledge_graph/base.py
  • llama_index/indices/knowledge_graph/retrievers.py
  • llama_index/storage/storage_context.py
@wey-gu (Contributor, Author) left a review comment:

Will work on all great review/improvement points from Logan, thanks again!

Replies were left on:

  • docs/examples/index_structs/knowledge_graph/example.html
  • llama_index/__init__.py
  • llama_index/data_structs/data_structs.py
  • llama_index/graph_stores/types.py
  • llama_index/indices/knowledge_graph/retrievers.py
@wey-gu (Contributor, Author) replied in a review thread:

Indeed, that makes a lot of sense. I hadn't thought about backwards compatibility yet, but I'll address it in this round of pushes. Thanks!

Still need a follow-up commit to address backwards compatibility of graph_store.json from the previous implementation.
@logan-markewich (Collaborator) commented:

I think this is good to ship! Looking forward to your future work @wey-gu! Knowledge graphs can be very powerful, and I hope llama-index can continue to be a great tool to leverage them.

Comment on lines +86 to +87
SIMPLE_KG = "simple_kg"
NEBULAGRAPH = "nebulagraph"
A Collaborator commented:

nit: where are these used?

A Collaborator replied:

I think this just follows the same structure for the vector store registry -- is that registry used anywhere?
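For illustration, a hedged sketch of the registry pattern being referenced here, mirroring the vector store registry; the mapping name below is an assumption, not the merged code:

from enum import Enum

class GraphStoreType(str, Enum):
    SIMPLE_KG = "simple_kg"
    NEBULAGRAPH = "nebulagraph"

# A mapping like this would let persisted metadata name a store type and
# resolve it back to a class later, even if nothing consumes it yet.
GRAPH_STORE_TYPE_TO_CLASS_NAME = {
    GraphStoreType.SIMPLE_KG: "SimpleGraphStore",
    GraphStoreType.NEBULAGRAPH: "NebulaGraphStore",
}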

@@ -60,6 +60,7 @@ def __init__(
self._storage_context = storage_context or StorageContext.from_defaults()
self._docstore = self._storage_context.docstore
self._vector_store = self._storage_context.vector_store
self._graph_store = self._storage_context.graph_store
A Collaborator commented:

I don't think the base class should know about graph store?

A Collaborator replied:

Mmm, true, but you could say the same about the vector store? We can take a look at both in a future PR.
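A minimal sketch of the alternative raised in this thread, where the base index stays unaware of graph stores and only the knowledge graph index reads one from the storage context (illustrative only, not the merged code):

class BaseIndexSketch:
    """Toy base index: knows the storage context, not the graph store."""

    def __init__(self, storage_context):
        self._storage_context = storage_context
        self._docstore = storage_context.docstore
        self._vector_store = storage_context.vector_store


class KnowledgeGraphIndexSketch(BaseIndexSketch):
    """Only the KG index reaches into the storage context for the graph store."""

    def __init__(self, storage_context):
        super().__init__(storage_context)
        self._graph_store = storage_context.graph_store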


"""
if persist_dir is None:
docstore = docstore or SimpleDocumentStore()
index_store = index_store or SimpleIndexStore()
vector_store = vector_store or SimpleVectorStore()
graph_store = graph_store or SimpleGraphStore()
A Collaborator commented:

Just checking: there should be minimal overhead in doing this, right?

Otherwise, we should just not construct an object and leave it as None, so it doesn't impact other types of indices that don't use graphs.

A Collaborator replied:

It's essentially the same as a vector_store in terms of impact -> it should be near-zero.

Similar to how other indexes don't use the vector store, but we still instantiate it, I guess.
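To make the "near-zero overhead" point concrete, a toy in-memory sketch of what a default simple graph store amounts to (an assumption about its shape, not the actual SimpleGraphStore code): construction allocates little more than an empty mapping.

from collections import defaultdict
from typing import Dict, List

class InMemoryGraphStoreSketch:
    """Toy stand-in: a dict of subject -> list of [relation, object] pairs."""

    def __init__(self) -> None:
        # Constructing the default store is just an empty defaultdict, so
        # indices that never touch the graph store pay essentially nothing.
        self._rel_map: Dict[str, List[List[str]]] = defaultdict(list)

    def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
        self._rel_map[subj].append([rel, obj])

    def get(self, subj: str) -> List[List[str]]:
        return self._rel_map[subj]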

@Disiok (Collaborator) left a review:

Added some nitpick comments, but I don't want to block merging this massive PR.
We can do more cleanup after landing as well.

@logan-markewich (Collaborator) commented Jun 7, 2023:

Addressed a majority of your comments, @Disiok. I think it's good to land for now :)

@logan-markewich merged commit 7b9f6da into run-llama:main on Jun 7, 2023
8 checks passed
@jerryjliu (Collaborator) commented:

@wey-gu this is an amazing change, thanks for the contribution. Thanks to @logan-markewich and @Disiok for the reviews too. Just a heads up: planning to publish this Friday morning Pacific time; let me know your Twitter handle!

@wey-gu (Contributor, Author) commented Jun 8, 2023:

> @wey-gu this is an amazing change, thanks for the contribution. thanks to @logan-markewich @Disiok for the reviews too. just a heads up, planning to publish this friday morning pacific time - let me know your twitter handle!

Dear @jerryjliu,

Thanks so much! I am honored to have the chance to bring something to the LlamaIndex project, and it's been an awesome experience working in the great Llama community with you, @logan-markewich, and @Disiok. I am working on an upcoming PR and a demo project/video on top of this change.

My handle is wey_gu :)

Thanks!

@wey-gu (Contributor, Author) commented Jun 8, 2023:

> I think this is good to ship! Looking forward to your future work @wey-gu! Knowledge graphs can be very powerful, and hoping llama-index can continue to be a great tool to leverage them

Dear @logan-markewich,

Many thanks for the great help and guidance (and big thanks to @Disiok!!). I am preparing the upcoming PRs/demos; let's make LLMs understand more knowledge with graphs!

BR//Wey

f" [rel.`{self._rel_prop_names[0]}`, dst(rel)] "
f"] AS rels "
f"RETURN "
f" subj,"
A Contributor commented:

1. Bug fix: rels needs to be added to the RETURN statement, like:

f"  subj, rels,"

(a sketch of this fix follows after this comment)

2. Question: when I add an entity type to the MATCH statement, like:

MATCH (s:entity)

and add a LIMIT statement, like:

LIMIT 1000

it still raises ValueError: Scan vertices or edges need to specify a limit number, or limit number can not push down.

My env:
nebula3-python==3.4.0
NebulaGraph version 3.1.0
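A hedged sketch of the first point above, showing the suggested fix rather than the merged code: include rels in the query's RETURN clause so the collected relationships are actually returned. rel_prop_name below is a placeholder standing in for self._rel_prop_names[0].

rel_prop_name = "relationship"  # placeholder for self._rel_prop_names[0]

query_fragment = (
    f"  [rel.`{rel_prop_name}`, dst(rel)] "
    f"] AS rels "
    f"RETURN "
    f"  subj, rels,"  # `rels` added to the RETURN list, per the suggestion above
)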

@wey-gu (Contributor, Author) replied:

Could you please upgrade to NebulaGraph 3.5.0 and see what happens? This implementation expects NebulaGraph 3.5.0 or later.

@pachgadehardik commented:

I need to load existing data/indexes from NebulaGraph into the retriever for the knowledge graph query_engine. Is that functionality available in llama-index?

@wey-gu (Contributor, Author) commented Aug 7, 2023:

> Need to load the existing data indexes from nebula graph into the retriever for knowledge graph query_engine. Is that functionality available in llama-index?

@pachgadehardik

For text2cypher, yes, it's already implemented! Following https://gpt-index.readthedocs.io/en/latest/examples/query_engine/knowledge_graph_query_engine.html is all that's needed (see the sketch below).

For Graph RAG (find the major entities with keywords or embeddings from the task, then get a subgraph as context), it's not yet fully supported. I am thinking of adding this soon.
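For reference, a rough usage sketch following the linked notebook; the import paths, parameter names, and space/edge/tag values are taken from that era of llama-index and its example, and may differ in your setup:

from llama_index import StorageContext
from llama_index.graph_stores import NebulaGraphStore
from llama_index.query_engine import KnowledgeGraphQueryEngine

# Reconstruct the NebulaGraph connection (values assumed from the example).
graph_store = NebulaGraphStore(
    space_name="llamaindex",
    edge_types=["relationship"],
    rel_prop_names=["relationship"],
    tags=["entity"],
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# text2cypher: the query engine translates the question into a graph query.
query_engine = KnowledgeGraphQueryEngine(
    storage_context=storage_context,
    verbose=True,
)
response = query_engine.query("Tell me about Peter Quill?")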

@pachgadehardik commented:

> For text2cypher, yes, it's already implemented! Follow https://gpt-index.readthedocs.io/en/latest/examples/query_engine/knowledge_graph_query_engine.html is all needed.
>
> For Graph RAG(find major entities with keyword or embedding from the task, get subgraph as context), not yet fully supported. I am thinking of adding this soon.

Thanks a lot @wey-gu for the update. However, I am facing an issue while running KnowledgeGraphQueryEngine. NebulaGraphStore is being loaded, but when executing the KGQueryEngine I hit a query syntax error:

ValueError: Query failed. Query:
MATCH ()-[e:relationship]->()
WITH e limit 1
MATCH (m)-[:relationship]->(n) WHERE id(m) == src(e) AND id(n) == dst(e)
RETURN "(:" + tags(m)[0] + ")-[:relationship]->(:" + tags(n)[0] + ")" AS rels
, Param: {} Error message: Scan vertices or edges need to specify a limit number, or limit number cannot push down.

@wey-gu (Contributor, Author) commented Aug 8, 2023:

> Thanks a lot @wey-gu for the update. However I am facing an issue while running KnowledgeGraphQueryEngine. [...] Error message: Scan vertices or edges need to specify a limit number, or limit number cannot push down.

Dear @pachgadehardik,

Could you share the NebulaGraph version? If it's an older version like 3.1.0, it's highly recommended to upgrade to NebulaGraph 3.5.0; this is basically just a binary replacement (offline).

@pachgadehardik commented:

> Could you share the NebulaGraph version? If it's an older version like 3.1.0, it's highly recommended to upgrade to NebulaGraph 3.5.0; this is basically just a binary replacement (offline).

@wey-gu, the current version deployed in the Kubernetes cluster is 3.4.0.

@wey-gu (Contributor, Author) commented Aug 9, 2023:

> @wey-gu, the current version deployed in the Kubernetes cluster is 3.4.0.

I see: the current way of fetching the schema is not compatible with clusters older than 3.5.0. I'll open a PR to fix this today! Thanks for letting me know, and sorry about that!

@wey-gu (Contributor, Author) commented Aug 9, 2023:

@pachgadehardik

It should be fixed in #7204
