New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add graph_stores, impl Simple KG & Nebula KG #2581
Conversation
0e6b231
to
38c32be
Compare
@wey-gu this is cool, thanks for the PR - will try to take some time today to review and offer suggestions on how to rebase |
Hey @wey-gu this is great! As @jerryjliu noted, we made some fairly significant changes to how we handle storage (see https://gpt-index.readthedocs.io/en/latest/how_to/storage.html) |
Some specific notes:
|
Happy to help and answer any questions you might have about 0.6.0 |
Dear @Disiok Got it! Thanks a lot! |
bfb3e79
to
2e1f9af
Compare
Dear @Disiok
wey: Got it, now I don't have to implement that :)
wey: This new abstraction is awesome, is it possible to enable chain-able storage context? The knowledge_graph index(now I added graph_store) comes with embedding support, which is memory based, is it possible to enable this embedding storage inside the knowledge_graph index consuming storage context in the future? Dear @jerryjliu
wey: now the graph_store was introduced towards StorageContext and knowledge_graph_index was adapted to be based on simpleGraphStore or nebulaGraphStore :) What do you think please of this change? Now it's storage context based :) I will be working on more typical stories/demos for documents to help users understand how it works and how it helps
in separate doc PR + blogs after this is merged. Thanks again! I am super excited about this change :D BR//Wey |
Sorry, will fix the lint and UT issue. |
ed9fcc3
to
abd3690
Compare
linting and UT passed now, locally, thanks! ❤ |
hey @wey-gu thanks for the changes - will take an action item to review this :) |
pushed another version to address the conflicts related to changes from GPTKnowledgeGraphIndex to KnowledgeGraphIndex by @Disiok |
Thanks @logan-markewich for helping with the review! Also, another rebase to resolve the lint error.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks super good, and thanks for tackling such a complicated PR!
A few minor nits, but my main worry is backwards compatibility (both with previously saved knowledge graph indexes, as well as previously saved indexes in general)
The storage context code likely needs a bit of TLC to be more robust
Lastly, and maybe not this PR, but some docs on using Nebula would be cool. (I actually still need to try setting it up lol)
examples/composable_indices/city_analysis/PineconeDemo-CityAnalysis.ipynb
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will work on all great review/improvement points from Logan, thanks again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, make a lot of sense, I didn't think of this backward compatibility yet, but I'll try so in this round of push. thanks!
Still need a followup commit to address backwards compatibility of graph_store.json from the previous impl.
I think this is good to ship! Looking forward to your future work @wey-gu ! Knowledge graphs can be very powerful, and hoping llama-index can continue to be a great tool to leverage them |
SIMPLE_KG = "simple_kg" | ||
NEBULAGRAPH = "nebulagraph" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: where are these used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this just follows the same structure for the vector store registry -- is that registry used anywhere?
@@ -60,6 +60,7 @@ def __init__( | |||
self._storage_context = storage_context or StorageContext.from_defaults() | |||
self._docstore = self._storage_context.docstore | |||
self._vector_store = self._storage_context.vector_store | |||
self._graph_store = self._storage_context.graph_store |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the base class should know about graph store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm true, but could say the same about the vector store? We can take a look at both in a future PR
|
||
""" | ||
if persist_dir is None: | ||
docstore = docstore or SimpleDocumentStore() | ||
index_store = index_store or SimpleIndexStore() | ||
vector_store = vector_store or SimpleVectorStore() | ||
graph_store = graph_store or SimpleGraphStore() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just checking: there should be minimal overhead in doing this right?
Otherwise, we should just not construct an object and leave it as None, so it doesn't impact other type of indices that don't use graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the same as a vector_store essentially in terms of impact -> should be near-zero.
Similar to how other indexes don't use the vector store, but we still instantiate it I guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some nitpick comments, but I don't want to block merging this massive PR.
We can do more cleanup after landing as well.
Addressed a majority of your comments @Disiok I think it's good to land for now :) |
@wey-gu this is an amazing change, thanks for the contribution. thanks to @logan-markewich @Disiok for the reviews too. just a heads up, planning to publish this friday morning pacific time - let me know your twitter handle! |
Dear @jerryjliu , My handle is wey_gu :) Thanks! |
Dear @logan-markewich , Many thanks for the great help and guide(and big thanks to @Disiok !!)!! I am preparing for upcoming PRs/DEMOs, let's make LLMs understand more knowledge with graphs! BR//Wey |
f" [rel.`{self._rel_prop_names[0]}`, dst(rel)] " | ||
f"] AS rels " | ||
f"RETURN " | ||
f" subj," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1.fixbug:
need add rels in return stmt like:
f" subj, rels,"
2.question:
when i add entity type in match stmt like:
MATCH (s:entity)
and add limit stmt like:
LIMIT 1000
it also have ValueError : Scan vertices or edges need to specify a limit number, or limit number can not push down.
my envs:
nebula3-python==3.4.0
NebulaGraph version is 3.1.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please upgrade to NebulaGraph 3.5.0, and see what happens, this implementation was expecting NebulaGraph 3.5.0+
Need to load the existing data indexes from nebula graph into the retriever for knowledge graph query_engine. Is that functionality available in llama-index ? |
For text2cypher, yes, it's already implemented! Follow https://gpt-index.readthedocs.io/en/latest/examples/query_engine/knowledge_graph_query_engine.html is all needed. For Graph RAG(find major entities with keyword or embedding from the task, get subgraph as context), not yet fully supported. I am thinking of adding this soon. |
Thanks a lot @wey-gu for the update. However I am facing an issue while running KnowledgeGraphQueryEngine. NebulaGraphStore is being loaded but when executing the KGQueryEngine, was facing a query syntax - |
Dear @pachgadehardik Could you share the NebulaGraph version? If it's an older version of NebulaGraph like 3.1.0, it's highly recommended to be upgraded to NebulaGraph 3.5.0, this is basically just a binary replacement(offline) |
@wey-gu, the current version deployed in kubernetes cluster is 3.4.0 |
I see, the current schema fetching way is not compatible with the cluster that's older than 3.5.0, I'll pr to fix this today! |
It should be fixed in #7204 |
draft for RFC #1318
WIP:
implement load_from_disk and save_to_disknot needed after 0.6.x