-
Notifications
You must be signed in to change notification settings - Fork 5.9k
feat: KG RAG query engine, enable Graph RAG on existing KGs #7204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f6a2447
to
ac46ae6
Compare
ac46ae6
to
c011718
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks pretty good so far! Thanks for contributing this! 🙏🏻
Just to confirm how I think this is working
- Take a query, extract out keywords and synonyms of those keywords
- Use those to extract triplets that mention these keywords/synonyms
- Optionally combine those retrieved triplets with a text2cypher query result?
self._max_knowledge_sequence = max_knowledge_sequence | ||
self._verbose = verbose | ||
|
||
def _get_entities(self, query_str: str) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can _get_entities
and _expand_synonyms
be the same function? It seems like the only difference is the mode and prompt, which could be inputs to a unified function?
Would help clean up the code a bit I think!
raise ValueError("Invalid retriever mode.") | ||
|
||
nodes.extend(nodes_keyword) | ||
nodes.extend(nodes_embedding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nodes_keyword
or nodes_embedding
might not be defined here depending on the mode? We might want to just extend directly after calling each retriever
i.e
nodes.extend(self._retrieve_keyword(query_bundle))
response_synthesizer: Optional[BaseSynthesizer] = None, | ||
**kwargs: Any, | ||
): | ||
self._retriever = KnowledgeGraphRAGRetriever( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move this into a from_defaults()
function? Since a majority of the arguments are used only in the retriever, we can make the retriever an argument to the class instead, and handle setting up the retriever in from_defaults()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, is this entire class really needed? Seems like to me only the retriever is needed, and the retriever could be used with the normal RetrieverQueryEngine()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the whole thing of RAG is actually a retriever, will do so with RetrieverQueryEngine!(👍🏻)
Yes, exactly! And for the entity/keyword search of step 1, in the future, we could introduce the embedding-based capability to enable semantic searching, too. For step 2: extract triplets from KG, or we could say it's getting n-depth subgraphs of those entities. And for step 3, the text2cypher could be helpful but incomplete for the retrieval, and vice visa, thus options to combine the two could be one strategy to go with on KG. Also now all comments are addressed! Thanks @logan-markewich !!! For rendered notebook, see here Also, I put a demo video in the notebook, too(not rendered in colab). demo_text2cypher_vs_graph_rag.mp4 |
Now supports all 3.x NebulaGraph
- also split _retrive() from kg query engine(nl2graphquery) - now we could do graph RAG on existing KGs, too
c011718
to
b2a06c3
Compare
@@ -16,11 +16,17 @@ | |||
logger = logging.getLogger(__name__) | |||
|
|||
|
|||
rel_query = Template( | |||
rel_query_0 = Template( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we give these a more descriptive name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do so :)
Thanks for the updates @wey-gu !! I just made a bit of a change to move remove the RAG query engine, and just move the retriever to be used in a I updated the notebooks to confirm they still work as well, but you may need to update the screenshot of the streamline app :) After that, I think it's basically good to merge! |
Now it's elegant :) It should have been settled in pure retrievers! The demo video could be as-is as it was created from the kg_index --> as query engine, but generationally enough to compare the two(text2cyper vs graph rag). |
to address suggestions from logan
Thanks for the amazing work on this @wey-gu 💪💪 |
Thanks a ton @logan-markewich for your amazing work and support 🤩 |
Thanks a ton guys. Really appreciate the updates done in this thread. |
@wey-gu
|
Wow, good to know it helps @BleakStone ! Sorry, we made refactor to not introduce a query engine but only a retriever(as it is only a retriever), please instead follow the final version of the notebook here instead. |
Description
The shape now is RFC @logan-markewich @jerryjliu , in case good to go I'll impl. the async functions, too.
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Suggested Checklist: