New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[main < T225] Add LLM util #225
Conversation
@katarinasupe check https://www.notion.so/memgraph/Workflow-e99f310ca5ec463a95a3b5594c04aac6 for workflow on naming of branches. In your case |
can we get any information by reusing the results from meta_util.schema? or we need to do the process of extracting the schema again manually? |
@Josipmrden I could get some info from meta util. My first approach was to edit meta_util and see what I can do, but that took more time than implementing a new module. Meta_util is also counting properties and how many nodes have those properties and that is an overkill for this module. To call meta_util from this module is, in my opinion, also an overkill. To get properties that exist, counts need to be done too. |
I updated the module, restructured a code a bit and applied @Josipmrden and Brett's suggestions. Here are the screenshots of new usage. Here are my comments and I also refer to Brett's review here:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments, call back for approve
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, few small changes from me
Thank you @brettdbrewer for the comments, I will check the ones on Discord too and copy anything useful to reply here.
|
@brettdbrewer raised concerns regarding the capitalization of properties. Once we discuss this, |
I don't think tolower() will be needed as the issue is with property values, not property/node/relationship names. Said another way, the issue isn't with the schema definition, it is in the Cypher query generated by the LLM. E.g. for the query "Who was killed by magic?" in the GoT dataset, the LLM would need to create the following Cypher query "MATCH (killer:Character)-[k:KILLED]->(victim:Character) WHERE toLower(k.method) = 'magic' RETURN victim.name" to get the right answer since "Magic" is capitalized in the dataset for that relationship property. I think we can divorce this issue from the schema definition and address it as a solution problem any developer will have to (potentially) solve in their solution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTG
I added a 'fix' for multiple labels according to @brettdbrewer's suggestion. Here is what it looks like for the above example now: Besides that, I restructured the code a bit to follow the MAGE codebase and improved the docstring according to the documentation. I will update the documentation based on these changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small comment
Description
A module that contains procedures describing graphs in a format best suited for large language models (LLMs).
Example usage:
Get raw graph schema:
CALL llm_util.schema('raw') YIELD schema RETURN schema;
Get prompt-ready graph schema:
CALL llm_util.schema() YIELD schema RETURN schema;
or
CALL llm_util.schema('prompt_ready') YIELD schema RETURN schema;
TODO:
Changelog message:
Now you can generate graph schema in a format best suited for large language models (LLMs).
Pull request type
######################################
Reviewer checklist (the reviewer checks this part)
Module/Algorithm
######################################