[main < T225] Add LLM util #225

Merged
merged 9 commits into from Jun 29, 2023

Conversation

katarinasupe
Contributor

@katarinasupe katarinasupe commented Jun 13, 2023

Description

A module that contains procedures describing graphs in a format best suited for large language models (LLMs).

Example usage:
Get raw graph schema:
CALL llm_util.schema('raw') YIELD schema RETURN schema;
Get prompt-ready graph schema:
CALL llm_util.schema() YIELD schema RETURN schema;
or
CALL llm_util.schema('prompt_ready') YIELD schema RETURN schema;

TODO:

  • Update MAGE README

Changelog message:
Now you can generate graph schema in a format best suited for large language models (LLMs).

Pull request type

  • Algorithm/Module

######################################

Reviewer checklist (the reviewer checks this part)

Module/Algorithm

  • Core algorithm/module implementation
  • Query module implementation
  • Unit tests
  • End-to-end tests
  • Code documentation
  • README short description
  • Documentation on memgraph/docs

######################################

@katarinasupe katarinasupe added the status: draft PR is in draft phase label Jun 13, 2023
@katarinasupe katarinasupe self-assigned this Jun 13, 2023
@Josipmrden
Collaborator

@katarinasupe check https://www.notion.so/memgraph/Workflow-e99f310ca5ec463a95a3b5594c04aac6 for workflow on naming of branches.

In your case [main < T225] Add LLM util

@Josipmrden
Collaborator

Can we get any information by reusing the results from meta_util.schema, or do we need to extract the schema again manually?

@katarinasupe
Contributor Author

@Josipmrden I could get some info from meta_util. My first approach was to edit meta_util and see what I could do, but that took more time than implementing a new module. meta_util also counts properties and how many nodes have those properties, which is overkill for this module. Calling meta_util from this module is, in my opinion, also overkill: to get the properties that exist, the counts would need to be computed too.

@katarinasupe katarinasupe changed the title Add llm util [main < T225] Add LLM util Jun 19, 2023
@katarinasupe
Contributor Author

katarinasupe commented Jun 19, 2023

I updated the module, restructured the code a bit, and applied @Josipmrden's and Brett's suggestions. Here are the screenshots of the new usage.
Screenshot 2023-06-19 at 13 22 09
Screenshot 2023-06-19 at 13 22 24
Screenshot 2023-06-19 at 13 22 40

Here are my comments, which also address Brett's review:

  • I added raw and prompt-ready as the two possible LLM schema options. With raw, users get everything they need to build their own version of the prompt-ready schema. I hope this addresses the versioning mechanism for updates, or did you have something else in mind? We could add arguments for supplying parts of the output string, but if users can easily concatenate those strings onto the raw output, I am not sure it makes sense. I do get your point when it comes to testing, though. Maybe some specific parts of the prompt-ready string should be set as arguments?
  • The id property came from the dataset; hence it is something user-defined, not an internal id. Do you think we should exclude any id property from all datasets?
  • I changed the usage. Let me know if it's better:
    Get raw graph schema:
    CALL llm_util.schema('raw') YIELD schema RETURN schema;
    Get prompt-ready graph schema:
    CALL llm_util.schema() YIELD schema RETURN schema;
    or
    CALL llm_util.schema('prompt_ready') YIELD schema RETURN schema;
  • I will have to test with multiple labels. For now, if a node has multiple labels, for example Person and Student, its label is :Person:Student, so the dataset contains a :Person:Student label. If any other node in the dataset has only the :Person label, that kind of node will be listed too. It will look something like this:
    Node name: Person:Student
    Node name: Person
    I need to check the exact output, but let me know what your suggestions are for multiple labels. For relationships, we would have (:Person)-[:FRIENDS_WITH]->(:Person:Student) and (:Person)-[:FRIENDS_WITH]->(:Person) if there is at least one relationship of each kind.
  • Nodes can have multiple labels, and relationships can have only one type.
  • I changed type to rel in code. I agree it can be confusing.
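
The multi-label behaviour described above can be illustrated with a minimal sketch. It is not taken from llm_util.py: the helper names `node_name` and `distinct_node_names` are hypothetical, and sorting the labels is an assumption made here only so that the same label set always yields the same name.

```python
from typing import Iterable, List


def node_name(labels: Iterable[str]) -> str:
    """Join a node's labels into one name, e.g. Person + Student -> 'Person:Student'."""
    # Assumption of this sketch: labels are sorted so the name does not
    # depend on the order in which the labels were read.
    return ":".join(sorted(labels))


def distinct_node_names(nodes: Iterable[Iterable[str]]) -> List[str]:
    """Return each distinct label combination once, in first-seen order."""
    seen: List[str] = []
    for labels in nodes:
        name = node_name(labels)
        if name not in seen:
            seen.append(name)
    return seen


# Two nodes labelled Person and Student, one node labelled only Person.
nodes = [["Person", "Student"], ["Person"], ["Student", "Person"]]
for name in distinct_node_names(nodes):
    print(f"Node name: {name}")
```

With these inputs the sketch lists Person:Student and Person as two separate node names, which matches the behaviour described in the comment.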

@katarinasupe katarinasupe added status: ready PR is ready for review and removed status: draft PR is in draft phase labels Jun 19, 2023
@katarinasupe katarinasupe added lang: python Issue on Python codebase type: module labels Jun 19, 2023
@katarinasupe
Contributor Author

I tested it on the Europe gas pipelines dataset from Memgraph Lab (which does not have a pretty schema).
europe-gas-pipelines-scigrid-model

Here is the output:

Node properties are the following:
Node name: 'NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}]
Node name: 'BORDER_POINT:INTER_CONNECTION_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}]
Node name: 'NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'int'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'int'}]
Node name: 'COMPRESSOR:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'NODE:PRODUCTION', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_production_M_m3_per_d', 'type': 'float'}]
Node name: 'ENTRY_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}]
Node name: 'BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'int'}]
Node name: 'BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'int'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'int'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'COMPRESSOR:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'COMPRESSOR:LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}]
Node name: 'BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}]
Node name: 'BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'int'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'int'}]
Node name: 'LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}]
Node name: 'ENTRY_POINT:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'float'}]
Node name: 'ENTRY_POINT:LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}]
Node name: 'BORDER_POINT:INTER_CONNECTION_POINT:LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}]

Relationship properties are the following:
Relationship Name: 'PIPE', Relationship Properties: [{'property': 'name', 'type': 'str'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'int'}, {'property': 'pipe_id', 'type': 'str'}, {'property': 'diameter_mm', 'type': 'int'}, {'property': 'length_km', 'type': 'float'}, {'property': 'num_compressor', 'type': 'int'}]

The relationships are the following:
['(:NODE)-[:PIPE]->(:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:NODE:PRODUCTION)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:NODE:PRODUCTION)']
['(:NODE:PRODUCTION)-[:PIPE]->(:NODE)']
['(:NODE)-[:PIPE]->(:ENTRY_POINT:NODE)']
['(:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:ENTRY_POINT:NODE)-[:PIPE]->(:NODE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:LNG:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:ENTRY_POINT:LNG:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:LNG:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:ENTRY_POINT:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:COMPRESSOR:LNG:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:NODE:STORAGE)']
['(:COMPRESSOR:LNG:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:ENTRY_POINT:NODE:STORAGE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:NODE:STORAGE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:ENTRY_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:LNG:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE)-[:PIPE]->(:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE)-[:PIPE]->(:ENTRY_POINT:NODE:STORAGE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:LNG:NODE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:LNG:NODE)-[:PIPE]->(:NODE)']
['(:LNG:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:LNG:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:LNG:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:LNG:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:NODE:STORAGE)']
['(:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:ENTRY_POINT:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:ENTRY_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:ENTRY_POINT:NODE:STORAGE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:NODE:STORAGE)-[:PIPE]->(:LNG:NODE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:LNG:NODE)-[:PIPE]->(:LNG:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:LNG:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:LNG:NODE)']
['(:NODE)-[:PIPE]->(:LNG:NODE)']
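
For readers who want to reproduce the shape of the output above, here is a rough sketch of how such lines could be assembled. It is an illustration only, not the code in llm_util.py: the function names are hypothetical, the property values are made up, and deriving the type from the Python type name of a single value is an assumption that merely happens to produce the 'str', 'float' and 'int' names seen in the output.

```python
from typing import Any, Dict, List


def describe_properties(properties: Dict[str, Any]) -> List[Dict[str, str]]:
    """List each property together with the Python type name of its value."""
    return [
        {"property": key, "type": type(value).__name__}
        for key, value in properties.items()
    ]


def node_line(name: str, properties: Dict[str, Any]) -> str:
    """Format one 'Node name: ...' line in the style of the output above."""
    return f"Node name: '{name}', Node properties: {describe_properties(properties)}"


def relationship_line(start: str, rel_type: str, end: str) -> str:
    """Format one relationship pattern, wrapped in a one-element list as above."""
    return str([f"(:{start})-[:{rel_type}]->(:{end})"])


# Made-up property values; only the key names appear in the real output.
print(node_line("NODE", {"name": "example", "lat": 45.8, "elevation_m": 120}))
print(relationship_line("NODE:STORAGE", "PIPE", "NODE"))
```

Because the type is read from a single value in this sketch, the same property could be reported as 'int' for one label combination and 'float' for another, depending on which value is inspected.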

@katarinasupe katarinasupe marked this pull request as ready for review June 19, 2023 12:09
Collaborator

@Josipmrden Josipmrden left a comment

Small comments; ping me again for approval.

@brettdbrewer

  • Yes, I think having the 'raw' output can address my versioning question. A developer can always fall back to it and parse it how they want, and that is certainly easier than maintaining multiple 'prompt-ready' versions. My concern was simply around how fluid this space is, so I wanted to make sure any API can adapt to that change.
  • Re id properties: if they are user-defined, then I assume you'd want to retain them in this schema, as a user's natural language query may reference ids specifically in that use case.
  • I like using the function param to determine raw or prompt-ready.
  • Regarding nodes having multiple labels: I'd want to test some variations to see which performs better on complex queries. I assume you'd want the Node Name to be as representative as possible of how it would be used in a Cypher query. Btw, it appears that sometimes the parent is first and sometimes it is second, e.g. NODE:STORAGE, COMPRESSOR:NODE. Am I just reading it wrong?
  • Regarding nodes with multiple labels in relationship strings: yes, I think we'll need to be explicit in having each of the permutations (because you don't know how a user will query it). I assume each would be its own row in the Relationships section of the prompt-ready string.
  • Because this is such a new area, I highly recommend having a set of unit tests that test the validity of LLM output across multiple LLMs and multiple databases for the prompt-ready string format. It will also be highly informative imho, as I have constantly found edge cases and extreme quality diffs between LLMs and their versions.
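
One possible reading of "each of the permutations" is sketched below: a relationship between multi-label nodes is expanded into one row per pair of single labels. This is an assumption about what was meant, not the merged implementation, and `expand_relationship` is a hypothetical helper name.

```python
from itertools import product
from typing import List


def expand_relationship(
    start_labels: List[str], rel_type: str, end_labels: List[str]
) -> List[str]:
    """Return one relationship row per (start label, end label) pair."""
    return [
        f"(:{start})-[:{rel_type}]->(:{end})"
        for start, end in product(start_labels, end_labels)
    ]


# A Person:Student node that is FRIENDS_WITH a Person node.
for row in expand_relationship(["Person", "Student"], "FRIENDS_WITH", ["Person"]):
    print(row)
```

Under this reading, a user who queries by :Person or by :Student would find a matching row either way, at the cost of more rows in the Relationships section.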

Collaborator

@antoniofilipovic antoniofilipovic left a comment

Looks good, a few small changes from me

@katarinasupe
Contributor Author

Thank you @brettdbrewer for the comments, I will check the ones on Discord too and copy anything useful to reply here.

  • There is no parent relationship between the labels on a node; again, it is just a user-defined thing. They defined the labels NODE:STORAGE or COMPRESSOR:NODE, and that is how they were stored. This is the user's error, because it would make more sense to keep the labels in a logical order.
  • I will definitely add tests and will ask you more about that if needed, since I am not that familiar with different LLMs.

@katarinasupe
Contributor Author

@brettdbrewer raised concerns regarding the capitalization of properties. Once we discuss this, tolower() might be added to all string properties.

@brettdbrewer

I don't think tolower() will be needed as the issue is with property values, not property/node/relationship names. Said another way, the issue isn't with the schema definition, it is in the Cypher query generated by the LLM. E.g. for the query "Who was killed by magic?" in the GoT dataset, the LLM would need to create the following Cypher query "MATCH (killer:Character)-[k:KILLED]->(victim:Character) WHERE toLower(k.method) = 'magic' RETURN victim.name" to get the right answer since "Magic" is capitalized in the dataset for that relationship property. I think we can divorce this issue from the schema definition and address it as a solution problem any developer will have to (potentially) solve in their solution.
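
The point about property values can be shown with a small in-memory sketch that mirrors what toLower() does in the Cypher query above. The rows and the helper name `victims_by_method` are made up for illustration and are not taken from the GoT dataset.

```python
from typing import Dict, List

# Made-up rows standing in for KILLED relationships; note the capitalized value.
KILLS: List[Dict[str, str]] = [
    {"victim": "victim_a", "method": "Magic"},
    {"victim": "victim_b", "method": "Sword"},
]


def victims_by_method(
    rows: List[Dict[str, str]], method: str, ignore_case: bool
) -> List[str]:
    """Return victims whose kill method matches, optionally ignoring case."""
    if ignore_case:
        # Same idea as WHERE toLower(k.method) = 'magic' in the Cypher query.
        return [r["victim"] for r in rows if r["method"].lower() == method.lower()]
    return [r["victim"] for r in rows if r["method"] == method]


print(victims_by_method(KILLS, "magic", ignore_case=False))
print(victims_by_method(KILLS, "magic", ignore_case=True))
```

The exact comparison finds nothing because the stored value is "Magic", while the case-insensitive comparison finds the match. The schema itself is the same in both cases, which is why this can be handled in the generated query rather than in the schema definition.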

@katarinasupe katarinasupe added this to the 1.8.0 milestone Jun 26, 2023
Collaborator

@antoniofilipovic antoniofilipovic left a comment

LGTG

@antoniofilipovic antoniofilipovic added In progress and removed status: ready PR is ready for review labels Jun 26, 2023
@katarinasupe katarinasupe mentioned this pull request Jun 26, 2023
6 tasks
@katarinasupe
Contributor Author

I added a 'fix' for multiple labels according to @brettdbrewer's suggestion. Here is what it looks like for the above example now:
Screenshot 2023-06-27 at 13 59 56
Screenshot 2023-06-27 at 14 00 10
Screenshot 2023-06-27 at 14 00 16

Besides that, I restructured the code a bit to follow the MAGE codebase and improved the docstring according to the documentation. I will update the documentation based on these changes.

Collaborator

@Josipmrden Josipmrden left a comment

One small comment

@antoniofilipovic antoniofilipovic merged commit 2fe7896 into main Jun 29, 2023
4 checks passed
@antoniofilipovic antoniofilipovic deleted the add-llm-util branch June 29, 2023 14:51
4 participants