[main < T225] Add LLM util #225

katarinasupe · 2023-06-13T17:57:46Z

Description

A module that contains procedures describing graphs in a format best suited for large language models (LLMs).

Example usage:
Get raw graph schema:
CALL llm_util.schema('raw') YIELD schema RETURN schema;
Get prompt-ready graph schema:
CALL llm_util.schema() YIELD schema RETURN schema;
or
CALL llm_util.schema('prompt_ready') YIELD schema RETURN schema;

TODO:

Update MAGE README

Changelog message:
Now you can generate graph schema in a format best suited for large language models (LLMs).

Pull request type

Algorithm/Module

######################################

Reviewer checklist (the reviewer checks this part)

Module/Algorithm

######################################

Josipmrden · 2023-06-14T11:09:09Z

@katarinasupe check https://www.notion.so/memgraph/Workflow-e99f310ca5ec463a95a3b5594c04aac6 for workflow on naming of branches.

In your case [main < T225] Add LLM util

python/llm_util.py

Josipmrden · 2023-06-14T11:43:26Z

Can we get any information by reusing the results from meta_util.schema, or do we need to extract the schema again manually?

katarinasupe · 2023-06-19T07:57:31Z

@Josipmrden I could get some info from meta_util. My first approach was to edit meta_util and see what I could do, but that took more time than implementing a new module. meta_util also counts properties and how many nodes have them, which is overkill for this module. Calling meta_util from this module is, in my opinion, also overkill: to get the properties that exist, the counts would have to be computed too.

katarinasupe · 2023-06-19T11:33:30Z

I updated the module, restructured the code a bit, and applied @Josipmrden's and Brett's suggestions. Here are the screenshots of the new usage.

Here are my comments, which also address Brett's review:

I added raw and prompt-ready as two possible LLM schema options. Using raw, users can get all they need to create their own version of the prompt-ready schema. I hope this answers the question about a versioning mechanism for updates, or did you have something else in mind? We can add arguments for the fixed parts of the output string, but if the user can easily concatenate those strings with the raw output, then I am not sure it makes sense. I do get your point when it comes to testing, though. Maybe some specific parts of the prompt-ready string should be set as arguments?
The id property came from the dataset; hence it is something user-defined, not an internal id. Do you think we should exclude any id property from all datasets?
I changed the usage. Let me know if it's better:
Get raw graph schema:
CALL llm_util.schema('raw') YIELD schema RETURN schema;
Get prompt-ready graph schema:
CALL llm_util.schema() YIELD schema RETURN schema;
or
CALL llm_util.schema('prompt_ready') YIELD schema RETURN schema;
I will have to test with multiple labels. For now, if a node has multiple labels, for example Person and Student, its label is :Person:Student, so the schema contains a label :Person:Student. If any other node in the dataset has only the :Person label, that kind of node is listed as well. It will be something like this:
Node name: Person:Student
Node name: Person
I need to check the exact output, but let me know what your suggestions are for multiple labels. For relationships, we would have (:Person)-[:FRIENDS_WITH]->(:Person:Student) and (:Person)-[:FRIENDS_WITH]->(:Person) if there is at least one relationship of each kind.
Nodes can have multiple labels, and relationships can have only one type.
I changed type to rel in code. I agree it can be confusing.
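
The multi-label behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the code in python/llm_util.py: the helper names `label_key` and `relationship_patterns` are hypothetical, the edges are plain tuples rather than graph objects, and sorting the labels before joining them is an assumption made here so that the same label set always maps to the same name.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[List[str], str, List[str]]  # (start labels, relationship type, end labels)


def label_key(labels: Iterable[str]) -> str:
    # Assumption: labels are sorted before joining, so that
    # ["Student", "Person"] and ["Person", "Student"] give the same name.
    return ":".join(sorted(labels))


def relationship_patterns(edges: Iterable[Edge]) -> List[str]:
    # Collect each distinct (start)-[type]->(end) pattern once,
    # in the order it is first seen.
    seen = set()
    patterns = []
    for start_labels, rel_type, end_labels in edges:
        pattern = f"(:{label_key(start_labels)})-[:{rel_type}]->(:{label_key(end_labels)})"
        if pattern not in seen:
            seen.add(pattern)
            patterns.append(pattern)
    return patterns


edges = [
    (["Person"], "FRIENDS_WITH", ["Person", "Student"]),
    (["Person"], "FRIENDS_WITH", ["Person"]),
    (["Person"], "FRIENDS_WITH", ["Student", "Person"]),  # same label set, different order
]
patterns = relationship_patterns(edges)
for pattern in patterns:
    print(pattern)
```

With these three sample edges only two patterns are printed, because the first and third edge end in the same label set.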

katarinasupe · 2023-06-19T12:06:55Z

I tested it on Europe gas pipelines dataset from Memgraph Lab (which does not have a pretty schema).

Here is the output:

Node properties are the following:
Node name: 'NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}]
Node name: 'BORDER_POINT:INTER_CONNECTION_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}]
Node name: 'NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'int'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'int'}]
Node name: 'COMPRESSOR:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'NODE:PRODUCTION', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_production_M_m3_per_d', 'type': 'float'}]
Node name: 'ENTRY_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}]
Node name: 'BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'int'}]
Node name: 'BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'int'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'int'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'COMPRESSOR:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'COMPRESSOR:LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'float'}]
Node name: 'BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}]
Node name: 'BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}]
Node name: 'BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'int'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'int'}]
Node name: 'LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}]
Node name: 'ENTRY_POINT:NODE:STORAGE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_cap_pipe2store_M_m3_per_d', 'type': 'float'}, {'property': 'max_power_MW', 'type': 'float'}, {'property': 'min_storage_pressure_bar', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}, {'property': 'max_storage_pressure_bar', 'type': 'float'}]
Node name: 'ENTRY_POINT:LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}]
Node name: 'BORDER_POINT:INTER_CONNECTION_POINT:LNG:NODE', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'country_code', 'type': 'str'}, {'property': 'node_id', 'type': 'str'}, {'property': 'lat', 'type': 'float'}, {'property': 'lng', 'type': 'float'}, {'property': 'elevation_m', 'type': 'int'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'pipe_name', 'type': 'str'}, {'property': 'max_cap_store2pipe_M_m3_per_d', 'type': 'float'}, {'property': 'max_workingGas_M_m3', 'type': 'float'}]

Relationship properties are the following:
Relationship Name: 'PIPE', Relationship Properties: [{'property': 'name', 'type': 'str'}, {'property': 'operator_name', 'type': 'str'}, {'property': 'start_year', 'type': 'int'}, {'property': 'end_year', 'type': 'int'}, {'property': 'max_cap_M_m3_per_d', 'type': 'float'}, {'property': 'max_pressure_bar', 'type': 'int'}, {'property': 'pipe_id', 'type': 'str'}, {'property': 'diameter_mm', 'type': 'int'}, {'property': 'length_km', 'type': 'float'}, {'property': 'num_compressor', 'type': 'int'}]

The relationships are the following:
['(:NODE)-[:PIPE]->(:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:NODE:PRODUCTION)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:NODE:PRODUCTION)']
['(:NODE:PRODUCTION)-[:PIPE]->(:NODE)']
['(:NODE)-[:PIPE]->(:ENTRY_POINT:NODE)']
['(:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:ENTRY_POINT:NODE)-[:PIPE]->(:NODE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:LNG:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:ENTRY_POINT:LNG:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:LNG:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:ENTRY_POINT:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:COMPRESSOR:LNG:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:NODE:STORAGE)']
['(:COMPRESSOR:LNG:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:ENTRY_POINT:NODE:STORAGE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:COMPRESSOR:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:NODE:STORAGE)']
['(:COMPRESSOR:NODE)-[:PIPE]->(:ENTRY_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:LNG:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE)-[:PIPE]->(:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE)-[:PIPE]->(:ENTRY_POINT:NODE:STORAGE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:LNG:NODE)-[:PIPE]->(:NODE)']
['(:COMPRESSOR:LNG:NODE)-[:PIPE]->(:NODE)']
['(:LNG:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)']
['(:LNG:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:LNG:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:LNG:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)-[:PIPE]->(:NODE:STORAGE)']
['(:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:NODE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE)']
['(:ENTRY_POINT:NODE:STORAGE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:ENTRY_POINT:NODE)-[:PIPE]->(:NODE:STORAGE)']
['(:ENTRY_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:ENTRY_POINT:NODE:STORAGE)']
['(:BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:COMPRESSOR:NODE:STORAGE)']
['(:NODE:STORAGE)-[:PIPE]->(:LNG:NODE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE)']
['(:LNG:NODE)-[:PIPE]->(:LNG:NODE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:LNG:NODE)-[:PIPE]->(:COMPRESSOR:NODE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:NODE)-[:PIPE]->(:BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE)']
['(:BORDER_POINT:INTER_CONNECTION_POINT:NODE)-[:PIPE]->(:LNG:NODE)']
['(:NODE)-[:PIPE]->(:LNG:NODE)']

Josipmrden

Small comments; ping me again for approval.

python/llm_util.py

brettdbrewer · 2023-06-19T23:19:24Z

Yes, I think having the 'raw' output can address my versioning question. A developer can always fall back to it and parse it how they want. And that is certainly easier than maintaining multiple 'prompt-ready' versions. My concern was simply around how fluid this space is, so making sure any API can adapt with that change.
Re id properties: If they are user-defined, then I assume you'd want to retain them in this schema, as a user's natural language query may reference IDs specifically in that use case.
I like using the function param to determine raw or prompt-ready.
Regarding nodes having multiple labels: I'd want to test some variations to see which performs better on complex queries. I assume you'd want the Node Name to be as representative as possible of how it would be used in a Cypher query. Btw, it appears that sometimes the parent is first and sometimes it is second, e.g. NODE:STORAGE, COMPRESSOR:NODE. Am I just reading it wrong?
Regarding nodes with multiple labels in relationship strings: yes, I think we'll need to be explicit in having each of the permutations (because you don't know how a user will query it). I assume each would be its own row in the Relationships section of the prompt-ready string.
Because this is such a new area, I highly recommend having a set of unit tests that test validity of LLM output across multiple LLMs and multiple databases for the prompt-ready string format. It will also be highly informative imho as I have constantly found edge cases and extreme quality diffs between LLMs and their versions.

python/llm_util.py

antoniofilipovic

Looks good, a few small changes from me.

katarinasupe · 2023-06-21T09:08:34Z

Thank you @brettdbrewer for the comments, I will check the ones on Discord too and copy anything useful to reply here.

There is no parent connection between labels on nodes; again, it is just a user-defined thing. They defined the label NODE:STORAGE or COMPRESSOR:NODE and that is how it was stored. This is a user's error, because it would make more sense to have the labels logically in the right order.
I will definitely add tests and will ask you more about that if needed, since I am not that familiar with different LLMs.
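
A quick way to look at the label-order question is to check the node names from the gas pipelines output pasted earlier in this thread. The sketch below only inspects those 19 names; the helper `labels_sorted` is hypothetical. It turns out that in every name the labels appear in alphabetical order, which is why NODE comes first in NODE:STORAGE and second in COMPRESSOR:NODE. This is only an observation about that output; it does not show whether the order comes from the dataset or from how the names are built.

```python
node_names = [
    "NODE",
    "BORDER_POINT:INTER_CONNECTION_POINT:NODE",
    "NODE:STORAGE",
    "COMPRESSOR:NODE",
    "NODE:PRODUCTION",
    "ENTRY_POINT:NODE",
    "BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE",
    "BORDER_POINT:COMPRESSOR:INTER_CONNECTION_POINT:NODE:STORAGE",
    "BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE",
    "COMPRESSOR:NODE:STORAGE",
    "BORDER_POINT:COMPRESSOR:ENTRY_POINT:INTER_CONNECTION_POINT:NODE:STORAGE",
    "COMPRESSOR:LNG:NODE",
    "BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:NODE",
    "BORDER_POINT:ENTRY_POINT:INTER_CONNECTION_POINT:LNG:NODE",
    "BORDER_POINT:INTER_CONNECTION_POINT:NODE:STORAGE",
    "LNG:NODE",
    "ENTRY_POINT:NODE:STORAGE",
    "ENTRY_POINT:LNG:NODE",
    "BORDER_POINT:INTER_CONNECTION_POINT:LNG:NODE",
]


def labels_sorted(name: str) -> bool:
    # True when the colon-separated labels are already in alphabetical order.
    labels = name.split(":")
    return labels == sorted(labels)


unsorted_names = [name for name in node_names if not labels_sorted(name)]
print(f"{len(node_names)} names checked, {len(unsorted_names)} not in alphabetical order")
```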

katarinasupe · 2023-06-21T09:27:56Z

@brettdbrewer raised concerns regarding the capitalization of properties. Once we discuss this, tolower() might be added to all string properties.

brettdbrewer · 2023-06-21T16:30:07Z

I don't think tolower() will be needed as the issue is with property values, not property/node/relationship names. Said another way, the issue isn't with the schema definition, it is in the Cypher query generated by the LLM. E.g. for the query "Who was killed by magic?" in the GoT dataset, the LLM would need to create the following Cypher query "MATCH (killer:Character)-[k:KILLED]->(victim:Character) WHERE toLower(k.method) = 'magic' RETURN victim.name" to get the right answer since "Magic" is capitalized in the dataset for that relationship property. I think we can divorce this issue from the schema definition and address it as a solution problem any developer will have to (potentially) solve in their solution.
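
To make the point above concrete on the application side, here is a small sketch of how a developer might build the case-insensitive filter themselves instead of relying on the LLM to remember it. The helper `case_insensitive_filter` and the `$value` parameter name are hypothetical; the node labels, relationship type and property name are taken from the GoT query quoted above, and `toLower` is the Cypher function used there.

```python
def case_insensitive_filter(variable: str, prop: str, param: str = "value") -> str:
    # Lowercase both sides so that 'Magic' in the data matches 'magic' in the question.
    return f"toLower({variable}.{prop}) = toLower(${param})"


query = (
    "MATCH (killer:Character)-[k:KILLED]->(victim:Character) "
    f"WHERE {case_insensitive_filter('k', 'method')} "
    "RETURN victim.name"
)
params = {"value": "magic"}

print(query)
print(params)
```

Passing the value as a query parameter keeps the user's text out of the query string, and the schema definition itself stays untouched.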

antoniofilipovic

LGTG

katarinasupe · 2023-06-27T12:42:38Z

I added a 'fix' for multiple labels according to @brettdbrewer's suggestion. Here is what it looks like for the above example now:

Besides that, I restructured the code a bit to follow the MAGE codebase and improved the docstring according to the documentation. I will update the documentation based on these changes.

Josipmrden

One small comment

e2e/llm_util_test/test_schema_prompt_ready_with_relationship_props/test.yml

e2e/llm_util_test/test_schema_prompt_ready_without_relationship_props/test.yml

Add llm util v1

583ed5f

katarinasupe added the status: draft label Jun 13, 2023

katarinasupe self-assigned this Jun 13, 2023

Josipmrden requested changes Jun 14, 2023

katarinasupe changed the title from "Add llm util" to "[main < T225] Add LLM util" Jun 19, 2023

Update llm util

0c153a5

katarinasupe added the status: ready label and removed the status: draft label Jun 19, 2023

katarinasupe requested a review from Josipmrden June 19, 2023 11:34

katarinasupe added the lang: python and type: module labels Jun 19, 2023

katarinasupe marked this pull request as ready for review June 19, 2023 12:09

Josipmrden reviewed Jun 19, 2023

Mrma's review

0c59c0c

katarinasupe requested a review from Josipmrden June 19, 2023 14:55

antoniofilipovic reviewed Jun 20, 2023

antoniofilipovic requested changes Jun 20, 2023

Fico's review; add tests

722e932

katarinasupe added this to the 1.8.0 milestone Jun 26, 2023

antoniofilipovic approved these changes Jun 26, 2023

antoniofilipovic added the In progress label and removed the status: ready label Jun 26, 2023

katarinasupe mentioned this pull request Jun 26, 2023

Add LLM util docs memgraph/docs#939

Merged

6 tasks

Add update for multiple labels + fix tests

3cc1d9c

katarinasupe added 3 commits June 27, 2023 14:50

Merge branch 'main' into add-llm-util

a1e00ca

Update README

ed933f1

Update readme

7d195ed

Josipmrden reviewed Jun 28, 2023

Remove extra empty lines in tests

e84e37d

Josipmrden approved these changes Jun 29, 2023

antoniofilipovic merged commit 2fe7896 into main Jun 29, 2023
4 checks passed

antoniofilipovic deleted the add-llm-util branch June 29, 2023 14:51
