streamNodeProperty() doesn't work with gds.run_cypher() as I guess #179

MOSSupport · 2022-09-16T05:31:15Z

graphdatascience 1.3

I tried a query like this:

query = f'''
   call gds.graph.streamNodeProperty(
      'xxx',
      'xxxx',
      ['xxxxx']
   )
yield nodeId as id, propertyValue as degree
return id, degree limit 100
'''
result = gds.run_cypher(query)

=> KeyError: 'graph_name'

I figured out to make it work like this:

query = f'''
   ...
'''
params = {
   'graph_name': 'xxx',
   'properties': 'xxxx',
   'entities': ['xxxxx'],
   'config': ''
}
result = gds.run_cypher(query, params)

=> No error, but it returned all rows (not limited to 100), and the columns were still named nodeId and propertyValue (not renamed to id and degree).

Other Cypher queries work with gds.run_cypher(query) as expected.


FlorentinD · 2022-09-16T09:27:02Z

Hello @MOSSupport ,
I am trying to reproduce your error.

To clarify, are you trying to only return the rows with a degree >= 100 or only show the first 100 rows?

MOSSupport · 2022-09-17T06:53:45Z

Hi,
I was only trying to reduce the number of rows in the result for testing purposes.
Sorry, I cannot copy the full error messages or the code, since it was tested in a closed environment (for security reasons).

Thanks.

FlorentinD · 2022-09-17T07:02:48Z

For further help, we need the GDS and Neo4j version you were running the queries against.

When I tried it in my test environment, I could get the expected result.

MOSSupport · 2022-09-17T08:18:16Z

Neo4j 4.4.8 Enterprise image (Debian 11, OpenJDK 11.0.15) from Docker Hub, running on a RHEL 7.9 machine.
GDS 2.1.9

FlorentinD · 2022-09-20T08:54:37Z

Hello,
I used the same versions as you, but I still see the rename of the property as well as the limit to only 100 rows.
Could you try running a similar query with our example notebook (https://github.com/neo4j/graph-data-science-client/blob/main/examples/load-data-via-graph-construction.ipynb)?

The query I used based on your description was
gds.run_cypher("CALL gds.graph.streamNodeProperty($graph_name, $property, $nodeLabels) YIELD nodeId AS id, propertyValue AS degree RETURN id, degree LIMIT 10", {"graph_name": G.name(), "property": "subject", "nodeLabels": ["Paper"]})

MOSSupport · 2022-09-20T12:26:57Z

I tried the code you posted above, but it still returned the same KeyError I described earlier.
I also found one more thing:
When I use "result = gds.graph.streamNodeProperty(G, 'xxx', ['xxxx'])", I get a different result from the one I get when I run streamNodeProperty in Neo4j Browser. For example, in Neo4j Browser I get nodeIds of the correct label, but the result in Python (I use PyCharm) comes from a different label, even though the Python client call returned without an error.

FlorentinD · 2022-09-20T12:43:08Z

Looks like I misunderstood you. I thought you had made it work afterwards (as written at the end of your first description).

Can you share the exact (anonymized) browser query and python client equivalent?
Also, can you share the stack trace? For the first part, "KeyError: 'graph_name'" does not make sense, as there is no graph_name inside the query.

If you are using a limit, you might also want to use an ORDER BY nodeId to compare the version between neo4j desktop and python client.
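
A deterministic ordering makes the Browser output and the client output directly comparable. The sketch below only builds such a query and its parameters; the helper names and the parameter keys (`$node_labels`, `$limit`) are illustrative choices rather than anything defined by the library, and the graph, property, and label names are the anonymized placeholders from the first post.

```python
def build_ordered_stream_query() -> str:
    """Return a Cypher query that streams one node property in a stable order."""
    return (
        "CALL gds.graph.streamNodeProperty($graph_name, $property, $node_labels) "
        "YIELD nodeId AS id, propertyValue AS degree "
        "RETURN id, degree "
        "ORDER BY id "        # same ordering in Browser and client
        "LIMIT $limit"
    )


def build_params(graph_name, node_property, node_labels, limit=100):
    """Collect the query parameters; the key names match the $placeholders above."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return {
        "graph_name": graph_name,
        "property": node_property,
        "node_labels": list(node_labels),
        "limit": int(limit),
    }


query = build_ordered_stream_query()
params = build_params("xxx", "xxxx", ["xxxxx"], limit=100)

# With a connected client object `gds` (not created here), the call would be:
# result = gds.run_cypher(query, params)
```

The same query text and parameters can be pasted into Neo4j Browser, so both sides return the same first rows if the ids agree.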

MOSSupport · 2022-09-21T07:37:18Z

The original query is the code in my first post. The error message was as follows (beware of typos, since I am retyping it by hand):
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/code.py", line 90, in runcode
    exec(code, self.locals)
  File "", line 10, in
  File "/app1/pycharm-code/venv/lib/python3.8/site-packages/graphdatascience/graph_data_science.py", line 134, in run_cypher
    return self._query_runner.run_query(query, params)
  File "/app1/pycharm-code/venv/lib/python3.8/site-packages/graphdatascience/query_runner/arrow_query_runner.py", line 57, in run_query
    graph_name = params["graph_name"]
KeyError: 'graph_name'

Yes, there is no 'graph_name' in my code, but it appears in the error message above.
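
The traceback shows `run_query` in `arrow_query_runner.py` reading `params["graph_name"]`. One reading that is consistent with both symptoms in this thread (the KeyError, and later all rows coming back with unrenamed columns) is that the runner recognizes the procedure name in the query text and serves the data through a separate path, ignoring the rest of the Cypher. The toy class below illustrates that reading only; it is a hypothetical stand-in and not the library's actual code.

```python
class ToyInterceptingRunner:
    """Hypothetical stand-in for a query runner that short-circuits one procedure."""

    def __init__(self, rows):
        self._rows = rows  # list of (nodeId, propertyValue) pairs

    def run_query(self, query, params=None):
        params = params or {}
        if "gds.graph.streamNodeProperty" in query:
            # Shortcut path: from here on the Cypher text is not executed,
            # so YIELD ... AS aliases and LIMIT have no effect.
            graph_name = params["graph_name"]  # KeyError when the query uses literals
            return [{"nodeId": n, "propertyValue": v} for n, v in self._rows]
        raise NotImplementedError("plain Cypher would be sent to the server here")


runner = ToyInterceptingRunner([(i, i * 2) for i in range(500)])
query = (
    "CALL gds.graph.streamNodeProperty('xxx', 'xxxx', ['xxxxx']) "
    "YIELD nodeId AS id, propertyValue AS degree RETURN id, degree LIMIT 100"
)

first_error = None
try:
    runner.run_query(query)  # no params at all, as in the first attempt
except KeyError as err:
    first_error = err

# Second attempt: supplying 'graph_name' avoids the error, but the aliases
# and the LIMIT in the query text are silently ignored.
rows = runner.run_query(query, {"graph_name": "xxx"})
```

Under this assumption, routing `run_cypher` around such a shortcut would restore normal Cypher semantics, which matches the title of #186.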

MOSSupport · 2022-09-21T07:47:34Z

So I changed the first snippet according to the error message, which gave the second snippet. It runs without the error above, and the result is filtered by the label (the 3rd parameter), but the nodeId values are not the same as the ids of the original nodes in the stored graph.

FlorentinD · 2022-09-21T08:41:44Z

Ah, you are using GDS Enterprise with Arrow enabled. That explains the difference!

I see the error now as well and will update you once we have fixed the issue.

As a temporary workaround, I can only think of not using gds.run_cypher but instead calling gds.graph.streamNodeProperty and filtering the result afterwards with pandas.

MOSSupport · 2022-09-21T08:56:55Z

But I have to develop this process as a Python application now. How can I get the actual node properties, such as 'name' strings? As I said above, the node ids returned by gds.graph.streamNodeProperty in Python code are not correct, so I cannot use them to retrieve the right properties.

FlorentinD · 2022-09-21T09:14:56Z

I would suggest using pandas to transform the result. As mentioned above, for now you need a workaround.

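import pandas as pd

# Look up the projected graph by name (gds is the GraphDataScience client object)
G = gds.graph.get("xxx")
# Returns a pandas DataFrame with the columns nodeId and propertyValue
result = gds.graph.streamNodeProperty(G, "myProperty", ["my_label"])
result.rename(columns={"nodeId": "id", "propertyValue": "degree"}, inplace=True)
# Keep only the first 100 rows
result = result.iloc[:100, :]

display(result)  # in a notebook; use print(result) in a plain script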

Hope this helps you as a temporary solution.
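
On the question of getting properties such as 'name' for the streamed ids: once the ids match the stored graph, one option is a plain Cypher lookup through `gds.run_cypher`. This is a minimal sketch under stated assumptions: the helper name is hypothetical, the nodes are assumed to carry a `name` property, and `gds` is an already connected client passed in by the caller.

```python
LOOKUP_QUERY = (
    "MATCH (n) WHERE id(n) IN $ids "
    "RETURN id(n) AS nodeId, n.name AS name"
)


def names_by_node_id(gds, node_ids):
    """Map each node id to its 'name' property via a single Cypher lookup."""
    ids = sorted({int(i) for i in node_ids})       # de-duplicate, plain ints
    found = gds.run_cypher(LOOKUP_QUERY, {"ids": ids})
    # Works on the returned DataFrame: iterate the two columns in parallel.
    return {int(node_id): name for node_id, name in zip(found["nodeId"], found["name"])}


# Usage with the DataFrame from the snippet above (requires a live connection):
# names = names_by_node_id(gds, result["id"])
```

The lookup only gives meaningful names if the ids returned by the stream are the real node ids, so it does not work around the id mismatch reported in this thread.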

MOSSupport · 2022-09-21T10:27:06Z

Please see my 4th post. I already tried that, but I got strange nodeIds. I cannot find the actual nodes in the original graph by matching on the returned nodeIds. They do not look like the correct ids, so I cannot use them in development for now.

FlorentinD · 2022-09-21T13:25:16Z

OK, unfortunately there is another bug on the server side.
Thank you for pointing this out!
We have found a fix, and I will update you when we have published a new version.

For your use case, the only workaround right now is to disable Arrow in the server settings until our next release.

FlorentinD · 2022-09-21T13:48:56Z

Another idea is to build this library from the version in #186
(if you can't disable Arrow on the server or wait for the next release).

MOSSupport · 2022-09-22T01:23:44Z

It didn't work in my test even after Arrow was disabled. For now I use the Bolt driver to fetch the result of the streamNodeProperty Cypher query in my Python code. The other functions are developed with the Python client, since everything works except streamNodeProperty.

FlorentinD · 2022-09-22T07:25:03Z

Sad to hear the workaround does not work for you.
When you use the Bolt driver, it should work even with Arrow enabled, as we won't go through Arrow in this case.

Without Arrow I could not get a test to fail. Did you make sure to restart the server after disabling Arrow?
Could you be more specific about what didn't work after you disabled Arrow?

MOSSupport · 2022-09-22T07:55:29Z

Ah, Arrow must not have been disabled at that time. Now that I have verified with dbms.listConfig that Arrow is disabled, I get the correct ids.
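
The `dbms.listConfig` check can also be done from Python before trusting the streamed ids. The sketch below assumes a connected client `gds` passed in by the caller; the helper name is hypothetical, and the setting name in the usage comment (`gds.arrow.enabled`) is an assumption that should be checked against the GDS version in use.

```python
ARROW_CONFIG_QUERY = (
    "CALL dbms.listConfig($search) YIELD name, value "
    "RETURN name, value ORDER BY name"
)


def arrow_settings(gds, search="arrow"):
    """Return the server settings whose name contains `search` as a dict."""
    found = gds.run_cypher(ARROW_CONFIG_QUERY, {"search": search})
    return dict(zip(found["name"], found["value"]))


# Usage (requires a live connection; the setting name is an assumption):
# settings = arrow_settings(gds)
# print(settings.get("gds.arrow.enabled"))
```

A changed setting only takes effect after the server has been restarted, as noted earlier in the thread.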

MOSSupport · 2022-09-22T08:19:21Z

But it takes too long on a large dataset:
df_news = gds.graph.streamNodeProperty(
    G,
    'dimension12',
    ['News']
)
news_ids = df_news[0:10]['nodeId'].to_list()

It took 117 secs with 2M News nodes. It's not practical to use in the development without arrow.

FlorentinD · 2022-09-22T10:13:53Z

With arrow disabled you can use your original query and only return the first 10 elements in the cypher query through run_cypher. This should be practical in development even without arrow.

For the fix using arrow, you need to wait until we release 2.1.13, which we plan to release next Thursday.

MOSSupport · 2022-09-22T10:29:21Z

Yes, run_cypher works without the KeyError once Arrow is disabled. The slice of 10 records was only there to check the code. My goal is to fetch all the vectors and then process them with numpy or scikit to find the most similar nodes. I tried Filtered KNN (alpha), but I terminated it partway through because it took so long on the large dataset. So I will have to wait for the next release.
Thanks.
Dongho.
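
The numpy step described here (find the nodes whose vectors are most similar to a given one) could look roughly like the sketch below. It uses cosine similarity of one query vector against all others, which is an assumed choice of metric; the function name and the toy data are illustrative only.

```python
import numpy as np


def top_k_similar(node_ids, vectors, query_index, k=5):
    """Return the k (node_id, cosine_similarity) pairs most similar to one row."""
    matrix = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0            # avoid division by zero for all-zero vectors
    unit = matrix / norms                # rows scaled to unit length
    scores = unit @ unit[query_index]    # cosine similarity against the query row
    scores[query_index] = -np.inf        # exclude the query node itself
    k = min(k, len(scores) - 1)
    best = np.argsort(-scores)[:k]       # indices of the k highest scores
    return [(node_ids[i], float(scores[i])) for i in best]


# Toy data standing in for the streamed DataFrame columns:
# node_ids = df_news["nodeId"].tolist(); vectors = df_news["propertyValue"].tolist()
node_ids = [10, 11, 12, 13]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
similar = top_k_similar(node_ids, vectors, query_index=0, k=2)
```

This compares one query against all rows, which is linear in the number of nodes; a full all-pairs comparison over millions of nodes would need a different approach.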

FlorentinD · 2022-09-22T11:51:12Z

That's helpful feedback.
Can you describe your use case, such as how large the set of sourceNodes/targetNodes is and which configuration you tried?
Also, what is your desired return time?

MOSSupport · 2022-09-23T05:53:44Z

It's a quick PoC project to demonstrate Neo4j and GDSL capabilities. It has no specific requirements to meet.
The graph was built from 2.4M news articles. It has News nodes and Word or Entity (multi-token) nodes generated from the news. The projection has 77M nodes and 800M relationships.
Some of the code can run as a batch process, but some cases need a real-time response. In any case, the faster the better. I have to finish this demo by around the end of this month.

FlorentinD · 2022-09-23T08:24:18Z

Thanks for the details.
The current implementation applies the filter as a post-processing step, so it's good feedback that this is not fast enough for your scenario.

I would like to understand a bit more about your filtering.
For the 77M nodes, what kind of filter are you trying to apply?
Do you want to find the nearest neighbor inside a small subset of nodes?

MOSSupport · 2022-09-23T09:53:30Z

Yes, that is the core issue of the demo. The customer described the kind of news they want to find using a set of several hundred words. These words can serve as source nodes. GDSL has several algorithms, such as BFS, that take a source node or nodes as a parameter. But these approaches mostly just find News nodes that contain the words or entities, so their results are not very different from those of an exact-match search.
I tried fastRP to improve the result using vectors, but the most-similar-node results are very noisy (they include incorrect matches).
I am working on improving the fastRP result or on building a scoring process to select better News nodes.
This is competing with a BERT-based approach (with GPU) from another team.

FlorentinD · 2022-09-30T07:47:45Z

Thanks for the detailed response. We will consider your feedback in our future planning!

As a side comment, FastRP is also an algorithm where you need to tune the iterationWeights, so noisy results could be due to the embeddings not being good enough.
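
Tuning iterationWeights usually means trying a few weight sets and comparing the resulting embeddings on a task-specific check. The sketch below only prepares such candidate configurations; the specific weight sets, the embedding dimension, and the helper names are illustrative assumptions, and `gds` and `G` are a connected client and a projected graph supplied by the caller.

```python
CANDIDATE_WEIGHTS = [
    [0.0, 1.0, 1.0],        # ignore the node's own random vector, two hops
    [1.0, 1.0, 1.0],        # include the node's own random vector
    [0.0, 0.0, 1.0, 1.0],   # emphasize more distant neighborhoods
]


def fastrp_configs(embedding_dimension, weight_sets, random_seed=42):
    """Build one FastRP configuration per candidate weight set."""
    return [
        {
            "embeddingDimension": embedding_dimension,
            "iterationWeights": list(weights),
            "randomSeed": random_seed,   # fixed seed so runs are comparable
        }
        for weights in weight_sets
    ]


def stream_embeddings(gds, G, config):
    """Run FastRP in stream mode with one candidate configuration."""
    return gds.fastRP.stream(G, **config)


configs = fastrp_configs(128, CANDIDATE_WEIGHTS)

# Usage (requires a live connection and a projected graph G):
# for config in configs:
#     embeddings = stream_embeddings(gds, G, config)
```

Which weight set works best depends on the graph and on how similarity is judged, so each candidate needs to be evaluated against known good matches.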

@MOSSupport 2.1.13 is released now with the bug fix for Arrow.
For the client we are also planning a release, but had to work on some issues around our CI first.

FlorentinD · 2022-10-10T13:00:20Z

@MOSSupport we also released a new client version now, so with updating your versions, it should work now.

FlorentinD mentioned this issue Sep 21, 2022

Investigate arrow_query_runner bug #183

Closed

FlorentinD mentioned this issue Sep 21, 2022

Investigate filtered stream properties with arrow problem #185

Closed

FlorentinD mentioned this issue Sep 21, 2022

Do not use the Arrow query runner for run_cypher #186

Merged

2 tasks

FlorentinD closed this as completed Oct 10, 2022
