cypher query returning UnknownError: node [...] not connected to this relationship[...] #12268
Comments
@raduvanciu Can you find the complete stack trace for the failed query in the Also, are there any other transactions/queries running on the database at the same time? |
@chrisvest Thank you for your quick reply. There are no other transactions running at the same time. Here is the stack trace using version 3.5.5 Enterprise edition
|
@chrisvest Thank you again for looking into this. I have some additional info, which may help. The bug seems to be related to the fact that one of the source nodes (i.e., {geneId:54494}) has no outgoing edges of type ACTIVATION. It is, however, surprising that the exception is not thrown for QUERY2, which includes the same source node. I found a workaround by rewriting the query to first filter out any source nodes for which no shortest path can exist.
This query is much slower for a database with a lot of nodes, but the pre-filtering can be done as a separate query.
FYI, here are some stats for the database I am working with: ID Allocation, Node ID | 45917042. I was not able to reproduce the issue on the small demo movie database.
I tried to recreate the whole database from raw csv files, but the issue persists. Please let me know what you think of this workaround and if you have additional insights. |
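The rewritten query itself is not preserved in this thread. As a rough sketch only of the pre-filtering idea described above, and assuming the filter is written as a pattern predicate on outgoing ACTIVATION relationships (an assumption, not the author's confirmed query), it might look like this:

```cypher
// Hypothetical reconstruction; the actual workaround query is not shown in the thread.
WITH [54494,23710] AS x
MATCH (g:Gene)
// Keep only source nodes with at least one outgoing ACTIVATION relationship,
// since no ACTIVATION path can start from the others.
WHERE g.geneId IN x AND (g)-[:ACTIVATION]->()
MATCH p=shortestPath((g)-[:ACTIVATION*1..2]->(gg))
WHERE NOT gg.geneId IN x
RETURN p
```

Whether this form avoids the error on the affected database has not been verified here.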
@raduvanciu Is it possible for you to attach the raw csv file here so that we can investigate? |
@sherfert I hope this helps, |
@raduvanciu An additional question, do you have any indexes/constraints when you run the query? Best regards Louise, Neo4j Cypher team |
There is one unique constraint and index on node Gene(geneId). For QUERY1, here is the plan as exposed by EXPLAIN; the planner attempts to use only one index.
{
"statement": {
"text": "EXPLAIN WITH [54494,23710] AS x MATCH (g:Gene), p=shortestPath((g)-[:ACTIVATION*1..2]->(gg)) WHERE g.geneId in x AND NOT gg.geneId in x RETURN p",
"parameters": {}
},
"statementType": "r",
"counters": {
"_stats": {
"nodesCreated": 0,
"nodesDeleted": 0,
"relationshipsCreated": 0,
"relationshipsDeleted": 0,
"propertiesSet": 0,
"labelsAdded": 0,
"labelsRemoved": 0,
"indexesAdded": 0,
"indexesRemoved": 0,
"constraintsAdded": 0,
"constraintsRemoved": 0
}
},
"updateStatistics": {
"_stats": {
"nodesCreated": 0,
"nodesDeleted": 0,
"relationshipsCreated": 0,
"relationshipsDeleted": 0,
"propertiesSet": 0,
"labelsAdded": 0,
"labelsRemoved": 0,
"indexesAdded": 0,
"indexesRemoved": 0,
"constraintsAdded": 0,
"constraintsRemoved": 0
}
},
"plan": {
"operatorType": "ProduceResults",
"identifiers": [
"x",
" UNNAMED59",
"g",
"p",
"gg"
],
"arguments": {
"planner-impl": "IDP",
"planner-version": "3.5",
"runtime-version": "3.5",
"runtime": "SLOTTED",
"runtime-impl": "SLOTTED",
"version": "CYPHER 3.5",
"EstimatedRows": 55787407.63322778,
"planner": "COST"
},
"children": [
{
"operatorType": "Apply",
"identifiers": [
"x",
" UNNAMED59",
"g",
"p",
"gg"
],
"arguments": {
"EstimatedRows": 55787407.63322778
},
"children": [
{
"operatorType": "Projection",
"identifiers": [
"x"
],
"arguments": {
"EstimatedRows": 1,
"Expressions": "{x : $` AUTOLIST0`}"
},
"children": []
},
{
"operatorType": "ShortestPath",
"identifiers": [
"x",
" UNNAMED59",
"g",
"p",
"gg"
],
"arguments": {
"EstimatedRows": 55787407.63322778,
"Expressions": "{}"
},
"children": [
{
"operatorType": "CartesianProduct",
"identifiers": [
"g",
"gg",
"x"
],
"arguments": {
"EstimatedRows": 55787407.63322778
},
"children": [
{
"operatorType": "Filter",
"identifiers": [
"gg",
"x"
],
"arguments": {
"Expression": "not gg.geneId IN x",
"EstimatedRows": 2231620.0666843406
},
"children": [
{
"operatorType": "AllNodesScan",
"identifiers": [
"gg",
"x"
],
"arguments": {
"EstimatedRows": 31085476
},
"children": []
}
]
},
{
"operatorType": "NodeUniqueIndexSeek",
"identifiers": [
"g",
"x"
],
"arguments": {
"EstimatedRows": 24.99861354810036,
"Index": ":Gene(geneId)"
},
"children": []
}
]
}
]
}
]
}
]
},
"profile": false,
"notifications": [],
"server": {
"address": "0.0.0.0:7687",
"version": "Neo4j/3.5.5"
},
"resultConsumedAfter": {
"low": 0,
"high": 0
},
"resultAvailableAfter": {
"low": 0,
"high": 0
}
} |
@Lojjs Thank you for looking into this. I hope to hear back from you soon. If the information above does not help, what would be a good way to share the whole database dump privately? The size of the file is ~2.7GB. |
@raduvanciu Since you are using the enterprise edition, perhaps you are a paying customer of Neo4j, and therefore you could also register an official support ticket with customer support and they can help provide you with a way to upload a large dataset privately. |
@craigtaverner Unfortunately, the support does not extend to the startup program, so we are not eligible for customer support. I can provide, via email, a private url to download the database. At the time of writing the comment, no developer is assigned to this issue. |
Hi @raduvanciu, perhaps you could message me directly on neo4j-users.slack.com so we can discuss how best to proceed. |
@raduvanciu Is this still a problem for you or can we perhaps close this issue? |
@raduvanciu I will close this issue, please let us know if you have further questions or issues. Regards, |
I discovered that some Cypher queries using shortestPath fail for specific inputs while working correctly for others.
The following queries are expected to complete without errors:
QUERY1
WITH [54494,23710] AS x MATCH (g:Gene), p=shortestPath((g)-[:ACTIVATION*1..2]->(gg)) WHERE g.geneId in x AND NOT gg.geneId in x RETURN p
QUERY2
WITH [54494,23710,513] AS x MATCH (g:Gene), p=shortestPath((g)-[:ACTIVATION*1..2]->(gg)) WHERE g.geneId in x AND NOT gg.geneId in x RETURN p
QUERY3
WITH [54494] AS x MATCH (g:Gene), p=shortestPath((g)-[:ACTIVATION*1..2]->(gg)) WHERE g.geneId in x AND NOT gg.geneId in x RETURN p
QUERY4
WITH [23710] AS x MATCH (g:Gene), p=shortestPath((g)-[:ACTIVATION*1..2]->(gg)) WHERE g.geneId in x AND NOT gg.geneId in x RETURN p
QUERY5
WITH [513] AS x MATCH (g:Gene), p=shortestPath((g)-[:ACTIVATION*1..2]->(gg)) WHERE g.geneId in x AND NOT gg.geneId in x RETURN p
The following error occurs for QUERY1:
Neo.DatabaseError.General.UnknownError: Node[10573] not connected to this relationship[38722245]
QUERY2 completes successfully and returns paths with 34 nodes and 33 edges
QUERY3 completes successfully and returns no paths
QUERY4 completes successfully and returns paths with 34 nodes and 33 edges
QUERY5 completes successfully and returns no paths
I tried various input combinations and several versions (3.5.4, 3.5.5, 3.5.8), both using Docker and deployed on Ubuntu via the official AWS AMI. I also tried removing the length limit on the shortest path, without success. Finally, I tried setting cypher.forbid_exhaustive_shortestpath to true as described here:
https://neo4j.com/docs/cypher-manual/current/execution-plans/shortestpath-planning/
The following query returns 3 nodes and 1 edge, which suggests that the data itself is consistent: the node with id 10573 really is not an endpoint of the relationship with id 38722245.
MATCH (n), p=()-[r]->() where id(n)=10573 and id(r)=38722245 return p, n
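A more direct check, sketched here with the built-in startNode() and endNode() functions, is to list the endpoints of the relationship named in the error. The ids are taken from the error message above; the column aliases are illustrative.

```cypher
// List the stored endpoints and type of the relationship from the error message.
MATCH ()-[r]->()
WHERE id(r) = 38722245
RETURN id(startNode(r)) AS startId, id(endNode(r)) AS endId, type(r) AS relType
```

If neither returned id is 10573, the node is indeed not an endpoint of that relationship in the stored data.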
I am not sure how to approach this error, but it seems to be related to the optimized version of the shortestPath algorithm. Any help would be much appreciated. In the meantime I will try to provide a minimal database to reproduce the problem. Sharing the whole database dump (~2GB) with you may be an option if needed.
I have not yet been able to reproduce the problem in version 3.4.1.