Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A potential performance bug (missing optimization) #12957

Closed
YuanchengJiang opened this issue Oct 22, 2022 · 11 comments
Closed

A potential performance bug (missing optimization) #12957

YuanchengJiang opened this issue Oct 22, 2022 · 11 comments
Assignees

Comments

@YuanchengJiang
Copy link

  • Neo4j version: Tested with version 4.1.12 and verified with 4.4.12 online sandbox
  • Operating system: Ubuntu 20.04
  • API/Driver: Cypher
  • Dataset: Pole(Crime analysis)

A potential performance bug found by executing equivalent queries:

Query 1:

MATCH (a)-[b]-(c)-[d]-(e)-[f]-(g)-[h]-(i) RETURN a ORDER BY id(a) LIMIT 1;

Result 1:

+----------------------------------------------------------------------+
| a                                                                    |
+----------------------------------------------------------------------+
| (:Person {name: "Todd", nhs_no: "117-66-8129", surname: "Hamilton"}) |
+----------------------------------------------------------------------+

1 row available after 3 ms, consumed after another 13930 ms

Mem Consumption: Over 4G

Query 2:

MATCH (a)-[b]-(c)-[d]-(e)-[f]-(g)-[h]-(i) where id(a)>=0 RETURN a ORDER BY id(a) LIMIT 1;

Note that where id(a)>=0 is always True

Result 2

+----------------------------------------------------------------------+
| a                                                                    |
+----------------------------------------------------------------------+
| (:Person {name: "Todd", nhs_no: "117-66-8129", surname: "Hamilton"}) |
+----------------------------------------------------------------------+

1 row available after 3 ms, consumed after another 19 ms

Mem Consumption: trivial

Not really sure why one redundant WHERE clause can boost the performance. Is it a potentially missed optimization for the first query?

Results in Sandbox:
Screenshot 2022-10-22 at 12 46 06 PM

@HannesSandberg
Copy link
Contributor

Thank you for your report! We will investigate and come back to you

@arnefischereit
Copy link
Contributor

Hi @YuanchengJiang

The additional predicate should not improve the performance as it is an additional operation that we undertake. I would suspect some other effects to be at play here such as the second query being faster because it can use the cached results from the first run.
Without further knowing the dataset, we also have no way of investigating the issue.

If you still think that the 2nd query is actually faster, please provide the output of PROFILE for both each after you have cleared the caches using db.clearQueryCaches()

@YuanchengJiang
Copy link
Author

Hi, one easy way to reproduce is using the online sandbox: https://sandbox.neo4j.com/. I am using the built-in crime analysis dataset so you can easily set it up.

Or you can download and import the dataset from https://github.com/neo4j-graph-examples/pole.

I also tested two queries with db.clearQueryCaches(). It does help but performance difference still exists(and up to 30+ times). See the snapshot below:

Screenshot 2022-10-24 at 3 22 35 PM

@arnefischereit
Copy link
Contributor

I ran the two queries on the crime investigation dataset and the one with the predicate seems to be a bit slower.

These are the results after a bit of warming up for each one of them:
image

When I run them on a fresh cache, they are at ~4 secs and ~2 secs respectively.

@YuanchengJiang
Copy link
Author

Hi, I have similar results to yours but I do not think it is normal to execute queries with PROFILE...

Have you tried the queries without PROFILE? See my results below, PROFILE is OK but MATCH is problematic:

Screenshot 2022-10-24 at 3 47 44 PM

Is it possible that PROFILE and MATCH could execute two different code branches?

@arnefischereit
Copy link
Contributor

If you add the PROFILE prefix to the query, it will add some additional monitoring. That makes it a bit slower in general but should not distort the results. This is what I get without PROFILE:
image

How often did you run each query? Have you tried it in Sandbox?

@YuanchengJiang
Copy link
Author

Hi, I just rerun queries on cypher 4.4 and 4.1.
I think the issue exists only in 4.1 so I upgrade it to solve it.

The sandbox is a bit unstable so I am not sure but cypher 4.4 does not give such a performance difference.

Thanks for your response.

@arnefischereit
Copy link
Contributor

Thank you for the feedback.

Let me know if this issue is of any more use for you, otherwise i will close it.

@YuanchengJiang
Copy link
Author

Hi, please close the issue

@YuanchengJiang
Copy link
Author

Just for your information, I notice that the call stack is different in 4.1 but as it is older version maybe no need to worry about.

The profile log in version 4.1:

neo4j@neo4j> call db.clearQueryCaches();
+------------------------------+
| value                        |
+------------------------------+
| "Query cache already empty." |
+------------------------------+

1 row available after 12 ms, consumed after another 2 ms
neo4j@neo4j> PROFILE MATCH (a)-[b]-(c)-[d]-(e)-[f]-(g)-[h]-(i) RETURN a ORDER BY id(a) LIMIT 1;
+----------------------------------------------------------------------+
| a                                                                    |
+----------------------------------------------------------------------+
| (:Person {name: "Todd", nhs_no: "117-66-8129", surname: "Hamilton"}) |
+----------------------------------------------------------------------+

+-------------------------------------------------------------------------------------------------------------+
| Plan      | Statement   | Version      | Planner | Runtime       | Time  | DbHits   | Rows | Memory (Bytes) |
+-------------------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | "CYPHER 4.1" | "COST"  | "INTERPRETED" | 16479 | 14988516 | 1    | 7119231888     |
+-------------------------------------------------------------------------------------------------------------+


+-----------------------+-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| Operator              | Details                                             | Estimated Rows | Rows     | DB Hits  | Cache H/M | Memory (Bytes) | Ordered by |
+-----------------------+-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +ProduceResults@neo4j | a                                                   |              1 |        1 |        0 |       0/0 |                | id(a) ASC  |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +Limit@neo4j          | 1                                                   |              1 |        1 |        0 |       0/0 |                | id(a) ASC  |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +Filter@neo4j         | not b = f AND not b = h AND not d = h AND not d = f |        8622334 |        1 |        0 |       0/0 |                | id(a) ASC  |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +NodeHashJoin@neo4j   | e                                                   |        8622334 |       45 |        0 |       0/0 |     7106628736 | id(a) ASC  |
| |\                    +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| | +Filter@neo4j       | not b = d                                           |         728323 |        1 |        0 |       0/0 |                | id(a) ASC  |
| | |                   +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| | +Expand(All)@neo4j  | (c)-[d]-(e)                                         |         728323 |        2 |        3 |       0/0 |                | id(a) ASC  |
| | |                   +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| | +Expand(All)@neo4j  | (a)-[b]-(c)                                         |         211677 |        1 |        2 |       0/0 |                | id(a) ASC  |
| | |                   +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| | +Sort@neo4j         | `id(a)` ASC                                         |          61521 |        1 |        0 |       0/0 |       12603152 | id(a) ASC  |
| | |                   +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| | +Projection@neo4j   | id(a) AS `id(a)`                                    |          61521 |    61521 |        0 |       0/0 |                |            |
| | |                   +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| | +AllNodesScan@neo4j | a                                                   |          61521 |    61521 |    61522 |       0/0 |                |            |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +Filter@neo4j         | not f = h                                           |         728323 | 14168906 |        0 |       0/0 |                |            |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +Expand(All)@neo4j    | (g)-[h]-(i)                                         |         728323 | 14380586 | 14592266 |       0/0 |                |            |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +Expand(All)@neo4j    | (e)-[f]-(g)                                         |         211677 |   211680 |   273201 |       0/0 |                |            |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +AllNodesScan@neo4j   | e                                                   |          61521 |    61521 |    61522 |       0/0 |                |            |
+-----------------------+-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+

1 row available after 124 ms, consumed after another 16355 ms
neo4j@neo4j> call db.clearQueryCaches();
+---------------------------------------------------+
| value                                             |
+---------------------------------------------------+
| "Query caches successfully cleared of 1 queries." |
+---------------------------------------------------+

1 row available after 10 ms, consumed after another 1 ms
neo4j@neo4j> PROFILE MATCH (a)-[b]-(c)-[d]-(e)-[f]-(g)-[h]-(i) where id(a)>=0 RETURN a ORDER BY id(a) LIMIT 1;
+----------------------------------------------------------------------+
| a                                                                    |
+----------------------------------------------------------------------+
| (:Person {name: "Todd", nhs_no: "117-66-8129", surname: "Hamilton"}) |
+----------------------------------------------------------------------+

+----------------------------------------------------------------------------------------------------------+
| Plan      | Statement   | Version      | Planner | Runtime       | Time | DbHits | Rows | Memory (Bytes) |
+----------------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | "CYPHER 4.1" | "COST"  | "INTERPRETED" | 141  | 61533  | 1    | 12603152       |
+----------------------------------------------------------------------------------------------------------+


+-----------------------+---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| Operator              | Details                               | Estimated Rows | Rows  | DB Hits | Cache H/M | Memory (Bytes) | Ordered by |
+-----------------------+---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +ProduceResults@neo4j | a                                     |              1 |     1 |       0 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Limit@neo4j          | 1                                     |              1 |     1 |       0 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Filter@neo4j         | not f = h AND not b = h AND not d = h |        2586700 |     1 |       0 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Expand(All)@neo4j    | (g)-[h]-(i)                           |        2586700 |     1 |       2 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Filter@neo4j         | not b = f AND not d = f               |         751789 |     1 |       0 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Expand(All)@neo4j    | (e)-[f]-(g)                           |         751789 |     3 |       4 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Filter@neo4j         | not b = d                             |         218497 |     1 |       0 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Expand(All)@neo4j    | (c)-[d]-(e)                           |         218497 |     2 |       3 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Expand(All)@neo4j    | (a)-[b]-(c)                           |          63503 |     1 |       2 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Sort@neo4j           | `id(a)` ASC                           |          18456 |     1 |       0 |       0/0 |       12603152 | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Projection@neo4j     | id(a) AS `id(a)`                      |          18456 | 61521 |       0 |       0/0 |                |            |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Filter@neo4j         | id(a) >= $autoint_0                   |          18456 | 61521 |       0 |       0/0 |                |            |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +AllNodesScan@neo4j   | a                                     |          61521 | 61521 |   61522 |       0/0 |                |            |
+-----------------------+---------------------------------------+----------------+-------+---------+-----------+----------------+------------+

1 row available after 83 ms, consumed after another 58 ms

@arnefischereit
Copy link
Contributor

Thank you very much for that feedback. That explains a lot: Apparently, we find a suboptimal plan in 4.1 for the query without the predicate, which is fixed in 4.4.

As we do not support 4.1 any more, we will leave it at that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants