A potential performance bug (missing optimization) #12957

YuanchengJiang · 2022-10-22T04:47:42Z

Neo4j version: Tested with version 4.1.12 and verified with 4.4.12 online sandbox
Operating system: Ubuntu 20.04
API/Driver: Cypher
Dataset: Pole(Crime analysis)

A potential performance bug found by executing equivalent queries:

Query 1:

MATCH (a)-[b]-(c)-[d]-(e)-[f]-(g)-[h]-(i) RETURN a ORDER BY id(a) LIMIT 1;

Result 1:

+----------------------------------------------------------------------+
| a                                                                    |
+----------------------------------------------------------------------+
| (:Person {name: "Todd", nhs_no: "117-66-8129", surname: "Hamilton"}) |
+----------------------------------------------------------------------+

1 row available after 3 ms, consumed after another 13930 ms

Mem Consumption: Over 4G

Query 2:

MATCH (a)-[b]-(c)-[d]-(e)-[f]-(g)-[h]-(i) where id(a)>=0 RETURN a ORDER BY id(a) LIMIT 1;

Note that where id(a)>=0 is always True

Result 2

+----------------------------------------------------------------------+
| a                                                                    |
+----------------------------------------------------------------------+
| (:Person {name: "Todd", nhs_no: "117-66-8129", surname: "Hamilton"}) |
+----------------------------------------------------------------------+

1 row available after 3 ms, consumed after another 19 ms

Mem Consumption: trivial

Not really sure why one redundant WHERE clause can boost the performance. Is it a potentially missed optimization for the first query?

Results in Sandbox:

The text was updated successfully, but these errors were encountered:

HannesSandberg · 2022-10-24T06:58:52Z

Thank you for your report! We will investigate and come back to you

arnefischereit · 2022-10-24T07:15:25Z

Hi @YuanchengJiang

The additional predicate should not improve the performance as it is an additional operation that we undertake. I would suspect some other effects to be at play here such as the second query being faster because it can use the cached results from the first run.
Without further knowing the dataset, we also have no way of investigating the issue.

If you still think that the 2nd query is actually faster, please provide the output of PROFILE for both each after you have cleared the caches using db.clearQueryCaches()

YuanchengJiang · 2022-10-24T07:25:09Z

Hi, one easy way to reproduce is using the online sandbox: https://sandbox.neo4j.com/. I am using the built-in crime analysis dataset so you can easily set it up.

Or you can download and import the dataset from https://github.com/neo4j-graph-examples/pole.

I also tested two queries with db.clearQueryCaches(). It does help but performance difference still exists(and up to 30+ times). See the snapshot below:

arnefischereit · 2022-10-24T07:42:29Z

I ran the two queries on the crime investigation dataset and the one with the predicate seems to be a bit slower.

These are the results after a bit of warming up for each one of them:

When I run them on a fresh cache, they are at ~4 secs and ~2 secs respectively.

YuanchengJiang · 2022-10-24T07:53:35Z

Hi, I have similar results to yours but I do not think it is normal to execute queries with PROFILE...

Have you tried the queries without PROFILE? See my results below, PROFILE is OK but MATCH is problematic:

Is it possible that PROFILE and MATCH could execute two different code branches?

arnefischereit · 2022-10-24T08:32:15Z

If you add the PROFILE prefix to the query, it will add some additional monitoring. That makes it a bit slower in general but should not distort the results. This is what I get without PROFILE:

How often did you run each query? Have you tried it in Sandbox?

YuanchengJiang · 2022-10-24T09:36:27Z

Hi, I just rerun queries on cypher 4.4 and 4.1.
I think the issue exists only in 4.1 so I upgrade it to solve it.

The sandbox is a bit unstable so I am not sure but cypher 4.4 does not give such a performance difference.

Thanks for your response.

arnefischereit · 2022-10-24T13:39:33Z

Thank you for the feedback.

Let me know if this issue is of any more use for you, otherwise i will close it.

YuanchengJiang · 2022-10-25T01:30:11Z

Hi, please close the issue

YuanchengJiang · 2022-10-25T04:59:18Z

Just for your information, I notice that the call stack is different in 4.1 but as it is older version maybe no need to worry about.

The profile log in version 4.1:

neo4j@neo4j> call db.clearQueryCaches();
+------------------------------+
| value                        |
+------------------------------+
| "Query cache already empty." |
+------------------------------+

1 row available after 12 ms, consumed after another 2 ms
neo4j@neo4j> PROFILE MATCH (a)-[b]-(c)-[d]-(e)-[f]-(g)-[h]-(i) RETURN a ORDER BY id(a) LIMIT 1;
+----------------------------------------------------------------------+
| a                                                                    |
+----------------------------------------------------------------------+
| (:Person {name: "Todd", nhs_no: "117-66-8129", surname: "Hamilton"}) |
+----------------------------------------------------------------------+

+-------------------------------------------------------------------------------------------------------------+
| Plan      | Statement   | Version      | Planner | Runtime       | Time  | DbHits   | Rows | Memory (Bytes) |
+-------------------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | "CYPHER 4.1" | "COST"  | "INTERPRETED" | 16479 | 14988516 | 1    | 7119231888     |
+-------------------------------------------------------------------------------------------------------------+


+-----------------------+-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| Operator              | Details                                             | Estimated Rows | Rows     | DB Hits  | Cache H/M | Memory (Bytes) | Ordered by |
+-----------------------+-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +ProduceResults@neo4j | a                                                   |              1 |        1 |        0 |       0/0 |                | id(a) ASC  |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +Limit@neo4j          | 1                                                   |              1 |        1 |        0 |       0/0 |                | id(a) ASC  |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +Filter@neo4j         | not b = f AND not b = h AND not d = h AND not d = f |        8622334 |        1 |        0 |       0/0 |                | id(a) ASC  |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +NodeHashJoin@neo4j   | e                                                   |        8622334 |       45 |        0 |       0/0 |     7106628736 | id(a) ASC  |
| |\                    +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| | +Filter@neo4j       | not b = d                                           |         728323 |        1 |        0 |       0/0 |                | id(a) ASC  |
| | |                   +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| | +Expand(All)@neo4j  | (c)-[d]-(e)                                         |         728323 |        2 |        3 |       0/0 |                | id(a) ASC  |
| | |                   +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| | +Expand(All)@neo4j  | (a)-[b]-(c)                                         |         211677 |        1 |        2 |       0/0 |                | id(a) ASC  |
| | |                   +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| | +Sort@neo4j         | `id(a)` ASC                                         |          61521 |        1 |        0 |       0/0 |       12603152 | id(a) ASC  |
| | |                   +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| | +Projection@neo4j   | id(a) AS `id(a)`                                    |          61521 |    61521 |        0 |       0/0 |                |            |
| | |                   +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| | +AllNodesScan@neo4j | a                                                   |          61521 |    61521 |    61522 |       0/0 |                |            |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +Filter@neo4j         | not f = h                                           |         728323 | 14168906 |        0 |       0/0 |                |            |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +Expand(All)@neo4j    | (g)-[h]-(i)                                         |         728323 | 14380586 | 14592266 |       0/0 |                |            |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +Expand(All)@neo4j    | (e)-[f]-(g)                                         |         211677 |   211680 |   273201 |       0/0 |                |            |
| |                     +-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+
| +AllNodesScan@neo4j   | e                                                   |          61521 |    61521 |    61522 |       0/0 |                |            |
+-----------------------+-----------------------------------------------------+----------------+----------+----------+-----------+----------------+------------+

1 row available after 124 ms, consumed after another 16355 ms

neo4j@neo4j> call db.clearQueryCaches();
+---------------------------------------------------+
| value                                             |
+---------------------------------------------------+
| "Query caches successfully cleared of 1 queries." |
+---------------------------------------------------+

1 row available after 10 ms, consumed after another 1 ms
neo4j@neo4j> PROFILE MATCH (a)-[b]-(c)-[d]-(e)-[f]-(g)-[h]-(i) where id(a)>=0 RETURN a ORDER BY id(a) LIMIT 1;
+----------------------------------------------------------------------+
| a                                                                    |
+----------------------------------------------------------------------+
| (:Person {name: "Todd", nhs_no: "117-66-8129", surname: "Hamilton"}) |
+----------------------------------------------------------------------+

+----------------------------------------------------------------------------------------------------------+
| Plan      | Statement   | Version      | Planner | Runtime       | Time | DbHits | Rows | Memory (Bytes) |
+----------------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | "CYPHER 4.1" | "COST"  | "INTERPRETED" | 141  | 61533  | 1    | 12603152       |
+----------------------------------------------------------------------------------------------------------+


+-----------------------+---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| Operator              | Details                               | Estimated Rows | Rows  | DB Hits | Cache H/M | Memory (Bytes) | Ordered by |
+-----------------------+---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +ProduceResults@neo4j | a                                     |              1 |     1 |       0 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Limit@neo4j          | 1                                     |              1 |     1 |       0 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Filter@neo4j         | not f = h AND not b = h AND not d = h |        2586700 |     1 |       0 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Expand(All)@neo4j    | (g)-[h]-(i)                           |        2586700 |     1 |       2 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Filter@neo4j         | not b = f AND not d = f               |         751789 |     1 |       0 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Expand(All)@neo4j    | (e)-[f]-(g)                           |         751789 |     3 |       4 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Filter@neo4j         | not b = d                             |         218497 |     1 |       0 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Expand(All)@neo4j    | (c)-[d]-(e)                           |         218497 |     2 |       3 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Expand(All)@neo4j    | (a)-[b]-(c)                           |          63503 |     1 |       2 |       0/0 |                | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Sort@neo4j           | `id(a)` ASC                           |          18456 |     1 |       0 |       0/0 |       12603152 | id(a) ASC  |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Projection@neo4j     | id(a) AS `id(a)`                      |          18456 | 61521 |       0 |       0/0 |                |            |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +Filter@neo4j         | id(a) >= $autoint_0                   |          18456 | 61521 |       0 |       0/0 |                |            |
| |                     +---------------------------------------+----------------+-------+---------+-----------+----------------+------------+
| +AllNodesScan@neo4j   | a                                     |          61521 | 61521 |   61522 |       0/0 |                |            |
+-----------------------+---------------------------------------+----------------+-------+---------+-----------+----------------+------------+

1 row available after 83 ms, consumed after another 58 ms

arnefischereit · 2022-10-25T06:54:30Z

Thank you very much for that feedback. That explains a lot: Apparently, we find a suboptimal plan in 4.1 for the query without the predicate, which is fixed in 4.4.

As we do not support 4.1 any more, we will leave it at that.

YuanchengJiang added the bug label Oct 22, 2022

HannesSandberg self-assigned this Oct 24, 2022

HannesSandberg added the team-cypher label Oct 24, 2022

arnefischereit closed this as completed Oct 25, 2022

YuanchengJiang mentioned this issue Oct 25, 2022

A potential performance bug (missing optimization) #12960

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A potential performance bug (missing optimization) #12957

A potential performance bug (missing optimization) #12957

YuanchengJiang commented Oct 22, 2022

HannesSandberg commented Oct 24, 2022

arnefischereit commented Oct 24, 2022

YuanchengJiang commented Oct 24, 2022

arnefischereit commented Oct 24, 2022

YuanchengJiang commented Oct 24, 2022

arnefischereit commented Oct 24, 2022

YuanchengJiang commented Oct 24, 2022

arnefischereit commented Oct 24, 2022

YuanchengJiang commented Oct 25, 2022

YuanchengJiang commented Oct 25, 2022

arnefischereit commented Oct 25, 2022

A potential performance bug (missing optimization) #12957

A potential performance bug (missing optimization) #12957

Comments

YuanchengJiang commented Oct 22, 2022

Query 1:

Result 1:

Query 2:

Result 2

HannesSandberg commented Oct 24, 2022

arnefischereit commented Oct 24, 2022

YuanchengJiang commented Oct 24, 2022

arnefischereit commented Oct 24, 2022

YuanchengJiang commented Oct 24, 2022

arnefischereit commented Oct 24, 2022

YuanchengJiang commented Oct 24, 2022

arnefischereit commented Oct 24, 2022

YuanchengJiang commented Oct 25, 2022

YuanchengJiang commented Oct 25, 2022

arnefischereit commented Oct 25, 2022