Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding index incurs performance regression in some cases #13010

Closed
YuanchengJiang opened this issue Dec 16, 2022 · 2 comments
Closed

Adding index incurs performance regression in some cases #13010

YuanchengJiang opened this issue Dec 16, 2022 · 2 comments
Assignees

Comments

@YuanchengJiang
Copy link

Neo4j version: 5.2.0
Operating system: Ubuntu 20.04
API/Driver: Cypher
Dataset: Recommendations, https://github.com/neo4j-graph-examples/recommendations

Query: OPTIONAL MATCH (a:Person)--(b:Movie)-->()-[c:IN_GENRE]-(a:Person) WHERE a.name CONTAINS " " AND b.year>1920 AND c.genre IS NULL RETURN DISTINCT b ORDER BY b LIMIT 1;

Executing the query without index is more than 100x faster than the one with index.

Without Index(Using NodeByLabelScan):

neo4j@neo4j> call db.clearQueryCaches();
+---------------------------------------------------+
| value                                             |
+---------------------------------------------------+
| "Query caches successfully cleared of 1 queries." |
+---------------------------------------------------+

1 row
ready to start consuming query after 8 ms, results consumed after another 1 ms
neo4j@neo4j> show indexes;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | name                                                                 | state    | populationPercent | type     | entityType | labelsOrTypes | properties  | indexProvider      | owningConstraint      |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 60 | "__org_neo4j_schema_index_label_scan_store_converted_to_token_index" | "ONLINE" | 100.0             | "LOOKUP" | "NODE"     | NULL          | NULL        | "token-lookup-1.0" | NULL                  |
| 64 | "constraint_3b27b0"                                                  | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["User"]      | ["userId"]  | "range-1.0"        | "constraint_3b27b0"   |
| 59 | "constraint_3d5fcb7f"                                                | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Movie"]     | ["movieId"] | "range-1.0"        | "constraint_3d5fcb7f" |
| 63 | "constraint_4499eae9"                                                | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Person"]    | ["tmdbId"]  | "range-1.0"        | "constraint_4499eae9" |
| 61 | "constraint_737d9c1d"                                                | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Movie"]     | ["tmdbId"]  | "range-1.0"        | "constraint_737d9c1d" |
| 62 | "constraint_f8689281"                                                | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Genre"]     | ["name"]    | "range-1.0"        | "constraint_f8689281" |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

6 rows
ready to start consuming query after 17 ms, results consumed after another 2 ms
neo4j@neo4j> PROFILE OPTIONAL MATCH (a:Person)--(b:Movie)-->()-[c:IN_GENRE]-(a:Person) WHERE a.name CONTAINS " " AND b.year>1920 AND c.genre IS NULL RETURN DISTINCT b ORDER BY b LIMIT 1;
+------+
| b    |
+------+
| NULL |
+------+

+-------------------------------------------------------------------------------------------------+
| Plan      | Statement   | Version | Planner | Runtime   | Time | DbHits | Rows | Memory (Bytes) |
+-------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | ""      | "COST"  | "SLOTTED" | 77   | 102526 | 1    | 672            |
+-------------------------------------------------------------------------------------------------+


Planner COST

Runtime SLOTTED

Runtime version 5.2

+------------------+----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+------------+
| Operator         | Id | Details                                            | Estimated Rows | Rows  | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Ordered by |
+------------------+----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+------------+
| +ProduceResults  |  0 | b                                                  |              1 |     1 |       0 |                |                    0/0 |            |
| |                +----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+            |
| +Top             |  1 | b ASC LIMIT 1                                      |              1 |     1 |       0 |                |                    0/0 | b ASC      |
| |                +----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+------------+
| +Distinct        |  2 | b                                                  |              1 |     1 |       0 |            192 |                    0/0 |            |
| |                +----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+------------+
| +Optional        |  3 |                                                    |              1 |     1 |       0 |                |                    0/0 |            |
| |                +----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+------------+
| +Filter          |  4 | not c = anon_1 AND not anon_1 = anon_0             |              0 |     0 |       0 |                |                    0/0 |            |
| |                +----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+------------+
| +Expand(Into)    |  5 | (b)-[anon_1]->(anon_2)                             |              0 |     0 |       0 |            592 |                    0/0 |            |
| |                +----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+------------+
| +Filter          |  6 | b.year > $autoint_1 AND not c = anon_0 AND b:Movie |              1 |     0 |       0 |                |                    0/0 |            |
| |                +----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+------------+
| +Expand(All)     |  7 | (a)-[anon_0]-(b)                                   |              3 |     0 |       0 |                |                    0/0 |            |
| |                +----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+------------+
| +Filter          |  8 | c.genre IS NULL                                    |              1 |     0 |       0 |                |                    0/0 |            |
| |                +----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+------------+
| +Expand(All)     |  9 | (a)-[c:IN_GENRE]-(anon_2)                          |              2 |     0 |   64431 |                |                    0/0 |            |
| |                +----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+------------+
| +Filter          | 10 | a.name CONTAINS $autostring_0                      |           5714 | 18922 |   19047 |                |                    0/0 |            |
| |                +----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+------------+
| +NodeByLabelScan | 11 | a:Person                                           |          19047 | 19047 |   19048 |                |                    0/0 |            |
+------------------+----+----------------------------------------------------+----------------+-------+---------+----------------+------------------------+------------+

Total database accesses: 102526, total allocated memory: 672

1 row
ready to start consuming query after 52 ms, results consumed after another 25 ms

With Index(Using NodeIndexSeekByRange):

neo4j@neo4j> call db.clearQueryCaches();
+---------------------------------------------------+
| value                                             |
+---------------------------------------------------+
| "Query caches successfully cleared of 1 queries." |
+---------------------------------------------------+

1 row
ready to start consuming query after 7 ms, results consumed after another 1 ms
neo4j@neo4j> show indexes;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | name                                                                 | state    | populationPercent | type     | entityType | labelsOrTypes | properties     | indexProvider      | owningConstraint      |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 60 | "__org_neo4j_schema_index_label_scan_store_converted_to_token_index" | "ONLINE" | 100.0             | "LOOKUP" | "NODE"     | NULL          | NULL           | "token-lookup-1.0" | NULL                  |
| 64 | "constraint_3b27b0"                                                  | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["User"]      | ["userId"]     | "range-1.0"        | "constraint_3b27b0"   |
| 59 | "constraint_3d5fcb7f"                                                | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Movie"]     | ["movieId"]    | "range-1.0"        | "constraint_3d5fcb7f" |
| 63 | "constraint_4499eae9"                                                | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Person"]    | ["tmdbId"]     | "range-1.0"        | "constraint_4499eae9" |
| 61 | "constraint_737d9c1d"                                                | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Movie"]     | ["tmdbId"]     | "range-1.0"        | "constraint_737d9c1d" |
| 62 | "constraint_f8689281"                                                | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Genre"]     | ["name"]       | "range-1.0"        | "constraint_f8689281" |
| 67 | "index_531180b4"                                                     | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Movie"]     | ["tagline"]    | "range-1.0"        | NULL                  |
| 70 | "index_58c92700"                                                     | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Movie"]     | ["released"]   | "range-1.0"        | NULL                  |
| 65 | "index_5c0607ad"                                                     | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Person"]    | ["name"]       | "range-1.0"        | NULL                  |
| 69 | "index_824aad20"                                                     | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Movie"]     | ["imdbRating"] | "range-1.0"        | NULL                  |
| 72 | "index_a908f819"                                                     | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["User"]      | ["name"]       | "range-1.0"        | NULL                  |
| 66 | "index_ea29e173"                                                     | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Movie"]     | ["title"]      | "range-1.0"        | NULL                  |
| 68 | "index_fc5f831e"                                                     | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Movie"]     | ["year"]       | "range-1.0"        | NULL                  |
| 71 | "index_fd2dc3fc"                                                     | "ONLINE" | 100.0             | "RANGE"  | "NODE"     | ["Movie"]     | ["imdbId"]     | "range-1.0"        | NULL                  |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

14 rows
ready to start consuming query after 16 ms, results consumed after another 1 ms
neo4j@neo4j> PROFILE OPTIONAL MATCH (a:Person)--(b:Movie)-->()-[c:IN_GENRE]-(a:Person) WHERE a.name CONTAINS " " AND b.year>1920 AND c.genre IS NULL RETURN DISTINCT b ORDER BY b LIMIT 1;
+------+
| b    |
+------+
| NULL |
+------+

+----------------------------------------------------------------------------------------------------+
| Plan      | Statement   | Version | Planner | Runtime   | Time  | DbHits   | Rows | Memory (Bytes) |
+----------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | ""      | "COST"  | "SLOTTED" | 12147 | 86951181 | 1    | 672            |
+----------------------------------------------------------------------------------------------------+


Planner COST

Runtime SLOTTED

Runtime version 5.2

+-----------------------+----+-----------------------------------------------------------------------------------+----------------+----------+----------+----------------+------------------------+------------+
| Operator              | Id | Details                                                                           | Estimated Rows | Rows     | DB Hits  | Memory (Bytes) | Page Cache Hits/Misses | Ordered by |
+-----------------------+----+-----------------------------------------------------------------------------------+----------------+----------+----------+----------------+------------------------+------------+
| +ProduceResults       |  0 | b                                                                                 |              1 |        1 |        0 |                |                    0/0 |            |
| |                     +----+-----------------------------------------------------------------------------------+----------------+----------+----------+----------------+------------------------+            |
| +Top                  |  1 | b ASC LIMIT 1                                                                     |              1 |        1 |        0 |                |                    0/0 | b ASC      |
| |                     +----+-----------------------------------------------------------------------------------+----------------+----------+----------+----------------+------------------------+------------+
| +Distinct             |  2 | b                                                                                 |              1 |        1 |        0 |            192 |                    0/0 |            |
| |                     +----+-----------------------------------------------------------------------------------+----------------+----------+----------+----------------+------------------------+------------+
| +Optional             |  3 |                                                                                   |              1 |        1 |        0 |                |                    0/0 |            |
| |                     +----+-----------------------------------------------------------------------------------+----------------+----------+----------+----------------+------------------------+------------+
| +Filter               |  4 | not c = anon_0 AND not anon_1 = anon_0                                            |              0 |        0 |        0 |                |                    0/0 |            |
| |                     +----+-----------------------------------------------------------------------------------+----------------+----------+----------+----------------+------------------------+------------+
| +Expand(Into)         |  5 | (a)-[anon_0]-(b)                                                                  |              0 |        0 |        0 |            592 |                    0/0 |            |
| |                     +----+-----------------------------------------------------------------------------------+----------------+----------+----------+----------------+------------------------+------------+
| +Filter               |  6 | a:Person AND a.name CONTAINS $autostring_0 AND c.genre IS NULL AND not c = anon_1 |              0 |        0 | 43388090 |                |                    0/0 |            |
| |                     +----+-----------------------------------------------------------------------------------+----------------+----------+----------+----------------+------------------------+------------+
| +Expand(All)          |  7 | (anon_2)-[c:IN_GENRE]-(a)                                                         |            849 | 43388090 | 43428576 |                |                    0/0 |            |
| |                     +----+-----------------------------------------------------------------------------------+----------------+----------+----------+----------------+------------------------+------------+
| +Expand(All)          |  8 | (b)-[anon_1]->(anon_2)                                                            |            602 |    20251 |   125428 |                |                    0/0 |            |
| |                     +----+-----------------------------------------------------------------------------------+----------------+----------+----------+----------------+------------------------+------------+
| +NodeIndexSeekByRange |  9 | RANGE INDEX b:Movie(year) WHERE year > $autoint_1                                 |            270 |     9086 |     9087 |                |                    0/0 |            |
+-----------------------+----+-----------------------------------------------------------------------------------+----------------+----------+----------+----------------+------------------------+------------+

Total database accesses: 86951181, total allocated memory: 672

1 row
ready to start consuming query after 47 ms, results consumed after another 12100 ms
@InverseFalcon
Copy link

InverseFalcon commented Dec 16, 2022

While at the onset this looks like a case where the data in the graph explains the problem (no person node meeting those conditions has an :IN_GENRE relationship, so rows go to 0 early, vs the other approach that chooses to start at :Movie nodes, which IS more selective than a label scan, but when expanding the other direction ends up expanding through supernodes resulting in 43388090 rows all requiring expensive filtering that ends up eliminating all of them), there does seem to be opportunity to do an early check via counts store data to quickly determine if there are no such patterns that could possibly match.

In this case, the data model should be that :IN_GENRE relationships never connect to :Person nodes. The counts store knows this, and as a result it should be able to intuit that there are no paths that could possibly match this pattern.

As such, without any special fail-fast condition from the counts store, a plan that either starts with a :Person nodes, failing to expand the non-existent relationships (as what happens in the first without-indexes query) or that starts with a relationship type scan might be weighted higher by the planner, which would circumvent the choice to go with the ultimately terrible plan that has to expand through supernodes before it can perform the critical filtering step that eliminates all rows.

Additionally, there's some opportunity for the planner to have better insight into supernodes, so we could avoid picking plans that end up performing highly expensive expansions.

From the counts store, there are 20340 :IN_GENRE relationships, and from the counts of that relationship for each of the relationship types, we can conclude that the only possible pattern for these is (:Movie)-[:IN_GENRE]->(:Genre). From the counts store we also know there are 9125 :Movie nodes and 20 :Genre nodes. The planner could reasonably infer that :Genre nodes are supernodes with respect to :IN_GENRE relationships, since if evenly distributed that would be 1017 :IN_GENRE relationships per :Genre node, so the planner could conclude that expanding through or from :Genre nodes via :IN_GENRE relationships is expensive, and to weigh that during planning such that we won't choose a plan that expands in that direction for that pattern (expanding in the opposite direction, however, is fine, since :Movie nodes are not supernodes with respect to :IN_GENRE relationships).

@HannesSandberg HannesSandberg self-assigned this Dec 19, 2022
@sherfert
Copy link
Contributor

Hi @YuanchengJiang

It is unfortunate that the planner picks a worse plan in the presence of an index. The results are correct, however, so this is not a bug.

@InverseFalcon made a good analysis of the issue. In its current state, the planner is right to prefer starting with the index, if it exists. Unfortunately the suggested fixes to the planner are non-trivial and would have to be prioritized against other features.

The issue is also for a query that does not find any data because it does not fit the schema. Optimizing this sort of queries is generally a quite low priority compared to queries that can and do find data.

I hope this makes sense.

@sherfert sherfert closed this as not planned Won't fix, can't repro, duplicate, stale Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants