Range index-backed Order By should not require filter #13417

TomVanWemmel · 2024-03-12T10:20:11Z

Is your feature request related to a problem? Please describe.
To increase performance of application we allow sorting on properties that have a Range index. We were under the impression that a Range index-backed Order By would significantly speed up our queries. However, it seems that only works when the ORDER BY clause is combined with a filter.

For instance in the Recommendations sample database, there's an index on Movies.title.
The following query will perform a NodeByLabelScan (over 27000 dbhits for about 9000 movies)

MATCH (m:Movie)
RETURN m
ORDER BY m.title

But this query will perform a NodeIndexScan (460 dbhits for about 9000 movies)

MATCH (m:Movie) WHERE m.title IS NOT NULL
RETURN m
ORDER BY m.title

Describe the solution you'd like
It would be an improvement to avoid the insignificant NULL-check that triggers the query-planner to evaluate the index.

Describe alternatives you've considered
Alternatively, the documentation could explicitly mention the requirement of the filter

The text was updated successfully, but these errors were encountered:

InverseFalcon · 2024-03-12T10:36:07Z

To clarify, the NULL-check is not insignificant. null is not a property value in Neo4j, it is absence of a property. Since indexes in Neo4j only include property values, nodes of the given label that lack the indexed property are not included in the index.

This is why an index scan cannot be used in your first example, because MATCH (m:Movie) will naturally include nodes that may be lacking the title property, and the index does not know about those nodes. If we incorrectly performed an index scan, then the result would be incorrect, as nodes would be missing from the result set.

By adding the restriction in the Cypher query that the property must exist (NOT NULL implies existence), the planner then knows that the index contains the entire result set, and so it is allowed to plan an index scan.

TomVanWemmel · 2024-03-12T13:37:29Z

To clarify, the NULL-check is not insignificant. null is not a property value in Neo4j, it is absence of a property. Since indexes in Neo4j only include property values, nodes of the given label that lack the indexed property are not included in the index.

This is why an index scan cannot be used in your first example, because MATCH (m:Movie) will naturally include nodes that may be lacking the title property, and the index does not know about those nodes. If we incorrectly performed an index scan, then the result would be incorrect, as nodes would be missing from the result set.

By adding the restriction in the Cypher query that the property must exist (NOT NULL implies existence), the planner then knows that the index contains the entire result set, and so it is allowed to plan an index scan.

I wrote insignificant because every node in that database has a title and the result is not influenced by the filter.
Your explanation does make sense, but I wonder why null-values are excluded from the index?

InverseFalcon · 2024-03-12T14:58:33Z

Again, null is not a value. It is absence of the property entirely. (and likewise, setting a property to null is the same as removing the property)

If I created a new :Movie node without any properties, and then queried MATCH (n:Movie) WHERE n.title IS NULL then that new node would be returned, because it is a node that does not have the title property.

When we create an index, only nodes of the indexed label, that have a value for the property, will be included in the index. Indexes function on property values, and when no property value is present, then it doesn't get indexed.

If you are certain in your graph that a property should always be present, then either create an existence constraint, or a node key constraint on the property, and with either of those in place the planner will understand that the property must always be present, so it can plan an index scan without needing to provide the extra hint.

TomVanWemmel added the feature label Mar 12, 2024

JoelBergstrand added the team-cypher label Mar 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Range index-backed Order By should not require filter #13417

Range index-backed Order By should not require filter #13417

TomVanWemmel commented Mar 12, 2024

InverseFalcon commented Mar 12, 2024

TomVanWemmel commented Mar 12, 2024 •

edited

InverseFalcon commented Mar 12, 2024 •

edited

Range index-backed Order By should not require filter #13417

Range index-backed Order By should not require filter #13417

Comments

TomVanWemmel commented Mar 12, 2024

InverseFalcon commented Mar 12, 2024

TomVanWemmel commented Mar 12, 2024 • edited

InverseFalcon commented Mar 12, 2024 • edited

TomVanWemmel commented Mar 12, 2024 •

edited

InverseFalcon commented Mar 12, 2024 •

edited