Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Range index-backed Order By should not require filter #13417

Open
TomVanWemmel opened this issue Mar 12, 2024 · 3 comments
Open

Range index-backed Order By should not require filter #13417

TomVanWemmel opened this issue Mar 12, 2024 · 3 comments

Comments

@TomVanWemmel
Copy link

Is your feature request related to a problem? Please describe.
To increase performance of application we allow sorting on properties that have a Range index. We were under the impression that a Range index-backed Order By would significantly speed up our queries. However, it seems that only works when the ORDER BY clause is combined with a filter.

For instance in the Recommendations sample database, there's an index on Movies.title.
The following query will perform a NodeByLabelScan (over 27000 dbhits for about 9000 movies)

MATCH (m:Movie)
RETURN m
ORDER BY m.title

But this query will perform a NodeIndexScan (460 dbhits for about 9000 movies)

MATCH (m:Movie) WHERE m.title IS NOT NULL
RETURN m
ORDER BY m.title

Describe the solution you'd like
It would be an improvement to avoid the insignificant NULL-check that triggers the query-planner to evaluate the index.

Describe alternatives you've considered
Alternatively, the documentation could explicitly mention the requirement of the filter

@InverseFalcon
Copy link

To clarify, the NULL-check is not insignificant. null is not a property value in Neo4j, it is absence of a property. Since indexes in Neo4j only include property values, nodes of the given label that lack the indexed property are not included in the index.

This is why an index scan cannot be used in your first example, because MATCH (m:Movie) will naturally include nodes that may be lacking the title property, and the index does not know about those nodes. If we incorrectly performed an index scan, then the result would be incorrect, as nodes would be missing from the result set.

By adding the restriction in the Cypher query that the property must exist (NOT NULL implies existence), the planner then knows that the index contains the entire result set, and so it is allowed to plan an index scan.

@TomVanWemmel
Copy link
Author

TomVanWemmel commented Mar 12, 2024

To clarify, the NULL-check is not insignificant. null is not a property value in Neo4j, it is absence of a property. Since indexes in Neo4j only include property values, nodes of the given label that lack the indexed property are not included in the index.

This is why an index scan cannot be used in your first example, because MATCH (m:Movie) will naturally include nodes that may be lacking the title property, and the index does not know about those nodes. If we incorrectly performed an index scan, then the result would be incorrect, as nodes would be missing from the result set.

By adding the restriction in the Cypher query that the property must exist (NOT NULL implies existence), the planner then knows that the index contains the entire result set, and so it is allowed to plan an index scan.

I wrote insignificant because every node in that database has a title and the result is not influenced by the filter.
Your explanation does make sense, but I wonder why null-values are excluded from the index?

@InverseFalcon
Copy link

InverseFalcon commented Mar 12, 2024

Again, null is not a value. It is absence of the property entirely. (and likewise, setting a property to null is the same as removing the property)

If I created a new :Movie node without any properties, and then queried MATCH (n:Movie) WHERE n.title IS NULL then that new node would be returned, because it is a node that does not have the title property.

When we create an index, only nodes of the indexed label, that have a value for the property, will be included in the index. Indexes function on property values, and when no property value is present, then it doesn't get indexed.

If you are certain in your graph that a property should always be present, then either create an existence constraint, or a node key constraint on the property, and with either of those in place the planner will understand that the property must always be present, so it can plan an index scan without needing to provide the extra hint.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants