searching for distinct values on an indexed property doesn't use the index to return results #7740

Closed
crichey opened this issue Aug 16, 2016 · 3 comments

@crichey

crichey commented Aug 16, 2016

  • Neo4j version: 3.0.4

Given a schema index on a property:

ON :Customer(ethnicity) ONLINE

When the distinct values of that property are requested, the index should be used for performance reasons. However, as the profile below shows, it is not:

match(c:Customer) return distinct c.ethnicity;

Compiler CYPHER 3.0

Planner COST

Runtime INTERPRETED

+------------------+----------------+-------+---------+-------------+-------------+
| Operator         | Estimated Rows | Rows  | DB Hits | Variables   | Other       |
+------------------+----------------+-------+---------+-------------+-------------+
| +ProduceResults  |          28965 |     4 |       0 | c.ethnicity | c.ethnicity |
| |                +----------------+-------+---------+-------------+-------------+
| +Distinct        |          28965 |     4 |   30489 | c.ethnicity | c.ethnicity |
| |                +----------------+-------+---------+-------------+-------------+
| +NodeByLabelScan |          30489 | 30489 |   30490 | c           | :Customer   |
+------------------+----------------+-------+---------+-------------+-------------+

Total database accesses: 60979

@spacecowboy
Contributor

Thanks for the report!

@sherfert sherfert self-assigned this Oct 5, 2017
@prostokarablik

We are also interested in a fix for this problem.

@sherfert
Contributor

The index is not used because of null values. If any of your Customer nodes is missing the ethnicity property, the result contains an extra row with null, and a scan of the index alone would not return that row.

If you know that every Customer node has this property, you can change the query to:

MATCH(c:Customer) WHERE EXISTS(c.ethnicity) RETURN DISTINCT c.ethnicity

This will use a NodeIndexScan.
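
The null-row argument above can be illustrated with a small toy model. This is only a sketch in plain Python, not Neo4j code: the `customers` list, the helper functions, and the assumption that an index holds entries only for nodes that have the property are all illustrative and are not taken from the Neo4j implementation.

```python
# Toy model only: plain Python data structures standing in for a label scan
# and a schema index. None of these names are Neo4j APIs.

# Hypothetical Customer nodes; a missing property is modeled as an absent key.
customers = [
    {"id": 1, "ethnicity": "A"},
    {"id": 2, "ethnicity": "B"},
    {"id": 3},                      # this node has no ethnicity property
    {"id": 4, "ethnicity": "A"},
]


def distinct_via_label_scan(nodes):
    """Visit every node; a missing property reads as None (like Cypher null)."""
    return {node.get("ethnicity") for node in nodes}


def build_index(nodes):
    """Assumed index model: only nodes that have the property get an entry."""
    return [node["ethnicity"] for node in nodes if "ethnicity" in node]


def distinct_via_index_scan(index):
    """Read the distinct values straight from the index entries."""
    return set(index)


label_scan_result = distinct_via_label_scan(customers)
index_scan_result = distinct_via_index_scan(build_index(customers))

# The difference is the row that only the label scan can produce.
print(label_scan_result - index_scan_result)
```

In this model the two results differ only by the null entry, which is why adding the EXISTS predicate (excluding nodes without the property) makes both strategies return the same rows.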

Please let me know if that solves your problem and whether we can close the issue, @crichey @prostokarablik.
