
Aggregations: Return an upper bound of the maximum error for terms #6696

Closed
jpountz opened this issue Jul 2, 2014 · 0 comments

@jpountz (Contributor) commented Jul 2, 2014

The fact that terms aggregations don't always give accurate counts is a bit deceptive. Without changing how they are implemented, maybe we should make terms aggregations return an upper bound of the maximum error on the document count as part of the response? I think this would make the potential accuracy issues clear, and it would also make the inaccuracy easier to manage, since the error would have a known upper bound.

@jpountz jpountz self-assigned this Jul 2, 2014

@martijnvg martijnvg assigned colings86 and unassigned jpountz Jul 4, 2014

colings86 added a commit to colings86/elasticsearch that referenced this issue Jul 25, 2014

Aggregations: Added an option to show the upper bound of the error for the terms aggregation.

This is only applicable when the order is set to _count. The upper bound of the error in a term's doc count is calculated by summing, over each shard that did not return the term, the doc count of the last term that shard returned. The implementation computes this by summing the doc counts of the last term from ALL shards and then subtracting the sum of the doc counts of the last term on each shard for which the term IS returned.

Closes elastic#6696
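A minimal Python sketch of the calculation described in the commit message, assuming each shard returns its top terms as (term, doc_count) pairs sorted by descending count. The function names, the data layout, and the sample shard data are hypothetical illustrations, not Elasticsearch's Java implementation. It computes the bound both ways: directly from the definition, and by the "sum over ALL shards, then subtract" approach the commit describes.

```python
from typing import List, Tuple

# Hypothetical layout: each shard returns its top terms as (term, doc_count)
# pairs, already sorted by doc_count descending (order: _count).
Bucket = Tuple[str, int]
ShardResponse = List[Bucket]


def last_count(buckets: ShardResponse) -> int:
    """Doc count of the last (lowest-count) term a shard returned; 0 if empty."""
    return buckets[-1][1] if buckets else 0


def returned(buckets: ShardResponse, term: str) -> bool:
    """True if this shard included `term` in its response."""
    return any(t == term for t, _ in buckets)


def term_error_upper_bound(shards: List[ShardResponse], term: str) -> int:
    """Direct definition: a shard that did not return `term` can hold at most
    last_count(shard) documents for it, so sum those counts."""
    return sum(last_count(b) for b in shards if not returned(b, term))


def term_error_upper_bound_by_subtraction(shards: List[ShardResponse], term: str) -> int:
    """Formulation from the commit message: sum of last-term counts over ALL
    shards, minus the sum over shards for which `term` IS returned."""
    total = sum(last_count(b) for b in shards)
    seen = sum(last_count(b) for b in shards if returned(b, term))
    return total - seen


# Hypothetical sample data: three shards, each returning its top 3 terms.
SHARDS: List[ShardResponse] = [
    [("a", 10), ("b", 6), ("c", 4)],
    [("a", 8), ("d", 7), ("b", 5)],
    [("d", 9), ("c", 3), ("a", 2)],
]

for term in ("a", "b", "c", "d"):
    # Both formulations should agree for every term.
    print(term, term_error_upper_bound(SHARDS, term),
          term_error_upper_bound_by_subtraction(SHARDS, term))
```

Running it prints `a 0 0`, `b 2 2`, `c 5 5` and `d 4 4`. For term "b", the merged count is 6 + 5 = 11, but the third shard never reported "b", so its count there is at most 2 (that shard's last returned count); the true total therefore lies between 11 and 13. The subtraction form only needs each shard's last count plus the set of shards that returned the term, which is why it is convenient once results are merged.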

@colings86 colings86 closed this in 655157c Jul 25, 2014

colings86 added a commit that referenced this issue Jul 25, 2014

Closes #6696