Count distinct by field #5322

Closed
ghost opened this issue Mar 3, 2014 · 7 comments

Comments

@ghost

ghost commented Mar 3, 2014

Hi, I have a large index of tweets and need to know the number of distinct authors of a selected set of tweets (SQL: count(distinct user)). For example, I run a query fetching facets of tweets that use #elastic and need to know how many different users wrote about it. Thank you. This functionality is the only thing that MySQL gives me and Elasticsearch does not on this project.

@jpountz
Contributor

jpountz commented Mar 3, 2014

This is a feature that we plan to add to the aggregations framework, but it is taking some time because there is some infrastructure that we want to set up first so that we can implement such an aggregation efficiently. Typically, there are algorithms that only require hashes of the values in order to estimate the number of unique values, and this is something we could leverage (by pre-computing hashes instead of computing them on the fly) to make this aggregation fast.
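
To illustrate the idea of estimating unique counts from hashes alone, here is a minimal sketch of a K-Minimum-Values estimator. This is an illustrative stand-in, not necessarily the algorithm Elasticsearch uses; the names `hash64` and `KMVEstimator` and the choice of SHA-1 are assumptions made for the example. The estimator only ever sees 64-bit hashes, so they can be computed once (for example at index time) and reused.

```python
import hashlib
import heapq


def hash64(value: str) -> int:
    """Hypothetical helper: derive a 64-bit hash from a string value."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class KMVEstimator:
    """Estimates the number of distinct hashes by keeping the k smallest."""

    def __init__(self, k: int = 1024):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self._heap = []        # negated hashes, so the largest kept hash is on top
        self._members = set()  # hashes currently kept, to skip duplicates

    def add_hash(self, h: int) -> None:
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes seen: the count is exact.
            return float(len(self._heap))
        kth_smallest = -self._heap[0] / 2**64  # normalized to [0, 1)
        return (self.k - 1) / kth_smallest


# Hashes are computed once up front, then only hashes are aggregated.
precomputed = [hash64(f"user{i % 10000}") for i in range(20000)]
estimator = KMVEstimator(k=1024)
for h in precomputed:
    estimator.add_hash(h)
print(round(estimator.estimate()))
```

Memory stays bounded by `k` regardless of how many values are seen, at the cost of an approximate answer once more than `k` distinct hashes have been added.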

@davidronk

We could also use a "distinct" feature. We currently use the elasticsearch-timefacets-plugin to do a distinct date histogram (but we are restricted to a fairly old ES version and would like to upgrade). Could there be a "distinct_value_count" added to the aggregation framework (or something similar)?

@jpountz
Contributor

jpountz commented Mar 11, 2014

We definitely have plans for this. Since I last commented on this issue, we have started experimenting with an aggregation to compute unique counts under the feature/cardinality_aggregation branch. This is still a work in progress and I can't give you a release date for this feature, but we are making progress!

@davidronk

Awesome, thanks for the update! That will be very helpful!

@jpountz
Contributor

jpountz commented Mar 13, 2014

Good news: this was just pushed and will be available in Elasticsearch 1.1, see #5426!
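
For readers landing here, a request for the original use case might look roughly like the sketch below, which builds a search body using the `cardinality` aggregation. The field names `user` and `text`, the aggregation name `distinct_authors`, and the helper `distinct_count_request` are assumptions for illustration; adjust them to your own mapping. The returned count is an approximation.

```python
import json


def distinct_count_request(hashtag: str,
                           author_field: str = "user",
                           text_field: str = "text") -> dict:
    """Build a search body that counts distinct authors of matching tweets.

    Field names are hypothetical and depend on the index mapping.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"match": {text_field: hashtag}},
        "aggs": {
            "distinct_authors": {
                "cardinality": {"field": author_field}
            }
        },
    }


body = distinct_count_request("#elastic")
print(json.dumps(body, indent=2))
```

Sent to the `_search` endpoint of the tweets index, the estimated count would be read from `aggregations.distinct_authors.value` in the response.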

@jpountz jpountz closed this as completed Mar 13, 2014
@davidronk

👍

@ghost
Author

ghost commented Mar 13, 2014

Great, thank you!

