Cardinality aggregation #5426

Closed
jpountz opened this Issue Mar 13, 2014 · 0 comments

jpountz commented Mar 13, 2014

The cardinality aggregation is a metric aggregation that computes approximate unique counts based on the HyperLogLog++ algorithm. This algorithm has two nice properties: it is close to accurate on low cardinalities, and its memory usage is fixed, so estimating high cardinalities doesn't blow up memory.

Example:

{
    "aggs" : {
        "author_count" : { 
            "cardinality" : { 
                "field" : "author"
            }
        }
    }
}
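To make the fixed-memory property concrete, here is the standard HyperLogLog arithmetic. It assumes a dense array of m = 2^p registers with precision p = 14 and 6 bits per register. These are illustrative values, not necessarily what Elasticsearch uses. The relative standard error 1.04/√m comes from the original HyperLogLog analysis. Note that memory depends only on m, never on the number of distinct values seen.

```math
m = 2^{p}, \qquad \text{memory} = m \cdot 6\ \text{bits}, \qquad \sigma_{\text{rel}} \approx \frac{1.04}{\sqrt{m}}

p = 14:\quad m = 16384,\quad 16384 \cdot 6\ \text{bits} = 98304\ \text{bits} = 12\ \text{KiB},\quad \sigma_{\text{rel}} \approx \frac{1.04}{128} \approx 0.81\%
```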

jpountz added a commit that referenced this issue Mar 13, 2014

Cardinality aggregation.
This aggregation computes unique term counts using the HyperLogLog++ algorithm,
which uses linear counting to estimate low cardinalities and HyperLogLog on
higher cardinalities.
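The switch between linear counting and HyperLogLog can be sketched in Python as follows. This is a simplified illustration, not the Elasticsearch implementation. It uses the original HyperLogLog small-range rule (linear counting while the raw estimate is at most 2.5m and some registers are still empty). It omits the HLL++ refinements of empirical bias correction and a sparse representation. The precision `p = 14` is an assumption, and blake2b from the Python standard library stands in for a MurmurHash3-style 64-bit hash.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog with a linear-counting fallback (not full HLL++)."""

    def __init__(self, p: int = 14):
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = bytearray(self.m)   # fixed size (one byte per register here)
        # Bias-correction constant alpha_m, valid for m >= 128
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(value: str) -> int:
        # Assumption: 64-bit blake2b as a stand-in for MurmurHash3
        digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, value: str) -> None:
        h = self._hash64(value)
        w = 64 - self.p
        idx = h >> w                     # top p bits select the register
        rest = h & ((1 << w) - 1)        # remaining w bits
        rank = w - rest.bit_length() + 1 # 1-based position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        # Raw HyperLogLog estimate: alpha * m^2 / sum(2^-M_j)
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Low cardinality: linear counting, m * ln(m / V)
            return self.m * math.log(self.m / zeros)
        return raw
```

Adding the same values again leaves every register unchanged, so duplicates never inflate the count. The register array has the same size whether it has seen a thousand values or a billion.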

Since this algorithm works on hashes, it is useful for high-cardinality fields
to store the hash of values directly in the index, which is the purpose of
the new `murmur3` field type. This is less necessary on low-cardinality
string fields, because thanks to ordinals the aggregator computes the hash
only once per unique value per segment. It is also less necessary on numeric
fields, since hashing them is very fast.
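The ordinals point above can be illustrated with a small Python sketch. Each unique string value in a segment is hashed at most once, however many documents contain it. The segment layout is a hypothetical simplification: `terms` is the segment's sorted dictionary of unique values, and `doc_ords` lists the ordinals (indices into `terms`) for each document. The function names are illustrative. blake2b again stands in for MurmurHash3, which is not in the Python standard library.

```python
import hashlib
from typing import Dict, List, Tuple


def hash64(term: str) -> int:
    # Stand-in for MurmurHash3: 64-bit blake2b from the standard library
    return int.from_bytes(hashlib.blake2b(term.encode("utf-8"), digest_size=8).digest(), "big")


def collect_hashes(terms: List[str], doc_ords: List[List[int]]) -> Tuple[List[int], int]:
    """Return one hash per (document, value) pair and the number of hash computations.

    Hashes are cached per ordinal, so each unique term in this hypothetical
    segment is hashed at most once.
    """
    cache: Dict[int, int] = {}
    hashes: List[int] = []
    for ords in doc_ords:
        for ord_ in ords:
            if ord_ not in cache:
                cache[ord_] = hash64(terms[ord_])  # first time this value is seen
            hashes.append(cache[ord_])
    return hashes, len(cache)
```

For example, five documents over three unique authors produce six per-document hashes but only three hash computations. On a low-cardinality field this cache stays small, which is why precomputed hashes matter less there.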

Close #5426

jpountz closed this in 5821fa0 Mar 13, 2014

jpountz added v1.2.0 and removed v1.2.0 labels Mar 13, 2014