Ability to assume missing field as zero in aggregations #5298

Closed
bobrik opened this issue Feb 28, 2014 · 3 comments

@bobrik
Contributor

bobrik commented Feb 28, 2014

To keep docs reasonably small, we omit fields that have a zero value. When we use the avg or extended_stats aggregation, it would be nice to have missing values treated as zeroes too.

In the example below, we have 31 970 816 docs in the bucket, but only 7 310 of them have a non-zero value.

{
   "aggregations": {
      "country": {
         "buckets": [
            {
               "key": "RU",
               "doc_count": 31970816,
               "cents": {
                  "count": 7310,
                  "min": 8,
                  "max": 169800,
                  "avg": 514.1964432284542,
                  "sum": 3758776,
                  "sum_of_squares": 60978796462,
                  "variance": 8077434.639111836,
                  "std_deviation": 2842.0827994820693
               }
            }
         ]
      }
   }
}
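
As a rough illustration of the gap, here is a minimal client-side sketch that recomputes the stats as if every document without the field contributed a zero. Zeroes add nothing to sum or sum_of_squares, so only the divisor changes. The helper name `stats_with_zeroes` is made up for this example, and it assumes extended_stats reports population variance (which matches the numbers above).

```python
import math


def stats_with_zeroes(doc_count, value_sum, sum_of_squares):
    """Recompute avg/variance/std_deviation over doc_count documents.

    Hypothetical helper: documents without the field are treated as zeroes,
    which leave value_sum and sum_of_squares unchanged.
    """
    avg = value_sum / doc_count
    # Population variance: E[x^2] - (E[x])^2
    variance = sum_of_squares / doc_count - avg * avg
    return {
        "count": doc_count,
        "avg": avg,
        "variance": variance,
        "std_deviation": math.sqrt(variance),
    }


# Numbers copied from the "RU" bucket in the response above.
doc_count = 31970816
cents = {"count": 7310, "sum": 3758776, "sum_of_squares": 60978796462}

# Dividing by the reported count reproduces what the aggregation returned.
reported = stats_with_zeroes(cents["count"], cents["sum"], cents["sum_of_squares"])

# Dividing by doc_count gives the stats with missing values assumed to be zero.
adjusted = stats_with_zeroes(doc_count, cents["sum"], cents["sum_of_squares"])

print(reported["avg"], adjusted["avg"])
```

Note that min would also have to become 0 whenever at least one document lacks the field; the sketch leaves min and max alone.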

Maybe an additional boolean parameter could be introduced for the extended_stats and avg aggregations, like assume_zeroes?

cc @uboness

@roytmana

roytmana commented Mar 1, 2014

I was about to submit a request on this too.
I would suggest that any aggregation operating on a field should have a missing option. If specified, the aggregation should accumulate missing values under that value and honor any nested aggregations within it. It should never assume a value like 0, since that may clash with actual keys.

I was planning to show an example of the enormous query needed for a two-level aggregation that has to cover all values, including missing and other, at both levels using the missing aggregation. It can be done, but not only is the query huge and highly repetitive, the result also needs heavy post-processing to move the second-level keys nested under the missing agg into the first-level buckets.
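
To give a sense of the repetition, here is a minimal sketch of how such a request body could be assembled, written in Python for brevity. The field names `country` and `city`, the aggregation names, and the helper `level` are hypothetical; the only Elasticsearch features it relies on are the terms and missing aggregations.

```python
import copy
import json


def level(field, sub_aggs=None):
    """Build a terms agg plus a sibling missing agg on the same field.

    Hypothetical helper: both siblings carry their own copy of sub_aggs,
    which is where the duplication in the final query comes from.
    """
    aggs = {
        f"{field}_values": {"terms": {"field": field}},
        f"{field}_missing": {"missing": {"field": field}},
    }
    if sub_aggs:
        for agg in aggs.values():
            agg["aggs"] = copy.deepcopy(sub_aggs)
    return aggs


# Two levels: every first-level sibling repeats the full second-level pair.
body = {"size": 0, "aggs": level("country", level("city"))}

print(json.dumps(body, indent=2))
```

Even with two fields the second-level pair appears twice, and on the response side the buckets under `country_missing` still have to be merged by hand with those under `country_values`.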

Please please do implement missing as an option in all bucketing aggs!

I am not even asking for an option to also aggregate other (keys that were not returned because of the size parameter), although it would be very useful :-)

@jpountz
Contributor

jpountz commented Sep 5, 2014

I agree the behavior feels wrong with the avg or stats aggregations. Maybe we could support a missing option like sorting does.

@jpountz
Contributor

jpountz commented May 15, 2015

Closed through #11042. Most aggs now have a missing option that lets you configure the value to use when a document has no value for the field.
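
For the case from the original report, a request using this option might look roughly like the sketch below, built as a Python dict for readability. The field names `country` and `cents` are assumed from the aggregation names in the response above; the actual mapping is not shown in this issue.

```python
import json

# Sketch of a search body: terms buckets per country, with extended_stats
# on "cents" treating documents that lack the field as having the value 0.
body = {
    "size": 0,
    "aggs": {
        "country": {
            "terms": {"field": "country"},  # assumed field name
            "aggs": {
                "cents": {
                    "extended_stats": {
                        "field": "cents",  # assumed field name
                        "missing": 0,
                    }
                }
            },
        }
    },
}

print(json.dumps(body, indent=2))
```

With `missing` set, the count reported for `cents` should line up with the bucket's doc_count rather than only the documents that carry the field.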

@jpountz jpountz closed this as completed May 15, 2015