Make it possible to configure missing values. #11042

jpountz · 2015-05-07T16:17:11Z

Most aggregations (terms, histogram, stats, percentiles, geohash-grid) now
support a new missing option which defines the value to use when a
field does not have a value. This can be handy if, e.g., you want a terms
aggregation to treat documents that have "N/A" and documents that have no
value for a tag field the same way.

This works in a very similar way to the missing option on the sort
element.

One known issue is that this option sometimes cannot make the right decision
in the unmapped case: it needs to replace all values with the missing value
but might not know what kind of values source should be produced (numerics,
strings, geo points?). For this reason, we might want to add an unmapped_type
option in the future like we did for sorting.
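
As a rough illustration of the option described above, here is a minimal sketch that builds a search body as a Python dict. The `tag` field and the "N/A" value come from the description; the helper name `terms_agg_with_missing` and the aggregation name `tags` are made up for this example and are not part of the change.

```python
import json


def terms_agg_with_missing(field, missing_value, agg_name="tags"):
    """Build a search body whose terms aggregation treats documents
    without a value for `field` as if they had `missing_value`."""
    return {
        "size": 0,  # only the aggregation result is of interest, not the hits
        "aggs": {
            agg_name: {
                "terms": {
                    "field": field,
                    "missing": missing_value,
                }
            }
        },
    }


# Documents without a `tag` value would be counted in the "N/A" bucket.
body = terms_agg_with_missing("tag", "N/A")
print(json.dumps(body, indent=2))
```

Passing a value that does not exist in the index (instead of "N/A") would give the value-less documents a bucket of their own.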

Related to #5324

jpountz · 2015-05-07T16:23:04Z

While the API proposal here is different from the one proposed on #5324, I think it could address most use-cases and even be more generic. For instance, in some cases you might want to have a dedicated bucket for documents that are missing a value, and all you would have to do is pass a value which doesn't exist in the index (e.g. _missing, but the choice is free). In other cases, however, it might make sense to put documents that are missing a value into an existing bucket. I think a good example of that would be the N/A value for a tag field: documents that don't have a value for the tag field and documents that have this value should really be treated the same.

Also, I like that we would have a consistent behaviour in all aggregations that support this parameter (i.e. all aggregations that work on top of a field or script, except the missing aggregation), which would be consistent with sorting as well.

clintongormley · 2015-05-07T16:25:11Z

docs/reference/aggregations/bucket/datehistogram-aggregation.asciidoc

+
+==== Missing value
+
+The `missing` parameter defines how documents that miss a value should be treated.


"that are missing a value"

clintongormley · 2015-05-07T16:35:19Z

Nice work!

jpountz · 2015-05-11T08:09:01Z

Thanks @clintongormley for helping fix the docs, I pushed a new commit.

colings86 · 2015-05-11T15:07:28Z

docs/reference/aggregations/bucket/datehistogram-aggregation.asciidoc

@@ -123,3 +123,26 @@ settings and filter the returned buckets based on a `min_doc_count` setting (by
 bucket that matches documents and the last one are returned). This histogram also supports the `extended_bounds`
 setting, which enables extending the bounds of the histogram beyond the data itself (to read more on why you'd want to
 do that please refer to the explanation <<search-aggregations-bucket-histogram-aggregation-extended-bounds,here>>).
+
+==== Missing value


Would this section not fit better in the general aggregations section, since it affects (almost) every aggregation and the syntax is the same for all of them?

jpountz · 2015-05-12

I made it similar to other features like script support. While this duplicates the documentation effort, it also has the benefit of showing an example in context (also note that the examples try to be meaningful to the aggregation whenever possible).

colings86 · 2015-05-12

OK, that makes sense.

colings86 · 2015-05-11T15:20:50Z

@jpountz left a couple of minor comments

colings86 · 2015-05-14T11:24:50Z

LGTM


mrfelton · 2015-07-16T08:56:37Z

Can't get this working for the life of me. Is this in 1.6? I can't find any documentation on this feature at https://www.elastic.co/

Parse Failure [Unknown key for a VALUE_STRING in [campaign_term]: [missing].]]

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "_type:Subscription",
          "analyze_wildcard": true
        }
      }
    }
  },
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "date",
        "interval": "1M",
        "pre_zone_adjust_large_interval": true,
        "min_doc_count": 1
      },
      "aggs": {
        "campaign_term": {
          "terms": {
            "field": "context.campaign.term",
            "size": 0,
            "missing": "hr-openers"
          }
        }
      }
    }
  }
}

colings86 · 2015-07-16T09:40:46Z

@mrfelton this will be available from 2.0 onwards. The documentation for it is available on the master branch of the docs. There is a new section for each agg called 'Missing value'. For example: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-metrics-avg-aggregation.html#_missing_value

GrahamHannington · 2015-08-24T14:47:07Z

In the meantime, before 2.0, and with apologies if this has already been covered: can you specify a script in the Kibana "JSON input" field that dynamically replaces a missing field value with zero? (And can someone point me to detailed documentation of what can be specified in that field? My Google-fu has failed there, too.)

GrahamHannington · 2015-08-25T07:54:22Z

Suppose I have an Elasticsearch document with no "grade" field; the "grade" field is missing.

Suppose I have another document with a "grade" field explicitly specified as null:

"grade": null

Will the new-for-2.0 missing option apply to both documents?

jpountz · 2015-08-25T09:07:32Z

Missing and null will be considered the same by default, unless you configure a null_value in your mappings.
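
A toy Python model of that rule, for illustration only: the function name `value_seen_by_agg` and the dict-based documents are assumptions of this sketch, and it does not reflect how Elasticsearch is implemented internally.

```python
def value_seen_by_agg(doc, field, missing, null_value=None):
    """Return the value an aggregation would work with for `field`.

    - absent field: no value, so the `missing` option applies
    - explicit null: replaced by the mapping's `null_value` if one is
      configured, otherwise treated exactly like an absent field
    """
    if field not in doc:
        return missing
    if doc[field] is None:
        return null_value if null_value is not None else missing
    return doc[field]


no_field = {}
explicit_null = {"grade": None}

# Without null_value in the mapping, both documents get the missing value.
print(value_seen_by_agg(no_field, "grade", missing=0))
print(value_seen_by_agg(explicit_null, "grade", missing=0))

# With null_value configured, the explicit null has a value of its own.
print(value_seen_by_agg(explicit_null, "grade", missing=0, null_value=-1))
```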

Regarding scripting, you can indeed do that in 1.x by running the aggregation on a script (likely with a bit of a performance/memory usage hit) that would check whether the list of values is empty.

GrahamHannington · 2015-08-26T09:41:30Z

Thanks, @jpountz . Re:

> you can indeed [replace a null/missing field value with zero] in 1.x by running the aggregation of a script

Could you please either spoonfeed me (cringe, sorry) the appropriate contents of the Kibana JSON Input field, or point me to detailed documentation for specifying the contents of that field? I can write "if x is null, then set x to 0" in a few programming languages, but I lack the experience and detailed documentation I need to do that in this context (such as the surrounding JSON, the specific syntax and variable names).

clintongormley · 2015-08-26T11:00:51Z

@GrahamHannington

Not sure about the Kibana side, but here's an example (with groovy dynamic scripting) which will replace missing values with -1:

DELETE t

POST t/t/
{
  "num": 1
}
POST t/t/
{
  "num": 2
}
POST t/t/
{

}

GET t/_search?size=0
{
  "aggs": {
    "nums": {
      "histogram": {
        "interval": 1,
        "script": "doc['num'][0] == null ? -1 : doc['num'].value"
      }
    }
  }
}

You could use the expression language instead, but be aware that it doesn't support nulls, so you can't distinguish null from 0. If zeroes aren't important you can do:

GET t/_search?size=0
{
  "aggs": {
    "nums": {
      "histogram": {
        "interval": 1,
        "script": "doc['num'].value || -1",
        "lang": "expression"
      }
    }
  }
}

GrahamHannington · 2015-08-31T04:20:44Z

Thanks again, @jpountz .

I need the average (avg) aggregation to include documents with null or missing field values in its count, and treat those null or missing field values as zero. Otherwise, I get (what I consider to be) skewed averages.

For example, suppose I have the following five Elasticsearch documents, where Tn is a timestamp value, and grade is the name of a field on which I want to perform an average calculation:

Timestamp	grade
T1	null or missing
T2	10
T3	null or missing
T4	10
T5	null or missing

Currently, when I use an average aggregation in a visualization, a bucket that includes T1 - T5 shows the average grade as 10:

(10 + 10) / 2 = 10

(that is, it skips the documents with null or missing grade)

whereas I want it to show 4 (to include the documents with null or missing grade, and treat grade as zero):

(0 + 10 + 0 + 10 + 0) / 5 = 4
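
The two calculations above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic, with `None` standing in for a null or missing grade; the function names are made up for the example and nothing here is run by Kibana or Elasticsearch.

```python
# Grades for T1..T5; None stands for "null or missing".
grades = [None, 10, None, 10, None]


def avg_skipping_missing(values):
    """Average over the documents that have a value (current behaviour)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


def avg_with_missing(values, missing=0):
    """Average over all documents, substituting `missing` for absent values."""
    filled = [missing if v is None else v for v in values]
    return sum(filled) / len(filled)


print(avg_skipping_missing(grades))  # 10.0
print(avg_with_missing(grades))      # 4.0
```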

However, I have so far been unable to trap null field values via the Kibana JSON Input field.

I suspect (I could be wrong) that what is happening is that Kibana (more specifically, Elasticsearch; but I'm doing all of this through the Kibana user interface) skips the documents with null or missing field values, and so those documents never "reach" the JSON Input field value.

I can use the following JSON Input field value to override the values of fields that are present (say, replace 10 with 20):

{ "script": "10 ? 20 : _value" }

but the following has no effect:

{ "script": "null ? 20 : _value" }

Similarly, neither does this, possibly unfaithfully transcribed from your suggestion (much appreciated, thank you):

{ "script" : "doc['a'][0] == null ? 0 : doc['a'].value" }

I'd appreciate some more advice here. I'd like to have a workaround (before 2.0 arrives) for these skewed averages that doesn't involve re-loading the (currently, deliberately "sparse") data with explicit zero field values, even if that workaround involves a performance hit on large data sets (as I imagine this script-based approach would; so far, I've only tested it on very small indices).

jpountz · 2015-09-02T07:31:06Z

Unfortunately, this can't be done today because Kibana requires you to configure a field and then merges the agg definition with the value in the json input, which makes elasticsearch run the script on every value instead of every document.
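
To make the difference concrete, here is a toy Python model of the two ways a script can be invoked. It is a sketch under simplifying assumptions (documents as dicts holding a list of values, scripts as plain Python callables, hypothetical helper names) and not a description of Elasticsearch internals.

```python
# Each document holds a list of values for the field; an empty list
# means the document has no value (null or missing).
docs = [
    {"grade": []},
    {"grade": [10]},
    {"grade": []},
    {"grade": [10]},
    {"grade": []},
]


def run_per_value(docs, field, script):
    """Call `script` once per existing value (field plus script).
    Documents without any value never reach the script."""
    return [script(value) for doc in docs for value in doc[field]]


def run_per_document(docs, field, script):
    """Call `script` once per document, handing it the possibly
    empty list of values so it can substitute a default."""
    return [script(doc[field]) for doc in docs]


per_value = run_per_value(docs, "grade", lambda v: 0 if v is None else v)
per_doc = run_per_document(docs, "grade", lambda vs: vs[0] if vs else 0)

print(per_value)  # [10, 10]
print(per_doc)    # [0, 10, 0, 10, 0]
```

In the per-value case the null check inside the script is never triggered, which matches the observation that the JSON input scripts had no effect on documents without a grade.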

lmath · 2021-06-01T15:14:39Z

We came across this feature of configuring missing values while looking at the Terms Aggregation docs and were excited to use it with rollup search, but it doesn't seem to be available yet for rollup search. @polyfractal do you know whether configuring missing values is available for rollup search, or whether there is some other way to search for missing values?

jpountz added v2.0.0-beta1 review :Analytics/Aggregations Aggregations labels May 7, 2015

clintongormley reviewed May 7, 2015

colings86 reviewed May 11, 2015

jccq mentioned this pull request May 12, 2015

Add "missing" and "other" values to terms agg elastic/kibana#1961

Closed

jpountz force-pushed the feature/aggs_missing branch from 8698a26 to 32e23b9 Compare May 15, 2015 14:32

jpountz added a commit that referenced this pull request May 15, 2015

Merge pull request #11042 from jpountz/feature/aggs_missing

bf599d6

Aggs: Make it possible to configure missing values.

jpountz merged commit bf599d6 into elastic:master May 15, 2015

jpountz deleted the feature/aggs_missing branch May 15, 2015 14:33

kevinkluge removed the review label May 15, 2015

This was referenced May 15, 2015

Ability to assume missing field as zero in aggregations #5298

Closed

null buckets missing from terms aggregation #6273

Closed

clintongormley added >feature release highlight labels May 15, 2015

jpountz mentioned this pull request May 21, 2015

Add support for "missing" to all bucket aggregations #5324

Closed

clintongormley changed the title ~~Aggs: Make it possible to configure missing values.~~ Make it possible to configure missing values. Jun 6, 2015

spalger mentioned this pull request Sep 9, 2015

Incorrect bar ordering for unique count with terms sub aggregation elastic/kibana#3314

Closed
