Aggregations using "missing" lead to AggregationExecutionException because they use wrong datatype if the field was never stored in an index #20163
Elasticsearch version: 2.1.2 and 2.3.5
Plugins installed: [none]
OS version: Windows 10
Description of the problem including expected versus actual behavior:
We are working on a feature where different fields can be added to documents, and we want them to be handled as double values.
However, we cannot say up front which datatype these fields should have, so we cannot define the datatypes directly and instead need to use dynamic templates in the mapping. Our mapping is something like the following:
When using aggregations together with a missing value, we found that the returned aggregation key differs depending on whether the field was ever stored in the index.
Also, if we query across two indices, one that contains a value for that field and one that does not, we get the following exception when doing a terms bucket aggregation together with the missing attribute:
In the test case below, the 2nd query returns the string "-1", although ideally we would get the Double value -1.0. The 3rd query throws the exception.
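As a rough illustration only (this is not the mapping from this report), a dynamic template that maps newly seen fields to double could be built as in the sketch below. The type name `doc`, the template name `custom_fields_as_double`, and the `custom_*` match pattern are assumptions chosen for the example.

```python
import json


def double_template_mapping(doc_type="doc", pattern="*"):
    """Build an index-creation body whose dynamic template maps
    every new field matching `pattern` to the double datatype.

    doc_type and the template name are hypothetical placeholders.
    """
    return {
        "mappings": {
            doc_type: {
                "dynamic_templates": [
                    {
                        "custom_fields_as_double": {
                            "match": pattern,
                            "mapping": {"type": "double"},
                        }
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(double_template_mapping(pattern="custom_*"), indent=2))
```

With a template like this, the field only gets its double datatype once a document containing it is indexed, which is why an index that never saw the field has no type information for it.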
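For readers without the original test case at hand, a request of the kind described here could be built roughly as follows. This is a sketch under assumptions: the field name `custom_value` and the aggregation name `by_value` are hypothetical, and only the `terms` aggregation with its `missing` parameter is taken from the report.

```python
import json


def terms_agg_with_missing(field, missing, agg_name="by_value"):
    """Build a search body with a terms aggregation that puts documents
    lacking `field` into a bucket keyed by `missing`.

    agg_name is a hypothetical placeholder.
    """
    return {
        "size": 0,  # only the aggregation result is of interest
        "aggs": {
            agg_name: {
                "terms": {"field": field, "missing": missing}
            }
        },
    }


if __name__ == "__main__":
    # The same body would be sent to one index, or to several indices at once.
    print(json.dumps(terms_agg_with_missing("custom_value", -1), indent=2))
```

Sending this body to an index where the field was never stored corresponds to the 2nd query, and sending it across both indices corresponds to the 3rd.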
Steps to reproduce:
The strange thing here is that filtering and querying work fine, and aggregations also work fine unless the missing attribute is specified.
It also happens with non-dynamic mappings if one index defines the field with a datatype and the other does not.
The full stacktrace is:
Yes, that seems to be the case, yet it is somewhat unexpected to get an exception here; it invalidates our use case around aggregations with missing.
One option would be to return the same datatype that the "missing" value had. That would make it work for numeric values, but probably not for dates, where there is no separate JSON datatype.
The full solution would be to resolve the datatype of the field against the full index mapping, including the dynamic_template, similar to what happens when the field is stored for the first time in an index. I am not sure whether this is feasible, though.
A third way could be to allow specifying a "missing_type", where the user can define the datatype for any index in which the field is missing completely.
We just discussed this issue in FixitFriday. Sorting has this feature through the
I could actually come up with a small workaround for "Long" and "Date" fields when using terms aggregations: the "value_type" option, which is originally intended for scripts, is always evaluated and thus allows "coercing" the type of the value accordingly.
It has the "side effect" of making the returned value for the "missing" value a "Long" even if the field has not yet been created in the index.
However it does not work for Double (my main use case) due to
I think anything more will require code changes in Elasticsearch to get it working.
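The "value_type" workaround described above might look like the following sketch. The field name `custom_count` and the aggregation name `by_value` are hypothetical; the placement of `value_type` next to `missing` inside the terms aggregation follows the description in this comment and should be verified against the Elasticsearch version in use.

```python
import json


def terms_agg_with_value_type(field, missing, value_type, agg_name="by_value"):
    """Build a terms aggregation that also passes value_type, so the
    missing value is coerced even where the field is unmapped.

    Per the comment above, this helped for "long" and "date" only.
    """
    return {
        "size": 0,
        "aggs": {
            agg_name: {
                "terms": {
                    "field": field,
                    "missing": missing,
                    "value_type": value_type,
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(terms_agg_with_value_type("custom_count", -1, "long"), indent=2))
```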