Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregations using "missing" lead to AggregationExecutionException because they use wrong datatype if the field was never stored in an index #20163

Closed
centic9 opened this Issue Aug 25, 2016 · 5 comments

Comments

Projects
None yet
3 participants
@centic9
Copy link
Contributor

centic9 commented Aug 25, 2016

Elasticsearch version: 2.1.2 and 2.3.5

Plugins installed: [none]

JVM version:1.8.0_80

OS version:Windows 10

Description of the problem including expected versus actual behavior:

We are working on a feature where different fields can be added to documents and we want them to be handled as double value.

However we cannot say upfront which datatype these items should have and thus cannot define datatypes directly, but rather need to use dynamic templates in the mapping, so our mapping is something like the following:

{
    "testtype": {
        "dynamic_templates": [{
            "num_fields": {
                "match": "amount*",
                "mapping": {
                    "type": "double"
                }
            }
        }]
    }
}

When using aggregations together with specifying a missing-value, we found that the returned aggregation-key is different depending upon if the field was ever stored in the index or not.

Also if we query across two indices, one which contains a value for that field and one which does not, we get the following exceptions when doing a Bucket-Term-Aggregation together with a missing-attribute:

AggregationExecutionException[Merging/Reducing the aggregations failed when computing the aggregation [ Name: testagg, Type: terms ] because: the field you gave in the aggregation query existed as two different types in two different indices]

In the test-case below, the 2nd query returns a string "-1", although ideally we would get a Double value -1.0. In the 3rd query we get the exception.

Steps to reproduce:

  1. See test-case below
public class OldIndexWithoutMappingTest extends ESIntegTestCase {
    @Test
    public void testTwoIndicesOneWithoutMapping() throws Exception {
        initializeIndex();

        // aggregate by terms with a missing-value set
        final TermsBuilder agg = AggregationBuilders.terms("testagg").field("amount-high").missing(-1);

        // first try with first index only, here the "missing" value is returned as double
        SearchResponse searchResponse = client().prepareSearch("testindex-1").addAggregation(agg).execute().actionGet();
        assertEquals(0, searchResponse.getFailedShards());
        assertEquals(2, searchResponse.getHits().getTotalHits());
        assertEquals(2, ((Terms)searchResponse.getAggregations().get("testagg")).getBuckets().size());
        assertEquals(1234.0, ((Terms)searchResponse.getAggregations().get("testagg")).getBucketByKey("1234.0").getKey());
        assertEquals(-1.0, ((Terms)searchResponse.getAggregations().get("testagg")).getBucketByKey("-1.0").getKey());

        // then with second index only, here the missing value is returned as string!
        searchResponse = client().prepareSearch("testindex-2").addAggregation(agg).execute().actionGet();
        assertEquals(0, searchResponse.getFailedShards());
        assertEquals(1, searchResponse.getHits().getTotalHits());
        assertEquals(1, ((Terms)searchResponse.getAggregations().get("testagg")).getBuckets().size());
        assertEquals("-1", ((Terms)searchResponse.getAggregations().get("testagg")).getBucketByKey("-1").getKey());

        // and finally with both indices, here it fails because we take the missing value from both indices and these are different!
        searchResponse = client().prepareSearch("testindex*").addAggregation(agg).execute().actionGet();
        assertEquals(0, searchResponse.getFailedShards());
        assertEquals(2, searchResponse.getHits().getTotalHits());
        assertEquals(2, ((Terms)searchResponse.getAggregations().get("testagg")).getBuckets().size());
    }

    private void initializeIndex() throws Exception {
        String testMapping = getMapping(XContentFactory.jsonBuilder().startObject().startObject("testtype")).string();
        System.out.println("Mapping: " + testMapping);

        assertAcked(prepareCreate("testindex-1").addMapping("testtype", testMapping));
        assertAcked(prepareCreate("testindex-2").addMapping("testtype", testMapping));

        List<IndexRequestBuilder> indexBuilders = new ArrayList<>();
        indexBuilders.add(client().prepareIndex("testindex-1", "testtype", "index1-1").setSource("{\"amount-high\":1234}"));
        indexBuilders.add(client().prepareIndex("testindex-1", "testtype", "index1-2").setSource("{}"));
        indexBuilders.add(client().prepareIndex("testindex-2", "testtype", "index2-1").setSource("{}"));
        indexRandom(true, indexBuilders);
    }

    private XContentBuilder getMapping(XContentBuilder testtype) throws IOException {
        return testtype.startArray("dynamic_templates")
                    .startObject()
                        .startObject("num_fields")
                            .field("match", "amount*")
                            .startObject("mapping")
                                .field("type", "double").endObject()
                        .endObject()
                    .endObject()
                .endArray();
    }
}
@centic9

This comment has been minimized.

Copy link
Contributor Author

centic9 commented Aug 25, 2016

The strange thing here is that filtering/querying works fine, also aggregations work fine unless the missing-attribute is specified.

And it also happens with non-dynamic mappings if one index has a field defined with a datatype and the other not.

The full stacktrace is:

Failed to execute phase [merge], [reduce] 

    at __randomizedtesting.SeedInfo.seed([3052F79434FAD641:8EABC959AFC832B0]:0)
    at org.elasticsearch.action.search.SearchDfsQueryThenFetchAsyncAction$3.onFailure(SearchDfsQueryThenFetchAsyncAction.java:212)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: AggregationExecutionException[Merging/Reducing the aggregations failed when computing the aggregation [ Name: testagg, Type: terms ] because: the field you gave in the aggregation query existed as two different types in two different indices]
    at org.elasticsearch.search.aggregations.bucket.terms.InternalTerms.doReduce(InternalTerms.java:188)
    at org.elasticsearch.search.aggregations.InternalAggregation.reduce(InternalAggregation.java:153)
    at org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:170)
    at org.elasticsearch.search.controller.SearchPhaseController.merge(SearchPhaseController.java:411)
    at org.elasticsearch.action.search.SearchDfsQueryThenFetchAsyncAction$3.doRun(SearchDfsQueryThenFetchAsyncAction.java:198)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    ... 3 more
@clintongormley

This comment has been minimized.

Copy link
Member

clintongormley commented Aug 25, 2016

@centic9 the problem is that we don't know what data type the missing field is supposed to have, and we need to know the datatype in order to be able to merge the agg results from each node.

@centic9

This comment has been minimized.

Copy link
Contributor Author

centic9 commented Aug 25, 2016

Yes, seems to be the case, yet it is somewhat unexpected to get an exveption here, this invalidates our use-case around aggs with missing..

One option would be to return the same datatype that the "missing" had, that would make it work for numeric values, but probably not for dates wher there is no separate json datatype.

The full solution would be to resolve the datatype of the field against the full index mapping including the dynamic_template similar to when the field is stored the first time for an index, not sure if this is feasible, though.

A third way could be to allow to specify "missing_type" where the user can define the datatype for any index where the field is missing completely.

@jpountz

This comment has been minimized.

Copy link
Contributor

jpountz commented Aug 26, 2016

We just discussed this issue in FixitFriday. Sorting has this feature through the unmapped_type option, which would be one way to fix this issue. But then there are also some concerns that every aggregation supporting the missing option would need to support this additional parameter, so we'd like to collect more data about the problem before moving forward to have a better idea of how much this is an issue and whether there are other ways that we could fix it.

@centic9

This comment has been minimized.

Copy link
Contributor Author

centic9 commented Aug 30, 2016

I could actually come up with sort of a small workaround for "Long" and "Date" fields when using terms-aggregations via the "value_type" that is originally intended for scripts, but is evaluated always and thus allows to "coerce" the type of the value accordingly.

It has the "side effect" of making the returned value for the "missing" value a "Long" even if the field is not yet created as field for the index.

However it does not work for Double (my main use case) due to ValuesSource.Numeric.EMPTY which always returns "false" for isFloatingPoint() and thus prevents values of type "double".

I think anything more will require code-changes in Elasticsearch to get it working.

jpountz added a commit to jpountz/elasticsearch that referenced this issue Dec 14, 2016

`value_type` is useful regardless of scripting.
Today we only expose `value_type` in scriptable aggregations, however it is
also useful with unmapped fields. I suspect we never noticed because
`value_type` was not documented (fixed) and most aggregations are scriptable.

Closes elastic#20163

jpountz added a commit that referenced this issue Dec 22, 2016

`value_type` is useful regardless of scripting. (#22160)
Today we only expose `value_type` in scriptable aggregations, however it is
also useful with unmapped fields. I suspect we never noticed because
`value_type` was not documented (fixed) and most aggregations are scriptable.

Closes #20163

jpountz added a commit that referenced this issue Dec 22, 2016

`value_type` is useful regardless of scripting. (#22160)
Today we only expose `value_type` in scriptable aggregations, however it is
also useful with unmapped fields. I suspect we never noticed because
`value_type` was not documented (fixed) and most aggregations are scriptable.

Closes #20163
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.