
Aggregations: toXContent response contains duplicate fields #33514

Closed
spinscale opened this issue Sep 7, 2018 · 1 comment · Fixed by #33528

@spinscale
Contributor

While debugging a watcher issue, I encountered a problem where serializing a search response threw an exception complaining about duplicate fields. Checking the JSON confirmed that it did contain duplicate fields. However, I do not understand why this is not an issue when writing out a search response in the REST layer; maybe the XContentHelper.convertToMap implementation is the culprit.

This happened on 6.3.2, but I also reproduced it on every version up to master.

This is the test with which I was able to reproduce the issue:

    public void testFieldGetsWrittenOutTwice() throws Exception {
        // you need to add an additional index in order to trigger this (or potentially a shard)
        createIndex("foo_1");

        XContentBuilder builder = jsonBuilder().startObject().startObject("properties")
            .startObject("@timestamp").field("type", "date").endObject()
            .startObject("license").startObject("properties")
            .startObject("count").field("type", "long").endObject()
            .startObject("partnumber").field("type", "text").startObject("fields").startObject("keyword")
            .field("type", "keyword").field("ignore_above", 256)
            .endObject().endObject().endObject()
            .endObject().endObject().endObject().endObject();
        assertAcked(client().admin().indices().prepareCreate("foo_2")
            .addMapping("doc", builder).get());

        XContentBuilder docBuilder = jsonBuilder().startObject()
            .startObject("license").field("partnumber", "foobar").field("count", 2).endObject()
            .field("@timestamp", "2018-07-08T08:07:00.599Z")
            .endObject();

        client().prepareIndex("foo_2", "doc").setSource(docBuilder).setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE).get();

        client().admin().indices().prepareRefresh().get();

        TermsAggregationBuilder groupByLicenseAgg = AggregationBuilders.terms("group_by_license_partnumber").field("license.partnumber.keyword");
        MaxBucketPipelineAggregationBuilder peakPipelineAggBuilder =
            PipelineAggregatorBuilders.maxBucket("peak", "licenses_per_day>total_licenses");
        SumAggregationBuilder sumAggBuilder = AggregationBuilders.sum("total_licenses").field("license.count");
        DateHistogramAggregationBuilder licensePerDayBuilder =
            AggregationBuilders.dateHistogram("licenses_per_day").field("@timestamp").dateHistogramInterval(DateHistogramInterval.DAY);
        licensePerDayBuilder.subAggregation(sumAggBuilder);
        groupByLicenseAgg.subAggregation(licensePerDayBuilder);
        groupByLicenseAgg.subAggregation(peakPipelineAggBuilder);

        SearchResponse response = client().prepareSearch("foo_*").setSize(0).addAggregation(groupByLicenseAgg).get();
        // you can see in the logging output that there are actually two peak fields
        // I do not yet understand why this is not a problem in our rest layer but only in this case
        logger.info(Strings.toString(response));
        // conversion to a map fails because there are two `peak` fields
        BytesReference bytes = XContentHelper.toXContent(response, XContentType.JSON, false);
        XContentHelper.convertToMap(bytes, false, XContentType.JSON);
    }
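To see in isolation why the duplicated field breaks the final conversion step, here is a minimal, self-contained sketch. It uses a hypothetical `toMap` helper, not the actual `XContentHelper` API, but mirrors the same behavior: a strict conversion from a field stream to a map must reject the second `peak` key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DuplicateFieldDemo {
    // Build a map from (name, value) pairs, rejecting duplicates the way
    // a strict JSON-to-map conversion does.
    static Map<String, Object> toMap(String[][] fields) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (String[] field : fields) {
            if (map.put(field[0], field[1]) != null) {
                throw new IllegalStateException("Duplicate field: " + field[0]);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // The buggy aggregation response effectively contains "peak" twice.
        String[][] fields = {{"doc_count", "1"}, {"peak", "{...}"}, {"peak", "{...}"}};
        try {
            toMap(fields);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Duplicate field: peak
        }
    }
}
```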
polyfractal added a commit to polyfractal/elasticsearch that referenced this issue Sep 7, 2018
UnmappedTerms aggs try to delegate reduction to a sibling object that is
not unmapped.  That delegated agg will run the reductions, and also
reduce any pipeline aggs.  But because delegation comes before running
pipelines, the UnmappedTerms _also_ tries to run pipeline aggs.

This causes the pipeline to run twice, potentially doubling its output
in buckets, which can create invalid JSON and break when converting to
maps.

This fixes the issue by toggling a flag in UnmappedTerms if it delegated
away reduction so that it knows not to run pipeline aggs either.

Closes elastic#33514
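The delegation bug described in this commit message can be illustrated with a toy model. The class and field names below (`UnmappedTerms`, `delegated`, `pipelineRuns`) are hypothetical stand-ins, not the real InternalAggregation code: an unmapped agg hands reduction to a mapped sibling, and without the `delegated` flag both objects would run the pipeline step, doubling its output.

```java
import java.util.ArrayList;
import java.util.List;

public class ReduceSketch {
    static int pipelineRuns = 0;

    static abstract class Agg {
        abstract boolean isMapped();

        // Reduce the sibling list, then run pipeline aggs once.
        void reduce(List<Agg> siblings) {
            doReduce(siblings);
            runPipelines();
        }

        void doReduce(List<Agg> siblings) {}

        void runPipelines() { pipelineRuns++; }
    }

    static class MappedTerms extends Agg {
        boolean isMapped() { return true; }
    }

    static class UnmappedTerms extends Agg {
        boolean delegated = false;

        boolean isMapped() { return false; }

        @Override
        void doReduce(List<Agg> siblings) {
            // Delegate reduction to the first mapped sibling, if any.
            for (Agg agg : siblings) {
                if (agg.isMapped()) {
                    agg.reduce(siblings);
                    delegated = true;
                    return;
                }
            }
        }

        @Override
        void runPipelines() {
            // The fix: if we delegated away reduction, the sibling already
            // ran the pipelines, so we must not run them again.
            if (!delegated) {
                super.runPipelines();
            }
        }
    }

    public static void main(String[] args) {
        List<Agg> siblings = new ArrayList<>();
        siblings.add(new UnmappedTerms());
        siblings.add(new MappedTerms());
        siblings.get(0).reduce(siblings);
        // Without the delegated flag this would print 2, i.e. a duplicated
        // pipeline result such as the second "peak" field.
        System.out.println("pipeline runs: " + pipelineRuns); // pipeline runs: 1
    }
}
```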
@nik9000 added the :Analytics/Aggregations label Sep 7, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

polyfractal added a commit that referenced this issue Sep 26, 2018
Previously, unmapped aggs try to delegate reduction to a sibling agg that is
mapped.  That delegated agg will run the reductions, and also
reduce any pipeline aggs.  But because delegation comes before running
pipelines, the unmapped agg _also_ tries to run pipeline aggs.

This causes the pipeline to run twice, potentially doubling its output
in buckets, which can create invalid JSON (e.g. the same key appearing
multiple times) and break when converting to maps.

This commit fixes the issue by sorting the list of aggregations ahead of
time so that mapped aggs appear first, meaning they preferentially lead the
reduction.  If all aggs are unmapped, the first unmapped agg simply creates
a new unmapped object and returns that for the reduction.

This means that unmapped aggs no longer defer, and there is no chance of
a secondary execution of pipelines (or other side effects caused by deferring
execution).

Closes #33514
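The reworked fix can likewise be sketched with hypothetical names (again, not the real Elasticsearch classes): instead of having unmapped aggs delegate, sort the shard-level aggregation results so that a mapped agg, if one exists, always leads the reduction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortedReduceSketch {
    // A shard-level aggregation result; "mapped" means the target field
    // existed on that shard's index.
    record Agg(String name, boolean mapped) {}

    // Pick which agg leads the reduction: mapped aggs sort first, so an
    // unmapped agg leads only when every agg is unmapped.
    static Agg pickLeader(List<Agg> aggs) {
        List<Agg> sorted = new ArrayList<>(aggs);
        sorted.sort(Comparator.comparing((Agg a) -> !a.mapped()));
        return sorted.get(0);
    }

    public static void main(String[] args) {
        // Mirrors the test above: foo_1 has no mapping, foo_2 does.
        List<Agg> shardResults = List.of(
            new Agg("foo_1", false),   // unmapped shard result
            new Agg("foo_2", true));   // mapped shard result
        System.out.println(pickLeader(shardResults).name()); // foo_2
    }
}
```

Because the leader is always the mapped agg when one exists, there is no delegation step and therefore no second object left around to re-run the pipelines.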
polyfractal added a commit that referenced this issue Sep 26, 2018
polyfractal added a commit that referenced this issue Sep 26, 2018
kcm pushed a commit that referenced this issue Oct 30, 2018