
Aggregations: toXContent response contains duplicate fields #33514

Closed
spinscale opened this issue Sep 7, 2018 · 1 comment · Fixed by #33528

@spinscale
Contributor

While debugging a watcher issue, I encountered a problem where serializing a search response threw an exception complaining about duplicate fields. Checking the JSON confirmed that it did contain duplicate fields. However, I do not understand why this is not an issue when writing out a search response in the REST layer; maybe the XContentHelper.convertToMap implementation is the culprit.

This happened on 6.3.2, but I also reproduced it on every version up to master.

This is the test with which I was able to reproduce the issue:

    public void testFieldGetsWrittenOutTwice() throws Exception {
        // you need to add an additional index in order to trigger this (or potentially a shard)
        createIndex("foo_1");

        XContentBuilder builder = jsonBuilder().startObject().startObject("properties")
            .startObject("@timestamp").field("type", "date").endObject()
            .startObject("license").startObject("properties")
            .startObject("count").field("type", "long").endObject()
            .startObject("partnumber").field("type", "text").startObject("fields").startObject("keyword")
            .field("type", "keyword").field("ignore_above", 256)
            .endObject().endObject().endObject()
            .endObject().endObject().endObject().endObject();
        assertAcked(client().admin().indices().prepareCreate("foo_2")
            .addMapping("doc", builder).get());

        XContentBuilder docBuilder = jsonBuilder().startObject()
            .startObject("license").field("partnumber", "foobar").field("count", 2).endObject()
            .field("@timestamp", "2018-07-08T08:07:00.599Z")
            .endObject();

        client().prepareIndex("foo_2", "doc").setSource(docBuilder).setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE).get();

        client().admin().indices().prepareRefresh().get();

        TermsAggregationBuilder groupByLicenseAgg = AggregationBuilders.terms("group_by_license_partnumber").field("license.partnumber.keyword");
        MaxBucketPipelineAggregationBuilder peakPipelineAggBuilder =
            PipelineAggregatorBuilders.maxBucket("peak", "licenses_per_day>total_licenses");
        SumAggregationBuilder sumAggBuilder = AggregationBuilders.sum("total_licenses").field("license.count");
        DateHistogramAggregationBuilder licensePerDayBuilder =
            AggregationBuilders.dateHistogram("licenses_per_day").field("@timestamp").dateHistogramInterval(DateHistogramInterval.DAY);
        licensePerDayBuilder.subAggregation(sumAggBuilder);
        groupByLicenseAgg.subAggregation(licensePerDayBuilder);
        groupByLicenseAgg.subAggregation(peakPipelineAggBuilder);

        SearchResponse response = client().prepareSearch("foo_*").setSize(0).addAggregation(groupByLicenseAgg).get();
        // you can see in the logging output that there are actually two peak fields
        // I do not yet understand why this is not a problem in our rest layer but only in this case
        logger.info(Strings.toString(response));
        // conversion to a map fails because there are two `peak` fields
        BytesReference bytes = XContentHelper.toXContent(response, XContentType.JSON, false);
        XContentHelper.convertToMap(bytes, false, XContentType.JSON);
    }
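To see in isolation why the duplicated field breaks the final conversion step, here is a minimal, self-contained sketch. It uses a hypothetical `toMap` helper, not the actual `XContentHelper` API, but mirrors the same behavior: a strict conversion from a field stream to a map must reject the second `peak` key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DuplicateFieldDemo {
    // Build a map from (name, value) pairs, rejecting duplicates the way
    // a strict JSON-to-map conversion does.
    static Map<String, Object> toMap(String[][] fields) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (String[] field : fields) {
            if (map.put(field[0], field[1]) != null) {
                throw new IllegalStateException("Duplicate field: " + field[0]);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // The buggy aggregation response effectively contains "peak" twice.
        String[][] fields = {{"doc_count", "1"}, {"peak", "{...}"}, {"peak", "{...}"}};
        try {
            toMap(fields);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Duplicate field: peak
        }
    }
}
```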
polyfractal added a commit to polyfractal/elasticsearch that referenced this issue Sep 7, 2018
UnmappedTerms aggs try to delegate reduction to a sibling object that is
not unmapped.  That delegated agg will run the reductions, and also
reduce any pipeline aggs.  But because delegation comes before running
pipelines, the UnmappedTerms _also_ tries to run pipeline aggs.

This causes the pipeline to run twice, potentially doubling its output
in buckets, which can create invalid JSON and break when converting to
maps.

This fixes the issue by toggling a flag in UnmappedTerms if it delegated
away reduction so that it knows not to run pipeline aggs either.

Closes elastic#33514
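The delegation bug described in this commit message can be illustrated with a toy model. The class and field names below (`UnmappedTerms`, `delegated`, `pipelineRuns`) are hypothetical stand-ins, not the real InternalAggregation code: an unmapped agg hands reduction to a mapped sibling, and without the `delegated` flag both objects would run the pipeline step, doubling its output.

```java
import java.util.ArrayList;
import java.util.List;

public class ReduceSketch {
    static int pipelineRuns = 0;

    static abstract class Agg {
        abstract boolean isMapped();

        // Reduce the sibling list, then run pipeline aggs once.
        void reduce(List<Agg> siblings) {
            doReduce(siblings);
            runPipelines();
        }

        void doReduce(List<Agg> siblings) {}

        void runPipelines() { pipelineRuns++; }
    }

    static class MappedTerms extends Agg {
        boolean isMapped() { return true; }
    }

    static class UnmappedTerms extends Agg {
        boolean delegated = false;

        boolean isMapped() { return false; }

        @Override
        void doReduce(List<Agg> siblings) {
            // Delegate reduction to the first mapped sibling, if any.
            for (Agg agg : siblings) {
                if (agg.isMapped()) {
                    agg.reduce(siblings);
                    delegated = true;
                    return;
                }
            }
        }

        @Override
        void runPipelines() {
            // The fix: if we delegated away reduction, the sibling already
            // ran the pipelines, so we must not run them again.
            if (!delegated) {
                super.runPipelines();
            }
        }
    }

    public static void main(String[] args) {
        List<Agg> siblings = new ArrayList<>();
        siblings.add(new UnmappedTerms());
        siblings.add(new MappedTerms());
        siblings.get(0).reduce(siblings);
        // Without the delegated flag this would print 2, i.e. a duplicated
        // pipeline result such as the second "peak" field.
        System.out.println("pipeline runs: " + pipelineRuns); // pipeline runs: 1
    }
}
```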
@nik9000 added the :Analytics/Aggregations label Sep 7, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

polyfractal added a commit that referenced this issue Sep 26, 2018
Previously, unmapped aggs try to delegate reduction to a sibling agg that is
mapped.  That delegated agg will run the reductions, and also
reduce any pipeline aggs.  But because delegation comes before running
pipelines, the unmapped agg _also_ tries to run pipeline aggs.

This causes the pipeline to run twice, potentially doubling its output
in buckets, which can create invalid JSON (e.g. the same key appearing
multiple times) and break when converting to maps.

This commit fixes the issue by sorting the list of aggregations ahead of
time so that mapped aggs appear first, meaning they preferentially lead the
reduction.  If all aggs are unmapped, the first unmapped agg simply creates
a new unmapped object and returns that for the reduction.

This means that unmapped aggs no longer defer, and there is no chance of
a secondary execution of pipelines (or other side effects caused by deferring
execution).

Closes #33514
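The reworked fix can likewise be sketched with hypothetical names (again, not the real Elasticsearch classes): instead of having unmapped aggs delegate, sort the shard-level aggregation results so that a mapped agg, if one exists, always leads the reduction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortedReduceSketch {
    // A shard-level aggregation result; "mapped" means the target field
    // existed on that shard's index.
    record Agg(String name, boolean mapped) {}

    // Pick which agg leads the reduction: mapped aggs sort first, so an
    // unmapped agg leads only when every agg is unmapped.
    static Agg pickLeader(List<Agg> aggs) {
        List<Agg> sorted = new ArrayList<>(aggs);
        sorted.sort(Comparator.comparing((Agg a) -> !a.mapped()));
        return sorted.get(0);
    }

    public static void main(String[] args) {
        // Mirrors the test above: foo_1 has no mapping, foo_2 does.
        List<Agg> shardResults = List.of(
            new Agg("foo_1", false),   // unmapped shard result
            new Agg("foo_2", true));   // mapped shard result
        System.out.println(pickLeader(shardResults).name()); // foo_2
    }
}
```

Because the leader is always the mapped agg when one exists, there is no delegation step and therefore no second object left around to re-run the pipelines.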
polyfractal added a commit that referenced this issue Sep 26, 2018
polyfractal added a commit that referenced this issue Sep 26, 2018
kcm pushed a commit that referenced this issue Oct 30, 2018