Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix multi-value handling in composite agg #88638

Merged
merged 3 commits into from Jul 20, 2022

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Jul 20, 2022

Multi-valued number fields are handled differently by the composite terms agg in comparison to the regular terms agg. In particular, we use set semantics when grouping on multi-valued numeric fields using terms / multi-terms aggs (see e.g.

) while we don't do the same for composite terms aggs. The following example in particular shows some discrepancies, which are fixed in this PR:

DELETE test

PUT test/_doc/id1?refresh=true
{
  "foo" : [1, 1]
}

GET test/_search
{
  "docvalue_fields": ["foo"],
  "aggs": {
    "terms_foo": {
      "terms": {
        "field": "foo"
      },
      "aggs": {
        "stats": {
          "sum": {
            "field": "foo"
          }
        }
      }
    },
    "comp_foo": {
      "composite": {
        "sources": [
          {
            "foo": {
              "terms": {
                "field": "foo"
              }
            }
          }
        ]
      },
      "aggs": {
        "stats": {
          "sum": {
            "field": "foo"
          }
        }
      }
    }
  }
}

which yields

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_id": "id1",
        "_score": 1,
        "_source": {
          "foo": [
            1,
            1
          ]
        },
        "fields": {
          "foo": [
            1,
            1
          ]
        }
      }
    ]
  },
  "aggregations": {
    "comp_foo": {
      "after_key": {
        "foo": 1
      },
      "buckets": [
        {
          "key": {
            "foo": 1
          },
          "doc_count": 1,
          "stats": {
            "value": 4
          }
        }
      ]
    },
    "terms_foo": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1,
          "doc_count": 1,
          "stats": {
            "value": 2
          }
        }
      ]
    }
  }
}

Relates #88598

@elasticsearchmachine
Copy link
Collaborator

Hi @ywelsch, I've created a changelog YAML for you.

for (int i = 0; i < num; i++) {
currentValue = dvs.nextValue();
missingCurrentValue = false;
next.collect(doc, bucket);
if (i == 0 || previous != currentValue) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NumericTermsAggregator implements the same "set" semantics here.

@ywelsch ywelsch marked this pull request as ready for review July 20, 2022 10:31
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 20, 2022
@ywelsch
Copy link
Contributor Author

ywelsch commented Jul 20, 2022

@elasticsearchmachine run elasticsearch-ci/part-1
@elasticsearchmachine run elasticsearch-ci/docs-check

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Aggs shouldn't collect twice for two values that fall into the same bucket.

@ywelsch ywelsch merged commit 6bcb50d into elastic:master Jul 20, 2022
@ywelsch
Copy link
Contributor Author

ywelsch commented Jul 20, 2022

Thanks @nik9000!!!!

gmarouli pushed a commit to gmarouli/elasticsearch that referenced this pull request Jul 21, 2022
Multi-valued number fields are handled differently by the composite terms agg in comparison to the regular terms agg.
In particular, we use set semantics when grouping on multi-valued numeric fields using terms / multi-terms aggs while
we don't do the same for composite terms aggs. This commit fixes composite aggs to use the same set semantics as
well.
arteam pushed a commit to arteam/elasticsearch that referenced this pull request Jul 22, 2022
Multi-valued number fields are handled differently by the composite terms agg in comparison to the regular terms agg.
In particular, we use set semantics when grouping on multi-valued numeric fields using terms / multi-terms aggs while
we don't do the same for composite terms aggs. This commit fixes composite aggs to use the same set semantics as
well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants