Sampler aggregation returns high filter doc_count #77575

Open
qn895 opened this issue Sep 10, 2021 · 2 comments
Labels
:Analytics/Aggregations, >bug, feedback_needed, Team:Analytics

Comments

@qn895
Member

qn895 commented Sep 10, 2021

Elasticsearch version (bin/elasticsearch --version):
Version: 8.0.0-SNAPSHOT, Build: default/tar/f89eda5f9d89fa6b197dd00cb1dd700b78880887/2021-08-31T14:32:38.983979153Z

JVM version (java -version):
JVM: 16.0.2

OS version (uname -a if on a Unix-like system):
MacOS 11.5.2

Description of the problem including expected versus actual behavior:

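A filter aggregation nested inside a sampler aggregation returns a doc_count far higher than the sampler's own doc_count. In the response below the sampler reports 700 documents, but the nested filter aggregations report up to 86787. Expected: each nested filter doc_count is at most the sampler doc_count.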

Steps to reproduce:

Example query:

GET traces-apm*,apm-*,logs-apm*,apm-*,metrics-apm*,apm-*/_search
{
  "size": 0,
  "track_total_hits": false,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": 1612192789884,
              "lte": 1625273552141,
              "format": "epoch_millis"
            }
          }
        },
        {
          "match_all": {}
        }
      ]
    }
  },
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 100
      },
      "aggs": {
        "field_0_count": {
          "filter": {
            "exists": {
              "field": "@timestamp"
            }
          }
        },
        "field_0_cardinality": {
          "cardinality": {
            "field": "@timestamp"
          }
        },
        "agent.build.original_count": {
          "filter": {
            "exists": {
              "field": "agent.build.original"
            }
          }
        },
        "agent.name_count": {
          "filter": {
            "exists": {
              "field": "agent.name"
            }
          }
        },
        "code_signature.status_cardinality": {
          "cardinality": {
            "field": "code_signature.status"
          }
        },
        "code_signature.subject_name_count": {
          "filter": {
            "exists": {
              "field": "code_signature.subject_name"
            }
          }
        }
      }
    }
  }
}

Current response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 11,
    "successful" : 11,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "sample" : {
      "meta" : { },
      "doc_count" : 700,
      "field_0_cardinality" : {
        "value" : 470
      },
      "agent.name_count" : {
        "meta" : { },
        "doc_count" : 86687
      },
      "code_signature.subject_name_count" : {
        "meta" : { },
        "doc_count" : 0
      },
      "code_signature.status_cardinality" : {
        "value" : 0
      },
      "field_0_count" : {
        "doc_count" : 86787
      },
      "agent.build.original_count" : {
        "meta" : { },
        "doc_count" : 0
      }
    }
  }
}
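
The mismatch in the response above can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes that any bucket nested under a sampler should count at most the sampler's own doc_count, and that the sample can hold at most shard_size documents per shard. The helper names `find_oversized_buckets` and `max_sample_size` are hypothetical, and `RESPONSE` is a trimmed copy of the response above with the values unchanged.

```python
# Trimmed copy of the response from this issue (values unchanged).
RESPONSE = {
    "_shards": {"total": 11, "successful": 11, "skipped": 0, "failed": 0},
    "aggregations": {
        "sample": {
            "meta": {},
            "doc_count": 700,
            "field_0_cardinality": {"value": 470},
            "agent.name_count": {"meta": {}, "doc_count": 86687},
            "code_signature.subject_name_count": {"meta": {}, "doc_count": 0},
            "code_signature.status_cardinality": {"value": 0},
            "field_0_count": {"doc_count": 86787},
            "agent.build.original_count": {"meta": {}, "doc_count": 0},
        }
    },
}

SHARD_SIZE = 100  # "shard_size" from the sampler in the request


def max_sample_size(response, shard_size):
    """Upper bound on the sample: shard_size documents from each shard."""
    return response["_shards"]["total"] * shard_size


def find_oversized_buckets(response, sampler_name="sample"):
    """Return nested buckets whose doc_count exceeds the sampler's doc_count."""
    sample = response["aggregations"][sampler_name]
    limit = sample["doc_count"]
    oversized = {}
    for name, body in sample.items():
        # Skip scalar entries (doc_count itself) and metric aggs without doc_count.
        if isinstance(body, dict) and body.get("doc_count", 0) > limit:
            oversized[name] = body["doc_count"]
    return oversized


if __name__ == "__main__":
    print("sample upper bound:", max_sample_size(RESPONSE, SHARD_SIZE))
    print("oversized buckets:", find_oversized_buckets(RESPONSE))
```

With 11 shards and a shard_size of 100, the sample cannot exceed 1100 documents, so the two filter buckets flagged by this check are larger than any possible sample, not just larger than the reported 700.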
qn895 added the >bug and needs:triage labels on Sep 10, 2021
iverase added the :Analytics/Aggregations label on Sep 13, 2021
elasticmachine added the Team:Analytics label on Sep 13, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

iverase removed the Team:Analytics and needs:triage labels on Sep 13, 2021
not-napoleon added the Team:Analytics label on Nov 3, 2022
@martijnvg
Member

@qn895 Sorry for the late reply here. Are you able to share a full reproduction of the reported issue? Without that, it is hard to debug why there is such a large difference in doc counts. Maybe this isn't really a bug, but expected behaviour, for example if the fields are pre-aggregated or doc count fields are used.
