incorrect pagination with date_histogram and format in composite aggregation #68963

Closed
zhnpeng opened this issue Feb 14, 2021 · 6 comments · Fixed by #73955
Labels: :Analytics/Aggregations, >bug, Team:Analytics

zhnpeng commented Feb 14, 2021

ES Version

{
  "name" : "baize-server-d7d300ed",
  "cluster_name" : "",
  "cluster_uuid" : "J8pd0v6-SLy_sYY75rVpAQ",
  "version" : {
    "number" : "7.11.0",
    "build_flavor" : "default",
    "build_type" : "",
    "build_hash" : "8ced7813d6f16d2ef30792e2fcde3e755795ee04",
    "build_date" : "2021-02-08T22:44:01.320463Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Doing a composite aggregation on a date_histogram source with the format: epoch_second parameter seems to exhibit inconsistent behavior: it can return excess values and "get stuck" on the search_after key.

My test mapping is:

{
  "mappings": {
    "properties": {
      "@timestamp": {
        "format": "yyyy-MM-dd hh:mm:ss",
        "type": "date"
      },
      "app": {
        "type": "keyword"
      },
      "count": {
        "type": "long"
      }
    }
  }
}

And the data (indexed with POST _bulk):

{ "create" : { "_index" : "test_comp_aggs", "_id" : 1 } }
{ "@timestamp" : "2021-02-14 10:00:00", "app" :  "tiktok",  "count": 1 }
{ "create" : { "_index" : "test_comp_aggs", "_id" : 2 } }
{ "@timestamp" : "2021-02-14 10:00:00", "app" :  "wechat",  "count": 1 }
{ "create" : { "_index" : "test_comp_aggs", "_id" : 3 } }
{ "@timestamp" : "2021-02-14 10:00:00", "app" :  "facebook",  "count": 1 }
{ "create" : { "_index" : "test_comp_aggs", "_id" : 4 } }
{ "@timestamp" : "2021-02-14 10:00:00", "app" :  "wechat",  "count": 1 }
{ "create" : { "_index" : "test_comp_aggs", "_id" : 5 } }
{ "@timestamp" : "2021-02-14 10:01:00", "app" :  "wechat",  "count": 2 }
{ "create" : { "_index" : "test_comp_aggs", "_id" : 6 } }
{ "@timestamp" : "2021-02-14 10:01:00", "app" :  "facebook",  "count": 2 }
{ "create" : { "_index" : "test_comp_aggs", "_id" : 7 } }
{ "@timestamp" : "2021-02-14 10:01:00", "app" :  "tiktok",  "count": 2 }

Running:

  1. composite aggregation with a large enough size (10), and format: epoch_second
{
  "size": 0,
  "aggs": {
    "results": {
      "composite": {
        "size": 10,
        "sources": [
            {
              "app": {
                "terms": {
                  "field": "app", 
                  "missing_bucket": true
                }
              }
            }, 
            {
              "ts": {
                "date_histogram": {
                  "field": "@timestamp", 
                  "fixed_interval": "1m", 
                  "time_zone": "Asia/Hong_Kong",
                  "format": "epoch_second"
                }
              }
            }
        ]
      }
    }
  }
}

will return all the correct buckets (3 in this case) and the last one as the after_key:

      "after_key" : {
        "app" : "wechat",
        "ts" : "1613260800"
      },
  2. then we use the after_key from the previous result to search again
{
  "size": 0,
  "aggs": {
    "results": {
      "composite": {
        "after" : {
          "app" : "wechat",
          "ts" : "1613260800"
        },
        "size": 10,
        "sources": [
            {
              "app": {
                "terms": {
                  "field": "app", 
                  "missing_bucket": true
                }
              }
            }, 
            {
              "ts": {
                "date_histogram": {
                  "field": "@timestamp", 
                  "fixed_interval": "1m", 
                  "time_zone": "Asia/Hong_Kong",
                  "format": "epoch_second"
                }
              }
            }
        ]
      }
    }
  }
}

will return excess values, and the after_key gets stuck:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "results" : {
      "after_key" : {
        "app" : "wechat",
        "ts" : "1613260800"
      },
      "buckets" : [
        {
          "key" : {
            "app" : "wechat",
            "ts" : "1613260800"
          },
          "doc_count" : 3
        }
      ]
    }
  }
}
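
For reference, the stuck pagination shows up immediately if you drive the composite aggregation in the usual pagination loop. Below is a minimal sketch in Python (assuming the requests library, Elasticsearch reachable at localhost:9200, and the test_comp_aggs index above); on 7.11.0 the same after_key comes back on every page, so only the iteration cap stops the loop:

import requests

SEARCH_URL = "http://localhost:9200/test_comp_aggs/_search"  # adjust for your cluster

query = {
    "size": 0,
    "aggs": {
        "results": {
            "composite": {
                "size": 10,
                "sources": [
                    {"app": {"terms": {"field": "app", "missing_bucket": True}}},
                    {"ts": {"date_histogram": {
                        "field": "@timestamp",
                        "fixed_interval": "1m",
                        "time_zone": "Asia/Hong_Kong",
                        "format": "epoch_second",
                    }}},
                ],
            }
        }
    },
}

after_key = None
for page in range(10):  # safety cap: on 7.11.0 this would otherwise never terminate
    if after_key is not None:
        query["aggs"]["results"]["composite"]["after"] = after_key
    results = requests.post(SEARCH_URL, json=query).json()["aggregations"]["results"]
    if not results["buckets"]:
        break  # normal termination: an empty page means pagination is done
    after_key = results["after_key"]
    print(page, after_key, [b["key"] for b in results["buckets"]])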
zhnpeng added the >bug and needs:triage labels Feb 14, 2021
zhnpeng commented Feb 14, 2021

Similar to #65685.

jimczi added the :Analytics/Aggregations label and removed needs:triage Feb 18, 2021
elasticmachine added the Team:Analytics label Feb 18, 2021
elasticmachine (Collaborator):

Pinging @elastic/es-analytics-geo (Team:Analytics)

not-napoleon (Member):

I did some testing with this, and I think the problem is that you're using a time zone with epoch_seconds. It's probably a bug that we even allow you to specify that, since epoch seconds are defined in terms of UTC. But if you remove the time zone in the second query, you'll correctly get no further results.
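
Concretely, that means dropping time_zone from the date_histogram source whenever the format is epoch_second. A sketch of the adjusted source, written as a Python dict to match the repro loop above (the rest of the request is unchanged):

# epoch_second is defined in terms of UTC, so "time_zone" has nothing to
# contribute here; removing it lets the second page correctly come back empty.
ts_source = {
    "ts": {
        "date_histogram": {
            "field": "@timestamp",
            "fixed_interval": "1m",
            "format": "epoch_second",
        }
    }
}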

not-napoleon self-assigned this May 25, 2021
zhnpeng commented May 28, 2021

Thanks. What should I do if I want to format my dates as epoch seconds with interval "1d" and time zone "Asia/Hong_Kong" instead of UTC?
For example:

    "ts": {
      "date_histogram": {
        "field": "@timestamp", 
        "fixed_interval": "1d", 
        "time_zone": "Asia/Hong_Kong",
        "format": "epoch_second"
      }
    }

not-napoleon (Member):

So I'm looking into fixing the bug with epoch_seconds and timezones. The good news is that it's "just" a format issue, which means the data should be correct and you can get it out in a different format. Unfortunately, I don't think you can get exactly what you want right now, but hopefully we can get you close enough for now.

My best suggestion for a workaround is to use a different format and convert to epoch seconds on the client side. If you use something like iso8601, you should still have enough information to convert to epoch seconds. I know that's not ideal, but it's the best I can think of until I can get a fix in for the format issue.
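
For what it's worth, the client-side conversion is small. A Python sketch, assuming the source is switched to "format": "iso8601" so each bucket key and after_key carries an explicit UTC offset (e.g. "2021-02-14T10:00:00.000+08:00"):

from datetime import datetime

def iso8601_to_epoch_seconds(key: str) -> int:
    # fromisoformat() on Python < 3.11 rejects a trailing "Z", so
    # normalize it to an explicit UTC offset before parsing.
    return int(datetime.fromisoformat(key.replace("Z", "+00:00")).timestamp())

print(iso8601_to_epoch_seconds("2021-02-14T10:00:00.000+08:00"))  # 1613268000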

zhnpeng commented Jun 5, 2021

Yes, your suggestion is what we are doing as a workaround: converting iso8601-formatted dates to epoch seconds on the client side.
