Skip to content

Date histogram aggregation unexpected result with calendar_interval and offset >= 30 days #93180

@xiaofeiXM

Description

@xiaofeiXM

Elasticsearch Version

7.17

Installed Plugins

No response

Java Version

11

OS Version

x86_64

Problem Description

Based on this doc, elasticsearch supports Calendar-aware interval on data histogram aggregation. Specially for a "quarter" interval:

"One quarter is the interval between the start day of the month and time of day and the same day of the month and time of day three months later, so that the day of the month and time of day are the same at the start and end."

My understanding of this is every bucket in the response should start at the same day, but that's not the case when I use offset >= 30 days. Some buckets start at, say 5th of the month, while one other bucket starts at the 6th of the month.

Steps to Reproduce

PUT /test01 
{
  "mappings": {
    "properties": {
      "date": { "type": "date" }
    }
  }
}

POST /test01/_doc { "date": 1642658400000 } // 01/20/2022, 6AM UTC
POST /test01/_doc { "date": 1645336800000 } // 02/20/2022, 6AM UTC
...
POST /test01/_doc { "date": 1660975200000 } // 08/20/2022, 6AM UTC
// basically one document every month from January to August

Queries:

{
    "size": 0,
    "aggs": {
        "ShiftedQuarter": {
            "date_histogram": {
                "calendar_interval": "quarter",
                "min_doc_count": 1,
                "time_zone": "UTC",
                "offset": "+20d", // offset is less than 30 days
                "field": "date"
            }
        }
    }
}

returns below as expected (every bucket starts at the 21st day of the month):

"aggregations": {
    "ShiftedQuarter": {
      "buckets": [
        {
          "key_as_string": "2021-10-21T00:00:00.000Z",
          "key": 1634774400000,
          "doc_count": 1
        },
        {
          "key_as_string": "2022-01-21T00:00:00.000Z",
          "key": 1642723200000,
          "doc_count": 3
        },
        {
          "key_as_string": "2022-04-21T00:00:00.000Z",
          "key": 1650499200000,
          "doc_count": 3
        },
        {
          "key_as_string": "2022-07-21T00:00:00.000Z",
          "key": 1658361600000,
          "doc_count": 1
        }
      ]
    }
  }

However if the query use a offset that is large than 30, like

{
    "size": 0,
    "aggs": {
        "ShiftedQuarter": {
            "date_histogram": {
                "calendar_interval": "quarter",
                "min_doc_count": 1,
                "time_zone": "UTC",
                "offset": "+35d",
                "field": "date"
            }
        }
    }
}

then some bucket starts at the 5th but some starts at the 6th:

"aggregations": {
    "ShiftedQuarter": {
      "buckets": [
        {
          "key_as_string": "2021-11-05T00:00:00.000Z",
          "key": 1636070400000,
          "doc_count": 1
        },
        {
          "key_as_string": "2022-02-05T00:00:00.000Z",
          "key": 1644019200000,
          "doc_count": 3
        },
        {
          "key_as_string": "2022-05-06T00:00:00.000Z", // 06 as opposed to 05 for other buckets
          "key": 1651795200000,
          "doc_count": 3
        },
        {
          "key_as_string": "2022-08-05T00:00:00.000Z",
          "key": 1659657600000,
          "doc_count": 1
        }
      ]
    }
  }

Logs (if relevant)

No response

Metadata

Metadata

Assignees

Labels

:Analytics/AggregationsAggregations>docsGeneral docs changesTeam:AnalyticsMeta label for analytical engine team (ESQL/Aggs/Geo)Team:DocsMeta label for docs team

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions