Add documentation for calendar/fixed intervals #41919

Merged
merged 2 commits into from
May 10, 2019
284 changes: 197 additions & 87 deletions docs/reference/aggregations/bucket/datehistogram-aggregation.asciidoc
@@ -10,122 +10,194 @@ that here the interval can be specified using date/time expressions. Time-based
data requires special support because time-based intervals are not always a
fixed length.

==== Setting intervals

There seems to be no limit to the creativity we humans apply to setting our
clocks and calendars. We've invented leap years and leap seconds, standard and
daylight savings times, and timezone offsets of 30 or 45 minutes rather than a
full hour. While these creations help keep us in sync with the cosmos and our
environment, they can make specifying time intervals accurately a real challenge.
The only universal truth our researchers have yet to disprove is that a
millisecond is always the same duration, and a second is always 1000 milliseconds.
Beyond that, things get complicated.

Generally speaking, when you specify a single time unit, such as 1 hour or 1 day, you
are working with a _calendar interval_, but multiples, such as 6 hours or 3 days, are
_fixed-length intervals_.

For example, a specification of 1 day (1d) from now is a calendar interval that
means "at
this exact time tomorrow" no matter the length of the day. A change to or from
daylight savings time that results in a 23 or 25 hour day is compensated for and the
specification of "this exact time tomorrow" is maintained. But if you specify 2 or
more days, each day must be of the same fixed duration (24 hours). In this case, if
the specified interval includes the change to or from daylight savings time, the
interval will end an hour sooner or later than you expect.

There are similar differences to consider when you specify single versus multiple
minutes or hours. Multiple time periods longer than a day are not supported.

Here are the valid time specifications and their meanings:
==== Calendar and Fixed intervals

milliseconds (ms) ::
Fixed length interval; supports multiples.
When configuring a date histogram aggregation, the interval can be specified
in two ways: as a calendar-aware interval or as a fixed interval.

seconds (s) ::
1000 milliseconds; fixed length interval (except for the last second of a
minute that contains a leap-second, which is 2000ms long); supports multiples.
Calendar-aware intervals understand that daylight savings changes the length
of specific days, that months have different numbers of days, and that leap
seconds can be added to a particular year.

minutes (m) ::
Fixed intervals are, by contrast, always multiples of SI units and do not change
based on calendaring context.

[NOTE]
.Combined `interval` field is deprecated
==================================
deprecated[7.2, `interval` field is deprecated] Historically both calendar and fixed
intervals were configured in a single `interval` field, which led to confusing
semantics. Specifying `1d` was treated as a calendar-aware interval,
whereas `2d` was interpreted as a fixed interval. To get "one day" of fixed time,
the user had to specify the next smaller unit (in this case, `24h`).

Many users were unaware of this combined behavior, and even those who knew about
it found it confusing and difficult to use.

This behavior has been deprecated in favor of two new, explicit fields: `calendar_interval`
and `fixed_interval`.

By forcing a choice between calendar and fixed intervals up front, the semantics of the
interval are clear to the user immediately and there is no ambiguity. The old `interval`
field will be removed in the future.
==================================
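
As a minimal sketch of the migration, the request below places one aggregation of
each kind side by side. The aggregation names `per_calendar_day` and `per_24_hours`
are hypothetical, and the `sales` index and `date` field are assumed to match the
examples later in this section. A request that formerly used `"interval" : "1d"`
would move to `calendar_interval`, while one that used `"interval" : "24h"` would
move to `fixed_interval`:

[source,js]
--------------------------------------------------
POST /sales/_search?size=0
{
    "aggs" : {
        "per_calendar_day" : {
            "date_histogram" : {
                "field" : "date",
                "calendar_interval" : "1d"
            }
        },
        "per_24_hours" : {
            "date_histogram" : {
                "field" : "date",
                "fixed_interval" : "24h"
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE

The two aggregations can differ on days that are not exactly 24 hours long, such
as a day containing a daylight savings transition.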

===== Calendar Intervals

Calendar-aware intervals are configured with the `calendar_interval` parameter.
Calendar intervals can only be specified in "singular" quantities of the unit
(`1d`, `1M`, etc). Multiples, such as `2d`, are not supported and will throw an exception.

The accepted units for calendar intervals are:

minute (`m`, `1m`) ::
All minutes begin at 00 seconds.

* One minute (1m) is the interval between 00 seconds of the first minute and 00
One minute is the interval between 00 seconds of the first minute and 00
seconds of the following minute in the specified timezone, compensating for any
intervening leap seconds, so that the number of minutes and seconds past the
hour is the same at the start and end.
* Multiple minutes (__n__m) are intervals of exactly 60x1000=60,000 milliseconds
each.
intervening leap seconds, so that the number of minutes and seconds past the
hour is the same at the start and end.

hours (h) ::
hour (`h`, `1h`) ::
All hours begin at 00 minutes and 00 seconds.

* One hour (1h) is the interval between 00:00 minutes of the first hour and 00:00
One hour (1h) is the interval between 00:00 minutes of the first hour and 00:00
minutes of the following hour in the specified timezone, compensating for any
intervening leap seconds, so that the number of minutes and seconds past the hour
is the same at the start and end.
* Multiple hours (__n__h) are intervals of exactly 60x60x1000=3,600,000 milliseconds
each.
is the same at the start and end.

days (d) ::

day (`d`, `1d`) ::
All days begin at the earliest possible time, which is usually 00:00:00
(midnight).

* One day (1d) is the interval between the start of the day and the start of
One day (1d) is the interval between the start of the day and the start of
the following day in the specified timezone, compensating for any intervening
time changes.
* Multiple days (__n__d) are intervals of exactly 24x60x60x1000=86,400,000
milliseconds each.

weeks (w) ::
week (`w`, `1w`) ::

* One week (1w) is the interval between the start day_of_week:hour:minute:second
and the same day of the week and time of the following week in the specified
One week is the interval between the start day_of_week:hour:minute:second
and the same day of the week and time of the following week in the specified
timezone.
* Multiple weeks (__n__w) are not supported.

months (M) ::
month (`M`, `1M`) ::

* One month (1M) is the interval between the start day of the month and time of
One month is the interval between the start day of the month and time of
day and the same day of the month and time of the following month in the specified
timezone, so that the day of the month and time of day are the same at the start
and end.
* Multiple months (__n__M) are not supported.

quarters (q) ::
quarter (`q`, `1q`) ::

* One quarter (1q) is the interval between the start day of the month and
One quarter (1q) is the interval between the start day of the month and
time of day and the same day of the month and time of day three months later,
so that the day of the month and time of day are the same at the start and end. +
* Multiple quarters (__n__q) are not supported.

years (y) ::
year (`y`, `1y`) ::

* One year (1y) is the interval between the start day of the month and time of
day and the same day of the month and time of day the following year in the
One year (1y) is the interval between the start day of the month and time of
day and the same day of the month and time of day the following year in the
specified timezone, so that the date and time are the same at the start and end. +
* Multiple years (__n__y) are not supported.
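
The definitions above are all relative to "the specified timezone". As a hedged
sketch, assuming the zone is supplied through the aggregation's `time_zone`
parameter and using `America/New_York` purely as an illustrative value, a daily
calendar histogram might look like this:

[source,js]
--------------------------------------------------
POST /sales/_search?size=0
{
    "aggs" : {
        "sales_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "calendar_interval" : "1d",
                "time_zone" : "America/New_York"
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE

With a request like this, each bucket should start at local midnight, so a bucket
that covers a daylight savings transition spans 23 or 25 hours rather than 24.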

NOTE:
In all cases, when the specified end time does not exist, the actual end time is
the closest available time after the specified end.
===== Calendar Interval Examples
As an example, here is an aggregation requesting bucket intervals of a month in calendar time:

Widely distributed applications must also consider vagaries such as countries that
start and stop daylight savings time at 12:01 A.M., so end up with one minute of
Sunday followed by an additional 59 minutes of Saturday once a year, and countries
that decide to move across the international date line. Situations like
that can make irregular timezone offsets seem easy.
[source,js]
--------------------------------------------------
POST /sales/_search?size=0
{
"aggs" : {
"sales_over_time" : {
"date_histogram" : {
"field" : "date",
"calendar_interval" : "month"
}
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[setup:sales]

As always, rigorous testing, especially around time-change events, will ensure
that your time interval specification is
what you intend it to be.
If you attempt to use multiples of calendar units, the aggregation will fail because only
singular calendar units are supported:

WARNING:
To avoid unexpected results, all connected servers and clients must sync to a
reliable network time service.
[source,js]
--------------------------------------------------
POST /sales/_search?size=0
{
"aggs" : {
"sales_over_time" : {
"date_histogram" : {
"field" : "date",
"calendar_interval" : "2d"
}
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[setup:sales]
// TEST[catch:bad_request]

==== Examples
[source,js]
--------------------------------------------------
{
"error" : {
"root_cause" : [...],
"type" : "x_content_parse_exception",
"reason" : "[1:82] [date_histogram] failed to parse field [calendar_interval]",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "The supplied interval [2d] could not be parsed as a calendar interval.",
"stack_trace" : "java.lang.IllegalArgumentException: The supplied interval [2d] could not be parsed as a calendar interval."
}
}
}

--------------------------------------------------
// NOTCONSOLE

===== Fixed Intervals

Fixed intervals are configured with the `fixed_interval` parameter.

In contrast to calendar-aware intervals, fixed intervals are a fixed number of SI
units and never deviate, regardless of where they fall on the calendar. One second
is always composed of 1000ms. This allows fixed intervals to be specified in
any multiple of the supported units.

However, it means fixed intervals cannot express other units such as months,
since the duration of a month is not a fixed quantity. Attempting to specify
a calendar interval like month or quarter will throw an exception.

The accepted units for fixed intervals are:

milliseconds (ms) ::

seconds (s) ::
Defined as 1000 milliseconds each.

minutes (m) ::
All minutes begin at 00 seconds.

Requesting bucket intervals of a month.
Defined as 60 seconds each (60,000 milliseconds).

hours (h) ::
All hours begin at 00 minutes and 00 seconds.
Defined as 60 minutes each (3,600,000 milliseconds).

days (d) ::
All days begin at the earliest possible time, which is usually 00:00:00
(midnight).

Defined as 24 hours each (86,400,000 milliseconds).

===== Fixed Interval Examples

If we try to recreate the "month" `calendar_interval` from earlier, we can approximate that with
30 fixed days:

[source,js]
--------------------------------------------------
@@ -135,7 +207,7 @@ POST /sales/_search?size=0
"sales_over_time" : {
"date_histogram" : {
"field" : "date",
"calendar_interval" : "month"
"fixed_interval" : "30d"
}
}
}
@@ -144,11 +216,7 @@ POST /sales/_search?size=0
// CONSOLE
// TEST[setup:sales]

You can also specify time values using abbreviations supported by
<<time-units,time units>> parsing.
Note that fractional time values are not supported, but you can address this by
shifting to another
time unit (e.g., `1.5h` could instead be specified as `90m`).
But if we try to use a calendar unit that fixed intervals do not support, such as weeks, we'll get an exception:

[source,js]
--------------------------------------------------
@@ -158,14 +226,58 @@ POST /sales/_search?size=0
"sales_over_time" : {
"date_histogram" : {
"field" : "date",
"fixed_interval" : "90m"
"fixed_interval" : "2w"
}
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[setup:sales]
// TEST[catch:bad_request]

[source,js]
--------------------------------------------------
{
"error" : {
"root_cause" : [...],
"type" : "x_content_parse_exception",
"reason" : "[1:82] [date_histogram] failed to parse field [fixed_interval]",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "failed to parse setting [date_histogram.fixedInterval] with value [2w] as a time value: unit is missing or unrecognized",
"stack_trace" : "java.lang.IllegalArgumentException: failed to parse setting [date_histogram.fixedInterval] with value [2w] as a time value: unit is missing or unrecognized"
}
}
}

--------------------------------------------------
// NOTCONSOLE

===== Notes

In all cases, when the specified end time does not exist, the actual end time is
the closest available time after the specified end.

Widely distributed applications must also consider vagaries such as countries that
start and stop daylight savings time at 12:01 A.M., so end up with one minute of
Sunday followed by an additional 59 minutes of Saturday once a year, and countries
that decide to move across the international date line. Situations like
that can make irregular timezone offsets seem easy.

As always, rigorous testing, especially around time-change events, will ensure
that your time interval specification is
what you intend it to be.

WARNING:
To avoid unexpected results, all connected servers and clients must sync to a
reliable network time service.

NOTE: Fractional time values are not supported, but you can address this by
shifting to another time unit (e.g., `1.5h` could instead be specified as `90m`).
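
For instance, a bucket width of one and a half hours could be requested as 90
minutes. This is a minimal sketch that reuses the `sales` index and `date` field
from the examples above:

[source,js]
--------------------------------------------------
POST /sales/_search?size=0
{
    "aggs" : {
        "sales_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "fixed_interval" : "90m"
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE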

NOTE: You can also specify time values using abbreviations supported by
<<time-units,time units>> parsing.

===== Keys

@@ -522,8 +634,6 @@ control the order using
the `order` setting. This setting supports the same `order` functionality as
<<search-aggregations-bucket-terms-aggregation-order,`Terms Aggregation`>>.

deprecated[6.0.0, Use `_key` instead of `_time` to order buckets by their dates/keys]

===== Using a script to aggregate by day of the week

When you need to aggregate the results by day of the week, use a script that