Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent behavior/documentation with Date Agg Intervals #50091

Open
niemyjski opened this issue Dec 11, 2019 · 3 comments
Open

Inconsistent behavior/documentation with Date Agg Intervals #50091

niemyjski opened this issue Dec 11, 2019 · 3 comments
Labels
:Analytics/Aggregations Aggregations >bug >docs General docs changes Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@niemyjski
Copy link
Contributor

Elasticsearch version (bin/elasticsearch --version): 7.5

Plugins installed: ["mapper-size"]

JVM version (java -version): docker-image

OS version (uname -a if on a Unix-like system): docker-image

Description of the problem including expected versus actual behavior:

We noticed that aggregations don't behave how we expected them to as per the docs. Please see this issue we logged on nest for more context: elastic/elasticsearch-net#4251

Steps to reproduce:

POST /test1/_doc/
{
  "date": "2019-10-17T07:01:26.3488489+00:00"
}

GET /test1/_search 
{
  "aggs": {
    "date_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      }
    },
    "date_m": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "M"
      }
    },
    "date_onem": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "1M"
      }
    }
  }, 
  "size": 0
}

The docs state that 1M, month and M are valid calendar intervals. However not all of them work correctly when running in ES. This translates to pretty much all the values listed in the documentation (on two different pages, one talks about nanos and micro seconds but the main documentation doesn't)

Also of note, it appears when you use kibana it creates a date_histogram via a snippet using interval (legacy).

Provide logs (if relevant):

I doing a date_histogram aggregation with calendar_interval values of:

  • 1M
    • Works directly in ES
    • Works if I specify the string 1M to DateHistogramAggregationDescriptor<T>.CalendarInterval
  • month
    • Works directly in ES
    • Fails to parse if I specify the string month to DateHistogramAggregationDescriptor<T>.CalendarInterval with Expression 'month' is invalid
  • M
    • Fails directly in ES with The supplied interval [M] could not be parsed as a calendar interval., but the documentation page says that it is a valid value.
    • Fails to parse if I specify the string M to DateHistogramAggregationDescriptor<T>.CalendarInterval with Expression 'M' is invalid

Originally posted by @ejsmith in elastic/elasticsearch-net#4251 (comment)


But the docs at least online don't say anything about time units smaller than 1m

The Date Histogram docs have fixed intervals to milliseconds. The time values docs have values down to nanos.

but I could even see us supporting 1ms. What do you recommend we use in the mean time (We went through QA and are about to release but are blocked as we have saved agg queries that customers need to run, from seconds to year time buckets).

I'm not sure what the smallest value supported for buckets is, but looking at DateHistogramInterval, it looks like it may be 1s, which can also be supplied as 1000ms input:

public class DateHistogramInterval implements Writeable, ToXContentFragment {
public static final DateHistogramInterval SECOND = new DateHistogramInterval("1s");
public static final DateHistogramInterval MINUTE = new DateHistogramInterval("1m");
public static final DateHistogramInterval HOUR = new DateHistogramInterval("1h");
public static final DateHistogramInterval DAY = new DateHistogramInterval("1d");
public static final DateHistogramInterval WEEK = new DateHistogramInterval("1w");
public static final DateHistogramInterval MONTH = new DateHistogramInterval("1M");
public static final DateHistogramInterval QUARTER = new DateHistogramInterval("1q");
public static final DateHistogramInterval YEAR = new DateHistogramInterval("1y");
public static DateHistogramInterval seconds(int sec) {
return new DateHistogramInterval(sec + "s");
}
public static DateHistogramInterval minutes(int min) {
return new DateHistogramInterval(min + "m");
}
public static DateHistogramInterval hours(int hours) {
return new DateHistogramInterval(hours + "h");
}
public static DateHistogramInterval days(int days) {
return new DateHistogramInterval(days + "d");
}
public static DateHistogramInterval weeks(int weeks) {
return new DateHistogramInterval(weeks + "w");
}

  • M
    • Fails directly in ES with The supplied interval [M] could not be parsed as a calendar interval., but the documentation page says that it is a valid value.
    • Fails to parse if I specify the string M to DateHistogramAggregationDescriptor<T>.CalendarInterval with Expression 'M' is invalid

There's something not right; either the documentation should be clearer, or the implementation should also support the single character time units, which it looks like it doesn't

static {
Map<String, Rounding.DateTimeUnit> dateFieldUnits = new HashMap<>();
dateFieldUnits.put("year", Rounding.DateTimeUnit.YEAR_OF_CENTURY);
dateFieldUnits.put("1y", Rounding.DateTimeUnit.YEAR_OF_CENTURY);
dateFieldUnits.put("quarter", Rounding.DateTimeUnit.QUARTER_OF_YEAR);
dateFieldUnits.put("1q", Rounding.DateTimeUnit.QUARTER_OF_YEAR);
dateFieldUnits.put("month", Rounding.DateTimeUnit.MONTH_OF_YEAR);
dateFieldUnits.put("1M", Rounding.DateTimeUnit.MONTH_OF_YEAR);
dateFieldUnits.put("week", Rounding.DateTimeUnit.WEEK_OF_WEEKYEAR);
dateFieldUnits.put("1w", Rounding.DateTimeUnit.WEEK_OF_WEEKYEAR);
dateFieldUnits.put("day", Rounding.DateTimeUnit.DAY_OF_MONTH);
dateFieldUnits.put("1d", Rounding.DateTimeUnit.DAY_OF_MONTH);
dateFieldUnits.put("hour", Rounding.DateTimeUnit.HOUR_OF_DAY);
dateFieldUnits.put("1h", Rounding.DateTimeUnit.HOUR_OF_DAY);
dateFieldUnits.put("minute", Rounding.DateTimeUnit.MINUTES_OF_HOUR);
dateFieldUnits.put("1m", Rounding.DateTimeUnit.MINUTES_OF_HOUR);
dateFieldUnits.put("second", Rounding.DateTimeUnit.SECOND_OF_MINUTE);
dateFieldUnits.put("1s", Rounding.DateTimeUnit.SECOND_OF_MINUTE);
DATE_FIELD_UNITS = unmodifiableMap(dateFieldUnits);
}

Originally posted by @russcam in elastic/elasticsearch-net#4251 (comment)

@dnhatn dnhatn added the :Analytics/Aggregations Aggregations label Dec 11, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@polyfractal
Copy link
Contributor

Heya, thanks for the detailed ticket... and apologies for the confusion/troubles :( I'll start with the easy ones first:

Kibana

Also of note, it appears when you use kibana it creates a date_histogram via a snippet using interval (legacy).

Correct, although hopefully only temporarily. Kibana, like many user applications, is still migrating off the old interval. Since date histograms are so widely used, we expect to have a long deprecation period on interval.

Nanoseconds

one talks about nanos and micro seconds

Nanoseconds are in a tricky situation right now. They can be indexed and searched (at cost of reduced dynamic range), but aggregations currently only work in millisecond range regardless of the underlying data type. This is noted in the date_nano docs at the bottom. That's why nano support isn't mentioned on the date histogram page.

DateHistogramInterval helpers

I'm not sure what the smallest value supported for buckets is, but looking at DateHistogramInterval, it looks like it may be 1s, which can also be supplied as 1000ms input:

These are technically just "helper" methods, and were more useful with the old interval. Today they just append the correct suffix to the provided numeric value. It does look like we're missing the "milliseconds" helper, but you can manually specify it with:

new DateHistogramInterval("5ms")

Intervals

1M

This should work for calendar_interval and interval, but not fixed_interval

month

This should work for calendar_interval and interval, but not fixed_interval

M

This should not work, and I don't think it's ever been supported. E.g. neither of the calendar-aware parsers in 6.0 or 2.0 support just the unit by itself, and fixed intervals don't support calendar concepts like months. I just tested on a 6.0 build and it does indeed throw an exception:

failed to parse setting [DateHistogramAggregationBuilder.interval] with value [M] as a time value: unit is missing or unrecognized

I tested the fixed units too and they also don't work, although from a parsing issue and an ugly exception:

"reason": "failed to parse [s]",
  "caused_by": {
    "type": "number_format_exception",
    "reason": "For input string: \"\""
}

Conclusions

So in conclusion it seems we (ES) has two action items:

  • docs are just incorrect/misleading about using time units in isolation. These are not permitted and we should modify the docs so that it is clearer
  • We should add a milliseconds() helper method to match the rest of the DateHistogramInterval helpers

I hope this helped clear up the situation!

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@wchaparro wchaparro added the >docs General docs changes label Oct 25, 2021
@elasticmachine elasticmachine added the Team:Docs Meta label for docs team label Oct 25, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@wchaparro wchaparro added the >bug label Oct 25, 2021
@lockewritesdocs lockewritesdocs self-assigned this Oct 27, 2021
@debadair debadair removed the Team:Docs Meta label for docs team label Apr 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >bug >docs General docs changes Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

No branches or pull requests

8 participants