
Histogram Aggregation accepts Dates and Date Intervals #23193

Closed
pickypg opened this issue Feb 15, 2017 · 7 comments
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@pickypg
Member

pickypg commented Feb 15, 2017

While working with an example data set, I watched a user accidentally fire off a histogram aggregation against a date field with a date interval:

PUT /test/type/1
{
  "date": "2017-02-15T17:39:00.000Z"
}

PUT /test/type/2
{
  "date": "2017-02-15T17:40:00.000Z"
}

GET /test/_search
{
  "size": 0,
  "aggs": {
    "my_histogram": {
      "histogram": {
        "field": "date",
        "interval": "1d"
      }
    }
  }
}

A few things of interest:

  1. I would expect the request to be rejected, because the interval is invalid for a histogram agg.
  2. Instead the interval is ignored and the value defaults to 1. In the above example this creates around 60,000 buckets, because the two dates are one minute (60,000 milliseconds) apart and date fields are really epoch-millisecond numbers behind the scenes.
    • For the above example, the "took" time is 8ms, but the actual time to receive the response is disturbingly long because of the number of buckets.
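The bucket math above is easy to check directly: date fields are stored as epoch milliseconds, so two timestamps one minute apart span 60,000 distinct numeric values, each of which becomes its own bucket with an interval of 1. A small standalone sketch (mine, not code from this issue):

```java
import java.time.Instant;

public class BucketMath {
    public static void main(String[] args) {
        long t1 = Instant.parse("2017-02-15T17:39:00.000Z").toEpochMilli();
        long t2 = Instant.parse("2017-02-15T17:40:00.000Z").toEpochMilli();
        // With a numeric interval of 1, every millisecond between the min
        // and max value gets its own bucket (empty buckets included).
        long buckets = (t2 - t1) + 1;
        System.out.println(buckets); // 60001
    }
}
```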
@ChineseElectricPanda

Hi, I've had a look into this issue, and it seems the interval is being parsed as a numerical value: in this case, 1d is parsed as (double) 1. The same happens if 1f is entered instead. Trying the request with an interval of 1000d results in a bucket size of 1000 (one second), so I don't think it's defaulting to 1; it's just parsing 1d as 1. A differently formatted time interval (like 24h) returns a parsing exception as expected.

I'm new to the Elasticsearch project, so I'm not sure whether it is intended behavior for the histogram agg to parse interval strings with a d or f suffix as numbers. The parser could be changed to reject any interval value containing letters, but that would probably break other legitimate use cases.
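The parsing behavior described above can be reproduced outside Elasticsearch, since it comes straight from Java's `Double.parseDouble` (a standalone sketch, not code from the Elasticsearch parser):

```java
public class IntervalParsing {
    public static void main(String[] args) {
        System.out.println(Double.parseDouble("1d"));    // 1.0    ("d" suffix = double literal)
        System.out.println(Double.parseDouble("1f"));    // 1.0    ("f" suffix = float literal)
        System.out.println(Double.parseDouble("1000d")); // 1000.0 (one second, in millis)
        try {
            Double.parseDouble("24h");                   // "h" is not a valid suffix
        } catch (NumberFormatException e) {
            System.out.println("24h -> NumberFormatException");
        }
    }
}
```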

@ppf2
Member

ppf2 commented Nov 13, 2017

+1 We encountered a similar incident where it caused ES to OOM.

          "histogram" : {
            "field" : "crawl_date",
            "interval" : 1
          }

@markharwood
Contributor

Since #27581, Elasticsearch should now have the necessary safeguards to prevent an OOM situation.

Generally speaking, we strive for consistency in our APIs, and "1d" is a valid number; having said that, it is also a very common choice of interval for date histograms. The consequences of choosing the wrong kind of histogram should no longer be catastrophic (no OOMs), but the question remains how many users would know that the next step is to pick a date_histogram rather than a regular histogram.
Maybe the correct answer here is to reject the search if the target field is a date field and the chosen interval uses a date-like expression. Or maybe just reject histogram aggs on date fields completely?

cc @elastic/es-search-aggs
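The first suggestion above could be sketched as an up-front validation step. This is entirely hypothetical (the method and parameter names are illustrative, not Elasticsearch code): reject a plain histogram when the target field is a date and the raw interval string looks like a date-math expression.

```java
public class IntervalValidation {
    // Hypothetical helper: fieldType and rawInterval are illustrative names.
    static void validateHistogramInterval(String fieldType, String rawInterval) {
        // A number followed by a date-math unit, e.g. "1d", "24h", "30m".
        boolean dateLike = rawInterval.matches("\\d+(ms|s|m|h|d|w|M|y)");
        if ("date".equals(fieldType) && dateLike) {
            throw new IllegalArgumentException(
                "interval [" + rawInterval + "] looks like a date interval; "
                + "use a date_histogram aggregation for date fields");
        }
    }

    public static void main(String[] args) {
        validateHistogramInterval("double", "1d"); // fine: 1d is a valid double
        try {
            validateHistogramInterval("date", "1d");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```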

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@polyfractal
Contributor

I had a closer look at this to see if it still affects master. The answer is "yes", but the reason for it is actually subtle.

As it turns out, the histogram aggregation doesn't support date intervals. For example, this fails:

{
  "size": 0,
  "aggs": {
    "my_histogram": {
      "histogram": {
        "field": "date",
        "interval": "1m"
      }
    }
  }
}

{
  "error" : {
    "root_cause" : [
      {
        "type" : "x_content_parse_exception",
        "reason" : "[7:21] [histogram] failed to parse field [interval]"
      }
    ],
    "type" : "x_content_parse_exception",
    "reason" : "[7:21] [histogram] failed to parse field [interval]",
    "caused_by" : {
      "type" : "number_format_exception",
      "reason" : "For input string: \"1m\""
    }
  },
  "status" : 400
}

The reason 1d works is that we use Java's string-to-double conversion, and it accepts various numeric conventions: x introduces hexadecimal notation (e.g. 0x1p0), e or E represents scientific notation, and so on. Those conventions also say that a d/D suffix means "double" and an f/F suffix means "float".

So "1d" is a valid numerical interval, because it means "1.0 as a double".
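The breadth of those conventions is visible with a few more plain-Java inputs (again outside Elasticsearch):

```java
public class ParseConventions {
    public static void main(String[] args) {
        System.out.println(Double.parseDouble("1e3"));   // 1000.0 (scientific notation)
        System.out.println(Double.parseDouble("0x1p4")); // 16.0   (hex significand, binary exponent: 1 * 2^4)
        System.out.println(Double.parseDouble("1D"));    // 1.0    (uppercase suffix also accepted)
        System.out.println(Double.parseDouble("1F"));    // 1.0
    }
}
```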

I'm honestly not sure how we would fix this in a sane manner... and if it's even a thing we want to "fix" (since it'd break anyone who is using those legitimate conventions). I'll mark this as team-discuss... but I'm thinking we might just want to document the behavior and close.

@polyfractal
Contributor

We discussed this today in our team meeting:

  • We don't think we can "fix" this behavior, since it is also valid elsewhere (e.g. you can specify 1d when indexing a double field, and it is interpreted as 1 (a double)).
  • We decided to document the behavior with a big warning so that users are aware.
  • We considered deprecating/removing the ability to run histogram on date fields, which would fix the issue in a different way. But we decided there are more egregious combinations that need fixing first (date_histo on boolean fields), and people not uncommonly interchange dates and longs-as-millis, so this could be a controversial deprecation. It's not a huge maintenance burden, so it's probably not worth deprecating at this time.

@bzick

bzick commented Feb 4, 2021

Can you add a message to the log about a large number of buckets?

Some ES clients make it easy to send this kind of aggregation by mistake. Finding the error in a large cluster is very difficult and time-consuming; a log message could simplify diagnosing the problem.

This bug led to the failure of the entire cluster. :(

@not-napoleon
Member

We discussed this again on 2022-01-12. This behavior is confusing, but it's also essentially intended. There are many ways that aggregations can generate too many buckets and run into trouble, and we do our best to find and fix those (e.g. by adding more things to our memory circuit breakers). We aren't going to change the parsing behavior and we aren't going to drop support for running numeric histograms on date fields, so I don't see anything actionable on this issue. Closing based on that assessment.
