Histogram Aggregation accepts Dates and Date Intervals #23193
Comments
Hi, I've had a look into this issue and it seems like the interval is being parsed as a numerical value - in this case, I'm new to the elasticsearch project, so I'm not sure if it is intended behavior for the |
+1 We encountered a similar incident where it caused ES to OOM. "histogram" : { |
Since #27581, Elasticsearch should have the necessary safeguards to prevent an OOM situation. Generally speaking, we strive for consistency in our APIs, and "1d" is a valid number; that said, it is also a very common choice of interval for date histograms. The consequences of picking the wrong kind of histogram should no longer be catastrophic (no OOMs), but the question remains how many users would know that the next step is to pick a date_histogram rather than a regular histogram. cc @elastic/es-search-aggs |
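To give a sense of why reading "1d" as the number 1 can get expensive, here is a rough back-of-the-envelope sketch. It assumes the date field is stored as epoch milliseconds and that the histogram fills every interval between the smallest and largest value with a bucket; `BucketEstimate` and `bucketCount` are hypothetical names for illustration, not Elasticsearch code.

```java
import java.time.Duration;

public class BucketEstimate {
    // Hypothetical helper: number of buckets between the bucket containing
    // minValue and the bucket containing maxValue, inclusive, assuming
    // empty buckets in between are also materialized.
    static long bucketCount(long minValue, long maxValue, double interval) {
        long first = (long) Math.floor(minValue / interval);
        long last = (long) Math.floor(maxValue / interval);
        return last - first + 1;
    }

    public static void main(String[] args) {
        long start = 0L;                           // assumed epoch millis
        long end = Duration.ofDays(1).toMillis();  // one day later

        // Interval of 1 (what "1d" becomes when read as a plain number).
        System.out.println(bucketCount(start, end, 1.0));

        // Interval of one day expressed in milliseconds.
        System.out.println(bucketCount(start, end, Duration.ofDays(1).toMillis()));
    }
}
```

Under these assumptions, a single day of data spans tens of millions of one-millisecond buckets, versus two buckets when the interval really is one day.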
I had a closer look at this, to see if it still affects master. Answer is "Yes", but the reason for it is actually subtle. The
{
"size": 0,
"aggs": {
"my_histogram": {
"histogram": {
"field": "date",
"interval": "1m"
}
}
}
}
{
"error" : {
"root_cause" : [
{
"type" : "x_content_parse_exception",
"reason" : "[7:21] [histogram] failed to parse field [interval]"
}
],
"type" : "x_content_parse_exception",
"reason" : "[7:21] [histogram] failed to parse field [interval]",
"caused_by" : {
"type" : "number_format_exception",
"reason" : "For input string: \"1m\""
}
},
"status" : 400
}
The reason So I'm honestly not sure how we would fix this in a sane manner... and if it's even a thing we want to "fix" (since it'd break anyone who is using those legitimate conventions). I'll mark this as team-discuss... but I'm thinking we might just want to document the behavior and close. |
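For readers wondering why "1d" goes through while "1m" fails with a number_format_exception, the following is a small sketch of how Java's standard double parsing treats these strings. It assumes the interval ends up in something equivalent to `Double.parseDouble`, which is consistent with the error above but is not taken from the Elasticsearch source; `IntervalParseDemo` is a hypothetical name.

```java
public class IntervalParseDemo {
    // Returns true if Java's standard double parser accepts the string.
    static boolean parsesAsDouble(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A trailing 'd' or 'f' is a legal Java floating-point type suffix,
        // so these look like ordinary numbers to the parser.
        System.out.println("1d -> " + parsesAsDouble("1d"));
        System.out.println("1f -> " + parsesAsDouble("1f"));

        // 'm', 'h' and 's' are not numeric suffixes, so these are rejected.
        System.out.println("1m -> " + parsesAsDouble("1m"));
        System.out.println("1h -> " + parsesAsDouble("1h"));
        System.out.println("1s -> " + parsesAsDouble("1s"));
    }
}
```

In other words, only the date-style intervals that happen to collide with Java's numeric suffixes are silently read as numbers; the rest fail loudly at parse time.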
We discussed this today in our team meeting:
|
Can you add a message to the log about a large number of buckets? Some ES clients allow for this kind of error with aggregations. Finding an error in a large cluster is very difficult and time-consuming. A log message could simplify diagnosis of this problem. This bug led to the failure of the entire cluster. :( |
We discussed this again on 2022-01-12. This behavior is confusing, but it's also essentially intended. There are many ways that aggregations can generate too many buckets and run into trouble, and we do our best to find and fix those (e.g. by adding more things to our memory circuit breakers). We aren't going to change the parsing behavior and we aren't going to drop support for running numeric histograms on date fields, so I don't see anything actionable on this issue. Closing based on that assessment. |
While working with some example data set, I witnessed a user accidentally fire off a histogram aggregation against a date field with a date interval:
A few things of interest:
interval is invalid with respect to the histogram agg.