
Deprecate search.max_buckets? #51731

Closed
jpountz opened this issue Jan 31, 2020 · 12 comments · Fixed by #57042
Assignees: imotov
Labels: :Analytics/Aggregations, Team:Analytics

Comments

@jpountz
Contributor

jpountz commented Jan 31, 2020

PR #46751 introduces circuit breaking for the reduce phase and raises the question of whether we can get rid of search.max_buckets, which served a similar purpose by enforcing a limit on the number of buckets rather than their memory usage.
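
For context, search.max_buckets is a dynamic cluster setting (it defaults to 10,000 in 7.x). A minimal sketch of raising it today, using the Kibana Dev Tools console syntax; the value 20,000 is purely illustrative:

PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 20000
  }
}

When a request exceeds the limit, it fails with a too_many_buckets_exception instead of returning an oversized aggregation response.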

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@jimczi
Contributor

jimczi commented Jan 31, 2020

I agree that #46751 should change the perception of the search.max_buckets setting, but I wonder if we should rather increase the default limit or apply this setting only during the final reduction. IMO this setting is a good protection for users because it forces them to think of solutions that don't require returning thousands of buckets, even if doing so wouldn't exceed memory. This is similar in spirit to the max window size we have for top hits, which redirects users to scroll or search_after if they want to paginate deeply.
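
For readers unfamiliar with the analogy: index.max_result_window similarly caps from + size pagination and nudges users toward search_after. A minimal sketch, where my-index, timestamp, and serial_no are hypothetical names:

GET my-index/_search
{
  "size": 100,
  "sort": [
    { "timestamp": "asc" },
    { "serial_no": "asc" }
  ],
  "search_after": [1580428800000, 10231]
}

search_after takes the sort values of the last hit of the previous page, so clients walk the results page by page rather than requesting everything at once; max_buckets plays a similar steering role for aggregations.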

@polyfractal
Contributor

Been thinking about this a bit. I agree that the setting is still helpful from a "soft limit" standpoint, in that it helps prevent abusive aggs even if they would technically execute without breaking anything. It keeps users from adding size: MAX_INT to all their terms aggs by default because they just want all the results, and instead encourages them to reason about the best way to accomplish the goal (increase the limit? A different agg structure? Use composite? etc.)

It also offers a convenient place to actually limit agg response size, so that clients aren't expected to handle unreasonably large responses.

Now that the memory-safety implications of max_buckets are diminished, I wonder if we need to change the semantics a bit though. There are a few problem areas with how it works today:

  • Even if the soft limit is increased, it can be confusing to users when the setting trips despite the final bucket count being under the limit (Deprecate search.max_buckets? #51731). If a user sets a terms agg to size: 1000 they may be confused when the threshold trips because 20 shards each report back shard_size (≥ 1000) terms and we abort during an incremental reduction (a concrete sketch follows after this list).
  • Some aggs like rare_terms generate more buckets on the shard and prune them during reductions, so the final bucket count might be very small, yet the request fails due to the soft limit. I imagine future clustering aggs will be similar.
  • Some operations implicitly generate many, many buckets. Geospatial bucketing (tile grid, etc.), x/y scatter charts with histograms, 3d heatmaps, etc. necessarily need a lot of buckets, and even a high soft limit could be problematic. We could potentially solve these with specialized aggs (scatter_chart) that operate more like metrics than buckets, but the underlying issue remains.
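
To make the first bullet concrete, here is a sketch of a request that can trip the limit even though the final response is small (the index and field names are hypothetical):

GET logs-*/_search
{
  "size": 0,
  "aggs": {
    "hosts": {
      "terms": { "field": "host.name", "size": 1000 }
    }
  }
}

With the default shard_size of size * 1.5 + 10 = 1510, each of 20 shards can return up to 1,510 buckets, so an incremental reduction may see roughly 30,000 buckets and trip a 10,000 soft limit even though the merged response contains at most 1,000.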

I'm not sure what to do about these problem cases, or if we should do anything. I guess I see a few options:

  1. Deprecate search.max_buckets. Problem solved :) Although we do lose a potentially useful soft limit
  2. Raise max_buckets to a large threshold, which makes many of the common issues disappear. Anyone hitting the previously mentioned corner cases still has to work around them or increase the limit
  3. Allow max_buckets to be configured on a per-request basis, so a user can opt out of the limit for something like rare_terms or geospatial bucketing. We didn't want this before because the setting was tied to memory implications, but perhaps we can loosen that restriction now?
  4. Only count buckets at the end of the final reduction and rely on the recent breaker changes to catch memory issues. This would help cases like terms and rare_terms avoid false positives, but would not help the geospatial case.

Or some combination of the above :)

@polyfractal
Contributor

Ruminated on this a bit more, leaning towards the dual approach of:

  1. Increase the limit to something larger, 50,000?
  2. Only count buckets during final reduction, after all buckets have been merged. This means it is solely used to limit response size, and avoids all the confusing side-effects (like tripping because terms might have k buckets that must be merged before you can get to the requested n buckets)

@jpountz
Contributor Author

jpountz commented Feb 21, 2020

This sounds like a plan to me.

@jasontedor
Member

It seems that we have a plan here. Is it okay to remove the discuss label, or do we think that broader input would be helpful?

@jpountz jpountz removed the discuss label Feb 22, 2020
@jpountz
Contributor Author

jpountz commented Feb 22, 2020

I think it is, I just removed it.

@nickpeihl
Member

If we do set a new limit for search.max_buckets, I think it would be useful for the Elastic Maps application to set it to at least 65,536. We have a POC in Kibana for constructing vector tiles from documents and geo_tile grid aggregations. The vector tiles we generate are sized in multiples of 64 × 64 pixels, usually 256 × 256. A sufficiently fine geo_tile grid precision could generate a bucket for each pixel in the 256 × 256 tile. So if we are setting search.max_buckets to an arbitrary number, perhaps we can consider at least 65,536?
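
To illustrate the arithmetic: a 256 × 256 pixel tile has 256 * 256 = 65,536 pixels, so a geotile_grid aggregation at a precision fine enough to yield one cell per pixel can produce up to 65,536 buckets for a single tile. A minimal sketch, assuming a hypothetical index with a geo_point field named location and a bounding box roughly matching the tile:

GET maps-demo/_search
{
  "size": 0,
  "query": {
    "geo_bounding_box": {
      "location": {
        "top_left": [-10.0, 50.0],
        "bottom_right": [10.0, 40.0]
      }
    }
  },
  "aggs": {
    "grid": {
      "geotile_grid": { "field": "location", "precision": 16 }
    }
  }
}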

@thomasneirynck
Contributor

+1 on @nickpeihl's suggestion. If search.max_buckets is an arbitrary limit, a limit that is the square of a power of two makes a lot of sense in the context of mapping and tiling (especially wrt the use of geotile_grid).

@polyfractal
Contributor

👍 seems reasonable to me.

Related note, I'll bring this up in our team meeting because we probably want to talk through whether we can increase the limit in a 7.x minor, or whether that would count as a "break". E.g. if a client is relying on it to limit response sizes, raising it could break that client by delivering a much larger response than it expects.

If we decide it's a breaking change, a potential plan could be:

  1. Increase the limit to a large power of 2 in 8.0, leave at 10k for 7.x
  2. Only count buckets during final reduction
  3. Add a per-request setting that allows clients to override in 7.x.

@thomasneirynck
Contributor

wrt (3)

Being able to override the limit on a per-request basis would be useful for Maps in the 7.x time-frame.

There are two ongoing efforts on the Maps-side in 7.x that really could use this:

cc @alexfrancoeur @nreese @nickpeihl

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@polyfractal polyfractal assigned imotov and unassigned andyb-elastic May 19, 2020
imotov added a commit to imotov/elasticsearch that referenced this issue May 21, 2020
Increases the default search.max_buckets limit to 65,535, and only counts
buckets during reduce phase.

Closes elastic#51731
imotov added a commit that referenced this issue Jun 3, 2020
Increases the default search.max_buckets limit to 65,535, and only counts
buckets during reduce phase.

Closes #51731
imotov added a commit to imotov/elasticsearch that referenced this issue Jun 3, 2020
Increases the default search.max_buckets limit to 65,535, and only counts
buckets during reduce phase.

Closes elastic#51731
imotov added a commit that referenced this issue Jun 3, 2020
Increases the default search.max_buckets limit to 65,535, and only counts
buckets during reduce phase.

Closes #51731
@gleventhal

How does someone (using e.g. v6.6.2) prevent unreasonably large aggregations from hobbling the cluster, without returning misleading results to clients due to the limits on buckets?
