APIs like /_cluster/state Break for Large Clusters due to Response Size Limitations #79560

ES currently cannot return REST responses larger than 2 GiB (the maximum `int` value, 2^31 - 1 bytes) because of the way we serialize messages into `BytesReference` instances.
As a result, APIs like `/_cluster/state` eventually stop working. In the case at hand, that happened at ~15k indices with Auditbeat templates when using `?human&pretty`; without those parameters the response was still almost 1 GB.
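The arithmetic behind the limit can be sketched as follows. This is only an illustration of why a length tracked as a signed 32-bit `int` tops out just under 2 GiB; the helper names are hypothetical and nothing here is taken from the Elasticsearch code base.

```python
# Largest value of a signed 32-bit int, which the issue names as the
# ceiling for a serialized REST response (in bytes).
INT_MAX = 2**31 - 1


def fits_in_int_sized_buffer(size_in_bytes: int) -> bool:
    """Return True if a payload of this size can be described by an int length.

    Hypothetical helper for illustration only.
    """
    return 0 <= size_in_bytes <= INT_MAX


def wrap_to_int32(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit int."""
    return (value + 2**31) % 2**32 - 2**31


one_gb = 10**9
two_gib = 2**31

print(fits_in_int_sized_buffer(one_gb))   # an "almost 1 GB" response still fits
print(fits_in_int_sized_buffer(two_gib))  # one byte past the limit does not
print(wrap_to_int32(two_gib))             # a 32-bit length would go negative here
```

The last line shows what happens to a 32-bit counter at that size, which is why the limit cannot be raised without changing how lengths are represented.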

Even before a request breaks outright at the 2 GiB limit, a response of this size can destabilize smaller master nodes. This has already been observed with smaller states when concurrent requests come into the mix.

In practice this does not matter much to most users, because these massive responses are rarely useful. It still has two concrete consequences:
One is that the support diagnostics tool breaks, and running it might destabilize the master or the whole cluster.
The other is that orchestration tooling may hit endpoints like the cluster state endpoint and destabilize or break clusters that way (already observed in the real world).

It is definitely a bug to have endpoints that eventually become unusable or, worse yet, can bring down a node when called.
The likely solution is to remove these endpoints rather than make them work at larger scale, forcing users and tooling to use more specific endpoints for the problem at hand.
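As one possible reading of "more specific endpoints", tooling can already narrow a cluster state request by metric, index pattern, and `filter_path` instead of fetching everything. The sketch below only builds such a URL and does not contact a cluster. The host, the index pattern, and the helper name `cluster_state_url` are assumptions for illustration; which endpoints should replace the full cluster state call is not decided in this issue.

```python
from urllib.parse import quote, urlencode


def cluster_state_url(base_url, metrics=(), indices=(), filter_path=()):
    """Build a narrowed /_cluster/state URL (hypothetical helper).

    metrics     -> path segment such as "metadata" or "routing_table"
    indices     -> path segment restricting the index names or patterns
    filter_path -> response filtering applied to the returned JSON
    """
    path = "/_cluster/state"
    if metrics or indices:
        # An index restriction needs a metrics segment; fall back to "_all".
        path += "/" + (",".join(quote(m, safe="") for m in metrics) or "_all")
    if indices:
        path += "/" + ",".join(quote(i, safe="*") for i in indices)

    query = ""
    if filter_path:
        # Keep "*" and "," readable in the query string.
        query = "?" + urlencode({"filter_path": ",".join(filter_path)}, safe="*,")
    return base_url.rstrip("/") + path + query


# Assumed local node and index pattern, for illustration only.
url = cluster_state_url(
    "http://localhost:9200",
    metrics=["metadata"],
    indices=["auditbeat-*"],
    filter_path=["metadata.indices.*.settings"],
)
print(url)
```

Narrowing the request this way reduces the response size for one caller, but it does not remove the underlying limit for callers that still ask for the full state.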

Relates to #77466
