
router: add request_body_buffer_limit for large request buffering #40254

Merged
yanavlasov merged 20 commits into envoyproxy:main from agrawroh:large-buffering-1
Aug 11, 2025

Conversation

@agrawroh
Member

@agrawroh agrawroh commented Jul 16, 2025

Description

ML/inference requests often require buffering the entire request body to determine routing destination based on content rather than headers, and to support retries of failed requests. The existing per_request_buffer_limit_bytes (32-bit) is insufficient for large ML payloads that can exceed 4GB.

This PR adds request_body_buffer_limit configuration to VirtualHost and Route for buffering large request bodies beyond connection buffer limits. This enables support for ML/inference workloads that require buffering entire request bodies for processing and retries.

When request_body_buffer_limit is not configured, the existing per_request_buffer_limit_bytes behavior is preserved. Routes inherit from virtual hosts when not explicitly configured.
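For illustration, a hedged sketch of how the new field might appear in a route configuration (the virtual host name, route match, cluster name, and limit values are hypothetical, and the exact field placement should be checked against the merged proto):

```yaml
virtual_hosts:
- name: ml_inference          # hypothetical name
  domains: ["*"]
  # Inherited by routes that do not set their own limit.
  request_body_buffer_limit: 8589934592    # 8 GiB, beyond the 32-bit range
  routes:
  - match: { prefix: "/v1/predict" }       # hypothetical match
    route: { cluster: inference_backend }  # hypothetical cluster
    # Overrides the virtual host value for this route.
    request_body_buffer_limit: 17179869184 # 16 GiB
```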

See #40028


Commit Message: router: add request_body_buffer_limit for large request buffering
Additional Description: Added request_body_buffer_limit configuration to VirtualHost and Route for buffering large request bodies beyond connection buffer limits.
Risk Level: Low
Testing: Added Unit + Integration Tests
Docs Changes: Added
Release Notes: Added

@repokitteh-read-only

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #40254 was opened by agrawroh.


@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @abeyad
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #40254 was opened by agrawroh.


Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
@agrawroh agrawroh force-pushed the large-buffering-1 branch from cd62d01 to 46fc331 on July 16, 2025 21:05
// The maximum bytes which will be buffered for request bodies to support large request body
// buffering beyond the ``per_connection_buffer_limit_bytes``.
//
// This limit is specifically for request body buffering and allows buffering larger inference
// payloads while maintaining the flow control.
Contributor


what is meant by inference payloads here?

Member Author


It refers to Model Serving/Inference workloads, whose payloads are typically large. See the Background section here. If it's confusing, I'm happy to change it to whatever you prefer.

Contributor


Since this is a general API (not specific to ML), perhaps just saying "larger payloads while maintaining..." would be better (i.e. remove "inference")? In any case, I'm fine either way. Thanks!

@agrawroh agrawroh requested a review from yanavlasov July 16, 2025 21:22
@agrawroh agrawroh marked this pull request as ready for review July 16, 2025 21:23
Contributor

@abeyad abeyad left a comment


/lgtm api

Comment on lines +372 to +383

// The maximum bytes which will be buffered for request bodies to support large request body
// buffering beyond the ``per_connection_buffer_limit_bytes``.
//
// This limit is specifically for request body buffering and allows buffering larger inference
// payloads while maintaining the flow control.
//
// If not set, defaults to ``per_request_buffer_limit_bytes`` behavior.
//
// When set, this limit supersedes ``per_connection_buffer_limit_bytes`` for request body buffering
// but ``per_request_buffer_limit_bytes`` still controls flow control chunk sizes.
google.protobuf.UInt64Value request_body_buffer_limit = 20;
Member


What's the difference between the per_request_buffer_limit_bytes and request_body_buffer_limit?

/wait-any

Member

@wbpcode wbpcode Jul 17, 2025


I get the context. IMO, this new field and the previous per_request_buffer_limit_bytes are pretty confusing together. @yanavlasov may want per_request_buffer_limit_bytes to help with flow control, but IMO it actually changes the semantics of per_request_buffer_limit_bytes and makes the API hard to understand/use.

So, my suggestion here is to deprecate per_request_buffer_limit_bytes and ensure only one of the new field and the old field can be set.

If flow control is necessary, we could use another number, like the connection buffer size. This ensures both fields (per_request_buffer_limit_bytes and request_body_buffer_limit) share the same semantics, except that the new field extends the number range.

/wait

Contributor


I agree we can deprecate per_request_buffer_limit_bytes.
If neither per_request_buffer_limit_bytes nor request_body_buffer_limit is set, then the buffering limit is the listener's per_connection_buffer_limit_bytes.

If per_request_buffer_limit_bytes is set but request_body_buffer_limit is not set, the body buffer limit is min(per_request_buffer_limit_bytes, per_connection_buffer_limit_bytes) - the behavior does not change.

If per_request_buffer_limit_bytes is NOT set but request_body_buffer_limit is set, then the buffer limit is request_body_buffer_limit.

If both per_request_buffer_limit_bytes AND request_body_buffer_limit are set, then the buffer limit is request_body_buffer_limit.

For flow control we can use min(per_connection_buffer_limit_bytes, 16KiB) chunk sizes. 16KiB is somewhat arbitrary; we can make it configurable later on.
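The precedence rules above can be sketched as a small helper. This is an illustrative sketch only, not Envoy's actual implementation; the function names are hypothetical, and the types mirror the 32-bit legacy field versus the 64-bit new field:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: pick the effective request body buffer limit from the
// legacy 32-bit per_request_buffer_limit_bytes, the new 64-bit
// request_body_buffer_limit, and the listener's per_connection_buffer_limit_bytes.
uint64_t effectiveBodyBufferLimit(std::optional<uint32_t> per_request_limit,
                                  std::optional<uint64_t> request_body_limit,
                                  uint32_t per_connection_limit) {
  if (request_body_limit.has_value()) {
    // The new field wins whenever it is set, regardless of the legacy field.
    return *request_body_limit;
  }
  if (per_request_limit.has_value()) {
    // Legacy behavior preserved: capped by the connection buffer limit.
    return std::min<uint64_t>(*per_request_limit, per_connection_limit);
  }
  // Neither field set: fall back to the listener's connection buffer limit.
  return per_connection_limit;
}

// Flow-control chunk size: min(per_connection_buffer_limit_bytes, 16 KiB).
uint32_t flowControlChunkSize(uint32_t per_connection_limit) {
  return std::min<uint32_t>(per_connection_limit, 16 * 1024);
}
```

Note how the second rule uses min() so that existing configurations keep their current effective limit, while the new field deliberately escapes the connection-limit cap.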

Contributor

@yanavlasov yanavlasov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait-any


Comment thread envoy/router/router.h Outdated
@@ -686,6 +686,15 @@ class VirtualHost {
*/
virtual uint32_t retryShadowBufferLimit() const PURE;
Contributor


Is this the per_request_buffer_limit_bytes value? If so can we just rename this method to decrease confusion?

Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
@agrawroh agrawroh force-pushed the large-buffering-1 branch from ca31e21 to fecf789 on July 17, 2025 19:12
agrawroh added 3 commits July 17, 2025 13:06
…ng-1

Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
@agrawroh agrawroh force-pushed the large-buffering-1 branch from fecf789 to 0a099ea on July 17, 2025 21:30
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Comment thread envoy/router/router.h Outdated
Comment on lines +687 to +696
virtual uint32_t perRequestBufferLimit() const PURE;

/**
* @return uint64_t the maximum bytes which should be buffered for request bodies. This enables
* buffering larger request bodies beyond the connection buffer limit for use cases
* with large payloads. If not set, falls back to perRequestBufferLimit() behavior.
* When set, this limit supersedes per_connection_buffer_limit_bytes for request body
* buffering but perRequestBufferLimit() still controls flow control.
*/
virtual uint64_t requestBodyBufferLimit() const PURE;
Member


What's difference between the perRequestBufferLimit and the requestBodyBufferLimit?

Member Author


I renamed retryShadowBufferLimit() and retry_shadow_buffer_limit, which were tracking per_request_buffer_limit_bytes from the route, to perRequestBufferLimit() and per_request_buffer_limit per the suggestion from @yanavlasov, as the existing naming is a bit confusing.

Member


I mean, should we keep only one requestBodyBufferLimit? In the API, the new request_body_buffer_limit field should replace the old per_request_buffer_limit_bytes.

That is to say, we would only have one buffer limit on the route, so it's unnecessary to keep both perRequestBufferLimit() and requestBodyBufferLimit().

/wait

Member Author


Thanks, I tried consolidating it. Please take another look.

agrawroh added 4 commits July 18, 2025 01:44
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
@agrawroh
Member Author

/retest

1 similar comment
@agrawroh
Member Author

/retest

agrawroh added 4 commits July 19, 2025 10:04
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Signed-off-by: yanavlasov <yavlasov@google.com>
@yanavlasov yanavlasov enabled auto-merge (squash) August 11, 2025 20:02
@abeyad
Contributor

abeyad commented Aug 11, 2025

/lgtm api

@agrawroh
Member Author

/retest

@yanavlasov yanavlasov merged commit 369ace2 into envoyproxy:main Aug 11, 2025
26 checks passed
yanavlasov pushed a commit that referenced this pull request Dec 16, 2025
…2611)

We introduced a bug at #40254 where the legacy vhost buffer limit would take precedence over the legacy route buffer limit.

Risk Level: low.
Testing: unit.
Docs Changes: n/a.
Release Notes: added.
Platform Specific Features: n/a.

---------

Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>
grnmeira pushed a commit to grnmeira/envoy that referenced this pull request Mar 20, 2026
…voyproxy#42611)

We introduced a bug at envoyproxy#40254 where the legacy vhost buffer limit would take precedence over the legacy route buffer limit.

Risk Level: low.
Testing: unit.
Docs Changes: n/a.
Release Notes: added.
Platform Specific Features: n/a.

---------

Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com>
Signed-off-by: Gustavo <grnmeira@gmail.com>