router: add request_body_buffer_limit for large request buffering#40254
router: add request_body_buffer_limit for large request buffering#40254yanavlasov merged 20 commits intoenvoyproxy:mainfrom
Conversation
|
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
cd62d01 to
46fc331
Compare
| // The maximum bytes which will be buffered for request bodies to support large request body | ||
| // buffering beyond the ``per_connection_buffer_limit_bytes``. | ||
| // | ||
| // This limit is specifically for request body buffering and allows buffering larger inference |
There was a problem hiding this comment.
what is meant by inference payloads here?
There was a problem hiding this comment.
It refers to the Model Serving/Inference workloads which are typically large. See the Background section here. If it's confusing then I'm happy to change it to what you prefer.
There was a problem hiding this comment.
Since this is general API (not specific to ML), perhaps just saying "larger payloads while maintaining..." would be better (i.e. remove "inference")? In any case, I'm fine either way. Thanks!
|
|
||
| // The maximum bytes which will be buffered for request bodies to support large request body | ||
| // buffering beyond the ``per_connection_buffer_limit_bytes``. | ||
| // | ||
| // This limit is specifically for request body buffering and allows buffering larger inference | ||
| // payloads while maintaining the flow control. | ||
| // | ||
| // If not set, defaults to ``per_request_buffer_limit_bytes`` behavior. | ||
| // | ||
| // When set, this limit supersedes ``per_connection_buffer_limit_bytes`` for request body buffering | ||
| // but ``per_request_buffer_limit_bytes`` still controls flow control chunk sizes. | ||
| google.protobuf.UInt64Value request_body_buffer_limit = 20; |
There was a problem hiding this comment.
What's the difference between the per_request_buffer_limit_bytes and request_body_buffer_limit?
/wait-any
There was a problem hiding this comment.
I get the context. IMO, this new field and previous per_request_buffer_limit_bytes are pretty confusing. The @yanavlasov may want the per_request_buffer_limit_bytes to help with flow control. But IMO, it actually change the semantic of the per_request_buffer_limit_bytes and make the API hard to understand/use.
So, my suggestion here is deprecate the per_request_buffer_limit_bytes and ensure only one of the new field and the old field could be set.
If the flow control is necessary, we could use another number like the connection buffer size etc. This ensure the both fields (per_request_buffer_limit_bytes and request_buffer_limit_bytes) share same semantic except the new field extend the number range.
/wait
There was a problem hiding this comment.
I agree we can deprecate per_request_buffer_limit_bytes.
If neither per_request_buffer_limit_bytes nor request_body_buffer_limit are set then the buffering limit is listeners per_connection_buffer_limit_bytes
If per_request_buffer_limit_bytes is set but request_body_buffer_limit is not set the body buffer is min(per_request_buffer_limit_bytes, per_connection_buffer_limit_bytes) - the behavior does not change.
If per_request_buffer_limit_bytes is NOT set but request_body_buffer_limit is set - then the buffer limit is request_body_buffer_limit.
If both per_request_buffer_limit_bytes AND request_body_buffer_limit set, then the buffer limit is request_body_buffer_limit.
For flow control we can use min(per_connection_buffer_limit_bytes, 16Kb) chunk sizes. 16Kb is somewhat arbitrary, we can make it configurable later on.
|
|
||
| // The maximum bytes which will be buffered for request bodies to support large request body | ||
| // buffering beyond the ``per_connection_buffer_limit_bytes``. | ||
| // | ||
| // This limit is specifically for request body buffering and allows buffering larger inference | ||
| // payloads while maintaining the flow control. | ||
| // | ||
| // If not set, defaults to ``per_request_buffer_limit_bytes`` behavior. | ||
| // | ||
| // When set, this limit supersedes ``per_connection_buffer_limit_bytes`` for request body buffering | ||
| // but ``per_request_buffer_limit_bytes`` still controls flow control chunk sizes. | ||
| google.protobuf.UInt64Value request_body_buffer_limit = 20; |
There was a problem hiding this comment.
I agree we can deprecate per_request_buffer_limit_bytes.
If neither per_request_buffer_limit_bytes nor request_body_buffer_limit are set then the buffering limit is listeners per_connection_buffer_limit_bytes
If per_request_buffer_limit_bytes is set but request_body_buffer_limit is not set the body buffer is min(per_request_buffer_limit_bytes, per_connection_buffer_limit_bytes) - the behavior does not change.
If per_request_buffer_limit_bytes is NOT set but request_body_buffer_limit is set - then the buffer limit is request_body_buffer_limit.
If both per_request_buffer_limit_bytes AND request_body_buffer_limit set, then the buffer limit is request_body_buffer_limit.
For flow control we can use min(per_connection_buffer_limit_bytes, 16Kb) chunk sizes. 16Kb is somewhat arbitrary, we can make it configurable later on.
| @@ -686,6 +686,15 @@ class VirtualHost { | |||
| */ | |||
| virtual uint32_t retryShadowBufferLimit() const PURE; | |||
There was a problem hiding this comment.
Is this the per_request_buffer_limit_bytes value? If so can we just rename this method to decrease confusion?
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
ca31e21 to
fecf789
Compare
…ng-1 Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
fecf789 to
0a099ea
Compare
| virtual uint32_t perRequestBufferLimit() const PURE; | ||
|
|
||
| /** | ||
| * @return uint64_t the maximum bytes which should be buffered for request bodies. This enables | ||
| * buffering larger request bodies beyond the connection buffer limit for use cases | ||
| * with large payloads. If not set, falls back to perRequestBufferLimit() behavior. | ||
| * When set, this limit supersedes per_connection_buffer_limit_bytes for request body | ||
| * buffering but perRequestBufferLimit() still controls flow control. | ||
| */ | ||
| virtual uint64_t requestBodyBufferLimit() const PURE; |
There was a problem hiding this comment.
What's difference between the perRequestBufferLimit and the requestBodyBufferLimit?
There was a problem hiding this comment.
I renamed retryShadowBufferLimit() and retry_shadow_buffer_limit which were tracking per_request_buffer_limit_bytes from the route to perRequestBufferLimit() and per_request_buffer_limit per the suggestion from @yanavlasov as the existing naming is a bit confusing.
There was a problem hiding this comment.
I mean should we keep only one requestBodyBufferLimit as in the API, the new request_body_buffer_limit field should be used to replace the old per_request_buffer_limit_bytes?
That's say we will only have one buffer limit the the route. It's unnecessary to keep perRequestBufferLimit() and requestBodyBufferLimit().
/wait
There was a problem hiding this comment.
Thanks, I tried consolidating it. Please take another look.
|
/retest |
1 similar comment
|
/retest |
Signed-off-by: Rohit Agrawal <rohit.agrawal@databricks.com>
Signed-off-by: yanavlasov <yavlasov@google.com>
|
/lgtm api |
|
/retest |
…voyproxy#42611) We introduced a bug at envoyproxy#40254 where the legacy vhost buffer limit will take precedence over the legacy route buffer limit. Risk Level: low. Testing: unit. Docs Changes: n/a. Release Notes: added. Platform Specific Features: n/a. --------- Signed-off-by: wbpcode/wangbaiping <wbphub@gmail.com> Signed-off-by: Gustavo <grnmeira@gmail.com>
Description
ML/inference requests often require buffering the entire request body to determine routing destination based on content rather than headers, and to support retries of failed requests. The existing
per_request_buffer_limit_bytes(32-bit) is insufficient for large ML payloads that can exceed 4GB.This PR adds
request_body_buffer_limitconfiguration toVirtualHostandRoutefor buffering large request bodies beyond connection buffer limits. This enables support for ML/inference workloads that require buffering entire request bodies for processing and retries.When
request_body_buffer_limitis not configured, the existingper_request_buffer_limit_bytesbehavior is preserved. Routes inherit from virtual hosts when not explicitly configured.See #40028
Commit Message: router: add request_body_buffer_limit for large request buffering
Additional Description: Added
request_body_buffer_limitconfiguration toVirtualHostandRoutefor buffering large request bodies beyond connection buffer limits.Risk Level: Low
Testing: Added Unit + Integration Tests
Docs Changes: Added
Release Notes: Added