proxy: Support retries on POST requests (for gRPC) #6130

Closed
olix0r opened this issue May 17, 2021 · 0 comments · Fixed by linkerd/linkerd2-proxy#1017
olix0r commented May 17, 2021

Linkerd's proxy does not currently support retries on requests with payloads. This precludes Linkerd from retrying gRPC requests, which is a substantial limitation.

I propose that we add support for retrying requests with payloads when a content-length header is set and its value is less than 64 KB. When retries are configured on a route and this content-length restriction applies, we should read the request body (and trailers!) into a buffer before issuing the outbound request so that, if the request fails and the failure is retryable, we can reissue the request.

We should be sure that we never do this buffering for requests on which retries are not configured.
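A minimal sketch of this gating logic, with hypothetical names and not the proxy's actual code, might look like:

```rust
use http::Request;

const MAX_BUFFERED_BODY: u64 = 64 * 1024;

/// Hypothetical helper: decide whether a request's body should be buffered
/// for retries, per the proposal above.
fn should_buffer_for_retry<B>(req: &Request<B>, retries_configured: bool) -> bool {
    // Never pay the buffering cost on routes without retries configured.
    if !retries_configured {
        return false;
    }
    // Only buffer when the request declares a content-length less than 64 KB.
    req.headers()
        .get(http::header::CONTENT_LENGTH)
        .and_then(|v| v.to_str().ok())
        .and_then(|s| s.parse::<u64>().ok())
        .map(|len| len < MAX_BUFFERED_BODY)
        .unwrap_or(false)
}
```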

@olix0r olix0r added this to the stable-2.11.0 milestone May 17, 2021
hawkw added a commit to linkerd/linkerd2-proxy that referenced this issue May 20, 2021
Currently, the `http_box::BoxBody` type erases both the type of the
`http_body::Body` _and_ the `Body::Data` associated type (to `Box<dyn
Buf + Send>`). However, it turns out that erasing the `Data` type isn't
actually necessary: _all_ `Body` types in Linkerd use `Bytes` as the
`Data` type.

This branch changes `BoxBody` to just require that `B::Data` is `Bytes`,
and always produce `Bytes` as its data type. This will enable a much
more performant implementation of request body buffering for POST
retries (linkerd/linkerd2#6130). Rather than copying all the bytes from
a `dyn Buf` `Data` type into an additional buffer, we can just clone the
`Bytes`, increasing the reference count for that buffer, and return the
original `Bytes`. This means we don't need to perform additional memcpys
to buffer bodies for retries; we just hold onto the already-allocated
`Bytes` buffers.

The cloned `Bytes` can then be stored in a vector, and if we have to
write them out again for a retry, we can do one big vectored write :)
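
Roughly, the idea is something like the following sketch (not the proxy's actual definition):

```rust
use bytes::Bytes;
use http_body::Body;
use std::pin::Pin;

type BoxError = Box<dyn std::error::Error + Send + Sync>;

// Erase the concrete `Body` type, but fix its `Data` type to `Bytes` so that
// callers can clone chunks (a reference-count bump) instead of copying out of
// a `Box<dyn Buf + Send>`.
pub struct BoxBody {
    inner: Pin<Box<dyn Body<Data = Bytes, Error = BoxError> + Send + 'static>>,
}
```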
hawkw added a commit to linkerd/linkerd2-proxy that referenced this issue May 28, 2021
Linkerd's proxy does not currently support retries on requests with
payloads. This precludes Linkerd from retrying gRPC requests, which is a
substantial limitation.

This PR adds support for retries on requests with bodies, if and only if
the request has a `Content-Length` header and the content length is <=
64 KB.

In order to retry requests with payloads, we need to buffer the body
data for that request so that it can be sent again if a retry is
necessary. This is implemented by wrapping profile requests in a new
`Body` type which lazily buffers each chunk of data polled from the
inner `Body`. The buffered data is shared with a clone of the request,
and when the original body is dropped, ownership of the buffered data is
transferred to the clone. When the cloned request is sent, polling its
body will yield the buffered data.

If the server returns an error _before_ an entire streaming body has
been read, the replay body will continue reading from the initial
request body after playing back the buffered portion.
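
As a rough sketch of the state shared between the original body and its replay clone (illustrative only; these are not the proxy's actual types):

```rust
use bytes::Bytes;
use std::sync::{Arc, Mutex};

// State shared by the original body and its clone: the chunks buffered so
// far, plus the still-unread remainder of the initial body.
struct SharedState<B> {
    buffered: Vec<Bytes>,
    // `Some` until the inner body has been read to completion.
    rest: Option<B>,
}

struct ReplayBody<B> {
    shared: Arc<Mutex<SharedState<B>>>,
    // How many buffered chunks this handle has already yielded; once it
    // catches up, it continues polling `rest` directly.
    replayed: usize,
}
```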

Data is buffered by calling `Buf::copy_to_bytes` on each chunk of data.
Although we call this method on an arbitrary `Buf` type, all data chunks
in the proxy are actually `Bytes`, and their `copy_to_bytes` methods are
therefore a cheap reference-count bump on the `Bytes`. After calling
`copy_to_bytes`, we can clone the returned `Bytes` and store it in a
vector. This allows us to buffer the body without actually copying the
bytes --- we just increase the reference count on the original buffer.

This buffering strategy also has the advantage of allowing us to write
out the entire buffered body in one big `writev` call. Because we store
the buffered body as a list of distinct buffers for each chunk, we can
expand the buffered body to a large number of scatter-gather buffers in
`Buf::bytes_vectored`. This should make replaying the body more
efficient, as we don't have to make a separate `write` call for each
chunk.
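
In sketch form, under the assumption that every data chunk is really a `Bytes` (hypothetical helpers, not the PR's actual code):

```rust
use bytes::{Buf, Bytes};
use std::io::IoSlice;

// Buffer one polled chunk. When `data` is already `Bytes`, `copy_to_bytes`
// hands back another handle to the same allocation rather than copying it.
fn buffer_chunk<D: Buf>(data: &mut D, buffered: &mut Vec<Bytes>) -> Bytes {
    let chunk = data.copy_to_bytes(data.remaining());
    buffered.push(chunk.clone());
    chunk
}

// On replay, every buffered chunk can be exposed as its own `IoSlice`,
// letting the transport write the whole body with a single vectored write.
fn as_io_slices(buffered: &[Bytes]) -> Vec<IoSlice<'_>> {
    buffered.iter().map(|b| IoSlice::new(&b[..])).collect()
}
```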

I've also added several tests for the new buffering body. In particular,
there are tests for a number of potential edge cases, including:

- when a retry is started before the entire initial body has been read
  (i.e. if the server returns an error before the request completes),
- when a retry body is cloned multiple times, including if the client's
  body has not yet completed,
- when clones are dropped prior to completion.

Closes linkerd/linkerd2#6130.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 16, 2021