
Pipe.fold: destination sink can remain open even after pipe closed #1412

Closed
jhump opened this issue Jan 29, 2024 · 2 comments · Fixed by #1413
Comments

@jhump
Contributor

jhump commented Jan 29, 2024

If Pipe.sink.close() is invoked concurrently while Pipe.fold() is running, the close can be missed by the folding thread, leaving the fold destination open even though it should be closed. Depending on how the sink is used, this can lead to deadlock or to slowness and timeout failures.
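A minimal Kotlin sketch of the race, using okio's `Pipe` and a `ForwardingSink` that only instruments `close()` so a missed close is observable. Because this is a race, the stale result shows up only intermittently:

```kotlin
import okio.Buffer
import okio.ForwardingSink
import okio.Pipe
import okio.buffer
import kotlin.concurrent.thread

fun main() {
    val pipe = Pipe(8192L)

    // Destination sink that records whether close() ever reaches it.
    var destinationClosed = false
    val destination = object : ForwardingSink(Buffer()) {
        override fun close() {
            destinationClosed = true
            super.close()
        }
    }

    // Writer thread: writes, then closes the pipe's sink. This close()
    // races with the fold() call below.
    val writer = thread {
        pipe.sink.buffer().use { it.writeUtf8("hello") }
    }

    // fold() swaps `destination` in for the pipe's internal buffer.
    // With the bug, a close() that lands while fold() is running can be
    // missed, leaving `destination` open even though the pipe is closed.
    pipe.fold(destination)
    writer.join()

    println("destination closed: $destinationClosed") // intermittently false
}
```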

jhump added a commit to connectrpc/connect-kotlin that referenced this issue Jan 30, 2024
jhump added a commit to connectrpc/connect-kotlin that referenced this issue Jan 31, 2024
This includes a fix to the recently observed flakiness in client
streams, which was introduced in #196.

The problems are described below. The fixes are each in their own
commit, so reviewing commit-by-commit might make this PR easier to read.

1. There was an issue in the use of `Result<Unit>` as the return type
for `send` operations. Since the wrapped type (`Unit`) is effectively a
void/no-return type, calling code never checked the result. That means
that when the operation failed, no one noticed. I'm a little surprised
that we don't have linters or warnings for calls to functions returning
`Result` whose return value is ignored (see the sketch after this item).

It turns out that this was not just a problem in my calling code,
failing to check the return value for failure, but also in the framework
itself: the stream wrapper in `ProtocolClient` (wrapping the underlying
stream returned by `ConnectOkHttpClient`) used an `onSend` callback
that called the underlying stream's `send`. But the `onSend` callback
simply returned `Unit` instead of `Result<Unit>`, and the method that
propagated the result wasn't checking it and throwing.
    
I think this is the biggest commit here, because I did some overhauling
of `Stream`. For one, I changed it to an interface -- mainly so that we
could apply a decorator pattern to it and to `HTTPClientInterface` (more
on that in a later PR). This also makes the wrapper in `ProtocolClient`
simpler: instead of being a full implementation, with its own atomic
booleans to guard and track the close operations, it just delegates to
the underlying implementation.
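A minimal sketch of the ignored-result hazard; `SendStream` is a hypothetical interface standing in for the real stream type:

```kotlin
// Hypothetical send API mirroring the shape described above; `SendStream`
// is illustrative, not the actual connect-kotlin type.
interface SendStream {
    fun send(data: ByteArray): Result<Unit>
}

fun example(stream: SendStream, payload: ByteArray) {
    // Bug pattern: the Result is silently discarded, so a failed send
    // goes unnoticed -- and no compiler warning fires for this.
    stream.send(payload)

    // Fixed pattern: surface the failure explicitly.
    stream.send(payload).getOrThrow()
}
```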
    
2. The Connect unary protocol can return a 408 status code for
"canceled" and "deadline exceeded" RPC errors. But OkHttp automatically
retries requests that receive this status code, even though the requests
are not idempotent (i.e., even POST calls). This isn't an issue with the
stream conformance tests, but it was noticed later, after I added an
extra check to the reference server so that it catches cases where a
client sends an RPC for the same test case more than once. This commit
adds a network interceptor to the `OkHttpClient` that identifies 408
responses that look like Connect unary responses and changes their
status code to 499. That is the only way I could find to prevent the
retries; a sketch follows below.
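A minimal sketch of such an interceptor, assuming OkHttp 4's Kotlin API. The content-type check is an assumption standing in for "looks like a Connect unary response"; the real implementation may match differently:

```kotlin
import okhttp3.Interceptor
import okhttp3.Response

class Rewrite408Interceptor : Interceptor {
    override fun intercept(chain: Interceptor.Chain): Response {
        val response = chain.proceed(chain.request())
        val contentType = response.header("Content-Type").orEmpty()
        // OkHttp auto-retries 408 even for non-idempotent POST calls.
        // Rewriting the code to 499 suppresses that retry while still
        // surfacing the failure to the caller.
        return if (response.code == 408 && contentType.startsWith("application/json")) {
            response.newBuilder().code(499).build()
        } else {
            response
        }
    }
}
```

A network interceptor sees each individual attempt before OkHttp's retry logic does, which is why the rewrite must happen there rather than in an application-level interceptor (which only sees the final response, after any retries have already happened).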

3. The recently introduced flakiness in client streams is actually a
rather severe issue. It was mainly observed in the new conformance suite
with server streams when gzip was used, because it was due to race
conditions, and the gzip operations slowed the producer thread just
enough to tickle the issue. The problem is that the `RequestBody.writeTo`
function should not return before the request body is finished when the
request body is not duplex. But it was calling `pipe.fold` and then
immediately returning. The `fold` method swaps a new sink in place of
the read side of the pipe and then returns, without waiting for the
pipe's write side to complete. So now we use a `CountDownLatch` to wait
until the writer is complete (which is signaled via a call to `close`);
see the sketch below.
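A sketch of that pattern, assuming OkHttp 4 and okio; `PipeRequestBody` and its members are illustrative names, not the actual connect-kotlin code:

```kotlin
import java.util.concurrent.CountDownLatch
import okhttp3.MediaType
import okhttp3.RequestBody
import okio.BufferedSink
import okio.ForwardingSink
import okio.Pipe

// Non-duplex request body (isDuplex() defaults to false) fed by a Pipe.
class PipeRequestBody(
    private val mediaType: MediaType,
    private val pipe: Pipe,
) : RequestBody() {
    private val writerDone = CountDownLatch(1)

    override fun contentType(): MediaType = mediaType

    override fun writeTo(sink: BufferedSink) {
        // fold() swaps `sink` in for the pipe's buffer and returns
        // immediately. For a non-duplex body, writeTo() must not return
        // until the writer is finished, so we block on a latch that the
        // writer's close() releases.
        pipe.fold(object : ForwardingSink(sink) {
            override fun close() {
                // Signal completion; OkHttp owns `sink`, so don't close it.
                writerDone.countDown()
            }
        })
        writerDone.await()
    }
}
```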

4. The last issue I encountered was much less frequent, and it also
turned out to be a race condition, caused by a concurrency bug in
`okio.Pipe` (square/okio#1412). Basically, some duplex operations (i.e.,
bidi RPCs) would infrequently time out because, even though the stream
writer had closed the pipe, the HTTP request body incorrectly remained
open. I've opened a PR with a fix in the `okio` library, but I've also
added a work-around for now in the code here, using extra
synchronization between the calls to `write`, `close`, and `fold`
(sketched below).
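A sketch of one such work-around: serialize `write`, `close`, and `fold` on a single lock so that `fold()` cannot miss a concurrent `close()`. This is illustrative only, under the assumption that serializing these three calls suffices; the actual work-around in connect-kotlin may be structured differently:

```kotlin
import okio.Buffer
import okio.Pipe
import okio.Sink
import okio.Timeout

class SynchronizedPipe(maxBufferSize: Long) {
    private val lock = Any()
    private val pipe = Pipe(maxBufferSize)

    val source get() = pipe.source

    val sink: Sink = object : Sink {
        override fun write(source: Buffer, byteCount: Long) {
            // May block while holding the lock if the pipe's buffer is
            // full; reads drain the buffer without taking this lock.
            synchronized(lock) { pipe.sink.write(source, byteCount) }
        }
        override fun flush() = synchronized(lock) { pipe.sink.flush() }
        override fun close() = synchronized(lock) { pipe.sink.close() }
        override fun timeout(): Timeout = pipe.sink.timeout()
    }

    fun fold(sink: Sink) = synchronized(lock) { pipe.fold(sink) }
}
```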
@jhump
Contributor Author

jhump commented Feb 5, 2024

@swankjesse, thanks for reviewing and merging that PR! Any idea when this bugfix can be expected to appear in a released (non-snapshot) version?

@swankjesse
Member

@jhump Lemme try to land something by the end of the week!
