New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
onResponseHeaders not called on continuous sendData #2213
Comments
Interesting bug! I'm slightly unclear about something. In the test, data is sent via setOnSendWindowAvailable() which, I assume, takes a callback which is executed when new send window is available. In the test, is there ever a period when no send window is available? |
Here is the buggy sequence:
Here is how this should go, more or less:
|
Right, that makes sense. But I'm confused why we seem to have infinite send window available. Do you understand why we're never hitting a limit? (Independent of the issue you're raising that onResponseHeaders should have priority over OnSendWindowAvailable) |
Looking into this a bit, it looks like Client::sendData() calls onSendWindowAvailable() immediately if there is still window available:
@alyssawilk it looks like this code came from #1545. Should we consider doing something async here? |
This is not an infinite window. In fact this is very sequential, one ByteBuffer at a time.
Look at the Cronet test: |
Hm. I'm not sure we're on the same page. With, say, HTTP/2 or HTTP/3 there is a per-stream limit on the amount of data the client is allowed to send to the peer. With HTTP/1 this limit is in the OS at the socket layer. I would have expected that we would eventually exhaust this limit, but it doesn't look like that is happening in this test and that confuses me. I agree with you that onResponseHeaders should be called and the fact that it is not called is a problem. I'm just trying to understand the test setup. |
So to follow up from the offline discussion, the test sends only 1000 bytes (albeit one byte at a time) which explains why we never hit the limit. Thanks, this explains it! |
but scheduling repeated onSendWindowAvailable calls in the next dispatcher iteration. Adds the regression test Charles cooked up in envoyproxy#2212 Fixes envoyproxy#2213
by scheduling repeated onSendWindowAvailable calls in the next dispatcher iteration. Adds the regression test Charles cooked up in envoyproxy#2212 Fixes envoyproxy#2213 Signed-off-by: Ryan Hamilton <rch@google.com>
After cooking up a fix for this issue in #2221 and discussing the behavior with @goaway, I think this is an artifact of how the Cronet executor is implemented. the onSendWindowAvailable() callback is handed off to the executor to schedule. In non-cronet executors, this happens on a different thread from the Envoy networking thread. As a result, the thread hopping allows the read loop to be pumped and the headers to be delivered. However, assuming we understand the Cronet executor correctly, with Cronet the callback happens on the same thread as the networking thread and so the read loop is never pumped. While we could land a fix like #2221, it's not clear that is the right approach since it is at odds with how other callbacks work in Envoy Mobile. I think that probably a change to the Cronet executor is probably in order. @carloseltuerto, what do you think? |
In this case, the user specifically requested to use the Network Thread to run all the business logic. Chromium/Cronet behaves as expected under this constraint: Cronet runs all the user code under the Network Thread, in the expected order. In other words, the sole Executor provided to Cronvoy is a Direct Executor: there is nothing else to choose from. Unless Cronvoy unilaterally spawns a "private" Executor Thread, ignoring the user's prescription. Also, please keep in mind that Cronet was designed to let the savvy user to use the Network Thread: This is obviously an optimisation - this avoids context switching between each request body ByteBuffer. We could take the decision to ignore this optimisation, but this comes probably at a price: for some performance tests, Cronvoy won't be able to surpass Cronet. |
Oh, interesting. Thanks! I had forgotten about Cronet's direct executor mode. Hm. Ok, I'll do some more investigation. (I'm also curious about the buffering differences between Cronet and Envoy Mobile and if there are semantic differences between Cronet's onWriteComplete() and Envoy Mobile's onSendWindowAvailable()) |
The semantic is the opposite at the surface. Cronet tells you when it is finished, and EM tells you when it is ready for next ByteBuffer. The Cronvoy implementation needs to fill the gap here. Since there is no final callback for the "last write", Cronvoy takes the liberty of immediately scheduling the Cronet's onWriteComplete last callback upon writing the last ByteBuffer. This could look like a "lie" - the last ByteBuffer might not in reality have hit the wire when onWriteComplete gets called. Still, for all intent and purposes, this does not matter at all - if there is a problem, onError will get called anyway. And I do not know how Cronet implemented this: I would not be surprised if the Cronet C++ layer does exactly the same. |
OK, I've spend the day coming up to speed on the behavior of Ok, so I think that explains what is happening. But I'm not entirely sure what we should do to fix this. @alyssawilk how feasible would it be for Envoy/Envoy Mobile to expose a signal more like Cronet's OnWriteComplete? Maybe this is really hard due to the communication / buffering that happens between upstream and downstream? |
Ok, let me summarize where we are here. Cronet direct executor mode is an important performance optimization in which writes happen on the network thread. It seems like we should preserve this behavior. PR #2221 attempts to resolve this issue by scheduling the After taking with @alyssawilk, maybe the right approach would be to try to land #2221 and see what it reveals. If we are able to sort out any issues then we can use this as a first step towards moving all Envoy Mobile callbacks. @goaway, how does that sound to you? |
@RyanTheOptimist it looks like at least in the current version of #2221 the new logic is fully guarded by |
…g, (#2221) Description: Fix Envoy Mobile bug where writing prevents the read loop from running, by scheduling repeated onSendWindowAvailable calls in the next dispatcher iteration. Adds the regression test Charles cooked up in #2212 Fixes #2213 Risk Level: Low Testing: New regression test Docs Changes: N/A Release Notes: N/A Signed-off-by: Ryan Hamilton <rch@google.com>
Unfortunately, this fix crashes EM. Trying to remove this
|
…cancelled. Fixes: envoyproxy#2213 Signed-off-by: Ryan Hamilton <rch@google.com>
cc: @alyssawilk @goaway
Pushing more ByteBuffers always has priority over performing the onResponseHeaders callback. You could have received the ResponseHeaders one hour ago - still no callback indicating so if you keep pushing more data.
This problem is systematic on H2 with explicit flow control when using the EM Network Thread to perform the callback logic.
There is a specific test for this in Cronet: https://source.chromium.org/chromium/chromium/src/+/main:components/cronet/android/test/javatests/src/org/chromium/net/BidirectionalStreamTest.java;l=465
This test is being ported, but is currently failing due to this issue.
And there is a PR with a simple test demonstrating the failure: #2212
The text was updated successfully, but these errors were encountered: