-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTP/2: Hyper communication is stuck if too many requests are spawned. #3338
Comments
Hey there! It looks like the linked repo is missing the server code. I'm also curious, does it work if you don't use a waitgroup/semaphore? Just relying on hyper to automatically not go over the concurrent streams? |
Firstly, Appreciate your prompt response. |
I removed code parts using waitgroup in my personal space and ran the code. It still gets stuck. We still hit the issue. Semaphore is needed to keep check on number of parallel attempts at a time. So kept it. If you want I can update the repo with waitgroup removed. Code with waitgroup is tested though, for successful runs. |
Ok, I was just trying to figure out how small we can make the test case. Do you have the trace logs? Could you paste them in a comment here inside |
For 2-3 seconds of runtime, its pretty big log. After couple of runs I arrived at a log file within that limit: I will still suggest that please run it from the sample code so that you can have bigger runtime to capture any log that is printed while things are stuck. |
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Problem was with hyper-openssl, which is probably bit outdated. With hyper-rustls works as expected (and is actually faster by app. %50%, number of clients does not play a role - I guess it already enough parallel on one client). Can check here https://github.com/izderadicka/hyper-stuck |
I was too too quick with conclusions sorry. Previous example worked because it uses HTTP/1.1. It demonstrates like this:
Updated version which demonstrates problem will be on https://github.com/izderadicka/hyper-stuck |
I've done one more test - against Rust hyper server (added to repo) instead and now client works fine, without problems. Also one interesting finding - client against Rust server on HTTP/2.0 is about 4 times slower, then on HTTP/1.1 - against same server, only switched protocols, this is bit contra-intuitive. Go client on the other hand is little bit faster against Rust server, that it was against Go server. I wanted to try hyper 1.0 in client, but do not know yet how to setup TLS - 1.0 is still very fresh, hyper-rustls is on 0.14 and did not have time to dig deeper how to manually create appropriate connector. |
And one more final finding - from trace log - the log from stuck run against Go server have a lot of
|
I've done bit of additional exploration:
|
Excellent work investigating! So, is it related to flow control? Is there connection window available, but not stream? It's fine to investigate with hyper v0.14, it's most likely something in the |
Yes it's same in 0.14 and 1.0 - so I assume it could be h2. Not sure what you mean exactly by "Is there connection window available, but not stream?" - Is there any way how to check this? |
I'm combing through the logs. It doesn't seem like there's an easy way to see the connection window. But as I scroll through the logs, I do eventually see mention of "buffered" for a stream being really high, and things like It does seem like the hyper client is getting starved via flow control. It's quite interesting that a Rust server makes the client happy. But I wonder what the Go server does differently... Can you change your program to print the status code of each response? I notice that the Go server will try to read the full request body, but if there's some sort of error, it will return a 400 response. I wonder if any of those happen, and if because of it, some window capacity isn't released for that stream in particular. Or if the buffered data is no longer allowed to keep being sent. |
Hi,
So I do start all 300 requests (each in separate task), I got only 4 back, then it's stuck. I have attached complete tracing log, some interesting notes: At the time of stuck (~18:34:50) a lot of "stream capacity is 0" as already noted before, and streams (grepped by id - see below), usually ends by this message At the time of stuck alot of Goways like this one:
It's interesting to look at few first streams by id (I think odd ids are from client)
So for streams up to 7 (lower ids not shown, they have exactly same number of messages) it went well I think - it corresponds probably to responses I got back, however from stream 9 up it gets messed up - it looks similar for following streams up to very last streams, which behaves differently:
I guess by comparing individual streams - especially say 7 with 9 some insight might be gained. The only think I noticed that stream 7 did not have any WINDOW_UPDATE frames received, but problematic stream 7 already has quite few of them - 27. |
Did some investigating, it seems like all the streams are starved on connection capacity and couldn't proceed to the next state. Happens because the high Fut limit means the streams are fighting for the capacity (and some have reserved capacities). Log: https://gist.github.com/dswij/9cd727705018f41a7e627b995584deed Although popping If we change the fut limit (i.e. |
@dswij thanks so much for digging! I really want to get this fixed, but haven't had time between hyper 1.0, upgrading reqwest, and end of year needs. Would you want to look into this further, and recommend a fix for hyper? |
Sure, I'll have a go at it |
As an update, @dwij has a proposed change that appears to fix a reported case in this issue. If you'd like to take a look, that'd be a huge help! hyperium/h2#730 |
This PR changes the the assign-capacity queue to prioritize streams that are send-ready. This is necessary to prevent a lockout when streams aren't able to proceed while waiting for connection capacity, but there is none. Closes hyperium/hyper#3338
The fix has been published in |
This PR changes the the assign-capacity queue to prioritize streams that are send-ready. This is necessary to prevent a lockout when streams aren't able to proceed while waiting for connection capacity, but there is none. Closes hyperium/hyper#3338
This PR changes the the assign-capacity queue to prioritize streams that are send-ready. This is necessary to prevent a lockout when streams aren't able to proceed while waiting for connection capacity, but there is none. Closes hyperium/hyper#3338 Co-authored-by: dswij <dharmasw@outlook.com>
Version
0.14.27
Platform
Linux ubuntu 4.15.0-213-generic # 224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Short Description of the issue
Client network communications using Rust hyper library continues to stuck if too many requests are spawned.
Description
I see this bug has been earlier reported in #2419 and also has another manifestation in #3041.
Both are marked fixed. #3041 also says the fix is tested with 100 requests on 50 streams.
Generally speaking, HTTP2 Servers allows 250 parallel streams. However, client can create unlimited streams. The RFC has the following recommendation: "It is recommended that this value be no smaller than 100, so as to not unnecessarily limit parallelism."
We have a RUST client implementation, using hyper 0.14.27, along with a GoLang Server. Client requests are restricted to 400 at any point of time. Go Client runs smoothly. hyper client gets stuck every time. The bug is easily reproducible. Both RUST client and Go Server implementation can be found in the link below.
Where is the sample code that hits the issue?
Sample Code Link: https://github.com/kundu-subhajit/hyper-stuck
It has a GoLang Server implementation, Rust Client implementation using Hyper-0.14.27 & a GoLang Client implementation.
The build instructions are very straightforward
Any known workaround with the sample code?
We do acknowledge that this same issue isn't reproducible with semaphore count below 250 in the sample code shared.
Why we cant use smaller number of parallel streams as workaround?
In our local setup we have 100 as max number of HTTP/2 streams at server. But even if we try with semaphore count of 90, network communications through hyper are stuck. Trying to isolate that we arrived here and at the sample code.
What we have done differently in the sample code compared to previous reporting?
Nothing much. We have done only two minor things differently here:
CC: @jeromegn, @seanmonstar, @jfourie1
Please help us fix this bug at your earliest. Any help with this is highly appreciated. I thank you in anticipation. Thanks.
The text was updated successfully, but these errors were encountered: