Puma v5.0.3+ times out slow connections after 30 seconds #2512
Comments
We have the same issue. Since we updated from 5.0.2 to 5.0.4, our clients have been complaining about 408 responses, which never occurred before. We tracked them down to Puma: neither our load balancer nor the app code is producing them. According to our LB logs, the 408 response can occur anywhere from 0.2 to 15 seconds after forwarding, even though we do not set the We will try downgrading to 5.0.2 for now. I hope we can find a way to get the behavior of that version and still receive the latest Puma updates.
So, like a couple of other bugs, what you're seeing here is something that was (in our view) fixed by the Reactor refactor in 5.0.3.
@bmclean Yes, you should set first_data_timeout to a higher value if you need Puma to buffer clients for more than 30 seconds.
@sbilharz That would depend on how your LB is configured. If it's opening a connection to Puma before the request is fully buffered, that would be consistent with the behavior you're seeing. I suggest also increasing first_data_timeout.
I believe 30 seconds is a sensible default. Higher values open you up more to slow client attacks.
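For readers who want to try the suggestion above, here is a minimal sketch of raising the timeout in a Puma config file. The file path config/puma.rb and the value of 60 seconds are assumptions for illustration (60 is the figure mentioned elsewhere in this thread); pick a value that matches how long your slowest legitimate clients need.

```ruby
# config/puma.rb (assumed location; adjust to your app's layout)

# Puma's default is 30 seconds. Raising it lets slow clients keep
# uploading for longer, at the cost of more exposure to slow-client
# attacks, as noted in the comment above.
first_data_timeout 60 # illustrative value, not a recommendation
```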
Thanks @nateberkopec
Totally agree. Does the puma timeout reset with new bytes? The Heroku router will allow an indefinite connection as long as new bytes arrive in sub 30 second intervals.
I think the reactor inherently protects against slow client attacks even without a timeout. It's one of the main reasons I started recommending Puma over Unicorn: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#slow-clients. Though I'm not sure if there are more resource limitations by moving to NIO4r from native IO.select.
Background
Our application is a Rails API for mobile clients running on Heroku. The product is agriculture-based and our users often have poor cellular coverage.
This API receives a Base64-encoded file (125K max size). Even though the file is small, on a poor cellular connection it can be slow to upload. The Heroku router's 30 second request timeout window doesn't start until after the client has sent its full payload, so there's nothing that should stop the payload from being received.
With Puma versions 5.0.2 and older this is exactly how it worked. Requests with Heroku service times of even 90 seconds were completed successfully.
Problem
However, starting with Puma 5.0.3 we started seeing H18 errors from Heroku (socket closed on the server) after 30 seconds. Puma is closing the socket. If we increase the Puma config property first_data_timeout (say to 60 seconds) we reduce the number of error occurrences.
Is this expected behaviour from Puma version 5.0.3 and newer? Should we be setting a large first_data_timeout to get 5.0.2-like functionality?
To Reproduce
We can reproduce the problem with a very simple Rails controller and a script. The script posts a large string payload while running a network link conditioner on a 3G profile. If this needs to be investigated further we can certainly provide them.
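The original controller and script were not posted, so the following is only a rough sketch of how a slow upload could be imitated without a network link conditioner. The helper name slow_post, its parameters, and the chunk size and delay are all hypothetical; it simply opens a TCP connection, sends the headers, and then trickles the body out in small pieces.

```ruby
require "socket"

# Hypothetical helper (not from the original report): sends a POST whose
# body is written in small chunks with a pause between them, roughly
# imitating a mobile client on a poor cellular link.
def slow_post(host, port, path, body, chunk_size: 1024, delay: 1.0)
  socket = TCPSocket.new(host, port)

  # Send the request line and headers immediately.
  socket.write(
    "POST #{path} HTTP/1.1\r\n" \
    "Host: #{host}\r\n" \
    "Content-Type: text/plain\r\n" \
    "Content-Length: #{body.bytesize}\r\n" \
    "Connection: close\r\n\r\n"
  )

  # Trickle the body out, sleeping between chunks.
  offset = 0
  while offset < body.bytesize
    socket.write(body.byteslice(offset, chunk_size))
    offset += chunk_size
    sleep(delay) if offset < body.bytesize
  end

  # Return the raw response text (status line, headers, body).
  socket.read
ensure
  socket&.close
end
```

Calling something like slow_post("localhost", 3000, "/uploads", "x" * 125_000, chunk_size: 2_000, delay: 1.0) against a local server (host, port, and path here are placeholders) would stretch the upload past a minute. If the server closes the connection mid-upload, the write may raise Errno::EPIPE or Errno::ECONNRESET rather than returning a response.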