Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Puma v5.0.3+ times out slow connections after 30 seconds #2512

Closed
bmclean opened this issue Dec 11, 2020 · 6 comments
Closed

Puma v5.0.3+ times out slow connections after 30 seconds #2512

bmclean opened this issue Dec 11, 2020 · 6 comments

Comments

@bmclean
Copy link

bmclean commented Dec 11, 2020

Background
Our application is a Rails API for mobile clients running on Heroku. The product is agriculture based and our users often have poor cellular coverage.

This API receives a Base64 encoded file (125K max size). Even though the file is small, on a poor cellular connection it can be slow to upload. The Heroku router's 30 second request timeout window doesn't start until after the client has sent its full payload, so there's nothing that should stop the payload from being received.

With Puma versions 5.0.2 and older this is exactly how it worked. Requests with Heroku service times of even 90 seconds were completed successfully.

Problem
However, starting with Puma 5.0.3 we started seeing H18 errors from Heroku (socket closed on the server) after 30 seconds. Puma is closing the socket. If we increase the Puma config property first_data_timeout (say to 60 seconds) we reduce the number of error occurrences.

Is this expected behaviour from Puma version 5.0.3 and newer? Should we be setting a large first_data_timeout to get 5.0.2 like functionality?

To Reproduce
We can reproduce the problem with a very simple Rails controller and a script. The script posts a large string payload while running a network link conditioner on a 3G profile. If this needs to be investigated further we can certainly provide them.

@sbilharz
Copy link

We have the same issue. Since we updated from 5.0.2 to 5.0.4, our clients would complain about 408 responses which never occurred before. We tracked them down to puma - neither our load balancer nor the app code is producing them. According to our LB logs, the 408 response can occur between 0.2 to 15 seconds after forwarding, even though we do not set the first_data_timeout setting and therefore assume it to be the default 30 seconds. I am not sure yet how this can be explained.

We will try downgrading to 5.0.2 for now. I hope we can find a way to get the behavior of that version and still receive the latest puma updates.

@nateberkopec
Copy link
Member

So, like a couple of other bugs, what you're seeing here is something that was (in our view) fixed by the Reactor refactor in 5.0.3. first_data_timeout essentially did not work correctly. The 5.0.2 and less behavior is a bug.

@bmclean Yes, you should set first_data_timeout to a higher value if you need Puma to buffer clients for more than 30 seconds.

@sbilharz That would depend on how your LB is configured. If it's opening a connection to Puma before the request is fully buffered, that would be consistent with the behavior you're seeing. I suggest also increasing first_data_timeout.

I believe 30 seconds is a sensible default. Higher values open you up more to slow client attacks.

@bmclean
Copy link
Author

bmclean commented Dec 15, 2020

Thanks @nateberkopec

@bmclean
Copy link
Author

bmclean commented Dec 16, 2020

Hi @sbilharz. While figuring out how to get Puma 5.0.3+ to play nice with rack-timeout I was able to recreate the 408 response. The repo is here if you are curious.

@schneems
Copy link
Contributor

I believe 30 seconds is a sensible default.

Totally agree. Does the puma timeout reset with new bytes? The Heroku router will allow an indefinite connection as long as new bytes arrive in sub 30 second intervals.

Higher values open you up more to slow client attacks.

I think the reactor inherently protects against slow client attacks even without a timeout. It's one of the main reasons I started recommending puma over unicorn https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#slow-clients. Though I'm not sure if there are more resource limitations by moving to NIO4r from native IO.select.

@sbilharz
Copy link

Hi @sbilharz. While figuring out how to get Puma 5.0.3+ to play nice with rack-timeout I was able to recreate the 408 response. The repo is here if you are curious.

Very nice, thank you! That might save me a few hours of figuring out sensible values for us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants