Conversation
**Short:** Removing this feature makes the internals easier to reason about, and I do not believe there is a good reason to disable the reactor that we want to support. Research also suggests that disabling the reactor was an implementation detail of an initial effort to configure the ACCEPT behavior, and that over time it morphed into effectively meaning "disable the reactor".

**Long:**

## Background

The `queue_requests` config was originally introduced in #640 in 2015. It is marked as closing issue #612 (opened in 2014, ten years ago). The use case given in that issue for disabling the reactor was this:

A worker with 1 thread needs to send a request to itself in order to finish. The example given is downloading a PDF from the same server. If the reactor is accepting requests as fast as it can, the same worker that sent the request might accept it. Since it cannot complete the first request (it is waiting on the second), the second request will never execute, i.e. it will deadlock.

The problem described does not come from having a reactor that buffers request bodies, but rather from the application ACCEPT-ing a new request from the socket before it has capacity to process it. Disabling the reactor was effectively an implementation detail.

The issue was opened nearly 10 years ago, and the justification given for wanting to run with one max thread was that "not all code is thread safe". In the decade since, the Ruby community has come a long way. Sidekiq's threaded job queue has become the industry standard, and it has been an expectation for quite some time that all Ruby code should be threadsafe. An application can avoid this deadlock by either not requiring this functionality (making requests to itself) or by running with more threads. It is also worth noting that the possibility of deadlock is not zero when moving to 2 threads, since both threads might be executing a "give me a PDF" request at the same time, but the chances are decreased.
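The deadlock can be reproduced outside of Puma with any pool of exactly one worker thread where a job blocks on the result of a second job submitted to the same pool. A minimal sketch (illustrative, not Puma code):

```ruby
require "timeout"

# A minimal single-threaded "worker pool", mirroring `puma --threads 1:1`:
# exactly one thread pulls jobs off a queue and runs them.
jobs   = Queue.new
result = Queue.new

worker = Thread.new do
  while (job = jobs.pop)
    job.call
  end
end

# The first "request" blocks until a second "request" submitted to the SAME
# pool completes -- like an app downloading a PDF from itself.
jobs << lambda {
  jobs << lambda { result << :pdf } # never runs: the only thread is busy here
  result.pop                        # waits forever -> deadlock
}

deadlocked = begin
  Timeout.timeout(0.5) { result.pop }
  false
rescue Timeout::Error
  true
end

puts deadlocked # => true
worker.kill
```

With two or more worker threads the inner job can run on another thread and the first completes, which is why raising the max thread count mitigates (but does not eliminate) the problem.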
I do not believe this is a primary use case we need to support, and further, we have since mitigated the issue in other ways.

In the same PR a `wait_until_not_full` method was introduced, but it was only called if `queue_requests` was disabled. This method still lives on today, and is now called unconditionally before the socket is read: https://github.com/puma/puma/blob/a59afabe782ec8869bb709ca991b32c0b2cc95fd/lib/puma/server.rb#L363

What does `wait_until_not_full` do? It is what I call an "anti-thundering-herd" measure: a worker first checks whether it has capacity to handle a new request (i.e. whether all threads are busy), and only when it has capacity does it ACCEPT a new request from the socket. The thundering-herd situation looks like this: Puma boots with multiple workers, one worker boots before the others and accepts more requests than it can process, and the other workers boot, have nothing to work on, and sit idle. However, it didn't always work as intended; one fix is 9690d8f. I was present in the 2017 roundtable mentioned in that commit. Since then, this logic has been relatively stable.

Since that was introduced, we also added `wait_for_less_busy_worker`, which adds a tiny sleep to a worker to handle a related yet different scenario caused by OS thread scheduling: the OS might favor executing a busy worker rather than one with capacity to take new requests. More details are in the PR where it was added, #2079. The solution was to add a really tiny sleep to the busy workers as a hint to the OS to schedule the less busy ones. This became a default.

## Queue requests confusion

The `queue_requests` config that was originally introduced to gate ACCEPT behavior has eventually come to mean "disable the reactor" in the current codebase, as defaults have changed and morphed over the years.
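The capacity check that `wait_until_not_full` provides (described in the Background section above) can be sketched roughly as follows. This is an illustrative approximation with invented names (`TinyPool`, `@busy`), not Puma's actual `ThreadPool` code:

```ruby
# Illustrative approximation of "accept only when a thread is free".
class TinyPool
  def initialize(max_threads)
    @max      = max_threads
    @busy     = 0
    @mutex    = Mutex.new
    @not_full = ConditionVariable.new
  end

  # Block until at least one thread is free, then reserve a slot.
  # Only after this returns would the server ACCEPT from the socket.
  def wait_until_not_full
    @mutex.synchronize do
      @not_full.wait(@mutex) while @busy >= @max
      @busy += 1
    end
  end

  def done
    @mutex.synchronize do
      @busy -= 1
      @not_full.signal
    end
  end
end

pool = TinyPool.new(1)
pool.wait_until_not_full          # capacity available: returns immediately
worker = Thread.new do
  sleep 0.1
  pool.done                       # frees the slot, waking the accept loop
end
pool.wait_until_not_full          # blocks until the slot above is freed
worker.join
puts "accepted only after capacity freed"
```

Because this check now runs unconditionally before the accept, the original motivation for `queue_requests false` (do not ACCEPT beyond capacity) is covered regardless of the setting.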
While the original docs introduced in that PR hint that disabling it deactivates keepalive functionality:

> # Note that setting this to false disables HTTP keepalive and
> # slow clients will occupy a handler thread while the request
> # is being sent. A reverse proxy, such as nginx, can handle
> # slow clients and queue requests before they reach puma.

it wasn't exactly clear what that meant. I.e. when a keepalive request is made to puma with queue requests disabled, what should the expected behavior be?

The released version of puma checks several conditions before determining whether or not to send a "close" header for a keepalive request: https://github.com/puma/puma/blob/d2b3b309bf76f252819cf6c89d72249895daf629/lib/puma/request.rb#L167-L176. The ability to deactivate this keepalive behavior is recent and is staged on main; it was introduced in #3496. Notably, even HEAD does not deactivate this behavior based on the state of `queue_requests`.

The current behavior is that while that logic returns `true` and the request is not placed back into the reactor (not possible when `queue_requests` is false), it will loop and continue to accept keepalive requests over the same connection, because this code https://github.com/puma/puma/blob/9712d29bc156a7fa05a6f4ec6deda5ac20df50be/lib/puma/server.rb#L468-L470 is inside a while loop.

## Keepalive connection

This looping behavior for keepalive connections is not limited to when the reactor is disabled. There is a bug that currently means keepalive requests effectively monopolize a thread for up to 10 sequential requests (by default), described in #3487.

All signs point to needing to put the keepalive request into the reactor as part of the mitigation. However, as pointed out in this comment #3506 (comment), when request queuing is disabled we are left with a difficult path forward. We can either choose to continue monopolizing the thread and preserve the behavior that exists today.
Or we can close the connection, which aligns with the docs saying that this setting does not work with keepalive requests. Rather than make that determination for the user, this change suggests removing that codepath as an invalid configuration option.

## Usage

As part of informal research into usage of the feature, I asked on Mastodon and no one appears to be using it:

- Link: https://ruby.social/@Schneems/113289823703358711
- Screenshot: https://www.dropbox.com/s/9t1ur0rpqz5zkku/Screenshot%202024-10-14%20at%2012.43.27%E2%80%AFPM.png?dl=0

I asked in Slack groups and the only example someone could give was this issue comment, where (funnily enough) they're using it to try to diagnose the very same buggy keepalive behavior: #2311 (comment).

From my personal experience debugging Ruby applications at Heroku, I have seen this feature used in the wild, so I know it's used. It's just that when it is used, people are not aware that they're losing slow-client protection and disabling the reactor. Usually they're after some other side effect that disabling it caused.

In short, I don't believe it's heavily used, and for the cases where it is used, I don't believe those are valid use cases. In the comments I added a note requesting that people inform us of their use cases if they feel they have a valid one, so we can investigate it and possibly add a different configuration option.
Why are GitHub and Shopify still on forking web servers then 😆 😛
```ruby
if max == 1
  warn "[WARNING] Puma config `threads #{min}, #{max}` has a max thread count of 1. If the application makes"
  warn "          blocking HTTP calls to itself, it may deadlock. Consider increasing the max thread count."
end
```
This is a questionable warning to add; isn't running multiple workers, each with one thread, a legitimate use case with Puma?
```
$ echo 'app { [200, {}, ["OK"]] }' | ruby -Ilib ./bin/puma --config /dev/stdin --port 0 --log-requests --workers 3 --threads 1:1
[WARNING] Puma config `threads 1, 1` has a max thread count of 1. If the application makes
          blocking HTTP calls to itself, it may deadlock. Consider increasing the max thread count.
[70321] Puma starting in cluster mode...
[70321] * Puma version: 6.4.3 (ruby 3.3.5-p100) ("The Eagle of Durango")
[70321] * Min threads: 1
[70321] * Max threads: 1
[70321] * Environment: development
[70321] * Master PID: 70321
[70321] * Workers: 3
[70321] * Restarts: (✔) hot (✔) phased
[70321] * Listening on http://0.0.0.0:64174
[70321] Use Ctrl-C to stop
[70322] + Gemfile in context: /Users/dentarg/src/puma/Gemfile
[70323] + Gemfile in context: /Users/dentarg/src/puma/Gemfile
[70324] + Gemfile in context: /Users/dentarg/src/puma/Gemfile
[70321] - Worker 0 (PID: 70322) booted in 0.0s, phase: 0
[70321] - Worker 1 (PID: 70323) booted in 0.0s, phase: 0
[70321] - Worker 2 (PID: 70324) booted in 0.0s, phase: 0
^C[70321] - Gracefully shutting down workers...
[70321] === puma shutdown: 2024-11-07 21:18:42 +0100 ===
[70321] - Goodbye!
```

```ruby
def queue_requests(answer=true)
  warn "[DEPRECATION] Puma config `queue_requests` is deprecated and will be removed in Puma 7.0.0."
  warn "              In Puma 7.0+ the reactor cannot be deactivated."

  @options[:queue_requests] = answer
end
```
I'm on board with the deprecation, but there should probably be a way to silence the warning.
Similar to `silence_single_worker_warning` (lines 533 to 540 in 29fa3f5).
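One possible shape for that, modeled on the style of `silence_single_worker_warning`. The method name `silence_queue_requests_deprecation` and the surrounding class are hypothetical sketches, not an existing Puma API:

```ruby
# Hypothetical sketch: a DSL flag consulted before emitting the
# deprecation warning. Names are invented; Puma's real DSL differs.
class SketchDSL
  attr_reader :options

  def initialize
    @options = {}
  end

  # Opt out of the deprecation warning, mirroring the style of the
  # existing `silence_single_worker_warning` option.
  def silence_queue_requests_deprecation
    @options[:silence_queue_requests_deprecation] = true
  end

  def queue_requests(answer = true)
    unless @options[:silence_queue_requests_deprecation]
      warn "[DEPRECATION] Puma config `queue_requests` is deprecated " \
           "and will be removed in Puma 7.0.0."
    end
    @options[:queue_requests] = answer
  end
end

dsl = SketchDSL.new
dsl.silence_queue_requests_deprecation
dsl.queue_requests(false)            # no warning emitted
p dsl.options[:queue_requests]       # => false
```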
I feel like rolling back #612 and saying "actually, we're not going to support that" is a bigger deal than you're giving it credit for. "It is less likely to deadlock" isn't useful; it's not a great user story to say "well, you'll only deadlock 1% of the time".
Do you still want to do this @schneems?
At the time I was frustrated by having to handle the no-queue case while fixing keepalives. I am still interested in places in the internals where we could simplify things by removing little-used features if it means large maintenance gains, but my will to drive this PR forward is muted. My priority right now is debugging the perf differences between 6.6.1 and 7pre. I might come back to this in the future. Good to close for now.
Worth noting that Puma 7 removed `queue_requests`. It's unclear to me how exactly the original code would have even prevented the deadlock via accept (as I described earlier): even if the server never accepts the incoming request, it would still queue on the socket.
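That last point is straightforward to verify: the kernel completes the TCP handshake and parks the connection in the listen backlog even if the server process never calls accept. A small sketch:

```ruby
require "socket"

# A listener that never calls accept.
server = TCPServer.new("127.0.0.1", 0)
port   = server.addr[1]

# The client still connects successfully: the kernel completes the
# handshake and queues the connection in the listen backlog, so refusing
# to accept does not stop requests from queuing on the socket.
client = Socket.tcp("127.0.0.1", port, connect_timeout: 1)
handshake_completed = (client.remote_address.ip_port == port)
puts handshake_completed # => true

client.close
server.close
```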