
Deprecate queue_requests from DSL#3524

Closed
schneems wants to merge 1 commit into master from schneems/deprecate_queue_requests

Conversation

@schneems
Contributor

  • Short: Removing this feature makes the internals easier to reason about, and I do not believe there's a good reason to disable the reactor that we want to support. Research also suggests that disabling the reactor was an implementation detail of an initial effort to configure the ACCEPT behavior, and over time it morphed into effectively meaning "disable the reactor".

  • Long:

Background

The queue_requests config was originally introduced in #640 in 2015. It is marked as closing this issue #612 (opened in 2014, ten years ago).

The use case in the issue for this option to disable the reactor was this:

A worker with 1 thread needs to send a request to itself to finish. The example given is downloading a PDF from the same server. If the reactor is accepting requests as fast as it can, the same worker that sent the request might accept it; however, since it cannot complete the first request (it is waiting on the second), the second request will never execute, i.e. it will deadlock.

The problem described does not come from having a reactor that buffers request bodies, but rather from the application ACCEPT-ing a new request from the socket before it has capacity to process it. Disabling the reactor was effectively an implementation detail.

The issue was opened nearly 10 years ago, and the justification given for wanting to run with one max thread was that "not all code is thread safe". In the decade since, the Ruby community has come a long way: Sidekiq's threaded job queue has become the industry standard, and for quite some time now it has been an expectation that Ruby code is threadsafe. An application can avoid this deadlock by either not requiring this functionality (making requests to itself) or running with more threads. It's also worth noting that the possibility of deadlock is not zero when moving to 2 threads, as both threads might be executing a "give me a PDF" request; however, the chances are decreased. I do not believe this to be a primary use case we need to support, and further, we've since mitigated the issue in other ways.
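The failure mode can be sketched in a few lines of Ruby. This is a hypothetical illustration, not Puma's code: `simulate_self_request`, the job names, and the timeout (which stands in for the real indefinite hang) are all invented for the demo.

```ruby
require "timeout"

# Minimal sketch (hypothetical, not Puma's code) of the deadlock: a "server"
# whose worker thread, while handling a request, synchronously waits on a
# second request it queued to itself. With only one thread, the nested
# request can never be picked up.
def simulate_self_request(worker_threads:)
  queue   = Queue.new
  results = Queue.new

  workers = worker_threads.times.map do
    Thread.new do
      while (job = queue.pop)
        case job
        when :make_pdf
          queue << :render_pdf          # the request the app sends to itself
          begin
            # Wait for the nested request; time out instead of hanging the demo.
            Timeout.timeout(0.2) { results.pop }
            results << :ok
          rescue Timeout::Error
            results << :deadlocked
          end
          queue.close                   # let idle workers exit
        when :render_pdf
          results << :pdf_bytes         # answers the nested request
        end
      end
    end
  end

  queue << :make_pdf
  workers.each(&:join)
  results.pop
end

puts simulate_self_request(worker_threads: 1)  # => deadlocked
puts simulate_self_request(worker_threads: 2)  # => ok
```

With two threads the nested request is served by the idle worker, which is why bumping max threads mitigates (but, as noted above, does not eliminate) the problem.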

In the same PR a wait_until_not_full method was introduced, but it was only called if queue_requests was disabled. The method still lives on today and is called unconditionally before the socket is read (https://github.com/puma/puma/blob/a59afabe782ec8869bb709ca991b32c0b2cc95fd/lib/puma/server.rb#L363):

pool.wait_until_not_full

What does wait_until_not_full do? It is what I call an "anti-thundering-herd" measure: a worker first checks whether it has capacity to handle a new request (i.e. whether all of its threads are busy), and only when it has capacity does it ACCEPT a new request from the socket. The thundering-herd situation looks like this: Puma boots with multiple workers; one worker boots before the others and accepts more requests than it can process, so the other workers boot with nothing to work on and sit idle. However, it didn't always work as intended; 9690d8f is one fix. I was present at the 2017 roundtable mentioned in that commit. Since then, this logic has been relatively stable.
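The idea behind wait_until_not_full can be sketched with a toy pool. This is a hypothetical illustration under simplified assumptions (the class, names, and timings are invented, not Puma's implementation): a worker blocks on a condition variable until a thread slot is free, and only then returns to ACCEPT from the socket.

```ruby
# Hypothetical sketch of "check capacity before ACCEPT" (not Puma's code).
class TinyPool
  def initialize(max_threads)
    @max      = max_threads
    @busy     = 0
    @mutex    = Mutex.new
    @not_full = ConditionVariable.new
  end

  # Block until at least one thread is free, so a worker only accepts a
  # request from the socket when it can actually run it.
  def wait_until_not_full
    @mutex.synchronize do
      @not_full.wait(@mutex) while @busy >= @max
    end
  end

  def run
    @mutex.synchronize { @busy += 1 }
    yield
  ensure
    @mutex.synchronize do
      @busy -= 1
      @not_full.signal
    end
  end
end

pool = TinyPool.new(2)
pool.wait_until_not_full   # returns immediately: capacity is available
threads = Array.new(2) { Thread.new { pool.run { sleep 0.1 } } }
sleep 0.02                 # let both threads mark themselves busy
started = Time.now
pool.wait_until_not_full   # blocks until one of the two slots frees up
waited = Time.now - started
threads.each(&:join)
puts waited >= 0.05  # => true
```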

Since that was introduced, we also introduced wait_for_less_busy_worker, which adds a tiny sleep to the worker to handle a related yet different scenario caused by OS thread scheduling: the OS might favor scheduling a busy worker rather than one with capacity to take new requests. More details are in the PR where it was added, #2079. The solution was to add a really tiny sleep to the busy workers to hint to the OS to let the less busy ones be scheduled. This became a default.
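The scheduling hint can be sketched like so. The delay formula, constants, and function names here are invented for illustration and are not Puma's actual values: the point is only that a busier worker pauses slightly longer before accepting, so an idle worker tends to win the race for the next connection.

```ruby
# Hypothetical sketch of the wait_for_less_busy_worker idea (not Puma's
# actual constants): before accepting, a busier worker sleeps a hair
# longer, nudging the OS to schedule a less busy worker onto the socket.
def less_busy_delay(busy_threads)
  return 0.0 if busy_threads.zero?
  # e.g. 5ms per busy thread, capped; a fully busy worker waits longest
  [busy_threads * 0.005, 0.025].min
end

def accept_with_hint(busy_threads)
  sleep less_busy_delay(busy_threads)
  :accepted  # stand-in for socket.accept
end

puts less_busy_delay(0)  # => 0.0  (an idle worker grabs the request first)
puts less_busy_delay(4)  # => 0.02
```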

Queue requests confusion

The queue_requests config, originally introduced to gate ACCEPT behavior, has eventually come to mean "disable the reactor" in the current codebase, as defaults have changed and morphed over the years.

While the original docs introduced in that PR hint that disabling it deactivates keepalive functionality:

 # Note that setting this to false disables HTTP keepalive and
 # slow clients will occupy a handler thread while the request
 # is being sent. A reverse proxy, such as nginx, can handle
 # slow clients and queue requests before they reach puma.

It wasn't exactly clear what that meant, i.e., when a keepalive request is made to puma with queue_requests disabled, what should the expected behavior be?

The released version of puma checks for several conditions before determining whether or not to send a "close" header to a keepalive request:

puma/lib/puma/request.rb (lines 167 to 176 in d2b3b30):

```ruby
force_keep_alive = if @enable_keep_alives
  requests < @max_fast_inline ||
    @thread_pool.busy_threads < @max_threads ||
    !client.listener.to_io.wait_readable(0)
else
  # Always set force_keep_alive to false if the server has keep-alives not enabled.
  false
end
resp_info = str_headers(env, status, headers, res_body, io_buffer, force_keep_alive)
```
The ability to deactivate this keepalive behavior is recent and staged on main; it was introduced in #3496. Notably, even HEAD does not deactivate this behavior based on the state of queue_requests.

The current behavior is that when that logic returns true and the request cannot be placed back into the reactor (not possible when queue_requests is false), the server loops and continues to accept keepalive requests over the same connection, because this code

puma/lib/puma/server.rb (lines 468 to 470 in 9712d29):

```ruby
case handle_request(client, requests + 1)
when false
  break
```

is inside of a while loop.
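A condensed, hypothetical sketch of that control flow: as long as the per-request handler reports that the connection should stay open, the same thread loops and serves the next keepalive request on the same connection. `serve_connection`, `pending_requests`, and `keep_alive_limit` are invented stand-ins, not Puma identifiers.

```ruby
# Hypothetical condensed sketch of the handling loop (not Puma's code):
# handle_request runs inside a `while` loop, so as long as it keeps
# reporting "keep the connection open", the same thread serves the next
# keepalive request. `keep_alive_limit` stands in for the cap on
# sequential inline requests.
def serve_connection(pending_requests, keep_alive_limit: 3)
  served = 0
  loop do
    served += 1
    # stand-in for `case handle_request(client, requests + 1) when false then break`
    keep_alive = pending_requests > served && served < keep_alive_limit
    break unless keep_alive
  end
  served
end

puts serve_connection(1)   # => 1  (single request: the connection closes)
puts serve_connection(10)  # => 3  (the thread is held for the limit's worth of requests)
```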

Keepalive connection

This looping behavior for keepalive connections is not limited to when the reactor is disabled. There's a bug that currently means keepalive requests effectively monopolize a thread for up to 10 sequential requests (by default), which is described in #3487.

All signs point to needing to put the keepalive request into the reactor as part of the mitigation. However, as pointed out in this comment #3506 (comment), when request queuing is disabled we are left with a difficult path forward. We can either continue monopolizing the thread, preserving the behavior that exists today, or close the connection, which aligns with the docs saying this setting does not work with keepalive requests. Rather than make that determination for the user, this change suggests we should remove that codepath as an invalid configuration option.

Usage

As part of informal research into usage of the feature, I asked on Mastodon and no one appears to be using it:

  • Link: https://ruby.social/@Schneems/113289823703358711
  • Screenshot: https://www.dropbox.com/s/9t1ur0rpqz5zkku/Screenshot%202024-10-14%20at%2012.43.27%E2%80%AFPM.png?dl=0

I asked in Slack groups and the only example someone could give was this issue comment where (funnily enough) they're using it to try to diagnose the very same buggy keepalive behavior: #2311 (comment).

From my personal experience debugging Ruby applications at Heroku, I have seen this feature used in the wild, so I know it's used. It's just that when it is used, people are not aware that they're losing slow-client protection and disabling the reactor. Usually they're after some other side-effect that disabling it caused.

In short, I don't believe it's heavily used, and for the cases where it is used, I don't believe those are valid use cases. In the comments I added a note requesting that people inform us of their use cases if they feel they have a valid one, so we can investigate it and possibly add a different configuration option.


Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

@dentarg
Member

dentarg commented Nov 7, 2024

> It's become an expectation that all Ruby code should be threadsafe for quite some time now.

Why are GitHub and Shopify still on forking web servers then 😆 😛

Comment on lines +584 to +587:

```ruby
if max == 1
  warn "[WARNING] Puma config `threads #{min}, #{max}` has a max thread count of 1. If the application makes"
  warn "          blocking HTTP calls to itself, it may deadlock. Consider increasing the max thread count."
end
```
Member


This is a questionable warning to add; isn't running multiple workers, each with one thread, a legitimate use case with Puma?

```
$ echo 'app { [200, {}, ["OK"]] }' | ruby -Ilib ./bin/puma --config /dev/stdin --port 0 --log-requests --workers 3 --threads 1:1
[WARNING] Puma config `threads 1, 1` has a max thread count of 1. If the application makes
          blocking HTTP calls to itself, it may deadlock. Consider increasing the max thread count.
[70321] Puma starting in cluster mode...
[70321] * Puma version: 6.4.3 (ruby 3.3.5-p100) ("The Eagle of Durango")
[70321] *  Min threads: 1
[70321] *  Max threads: 1
[70321] *  Environment: development
[70321] *   Master PID: 70321
[70321] *      Workers: 3
[70321] *     Restarts: (✔) hot (✔) phased
[70321] * Listening on http://0.0.0.0:64174
[70321] Use Ctrl-C to stop
[70322] + Gemfile in context: /Users/dentarg/src/puma/Gemfile
[70323] + Gemfile in context: /Users/dentarg/src/puma/Gemfile
[70324] + Gemfile in context: /Users/dentarg/src/puma/Gemfile
[70321] - Worker 0 (PID: 70322) booted in 0.0s, phase: 0
[70321] - Worker 1 (PID: 70323) booted in 0.0s, phase: 0
[70321] - Worker 2 (PID: 70324) booted in 0.0s, phase: 0
^C[70321] - Gracefully shutting down workers...
[70321] === puma shutdown: 2024-11-07 21:18:42 +0100 ===
[70321] - Goodbye!
```

Comment on lines 1177 to 1181:

```ruby
def queue_requests(answer=true)
  warn "[DEPRECATION] Puma config `queue_requests` is deprecated and will be removed in Puma 7.0.0."
  warn "              In Puma 7.0+ the reactor cannot be deactivated."

  @options[:queue_requests] = answer
```
Member


I'm on board with the deprecation, but there should probably be a way to silence the warning.

Similar to silence_single_worker_warning

puma/lib/puma/cluster.rb (lines 533 to 540 in 29fa3f5):

```ruby
def single_worker_warning
  return if @options[:workers] != 1 || @options[:silence_single_worker_warning]

  log "! WARNING: Detected running cluster mode with 1 worker."
  log "! Running Puma in cluster mode with a single worker is often a misconfiguration."
  log "! Consider running Puma in single-mode (workers = 0) in order to reduce memory overhead."
  log "! Set the `silence_single_worker_warning` option to silence this warning message."
end
```

@github-actions github-actions bot added the waiting-for-review Waiting on review from anyone label Nov 23, 2024
@nateberkopec
Member

I feel like rolling back #612 and saying "actually we're not going to support that" is a bigger deal than you're giving it credit for. "It is less likely to deadlock" isn't useful; it's not a great user story to say "well, you'll only deadlock 1% of the time".

@dentarg
Member

dentarg commented Aug 30, 2025

Do you still want to do this @schneems?

@schneems
Contributor Author

schneems commented Sep 1, 2025

At the time I was frustrated having to handle the no-queue case for fixing keepalives. I am still interested in where in the internals we could simplify things by removing little-used features if it means large maintenance gains, but my will to drive this PR forward is muted.

My priority right now is debugging the perf differences between 6.6.1 and 7pre. I might come back to this in the future. Good to close for now.

@schneems schneems closed this Sep 1, 2025
@schneems
Contributor Author

Worth noting that Puma 7 removed wait_until_not_full behavior. So the semantics have changed, yet again. We handled the no queue case for keepalives by closing them, but didn't consider the "action at a distance" expectations of setting this config and removing wait_until_not_full.

It's unclear to me how exactly the original code would have even prevented deadlock via accept (as I described earlier), as even if the server never accepts the incoming request, it would still queue on the socket.


Labels

docs maintenance waiting-for-review Waiting on review from anyone
