
Pass queued requests to thread pool on server shutdown #2122

Merged
merged 4 commits into puma:master from wjordan:empty_reactor_on_shutdown on Apr 10, 2020

Conversation

@wjordan
Contributor

wjordan commented Feb 20, 2020

Description

Fixes a concurrency bug during server shutdown when queue_requests is enabled (the default), which can be minimally reproduced by the following sequence:

  • s1 and s2 both open client connections
  • s1 sends a request
  • server shutdown is triggered (but does not complete while s1's request is still in process)
  • s2 sends a request

Expected: s2 receives a successful response
Actual: s2 receives a 500 response

Here is the test implementation, currently failing on master:

def test_shutdown_queued_request
  server_run app: ->(env) {
    sleep 3
    [204, {}, []]
  }
  s1 = send_http "GET / HTTP/1.1\r\n\r\n"
  s2 = send_http "GET / HTTP/1.1\r\n"
  sleep 1
  @server.stop
  sleep 1
  s2 << "\r\n"
  assert_match /204/, s1.gets
  assert_match /204/, s2.gets
end

The underlying issue is that in a 'graceful' shutdown, the server stops accepting connections and the thread pool stops accepting new work while it waits up to force_shutdown_after seconds to finish processing currently active requests. However, a client request queued in the reactor will be rejected if it is received after the thread pool stops accepting new work.

To fix this, the PR makes a small change: it shuts down the reactor before the thread pool, and when the reactor shuts down it disables queue_requests and passes any remaining non-'idle' client connections along to the thread pool. This allows graceful shutdown to behave properly without prematurely aborting any requests it shouldn't.
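The hand-off described above can be sketched roughly as follows (a toy illustration with assumed names like ToyReactor and ToyClient, not Puma's actual classes):

```ruby
# Toy sketch of the shutdown hand-off: on shutdown, the reactor closes
# clients that are safe to drop ('idle') and passes the rest to the
# thread pool so their in-flight requests still get a response.
class ToyReactor
  def initialize(pool)
    @pool = pool
    @clients = []
  end

  def add(client)
    @clients << client
  end

  def shutdown
    @clients.each do |c|
      if c.idle?
        c.close   # safe to drop per RFC 2616 section 8.1.4
      else
        @pool << c # hand off to the thread pool for processing
      end
    end
    @clients.clear
  end
end

# Hypothetical client with the idle? predicate discussed in this PR:
# a connection is 'idle' only if it has served at least one request
# and is not mid-way through receiving a request.
ToyClient = Struct.new(:requests_served, :in_data_phase) do
  def idle?
    requests_served > 0 && !in_data_phase
  end

  def close; end
end

pool = []
reactor = ToyReactor.new(pool)
reactor.add(ToyClient.new(1, false)) # idle: served a request, between requests
reactor.add(ToyClient.new(0, false)) # fresh connection: not idle
reactor.add(ToyClient.new(1, true))  # mid-request body: not idle
reactor.shutdown
pool.size # => 2
```

The sketch only shows the idle/non-idle partitioning decision; in the actual PR the thread pool then finishes buffering and serving the handed-off requests via the queue_requests = false code path.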

The logic for determining 'idle' clients that can be closed immediately follows guidance from RFC 2616 section 8.1.4:

Servers SHOULD always respond to at least one request per connection,
if at all possible.

The added test depends on the small fix in #2121 to pass properly on Windows, so this branch currently includes that commit as well.

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added an entry to History.md if this PR fixes a bug or adds a feature. If it doesn't need an entry to History.md, I have added [changelog skip] to the pull request title.
  • I have added appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.
@wjordan wjordan force-pushed the wjordan:empty_reactor_on_shutdown branch from b379b8a to 17af6cf Feb 20, 2020
@wjordan wjordan changed the title Empty reactor on shutdown Pass queued requests to thread pool on server shutdown Feb 20, 2020
@@ -293,8 +293,6 @@ def run(background=true)
return run_lopez_mode(background)
end

queue_requests = @queue_requests

@wjordan

wjordan Feb 20, 2020

Author Contributor

This removes a change from cc4ad17, since we need to change behavior based on the updated value of the @queue_requests variable after the reactor is shutdown. I assume this was a performance optimization, but I don't know if it actually makes any measurable difference.

@wjordan wjordan mentioned this pull request Feb 21, 2020
8 of 8 tasks complete
@nateberkopec nateberkopec added the bug label Feb 21, 2020
@nateberkopec nateberkopec requested a review from evanphx Feb 21, 2020
# middle of transmitting a response, unless a network or client failure
# is suspected.
def idle?
@requests_served > 0 && !in_data_phase

@nateberkopec

nateberkopec Mar 5, 2020

Member

Why would we not be idle if @requests_served is 0?

@wjordan

wjordan Mar 5, 2020

Author Contributor

I interpreted the RFC statement "Servers SHOULD always respond to at least one request per connection, if at all possible" to mean that the server shouldn't consider unused (@requests_served == 0) connections to be 'idle', so it shouldn't force-close them during a graceful shutdown. This makes sense to me: if a client opens a connection, it usually makes at least one request pretty soon after.

Maybe #idle? isn't the clearest term for this method? I used it here because the same RFC section uses the term to refer to a connection 'the server has decided to close':

A client, server, or proxy MAY close the transport connection at any
time. For example, a client might have started to send a new request
at the same time that the server has decided to close the "idle"
connection. From the server's point of view, the connection is being
closed while it was idle, but from the client's point of view, a
request is in progress.

I also considered #can_close?, #should_close? or #closeable? if any of those seem more fitting.

@nateberkopec

nateberkopec Mar 5, 2020

Member

Riiight, so if we haven't served any requests, we can just shut down. Got it.

Any of the close methods you proposed I think would be more fitting.

@schneems

schneems Apr 2, 2020

Contributor

I don't think @requests_served is a good variable name here. It is (from what I understand) a count of the number of requests that have been received from the client, and not the number of requests served from the server. I think a better name might be @client_request_count.

Given that tidbit, I don't think the logic here is entirely correct. I do want to take a step back from the specific implementation though and make sure we're on the same page at a high level.

@nateberkopec

nateberkopec Apr 7, 2020

Member

Given that this on client.rb, I'm OK with the variable name.

@@ -422,11 +420,12 @@ def handle_servers

@events.fire :state, @status

graceful_shutdown if @status == :stop || @status == :restart
if queue_requests

@evanphx

evanphx Mar 5, 2020

Member

I presume this needs to be @queue_requests given your change. But I'd also rather you not change this to an ivar read and leave it as a local var.

@wjordan

wjordan Apr 6, 2020

Author Contributor

There were actually two separate queue_requests local vars: one within Server#run, and one within Server#handle_servers.

The local var in #run had to be removed in order for this PR to work at all- see my comment at #2122 (review). However, this one in #handle_servers actually doesn't need to be changed as well, so I left it as-is to minimize the diff.

That said, since this local var in #handle_servers is only ever read on server shutdown and probably has no performance impact, I would lean towards removing it for consistency/simplicity if that's your preference.

@nateberkopec

nateberkopec Apr 7, 2020

Member

This explanation works for me.

@nateberkopec
Member

nateberkopec commented Mar 5, 2020

  • Needs a change on the idle method name
  • Leave queue requests as local var
Contributor

schneems left a comment

Thanks a ton for your PR. It was really hard to review, mostly due to the complexity of the subject matter as well as the complexity of Puma internals. I can only imagine that it must have taken you much longer to write this PR.

I want to take a step back from the implementation here and talk about some high-level questions and goals.

Is this a regression?

My high level question is: Did this used to work? Is this a regression or is this something new we're trying to find/fix? It looks like this might be related to #2200 but...I don't see where something changed in the reactor. Did something else get changed somewhere? If so, can we git bisect and understand what broke the behavior?

If you go back to 4.0.0 and run the same test does it pass or fail?

Side note: I would love it if this process of seeing when a given test would have worked or failed were easier to do somehow. Committing it into git means we can't git bisect.

Regardless of whether this was broken or not, another question is: what do we want to happen? While digging in the code I found a few situations, and two of them have semi-undefined behavior:

Scenario A) Client 2 is fully buffered

  • Client Request 1 comes in and is fully buffered
  • Client Request 2 comes in and is partially buffered
  • Server gets shutdown request
  • Client Request 2 sends rest of its request before shutdown timeout
  • Server generates and delivers full response of Request 1, sends response
  • Desired: Reactor passes the request to the thread pool before shutting down for a response

I think we can all agree on this.

Scenario B) Client 2 CAN be (but is not yet) fully buffered

  • Client Request 1 comes in and is fully buffered
  • Client Request 2 comes in and is partially buffered
  • Server gets shutdown request
  • Client Request 2 sends rest of its request before shutdown timeout
  • Server generates and delivers full response of Request 1, sends response
  • Desired: Reactor keeps trying to buffer the request to see if it can pass it to the thread pool to be processed before shutting down?

Is this true? Do we want to do that? The downside is we have no way of knowing if the client will ever send the rest of the request in time. How do we detect the difference between scenario B and scenario C, where we need to serve a default response?

Scenario C) Client 2 cannot be fully buffered

  • Client Request 1 comes in and is fully buffered
  • Client Request 2 comes in and is partially buffered
  • Server gets shutdown request
  • Client Request 2 DOES NOT send rest of its request
  • Server waits for full response of Request 1, sends response
  • Desired: ???

If we pass a partial request to the thread pool for processing, there's no guarantee that it can do anything with it; it might be a half-formed JSON body that cannot be decoded. While RFC 2616 says we "should" send at least one response, what should we send? If we pass it to the thread pool in this state it's going to be a 500. I think that's not accurate; a 503 would perhaps be more appropriate.

Questions

  1. Is this a regression?
  2. If you pass a non-fully buffered client request to the thread pool what happens? Does it try to buffer it? cc/ @nateberkopec @evanphx
  3. What do we want to do when the client can be fully buffered? (scenario B)
  4. What do we want to do when the client cannot be fully buffered? (scenario C)

@wjordan wjordan force-pushed the wjordan:empty_reactor_on_shutdown branch 3 times, most recently from e76799b to 9a0e207 Apr 4, 2020
@wjordan wjordan force-pushed the wjordan:empty_reactor_on_shutdown branch from 9a0e207 to 941262e Apr 6, 2020
Cover various scenario combinations related to timeout settings,
queue_requests configuration, and post/get request type.
@wjordan wjordan force-pushed the wjordan:empty_reactor_on_shutdown branch from 941262e to d8b2680 Apr 6, 2020
@wjordan
Contributor Author

wjordan commented Apr 6, 2020

My high level question is: Did this used to work? Is this a regression or is this something new we're trying to find/fix?

As far as I can tell, this never used to work since the reactor implementation has existed. For example, on bb2aeb8 (first commit where the queue_requests configuration option is available), the test in this PR fails with queue_requests = true and passes with queue_requests = false.

I only encountered this issue as an intermittent unit-test failure while working on #2123, and worked backwards from there to create a minimal reproducible failing test and fix. In any case it seems possible that #2200 is caused by this underlying issue, though I can't be sure.

Regardless of if this was broken or not, another question is: What do we want to happen?
What do we want to do when the client can be fully buffered? (scenario B)
What do we want to do when the client cannot be fully buffered? (scenario C)

I wasn't sure how Scenario 'A' differed from 'B' (identical except for the 'desired' part?), so I'll just address B and C.

  • For B, I agree on behavior - for a graceful shutdown, the server should wait to finish receiving the request, and send a response.

    However, the implementation is a bit different. When the server gets a shutdown request, it immediately shuts the reactor down (passing pending requests to the thread pool) and sets queue_requests to false. The thread pool then continues reading the partially-buffered client requests and processing responses in its active threads, using the queue_requests = false code path as though the reactor never existed.

  • For C, the difference from B is that the server doesn't wait forever for the request to finish, it closes the connection after a configurable timeout. (The HTTP spec says that 'Servers will usually have some time-out value beyond which they will no longer maintain an inactive connection,' and allows that 'A client, server, or proxy MAY close the transport connection at any time.')

    In Puma, this timeout is configurable as the lesser of first_data_timeout (default 30 seconds) or force_shutdown_after (default forever).

    Also, historically (since #451), if a request is in the 'data phase' (request headers have been sent and a request body is in-flight) when it times out, Puma returns a 408 response before it drops the connection instead of just dropping without any response (call this Scenario C2). I think this is a minor detail, but ideally behavior during shutdown should be consistent with this existing behavior.

  • Finally, it's worth noting that if the server is busy with a response when the force-shutdown timeout completes, the server responds with a 503 error (uncaught Puma::ThreadPool::ForceShutdown exception). This is a separate related scenario (call it Scenario D).

To clarify the behavior across all combinations of scenarios, I added a few extra test cases to this PR. Let me know if these are clear enough and if the expected behavior is reasonable, or if there are any additional scenarios you think should be covered.
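The timeout interplay described for Scenario C can be sketched as a one-liner (hypothetical helper name, not a Puma API):

```ruby
# Hypothetical helper (not a Puma API): during shutdown, a pending
# connection is given up on after the lesser of first_data_timeout
# (default 30 seconds) and force_shutdown_after (default unbounded).
def effective_shutdown_timeout(first_data_timeout: 30, force_shutdown_after: Float::INFINITY)
  [first_data_timeout, force_shutdown_after].min
end

effective_shutdown_timeout                            # => 30
effective_shutdown_timeout(force_shutdown_after: 10)  # => 10
```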

If you pass a non-fully buffered client request to the thread pool what happens? Does it try to buffer it? cc/ @nateberkopec @evanphx

Yes, this is what happens when queue_requests is set to false, and what this PR does when the server gets a shutdown request.
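A minimal sketch of that queue_requests = false read loop (toy classes and assumed names, not Puma internals):

```ruby
# Toy client whose request arrives from the network in chunks.
class ChunkedClient
  def initialize(chunks)
    @chunks = chunks
    @buffer = +""
  end

  # Hypothetical stand-in for the 'try to finish buffering' step: reads
  # one available chunk and returns true once the request is fully
  # buffered (here, once the blank line ending the headers has arrived).
  def try_to_finish
    @buffer << (@chunks.shift || "")
    @buffer.include?("\r\n\r\n")
  end
end

# With queue_requests disabled, the worker thread itself loops until the
# request is fully buffered instead of handing the socket to a reactor.
# (In Puma the equivalent read is bounded by first_data_timeout.)
def process_client(client)
  nil until client.try_to_finish
  :handled
end

c = ChunkedClient.new(["GET / HTTP/1.1\r\n", "\r\n"])
process_client(c) # => :handled
```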

Update method documentation based on feedback.
@evanphx evanphx self-requested a review Apr 10, 2020
Member

evanphx left a comment

Thanks for the hard work. You arrived at a very understandable place, looks good!

@nateberkopec nateberkopec merged commit 19b2a21 into puma:master Apr 10, 2020
27 checks passed
@nateberkopec
Member

nateberkopec commented Apr 10, 2020

👏

@aaronjensen

aaronjensen commented Apr 27, 2020

🎊 Is there a timeframe for the next release of puma?

@nateberkopec
Member

nateberkopec commented Apr 27, 2020

Basically the two PRs listed in the milestone have to get merged: https://github.com/puma/puma/milestone/7

@aaronjensen

aaronjensen commented Apr 28, 2020

Ah, major release is next. Okay, thanks.
