Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix rack.after_reply exceptions breaking connections #2861

Merged
merged 2 commits into from Apr 17, 2022

Conversation

BlakeWilliams
Copy link
Contributor

Description

Currently, when a rack.after_reply callable raises an exception we
attempt to handle it like other client errors by responding with an HTTP
500 response. This however doesn't work because rack.after_reply
callbacks are called after the response body has already been written to
the client.

This can cause issues with re-used connections. This is because 2 HTTP
responses are being returned for a single request. If a second HTTP
request is made before the error handling logic completes the timing can
line up causing the second HTTP response to be served a 500 from the
first HTTP requests rack.after_reply callbacks raising.

That may look roughly like:

  1. Request 1 starts, opening a reusable TCP connection
  2. Request 1 is written to and "completed"
  3. Request 1 rack.after_reply callables are called
  4. Request 2 starts, reusing the same TCP connection as request 1
  5. rack.after_reply raises, calls client_error and serves a 500
    response
  6. Request 2 receives the 500 response.

This is somewhat difficult to reproduce using HTTP clients since it's a
race condition whether or not the 500 is written at the "correct" time
or not.

To prevent this issue the rack.after_reply callables are now wrapped
in a begin/rescue/end block that rescues from StandardError and logs
instead of attempting to serve a 500 response.

Closes #2856

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

Currently, when a `rack.after_reply` callable raises an exception we
attempt to handle it like other client errors by responding with an HTTP
500 response. This however doesn't work because `rack.after_reply`
callbacks are called after the response body has already been written to
the client.

This can cause issues with re-used connections. This is because 2 HTTP
responses are being returned for a single request. If a second HTTP
request is made before the error handling logic completes the timing can
line up causing the second HTTP response to be served a 500 from the
first HTTP requests `rack.after_reply` callbacks raising.

That may look roughly like:

1. Request 1 starts, opening a reusable TCP connection
2. Request 1 is written to and "completed"
3. Request 1 `rack.after_reply` callables are called
4. Request 2 starts, reusing the same TCP connection as request 1
5. `rack.after_reply` raises, calls `client_error` and serves a 500
   response
6. Request 2 receives the 500 response.

This is somewhat difficult to reproduce using HTTP clients since it's a
race condition whether or not the 500 is written at the "correct" time
or not.

To prevent this issue the `rack.after_reply` callables are now wrapped
in a begin/rescue/end block that rescues from `StandardError` and logs
instead of attempting to serve a 500 response.
# * If we would block trying to read from the socket, we can assume that
# the erroneous 500 response wasn't/won't be written.
sleep 0.1
assert_raises IO::EAGAINWaitReadable do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not the biggest fan of this test, but using Net::HTTP was flaky and the problem is pretty low-level so this is what I ended up with. It is technically testable using Net::HTTP but you have to instance_variable_get(:@socket) on the connection and I wasn't a huge fan of diving into stdlib internals.

Happy to get feedback/suggestions on alternatives here if anyone has any.

@nateberkopec
Copy link
Member

nateberkopec commented Apr 16, 2022

Now this is a high-quality first time contribution to a project! 🥇

@nateberkopec
Copy link
Member

Looks good to me but I'll leave it open for a bit to see if anyone else has anything to say

@MSP-Greg
Copy link
Member

This passes for me locally. The one thing not sure about is whether the begin/rescue block should be inside or outside the after_reply.each loop?

@BlakeWilliams
Copy link
Contributor Author

BlakeWilliams commented Apr 17, 2022

This passes for me locally. The one thing not sure about is whether the begin/rescue block should be inside or outside the after_reply.each loop?

Same, I'm wondering if it's platform differences since most of the failures are on Windows. I'll push up what I think may fix it in a bit.

I chose to put the rescue outside of the each block to keep the existing behavior where a callback raising halts the rest of the chain. Happy to change it if that's the direction y'all prefer, but it may be somewhat of a breaking change. There's a broader discussion about this behavior too in the rack spec over in the rack repo if you're interested.

@BlakeWilliams
Copy link
Contributor Author

It looks like d044ed2 fixed the failures for the windows builds but all Ruby HEAD builds are still failing. I ran the suite against master in my fork too and it looks like Ruby HEAD is failing there too (my fork is 1 commit behind though, if that matters).

@MSP-Greg
Copy link
Member

MSP-Greg commented Apr 17, 2022

@BlakeWilliams

Ignore the TestIntegrationPumactl#test_prune_bundler_with_multiple_workers failures, as this PR doesn't affect those tests.

Puma may need some changes for issues related to default gems...

EDIT: Most of the std-lib has been 'gemified', and testing against Ruby master in the actual gem repos has also caused issues.

@MSP-Greg
Copy link
Member

MSP-Greg commented Apr 17, 2022

Also, OS socket errors may differ between Ubuntu, macOS, and Windows. Kind of a pita when one is wondering why test failures vary...

@nateberkopec nateberkopec merged commit b2283d8 into puma:master Apr 17, 2022
nateberkopec pushed a commit that referenced this pull request Aug 22, 2022
* Fix rack.after_reply exceptions breaking connections

Currently, when a `rack.after_reply` callable raises an exception we
attempt to handle it like other client errors by responding with an HTTP
500 response. This however doesn't work because `rack.after_reply`
callbacks are called after the response body has already been written to
the client.

This can cause issues with re-used connections. This is because 2 HTTP
responses are being returned for a single request. If a second HTTP
request is made before the error handling logic completes the timing can
line up causing the second HTTP response to be served a 500 from the
first HTTP requests `rack.after_reply` callbacks raising.

That may look roughly like:

1. Request 1 starts, opening a reusable TCP connection
2. Request 1 is written to and "completed"
3. Request 1 `rack.after_reply` callables are called
4. Request 2 starts, reusing the same TCP connection as request 1
5. `rack.after_reply` raises, calls `client_error` and serves a 500
   response
6. Request 2 receives the 500 response.

This is somewhat difficult to reproduce using HTTP clients since it's a
race condition whether or not the 500 is written at the "correct" time
or not.

To prevent this issue the `rack.after_reply` callables are now wrapped
in a begin/rescue/end block that rescues from `StandardError` and logs
instead of attempting to serve a 500 response.

* Assert against less specific exception
JuanitoFatas pushed a commit to JuanitoFatas/puma that referenced this pull request Sep 9, 2022
* Fix rack.after_reply exceptions breaking connections

Currently, when a `rack.after_reply` callable raises an exception we
attempt to handle it like other client errors by responding with an HTTP
500 response. This however doesn't work because `rack.after_reply`
callbacks are called after the response body has already been written to
the client.

This can cause issues with re-used connections. This is because 2 HTTP
responses are being returned for a single request. If a second HTTP
request is made before the error handling logic completes the timing can
line up causing the second HTTP response to be served a 500 from the
first HTTP requests `rack.after_reply` callbacks raising.

That may look roughly like:

1. Request 1 starts, opening a reusable TCP connection
2. Request 1 is written to and "completed"
3. Request 1 `rack.after_reply` callables are called
4. Request 2 starts, reusing the same TCP connection as request 1
5. `rack.after_reply` raises, calls `client_error` and serves a 500
   response
6. Request 2 receives the 500 response.

This is somewhat difficult to reproduce using HTTP clients since it's a
race condition whether or not the 500 is written at the "correct" time
or not.

To prevent this issue the `rack.after_reply` callables are now wrapped
in a begin/rescue/end block that rescues from `StandardError` and logs
instead of attempting to serve a 500 response.

* Assert against less specific exception
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

raise in rack.after_reply has bad behavior
3 participants