Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Net::HTTP#start(&block) to ensure closed TCP connections #1117

Merged
merged 4 commits into from
Feb 7, 2020
Merged

Use Net::HTTP#start(&block) to ensure closed TCP connections #1117

merged 4 commits into from
Feb 7, 2020

Conversation

f3ndot
Copy link
Contributor

@f3ndot f3ndot commented Jan 21, 2020

Despite the API indicating otherwise, using Net::HTTP#get (or #request) will open a TCP connection and it will remain open long after the request is finished, even though calling Net::HTTP#finish subsequently raises an IOError and the #to_s shows an incorrect "open=false".

Using the Net::HTTP#start block to explicitly open the connection causes Net::HTTP to fully close the TCP connection once the block complete evaluation.

This was verified by doing the following:

  1. Running a webserver on a *nix OS (selected default nginx on Ubuntu docker)
  2. Running watch 'netstat -tunapl' on that OS
  3. Run curl to GET that webserver many times. Note how there are no "TIME_WAIT" sockets
  4. In IRB run Net::HTTP.get(...) many times. Note how there are no "TIME_WAIT" sockets
  5. In IRB with rest-client run RestClient.get(...) many times. Note how there are no "TIME_WAIT" sockets
  6. In IRB with faraday run Faraday.get(...) many times. Note how there are n number of "TIME_WAIT" sockets on the webserver OS. They timeout and clear after 60s:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      3511/nginx: master
tcp        0      0 172.17.0.2:80           172.17.0.1:60470        TIME_WAIT   -
tcp        0      0 172.17.0.2:80           172.17.0.1:60472        TIME_WAIT   -
tcp        0      0 172.17.0.2:80           172.17.0.1:60478        TIME_WAIT   -
tcp        0      0 172.17.0.2:80           172.17.0.1:60476        TIME_WAIT   -
tcp        0      0 172.17.0.2:80           172.17.0.1:60468        TIME_WAIT   -
tcp        0      0 172.17.0.2:80           172.17.0.1:60474        TIME_WAIT   -
...

This matters because a server receiving many HTTP requests from clients using Faraday will begin to oversaturate and timeout past a particular scale. We ran into this limit.

In short, #start must be used first before any requests are made on a Net::HTTP instance otherwise you'll get a dangling TCP connection server-side.

Todos

  • Changelog

Despite the API indicating otherwise, using Net::HTTP#get (or #request) will open a TCP connection and it will remain open long after the request is finished, even though calling Net::HTTP#finish subsequently raises an IOError and the #to_s shows an incorrect "open=false".

Using the Net::HTTP#start block to explicitly open the connection causes Net::HTTP to fully close the TCP connection once the block complete evaluation.

This was verified by doing the following:

1. Running a webserver on a *nix OS (selected default nginx on Ubuntu docker)
2. Running `watch 'netstat -tunapl'` on that OS
3. Run `curl` to GET that webserver many times. Note how there are no "TIME_WAIT" sockets
4. In IRB run `Net::HTTP.get(...)` many times. Note how there are no "TIME_WAIT" sockets
5. In IRB with `rest-client` run `RestClient.get(...)` many times. Note how there are no "TIME_WAIT" sockets
6. In IRB with `faraday` run `Faraday.get(...)` many times. Note how there are n number of "TIME_WAIT" sockets on the webserver OS. They timeout and clear after 60s:

```
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      3511/nginx: master
tcp        0      0 172.17.0.2:80           172.17.0.1:60470        TIME_WAIT   -
tcp        0      0 172.17.0.2:80           172.17.0.1:60472        TIME_WAIT   -
tcp        0      0 172.17.0.2:80           172.17.0.1:60478        TIME_WAIT   -
tcp        0      0 172.17.0.2:80           172.17.0.1:60476        TIME_WAIT   -
tcp        0      0 172.17.0.2:80           172.17.0.1:60468        TIME_WAIT   -
tcp        0      0 172.17.0.2:80           172.17.0.1:60474        TIME_WAIT   -
...
```

This matters because a server receiving many HTTP requests from clients using Faraday will begin to oversaturate and timeout past a particular scale. We ran into this limit.

In short, #start must be used *first* before any requests are made on a Net::HTTP instance otherwise you'll get a dangling TCP connection server-side.
@f3ndot
Copy link
Contributor Author

f3ndot commented Jan 21, 2020

Build failures match my local rspec failures that occur when running suite against master.

@iMacTia
Copy link
Member

iMacTia commented Feb 6, 2020

Hi @f3ndot, thanks for the contribution, it makes sense after having a quick look.
Apologies for the test suite issue, we're currently investigating it (#1119), but we're being a bit slow due to the team being a little busy in these weeks.
As soon as that is fixed we'll merge master into your branch and we should be able to progress, thanks for your patience.

@iMacTia
Copy link
Member

iMacTia commented Feb 6, 2020

@f3ndot for some reason the "Update branch" button is not available 🤔...
Could you please pull the latest changes from master? That should fix tests!

byronwolfman and others added 2 commits February 6, 2020 15:58
I was mistaken and it is a different beast than raw net/http. The issue does not apply on the persistent adapter.
@f3ndot
Copy link
Contributor Author

f3ndot commented Feb 7, 2020

@iMacTia all set! I had to remove it for net-persistent-http since they have a bit of a different interface. As far as I can tell the user invokes #shutdown when they're done with a connection anyhow.

Copy link
Member

@iMacTia iMacTia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, tests are green, seems to solve an important issue 😃
LGTM

@iMacTia iMacTia merged commit 8560572 into lostisland:master Feb 7, 2020
@f3ndot f3ndot deleted the safely-open-and-close-net-http-connections branch February 7, 2020 15:00
@byronwolfman
Copy link
Contributor

Hey friends, any chance we could get a release cut on this?

@AGarrow
Copy link

AGarrow commented Mar 10, 2020

Hi @iMacTia 👋 ,

do you guys know when there might be a release cut for this?

thanks!

@iMacTia
Copy link
Member

iMacTia commented Mar 28, 2020

Sorry about the delay! Waiting for #1125 to get merged as well and then I guess we can get a release done 😃

@iMacTia
Copy link
Member

iMacTia commented Mar 29, 2020

Faraday v1.0.1 is out 🎉!
https://github.com/lostisland/faraday/releases/tag/v1.0.1

@f3ndot
Copy link
Contributor Author

f3ndot commented Mar 30, 2020

@iMacTia thank you for doing this.

I dug deeper into the "why" and figured I'd post my findings here.

I was originally mistaken in describing the issue. The TIME_WAIT TCP socket status we're seeing is a direct result of an explicitly closed TCP connection (see TCP diagram). This is normal because in TCP/IP, connections can not be closed immediately and packets may arrive out of order. The timeout is part of making sure a connection closed properly.

In HTTP/1.0 the default server behaviour was to close each and every HTTP request it received (aka behave as if HTTP header Connection: close was sent). This produced the behaviour what I described above, and is more laborious on the server side due to the timeout.

With HTTP/1.1 the spec changed the default server behaviour to keeping the connection alive (akin to Connection: keep-alive header) so that a server doesn't waste CPU resources creating and tearing down a TCP connection (thus inducing TIME_WAIT for 1-4min, depending on server OS) every single time an HTTP request is sent. This is the modern webserver behaviour we're used to seeing.

The smoking gun is code in Ruby that's over 20 years old. It assumes all HTTP servers are 1.0 and explicitly adds a Connection: close header under certain conditions. This code was moved around and changed over the years, so the header appendage only happens in one of the many possible ways to use Net::HTTP, proof that this is bugged behaviour/code rot:

net = Net::HTTP.new('localhost', 8080)
50.times { net.get('/') } # is bad: connection close header appended
50.times { Net::HTTP.get(URI('http://localhost:8080/')) } # is OK: no header appended
net = Net::HTTP.new('localhost', 8080)
net.start
50.times { net.get('/') } # is OK: no header appended
net.finish
net = Net::HTTP.new('localhost', 8080)
50.times { net.start { net.get('/') } } # is OK: no header appended

The fact the client is making assumptions about what the server wants and explicitly adding an Connection: close header is interesting, since the server already makes default decisions in absence of the header. I don't know why the Ruby code would try and set it in the first place. Why not let the server decide in absence of a programmer stating explicitly in their userland code?

In any event, I ended up filing a bug report to Ruby about it: https://bugs.ruby-lang.org/issues/16559

@iMacTia
Copy link
Member

iMacTia commented Apr 2, 2020

Thanks @f3ndot for the extensive explanation, I'm sure future readers will find this extremely helpful 😃

@pvdb
Copy link
Contributor

pvdb commented Apr 1, 2021

seems to solve an important issue 😃

I don't necessarily agree that it does... as mentioned by @jeremyevans on the Ruby bug report filed by @f3ndot, this is just making a different trade-off; it also fundamentally changed the behaviour of the Faraday adapter w.r.t. HTTP connections, so it was far more than a patch release, really (v1.0.0 to v1.0.1).

Furthermore, this change now explicitly prevents the use of persistent HTTP/1.1 connections when using the NetHttp Faraday adapter, something that is supported by and perfectly doable with Ruby's underlying Net::HTTP stdlib, and was easy to do with Faraday's own NetHttp adapter before this change.

I'm fully aware of drbrain/net-http-persistent and Faraday's corresponding adapter, but that's not always a suitable option.

And even the net-http-persistent README states that it's just a convenience wrapper around Net::HTTP's native capabilities, not some new functionality that's missing from the underlying stdlib:

Net::HTTP supports persistent connections with some API methods but does not make setting up a single persistent connection or managing multiple connections easy. Net::HTTP::Persistent wraps Net::HTTP and [...]

The changes in this PR explicitly make the native Net::HTTP support for persistent connections inaccessible when using Faraday's NetHttp adapter, which is a massive step backwards, IMO.

At the very least it should be configurable, and arguably closing the connection after each request shouldn't even be the default behaviour of the adapter, but rather something you can tell it to do; that way you keep parity with previous Faraday behaviour, and you can still achieve what this PR is doing, just not by hard coding it.

I see the adapter has since moved to a separate gem so I'll cobble together a quick PR for y'all's consideration. 🙂

@iMacTia
Copy link
Member

iMacTia commented Apr 1, 2021

Thanks a lot @pvdb, the main reason for us moving adapters out is to make it easier for others to contribute on this kind of issues since the core team can't possible know all the inside-outs of each adapter.
We'll also be able to have more granular control over releases semver as they'll be specific to those adapters.

I'm sure a PR on the new faraday-net_http repo will be more then welcome! Please feel free to tag a few people from this thread so we get more eyes to review it 👀!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants