Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache Faraday::Connection for persistent adapters #322

Closed

Conversation

drbrain
Copy link
Contributor

@drbrain drbrain commented May 7, 2020

In order to be a good web citizen and to allow maximum performance HTTP
clients should use persistent connections to make requests. This has
multiple benefits:

  • Users won't be delayed by TCP slow-start figuring out the window size
    for every HTTP request
  • Users won't be delayed by TLS handshakes for each new request
  • Bandwidth overhead per request is reduced through large TCP windows on
    established TLS connections.

Some Slack endpoints are paginated and may take many requests to fetch
all necessary data. Using persistent connections can result in a 4x
speed boost.

Slack::Web::Client uses Faraday which uses Net::HTTP by default. While
Net::HTTP is capable of persistent connections Net::HTTP + Faraday is
not configured this way. The Faraday::Adapter::NetHttpPersistent
adapter does.

Slack::Web::Client uses Faraday#default_adapter so the end user can
switch adapters are used like this:

Faraday.default_adapter = Faraday::Adapter::NetHttpPersistent

c = Slack::Web::Client.new …

Unfortunately Slack::Web::Client does not cache the Faraday::Connection
object and instead creates a new connection for every request. This
breaks the ability to use persistent connections with
Slack::Web::Client.

This can be observed through using Wireshark with an ssl.handshake
filter, or by observing SYN packets sent by tcpdump host slack.com.

This patch adds caching of the Faraday::Connection object to
Slack::Web::Client to allow caching of connections through Faraday. A
cached connection can be reused as a persistent connection.

Recommending users use a persistent connection adapter (like
Faraday::Adapter::NetHttpPersistent) is not part of this pull request,
but I do recommend it. Your users should see an easy speed improvement.

Here is a time output from a script that fetches ~4,000 conversations
using the net-http-persistent adapter without this patch (making it
ineffective):

$ time ruby t.rb
ruby t.rb  2.46s user 0.25s system 14% cpu 18.389 total

and with this patch:

$ time ruby t.rb
ruby t.rb  0.99s user 0.20s system 13% cpu 9.053 total

(This is only a 2x speed boost as the slack conversation.list API is
highly variable in response time.)

In order to be a good web citizen and to allow maximum performance HTTP
clients should use persistent connections to make requests.  This has
multiple benefits:

* Users won't be delayed by TCP slow-start figuring out the window size
  for every HTTP request
* Users won't be delayed by TLS handshakes for each new request
* Bandwidth overhead per request is reduced through large TCP windows on
  established TLS connections.

Some Slack endpoints are paginated and may take many requests to fetch
all necessary data.  Using persistent connections can result in a 4x
speed boost.

Slack::Web::Client uses Faraday which uses Net::HTTP by default.  While
Net::HTTP is capable of persistent connections Net::HTTP + Faraday is
not configured this way.  The Faraday::Adapter::NetHttpPersistent
adapter does.

Slack::Web::Client uses Faraday#default_adapter so the end user can
switch adapters are used like this:

    Faraday.default_adapter = Faraday::Adapter::NetHttpPersistent

    c = Slack::Web::Client.new …

Unfortunately Slack::Web::Client does not cache the Faraday::Connection
object and instead creates a new connection for every request.  This
breaks the ability to use persistent connections with
Slack::Web::Client.

This can be observed through using Wireshark with an `ssl.handshake`
filter, or by observing `SYN` packets sent by `tcpdump host slack.com`.

This patch adds caching of the Faraday::Connection object to
Slack::Web::Client to allow caching of connections through Faraday.  A
cached connection can be reused as a persistent connection.

Recommending users use a persistent connection adapter (like
Faraday::Adapter::NetHttpPersistent) is not part of this pull request,
but I do recommend it.  Your users should see an easy speed improvement.

Here is a `time` output from a script that fetches ~4,000 conversations
using the net-http-persistent adapter without this patch (making it
ineffective):

    $ time ruby t.rb
    ruby t.rb  2.46s user 0.25s system 14% cpu 18.389 total

and with this patch:

    $ time ruby t.rb
    ruby t.rb  0.99s user 0.20s system 13% cpu 9.053 total

(This is only a 2x speed boost as the slack `conversation.list` API is
highly variable in response time.)
@drbrain
Copy link
Contributor Author

drbrain commented May 7, 2020

I don't understand the failure:

308 $ bundle exec danger
309
310 Danger has failed this build. 
311 Found 1 error.
312 The command "bundle exec danger" exited with 1.

I don't know how to reproduce it:

$ bundle exec danger
Could not find the type of CI for Danger to run on.

@wasabigeek
Copy link
Contributor

1 Error
🚫 Please put back the * Your contribution here. line into CHANGELOG.md.
Generated by 🚫 danger

I think you have to address this

@drbrain
Copy link
Contributor Author

drbrain commented May 7, 2020

It's really great of @dangerpr-bot to:

  • Pass on the first commit I make
  • Add a note afterward so I come here

Then when I make a commit to fix the note:

  • It fails with no message
  • It edits its own message much later
  • I can't see the edit because I haven't refreshed this page

Of course @dangerbot-pr:

* Can't fail on the first run of CI
* Can't run on my computer so I can be sure my changes are OK
* Can't show an error in CI when it fails
* Can't be bothered to create a new message so I get a GitHub notification

🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄🙄
@drbrain
Copy link
Contributor Author

drbrain commented May 7, 2020

Oh, and, when @dangerpr-bot edits its own message I don't get a notification 😒

Copy link
Collaborator

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! I think we need a way not prevent this behavior, although I am happy with the default. Maybe add an option to turn it on/off, eg. options[:reuse_connection]. It needs documented behavior in README, and a note in UPGRADING.

@@ -19,7 +19,7 @@ def connection
request_options[:open_timeout] = open_timeout if open_timeout
options[:request] = request_options if request_options.any?

::Faraday::Connection.new(endpoint, options) do |connection|
@connection ||= ::Faraday::Connection.new(endpoint, options) do |connection|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the above options construction is wasting its time on a cached connection, so let's wrap that up too into another method? Something like

@connection ||= new_connection

Are there any cases where options change that we would want this not to be cached?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire connection creation is now cached.

I can only guess what these options mean in your library, and I'm unsure how Faraday is best used beyond "recreating a Faraday::Connection breaks adapters' ability to maintain persistent connections" after examining a few of the adapters.

My guess is your users would not expect to be able to change the User-Agent, proxy, TLS settings, timeouts, or request settings in the middle of an interaction with slack. I do not know your users though.

From an HTTP perspective it is incredibly rare that you would not want persistent connections. Perhaps if the HTTP server were broken? I'm pretty sure Slack's HTTP servers are not broken.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

@dblock
Copy link
Collaborator

dblock commented May 7, 2020

I would like to default the client to Faraday::Adapter::NetHttpPersistent as well in this PR and bump version to 0.15.0.

WRT danger, feel free to open issues in https://github.com/danger/danger and https://github.com/dblock/danger-changelog, you're right about all these things.

@drbrain
Copy link
Contributor Author

drbrain commented May 7, 2020

I would like to default the client to Faraday::Adapter::NetHttpPersistent as well in this PR and bump version to 0.15.0.

Faraday supports many adapters besides net-http-persistent so users can pick one that matches their needs. In order to make net-http-persistent a hard default you would need to also depend on the gem directly. I'm not a faraday expert, but I believe this would not be in the correct spirit of choice that faraday is built around.

I have seen other faraday users opportunistically default, I have no references handy, but something along the lines of:

begin
  require 'net/http/persistent'

  Faraday.default_adapter = Faraday::Adapter::NetHttpPersistent
rescue LoadError
end

Please note that this will set the default for all libraries using faraday and may override a user's choice of excon, typhoeus, etc. depending on load order.

Perhaps there is a better way to do this, but I don't know it and I think "what library should be the default" is a new feature and outside the scope of this PR which is intended to only fix a bug in faraday usage.

WRT danger, feel free to open issues in https://github.com/danger/danger and https://github.com/dblock/danger-changelog, you're right about all these things.

I am only indirectly a user of danger. You are a direct user so I believe this responsibility is primarily yours. I also note that you seem to be four major releases away from current. Perhaps you upgrading that separately from this bug fix will resolve it?

I don't know, I just want to fix this bug without the frustration of inaccessible messaging.

@drbrain
Copy link
Contributor Author

drbrain commented May 7, 2020

Oh, I missed this other comment:

Thanks for this! I think we need a way not [to?] prevent this behavior, although I am happy with the default.

The default is the same, use whichever faraday adapter the user chose, but now it can operate as the adapter authors and library authors intended. If the user didn't pick a faraday adapter Net::HTTP without persistent connection support is used, but now there's no additional overhead of creating a Faraday::Connection atop a Net::HTTP connection.

This is a bug fix, no new behavior has been introduced.

Maybe add an option to turn it on/off, eg. options[:reuse_connection].

This conflicts with "I would like to default the client to Faraday::Adapter::NetHttpPersistent" so I think you should make that decision and apply it outside this bug fix.

It needs documented behavior in README

What were you thinking for documenting this behavior in README? You already support alternate adapters through faraday. As this is not discussed I could add something like the following:

This library uses faraday, see [link to faraday documentation on adateprs] if you would like to use a different adapter that will give more performance through persistent connections.

We recommend installing net-http-persistent and running [code snippet to enable it] when using this library.

Where in README would you want it added?

and a note in UPGRADING.

This is a bug fix so please clarify what kind of note would go in UPGRADING as there is no new feature added in this PR.

@drbrain drbrain requested a review from dblock May 7, 2020 18:24
@dblock
Copy link
Collaborator

dblock commented May 8, 2020

Ok, you're right we don't need a README update if we're not changing the faraday adapter and leaving it at default.

The caching of the connection is a behavior change though, isn't it? Is there really a way to ensure we broke nobody? I think if you agree we need to give users a way to restore old behavior with an option, no? That said, because it does fix the behavior for persistent adapters, and in general is a good idea, I agree that it should be the default.

@drbrain
Copy link
Contributor Author

drbrain commented May 8, 2020

I would be fine making a change to the README since it will benefit users who don't know about the multi-adapter feature of Faraday, so long as you can suggest an appropriate section. This can bring your users at least a 2x speedup and possibly more depending on the adapter, network conditions, and the load on the Slack API.

I'm as confident as I can be that this change won't break any users. I'll walk you through my reasoning (with links) in case you wish to look for anything I may have missed:

The default faraday adapter wraps Net::HTTP and that adapter wraps both of its usages (#request_via_get_method, #request_via_request_method) of Net::HTTP with Net::HTTP#start which finishes the connection (also #do_finish) after every request (a new TCP connection is created every time) users staying with the default won't see any change in behavior. Users choosing a different adapter will see faraday performing correctly.

I'm quite certain continuing to use a Faraday::Connection object is how Faraday is intended to be used. The examples show Faraday.get which proxies through #method_missing to #default_connection which is cached for the lifetime of the process.

All calls to Faraday.get will reuse that same Faraday::Connection until the default adapter is changed or the default connection options are changed.

PS: I maintain net-http-persistent, have contributed features to net-http (see the contributor list at the top of one of the Net::HTTP links above), and found a similar issue in the curb gem, a libcurl wrapper, long ago. I don't believe users will see any change they're not expecting.

PPS: By "not expecting" I mean that users will now see persistent connections working when they've chosen a Faraday adapter that supports them.

@dblock
Copy link
Collaborator

dblock commented May 8, 2020

Merged via 225c71e (squashed to first commit, fixed typo in spec).

Thanks for your help.

For the README maybe consider a section in web client. I think something short in the lines of "web client uses faraday and you can do more with it", but maybe it's TMI. Up to you.

@dblock dblock closed this May 8, 2020
@dblock
Copy link
Collaborator

dblock commented May 8, 2020

Also, I dislike

Faraday.default_adapter = Faraday::Adapter::NetHttpPersistent

c = Slack::Web::Client.new 

I'd be open to an adapter option so you don't have to set it globally, which then could be a good place to document using Faraday::Adapter::NetHttpPersistent as an example.

Related, in other libraries, such as graphlient we built completely http stack swap, and yielded the connection allowing the caller to completely customize it. Something like this:

Slack::Web::Client.new do |client|
   client.connection do |http|
   end
end

Frankly nobody is asking for this, most of the time people want to swap connection libraries or introduce their own options because they don't want to carry Faraday in their projects. Just something for consideration.

@drbrain drbrain deleted the drbrain/persistent-connection-support branch May 8, 2020 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants