New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Much higher connection number than needed #68

Closed
xb opened this Issue Oct 12, 2011 · 7 comments

Comments

Projects
None yet
2 participants
@xb

xb commented Oct 12, 2011

It looks lik e httpclient employs a LRU strategy in choosing which TCP connection to a HTTP server to re-use when there is a new HTTP request instead of using an MRU strategy.

scenario

Consider following scenario: The user of httpclient uses 5 requests per second, each request takes 2 seconds to complete. The server has a timeout of 60 seconds until the server closes the TCP connection. Consider furthermore that there was a recent usage spike and thus there are 500 TCP connections between httpclient and the HTTP server.

expected result

It would be expected that, over time, the number of open TCP connections drops from 500 to about 10 (because, on average, every 0.2 seconds, there will be 1 TCP connection becoming free to be reused, and every 0.2 seconds, there will be a new HTTP request to be processed, keeping about 10 TCP connections busy).

actual result

The number of open TCP connections drops from 500 to about 300.

analysis

The reason is that httpclient does not let any of the 300 TCP connections to time out, because it does not re-use the most recently used TCP connection first (resulting in many warm TCP connections being cooled down and finally timed out), but it reuses the least recently used TCP connection first (resulting in many warm TCP connections being kept warm and thus never giving them a chance to time out).

recommended solution

Use an MRU approach.

@nahi

This comment has been minimized.

Show comment
Hide comment
@nahi

nahi Oct 12, 2011

Owner

Thanks for detailed explanation. Am I understanding you correctly?

diff --git a/lib/httpclient/session.rb b/lib/httpclient/session.rb
index 306d2dc..dae4cc7 100644
--- a/lib/httpclient/session.rb
+++ b/lib/httpclient/session.rb
@@ -251,7 +251,7 @@ class HTTPClient

     def add_cached_session(sess)
       @sess_pool_mutex.synchronize do
-        @sess_pool << sess
+        @sess_pool.unshift(sess)
       end
     end
   end

I think it's server-friendly and I should commit this.

Do you think we should add a leaper thread for collecting cold (long-living but not used) sessions? I think MRU would cause many cold sessions which is not collected forever... (Related to #56)

Owner

nahi commented Oct 12, 2011

Thanks for detailed explanation. Am I understanding you correctly?

diff --git a/lib/httpclient/session.rb b/lib/httpclient/session.rb
index 306d2dc..dae4cc7 100644
--- a/lib/httpclient/session.rb
+++ b/lib/httpclient/session.rb
@@ -251,7 +251,7 @@ class HTTPClient

     def add_cached_session(sess)
       @sess_pool_mutex.synchronize do
-        @sess_pool << sess
+        @sess_pool.unshift(sess)
       end
     end
   end

I think it's server-friendly and I should commit this.

Do you think we should add a leaper thread for collecting cold (long-living but not used) sessions? I think MRU would cause many cold sessions which is not collected forever... (Related to #56)

@nahi

This comment has been minimized.

Show comment
Hide comment
@nahi

nahi Oct 12, 2011

Owner

Adding some timeout check in Session#invalidated? (like def invalidated?; @invalidated or @last_used + 60 > Time.now; end) might be enough for this problem.

Owner

nahi commented Oct 12, 2011

Adding some timeout check in Session#invalidated? (like def invalidated?; @invalidated or @last_used + 60 > Time.now; end) might be enough for this problem.

@nahi

This comment has been minimized.

Show comment
Hide comment
@nahi

nahi Oct 12, 2011

Owner

One more question. Do you think we should do O(1) search in get_cached_session?

Owner

nahi commented Oct 12, 2011

One more question. Do you think we should do O(1) search in get_cached_session?

@xb

This comment has been minimized.

Show comment
Hide comment
@xb

xb Oct 12, 2011

Yes, this should be desireable,we are talking about 1000 connections to a specific server. However, O(1) support could be made a separate issue (as get_cached_session is also currently not O(1)).

xb commented Oct 12, 2011

Yes, this should be desireable,we are talking about 1000 connections to a specific server. However, O(1) support could be made a separate issue (as get_cached_session is also currently not O(1)).

@nahi

This comment has been minimized.

Show comment
Hide comment
@nahi

nahi Oct 12, 2011

Owner

Yeah, it should be a separate issue, but I think I should add the above invalidated check, too. The check cost would make get_cached_session too heavy under a environment like cached sessions >1000.

Anyway, I'll try to implement it.

Owner

nahi commented Oct 12, 2011

Yeah, it should be a separate issue, but I think I should add the above invalidated check, too. The check cost would make get_cached_session too heavy under a environment like cached sessions >1000.

Anyway, I'll try to implement it.

@nahi nahi closed this in 08ec3ad Oct 13, 2011

@nahi

This comment has been minimized.

Show comment
Hide comment
@nahi

nahi Oct 13, 2011

Owner

@xb can you check 08ec3ad ? Thanks!

Owner

nahi commented Oct 13, 2011

@xb can you check 08ec3ad ? Thanks!

@xb

This comment has been minimized.

Show comment
Hide comment
@xb

xb Oct 13, 2011

From reading the code, it looks good. I haven't tested it, though.

xb commented Oct 13, 2011

From reading the code, it looks good. I haven't tested it, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment