Make connection pool fair with respect to waiting threads. #6416

Merged
merged 1 commit

6 participants

pmahoney, Steve Klabnik, Aaron Patterson, Rafael Mendonça França, Yasuo Honda, Jonathan Rochkind
pmahoney

Introduction: I'm using JRuby in a Java servlet container to run an application that uses ActiveRecord. This is a fully multithreaded environment where a thread pool services HTTP requests, possibly in parallel. I've been having problems with timeouts when obtaining a database connection under only moderate concurrent load.

(Reducing the size of the thread pool to match that of the connection pool may be one option, but I am running multiple apps in the servlet container all sharing one thread pool... So for now, the thread pool is 32 and the connection pool is 5).
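
For context, the pool size comes from the :pool key of the connection spec (it defaults to 5, as the patch below shows), which is why the error message suggests increasing it. A sketch, using the same hash form of establish_connection as the benchmark script below:

# Sketch: raise the ActiveRecord pool size via the 'pool' key.
ActiveRecord::Base.establish_connection(
  'adapter'  => 'sqlite3',
  'database' => ':memory:',
  'pool'     => 32   # match the servlet container's worker threads
)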

I'm putting this patch out there to solicit comments. I think it needs more testing. I may have a chance this week to test in a real application, but for now I only have my two benchmark scripts, ar_test.rb (for JRuby) and ar_test_mri.rb (for MRI). Notably, I have not tested the behavior when connections are removed explicitly or removed by the reaper.

I also do not understand the distinction between PoolFullError and ConnectionTimeoutError. Separately, I introduced an @available array of connections ready to be checked out. At the time this seemed like a reasonable idea, but it required some inelegant additions (essentially everywhere @connections is modified, @available must be modified as well, e.g. in #clear_reloadable_connections).

I apologize for the length of this report.

tl;dr: the fairness patch reduces outliers in the time to acquire a connection, but it needs review and testing

The test suite passes, tested on Linux/Ruby-1.9.2-p290.

activerecord$ bundle exec rake test_sqlite3
Finished tests in 193.646555s, 17.1859 tests/s, 52.2653 assertions/s.
3328 tests, 10121 assertions, 0 failures, 0 errors, 12 skips

This patch makes the connection pool "fair" in the first-come, first-served sense. It also avoids using the Timeout class (which spawns a dedicated thread for each connection checkout attempt that needs to wait with a timeout).
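
A sketch of the mechanism (illustrative class, not the patch's exact code): instead of wrapping an untimed latch wait in Timeout.timeout, which starts a watchdog thread per call, the checkout waits on a condition variable with an explicit deadline and loops:

require 'monitor'

class TimedWait
  include MonitorMixin

  def initialize
    super() # initialize the monitor state
    @cond = new_cond
  end

  # Wait until the block returns non-nil or +timeout+ seconds pass,
  # re-checking after every wakeup; no watchdog thread is needed.
  def wait_until(timeout)
    synchronize do
      t0 = Time.now
      loop do
        remaining = timeout - (Time.now - t0)
        raise 'wait timed out' if remaining <= 0
        @cond.wait(remaining) # releases the monitor while sleeping
        result = yield
        return result if result
      end
    end
  end
end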

I've added two tests that fail before applying this patch. The first ensures the connection pool is fair by queuing up several waiting threads and ensuring they acquire connections in order as connections are made available. The connections are trickled in one by one because we don't care that strict order is preserved when two connections become available at the same time.

The second queues up two groups of waiting threads, then checks in enough connections for group 1 all at once. The test ensures that only the group 1 threads acquired connections.

A third test was the real "money" test, but it was removed because it was unreliable. It attempted to measure latency in a Java-servlet-like environment by setting up a thread pool and having each thread check connections in and out.

Instead, I used ar_test.rb to obtain histograms of the time per simulated request (checkout connection; sleep 0.01s; checkin connection).

Tests with ar_test.rb and JRuby 1.6.6

ActiveRecord 3.2.3, JRuby 1.6.6

$ bundle exec jruby --1.9 ar_test.rb
10000 reqs in 24.666 = 405.4 req/s
min/max/mean 0.01/2.37/0.08
histogram
[0.00,0.03) ooooooooooooooo
[0.03,0.05) o
[0.05,0.08) oooooooooooooooooooooooooooooooo
[0.08,0.10) ooooo
[0.10,0.13) o
[0.13,0.15) oooooo
[0.15,0.18) o
[0.18,0.20) oo
[0.20,0.22) o
[0.22,0.25) oo

ActiveRecord edge without fairness patch, JRuby 1.6.6

Gave up after a long time.  Many errors like this (note that the time is much longer than the 5 s timeout):

took 120.067s to acquire and release
execution expired /home/pat/dev/ruby/rails/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb:106:in `await'

ActiveRecord edge with fairness patch, JRuby 1.6.6

$ bundle exec jruby --1.9 ar_test.rb
ActiveRecord::VERSION 4.0.0.beta
10000 reqs in 24.8 = 403.7 req/s
min/max/mean 0.06/1.48/0.08
histogram
[0.00,0.03) 
[0.03,0.05) 
[0.05,0.08) ooooooooooooooooooooooooooooooooooooooooooooooooo
[0.08,0.10) ooooooooooo
[0.10,0.13) o
[0.13,0.15) o
[0.15,0.18) o
[0.18,0.20) o
[0.20,0.22) o
[0.22,0.25) o

Tests with ar_test_mri.rb and Ruby 1.9.2-p290

ActiveRecord 3.2.3, Ruby 1.9.2-p290

Sometimes errors like this:

could not obtain a database connection within 5 seconds (waited 5.000125524999999 seconds). The max pool size is currently 5; consider increasing it. /home/pat/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/activerecord-3.2.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:248:in `block (2 levels) in checkout'

$ bundle exec ruby ar_test_mri.rb  # ruby 1.9.2-p290
ActiveRecord::VERSION::STRING 3.2.3
10000 reqs in 21.0 = 475.7 req/s
min/max/mean 0.01/4.74/0.07
histogram
[0.00,0.03) ooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
[0.03,0.05) o
[0.05,0.08) o
[0.08,0.10) 
[0.10,0.12) o
[0.12,0.15) o
[0.15,0.17) o
[0.17,0.20) o
[0.20,0.22) o
[0.22,0.25) ooo

ActiveRecord edge without fairness patch, Ruby 1.9.2-p290

Again, many "execution expired" errors.

ActiveRecord edge with fairness patch, Ruby 1.9.2-p290

$ bundle exec ruby ar_test_mri.rb   # ruby 1.9.2-p290
ActiveRecord::VERSION::STRING 4.0.0.beta
10000 reqs in 21.147658059 = 472.86559921202286 req/s
min/max/mean 0.01/0.12/0.07
histogram
[0.00,0.03) o
[0.03,0.05) o
[0.05,0.08) ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
[0.08,0.10) o
[0.10,0.12) o
[0.12,0.15) 
[0.15,0.17) 
[0.17,0.20) 
[0.20,0.22) 
[0.22,0.25) 

So in both Ruby 1.9.2 and JRuby, ActiveRecord 3.2.3 has perhaps a better median but more variation, with a few requests taking significantly longer than the others.

I'm only including ar_test_mri.rb here. The ar_test.rb script has only had cosmetic changes and is available in this other pull request: #6398

require 'active_record'
require 'monitor' # provides MonitorMixin, used to guard the times array
require 'thread'

SHUTDOWN = Object.new.freeze    # sentinel used to shut down worker threads

config = {
  'adapter' => 'sqlite3',
  'database' => ':memory:'
}

ActiveRecord::Base.establish_connection(config)
ActiveRecord::Base.clear_active_connections!

times = []
times.extend(MonitorMixin)

work = Queue.new

thread_pool_size = 32
nrequests = 10000

threads = (1..thread_pool_size).map do |i|
  Thread.new do
    while work.pop != SHUTDOWN
      t0 = Time.now
      begin
        ActiveRecord::Base.connection_pool.connection
        sleep 0.010
        ActiveRecord::Base.clear_active_connections!
      rescue => err
        puts "#{err.message} #{err.backtrace.join("\n\t")}"
      ensure
        times.synchronize { times << (Time.now - t0) }
      end
    end
  end
end

t0 = Time.now

nrequests.times { |i| work.push i }
thread_pool_size.times { work.push SHUTDOWN }
threads.each(&:join)

elapsed = Time.now - t0

puts "ActiveRecord::VERSION::STRING #{ActiveRecord::VERSION::STRING}"
puts "#{nrequests} reqs in #{elapsed} = #{nrequests/elapsed} req/s"
puts "min/max/mean %0.2f/%0.2f/%0.2f" % [times.min, times.max, times.reduce(&:+)/times.size]

def clamp(n, min, max)
  [[n, min].max, max].min
end

def hist(values, nbuckets = 10, min = values.min, max = values.max)
  step = (max - min) / nbuckets.to_f
  buckets = Array.new(nbuckets, 0)
  values.each do |v|
    i = clamp(((v - min) / step).floor, 0, nbuckets-1)
    buckets[i] += 1
  end

  start = min # bucket labels begin at the histogram minimum, not 0
  buckets.each do |count|
    print "[%0.2f,%0.2f) " % [start, start + step]
    ntics = (count.to_f / values.size * 60).ceil # 60 screen cols
    puts 'o' * ntics
    start += step
  end
end

puts 'histogram'
hist(times, 10, 0.00, 0.25)
Steve Klabnik
Collaborator

This is awesome. In general. Thanks for the detailed write-up. I'm sure @tenderlove and @jonleighton will want to check this out.

Aaron Patterson
Owner

Awesome! It looks great! :+1:

Aaron Patterson tenderlove merged commit d2901f0
Rafael Mendonça França

@pmahoney this commit broke the build with PostgreSQL. Could you take a look?

Rafael Mendonça França rafaelfranca referenced this pull request from a commit
Rafael Mendonça França rafaelfranca Revert "Merge pull request #6416 from pmahoney/threadsafe-connection-pool"

This reverts commit d2901f0, reversing
changes made to 525839f.

Conflicts:
	activerecord/test/cases/connection_pool_test.rb

Reason: This change broke the build (http://travis-ci.org/#!/rails/rails/builds/1391490)
and we don't have any solution until now. I asked the author to try to
fix it and open a new pull request.
7c69e2d
Rafael Mendonça França

I reverted d2901f0 since it broke the build a day ago and we don't have a solution yet. Please try to fix it and open a new pull request.

pmahoney

That's... interesting. I can reproduce that locally, so that's good. I refactored the code a bit and still get the same errors, but now this happens:

Run eager_test.rb alone, tests pass:

$ ARCONN=postgresql bundle exec ruby -Itest test/cases/associations/eager_test.rb 
Finished tests in 4.363282s, 31.3984 tests/s, 93.9660 assertions/s.
137 tests, 410 assertions, 0 failures, 0 errors, 0 skips

Run connection_pool_test.rb alone, tests pass:

$ ARCONN=postgresql bundle exec ruby -Itest test/cases/connection_pool_test.rb
Finished tests in 7.867991s, 2.1607 tests/s, 5.2110 assertions/s.
17 tests, 41 assertions, 0 failures, 0 errors, 0 skips

Run them together, tests pass:

$ ARCONN=postgresql bundle exec ruby -Itest -e '["test/cases/connection_pool_test.rb", "test/cases/associations/eager_test.rb"].each { |f| load f }'
Finished tests in 13.071680s, 11.7812 tests/s, 34.5021 assertions/s.
154 tests, 451 assertions, 0 failures, 0 errors, 0 skips

But run all the tests with the rake command, and I get those same failures.

Any ideas? I suppose I can try running larger and larger subsets of the tests to try to narrow it down... Or better, find out what rake is doing that plain ruby isn't.

(We've backported this patch to the 3.0 series, but this will be a blocker to us upgrading to 4.x, and I assume this will be true for many JRuby folks.)

Rafael Mendonça França

I have no clue right now. @yahonda tried to fix it as described in issue #6443, but I don't think he has had success either. I'll try to investigate later too.

pmahoney

I've gotten the failing case down to this:

$ ARCONN=postgresql ruby -w -I"lib:test" -e '["test/cases/associations/eager_test.rb", "test/cases/associations/join_model_test.rb", "test/cases/base_test.rb"].each { |f| load f }'

This also fails with sqlite3. Going to poke around a bit more...

pmahoney

For what it's worth, that same command fails against master (cb92efb) with both sqlite3 and postgresql:

$ ruby -w -I"lib:test" -e '["test/cases/associations/eager_test.rb", "test/cases/associations/join_model_test.rb", "test/cases/base_test.rb"].each { |f| load f }'
Using sqlite3
/home/pat/.rbenv/versions/1.9.2-p290/lib/ruby/1.9.1/rexml/doctype.rb:255: warning: `+' after local variable is interpreted as binary operator
/home/pat/.rbenv/versions/1.9.2-p290/lib/ruby/1.9.1/rexml/doctype.rb:255: warning: even though it seems like unary operator
Run options: --seed 6037

# Running tests:

....................................................................................................................................................................................................................................................S.....................................................................FF.........................................................................................................

Finished tests in 8.663860s, 48.5927 tests/s, 129.8497 assertions/s.

  1) Failure:
test_eager_loading_with_conditions_on_join_model_preloads(EagerAssociationTest) [test/cases/associations/eager_test.rb:1015]:
3 instead of 2 queries were executed.
Queries:
SELECT "authors".* FROM "authors" INNER JOIN "posts" ON "posts"."author_id" = "authors"."id" INNER JOIN "comments" ON "comments"."post_id" = "posts"."id" WHERE (posts.title like 'Welcome%')
          SELECT name
          FROM sqlite_master
          WHERE type = 'table' AND NOT name = 'sqlite_sequence'
 AND name = "author_addresses"
SELECT "author_addresses".* FROM "author_addresses"  WHERE "author_addresses"."id" IN (1).
Expected: 2
  Actual: 3

  2) Failure:
test_eager_loading_with_conditions_on_joined_table_preloads(EagerAssociationTest) [test/cases/associations/eager_test.rb:977]:
6 instead of 2 queries were executed.
Queries:
          SELECT name
          FROM sqlite_master
          WHERE type = 'table' AND NOT name = 'sqlite_sequence'
 AND name = "taggings"
PRAGMA table_info("taggings")
          SELECT name
          FROM sqlite_master
          WHERE type = 'table' AND NOT name = 'sqlite_sequence'
 AND name = "tags"
PRAGMA table_info("tags")
SELECT "posts".* FROM "posts" INNER JOIN "taggings" ON "taggings"."taggable_id" = "posts"."id" AND "taggings"."taggable_type" = 'Post' INNER JOIN "tags" ON "tags"."id" = "taggings"."tag_id" WHERE (tags.name = 'General') ORDER BY posts.id
SELECT "authors".* FROM "authors"  WHERE "authors"."id" IN (1).
Expected: 2
  Actual: 6

I think the problem is unrelated to my patch. It seems the tests are making assumptions about what sort of table metadata has already been cached (sorry, I'm not too familiar with the inner workings of ActiveRecord). If I modify test_eager_loading_with_conditions_on_join_model_preloads to run the query-under-test once outside the assert_queries(2), then a second time within assert_queries(2), the test passes.
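
A hypothetical sketch of that tweak (run_query stands in for the real eager-loading query in eager_test.rb; the actual query differs):

run_query = lambda do
  Author.includes(:posts => :comments).where("posts.title like 'Welcome%'").to_a
end

run_query.call # warm-up: table metadata queries happen here, uncounted

assert_queries(2) do
  run_query.call # only the queries under test are counted now
end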

Yasuo Honda

Thanks for the investigation. I agree that your patch is not the cause of these failures.
I've made a very temporary fix.
I've made a very temporary fix

pmahoney

Thanks for looking into this! I was about to open an issue, but I'll hold off unless you think otherwise?

Yasuo Honda

Hi, I think these 2 failures should be discussed in a separate issue. Please go ahead and open a new one.

Jonathan Rochkind

Yep, this patch fixes something we already fixed slightly differently in 3-2-stable, but master had diverged at that point; it's something I'm running into too.

I don't completely understand the intended semantics of this patch -- exactly what it's intended to fix, and how. But let me know if I can help; I'm becoming increasingly familiar with ConnectionPool's concurrency semantics (somewhat sadly :) ).

Have you really managed to enforce "fairness" so that when threads are waiting on a connection, it's first-in/first-out? I can't quite figure out how your code does that, but I'm impressed if it does; I've been racking my brain and have been unable to come up with any way to do that with Ruby's own mutex semantics.

Please let me know if/how I can help get this resolved (I'm interested in back-porting to 3.2.x, even if only in my own personal monkey patches, once it's in master. I am having trouble with these issues in 3.2)

pmahoney

@jrochkind it's much easier to see what's going on in this refactoring: pmahoney@ff1adc0. This is what I'm going to resubmit after a few more cleanups.

Take a look at the Queue class. Yes, it uses condition wait/signal. I have found (and have tests to prove it) that if you have, say, 10 threads waiting on a condition var and then do 10.times { cond.signal; sleep 1 }, the waiting threads will wake up in the order they started waiting (consistent with the documentation of ConditionVariable).
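
A standalone sketch of that experiment (plain Mutex/ConditionVariable, not the pool code; timings are loose enough for the scheduler):

require 'thread'

mutex = Mutex.new
cond  = ConditionVariable.new
wait_order = []
wake_order = []

threads = 10.times.map do |i|
  Thread.new do
    mutex.synchronize do
      wait_order << i      # record the order we began waiting
      cond.wait(mutex)     # releases mutex while asleep
      wake_order << i      # record the order we woke up
    end
  end
end

sleep 0.05 until threads.all? { |t| t.status == 'sleep' }

# Trickle in one signal at a time; waiters should wake FIFO.
10.times { mutex.synchronize { cond.signal }; sleep 0.1 }
threads.each(&:join)

puts wait_order == wake_order   # expected: true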

I have also found that if 5 threads start waiting, and then later a second group of 5 starts waiting, then calling 5.times { cond.signal } (no sleep) wakes up the 5 threads in group1 and none in group2. The group1 threads will not necessarily wake up in order, but that's fine for this use case.

(For what it's worth, I did implement a strictly fair queue where each waiting thread put itself into an Array and, on wakeup, made sure it was first in line, or else re-signaled and began waiting again.)

If that's true, then the way to prevent a new thread (one that has not been waiting) from "stealing" a connection from a thread that has been waiting is to check @num_waiting. When a thread is waiting, it increments this number, and that holds true even while it releases the lock during the call to #wait. I can easily modify the Queue class to allow "stealing", in which case the histogram of response times looks a lot different (quite a few requests take a longish time to acquire a connection). I may blog about this and make some nicer graphs, because I found it interesting.
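
Condensed, the check has the same shape as the checkout in the diff at the bottom of this page:

# Take a connection directly only if nobody is queued; otherwise go
# to the end of the line, so a newly arrived thread cannot "steal" a
# connection that was signalled to a long-waiting thread.
synchronize do
  conn = acquire_connection if @num_waiting == 0
  conn ||= wait_until(@timeout) { acquire_connection }
  conn.lease
  checkout_and_verify(conn)
end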

Do those test results conflict with things you have seen?

Jonathan Rochkind

@pmahoney Well, hm. Here's what I've seen (and even have a test I could dig up to demonstrate).

  • thread1 wants a connection, and none are available so it waits.
  • Later another thread checks the connection back in, so it #signals.
  • thread1 wakes up -- inside the mutex -- but finds that there's still no connection available. Somehow, some other thread has grabbed it between the checkin/signal and the moment thread1 actually gets scheduled inside the mutex again.

Now, the thread that 'stole' the connection out of order may not have been a thread that was actually waiting. It may be that in between the signal (when thread1 becomes 'runnable') and when thread1 is actually scheduled, some completely different thread (that was not previously waiting) manages to get scheduled, tries to do a checkout, finds that, hey, there's a connection available (because thread1 hasn't been scheduled yet so hasn't had a chance to take it), and grabs it without waiting at all. I believe that is in fact what was happening.
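
As an illustrative timeline of that scenario:

# t0: thread1 calls checkout; pool exhausted -> cond.wait (lock released)
# t1: thread2 checks a connection in -> signal; thread1 becomes runnable
# t2: threadX, which was never waiting, happens to get scheduled first,
#     calls checkout, sees the available connection, and takes it
# t3: thread1 finally runs again, finds no connection, and must keep
#     waiting -- its wakeup was "stolen"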

If your code doesn't solve that issue -- and I don't think it does -- then I'm not sure I understand what issue your code does solve, vs. the state before your patch. Pre-your-patch, threads were already waiting on a ConditionVariable, and would presumably already get woken up in order, right? Although your test results certainly seem to show that things are improved, I just don't understand why/how. :) (update q: Can you confirm that under your new code, as demonstrated in the tests, no thread will ever wait (much) longer than its timeout? That's the most alarming feature of master pre-patch; I think we should make sure that never happens -- better to have a thread time out because it had a connection 'stolen' than to have a thread that can wait with no upper bound, never timing out.)

I had a test that sort of demonstrated this, but it was kind of a crazy hacky test -- given the sort of race condition it is, the test basically needed to run like 100 threads, and maybe 5% of them or so would get their connection 'stolen'. The test demonstrated this under an old version of 3-2-stable: some threads wound up raising the TimeoutException long before the @timeout value, because they got woken up but then found there was still no connection available, and in the code at that time, that would cause them to raise. But that exact test wouldn't demonstrate the problem against current 3-2-stable, because it's been patched so that when this happens, the thread just goes back to sleep instead of raising. But the underlying problem, of other threads coming in and stealing connections, is still there. The test would need to be rejiggered a bit to show that in current 3-2-stable or master (which works somewhat differently already) -- it's tricky to figure out how to demonstrate these kinds of race conditions, but I could try to work it out -- but I couldn't figure out any way to solve it.

Jonathan Rochkind

@pmahoney Ah, wait, I think I see how you've solved it.

In my case, the thread 'stealing' the connection, let's call it ThreadX, wasn't waiting at all. It managed to get scheduled, and do a checkout, in between the signal and when the first waiting thread was actually scheduled.

In your new fix, that ThreadX might still get scheduled in between the signal and when thread1 actually gets scheduled to get the connection it's due -- but it'll see that @num_waiting isn't zero, and go to sleep itself with a wait, going to the end of the line, instead of 'stealing' the connection.

Genius!

I still don't understand how your original patch, the one actually in this pull, managed to improve things -- although I believe your test results that it did, I just don't understand it!

But I understand how your new fix manages to improve things quite a bit with @num_waiting, nice! Once this gets into master, I'm really going to want to see if there's a way to backport it to 3-2-stable, to see if it solves my problems in my 3-2 app.

pmahoney

Resubmitted here: #6488

120 activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb
@@ -2,7 +2,6 @@
require 'monitor'
require 'set'
require 'active_support/core_ext/module/deprecation'
-require 'timeout'
module ActiveRecord
# Raised when a connection could not be obtained within the connection
@@ -92,21 +91,6 @@ def run
attr_accessor :automatic_reconnect, :timeout
attr_reader :spec, :connections, :size, :reaper
- class Latch # :nodoc:
- def initialize
- @mutex = Mutex.new
- @cond = ConditionVariable.new
- end
-
- def release
- @mutex.synchronize { @cond.broadcast }
- end
-
- def await
- @mutex.synchronize { @cond.wait @mutex }
- end
- end
-
# Creates a new ConnectionPool object. +spec+ is a ConnectionSpecification
# object which describes database connection information (e.g. adapter,
# host name, username, password, etc), as well as the maximum size for
@@ -128,9 +112,25 @@ def initialize(spec)
# default max pool size to 5
@size = (spec.config[:pool] && spec.config[:pool].to_i) || 5
- @latch = Latch.new
@connections = []
@automatic_reconnect = true
+
+ # connections available to be checked out
+ @available = []
+
+ # number of threads waiting to check out a connection
+ @num_waiting = 0
+
+ # signal threads waiting
+ @cond = new_cond
+ end
+
+ # Hack for tests to be able to add connections. Do not call outside of tests
+ def insert_connection_for_test!(c)
+ synchronize do
+ @connections << c
+ @available << c
+ end
end
# Retrieve the connection associated with the current thread, or call
@@ -188,6 +188,7 @@ def disconnect!
conn.disconnect!
end
@connections = []
+ @available = []
end
end
@@ -202,6 +203,9 @@ def clear_reloadable_connections!
@connections.delete_if do |conn|
conn.requires_reloading?
end
+ @available.delete_if do |conn|
+ conn.requires_reloading?
+ end
end
end
@@ -225,23 +229,19 @@ def clear_stale_cached_connections! # :nodoc:
# Raises:
# - PoolFullError: no connection can be obtained from the pool.
def checkout
- loop do
- # Checkout an available connection
- synchronize do
- # Try to find a connection that hasn't been leased, and lease it
- conn = connections.find { |c| c.lease }
-
- # If all connections were leased, and we have room to expand,
- # create a new connection and lease it.
- if !conn && connections.size < size
- conn = checkout_new_connection
- conn.lease
- end
+ synchronize do
+ conn = nil
+
+ if @num_waiting == 0
+ conn = acquire_connection
+ end
- return checkout_and_verify(conn) if conn
+ unless conn
+ conn = wait_until(@timeout) { acquire_connection }
end
- Timeout.timeout(@timeout, PoolFullError) { @latch.await }
+ conn.lease
+ checkout_and_verify(conn)
end
end
@@ -257,8 +257,10 @@ def checkin(conn)
end
release conn
+
+ @available.unshift conn
+ @cond.signal
end
- @latch.release
end
# Remove a connection from the connection pool. The connection will
@@ -266,12 +268,14 @@ def checkin(conn)
def remove(conn)
synchronize do
@connections.delete conn
+ @available.delete conn
# FIXME: we might want to store the key on the connection so that removing
# from the reserved hash will be a little easier.
release conn
+
+ @cond.signal # can make a new connection now
end
- @latch.release
end
# Removes dead connections from the pool. A dead connection can occur
@@ -283,12 +287,60 @@ def reap
connections.dup.each do |conn|
remove conn if conn.in_use? && stale > conn.last_use && !conn.active?
end
+ @cond.broadcast # may violate fairness
end
- @latch.release
end
private
+ # Take an available connection or, if possible, create a new
+ # one, or nil.
+ #
+ # Monitor must be held while calling this method.
+ #
+  # Returns: a connection, or nil if none can be acquired.
+ def acquire_connection
+ if @available.any?
+ @available.pop
+ elsif connections.size < size
+ checkout_new_connection
+ end
+ end
+
+ # Wait on +@cond+ until the block returns non-nil. Note that
+ # unlike MonitorMixin::ConditionVariable#wait_until, this method
+ # does not test the block before the first wait period.
+ #
+ # Monitor must be held when calling this method.
+ #
+ # +timeout+: Integer timeout in seconds
+ #
+ # Returns: the result of the block
+ #
+ # Raises:
+ # - PoolFullError: timeout elapsed before +&block+ returned a connection
+ def wait_until(timeout, &block)
+ @num_waiting += 1
+ begin
+ t0 = Time.now
+ loop do
+ elapsed = Time.now - t0
+ if elapsed >= timeout
+ msg = 'could not obtain a database connection within %0.3f seconds (waited %0.3f seconds)' %
+ [timeout, elapsed]
+ raise PoolFullError, msg
+ end
+
+ @cond.wait(timeout - elapsed)
+
+ conn = yield
+ return conn if conn
+ end
+ ensure
+ @num_waiting -= 1
+ end
+ end
+
def release(conn)
thread_id = if @reserved_connections[current_connection_id] == conn
current_connection_id
2  activerecord/test/cases/connection_adapters/abstract_adapter_test.rb
@@ -36,7 +36,7 @@ def test_expire_mutates_in_use
def test_close
pool = ConnectionPool.new(ConnectionSpecification.new({}, nil))
- pool.connections << adapter
+ pool.insert_connection_for_test! adapter
adapter.pool = pool
# Make sure the pool marks the connection in use
115 activerecord/test/cases/connection_pool_test.rb
@@ -200,6 +200,121 @@ def test_checkout_behaviour
end.join
end
+  # The connection pool is "fair" if threads waiting for
+  # connections receive them in the order in which they began
+  # waiting. This ensures that we don't time out one HTTP request
+  # even while well under capacity in a multi-threaded environment
+  # such as a Java servlet container.
+  #
+  # We don't need strict fairness: if two connections become
+  # available at the same time, it's fine if two threads that were
+  # waiting acquire the connections out of order.
+ #
+ # Thus this test prepares waiting threads and then trickles in
+ # available connections slowly, ensuring the wakeup order is
+ # correct in this case.
+ #
+ # Try a few times since it might work out just by chance.
+ def test_checkout_fairness
+ 4.times { setup; do_checkout_fairness }
+ end
+
+ def do_checkout_fairness
+ expected = (1..@pool.size).to_a.freeze
+ # check out all connections so our threads start out waiting
+ conns = expected.map { @pool.checkout }
+ mutex = Mutex.new
+ order = []
+ errors = []
+
+ threads = expected.map do |i|
+ t = Thread.new {
+ begin
+ conn = @pool.checkout # never checked back in
+ mutex.synchronize { order << i }
+ rescue => e
+ mutex.synchronize { errors << e }
+ end
+ }
+ Thread.pass until t.status == "sleep"
+ t
+ end
+
+ # this should wake up the waiting threads one by one in order
+ conns.each { |conn| @pool.checkin(conn); sleep 0.1 }
+
+ threads.each(&:join)
+
+ raise errors.first if errors.any?
+
+ assert_equal(expected, order)
+ end
+
+  # As mentioned in #test_checkout_fairness, we don't care about
+  # strict fairness. This test creates two groups of threads:
+  # group1, whose members all start waiting before any thread in
+  # group2. Enough connections are checked in to wake up all
+  # group1 threads, and the test asserts that only group1 threads,
+  # and no group2 threads, acquired a connection.
+ #
+ # Try a few times since it might work out just by chance.
+ def test_checkout_fairness_by_group
+ 4.times { setup; do_checkout_fairness_by_group }
+ end
+
+ def do_checkout_fairness_by_group
+ @pool.instance_variable_set(:@size, 10)
+ # take all the connections
+ conns = (1..10).map { @pool.checkout }
+ mutex = Mutex.new
+ successes = [] # threads that successfully got a connection
+ errors = []
+
+ make_thread = proc do |i|
+ t = Thread.new {
+ begin
+ conn = @pool.checkout # never checked back in
+ mutex.synchronize { successes << i }
+ rescue => e
+ mutex.synchronize { errors << e }
+ end
+ }
+ Thread.pass until t.status == "sleep"
+ t
+ end
+
+ # all group1 threads start waiting before any in group2
+ group1 = (1..5).map(&make_thread)
+ group2 = (6..10).map(&make_thread)
+
+ # checkin n connections back to the pool
+ checkin = proc do |n|
+ n.times do
+ c = conns.pop
+ @pool.checkin(c)
+ end
+ end
+
+ checkin.call(group1.size) # should wake up all group1
+
+ loop do
+ sleep 0.1
+ break if mutex.synchronize { (successes.size + errors.size) == group1.size }
+ end
+
+ winners = mutex.synchronize { successes.dup }
+ checkin.call(group2.size) # should wake up everyone remaining
+
+ group1.each(&:join)
+ group2.each(&:join)
+
+ assert_equal((1..group1.size).to_a, winners.sort)
+
+ if errors.any?
+ raise errors.first
+ end
+ end
+
def test_automatic_reconnect=
pool = ConnectionPool.new ActiveRecord::Base.connection_pool.spec
assert pool.automatic_reconnect