Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More connection locking #28777

Closed
wants to merge 1 commit into from

Conversation

matthewd
Copy link
Member

Address two failure modes for connection pool thread locking (yay) by making it more complicated (boo).

First, a deadlock between the pool lock and the connection lock, due to inconsistent lock acquisition order: clear_query_cache requires the connection lock, and is invoked by a checkin callback while holding the pool lock. (I don't think we should actually be invoking callbacks while holding the pool lock, but that's a matter for another time.)

Second, the less proximate cause of the above failure: the main thread could checkin its connection while another thread was still using it. The second thread would then unexpectedly change connections between two queries -- even while inside a transaction on the first one.


We now keep track of who has borrowed the connection, and don't complete the "unlock" until they have released it. This also means that the lock only applies to connection acquisition: after that (and until it attempts to release it) the borrowing thread treats the connection as its own.

@matthewd matthewd added this to the 5.1.0 milestone Apr 17, 2017
@matthewd
Copy link
Member Author

@KeithP this should hopefully tame that last deadlock you were seeing

@matthewd matthewd self-assigned this Apr 17, 2017
@eileencodes
Copy link
Member

Hrm something isn't quite right. I'm seeing my test app run one test, do a rollback and then just sit there. No raise or anything. All the tests in my app should run, but only one is before it stalls and just sits there. Here's the logs.

   (0.3ms)  SET NAMES utf8,  @@SESSION.sql_mode = CONCAT(CONCAT(@@sql_mode, ',STRICT_ALL_TABLES'), ',NO_AUTO_VALUE_ON_ZERO'),  @@SESSION.sql_auto_is_null = 0, @@SESSION.wait_timeout = 2147483
   (0.2ms)  SELECT `schema_migrations`.`version` FROM `schema_migrations` ORDER BY `schema_migrations`.`version` ASC
   (0.2ms)  SELECT @@FOREIGN_KEY_CHECKS
   (0.1ms)  SET FOREIGN_KEY_CHECKS = 0
   (0.1ms)  BEGIN
  Fixture Delete (0.2ms)  DELETE FROM `companies`
  Fixture Insert (0.1ms)  INSERT INTO `companies` (`title`, `created_at`, `updated_at`, `id`) VALUES ('My Company', '2017-04-17 12:07:29', '2017-04-17 12:07:29', 980190962)
  Fixture Insert (0.1ms)  INSERT INTO `companies` (`title`, `created_at`, `updated_at`, `id`) VALUES ('Your Company', '2017-04-17 12:07:29', '2017-04-17 12:07:29', 298486374)
   (0.2ms)  COMMIT
   (0.1ms)  SET FOREIGN_KEY_CHECKS = 1
   (0.1ms)  BEGIN
-------------------------------------------------------
User::IndexTest: test_users_index_with_rails_assertions
-------------------------------------------------------
Started GET "/users" for 127.0.0.1 at 2017-04-17 08:07:30 -0400
Processing by UsersController#index as HTML
  Rendering users/index.html.erb within layouts/application
  User Load (0.4ms)  SELECT `users`.* FROM `users`
  Rendered users/index.html.erb within layouts/application (4.2ms)
Completed 200 OK in 60ms (Views: 51.7ms | ActiveRecord: 2.1ms)
Started GET "/assets/application-6f9276714fb0c092a05913d8250b6a6804ddc8b919759bf9af95cd7ff9f1b453.css" for 127.0.0.1 at 2017-04-17 08:07:31 -0400
Started GET "/assets/application-c1b7cd6555db8d4c769b8cb3024a3e73566a1a1212b5c99c233f8bdf6e835bcb.js" for 127.0.0.1 at 2017-04-17 08:07:31 -0400
   (0.2ms)  ROLLBACK

checkin conn
if conn.owner != owner_thread && @sharing_threads[conn.owner]
synchronize do
@sharing_threads[conn.owner].delete(Thread.current)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be .delete(owner_thread).

@@ -392,7 +417,14 @@ def active_connection?
# #checkout will not be automatically released.
def release_connection(owner_thread = Thread.current)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel there should be a hard check and a raise (akin to what is in the remove()) method. And I'd want it to happen before @thread_cached_conns.delete(...):

def release_connection(owner_thread = Thread.current)
  raise "Cannot release a shared connection" if @locked_thread == owner_thread
  if conn = @thread_cached_conns.delete(connection_cache_key(owner_thread))
    # ...

@@ -392,7 +417,14 @@ def active_connection?
# #checkout will not be automatically released.
def release_connection(owner_thread = Thread.current)
if conn = @thread_cached_conns.delete(connection_cache_key(owner_thread))
checkin conn
if conn.owner != owner_thread && @sharing_threads[conn.owner]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a small comment (unless I'm wrong on this one)

# if this branch is taken, then it must be that `@sharing_thread[conn.owner].include?(owner_thread)` # => true

@KeithP
Copy link
Contributor

KeithP commented Apr 18, 2017

@matthewd looks like we get the same failure as Eileen.

Failure/Error: @cond.wait(@monitor.instance_variable_get(:@mon_mutex), timeout)
fatal: No live threads left. Deadlock?

.# /home/keithp/.rvm/gems/ruby-2.3.0@my_app/bundler/gems/rails-5de1e4f2c45e/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb:374:in `block in lock_thread='
     .# /home/keithp/.rvm/gems/ruby-2.3.0@my_app/bundler/gems/rails-5de1e4f2c45e/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb:368:in `lock_thread='
     .# /home/keithp/.rvm/gems/ruby-2.3.0@my_app/bundler/gems/rails-5de1e4f2c45e/activerecord/lib/active_record/fixtures.rb:1012:in `block in teardown_fixtures'
     .# /home/keithp/.rvm/gems/ruby-2.3.0@my_app/bundler/gems/rails-5de1e4f2c45e/activerecord/lib/active_record/fixtures.rb:1010:in `each'
     .# /home/keithp/.rvm/gems/ruby-2.3.0@my_app/bundler/gems/rails-5de1e4f2c45e/activerecord/lib/active_record/fixtures.rb:1010:in `teardown_fixtures'
     .# /home/keithp/.rvm/gems/ruby-2.3.0@my_app/bundler/gems/rails-5de1e4f2c45e/activerecord/lib/active_record/fixtures.rb:857:in `after_teardown'

@matthewd
Copy link
Member Author

I've pulled out the deadlock fix in ce2abff and pushed that to master / 5-1-stable. We'll punt the larger change here to 5.1.1; I think I've just fixed the problem @eileencodes and @KeithP described above, but this is too risky to drop in post-rc2, when we're not expecting many users to encounter the original issue.

@matthewd matthewd modified the milestones: 5.1.1, 5.1.0 Apr 26, 2017
@rafaelfranca rafaelfranca modified the milestone: 5.1.2 Jun 20, 2017
@rails-bot
Copy link

rails-bot bot commented Dec 18, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label Dec 18, 2019
@rails-bot rails-bot bot closed this Dec 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants