ActiveRecord::ConnectionAdapters::ConnectionPool lacks proper synchronization #7955

Closed
xb opened this Issue Oct 15, 2012 · 7 comments

Projects

None yet

4 participants

@xb
xb commented Oct 15, 2012

Following exception appears when having load on a rails 3.2.8 app under the newest JRuby version using a multi-core machine.

Detected invalid hash contents due to unsynchronized modifications with concurrent users
org/jruby/RubyHash.java:1362:in `keys'
/home/user/.rvm/gems/jruby-1.7.0.RC2/gems/activerecord-3.2.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:299:in `release'
/home/user/.rvm/rubies/jruby-1.7.0.RC2/lib/ruby/1.9/monitor.rb:211:in `__ensure__'
/home/user/.rvm/rubies/jruby-1.7.0.RC2/lib/ruby/1.9/monitor.rb:210:in `mon_synchronize'
/home/user/.rvm/gems/jruby-1.7.0.RC2/gems/activerecord-3.2.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:293:in `release'
/home/user/.rvm/gems/jruby-1.7.0.RC2/gems/activerecord-3.2.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:286:in `checkin'
/home/user/.rvm/rubies/jruby-1.7.0.RC2/lib/ruby/1.9/monitor.rb:211:in `__ensure__'
/home/user/.rvm/rubies/jruby-1.7.0.RC2/lib/ruby/1.9/monitor.rb:210:in `mon_synchronize'
/home/user/.rvm/gems/jruby-1.7.0.RC2/gems/activerecord-3.2.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:280:in `checkin'
/home/user/.rvm/gems/jruby-1.7.0.RC2/gems/activerecord-3.2.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:114:in `release_connection'
/home/user/.rvm/gems/jruby-1.7.0.RC2/gems/activerecord-3.2.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:381:in `clear_active_connections!'
org/jruby/RubyHash.java:1235:in `each_value'
/home/user/.rvm/gems/jruby-1.7.0.RC2/gems/activerecord-3.2.8/lib/active_record/connection_adapters/abstract/connection_pool.rb:381:in `clear_active_connections!'
/home/user/.rvm/gems/jruby-1.7.0.RC2/gems/activerecord-3.2.8/lib/active_record/connection_adapters/abstract/connection_specification.rb:183:in `clear_active_connections!'

The bug is probably that in https://github.com/rails/rails/blob/39087068c2e3c85f6839ea51eab4480673138a2b/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L118 (method release_connection()) as well as in https://github.com/rails/rails/blob/39087068c2e3c85f6839ea51eab4480673138a2b/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L206 (method clear_stale_cached_connections!()), the Hash @reserved_connections is accessed without synchronizing to the correct mutex.

This bug is probably related to #6464 , but the fix there is apparently insufficient.

@robin850
Ruby on Rails member

I allow myself to ping @tenderlove.

@tenderlove
Ruby on Rails member

clear_stale_cached_connections! is always called inside a synchronize here and here, so I don't think we need another synchronize block inside that method. release_connection can definitely be called without synchronization, so I think adding a synchronize inside that method will fix it up.

@xb
xb commented Oct 15, 2012

As you can see from the stack trace, clear_stale_cached_connections!() is not only called from inside connection_pool.rb, but also from inside connection_specification.rb, as ActiveRecord::Base.clear_active_connections!(). This method is public and is allowed to be called by any thread. You will need to make this method private if you think that you will not need a synchronized block inside clear_stale_cached_connections!(). However, it seems that ActiveRecord::Base.clear_active_connections!() is public for a reason, else long-running threads accessing ActiveRecord (threads separate from main Ruby threads, e.g. dedicaded threads for writing into the database asynchronously) have no chance to release resources allocated while writing to the database.

@tenderlove
Ruby on Rails member

@xb I don't see that code on 3-2-stable. Can you point me at it?

@xb
xb commented Oct 15, 2012

I'm sorry, this is my error. clear_stale_cached_connections!() is indeed only called within connection_pool.rb, so I believe your commit fixes the bug.

However, I'd suggest making clear_stale_cached_connections!() private anyway. This way, the effort is higher to call this method without proper synchronization (which is likely to be a bug), and such confusion I had is a little bit less likely. ;-)

@tenderlove
Ruby on Rails member

@xb totally agree. I'll make the method private!

@jrochkind

clear_stale_cached_connections! has long been listed in the API docs for ConnectionPool.

I would argue that it is thus part of ConnectionPool's API, and making it private is a backwards-incompatible break (in a patch release!) -- also that, as part of the public API, it ought to have the synchronize itself, so people can call it safely. Does it really effect performance very much to have an extra nested synchronize for expected use case where it's called by methods already synchronized? But the benefit is making it, well, actually work reliably, when called as part of the public API, which it is listed as.

If you are going to break public api in a patch release, rather than fixing it, at least I suggest advertising this very clearly in the docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment