Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

connection pool: undefined method `in_use?' #5491

Closed
plentz opened this Issue · 38 comments
@plentz

under heavy load(running apache benchmark), I saw a lot of stacktraces like this one. Maybe it's worth mentioning that I'm using Jruby.

https://gist.github.com/2068742

@plentz

oh, there's one more thing: I'm using threadsafe mode.

@plentz

well, that's cool. the bug was here, but it's already fixed in master here, since the new reap method is synchronized

:heart: @tenderlove :heart:

@plentz plentz closed this
@aselder

@tenderlove Will the fix for this bug be backported to the 3.x branch?

@xaviershay

I'm also experiencing this problem on 3.2.6, still trying to figure out a good work around.

@freerobby

@xaviershay Did you ever find a workaround? We're seeing it in 3.2.6 as well.

@xaviershay

I monkey patched it O_o

@freerobby

@xaviershay Would you mind posting your patch?

@plentz @tenderlove Can we re-open this since we're still seeing it on 3.2.8? We're on JRuby, same issue referenced here: #5330 (comment)

@steveklabnik
Collaborator

I don't think that the fix ever got backported. Since you're seeing it on 3.2.8, I'll re-open.

@steveklabnik steveklabnik reopened this
@xaviershay

Note that while this fixes this specific exception, we are still seeing connection leakage in our app :(

if Rails.version != "3.2.8"
  raise "Please verify whether this monkey-patch is still required"
end

# This works around an observed bug, but is probably fixed in Rails 4. Check
# the documentation when it is released to see what it says about connection
# pooling.
#
# For background (it's gnarly), start reading here:
#   https://github.com/rails/rails/issues/5491
module ActiveRecord
  module ConnectionAdapters
    class ConnectionPool
      # This method is a copy of the existing method, with an extra nil check
      # added.
      def clear_stale_cached_connections!
        keys = @reserved_connections.keys - Thread.list.find_all { |t|
          t.alive?
        }.map { |thread| thread.object_id }
        keys.each do |key|
          conn = @reserved_connections[key]
          # CHANGED: Sometimes `conn` is nil. This could happen if a Thread
          # finishes between fetching the keys list above, and iterating
          # through them here.  This code has been rewritten in Rails 4, so
          # hopefully will no longer be broken.
          if conn
            ActiveSupport::Deprecation.warn(<<-eowarn) if conn.in_use?
  Database connections will not be closed automatically, please close your
  database connection at the end of the thread by calling `close` on your
  connection.  For example: ActiveRecord::Base.connection.close
            eowarn
            checkin conn
          end
          # This assumes that threads do not re-use object IDs. It's probably
          # already been deleted but I haven't been able to trace the code
          # through and want to make sure.
          @reserved_connections.delete(key)
        end
      end
    end
  end
end
@freerobby

@xaviershay Thanks for that. We're still leaking connections, too. It's so frustrating because every deploy is basically a ticking time bomb until we run out of them and then every request will just timeout on our visitors (not to mention the failures we'll see in background tasks). Have you guys found any approach to dealing with this issue while rails gets sorted out?

@freerobby

I'm told this fixes the thread safety issue; I'll be trying soon.

https://gist.github.com/3715360

@xaviershay

We fixed our connection leakage issue - it was because our Rails.cache was doing DB access, and it gets used earlier in the middleware stack than the connection management code. Switched them around and everything is good.

@steveklabnik
Collaborator

@freerobby did you get the chance to try that out yet?

@freerobby

@steveklabnik I tried it to no avail. We're still leaking connections left and right. Each server fields 10 rps or fewer, our database pool is set to 110, and I have to restart the servers every two days lest they run out of connections.

We do have background jobs that run as well, but I make sure to call ActiveRecord::Base.clear_active_connections! in every one of them. Frustrating.

@freerobby

Can somebody please explain to me why the ActiveRecord contract is such that ActiveRecord is not responsible for checking back in the connections it checks out? I understand why that would be the case if you checkout a connection and then make an AR call since AR merely makes use of the existing connection. But if ActiveRecord checks out the connection itself, why does it not check it back in when it's done?

@freerobby

We're hosting our app on Torquebox. I found that even for a simple HTTP request, the database connections are not getting checked back into the pool after completion of the request. I thought that for web requests ActionPack was supposed to take care of this, but maybe I've misunderstood, or maybe there's some library deeper in our stack (like in @xaviershay's middleware case) that is the culprit.

In an effort to track this more thoroughly, I wrote a monkey patch to show verbose logging when connections get checked in and checked out, and the results indicate to me the error is on the rails side of things:

# https://gist.github.com/3715360 fixes https://github.com/rails/rails/issues/5491#issuecomment-8530227
raise "Please verify whether this monkey-patch is still required" if Rails.version != "3.2.8"
module ActiveRecord
  module ConnectionAdapters
    class ConnectionPool
      # Debug ActiveRecord connections management
      def checkout_with_logging
        Rails.logger.info "Checking out a new ActiveRecord connection. used=#{@connections.count}, available=#{@size}"
        conn = checkout_without_logging
        Rails.logger.info "Checkout successful. used=#{@connections.count}, available=#{@size}"
        conn
      end
      alias_method_chain :checkout, :logging

      def checkin_with_logging(conn)
        Rails.logger.info "Checking in ActiveRecord connection. used=#{@connections.count}, available=#{@size}"
        c = checkin_without_logging(conn)
        Rails.logger.info "Checkin successful. used=#{@connections.count}, available=#{@size}"
        c
      end
      alias_method_chain :checkin, :logging
    end
  end
end

Here was a typical log output from that patch:

Started GET "/api/services/for_social_history" for 196.12.230.134 at 2012-10-05 14:31:30 +0000
14:31:30,894 INFO  [ShareaholicOnRails] (http-/10.191.202.8:8080-4) Checking out a new ActiveRecord connection. used=188, available=200
14:31:30,895 INFO  [ShareaholicOnRails] (http-/10.191.202.8:8080-4) Checkout successful. used=188, available=200
14:31:30,897 INFO  [ShareaholicOnRails] (http-/10.191.202.8:8080-4) Processing by Api::Share::ApiController#for_social_history as HTML
14:31:30,898 INFO  [ShareaholicOnRails] (http-/10.191.202.8:8080-4) Completed 200 OK in 0ms (Views: 0.0ms | ActiveRecord: 0.0ms)
14:31:30,900 INFO  [ShareaholicOnRails] (http-/10.191.202.8:8080-4) Checking in ActiveRecord connection. used=188, available=200
14:31:30,900 INFO  [ShareaholicOnRails] (http-/10.191.202.8:8080-4) Checkin successful. used=188, available=200
14:31:31,185 INFO  [ShareaholicOnRails] (http-/10.191.202.8:8080-4) 

Notice that the connection gets checked back in, but is still marked as used by the connection pool. It is possible that my logging statements are not threadsafe, but used=n only increments as the application runs; it never goes down even when logging a connection checkin.

What makes us lose these connections indefinitely is when they don't get checked back in and then a thread terminates. At that point the connection is lost for good. This explains why even though we only have 10 threads fielding requests at any given time, we ran out of our 200+ database connections. Threads get decommissioned and recommissioned, and every time the former happens we lose a database connection.

I was able to work around this issue by setting a discreet number of threads that run in perpetuity. This way the leak is controlled to exactly one connection per thread, and those threads last forever so they will always hold that single connection. It's an inelegant fix, but it is predictable and safe so long as our database connection pool is as big as our number of perpetual threads, and there are no other resources pulling from that same thread pool.

While this workaround will hold us over indefinitely, I would still appreciate an explanation of why ActiveRecord was designed to work this way. It seems to me a very peculiar and irresponsible architectural decision to hold that ActiveRecord is not responsible for returning its own connections to its own pool. It violates very basic principles of OOP and thread safety and I have not seen an explanation anywhere of why it was built to work like this.

@steveklabnik
Collaborator

Iiiiinteresting. @tenderlove or @jonleighton ?

@rabidpraxis

Over the weekend we ran into this issue as well. Using JRuby and Torquebox.

@freerobby

@rabidpraxis Have you set org.torquebox.web.http.maxThreads in your standalone[-ha].xml? If not, you may wish to try setting that to the max number of simultaneous requests that you anticipate having to serve, and be sure you have that many database connections (+ a couple extra) in your pool. Depending on how you load balance you run the risk of returning a 502 gateway error if there are no available boxes, but at least you'll know whether you're actually running out of threads, and you could perhaps add some logging to figure out where the threads are getting held up.

@rabidpraxis

@freerobby I did not. I don't know how much this changes things, but this issue came not from using the web component of Torquebox but while processing queues. I had concurrency set to 100 and a pool size of 200.

@freerobby

@rabidpraxis Try wrapping all of your backgrounded functions/jobs with an ensure block that contains:

  ensure
    ActiveRecord::Base.clear_active_connections!
  end

ActiveRecord does not check in the connections it checks out.

I've hit this wall every way imaginable and usually hang out in #torquebox on freenode if you want to have a more synchronous conversation on it.

@rabidpraxis

@freerobby I will give that a go, thanks for the help!

@shawndeprey

@freerobby

I came across the same error today, and, I am using this exact call as you suggest in this thread.

after do
    ActiveRecord::Base.clear_active_connections!
end

I am using jruby + sinatra + Active Record. Obviously this issue goes deeper than connections not being closed. Thoughts/updates on this issue?

@freerobby

@shawndeprey Can you provide some more context? Is this an after :each block in rspec or some place else? Also is it running in multithreaded mode? I assume you're running out of connections despite doing this?

@steveklabnik
Collaborator

after do is a synonym for after(:each).

@shawndeprey

@freerobby Hey thanks for getting back so quickly. Yes, I am running out of connections, but, I have been running this on my site for a few months now without issue. I am guessing the issue never arose because I never had enough connections at once to hit the connection limit.

My app isn't running in multithreaded mode. This after do block runs after each request to ensure connections are closed. Essentially, it's the last thing that happens when a user makes a request to my app.

@shawndeprey

@freerobby
I think I finally have a work around on this issue. Here is what I did to repro and remedy my issue.

I ran 30 threads(simulated users) against my app server which I set to use only 1 connection in its pool. That usually is enough to trigger this error. While using ONLY ActiveRecord::Base.clear_active_connections! after my connection as described above, my server had a hard time recovering from the bombardment of the simulated users, and I usually saw the error stick.

My work around is to do the following.

Before establishing a connection:

ActiveRecord::Base.clear_active_connections!

At the end of my connection do

after do
    ActiveRecord::Base.clear_active_connections!
    ActiveRecord::Base.connection.close
end

I don't know if this is the best solution, but, I have spent the last few hours bombarding my server, reproing this issue, and watching it work with my fix.

Please let me know what you think of my findings on this issue, and, I hope my findings help other AR users to remedy this issue.

@shawndeprey

@freerobby
@steveklabnik
@plentz

Hey again. It looks like my above post did help with the connection issue, but, after some much heavier load testing I am now running into the exact error in this post, and, it is easy to replicate. All you need to do is bombard your server with enough requests, and, Active Record breaks permanently with the following error on every request.

https://gist.github.com/4157459

I really hope that some repro on this helps to nail down the cause of the issue. Even if the issue itself isn't remedied, I think this paints a pretty clear picture of what is happening. Somehow connections are allowed to be cleared, and, when AR tries to see if the connection is in use, (i.e. '.in_use?'), then an error is thrown because it is trying to access a function of a nil object.

Please do let me know what you think about this issue along with possible remedies. If possible, could someone pass this thread to some other AR devs? I don't actually know who contributes to the project.

@tenderlove
Owner

@freerobby can you post the simple app you're using to reproduce the connection leaking?

AR checks out a connection for you so that AR::Base.connection will Just Work™. But since AR has no way of knowing when you are done with the connection, it can't just return connections to the pool (as you may still be using the connection).

The ConnectionManagement middleware will clear the connections at the end of a request, but if you're using the connection outside the request lifecycle, or inside a thread you've spawned yourself, then yes connections will absolutely leak.

A simple app to reproduce the issue would go a long way here. Thanks.

@freerobby

@tenderlove Unfortunately I only have our production app with some expanded logging via monkey patches (as posted above). I will try to put a sample app together after I deal with a few other pressing things on my plate. (I realize it makes a world of difference when troubleshooting).

Separately, with respect to ActiveRecord, is the concern that because AR publicly exposes its connection object, that it has no guarantee that the caller is not also using that connection object and therefore it is not save to release it? If so, do you think it makes sense to track and release the connection automatically iff ActiveRecord::Base.connection is never called outside of ActiveRecord? e.g.:

module ActiveRecord
  class Base
    def initialize
      @safe_to_relase = true
    end

    def connection
      @safe_to_release = false
      _connection
    end

    def some_internal_method_that_needs_a_connection
      conn = _connection
      # ... use the connection ...
    end

    private
    def _connection
      retrieve_connection
    end
  end
end
@shawndeprey

@freerobby
Small update on this. Currently to remedy our issues we increased our connection pool. It obviously isn't a fix, but, for the time being it is the only way we could figure to dance around this issue. I will continue to monitor this thread and provide any help/info I can to fix this weird bug.

@jeremy-cc

We are also running into this issue on Sinatra, TorqueBox 2.1.2 and AR 3.2.9+ - pretty much identical issues to freerobby. Will be trying the workaround suggested by shawndeprey above i.e. forcing AR to close connections. with_connection has not worked.

@warmwater

I have experienced the same problem on Rails, Trinidad and JRuby...

@thedarkone

Any chance of a small reproducible app/gist test? Is this still a problem on master (Rails 4)?

@warmwater

my temporary solution is to enable pool option in database.yml, so far so good...maybe not a good idea for large scale web service though

@schneems
Collaborator

If i understand correctly what is happening correctly: we are seeing a connection is being created in a new thread, and then for some reason that thread isn't being re-used so that connection is never re-used, and since that thread doesn't know the connection won't be used again it doesn't check it back into the pool. This sounds like expected behavior of the current implementation. Please correct me if i'm mis-understanding something.

In Rails 4 connection pools are on and set to 5 by default and reaping is turned on which (i believe) will help with this problem unless you're running out of connections faster then every 10 seconds. You can configure that setting like this https://devcenter.heroku.com/articles/concurrency-and-database-connections#bad-connections.

If i'm understanding the problem correctly, this issue is only open as a request for different unspecified behavior of Active Record connection pool. If we can get an application with a reproducible bug, we should open up a new issue, and reference this one...as of now it doesn't appear this thread is making any progress.

@plentz plentz added the stale label
@rafaelfranca
Owner

This issue has been automatically marked as stale because it has not been commented on for at least
three months.

The resources of the Rails team are limited, and so we are asking for your help.

If you can still reproduce this error on the 4-1-stable, 4-0-stable branches or on master,
please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions.

@rails-bot rails-bot closed this
@rails-bot
Collaborator

This issue has been automatically closed because of inactivity.

If you can still reproduce this error on the 4-1-stable, 4-0-stable branches or on master,
please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.