Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent ActiveRecord::ConnectionNotEstablished error #47

Closed
english opened this issue Jun 30, 2014 · 15 comments
Closed

Intermittent ActiveRecord::ConnectionNotEstablished error #47

english opened this issue Jun 30, 2014 · 15 comments

Comments

@english
Copy link

english commented Jun 30, 2014

I have Que 0.7.3 in a Rails 4.1.1 application on MRI running Thin and using all default settings except for config.que.worker_count = 1 in my config/environments/development.rb. My ActiveRecord connection pool is set to 5 connections.

When I run rails s, on my last attempt, the app successfully started up 1 in 10 tries. For the other 9, I received the following output:

/Users/jamieenglish/projects/ee2e/.bundle/gems/activerecord-4.1.1/lib/active_record/connection_handling.rb:109:in `connection_pool': ActiveRecord::ConnectionNotEstablished (ActiveRecord::ConnectionNotEstablished)
        from /Users/jamieenglish/projects/ee2e/.bundle/gems/que-0.7.3/lib/que/adapters/active_record.rb:30:in `checkout_activerecord_adapter'
        from /Users/jamieenglish/projects/ee2e/.bundle/gems/que-0.7.3/lib/que/adapters/active_record.rb:5:in `checkout'
        from /Users/jamieenglish/projects/ee2e/.bundle/gems/que-0.7.3/lib/que/job.rb:82:in `work'
        from /Users/jamieenglish/projects/ee2e/.bundle/gems/que-0.7.3/lib/que/worker.rb:76:in `block in work_loop'
        from /Users/jamieenglish/projects/ee2e/.bundle/gems/que-0.7.3/lib/que/worker.rb:73:in `loop'
        from /Users/jamieenglish/projects/ee2e/.bundle/gems/que-0.7.3/lib/que/worker.rb:73:in `work_loop'
        from /Users/jamieenglish/projects/ee2e/.bundle/gems/que-0.7.3/lib/que/worker.rb:17:in `block in initialize'

This seems like a race condition, as if I add sleep 2 to the top of Que::Worker#work_loop all is fine.

Happy to provide a fix if pointed in the right direction.

@chanks
Copy link
Collaborator

chanks commented Jun 30, 2014

It's definitely not a good idea for Que to try to start working jobs before ActiveRecord has established the connection. It's strange that I never experienced that in my testing, I wonder if there's something in your setup that's causing this to happen.

Regardless, I think the appropriate thing to do is for the Railtie to ensure that the background workers are not started up until the connection is established. Hopefully ActiveRecord provides some kind of hook to do that, but I'm not sure. If you have time to investigate it that'd be great, otherwise I should have time to look into it tonight. Thanks for the report!

@english
Copy link
Author

english commented Jun 30, 2014

No problem! It is strange... I'm trying to pin down the exact circumstances where I trigger the error, but it seems quite random at the moment.

I'll have a go this evening also, although don't wait for me; I need to familiarise myself with the codebase first, so it could take a while.

@english
Copy link
Author

english commented Jun 30, 2014

So it seems the only way I could replicate this issue was as mentioned above; setting config.que.worker_count = 1 in config/environments/development.rb.

As I understand things, calling worker_count= here, this will have the side effect of instantiating a Worker which will in turn kick off its work_loop. This will lead to Job.work checking out a connection, that may not have been established yet. If you run work_loop inline, without a thread, then you will see this error consistently.

Its not nice, but the following works for me. In lib/que/adapters/active_record.rb:

def checkout_activerecord_adapter(&block)
  Timeout.timeout(1) do
    begin
      ::ActiveRecord::Base.connection_pool.with_connection(&block)
    rescue ::ActiveRecord::ConnectionNotEstablished => e
      puts "couldn't connect to db, retrying..."
      sleep 0.1
      retry
    end
  end
end

I can't see a nice place to handle this that's specific to the ActiveRecord adapter.

@chanks
Copy link
Collaborator

chanks commented Jun 30, 2014

Your understanding is right, but I think it's the responsibility of the railtie to be delaying the worker pool appropriately, when necessary. ActiveRecord does have an initializer that connects to the DB:

https://github.com/rails/rails/blob/v4.1.2/activerecord/lib/active_record/railtie.rb#L117

So, I'm thinking the clean solution to this is to make sure that the worker pool only starts up after that initializer fires (when ActiveRecord is being used at all, of course). I/We will have to dig more into Rails' initializer system to figure out the best way to do this.

I'm debating between:

  1. Change worker_count= to not start up the worker pool immediately - it only sets the size of the pool once it is activated. Then the railtie can set mode = :async and actually start the worker pool once the AR initializer has fired safely.
  2. Change the documentation to encourage people to only set worker_count in their config as a callback after that initializer fires. I'm not sure whether the initializer system is meant to be used by app config files, or just by gem railties.

I'm leaning towards number 1, I think.

@chanks
Copy link
Collaborator

chanks commented Jun 30, 2014

BTW, changing the behavior of worker_count won't be trivial, I think, so please don't feel obligated to submit a patch - I can take care of it. I mention the initializer solution because it might make for a cleaner stopgap solution for you.

@chanks
Copy link
Collaborator

chanks commented Jul 1, 2014

I started digging into this and it turned out to be thornier than I thought. I don't think I'll be able to get it into a releasable state before the weekend. In the meantime, if you need any help with your temporary fix, please let me know!

@english
Copy link
Author

english commented Jul 1, 2014

Thanks for looking into this issue, its really appreciated!

@joevandyk
Copy link
Contributor

I vote for option 1. It was confusing to see that setting worker_count starts the job processing. I'd be happy to see a Que.start method that starts the workers.

@chanks
Copy link
Collaborator

chanks commented Jul 6, 2014

Alright, would you try out master and see if it works for you? I've basically just made it so that mode and worker_count settings don't affect each other. So the worker pool won't hit the DB until the mode is set to :async, but the railtie will still so that for you. You should be able to set worker_count in your app configuration (like you were trying to do before) and everything should just work.

If this works I'll release it as 0.8.0, since it's a (minor) API change.

@english
Copy link
Author

english commented Jul 7, 2014

Thanks for the quick work. I've just tied against master and sadly I'm still seeing the problem. I either get ActiveRecord::ConnectionNotEstablished on start up or the worker simply does not pull jobs off the queue.

I'll spend some time this evening putting together a minimal Rails app to reproduce this error and point you to that. Hopefully that will rule out anything weird on my end and give us something solid to discuss.

@chanks
Copy link
Collaborator

chanks commented Jul 7, 2014

Alright, thanks!

english added a commit to english/que-issue-47 that referenced this issue Jul 7, 2014
@english
Copy link
Author

english commented Jul 7, 2014

Ok, so ignore what I said before about still seeing ActiveRecord::ConnectionNotEstablished errors, that has now gone 😄

However... if I set worker_count to 4 (the default) or less, then the workers do not start. https://github.com/jamienglish/que-issue-47 should demonstrate this issue. You also get the same problem (no workers starting) even if you don't set config.que.worker_count and keep to the defaults. The problem seems to be around https://github.com/chanks/que/blob/master/lib/que/worker.rb#L137-L141. Seems the workers only start when calling worker_count=(count) with count or 5 or greater.

@chanks
Copy link
Collaborator

chanks commented Jul 8, 2014

Alright, it turned out I hadn't put enough thought into how the worker pool
would be instantiated on startup. c3d131e should fix your problem - it
worked for me locally using your example app.

I don't want to make a release until I spend some time figuring out how
robust these changes are for forking setups, which I don't think I'll be
able to do until this weekend. So I hope you don't mind staying on edge for
a little while.

On Mon, Jul 7, 2014 at 12:55 PM, Jamie English notifications@github.com
wrote:

Ok, so ignore what I said before about still seeing
ActiveRecord::ConnectionNotEstablished errors, that has now gone [image:
😄]

However... if I set worker_count to 4 (the default) or less, then the
workers do not start. https://github.com/jamienglish/que-issue-47 should
demonstrate this issue. You also get the same problem (no workers starting)
even if you don't set config.que.worker_count and keep to the defaults.
The problem seems to be around
https://github.com/chanks/que/blob/master/lib/que/worker.rb#L137-L141.
Seems the workers only start when calling worker_count=(count) with count
or 5 or greater.


Reply to this email directly or view it on GitHub
#47 (comment).

@english
Copy link
Author

english commented Jul 9, 2014

That one did the trick! The app I was working on that had the original issue is now using rake for the workers, so the current release is fine for me. Cheers.

@chanks
Copy link
Collaborator

chanks commented Jul 13, 2014

I experimented with Unicorn a bit and things look good, so I released 0.8.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants