Unsure if PostgreSQL config is working as expected #27

Closed
AaronRustad opened this issue Nov 12, 2013 · 17 comments · Fixed by #87

Comments

@AaronRustad
Contributor

I've largely got Makara configured and working as expected, but while testing the fail-over capabilities of the slaves, I'm not seeing master take over. It's my understanding that if all the slaves become blacklisted, master should take over.

I'm able to successfully run the application using a single master and a single slave. I can see that when I modify records, the master database is used, and when I make reads, the slave is used. I'm able to bring down master and continue to read from the slave, and when I try to write, those commands fail.

However, if I leave master running and bring down my single slave, all reads fail. I believe all requests should be issued against master in this case, correct?

EDIT: When I say "bring down", I mean asking Postgres to shut down gracefully, if that matters at all.

My config is as follows:


---
staging:
  adapter: makara_postgresql
  encoding: utf8
  username: cp
  password: xxxxxxxxxxx
  database: cp_staging
  makara:
    connections:
    - role: master
      host: i.mztest2.domain.com
      name: cp_master
    - role: slave
      host: i.mztest3.domain.com
      name: cp_slave

The error I'm seeing:

could not connect to server: Connection refused Is the server running on host "i.mztest3.domain.com" (10.206.42.32) and accepting TCP/IP connections on port 5432?

activerecord (4.0.1) lib/active_record/connection_adapters/postgresql_adapter.rb:831:in `initialize'
activerecord (4.0.1) lib/active_record/connection_adapters/postgresql_adapter.rb:831:in `new'
activerecord (4.0.1) lib/active_record/connection_adapters/postgresql_adapter.rb:831:in `connect'
activerecord (4.0.1) lib/active_record/connection_adapters/postgresql_adapter.rb:548:in `initialize'
activerecord (4.0.1) lib/active_record/connection_adapters/postgresql_adapter.rb:41:in `new'
activerecord (4.0.1) lib/active_record/connection_adapters/postgresql_adapter.rb:41:in `postgresql_connection'
/mnt/srv/client_portal/shared/bundle/ruby/2.0.0/bundler/gems/makara-c9b067b5ada7/lib/active_record/connection_adapters/makara_postgresql_adapter.rb:35:in `connection_for'
/mnt/srv/client_portal/shared/bundle/ruby/2.0.0/bundler/gems/makara-c9b067b5ada7/lib/makara/proxy.rb:175:in `block in instantiate_connections'
/mnt/srv/client_portal/shared/bundle/ruby/2.0.0/bundler/gems/makara-c9b067b5ada7/lib/makara/proxy.rb:174:in `each'
/mnt/srv/client_portal/shared/bundle/ruby/2.0.0/bundler/gems/makara-c9b067b5ada7/lib/makara/proxy.rb:174:in `instantiate_connections'
/mnt/srv/client_portal/shared/bundle/ruby/2.0.0/bundler/gems/makara-c9b067b5ada7/lib/makara/proxy.rb:50:in `initialize'
/mnt/srv/client_portal/shared/bundle/ruby/2.0.0/bundler/gems/makara-c9b067b5ada7/lib/active_record/connection_adapters/makara_abstract_adapter.rb:58:in `initialize'
/mnt/srv/client_portal/shared/bundle/ruby/2.0.0/bundler/gems/makara-c9b067b5ada7/lib/active_record/connection_adapters/makara_postgresql_adapter.rb:8:in `new'
/mnt/srv/client_portal/shared/bundle/ruby/2.0.0/bundler/gems/makara-c9b067b5ada7/lib/active_record/connection_adapters/makara_postgresql_adapter.rb:8:in `makara_postgresql_connection'
activerecord (4.0.1) lib/active_record/connection_adapters/abstract/connection_pool.rb:440:in `new_connection'
activerecord (4.0.1) lib/active_record/connection_adapters/abstract/connection_pool.rb:450:in `checkout_new_connection'
activerecord (4.0.1) lib/active_record/connection_adapters/abstract/connection_pool.rb:421:in `acquire_connection'
activerecord (4.0.1) lib/active_record/connection_adapters/abstract/connection_pool.rb:356:in `block in checkout'
/usr/local/rvm/rubies/ruby-2.0.0-p0/lib/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
activerecord (4.0.1) lib/active_record/connection_adapters/abstract/connection_pool.rb:355:in `checkout'
activerecord (4.0.1) lib/active_record/connection_adapters/abstract/connection_pool.rb:265:in `block in connection'
/usr/local/rvm/rubies/ruby-2.0.0-p0/lib/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
activerecord (4.0.1) lib/active_record/connection_adapters/abstract/connection_pool.rb:264:in `connection'
activerecord (4.0.1) lib/active_record/connection_adapters/abstract/connection_pool.rb:546:in `retrieve_connection'
activerecord (4.0.1) lib/active_record/connection_handling.rb:79:in `retrieve_connection'
activerecord (4.0.1) lib/active_record/connection_handling.rb:53:in `connection'
activerecord (4.0.1) lib/active_record/query_cache.rb:51:in `restore_query_cache_settings'
activerecord (4.0.1) lib/active_record/query_cache.rb:43:in `rescue in call'
activerecord (4.0.1) lib/active_record/query_cache.rb:32:in `call'
activerecord (4.0.1) lib/active_record/connection_adapters/abstract/connection_pool.rb:626:in `call'
actionpack (4.0.1) lib/action_dispatch/middleware/callbacks.rb:29:in `block in call'
activesupport (4.0.1) lib/active_support/callbacks.rb:373:in `_run__3248633690540873971__call__callbacks'
activesupport (4.0.1) lib/active_support/callbacks.rb:80:in `run_callbacks'
actionpack (4.0.1) lib/action_dispatch/middleware/callbacks.rb:27:in `call'
/mnt/srv/client_portal/current/gems/apoc/lib/apoc/geo_ip/country_lookup.rb:62:in `call'
actionpack (4.0.1) lib/action_dispatch/middleware/remote_ip.rb:76:in `call'
airbrake (3.1.14) lib/airbrake/rails/middleware.rb:13:in `call'
actionpack (4.0.1) lib/action_dispatch/middleware/debug_exceptions.rb:17:in `call'
actionpack (4.0.1) lib/action_dispatch/middleware/show_exceptions.rb:30:in `call'
railties (4.0.1) lib/rails/rack/logger.rb:38:in `call_app'
railties (4.0.1) lib/rails/rack/logger.rb:22:in `call'
actionpack (4.0.1) lib/action_dispatch/middleware/request_id.rb:21:in `call'
rack (1.5.2) lib/rack/methodoverride.rb:21:in `call'
rack (1.5.2) lib/rack/runtime.rb:17:in `call'
activesupport (4.0.1) lib/active_support/cache/strategy/local_cache.rb:83:in `call'
actionpack (4.0.1) lib/action_dispatch/middleware/static.rb:64:in `call'
rack (1.5.2) lib/rack/sendfile.rb:112:in `call'
airbrake (3.1.14) lib/airbrake/user_informer.rb:16:in `_call'
airbrake (3.1.14) lib/airbrake/user_informer.rb:12:in `call'
railties (4.0.1) lib/rails/engine.rb:511:in `call'
railties (4.0.1) lib/rails/application.rb:97:in `call'
railties (4.0.1) lib/rails/railtie/configurable.rb:30:in `method_missing'
unicorn (4.6.3) lib/unicorn/http_server.rb:552:in `process_client'
unicorn (4.6.3) lib/unicorn/http_server.rb:632:in `worker_loop'
newrelic_rpm (3.6.7.152) lib/new_relic/agent/instrumentation/unicorn_instrumentation.rb:22:in `call'
newrelic_rpm (3.6.7.152) lib/new_relic/agent/instrumentation/unicorn_instrumentation.rb:22:in `block (4 levels) in <top (required)>'
unicorn (4.6.3) lib/unicorn/http_server.rb:500:in `spawn_missing_workers'
unicorn (4.6.3) lib/unicorn/http_server.rb:142:in `start'
unicorn (4.6.3) bin/unicorn_rails:209:in `<top (required)>'
/mnt/srv/client_portal/shared/bundle/ruby/2.0.0/bin/unicorn_rails:23:in `load'
/mnt/srv/client_portal/shared/bundle/ruby/2.0.0/bin/unicorn_rails:23:in `<main>'

Makara Gem: v0.2.0.beta6
Postgres : 9.2
Rails : 4.0.1
Ruby : 2.0.0

Thanks for your help!

@mnelson
Contributor

mnelson commented Nov 12, 2013

Give https://github.com/taskrabbit/makara/releases/tag/v0.2.0.beta7 a try. Might be as simple as not handling that error (string) correctly. Let me know, thanks for the help in testing.

@AaronRustad
Contributor Author

Thanks, unfortunately, it didn't help. It doesn't look like this code (connection_message?) actually gets called.

@mnelson
Contributor

mnelson commented Nov 13, 2013

Oh, interesting. It's instantiating a new connection rather than invoking a reconnect on the current connection. I'll see if I can figure out a solution but the short story is that Makara is not handling connection issues upon initialization of the adapter, only after connected.

@mnelson
Contributor

mnelson commented Nov 13, 2013

@AaronRustad are you able to see what the original exception is in your country_lookup.rb? I just need the error#message value. It doesn't look like it's being caught and as a result it is caught in the query_cache middleware which attempts to reconnect.

So as I see it there are two issues: 1) Makara is not handling your original error properly, and 2) a failed reconnection on slave nodes should not blow up (I'll likely fix this via a Makara config value).

@AaronRustad
Contributor Author

I believe this may be the 'original' error:

PG::AdminShutdown: FATAL:  terminating connection due to administrator command FATAL:  terminating connection due to administrator command

Then it is followed by

PG::ConnectionBad could not connect to server: Connection refused Is the server running on host "i.mztest3.domain.com" (10.206.42.32) and accepting TCP/IP connections on port 5432?
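For illustration only, here is a minimal sketch of classifying the two messages above as connection failures by matching on their text. The constant and method names are hypothetical and are not Makara's actual code; the patterns are taken only from the messages quoted in this thread.

```ruby
# Hypothetical helper, not Makara's real implementation: decide from the
# error text whether a failure looks like a lost connection rather than a
# bad query. Patterns come from the messages quoted in this thread.
CONNECTION_ERROR_PATTERNS = [
  /terminating connection due to administrator command/i,
  /could not connect to server/i,
  /connection refused/i
].freeze

# Returns true when the message matches any known connection-failure pattern.
def connection_error_message?(message)
  CONNECTION_ERROR_PATTERNS.any? { |pattern| pattern.match?(message.to_s) }
end
```

A proxy could use a check like this to blacklist the node and retry elsewhere instead of letting the exception reach the middleware stack.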

@mnelson
Contributor

mnelson commented Nov 13, 2013

1c444bd and 44bc082 may take care of these issues. I'll work on trying to produce similar real-world situations on the MySQL side of things. I'm still unsure how to deal with nodes that are unable to connect. Currently, if :rescue_connection_failures is true in the config, I just skip them.
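The "skip nodes that cannot connect" behaviour described here could look roughly like the sketch below. It is an assumption-laden illustration: `NodeUnavailable`, `instantiate_connections` as written here, and the `connect` callable are hypothetical stand-ins, and only the flag name `rescue_connection_failures` comes from the comment above.

```ruby
# Hypothetical stand-ins; none of these names are Makara's real API.
class NodeUnavailable < StandardError; end

# connect is any callable that returns a connection for a node config
# or raises NodeUnavailable when the node cannot be reached.
def instantiate_connections(node_configs, rescue_connection_failures:, connect:)
  node_configs.each_with_object([]) do |config, connections|
    begin
      connections << connect.call(config)
    rescue NodeUnavailable
      # Skip the unreachable node only when the flag is set; otherwise
      # the failure propagates to the caller as before.
      raise unless rescue_connection_failures
    end
  end
end
```

With the flag off, a single unreachable slave aborts adapter initialization, which matches the stack trace in the original report.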

@mnelson
Contributor

mnelson commented Jan 9, 2014

Closing due to inactivity; potential fixes have been merged.

@bmorton

bmorton commented Oct 24, 2014

I was pretty stoked when I came across this gem, but I'm running into problems with PostgreSQL similar to the ones described here. On v0.3.0.rc3, I can't seem to get failover working properly in either direction (losing primary or losing replica). If I roll back to the commit right before caeb8dc and set the rescue_connection_failures flag to true, I can lose my replica and those queries fail over to the primary. Losing the primary still throws connection exceptions and won't route requests to the replica.

I was setting up my primary/replica with Docker and the latest PostgreSQL. I have a simple, newly generated Rails app that I was using to replicate this. Will that stuff help? Is there somewhere else I can dig in and do some investigation? I'd love to help get this fixed for PostgreSQL however I can!

@mnelson
Contributor

mnelson commented Oct 24, 2014

When you say failover isn't working properly are you talking about the initial connection or later in the lifecycle?

@bmorton

bmorton commented Oct 24, 2014

I've been trying this a ton, so I need to confirm which combination of things produced which result. I've seen it on initialization, and I've also seen it with a booted, running, working app: after killing the primary Docker container, reads don't continue from the replica.

I'll get some concrete steps down for you today.

@mnelson
Contributor

mnelson commented Oct 24, 2014

The first thing to try is a running app with one master and >1 slaves. Then bring one of the slaves down. Reads should continue from the other slave. If not, try to get the exact error message which Makara saw and ignored.
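The expected routing in that test (skip a downed slave, keep reading from the others, and use master only when every slave is blacklisted) can be sketched as follows. `ReadRouter` and all of its parameters are hypothetical names for illustration, not Makara's classes, and the default blacklist duration is an arbitrary assumption.

```ruby
# Hypothetical sketch of read routing with a blacklist; not Makara's code.
class ReadRouter
  def initialize(master:, slaves:, blacklist_duration: 5, clock: -> { Time.now.to_f })
    @master = master
    @slaves = slaves
    @blacklist_duration = blacklist_duration # seconds; assumed default
    @clock = clock                           # injectable for testing
    @blacklisted_until = {}
    @next_index = 0
  end

  # Mark a slave as unusable until the blacklist duration has passed.
  def blacklist!(slave)
    @blacklisted_until[slave] = @clock.call + @blacklist_duration
  end

  # Round-robin over slaves that are not blacklisted; master if none remain.
  def connection_for_read
    available = @slaves.reject { |slave| blacklisted?(slave) }
    return @master if available.empty?

    connection = available[@next_index % available.size]
    @next_index += 1
    connection
  end

  private

  def blacklisted?(slave)
    expires_at = @blacklisted_until[slave]
    !expires_at.nil? && expires_at > @clock.call
  end
end
```

Once the blacklist entry expires, the slave becomes eligible again, so a node that has recovered is retried without restarting the app.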

Makara no longer attempts to handle initial connection failures, as that turns into a can of worms.

@moneill

moneill commented Oct 24, 2014

@mnelson I've been working with @bmorton on this. Here's a repo that contains steps to reproduce the issue he described above: https://github.com/moneill/makara-repro. In short, we start the app with the master and replicas running and working, then shut one down and see the following:

PG::ConnectionBad (could not connect to server: Connection refused
    Is the server running on host "192.168.59.103" and accepting
    TCP/IP connections on port 6432?
):
  activerecord (4.1.6) lib/active_record/connection_adapters/postgresql_adapter.rb:888:in `initialize'
  activerecord (4.1.6) lib/active_record/connection_adapters/postgresql_adapter.rb:888:in `new'
  activerecord (4.1.6) lib/active_record/connection_adapters/postgresql_adapter.rb:888:in `connect'
  activerecord (4.1.6) lib/active_record/connection_adapters/postgresql_adapter.rb:568:in `initialize'
  activerecord (4.1.6) lib/active_record/connection_adapters/postgresql_adapter.rb:41:in `new'
  activerecord (4.1.6) lib/active_record/connection_adapters/postgresql_adapter.rb:41:in `postgresql_connection'
...

@mnelson
Contributor

mnelson commented Oct 24, 2014

Is this happening with Rails.env == production? I'm basically wondering why pg_adapter#initialize is called after startup and the initial read. IIRC the pg adapter calls connect! within the initializer, which is a problem I ran into in previous versions of Makara.

@bmorton

bmorton commented Oct 24, 2014

Yeah, we tried it in both development and production modes, with the same result.

@bmorton

bmorton commented Oct 25, 2014

I've been doing some more digging and it looks like when Rails loses connection to any of the database connections, it goes and tries to instantiate a new ActiveRecord::ConnectionAdapters::MakaraPostgreSQLAdapter, which causes all the connections/configurations to be initialized again, which then causes the PG connect! call.

I'm playing with a couple of potential solutions, but currently getting bitten by ActiveRecord::ConnectionAdapters::ConnectionPool. I'm digging through the Rails source to wrap my head around it.

@mnelson
Contributor

mnelson commented Oct 25, 2014

Appreciate the digging. There's potentially a rails version issue here, as some people are using Makara + PG with success.

@bmorton

bmorton commented Oct 28, 2014

Just followed down that path to see if I could identify some ActiveRecord differences that might point to something, but I can reproduce the same issue with Rails 3.2.19 and Rails 4.0.10 using the same boilerplate stuff that was added to the app that @moneill linked.

To be clear, everything appears to work fine if all the Postgres instances are up, but if one of them is down, requests fail no matter which of the Postgres instances is taken down.

I spot checked a couple different versions of the pg gem too, but nothing changed.

You mentioned that handling initial connection failures was a can of worms. From what I can tell from the history, it looks like that was supported at one point and then removed. I think that's when Postgres failover support stopped working. Would you mind talking a bit more about the initial connection failures stuff? Given the way the ActiveRecord adapter for Postgres works, I'm not sure there's a way around handling the initial connection failures.

The potential solution that I mentioned above is pretty hacky still. It basically makes ActiveRecord::ConnectionAdapters::MakaraPostgreSQLAdapter a singleton by saving the first instantiated object to a class variable and returning that upon subsequent instantiations. I'm not sure that's the right approach, but that allows things to keep working when a Postgres instance goes away.
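As a rough sketch of that memoization idea, the class below saves the first instance and hands it back on later requests. `MemoizedAdapter`, `build`, and `reset!` are hypothetical names, and the sketch uses a class-level instance variable guarded by a mutex rather than a literal class variable; it is not the actual patch.

```ruby
# Hypothetical adapter class illustrating the memoization hack; not Makara's code.
class MemoizedAdapter
  @instance = nil
  @mutex = Mutex.new

  class << self
    # Return the first adapter ever built; configs passed later are ignored.
    def build(config)
      @mutex.synchronize { @instance ||= new(config) }
    end

    # Forget the saved instance (useful in tests).
    def reset!
      @mutex.synchronize { @instance = nil }
    end
  end

  attr_reader :config

  def initialize(config)
    @config = config
  end
end
```

The trade-off is visible in `build`: because later calls never run the initializer again, a lost node no longer triggers a fresh connection attempt, but nothing re-establishes the connection either.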

The part that isn't solved with this approach (because of another hack I had to do to the valid? check on the adapter) is that when the instance comes back up, we never attempt to reconnect to the instance again. The blacklist timeout stuff works and we attempt to query it again, but we try to write to the same socket that closed and don't attempt to reconnect (again, because of some hacky things I had to put in place).
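One way to think about the missing piece is to discard the dead socket when a connection error is seen, so the next use after the blacklist expires opens a fresh connection instead of writing to the closed one. The sketch below is only an illustration under that assumption; `ReconnectingNode` and its `connect` callable are hypothetical and do not reflect how Makara or the ActiveRecord adapter manage sockets.

```ruby
# Hypothetical wrapper around a single database node; not Makara's code.
class ReconnectingNode
  # connect is any callable that opens and returns a new connection.
  def initialize(connect:)
    @connect = connect
    @connection = nil
  end

  # Lazily open a connection, reusing it until it is marked dead.
  def connection
    @connection ||= @connect.call
  end

  # Call this when a query fails with a connection error, so the stale
  # socket is dropped and the next access reconnects.
  def mark_dead!
    @connection = nil
  end
end
```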

Ideas or thoughts? I'd love to help get this fixed up.
