Adapter not failing over when db server unavailable (PostgreSQL) #32

Closed
jdowning opened this issue Jan 29, 2014 · 3 comments

Comments

@jdowning

First, @mnelson nice work on this adapter!

Similar to #27, I am using PostgreSQL 9.2 with a master and a slave database server. The issue I'm running into comes up when testing the adapter's ability to switch to the master when the slave becomes unavailable.

From the README:

If all slave nodes are blacklisted, the master connection will begin receiving read queries as if it were a slave.
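
To make the expected behavior concrete, here is a minimal sketch of the fallback the README describes. It is not Makara's implementation: `Node`, `ReadRouter`, and `FakeConnectionError` are hypothetical names invented for illustration, and the timing model (a fixed blacklist window in seconds) is an assumption.

```ruby
# Hypothetical sketch of "blacklist the slave, then read from the master".
# None of these classes exist in Makara; they only illustrate the idea.
class FakeConnectionError < StandardError; end

class Node
  attr_reader :name

  def initialize(name)
    @name = name
    @blacklisted_until = nil
  end

  # Mark this node unusable for `duration` seconds starting at `now`.
  def blacklist!(duration, now = Time.now)
    @blacklisted_until = now + duration
  end

  def blacklisted?(now = Time.now)
    !@blacklisted_until.nil? && now < @blacklisted_until
  end
end

class ReadRouter
  def initialize(master, slaves, blacklist_duration)
    @master = master
    @slaves = slaves
    @blacklist_duration = blacklist_duration
  end

  # First slave that is not blacklisted, otherwise the master.
  def node_for_read(now = Time.now)
    @slaves.find { |slave| !slave.blacklisted?(now) } || @master
  end

  # Yields a node to the block. If a slave fails with a connection error,
  # it is blacklisted and the read is retried on the next candidate.
  def read(now = Time.now)
    node = node_for_read(now)
    yield node
  rescue FakeConnectionError
    raise if node.equal?(@master) # nothing left to fall back to
    node.blacklist!(@blacklist_duration, now)
    retry
  end
end
```

With one master and one slave, a read that fails on the slave would be retried on the master, and the slave would become eligible again once the blacklist window has passed.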

The scenario I'm testing is starting the application with both database servers available:

  1. Request /sign_up
  2. Stop the slave database server
  3. Refresh /sign_up

The refresh returns a 500 instead of failing over to the master:

Started GET "/sign_up" for 10.0.2.2 at 2014-01-29 16:18:00 +0000
Processing by UsersController#new as HTML
  [slave-db2] User Exists (1.0ms)  SELECT 1 AS one FROM "users" WHERE "users"."registration_ip" = '10.0.2.2' LIMIT 1
PG::ConnectionBad: PQconsumeInput() SSL connection has been closed unexpectedly
: SELECT  1 AS one FROM "users"  WHERE "users"."registration_ip" = '10.0.2.2' LIMIT 1
  Rendered users/_form.html.haml (10.1ms)
  Rendered users/new.html.haml within layouts/application (11.9ms)
Completed 500 Internal Server Error in 16ms

ActionView::Template::Error (PG::ConnectionBad: PQconsumeInput() SSL connection has been closed unexpectedly
: SELECT  1 AS one FROM "users"  WHERE "users"."registration_ip" = '10.0.2.2' LIMIT 1):
    14:   = form.error_messages
    15: 
    16: .register
    17:   %ul{ id: @user.ip_exists? ? :recaptcha : nil }
    18:     - if !@user.password_optional?
    19:       %li
    20:         = form.text_field :email, placeholder: "Email Address", id: "email"
  app/models/user/auth.rb:73:in `ip_exists?'
  app/views/users/_form.html.haml:17:in `_app_views_users__form_html_haml___4520667989552407035_70094826645900'
  app/views/users/new.html.haml:28:in `block in _app_views_users_new_html_haml__3771270572460248575_70094827057120'
  app/views/users/new.html.haml:25:in `_app_views_users_new_html_haml__3771270572460248575_70094827057120'

After I start the slave database server again, the query works okay:

Started GET "/sign_up" for 10.0.2.2 at 2014-01-29 16:18:52 +0000
  [master-db1] SCHEMA (0.8ms)  SET client_min_messages TO 'warning'
  [slave-db2] SCHEMA (0.8ms)  SET client_min_messages TO 'warning'
  [slave-db2] SCHEMA (0.7ms)  SHOW client_min_messages
  [master-db1] SCHEMA (0.8ms)  SET client_min_messages TO 'panic'
  [slave-db2] SCHEMA (0.8ms)  SET client_min_messages TO 'panic'
  [master-db1] SCHEMA (0.7ms)  SET standard_conforming_strings = on
  [slave-db2] SCHEMA (0.6ms)  SET standard_conforming_strings = on
  [master-db1] SCHEMA (0.5ms)  SET client_min_messages TO 'warning'
  [slave-db2] SCHEMA (0.6ms)  SET client_min_messages TO 'warning'
  [master-db1] SCHEMA (0.6ms)  SET time zone 'UTC'
  [slave-db2] SCHEMA (0.6ms)  SET time zone 'UTC'
Processing by UsersController#new as HTML
  [slave-db2] User Exists (2.9ms)  SELECT 1 AS one FROM "users" WHERE "users"."registration_ip" = '10.0.2.2' LIMIT 1
  [master-db1] CACHE (0.0ms)  SELECT 1 AS one FROM "users" WHERE "users"."registration_ip" = '10.0.2.2' LIMIT 1
  Rendered users/_form.html.haml (12.0ms)
  Rendered users/new.html.haml within layouts/application (14.5ms)
  Rendered layouts/_legacy_ie_styles.html.erb (0.3ms)
  Rendered layouts/_at_channels_nav.html.erb (0.2ms)
  Rendered layouts/_site_bar.html.haml (1.4ms)
  Rendered layouts/_navigation.html.haml (1.0ms)
  Rendered shared/_flash_messages.html.haml (0.2ms)
  Rendered layouts/_analytics_in_body.html.erb (0.2ms)
Completed 200 OK in 50ms (Views: 45.7ms | ActiveRecord: 3.0ms)

Here is my database.yml:

production:
  database: community_production
  adapter:  makara_postgresql
  username: ******
  password: ******
  encoding: unicode
  makara:
    blacklist_duration: 5
    master_ttl: 5
    sticky: true
    rescue_connection_failures: true

    connections:
      - role: master
        host: db1
        name: master-db1
      - role: slave
        host: db2
        name: slave-db2
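
As a quick sanity check on a config like this, the topology can be read back with plain Ruby. This is only a sketch: `connections_by_role` is a hypothetical helper, not part of Makara, and the embedded YAML is a trimmed copy of the file above with the redacted credentials left out.

```ruby
require "yaml"

# Trimmed copy of the database.yml above (credentials omitted).
CONFIG = <<~CONFIG_YAML
  production:
    adapter: makara_postgresql
    database: community_production
    makara:
      blacklist_duration: 5
      master_ttl: 5
      sticky: true
      connections:
        - role: master
          host: db1
          name: master-db1
        - role: slave
          host: db2
          name: slave-db2
CONFIG_YAML

# Hypothetical helper: groups the makara connections for `env` by role.
def connections_by_role(yaml_text, env)
  makara = YAML.safe_load(yaml_text).fetch(env).fetch("makara")
  makara.fetch("connections").group_by { |conn| conn.fetch("role") }
end

connections_by_role(CONFIG, "production").each do |role, conns|
  puts "#{role}: #{conns.map { |conn| conn['name'] }.join(', ')}"
end
```

Using `fetch` rather than `[]` makes a missing `makara` or `connections` key fail loudly, which helps when ruling out a configuration mistake.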

The other issue involves the rescue_connection_failures parameter:

  1. Start the application with the slave database down
  2. Request /sign_up and receive 200 OK
  3. Stop the master database
  4. Start the slave database
  5. Request /sign_up again

The request fails with a 500:

Started GET "/sign_up" for 10.0.2.2 at 2014-01-29 16:15:02 +0000
Processing by UsersController#new as HTML
  Rendered users/_form.html.haml (4.2ms)
  Rendered users/new.html.haml within layouts/application (5.6ms)
Completed 500 Internal Server Error in 8ms

ActionView::Template::Error (undefined method `_makara_blacklisted?' for nil:NilClass):
    14:   = form.error_messages
    15: 
    16: .register
    17:   %ul{ id: @user.ip_exists? ? :recaptcha : nil }
    18:     - if !@user.password_optional?
    19:       %li
    20:         = form.text_field :email, placeholder: "Email Address", id: "email"
  app/models/user/auth.rb:73:in `ip_exists?'
  app/views/users/_form.html.haml:17:in `_app_views_users__form_html_haml__606789934656786232_70264339597080'
  app/views/users/new.html.haml:28:in `block in _app_views_users_new_html_haml___3704897839609099056_70264339940800'
  app/views/users/new.html.haml:25:in `_app_views_users_new_html_haml___3704897839609099056_70264339940800'
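
I have not dug into the adapter internals, so the following is a guess at the class of bug rather than a diagnosis: if a connection that failed at boot ends up stored as nil, any predicate called on it raises this kind of NoMethodError. The sketch uses made-up names (`FakeConnection`, `first_usable`) and is not Makara code.

```ruby
# Hypothetical illustration only; none of these names come from Makara.
FakeConnection = Struct.new(:name, :blacklisted) do
  def blacklisted?
    blacklisted
  end
end

# Picks the first usable connection, skipping slots that are nil
# (for example, a connection that could not be established at boot).
def first_usable(connections)
  connections.compact.find { |conn| !conn.blacklisted? }
end

# Assumed state: the slave slot is nil, the master is available.
connections = [nil, FakeConnection.new("master-db1", false)]

begin
  # Unguarded lookup: calls blacklisted? on nil and raises.
  connections.find { |conn| !conn.blacklisted? }
rescue NoMethodError => e
  puts "unguarded: #{e.class}"
end

puts "guarded: #{first_usable(connections).name}"
```

Whether the real fix belongs in the lookup or in how rescued connections are stored is something only a full trace can show.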

Let me know if you need more data or testing. Thanks!

mnelson pushed a commit that referenced this issue Jan 29, 2014
@mnelson
Contributor

mnelson commented Jan 29, 2014

The connection issue should be fixed in 1dea246. Can you get me a full trace from the _makara_blacklisted? issue? I'll work on recreating that scenario in a test.

@jdowning
Author

@mnelson Never mind that comment (I deleted it here); it was a bad config on my side.

@jdowning
Author

It looks like 1dea246 did the trick for failing over. 😄

I'm going to close this and open a new issue to track the rescue connection bug.
