Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Part 8: Multi db improvements, Adds basic automatic database switching to Rails #35073

Merged
merged 1 commit into from Jan 30, 2019

Conversation

@eileencodes
Copy link
Member

commented Jan 28, 2019

The following PR adds behavior to Rails to allow an application to
automatically switch it's connection from the primary to the replica.

A request will be sent to the replica if:

  • The request is a read request (GET or HEAD)
  • AND It's been 5 seconds since the last write to the database (because
    we don't want to send a user to a replica if the write hasn't made it
    to the replica yet)

A request will be sent to the primary if:

  • It's not a GET/HEAD request (ie is a POST, PATCH, etc)
  • Has been less than 5 seconds since the last write to the database

The implementation that decides when to switch reads (the 5 seconds) is
"safe" to use in production but not recommended without adequate testing
with your infrastructure. At GitHub in addition to the 5 seconds since
last write we have a curcuit breaker that checks the replication delay
and will send the query to a replica before the 5 seconds has passed.
This is specific to our application and therefore not something Rails
should be doing for you. You'll need to test and implement more robust
handling of when to switch based on your infrastructure. The auto
switcher in Rails is meant to be a basic implementation / API that acts
as a guide for how to implement autoswitching.

The impementation here is meant to be strict enough that you know how to
implement your own resolver and operations classes but flexible enough
that we're not telling you how to do it.

The middleware is not included automatically and can be installed in
your application with the classes you want to use for the resolver and
operations passed in. If you don't pass any classes into the middleware
the Rails default Resolver and Session classes will be used.

The Resolver decides what parameters define when to
switch, Operations sets timestamps for the Resolver to read from. For
exmaple you may want to use cookies instead of a session so you'd
implement a Resolver::Cookies class and pass that into the middleware
like so:

config.middleware.use ActiveRecord::Middleware::DatabaseSelector,
Resolver, Resolver::Cookies

Your classes can inherit from the existing classes and reimplement the
methods (or implement more methods) that you need to do the switching.
You only need to implement methods that you want to change. For example
if you wanted to set the session token for the last read from a replica
you would reimplement the read_from_replica method in your resolver
class and implement a method that updates a new timestamp in your
operations class.

cc/ @tenderlove @rafaelfranca @matthewd @dhh

@eileencodes eileencodes added this to the 6.0.0 milestone Jan 28, 2019

@eileencodes eileencodes force-pushed the eileencodes:db-selection branch from 7781389 to edbcfe3 Jan 28, 2019

Show resolved Hide resolved activerecord/CHANGELOG.md Outdated

@eileencodes eileencodes force-pushed the eileencodes:db-selection branch from edbcfe3 to 62eae4b Jan 28, 2019

@kamipo

This comment has been minimized.

Copy link
Member

commented Jan 28, 2019

Not caused by this PR, but I'm concerned about using mixed :writing and :reading roles in the same request, since :reading (replica? is true) connection isn't clear the query cache implicitly even if the :writing connection of the reading is executed update queries.

Is it the user responsibility?

@eileencodes

This comment has been minimized.

Copy link
Member Author

commented Jan 28, 2019

@kamipo I'm not sure I follow. Do you mean if during a request that was a get/head we did a write to the database the query cache wouldn't get cleared for that write? Something like this maybe

# get /something/download
def download
  @file = file_download
   ActiveRecord::Base.connected_to(role: :writing) do
     @file.update_download_count
   end

   send_data file
end

I think we have this type of code in the GitHub application and I don't think we have an issue, but let me know if that's the kind of thing you're worried about and I can try to get a test together (or we can solve this later / tell the user they should be responsible for it depending on how serious of an issue we think it is).

@kamipo

This comment has been minimized.

Copy link
Member

commented Jan 29, 2019

I'm worried about the following case. I suppose that people would be confused by the staled cache, since disabling query cache makes working all things as expected. kamipo@855f59c

kamipo/query_cache_test@3cb54dd

diff --git a/app/controllers/posts_controller.rb b/app/controllers/posts_controller.rb
index 90b34dd..1fa3439 100644
--- a/app/controllers/posts_controller.rb
+++ b/app/controllers/posts_controller.rb
@@ -26,7 +26,19 @@ class PostsController < ApplicationController
 
   # PATCH/PUT /posts/1
   def update
+    # (2)
+    #
+    # The following `@post.update(post_params)` is happened on the `:writing` connection,
+    # and then clear the cache on the `:writing` connection.
+    # The cache on the `:reading` connection is still remained.
     if @post.update(post_params)
+      # (3)
+      #
+      # If use the `:reading` connection after update queries are executed on the `:writing` connection,
+      # the query cache on the `:reading` connection is already staled,
+      # people need to care about that case by themselves for now.
+      set_post
+
       render json: @post
     else
       render json: @post.errors, status: :unprocessable_entity
@@ -41,7 +53,15 @@ class PostsController < ApplicationController
   private
     # Use callbacks to share common setup or constraints between actions.
     def set_post
-      @post = Post.find(params[:id])
+      # (1)
+      #
+      # All connection pools are enabling query cache by default.
+      # So the following `Post.find(params[:id])` makes the cache on the `:reading` connection.
+      #
+      # https://github.com/rails/rails/blob/536a190ab3690810a3b342b897f2585c4971229d/activerecord/lib/active_record/query_cache.rb#L31-L33
+      Post.connected_to(role: :reading) do
+        @post = Post.find(params[:id])
+      end
     end
 
     # Only allow a trusted parameter "white list" through.
diff --git a/test/controllers/posts_controller_test.rb b/test/controllers/posts_controller_test.rb
index a35e2e1..b85121d 100644
--- a/test/controllers/posts_controller_test.rb
+++ b/test/controllers/posts_controller_test.rb
@@ -24,8 +24,9 @@ class PostsControllerTest < ActionDispatch::IntegrationTest
   end
 
   test "should update post" do
-    patch post_url(@post), params: { post: { body: @post.body, title: @post.title } }, as: :json
+    patch post_url(@post), params: { post: { body: @post.body, title: "Updated" } }, as: :json
     assert_response 200
+    assert_match %r/"title":"Updated"/, @response.body
   end
 
   test "should destroy post" do
diff --git a/test/test_helper.rb b/test/test_helper.rb
index 0ff12e7..5d82a8b 100644
--- a/test/test_helper.rb
+++ b/test/test_helper.rb
@@ -10,4 +10,11 @@ class ActiveSupport::TestCase
   fixtures :all
 
   # Add more helper methods to be used by all tests here...
+
+  # `enlist_fixture_connections` replaces connection pools in non-default handlers
+  # by default writer connection pool.
+  # We can't test `:reading` connection unless suppressing the effect of the method for now.
+  def enlist_fixture_connections
+    []
+  end
 end
@eileencodes

This comment has been minimized.

Copy link
Member Author

commented Jan 29, 2019

Thanks for the example @kamipo. That is something users shouldn't do. You don't want to wrap the set_post in a reading only block because you don't know whether the replicas will be caught up or not. The request and middleware should handle that particular case for you.

Users should only use the reading block in their application when they know they absolutely want to send all traffic regardless of recent writes to the replicas. We do that in some places in our application as well but we're really careful about when we use it.

@kamipo

This comment has been minimized.

Copy link
Member

commented Jan 29, 2019

Thanks for your explaining, I understand that using the reading block carefully is the user responsibility.

I was encountered the staled query cache issue in our app before. In that time, I thought that the easiest way to solve that issue is not enabling query cache.

But I realized that we have no official way to not enabling query cache, since the QueryCache does two things, removing the enabling query cache causes removing the releasing active connection to pool too.

ActiveRecord::Base.connection_handlers.each do |key, handler|
pools << handler.connection_pool_list.reject { |p| p.query_cache_enabled }.each { |p| p.enable_query_cache! }
end

ActiveRecord::Base.connection_handlers.each do |_, handler|
handler.connection_pool_list.each do |pool|
pool.release_connection if pool.active_connection? && !pool.connection.transaction_open?
end

So people need to care about that the query cache in the replica connections isn't cleared in a request implicitly for now.

@sponomarev

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2019

Hey @eileencodes! Thanks for the awesome idea and the implementation! Looking forward to getting it merged. Do you think that it makes sense to add out-of-the-box resolvers for MySQL and PostgreSQL which take into account actual lag between primary/replica?

@grosser

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2019

This looks pretty good, but it won't work for api calls that don't use a session, right ?
We use something similar, but we cache last-write based on user+ip to avoid that issue,
so would be great if it's easy to plug in a different "decider" later on and maybe call out this downside in the docs.

@tenderlove

This comment has been minimized.

Copy link
Member

commented Jan 29, 2019

@grosser the DatabaseSelector middleware is designed such that you can configure it with a strategy for determining "last write". In your config.ru just provide the DatabaseSelector with a classes that implement the strategy you would like to use. We just default the strategy to use the session.

Show resolved Hide resolved activerecord/CHANGELOG.md Outdated
Show resolved Hide resolved activerecord/CHANGELOG.md Outdated
@NikolayS

This comment has been minimized.

Copy link

commented Jan 29, 2019

Cool and highly desired feature for many projects! Great to see it being implemented inside Rails.

But why 5 sec is hard-coded as a constant? I'm sure many would like to be able to configure it (raising to 30 seconds or even higher).

@fj

This comment has been minimized.

Copy link

commented Jan 29, 2019

Cool and highly desired feature for many projects! Great to see it being implemented inside Rails.

But why 5 sec is hard-coded as constant? I'm sore many would like to be able configure it (perhaps to higher values).

The default is hardcoded, but the actual value used is determined by this method:

https://github.com/rails/rails/pull/35073/files#diff-40ebc1f4683641b47eb0c9a447c08998R77

so one could simply write their own Resolver with whatever behavior is appropriate for their case, e.g.,

# ...
class VeryPatientResolver < Resolver
  def send_to_replica_wait_time
    60.seconds
  end
end

All that's actually needed is to have your own definition of read_from_primary?, though, like the docs say.

@NikolayS

This comment has been minimized.

Copy link

commented Jan 29, 2019

@fj thanks for the answer. Well, it's good that it can be changed without code patching, but with this way, it's still hard to change, it's not so clear.

I think that it would be much better if this parameter can be tuned via changing configuration files, not via coding. In big projects code changes and config changes are two very separated things.

@eileencodes eileencodes force-pushed the eileencodes:db-selection branch from 62eae4b to fd6d4ed Jan 29, 2019

@rails-bot rails-bot bot added the railties label Jan 29, 2019

@eileencodes eileencodes force-pushed the eileencodes:db-selection branch from fd6d4ed to 68e2c2a Jan 29, 2019

@eileencodes eileencodes force-pushed the eileencodes:db-selection branch from 0a3782c to 722405c Jan 29, 2019

@eileencodes

This comment has been minimized.

Copy link
Member Author

commented Jan 29, 2019

As requested by @dhh in the Rails chat I've added configuration options for the delay timeout, and setting the resolver and operations classes.

@eileencodes

This comment has been minimized.

Copy link
Member Author

commented Jan 29, 2019

To answer other unanswered questions here:

Scope creep is real 😄 and while I think it would be awesome to have Rails calculate replica lag it's too much for this PR. This is a good base for Rails 6.

Improving built in switching strategies is definitely something I want to see in the future. We're also experimenting with using MySQL GTID's for switching, but we're not there yet. We're not done improving multiple database and Active Record, and we have a lot more stuff to upstream from GitHub.

Adds basic automatic database switching to Rails
The following PR adds behavior to Rails to allow an application to
automatically switch it's connection from the primary to the replica.

A request will be sent to the replica if:

* The request is a read request (`GET` or `HEAD`)
* AND It's been 2 seconds since the last write to the database (because
we don't want to send a user to a replica if the write hasn't made it
to the replica yet)

A request will be sent to the primary if:

* It's not a GET/HEAD request (ie is a POST, PATCH, etc)
* Has been less than 2 seconds since the last write to the database

The implementation that decides when to switch reads (the 2 seconds) is
"safe" to use in production but not recommended without adequate testing
with your infrastructure. At GitHub in addition to the a 5 second delay
we have a curcuit breaker that checks the replication delay
and will send the query to a replica before the 5 seconds has passed.
This is specific to our application and therefore not something Rails
should be doing for you. You'll need to test and implement more robust
handling of when to switch based on your infrastructure. The auto
switcher in Rails is meant to be a basic implementation / API that acts
as a guide for how to implement autoswitching.

The impementation here is meant to be strict enough that you know how to
implement your own resolver and operations classes but flexible enough
that we're not telling you how to do it.

The middleware is not included automatically and can be installed in
your application with the classes you want to use for the resolver and
operations passed in. If you don't pass any classes into the middleware
the Rails default Resolver and Session classes will be used.

The Resolver decides what parameters define when to
switch, Operations sets timestamps for the Resolver to read from. For
example you may want to use cookies instead of a session so you'd
implement a Resolver::Cookies class and pass that into the middleware
via configuration options.

```
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver = MyResolver
config.active_record.database_operations = MyResolver::MyCookies
```

Your classes can inherit from the existing classes and reimplment the
methods (or implement more methods) that you need to do the switching.
You only need to implement methods that you want to change. For example
if you wanted to set the session token for the last read from a replica
you would reimplement the `read_from_replica` method in your resolver
class and implement a method that updates a new timestamp in your
operations class.

@eileencodes eileencodes force-pushed the eileencodes:db-selection branch from 722405c to 0abcec4 Jan 30, 2019

@eileencodes eileencodes changed the title Adds basic automatic database switching to Rails Part 8: Multi db improvements, Adds basic automatic database switching to Rails Jan 30, 2019

@eileencodes eileencodes merged commit 8ca6bd2 into rails:master Jan 30, 2019

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
codeclimate All good!
Details

@eileencodes eileencodes deleted the eileencodes:db-selection branch Jan 30, 2019

@l33z3r

This comment has been minimized.

Copy link

commented Apr 2, 2019

Hi @eileencodes, really great work on the multi DB support. I am wondering, is it possible to support more than one read replica, and how you would go about it?

@eileencodes

This comment has been minimized.

Copy link
Member Author

commented Apr 2, 2019

You could do it by connecting to multiple read replicas with custom handlers but the autoswitching won't work for that (yet) since it only switches between 2 roles (writing and reading). At GitHub we have a separate layer that load balances the replicas so we only configure it once and they're all on the reading handler/role.

Can you tell me more about what strategy you'd use to choose a replica? It's something we could implement for Rails but since we don't use it at GitHub I'm not sure what the best strategy should be for a new Rails app.

@l33z3r

This comment has been minimized.

Copy link

commented Apr 3, 2019

I know AWS Aurora has the ability to load balance replicas, but we are using a standard Postgres DB at the moment, and I am looking to set up a couple of read replicas to handle the load we have. The only way I can think to do this at the moment would be to implement some sort of round robin load balancing at the application layer. I guess it could be done at the controller level, but I was wondering if it would be nice to include such a thing in Rails, where you can configure a cluster that can be referenced by handler.

I would imagine you would have some setup in your database.yml that allows you to define a cluster under a namespace. Then this cluster namespace could be passed into connected_to

You would then have to implement a way of determining how the connections are iterated over in the cluster to ensure that each database gets roughly the same amount of traffic. This bit is not so easy, but maybe you could do it based off of the current user ID (or some other ID, maybe session??) modulo the number of replicas in the pool. Anyway, this would be something you could opt to configure in an initialiser. This is just something I have been considering implementing for our own application.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.