Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Part 8: Multi db improvements, Adds basic automatic database switching to Rails #35073

Merged
merged 1 commit into from
Jan 30, 2019

Conversation

eileencodes
Copy link
Member

The following PR adds behavior to Rails to allow an application to
automatically switch it's connection from the primary to the replica.

A request will be sent to the replica if:

  • The request is a read request (GET or HEAD)
  • AND It's been 5 seconds since the last write to the database (because
    we don't want to send a user to a replica if the write hasn't made it
    to the replica yet)

A request will be sent to the primary if:

  • It's not a GET/HEAD request (ie is a POST, PATCH, etc)
  • Has been less than 5 seconds since the last write to the database

The implementation that decides when to switch reads (the 5 seconds) is
"safe" to use in production but not recommended without adequate testing
with your infrastructure. At GitHub in addition to the 5 seconds since
last write we have a curcuit breaker that checks the replication delay
and will send the query to a replica before the 5 seconds has passed.
This is specific to our application and therefore not something Rails
should be doing for you. You'll need to test and implement more robust
handling of when to switch based on your infrastructure. The auto
switcher in Rails is meant to be a basic implementation / API that acts
as a guide for how to implement autoswitching.

The impementation here is meant to be strict enough that you know how to
implement your own resolver and operations classes but flexible enough
that we're not telling you how to do it.

The middleware is not included automatically and can be installed in
your application with the classes you want to use for the resolver and
operations passed in. If you don't pass any classes into the middleware
the Rails default Resolver and Session classes will be used.

The Resolver decides what parameters define when to
switch, Operations sets timestamps for the Resolver to read from. For
exmaple you may want to use cookies instead of a session so you'd
implement a Resolver::Cookies class and pass that into the middleware
like so:

config.middleware.use ActiveRecord::Middleware::DatabaseSelector,
Resolver, Resolver::Cookies

Your classes can inherit from the existing classes and reimplement the
methods (or implement more methods) that you need to do the switching.
You only need to implement methods that you want to change. For example
if you wanted to set the session token for the last read from a replica
you would reimplement the read_from_replica method in your resolver
class and implement a method that updates a new timestamp in your
operations class.

cc/ @tenderlove @rafaelfranca @matthewd @dhh

activerecord/CHANGELOG.md Outdated Show resolved Hide resolved
@kamipo
Copy link
Member

kamipo commented Jan 28, 2019

Not caused by this PR, but I'm concerned about using mixed :writing and :reading roles in the same request, since :reading (replica? is true) connection isn't clear the query cache implicitly even if the :writing connection of the reading is executed update queries.

Is it the user responsibility?

@eileencodes
Copy link
Member Author

eileencodes commented Jan 28, 2019

@kamipo I'm not sure I follow. Do you mean if during a request that was a get/head we did a write to the database the query cache wouldn't get cleared for that write? Something like this maybe

# get /something/download
def download
  @file = file_download
   ActiveRecord::Base.connected_to(role: :writing) do
     @file.update_download_count
   end

   send_data file
end

I think we have this type of code in the GitHub application and I don't think we have an issue, but let me know if that's the kind of thing you're worried about and I can try to get a test together (or we can solve this later / tell the user they should be responsible for it depending on how serious of an issue we think it is).

@kamipo
Copy link
Member

kamipo commented Jan 29, 2019

I'm worried about the following case. I suppose that people would be confused by the staled cache, since disabling query cache makes working all things as expected. kamipo@855f59c

kamipo/query_cache_test@3cb54dd

diff --git a/app/controllers/posts_controller.rb b/app/controllers/posts_controller.rb
index 90b34dd..1fa3439 100644
--- a/app/controllers/posts_controller.rb
+++ b/app/controllers/posts_controller.rb
@@ -26,7 +26,19 @@ class PostsController < ApplicationController
 
   # PATCH/PUT /posts/1
   def update
+    # (2)
+    #
+    # The following `@post.update(post_params)` is happened on the `:writing` connection,
+    # and then clear the cache on the `:writing` connection.
+    # The cache on the `:reading` connection is still remained.
     if @post.update(post_params)
+      # (3)
+      #
+      # If use the `:reading` connection after update queries are executed on the `:writing` connection,
+      # the query cache on the `:reading` connection is already staled,
+      # people need to care about that case by themselves for now.
+      set_post
+
       render json: @post
     else
       render json: @post.errors, status: :unprocessable_entity
@@ -41,7 +53,15 @@ class PostsController < ApplicationController
   private
     # Use callbacks to share common setup or constraints between actions.
     def set_post
-      @post = Post.find(params[:id])
+      # (1)
+      #
+      # All connection pools are enabling query cache by default.
+      # So the following `Post.find(params[:id])` makes the cache on the `:reading` connection.
+      #
+      # https://github.com/rails/rails/blob/536a190ab3690810a3b342b897f2585c4971229d/activerecord/lib/active_record/query_cache.rb#L31-L33
+      Post.connected_to(role: :reading) do
+        @post = Post.find(params[:id])
+      end
     end
 
     # Only allow a trusted parameter "white list" through.
diff --git a/test/controllers/posts_controller_test.rb b/test/controllers/posts_controller_test.rb
index a35e2e1..b85121d 100644
--- a/test/controllers/posts_controller_test.rb
+++ b/test/controllers/posts_controller_test.rb
@@ -24,8 +24,9 @@ class PostsControllerTest < ActionDispatch::IntegrationTest
   end
 
   test "should update post" do
-    patch post_url(@post), params: { post: { body: @post.body, title: @post.title } }, as: :json
+    patch post_url(@post), params: { post: { body: @post.body, title: "Updated" } }, as: :json
     assert_response 200
+    assert_match %r/"title":"Updated"/, @response.body
   end
 
   test "should destroy post" do
diff --git a/test/test_helper.rb b/test/test_helper.rb
index 0ff12e7..5d82a8b 100644
--- a/test/test_helper.rb
+++ b/test/test_helper.rb
@@ -10,4 +10,11 @@ class ActiveSupport::TestCase
   fixtures :all
 
   # Add more helper methods to be used by all tests here...
+
+  # `enlist_fixture_connections` replaces connection pools in non-default handlers
+  # by default writer connection pool.
+  # We can't test `:reading` connection unless suppressing the effect of the method for now.
+  def enlist_fixture_connections
+    []
+  end
 end

@eileencodes
Copy link
Member Author

Thanks for the example @kamipo. That is something users shouldn't do. You don't want to wrap the set_post in a reading only block because you don't know whether the replicas will be caught up or not. The request and middleware should handle that particular case for you.

Users should only use the reading block in their application when they know they absolutely want to send all traffic regardless of recent writes to the replicas. We do that in some places in our application as well but we're really careful about when we use it.

@kamipo
Copy link
Member

kamipo commented Jan 29, 2019

Thanks for your explaining, I understand that using the reading block carefully is the user responsibility.

I was encountered the staled query cache issue in our app before. In that time, I thought that the easiest way to solve that issue is not enabling query cache.

But I realized that we have no official way to not enabling query cache, since the QueryCache does two things, removing the enabling query cache causes removing the releasing active connection to pool too.

ActiveRecord::Base.connection_handlers.each do |key, handler|
pools << handler.connection_pool_list.reject { |p| p.query_cache_enabled }.each { |p| p.enable_query_cache! }
end

ActiveRecord::Base.connection_handlers.each do |_, handler|
handler.connection_pool_list.each do |pool|
pool.release_connection if pool.active_connection? && !pool.connection.transaction_open?
end

So people need to care about that the query cache in the replica connections isn't cleared in a request implicitly for now.

@sponomarev
Copy link
Contributor

Hey @eileencodes! Thanks for the awesome idea and the implementation! Looking forward to getting it merged. Do you think that it makes sense to add out-of-the-box resolvers for MySQL and PostgreSQL which take into account actual lag between primary/replica?

@grosser
Copy link
Contributor

grosser commented Jan 29, 2019

This looks pretty good, but it won't work for api calls that don't use a session, right ?
We use something similar, but we cache last-write based on user+ip to avoid that issue,
so would be great if it's easy to plug in a different "decider" later on and maybe call out this downside in the docs.

@tenderlove
Copy link
Member

@grosser the DatabaseSelector middleware is designed such that you can configure it with a strategy for determining "last write". In your config.ru just provide the DatabaseSelector with a classes that implement the strategy you would like to use. We just default the strategy to use the session.

activerecord/CHANGELOG.md Outdated Show resolved Hide resolved
activerecord/CHANGELOG.md Outdated Show resolved Hide resolved
@NikolayS
Copy link

NikolayS commented Jan 29, 2019

Cool and highly desired feature for many projects! Great to see it being implemented inside Rails.

But why 5 sec is hard-coded as a constant? I'm sure many would like to be able to configure it (raising to 30 seconds or even higher).

@fj
Copy link

fj commented Jan 29, 2019

Cool and highly desired feature for many projects! Great to see it being implemented inside Rails.

But why 5 sec is hard-coded as constant? I'm sore many would like to be able configure it (perhaps to higher values).

The default is hardcoded, but the actual value used is determined by this method:

https://github.com/rails/rails/pull/35073/files#diff-40ebc1f4683641b47eb0c9a447c08998R77

so one could simply write their own Resolver with whatever behavior is appropriate for their case, e.g.,

# ...
class VeryPatientResolver < Resolver
  def send_to_replica_wait_time
    60.seconds
  end
end

All that's actually needed is to have your own definition of read_from_primary?, though, like the docs say.

@NikolayS
Copy link

@fj thanks for the answer. Well, it's good that it can be changed without code patching, but with this way, it's still hard to change, it's not so clear.

I think that it would be much better if this parameter can be tuned via changing configuration files, not via coding. In big projects code changes and config changes are two very separated things.

@eileencodes
Copy link
Member Author

As requested by @dhh in the Rails chat I've added configuration options for the delay timeout, and setting the resolver and operations classes.

@eileencodes
Copy link
Member Author

To answer other unanswered questions here:

Scope creep is real 😄 and while I think it would be awesome to have Rails calculate replica lag it's too much for this PR. This is a good base for Rails 6.

Improving built in switching strategies is definitely something I want to see in the future. We're also experimenting with using MySQL GTID's for switching, but we're not there yet. We're not done improving multiple database and Active Record, and we have a lot more stuff to upstream from GitHub.

The following PR adds behavior to Rails to allow an application to
automatically switch it's connection from the primary to the replica.

A request will be sent to the replica if:

* The request is a read request (`GET` or `HEAD`)
* AND It's been 2 seconds since the last write to the database (because
we don't want to send a user to a replica if the write hasn't made it
to the replica yet)

A request will be sent to the primary if:

* It's not a GET/HEAD request (ie is a POST, PATCH, etc)
* Has been less than 2 seconds since the last write to the database

The implementation that decides when to switch reads (the 2 seconds) is
"safe" to use in production but not recommended without adequate testing
with your infrastructure. At GitHub in addition to the a 5 second delay
we have a curcuit breaker that checks the replication delay
and will send the query to a replica before the 5 seconds has passed.
This is specific to our application and therefore not something Rails
should be doing for you. You'll need to test and implement more robust
handling of when to switch based on your infrastructure. The auto
switcher in Rails is meant to be a basic implementation / API that acts
as a guide for how to implement autoswitching.

The impementation here is meant to be strict enough that you know how to
implement your own resolver and operations classes but flexible enough
that we're not telling you how to do it.

The middleware is not included automatically and can be installed in
your application with the classes you want to use for the resolver and
operations passed in. If you don't pass any classes into the middleware
the Rails default Resolver and Session classes will be used.

The Resolver decides what parameters define when to
switch, Operations sets timestamps for the Resolver to read from. For
example you may want to use cookies instead of a session so you'd
implement a Resolver::Cookies class and pass that into the middleware
via configuration options.

```
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver = MyResolver
config.active_record.database_operations = MyResolver::MyCookies
```

Your classes can inherit from the existing classes and reimplment the
methods (or implement more methods) that you need to do the switching.
You only need to implement methods that you want to change. For example
if you wanted to set the session token for the last read from a replica
you would reimplement the `read_from_replica` method in your resolver
class and implement a method that updates a new timestamp in your
operations class.
@eileencodes eileencodes changed the title Adds basic automatic database switching to Rails Part 8: Multi db improvements, Adds basic automatic database switching to Rails Jan 30, 2019
@eileencodes eileencodes merged commit 8ca6bd2 into rails:master Jan 30, 2019
@eileencodes eileencodes deleted the db-selection branch January 30, 2019 19:04
@l33z3r
Copy link

l33z3r commented Apr 2, 2019

Hi @eileencodes, really great work on the multi DB support. I am wondering, is it possible to support more than one read replica, and how you would go about it?

@eileencodes
Copy link
Member Author

You could do it by connecting to multiple read replicas with custom handlers but the autoswitching won't work for that (yet) since it only switches between 2 roles (writing and reading). At GitHub we have a separate layer that load balances the replicas so we only configure it once and they're all on the reading handler/role.

Can you tell me more about what strategy you'd use to choose a replica? It's something we could implement for Rails but since we don't use it at GitHub I'm not sure what the best strategy should be for a new Rails app.

@l33z3r
Copy link

l33z3r commented Apr 3, 2019

I know AWS Aurora has the ability to load balance replicas, but we are using a standard Postgres DB at the moment, and I am looking to set up a couple of read replicas to handle the load we have. The only way I can think to do this at the moment would be to implement some sort of round robin load balancing at the application layer. I guess it could be done at the controller level, but I was wondering if it would be nice to include such a thing in Rails, where you can configure a cluster that can be referenced by handler.

I would imagine you would have some setup in your database.yml that allows you to define a cluster under a namespace. Then this cluster namespace could be passed into connected_to

You would then have to implement a way of determining how the connections are iterated over in the cluster to ensure that each database gets roughly the same amount of traffic. This bit is not so easy, but maybe you could do it based off of the current user ID (or some other ID, maybe session??) modulo the number of replicas in the pool. Anyway, this would be something you could opt to configure in an initialiser. This is just something I have been considering implementing for our own application.

end

def reading_request?(request)
request.get? || request.head?
Copy link
Contributor

@bf4 bf4 Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eileencodes These methods are private, but is it otherwise a 'public' interface for building database selector subclasses? (I've read the through the docs and PRS. Apologies if I missed something or if would have been better to ask on one of the ruby on rails lists. Commenting here, even though code has since changed in other ways, because here it where it was introduced.)

My use case: I want to specify that a particular route is always connected to the leader, and adjusting reading_request? seems the right place to do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, on Rails 6.0, this is what I ended up doing

config/application.rb

    require Rails.root.join 'lib', 'extensions', 'active_record', 'middleware_database_selector_patch'
    ActiveRecord::Middleware::DatabaseSelector.prepend(MiddlewareDatabaseSelectorPatch)
    if enable_request_db_selection
      config.active_record.database_selector = { delay: 2.seconds }
      config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver
      config.active_record.database_resolver_context = ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
    end

where

module MiddlewareDatabaseSelectorPatch

  def self.prepended(base)
    super
    base.class_eval do
      class_attribute :leader_only_paths, instance_writer: false, default: Set.new
    end
  end

  private

  # https://github.com/rails/rails/blob/6-0-stable/activerecord/lib/active_record/middleware/database_selector.rb
  def reading_request?(request)
    if (is_reading = super)
      is_leader_only_path = leader_only_paths.any? {|leader_only_path|
        request.path&.start_with?(leader_only_path)
      }
      !is_leader_only_path
    else
      is_reading
    end
  end
end

so in config/routes I can

  ActiveRecord::Middleware::DatabaseSelector.leader_only_paths << "/pghero"

I didn't test perf on it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a similar use case here. We want to send GraphQL queries to our read replica, but GQL queries always go over POST. A public API for this would be great.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw @tenderlove 's comment that kind of answers this #35073 (comment)

I'm not sure I understand the config.ru part, is there an example floating around anywhere?

Copy link
Member

@ghiculescu ghiculescu Jul 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking through this more, I think a simple API would be to move the reading_request?(request) method into the Resolver class.

That way, when you create a custom resolver you have two options:

  • Override reading_request? to send request to the read/write database based on request (eg. based on request method, path, or headers).
  • Override read_from_primary? to send queries to the primary/replica database based on the resolver context (or whatever else you like).

In our case we'd mostly be overriding reading_request?.

@eileencodes would you accept a PR that adds this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the classes are designed to be overwritten. If you want reading_request? to behave differently write your own DatabaseSelector and include that in the configuration. I don't see a reason to move this at the moment. If you want to discuss this more lets use http://discuss.rubyonrails.org/ or a new issue. Otherwise the conversation gets stuck here and it's harder to include others.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment too. Thanks @eileencodes for writing this great feature - this question is my only issue with something that's been very useful for us otherwise ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.