-
Notifications
You must be signed in to change notification settings - Fork 21.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement granular role and shard swapping #40370
Implement granular role and shard swapping #40370
Conversation
f44e574
to
f7c9127
Compare
1b937b6
to
95e0b0e
Compare
yield | ||
end | ||
else | ||
connection_handler.while_preventing_writes(prevent_writes) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a bug here. We need to move the write prevention into the shard stack, otherwise we can switch connections but the writes prevention remains global. Alternatively it could be moved back to the connection level.
@eileencodes Hi, I have a query on how things would look for an application I'm working on. We have 2 types of abstract classes for AR. One is We have a multitenant architecture and all data of a particular tenant will always live in a single shard. So I plan on doing a connection switch at middlewares - rack / sidekiq middlewares. # frozen_string_literal: true
module Middleware
class Multitenancy
def initialize(app)
@app = app
end
def call(env)
shard = Shard.find_by(tenant_id_from_request_or_job_params)
ApplicationRecord.connected_to(shard: shard.handler, role: :primary) do
@app.call(env)
end
end
end
end However, when interacting with any table from the ApplicationRecord.connected_to(shard: :shard1, role: :primary) do
User.first # reads from ApplicationRecord shard1 primary
Shard.first # reads from GlobalRecord default primary
end Or should I tweak my middleware to always establish a connection to the global shard? GlobalRecord.connected_to(shard: :default, role: :primary) do
shard = Shard.find_by(tenant_id_from_request_or_job_params)
ApplicationRecord.connected_to(shard: shard.handler, role: :primary) do
@app.call(env)
end
end |
@ritikesh not at this time and this PR isn't going to aim to accomplish that. I'd like to keep commentary on this PR to the problem I'm trying to fix here and review of this code. Thanks! |
Fair enough @eileencodes. Apologies for spamming. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me. I'm testing this patch in our app as I write this comment.
I only have a little concern about performance since we are running an extra conditional every time we get a connection handler. I'm not sure if that makes a lot of difference, but I'm wondering if we can't push all this conditional code to an object that we cache in the first time we call the first method. But of course, in order to justify this optimization we need to measure to know if there is any impact or not, so I'm not sure if it worth doing it right now.
connection_handler.while_preventing_writes(prevent_writes) do | ||
self.role_and_shard_stack << { role: role, shard: shard, klass: self } | ||
return_value = yield | ||
return_value.load if return_value.is_a? ActiveRecord::Relation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the connection handler need to know about relation? Not that I have a better place to put this logic but I'm curious to know why we need to care about this after this new logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed to get this test to pass https://github.com/rails/rails/blob/8f5f17cb62af2c616e763125547ea82e1c161ac6/activerecord/test/cases/connection_adapters/connection_handlers_multi_db_test.rb#L75-L84. We also use it in swap_connection_handler
for the legacy version. It was implemented in #38339. Otherwise the return value of the connected to block won't be loaded. We had a few situations at GitHub where teams though it was loading the relation but it wasn't. This resulted in loading a relation from the primary/writer when the author meant to load from the replica/reader database.
I tested in our application that also does heavy usage of the API in the connection handler and after flipping the flag to false all tests passed. |
37bf742
to
8f5f17c
Compare
There was a bug with the preventing writes code that's now been fixed. It took awhile to get the tests working because the legacy vs not legacy code was leaking in the tests. This wouldn't happen in a real app because the legacy handling is either on or off. The diff is a bit bigger now to accommodate for the duplicate testing around the 2 types of handling in preventing writes. For these I split the tests out into their own files. I also found other tests I had missed while doing that. I talked with @rafaelfranca in Basecamp a bit about whether the deprecations were really needed. The deprecations are more like deprecation as documentation since the behavior of the method changes if you're using legacy vs not using legacy. We could drop these but my concern is that they would then silently change behavior without notice. Most apps don't use these methods but a lot of ActiveRecord gems do. For now I'd like to leave them unless they prove problematic. Later today @seejohnrun and I will look into the potential performance issue now that the prevent writes code is fixed and tested. |
ca3920e
to
51092af
Compare
There was a semi-serious performance issue that we tracked down to be in the deprecations themselves for I'll merge tomorrow or early Thursday after doing some additional testing on our app. |
activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb
Outdated
Show resolved
Hide resolved
activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb
Outdated
Show resolved
Hide resolved
@@ -164,6 +164,7 @@ def load_defaults(target_version) | |||
|
|||
if respond_to?(:active_record) | |||
active_record.has_many_inversing = true | |||
active_record.legacy_connection_handling = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there should be an entry in new_framework_defaults_6_1.rb
for this setting. It's not obvious from the deprecation warnings where legacy_connection_handling = false
should be added, and it's probably a better experience if the user is prompted by the upgrade guide to enable it.
It's also not clear from the name alone what legacy_connection_handling = false
does. What do you think of naming it granular_connection_handling = true
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also not clear from the name alone what legacy_connection_handling = false does
I went with legacy_connection_handling
because it reads like something you don't want to use anymore, whereas granular_connection_handling
sounds like a nice feature to have. Additionally we maintained global ActiveRecord::Base behavior so using granular_connection_handling
as the config option reads like we're changing behavior when 99% of apps don't actually use multi-db handling so they don't need to worry about their connections.
think there should be an entry in new_framework_defaults_6_1.rb for this setting.
I'll add it, thanks.
and it's probably a better experience if the user is prompted by the upgrade guide to enable it.
We'll add that when we write the upgrade guide or if there's already a PR I'll add it there. We don't usually update the upgrade guide in master for each feature added.
51092af
to
df6a8e0
Compare
This change allows for a connection to be swapped on role or shard for a class. Previously calling `connected_to` would swap all the connections to a particular role or shard. Granular connection swapping is useful for swapping one connection to reading while leaving all other connection classes on writing. The public methods on connection handler have been updated to behave the same as they did previously on the different handlers. The difference however is instead of calling `ActiveRecord::Base.connection_handlers[:reading].clear_all_connections!` you now call `ActiveRecord::Base.connection_handler.clear_all_connections!` which will clear based on current role set by a `connected_to` block. Outside the context of a `connected_to` block, `clear_all_connections!` can take an optional parameter to clear specific connections by role. The major changes in this PR are: * We introduced a `legacy_connection_handling` configuration option that is set to true by default. It will be set to `false` for all new applications. * In the new connection handling there will be one only connection handler. Previously there was a connection handler for each role. Now the role is stored in the `PoolManager`. In order to maintain backwards compatibility we introduced a `LegacyPoolManager` to avoid duplicate conditionals. See diagram in PR body for changes to connection management. * `connected_to` will now use a stacked concurrent map to keep track of the connection for each class. For each opened block the `class`, `role`, and `shard` will be added to the stack, when the block is exited the `class`, `role`, `shard` array will be removed from the stack. * With these changes `ActiveRecord::Base.connected_to` will remain global. If called all connections in the block will use the `role` and `shard` that was switched to. If called with a parent class like `AnimalsRecord.connected_to` only models under `AnimalsRecord` will be switched and everything else will remain the same. Examples: Given an application we have a `User` model that inherits from `ApplicationRecord` and a `Dog` model that inherits from `AnimalsRecord`. `AnimalsRecord` and `ApplicationRecord` have writing and reading connections as well as shard `default`, `one`, and `two`. ```ruby ActiveRecord::Base.connected_to(role: :reading) do User.first # reads from default replica Dog.first # reads from default replica AnimalsRecord.connected_to(role: :writing, shard: :one) do User.first # reads from default replica Dog.first # reads from shard one primary end User.first # reads from default replica Dog.first # reads from default replica ApplicationRecord.connected_to(role: :writing, shard: :two) do User.first # reads from shard two primary Dog.first # reads from default replica end end ``` Things this PR does not solve: * Currently there is no API for swapping more than one but not all connections. Apps with many primaries may want to swap 3 but not all 10 connections. We plan to build an API for that in a followup PR. * The middleware remains the same and is using the global switching methods. Therefore at this time to use this new feature applications must manually switch connections. We will also address this in a followup PR. * The `schema_cache` is currently on the `PoolConfig`. We plan on trying to move this up to the `PoolManager` or elsewhere later on so each `PoolConfig` doesn't need to hold a reference to the `schema_cache`. Co-authored-by: John Crepezzi <john.crepezzi@gmail.com>
df6a8e0
to
31461d8
Compare
Spent some time testing this today on our app and found a couple of small issues. Mainly the deprecations around the methods on connection handler were incorrect - we have some code in fixtures that expects to not call |
2a30666
to
69a5fa9
Compare
This is a separate commit because I want it to be easy to revert if we change our minds. After some discussion I think it is confusing that you could swap shard but not role granularly in legacy mode. This change forces users to always either have global swapping until moved off legacy mode. This will prevent a situation where `AnimalsBase` can change the shard granularly but the role globally.
69a5fa9
to
6b110d7
Compare
Implement docs for rails#40370
Now that we have implemented granular connection swapping in rails#40370 we need a new API that will allow connections to multiple databases. The reason we need this API is it will prevent deep nesting in cases where we know that we want 3 of our 5 databases to connect to reading and leave the rest on writing. With this API, instead of writing: ```ruby AnimalsRecord.connected_to(role: :reading) do MealsRecord.connected_to(role: :reading) do Dog.first # read from animals replica Dinner.first # read from meals replica Person.first # read from primary writer end end ``` This API would allow you to write: ```ruby ActiveRecord::Base.connected_to_many([AnimalsRecord, MealsRecord], role: :reading) do Dog.first # read from animals replica Dinner.first # read from meals replica Person.first # read from primary writer end ``` This would come in especially handy for deeper nesting past 2 databases. Co-authored-by: John Crepezzi <john.crepezzi@gmail.com>
This PR implements the ability for an application to switch a role or shard for a specific owner class rather than for all classes globally. We maintained backwards compatibility for all applications currently using multiple databases by ensuring that
ActiveRecord::Base
remains global. While many applications don't use the public methods on connection_handler (we do at GitHub but it's not a common pattern) we implemented deprecation warnings since the behavior of what the methods are returning will changed. Most applications can simply setlegacy_connection_handling
tofalse
and no other changes would be required. We duplicated the tests for connection handling in order to test the legacy handling and new handling for backwards compatibility. This PR does not yet contain documentation. We'll add that once we've had some discussion on the new implementation.The changes are relatively minimal. The majority of the diff in this PR are updates to the tests as well as deprecation warnings / silencing warnings on framework level code. The new handling uses a single connection handler and stores the writing/reading/other roles in the
PoolManager
which allows us to switch a single parent class but not all classes. We chose not to implement the ability to switch individual models at this time. Only abstract models can switch connections.Previously connection handling worked like this diagram below:
With the changes here the new implementation works like this:
Commit Message
This change allows for a connection to be swapped on role or shard for a
class. Previously calling
connected_to
would swap all the connectionsto a particular role or shard. Granular connection swapping is useful
for swapping one connection to reading while leaving all other
connection classes on writing.
Any method with a public interface change or a method that changed the
return values throws a deprecation warning in order to preserve previous
behavior. These methods are silenced internally so that the framework
does not throw deprecation warnings.
The major changes in this PR are:
legacy_connection_handling
configuration option thatis set to true by default. It will be set to
false
for all newapplications.
handler. Previously there was a connection handler for each role. Now
the role is stored in the
PoolManager
. In order to maintain backwardscompatibility we introduced a
LegacyPoolManager
to avoid duplicateconditionals. See diagram in PR body for changes to connection
management.
connected_to
will now use a stacked concurrent map to keep track ofthe connection for each class. For each opened block the
class
,role
, andshard
will be added to the stack, when the block is exitedthe
class
,role
,shard
array will be removed from the stack.ActiveRecord::Base.connected_to
will remainglobal. If called all connections in the block will use the
role
andshard
that was switched to. If called with a parent class likeAnimalsRecord.connected_to
only models underAnimalsRecord
will beswitched and everything else will remain the same.
Examples:
Given an application we have a
User
model that inherits fromApplicationRecord
and aDog
model that inherits fromAnimalsRecord
.AnimalsRecord
andApplicationRecord
have writingand reading connections as well as shard
default
,one
, andtwo
.Things this PR does not solve:
connections. Apps with many primaries may want to swap 3 but not all 10
connections. We plan to build an API for that in a followup PR.
methods. Therefore at this time to use this new feature applications
must manually switch connections. We will also address this in a
followup PR.
schema_cache
is currently on thePoolConfig
. We plan on tryingto move this up to the
PoolManager
or elsewhere later on so eachPoolConfig
doesn't need to hold a reference to theschema_cache
.Co-authored-by: John Crepezzi john.crepezzi@gmail.com
cc/ @seejohnrun @rafaelfranca @matthewd @tenderlove @jhawthorn @casperisfine