Part 4: Multi db improvements, Basic API for connection switching #34052
    def connected_to(database: nil, handler: nil, &blk)
      if database && handler
        raise ArgumentError, "connected_to can only accept handler or database, but not both arguments."
      end
tenderlove
Oct 2, 2018
Member
Should we raise an exception if both are nil?
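One way to answer the question above is to reject the call when neither argument is given. The sketch below is a minimal, self-contained illustration of that check. `ToySwitcher` and the wording of the second error message are assumptions made for this example, not code from the PR.

```ruby
# Toy stand-in for the argument check under review; not the Rails implementation.
module ToySwitcher
  def self.connected_to(database: nil, handler: nil)
    if database && handler
      raise ArgumentError, "connected_to can only accept handler or database, but not both arguments."
    elsif database.nil? && handler.nil?
      # Hypothetical message: the thread has not settled on wording for this case.
      raise ArgumentError, "connected_to needs either a handler or a database."
    end

    # A real implementation would swap connections here before yielding.
    yield
  end
end

ToySwitcher.connected_to(handler: :reading) { :ok } # returns :ok
```

Raising in both cases keeps a call with no arguments from silently running the block on whatever connection happens to be active.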
I like that we can use
|
What would happen if two models have the same handler name for two different databases? Should we support that? Say:
Also say we have that scenario, how would the following code work?

    Dog.connected_to(handler: :reading) do
      Dog.create!
      Book.create!
    end

If I got the implementation correctly |
That's exactly what I want to support because that's how we do it at GitHub. We have 10 connections that belong to a

At GitHub we don't write this:

    Dog.connected_to(handler: :reading) do
      Dog.create! # explodes in Dog because it's a write on a read connection
      Book.create! # isn't called, but not because it's Dog's handler: you told Rails what handler to use - `reading`.
    end

Instead we write this (but with GitHub instead of AR Base because Rails doesn't support this yet):

    ActiveRecord::Base.connected_to(handler: :reading) do
      Dog.create!
      Book.create!
    end

If we want to write to multiple dbs we can do that by using the writing handler:

    ActiveRecord::Base.connected_to(handler: :writing) do
      Dog.create! # success
      Book.create! # success
    end
|
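The behaviour described above (one handler per role, shared by all models, with each model still resolving to its own database) can be sketched in plain Ruby. This is a toy model under stated assumptions: `ToyBase`, `database_for`, and the database names are hypothetical and only mimic the idea, not Active Record's internals.

```ruby
# Toy model: one handler per role, and each handler maps a model to its own
# database. All names here are hypothetical.
module ToyBase
  HANDLERS = {
    writing: { "Dog" => "animals",         "Book" => "primary" },
    reading: { "Dog" => "animals_replica", "Book" => "primary_replica" }
  }.freeze

  @current_handler = :writing

  class << self
    attr_reader :current_handler

    # Swap the active handler for every model, then restore it afterwards.
    def connected_to(handler:)
      raise ArgumentError, "unknown handler #{handler.inspect}" unless HANDLERS.key?(handler)

      previous = @current_handler
      @current_handler = handler
      begin
        yield
      ensure
        @current_handler = previous
      end
    end

    # Which database a model would talk to under the active handler.
    def database_for(model_name)
      HANDLERS.fetch(@current_handler).fetch(model_name)
    end
  end
end

ToyBase.connected_to(handler: :reading) do
  ToyBase.database_for("Dog")  # "animals_replica"
  ToyBase.database_for("Book") # "primary_replica"
end
```

Because the switch happens on the shared base rather than on `Dog`, both models move to their own replica inside the block and return to the writing handler when it exits.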
@rafaelfranca I think we should support your first scenario, but if you want to switch both models you need to do it at |
I like this. A few notes:

If you're connecting directly to a specific database, you shouldn't have to declare the role:

    ModelInPrimary.connected_to(database: :primary_replica_slow) do
      ModelInPrimary.do_something_thats_slow
    end

Re: handler, I don't really like that word much. I'd prefer to use "role". That would connect with the future 3-tier database.yml configuration setup as well. So it would be:

    ActiveRecord::Base.connected_to(role: :reading) do
      Dog.create!
      Book.create! # Will raise if a :reading role isn't found on Book
    end

@tenderlove I'd be curious to see how many instances of connection switching you have in the code? I was initially partial to having some syntactic sugar, but I don't think switching roles mid-flight is going to be a super common action. And if it isn't, then I'd rather be as clear as possible about what's going on.

On the larger topic of r/w splitting, @eileencodes, you're working towards a place where AR will automatically pick the :writing role when AR is doing INSERTs and the :reading role when AR is doing SELECTs, right? I thought there was some confusion about whether that's within this initial scope of work when discussing with @matthewd in the earlier thread. |
I can change this. For background, handler makes sense to me since it is switching on the connection_handler, but perhaps that's too much for the user to need to know.
Yes, but this is further down the line (i.e. not for this PR). Rails needs to be able to switch connections before it can know what to switch to.
We actually do this quite a bit since we default to the replicas, except in certain circumstances where we need to explicitly call readonly.
|
More than I thought. I was counting and then @eileencodes finished before me. |
So your default replicas are not readonlys? If we get AR to do the automatic r/w splitting, would you still need as many explicit calls? Or would you only need it when using slow-read dbs? |
A question: Can this switching be used as a failover? Let's say, a primary connection failed, switch to secondary (backup) one. |
@dhh we default to read but switch on the request type (GET == read, POST == write) rather than the SQL query. So in some cases we need to switch back to the read or to the write in order to handle that. I assume we will need fewer of those if we have Rails auto-switch based on SQL rather than request type. I think that if we really do need the helper methods we can add those later. @deepj No. We're quite a ways away from something like that. |
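The request-type rule mentioned above (GET == read, POST == write) can be sketched as a small helper. `role_for_request` is a hypothetical name, and treating every non-GET verb as a write is an assumption of this sketch rather than something stated in the thread.

```ruby
# Hypothetical helper: choose a role from the HTTP verb (GET == read,
# POST == write). Every verb other than GET is treated as a write here,
# which is an assumption of this sketch.
def role_for_request(http_method)
  http_method.to_s.upcase == "GET" ? :reading : :writing
end

role_for_request("GET")  # => :reading
role_for_request("POST") # => :writing
```

A rule like this is coarse, since a GET request can still need a write, which is why explicit switching inside a request remains necessary.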
Yes, we use the multiple handlers, but it seems the implementation stores the name of the handlers in Would not:
fail because |
Ok I think I get it. The handler is the same for all models but it holds a connection pool for each model with a different I agree with DHH's suggestions for the API. |
Yup! That's exactly how it works. I'm writing up some tests and will be pushing up later this weekend or early next week. I think we're almost ready to merge this (with DHH's changes). That will unblock a lot of the future work. Also @matthewd originally had some concerns about threads but we paired today and found it's not a problem. The connection handler is thread local so we're good there |
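To illustrate why thread-local state addresses the threading concern, here is a minimal sketch using Ruby's `Thread#thread_variable_get` and `Thread#thread_variable_set`. How Rails actually stores its handler is not shown in this thread, so `ToySwitch` and its key name are assumptions made for illustration only.

```ruby
# Toy illustration of thread-local switching; not the Rails implementation.
module ToySwitch
  KEY = :toy_current_role # hypothetical key name

  # The role active on the calling thread, defaulting to :writing.
  def self.current_role
    Thread.current.thread_variable_get(KEY) || :writing
  end

  # Switch the role for the calling thread only, restoring it afterwards.
  def self.connected_to(role:)
    previous = Thread.current.thread_variable_get(KEY)
    Thread.current.thread_variable_set(KEY, role)
    begin
      yield
    ensure
      Thread.current.thread_variable_set(KEY, previous)
    end
  end
end

ToySwitch.connected_to(role: :reading) do
  ToySwitch.current_role                      # :reading on this thread
  Thread.new { ToySwitch.current_role }.value # :writing on a new thread
end
```

Since the state lives on the thread, a block that switches roles in one thread does not change what another thread sees.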
291b558 to 008a3e6
|
008a3e6 to 7a609db
I think That's why I like |
Wow. I swear your post said |
7a609db to 72f7bb9
|
This PR adds the ability to 1) connect to multiple databases in a model, and 2) switch between those connections using a block.

To connect a model to a set of databases for writing and reading use the following API. This API supersedes `establish_connection`. The `writing` and `reading` keys represent handler / role names and `animals` and `animals_replica` represents the database key to look up the configuration hash from.

```
class AnimalsBase < ApplicationRecord
  connects_to database: { writing: :animals, reading: :animals_replica }
end
```

Inside the application - outside the model declaration - we can switch connections with a block call to `connected_to`.

If we want to connect to a db that isn't default (ie readonly_slow) we can connect like this:

Outside the model we may want to connect to a new database (one that is not in the default writing/reading set) - for example a slow replica for making slow queries. To do this we have the `connected_to` method that takes a `database` hash that matches the signature of `connects_to`. The `connected_to` method also takes a block.

```
ActiveRecord::Base.connected_to(database: { slow_readonly: :primary_replica_slow }) do
  ModelInPrimary.do_something_thats_slow
end
```

For models that are already loaded and connections that are already connected, `connected_to` doesn't need to pass in a `database` because you may want to run queries against multiple databases using a specific role/handler.

In this case `connected_to` can take a `role` and use that to swap on the connection passed. This simplifies queries - and matches how we do it in GitHub. Once you're connected to the database you don't need to re-connect, we assume the connection is in the pool and simply pass the handler we'd like to swap on.

```
ActiveRecord::Base.connected_to(role: :reading) do
  Dog.read_something_from_dog
  ModelInPrimary.do_something_from_model_in_primary
end
```
72f7bb9 to 31021a8
Is this multi-DB support only for primary-replica setups? Curious to know if there are any plans for supporting migrations / rake tasks on different databases?

In our case, we do all business logic using regular AR models connected to our main DB but we have a separate DB for logs (which don’t require ACID guarantees, can live on a DB with lower CPU / RAM specs, etc).

What we currently do is pretty simple: we keep a

To support this, our only custom piece of code is a mapping of paths in

    require_relative 'boot'
    # ...
    module Missive
      class Application < Rails::Application
        #
        # Support rake tasks on separate logs DB
        #
        if ENV['DB'] == 'logs'
          {
            'config/database' => 'config/database_logs.yml',
            'db' => 'db_logs',
            'db/migrate' => 'db_logs/migrate',
            'db/seeds.rb' => 'db_logs/seeds.rb',
          }.each do |a, b|
            config.paths[a] = Rails.root.join(b)
          end
        end
        # ...
      end
    end

This lets us run migrations on the logs DB by adding

    # main DB
    bundle exec rake db:migrate:status
    # logs DB
    bundle exec rake db:migrate:status DB=logs

The following part isn’t relevant to the question but for completeness, we configure log models (for web requests and background workers) like so:

    # app/models/log_record.rb
    class LogRecord < ApplicationRecord
      self.abstract_class = true
      establish_connection begin
        erb = File.read(Rails.root.join('config/database_logs.yml'))
        YAML.load(ERB.new(erb).result)[Rails.env]
      end
    end

    # app/models/request_log.rb
    class RequestLog < LogRecord
      # ...
    end

    # app/models/worker_log.rb
    class WorkerLog < LogRecord
      # ...
    end

As you can see, it was pretty trivial for us to implement this pattern, yet the |
…ishes connection Related to rails#34052
Ensure that the method raises with both `database` and `role` arguments Ensure that the method raises without `database` and `role` Related to rails#34052
Ensure that the method returns an array of established connections Related to rails#34052
Since both methods are public API I think it makes sense to add these tests in order to prevent any regression in the behavior of those methods after the 6.0 release.

Exercise `connected_to`
- Ensure that the method raises with both `database` and `role` arguments
- Ensure that the method raises without `database` and `role`

Exercise `connects_to`
- Ensure that the method returns an array of established connections (as mentioned in the docs of the method)

Related to rails#34052
It seems that
or am I missing something @eileencodes ? |
No you're not missing something, Rails does not yet handle joining across separate databases. We're working on supporting the ability for Rails to recognize the connections are different and to split up the queries into 2 selects but the join syntax isn't going to be possible across 2 machines. |
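The idea of replacing a cross-database join with two selects can be illustrated without a database. In the sketch below, two in-memory arrays stand in for tables on separate machines. The data, the constant names, and `dogs_owned_by` are all hypothetical, and this is not how Rails implements or plans to implement the feature.

```ruby
# In-memory stand-ins for two tables living in two separate databases.
# The data and names are hypothetical.
OWNERS_DB = [
  { id: 10, name: "Ann" },
  { id: 11, name: "Bob" }
].freeze

DOGS_DB = [
  { id: 1, owner_id: 10, name: "Rex" },
  { id: 2, owner_id: 11, name: "Fido" },
  { id: 3, owner_id: 10, name: "Bo" }
].freeze

# Instead of one JOIN, run one lookup per database and stitch the results.
def dogs_owned_by(owner_name)
  # "Select" 1, against the owners database: collect the matching ids.
  owner_ids = OWNERS_DB.select { |owner| owner[:name] == owner_name }
                       .map { |owner| owner[:id] }

  # "Select" 2, against the dogs database: the equivalent of
  # WHERE owner_id IN (...).
  DOGS_DB.select { |dog| owner_ids.include?(dog[:owner_id]) }
end

dogs_owned_by("Ann").map { |dog| dog[:name] } # => ["Rex", "Bo"]
```

The first lookup produces the keys and the second filters on them, so neither database needs to know about the other.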
Can I suggest something? During
what do you think @eileencodes ? |
Perhaps a stupid question but does this enable:
Is there a way to add and/or create databases at runtime? From what I understand, right now you have to append the db config to database.yml. I could not find a conclusive post on this. |
Nobody? |
I mean, does Rails 6 support multi-tenancy? For example, universities as entities with 1 database per entity. Or would we be better off using other, not-to-be-named solutions? |
As the docs note, no, not yet, Rails doesn't support sharding. |
Hi! May I ask what is the current status? I would like to be able to safely split reads/writes between master and slaves in a MySQL replication, automatically. Is this possible yet? Thanks in advance. |
Awesome work! |
I had one question. Is it possible to specify a series of replicas in database.yml for reading so that when you do something like this
it can go to one of the different dbs you have specified in that group ... vs just one specific db? |
ActiveRecord::Base.connected_to(database: :key_logs) do . I want manual db switching. The connected_to method looks for adapter details in database.yml. I have multiple databases with the same details. I don't want multiple schemas. Is Rails 6 working on schema switching? |
This PR implements the basic API requirements laid out in #33877 by DHH. The PR aims to focus only on implementing the `connects_to` and `connected_to` API. For now it does not tackle any configuration changes (we can hash that out in future PRs). If this API is acceptable I will add tests.

cc/ @dhh @matthewd @rafaelfranca @tenderlove

This PR adds the ability to 1) connect to multiple databases in a model, and 2) switch between those connections using a block.

To connect a model to a set of databases for writing and reading use the following API. This API supersedes `establish_connection`. The `writing` and `reading` keys represent handler / mode names and `animals` and `animals_replica` represents the database key to look up the configuration hash from.
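
The lookup described above, where each role name points at a database key that is then resolved to a configuration hash, can be sketched in plain Ruby. This is a toy model under stated assumptions: `TOY_CONFIGS`, `ToyRecord`, `config_for`, and the adapter values are hypothetical, and the real `connects_to` establishes connections rather than returning configuration hashes.

```ruby
# Hypothetical stand-in for the parsed database configuration of one environment.
TOY_CONFIGS = {
  animals:         { adapter: "mysql2", database: "animals" },
  animals_replica: { adapter: "mysql2", database: "animals", replica: true }
}.freeze

# Toy base class; it only records which configuration each role resolves to.
class ToyRecord
  def self.connects_to(database:)
    @role_configs = database.map { |role, key| [role, TOY_CONFIGS.fetch(key)] }.to_h
    @role_configs.values
  end

  def self.config_for(role)
    @role_configs.fetch(role)
  end
end

class ToyAnimalsBase < ToyRecord
  connects_to database: { writing: :animals, reading: :animals_replica }
end

ToyAnimalsBase.config_for(:reading)[:replica] # => true
```

Using `fetch` for both lookups means a misspelled role or database key fails loudly at declaration time instead of yielding a `nil` configuration.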

Inside the application - outside the model declaration - we can switch connections with a block call to `connected_to`.

If we want to connect to a db that isn't default (ie readonly_slow) we can connect like this:

Outside the model we may want to connect to a new database (one that is not in the default writing/reading set) - for example a slow replica for making slow queries. To do this we have the `connected_to` method that takes a `database` hash that matches the signature of `connects_to`. The `connected_to` method also takes a block.

For models that are already loaded and connections that are already connected, `connected_to` doesn't need to pass in a `database` because you may want to run queries against multiple databases using a specific mode/handler.

In this case `connected_to` can take a `handler` and use that to swap on the connection passed. This simplifies queries - and matches how we do it in GitHub. Once you're connected to the database you don't need to re-connect, we assume the connection is in the pool and simply pass the handler we'd like to swap on.