New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Part 4: Multi db improvements, Basic API for connection switching #34052

Merged
merged 1 commit into from Oct 10, 2018

Conversation

Projects
None yet
9 participants
@eileencodes
Member

eileencodes commented Oct 2, 2018

This PR implements the basic API requirements laid out in #33877 by DHH. The PR aims to focus only on implementing the connects_to and connected_to API. For now it does not tackle any configuration changes (we can hash that out in future PRs). If this API is acceptable I will add tests.

cc/ @dhh @matthewd @rafaelfranca @tenderlove


This PR adds the ability to 1) connect to multiple databases in a model,
and 2) switch between those connections using a block.

To connect a model to a set of databases for writing and reading use
the following API. This API supersedes establish_connection. The
writing and reading keys represent handler / mode names and
animals and animals_replica represents the database key to look up
the configuration hash from.

class AnimalsBase < ApplicationRecord
  connects_to database: { writing: :animals, reading: :animals_replica }
end

Inside the application - outside the model declaration - we can switch
connections with a block call to connected_to.

If we want to connect to a db that isn't default (ie readonly_slow) we
can connect like this:

Outside the model we may want to connect to a new database (one that is
not in the default writing/reading set) - for example a slow replica for
making slow queries. To do this we have the connected_to method that
takes a database hash that matches the signature of connects_to. The
connected_to method also takes a block.

ModelInPrimary.connected_to(database: { slow_readonly: :primary_replica_slow }) do
  ModelInPrimary.do_something_thats_slow
end

For models that are already loaded and connections that are already
connected, connected_to doesn't need to pass in a database because
you may want to run queries against multiple databases using a specific
mode/handler.

In this case connected_to can take a handler and use that to swap on
the connection passed. This simplies queries - and matches how we do it
in GitHub. Once you're connected to the database you don't need to
re-connect, we assume the connection is in the pool and simply pass the
handler we'd like to swap on.

ActiveRecord::Base.connected_to(hander: :reading) do
  Dog.read_something_from_dog
  ModelInPrimary.do_something_from_model_in_primary
end

@eileencodes eileencodes added this to the 6.0.0 milestone Oct 2, 2018

@eileencodes eileencodes changed the title from Basic API for connection switching to WIP: Basic API for connection switching Oct 2, 2018

def connected_to(database: nil, handler: nil, &blk)
if database && handler
raise ArgumentError, "connected_to can only accept handler or database, but not both arguments."
end

This comment has been minimized.

@tenderlove

tenderlove Oct 2, 2018

Member

Should we raise an exception if both are nil?

@tenderlove

This comment has been minimized.

Member

tenderlove commented Oct 2, 2018

I like that we can use ActiveRecord::Base.connected_to(handler: :writing) { ... } to switch handlers, but it seems really long to write. I'm guessing that most folks will just have read/write replicas and probably name them "read" and "write" (that's what we do). Could we also introduce something like:

ActiveRecord::Base.for_writing { ... } that's just a synonym for ActiveRecord::Base.connected_to(handler: :writing) { ... }? If we follow a naming convention, it seems like we could shorten the code we have to write.

@rafaelfranca

This comment has been minimized.

Member

rafaelfranca commented Oct 2, 2018

What would happen if two models have the same handler name for two different databases? Should we support that? Say:

class Dog
  connects_to database: { writing: :animals, reading: :animals_replica }
end

class Book
  connects_to database: { writing: :things, reading: :things_replica }
end

Also say we have that scenario, how would the following code work?

Dog.connected_to(hander: :reading) do
  Dog.create!
  Book.create!
end

If I got the implementation correctly Book would use the same connection handler as Dog because all models share the same connection handler in ActiveRecord::Base.

@eileencodes

This comment has been minimized.

Member

eileencodes commented Oct 2, 2018

What would happen if two models have the same handler name for two different databases? Should we support that? Say:

That's exactly what I want to support because that's how we do it at GitHub. We have 10 connections that belong to a writing handler (we call it default) and 10 connections that belong to a reading handler (we call it readonly). I'm a little confused because I thought you said at Shopify you use the multiple handler approach as well? The underlying connection swapping behavior here isn't different from the original PR, just the public API is.

In GitHub we'd don't write this:

Dog.connected_to(handler: :reading) do
  Dog.create! # explode from Dog bc doing a write on a read
  Book.create! # isn't called but not because it's Dog's handler, you told Rails what handler to use - `reading`. 
end

Instead we write this (but with GitHub instead of Ar Base bc Rails doesn't support this yet)

ActiveRecord::Base.connected_to(handler: :reading) do
  Dog.create!
  Book.create!
end

If we want to write to multiple dbs we can do that by using the writing handler:

ActiveRecord::Base.connected_to(handler: :writing) do
  Dog.create! # success
  Book.create! # success
end

Dog and Book know which database they belong to because the model tells them. The connected_to method looks up the handler, and then Rails looks up the connection from that handler with the connection specification name.

@tenderlove

This comment has been minimized.

Member

tenderlove commented Oct 2, 2018

@rafaelfranca I think we should support your first scenario, but if you want to switch both models you need to do it at AR::Base as @eileencodes mentions. Maybe we should raise an exception if you don't call it on AR::Base?

@dhh

This comment has been minimized.

Member

dhh commented Oct 2, 2018

I like this. A few notes:

If you're connecting directly to a specific database, you shouldn't have to declare the role:

ModelInPrimary.connected_to(database: :primary_replica_slow) do
  ModelInPrimary.do_something_thats_slow
end

Re: handler, I don't really like that word much. I'd prefer to use "role". That would connect with the future 3-tier database.yml configuration setup as well. So it would be:

ActiveRecord::Base.connected_to(role: :reading) do
  Dog.create!
  Book.create! # Will raise if a :reading role isn't found on Book
end

@tenderlove I'd be curious to see how many instances of connection switching you have in the code? I was initially partial to having some syntatic sugar, but I don't think switching roles mid-flight is going to be a super common action. And if it isn't, then I'd rather be as clear as possible about what's going on.

On the larger topic of r/w splitting, @eileencodes, you're working towards a place where AR automatically will pick the :writing role when AR is doing INSERTs and :reading role when AR is doing SELECTs, right? I thought there was some confusion about whether that's within this initial scope of work when discussing with @matthewd in the earlier thread.

@eileencodes

This comment has been minimized.

Member

eileencodes commented Oct 2, 2018

If you're connecting directly to a specific database, you shouldn't have to declare the role

👍 I will work on changing this requirement. Currently I have it so it creates a new handler (or role) to organize the connections. But that's not actually necessary now that I think about it so I'll adjust this PR accordingly.

Re: handler, I don't really like that word much

I can change this. For background handler makes sense to me since it is switching on the connection_handler - but perhaps that's too much for the user to need to know.

On the larger topic of r/w splitting, @eileencodes, you're working towards a place where AR automatically will pick the :writing role when AR is doing INSERTs and :reading role when AR is doing SELECTs, right

Yes but this is further down the line (ie not for this PR). Rails needs to be able to switch connections before it can know what to switch to.

I'd be curious to see how many instances of connection switching you have in the code?

We actually do this quite a bit since we default to the replicas, expect in certain circumstances where we need to explicitly call readonly.

  • 296 block calls to readonly db in app and lib
  • 3 block calls to write db in lib
  • 6 block calls to a dynamic switcher that chooses the db in app and lib.
@tenderlove

This comment has been minimized.

Member

tenderlove commented Oct 2, 2018

I'd be curious to see how many instances of connection switching you have in the code?

More than I thought. I was counting and then @eileencodes finished before me. 😊

@dhh

This comment has been minimized.

Member

dhh commented Oct 2, 2018

So your default replicas are not readonlys? If we get AR to do the automatic r/w splitting, would you still need as many explicit calls? Or would you only need it when using slow-read dbs?

@deepj

This comment has been minimized.

Contributor

deepj commented Oct 3, 2018

A question: Can this switching be used as a failover? Let's say, a primary connection failed, switch to secondary (backup) one.

@eileencodes

This comment has been minimized.

Member

eileencodes commented Oct 3, 2018

@dhh we default to read but switch on the request type (GET == read, POST == write) rather than the sql query. So in some cases we need to switch back to the read or to the write in order to handle that. I assume we will need less of those if we have Rails auto switch based on SQL rather than request type. I think that if we really do need the helper methods we can add those later.

@deepj No. We're quite a bit aways from something like that.

@rafaelfranca

This comment has been minimized.

Member

rafaelfranca commented Oct 5, 2018

That's exactly what I want to support because that's how we do it at GitHub. We have 10 connections that belong to a writing handler (we call it default) and 10 connections that belong to a reading handler (we call it readonly). I'm a little confused because I thought you said at Shopify you use the multiple handler approach as well? The underlying connection swapping behavior here isn't different from the original PR, just the public API is.

Yes, we use the multiple handlers, but it seems the implementation store the name of the handlers in ActiveRecord::Base so it means two models can't have the same handler name because the later will override the former definition even if you define in two different models. Am I getting the implementation wrong?

Would not:

ActiveRecord::Base.connected_to(handler: :reading) do
  Dog.create!
  Book.create!
end

fail because Dog will be connected to the same database than Book (things, not animals given Book was define after Dog)?

@rafaelfranca

This comment has been minimized.

Member

rafaelfranca commented Oct 5, 2018

Ok I think I get it. The handler is the same for all models but it holds a connection pool for each model with a different connection_specification_name.

I agree with DHH's suggestions for the API.

👍 from me.

@eileencodes

This comment has been minimized.

Member

eileencodes commented Oct 5, 2018

Ok I think I get it. The handler is the same for all models but it holds a connection pool for each model with a different connection_specification_name.

Yup! That's exactly how it works.

I'm writing up some tests and will be pushing up later this weekend or early next week. I think we're almost ready to merge this (with DHH's changes). That will unblock a lot of the future work. 😄

Also @matthewd originally had some concerns about threads but we paired today and found it's not a problem. The connection handler is thread local so we're good there 👍

@eileencodes eileencodes changed the title from WIP: Basic API for connection switching to Part 4: Multi db improvements, Basic API for connection switching Oct 10, 2018

@eileencodes

This comment has been minimized.

Member

eileencodes commented Oct 10, 2018

  • Changed handler to mode
  • Changed connected_to to take a single database instead of a hash since that's for connecting to a specific database / role.
  • Raise if both mode and handler are nil in connected_to
  • Added tests
  • Added docs (note I don't think we're ready to update the guides yet so I left those. I'd rather be able to tell a cohesive story at the end)
@dhh

This comment has been minimized.

Member

dhh commented Oct 10, 2018

I think mode is an improvement over handler, but I'm not sure it's quite enough. If you think about modes when opening files, it's the mode with which you open the same file that's designated. Not a different file. So the extension here is that mode would refer to different ways of connecting to the same database, not to different databases.

That's why I like role. That implies that there are multiple actors (databases) that play a specific role in the application. And role is also flexible enough to work with things like :statistics or :analytics where mode would be awkward.

@eileencodes

This comment has been minimized.

Member

eileencodes commented Oct 10, 2018

Wow. I swear your post said mode and now I looked back and it says role. I will change it. Not sure how my brain confused those two... 😳

@eileencodes

This comment has been minimized.

Member

eileencodes commented Oct 10, 2018

mode switched to role 👍

Basic API for connection switching
This PR adds the ability to 1) connect to multiple databases in a model,
and 2) switch between those connections using a block.

To connect a model to a set of databases for writing and reading use
the following API. This API supercedes `establish_connection`. The
`writing` and `reading` keys represent handler / role names and
`animals` and `animals_replica` represents the database key to look up
the configuration hash from.

```
class AnimalsBase < ApplicationRecord
  connects_to database: { writing: :animals, reading: :animals_replica }
end
```

Inside the application - outside the model declaration - we can switch
connections with a block call to `connected_to`.

If we want to connect to a db that isn't default (ie readonly_slow) we
can connect like this:

Outside the model we may want to connect to a new database (one that is
not in the default writing/reading set) - for example a slow replica for
making slow queries. To do this we have the `connected_to` method that
takes a `database` hash that matches the signature of `connects_to`. The
`connected_to` method also takes a block.

```
AcitveRecord::Base.connected_to(database: { slow_readonly: :primary_replica_slow }) do
  ModelInPrimary.do_something_thats_slow
end
```

For models that are already loaded and connections that are already
connected, `connected_to` doesn't need to pass in a `database` because
you may want to run queries against multiple databases using a specific
role/handler.

In this case `connected_to` can take a `role` and use that to swap on
the connection passed. This simplies queries - and matches how we do it
in GitHub. Once you're connected to the database you don't need to
re-connect, we assume the connection is in the pool and simply pass the
handler we'd like to swap on.

```
ActiveRecord::Base.connected_to(role: :reading) do
  Dog.read_something_from_dog
  ModelInPrimary.do_something_from_model_in_primary
end
```

@eileencodes eileencodes merged commit fb8bee4 into rails:master Oct 10, 2018

2 checks passed

codeclimate All good!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@eileencodes eileencodes deleted the eileencodes:connection-switching branch Oct 10, 2018

@rafaelfranca

This comment has been minimized.

Member

rafaelfranca commented Oct 10, 2018

❤️ 💚 💙 💛 💜

@rafaelfranca

This comment has been minimized.

Member

rafaelfranca commented Oct 10, 2018

Thank you so much @eileencodes for working on this.

# end
#
# Returns an array of established connections.
def connects_to(database: {})

This comment has been minimized.

@ianks

ianks Oct 11, 2018

Contributor

It would be nice to allow the user to explicitly be able to pass in a connection if necessary instead of having to pass in symbols. This would be more friendly to dynamic configurations.

@dreyks

This comment has been minimized.

dreyks commented Oct 12, 2018

cool! great features are coming!

what about stickiness? e.g. when a write occurs temporarily re-route reads to default db too

@jeremy

This comment has been minimized.

Member

jeremy commented Oct 12, 2018

@dreyks Please do start building it! 😍 This is the underlying API needed to implement things like automatic read/write splitting and stickiness.

@eileencodes

This comment has been minimized.

Member

eileencodes commented Oct 12, 2018

@jeremy I'm already working on automatic switching 😄

@pboling pboling referenced this pull request Oct 15, 2018

Open

Is this gem maintained? #490

@rafbm

This comment has been minimized.

Contributor

rafbm commented Oct 15, 2018

Is this multi-DB support only for primary-replica setups? Curious to know if there are any plans for supporting migrations / rake tasks on different databases? In our case, we do all business logic using regular AR models connected to our main DB but we have a separate DB for logs (which don’t require ACID guarantees, can live on a DB with lower CPU / RAM specs, etc).

What we currently do is pretty simple: we keep a /db_logs structure identical to the /db one along with a config/database_logs.yml next to config/database.yml, which have the regular YAML structure (nothing custom).

screen shot 2018-10-14 at 22 15 50

To support this, our only custom piece of code is a mapping of paths in config/application.rb:

require_relative 'boot'

# ...

module Missive
  class Application < Rails::Application
    #
    # Support rake tasks on separate logs DB
    #
    if ENV['DB'] == 'logs'
      {
        'config/database' => 'config/database_logs.yml',
        'db'              => 'db_logs',
        'db/migrate'      => 'db_logs/migrate',
        'db/seeds.rb'     => 'db_logs/seeds.rb',
      }.each do |a, b|
        config.paths[a] = Rails.root.join(b)
      end
    end

    # ...
  end
end

This lets us run migrations on the logs DB by adding DB=logs on the command line:

# main DB
bundle exec rake db:migrate:status

# logs DB
bundle exec rake db:migrate:status DB=logs

The following part isn’t relevant to the question but for completeness, we configure log models (for web requests and background workers) like so:

# app/models/log_record.rb
class LogRecord < ApplicationRecord
  self.abstract_class = true

  establish_connection begin
    erb = File.read(Rails.root.join('config/database_logs.yml'))
    YAML.load(ERB.new(erb).result)[Rails.env]
  end
end

# app/models/request_log.rb
class RequestLog < LogRecord
  # ...
end

# app/models/worker_log.rb
class WorkerLog < LogRecord
  # ...
end

As you can see, it was pretty trivial for us to implement this pattern, yet the config.paths bit in config/application.rb does feel like a little hack and it would be nice to have an official way to deal with migrations affecting separate DBs.

@rafbm

This comment has been minimized.

Contributor

rafbm commented Oct 15, 2018

Ah, silly me! I hadn’t thought of searching for other “Part X” pull requests from Eileen. Answer to my question is #32274. I don’t believe the preliminary changelog mentions these rake tasks; could be a good thing to add.

bogdanvlviv added a commit to bogdanvlviv/rails that referenced this pull request Nov 12, 2018

bogdanvlviv added a commit to bogdanvlviv/rails that referenced this pull request Nov 14, 2018

Exercise `connected_to`
Ensure that the method raises with both `database` and `role` arguments
Ensure that the method raises without `database` and `role`

Related to rails#34052

bogdanvlviv added a commit to bogdanvlviv/rails that referenced this pull request Nov 14, 2018

Exercise `connects_to`
Ensure that the method returns an array of established connections

Related to rails#34052

bogdanvlviv added a commit to bogdanvlviv/rails that referenced this pull request Nov 15, 2018

Exercise `connected_to` and `connects_to` methods
Since both methods are public API I think it makes sense to add these tests
in order to prevent any regression in the behavior of those methods after the 6.0 release.

Exercise `connected_to`
  - Ensure that the method raises with both `database` and `role` arguments
  - Ensure that the method raises without `database` and `role`

Exercise `connects_to`
  - Ensure that the method returns an array of established connections(as mentioned
    in the docs of the method)

Related to rails#34052
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment