Part 1 Easy Multi db in Rails: Add basic rake tasks for multi db setup #32274
Passing around and parsing hashes is easy if you know that it's a two-tier config, where each key is named after the environment and each value is the config for that environment. This falls apart pretty quickly with three-tier configs. We have no idea what the second tier will be named (we know the first is `primary`, but we don't know the second), we have no easy way of figuring out how deep a hash is without iterating over it, and we'd have to do this a lot throughout the code, since it breaks all of Active Record's assumptions about configurations.

These methods allow us to pass around objects instead, which will make it easier to parse the configs for the rake tasks. Eventually I'd like to replace the Active Record connection management that passes around config hashes to use these methods as well, but that's much farther down the road.

`walk_configs` takes an environment, a specification name, and a config and turns them into `DatabaseConfig` struct objects so we can ask the configs questions like:

```
db_config.spec_name => animals
db_config.env_name  => development
db_config.config    => { :adapter => mysql etc }
```

`db_configs` loops through all given configurations and returns an array of `DatabaseConfig` structs for each config in the YAML file.

And lastly, `configs_for` takes an environment and either returns the spec name and config if a block is given, or returns an array of `DatabaseConfig` structs just for the given environment.
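A minimal sketch of the struct-based walk described above, assuming plain Ruby hashes as the parsed YAML. It is not the code from this PR: the rule that a hash containing `adapter`, `database`, or `url` is a leaf, the default spec name `"primary"` for two-tier entries, and the extra `configurations` argument to `configs_for` are all assumptions made so the example runs on its own.

```ruby
# Hypothetical sketch: names follow the description above, logic is assumed.
DatabaseConfig = Struct.new(:env_name, :spec_name, :config)

# Assumption: a hash holding "adapter", "database", or "url" is a leaf
# (a two-tier entry); anything else is a hash of named sub-configs.
def leaf_config?(config)
  %w[adapter database url].any? { |key| config.key?(key) }
end

# Turns one environment's entry into an array of DatabaseConfig structs,
# recursing one level for three-tier configs.
def walk_configs(env_name, spec_name, config)
  if leaf_config?(config)
    [DatabaseConfig.new(env_name, spec_name, config)]
  else
    config.flat_map { |name, sub_config| walk_configs(env_name, name, sub_config) }
  end
end

# Every config in the file, two-tier or three-tier. Two-tier entries get
# the spec name "primary" (an assumption for this sketch).
def db_configs(configurations)
  configurations.flat_map { |env_name, config| walk_configs(env_name, "primary", config) }
end

# With a block: yields spec name and config for each database in env_name.
# Without a block: returns the matching DatabaseConfig structs.
def configs_for(configurations, env_name)
  env_configs = db_configs(configurations).select { |db_config| db_config.env_name == env_name }
  return env_configs unless block_given?

  env_configs.each { |db_config| yield db_config.spec_name, db_config.config }
end

SAMPLE_CONFIGURATIONS = {
  "development" => {
    "primary" => { "adapter" => "mysql2", "database" => "development" },
    "animals" => { "adapter" => "mysql2", "database" => "development_animals" }
  },
  "test" => { "adapter" => "mysql2", "database" => "test" }
}.freeze
```

With this sample, `configs_for(SAMPLE_CONFIGURATIONS, "development").map(&:spec_name)` gives `["primary", "animals"]`, while the two-tier `test` entry comes back as a single `primary` struct, so callers never need to know how deep the hash was.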
If we have a three-tier YAML file like this:

```
development:
  primary:
    database: "development"
  animals:
    database: "development_animals"
    migrations_paths: "db/animals_migrate"
```

this will add db create, drop, and migrate tasks for each level of the config under that environment:

```
bin/rails db:drop:primary
bin/rails db:drop:animals
bin/rails db:create:primary
bin/rails db:create:animals
bin/rails db:migrate:primary
bin/rails db:migrate:animals
```
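The naming scheme above can be sketched as a small function that reads the spec names under one environment and combines them with each action. This only builds the task-name strings; the real change defines Rake tasks, and the helper and constant names here are hypothetical.

```ruby
# Hypothetical sketch: compute namespaced task names for one environment
# of a three-tier config. No Rake or Rails is involved.
DB_TASK_ACTIONS = %w[drop create migrate].freeze

def namespaced_task_names(configurations, env_name)
  # In a three-tier config, the keys under an environment are spec names.
  spec_names = configurations.fetch(env_name).keys
  DB_TASK_ACTIONS.flat_map do |action|
    spec_names.map { |spec_name| "db:#{action}:#{spec_name}" }
  end
end

THREE_TIER = {
  "development" => {
    "primary" => { "database" => "development" },
    "animals" => { "database" => "development_animals", "migrations_paths" => "db/animals_migrate" }
  }
}.freeze
```

`namespaced_task_names(THREE_TIER, "development")` yields the same six names listed above, in the same order.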
`each_current_configuration` is used by create, drop, and other methods to find the configs for a given environment and return them to the calling method. The change here allows the database commands to operate on all the configs in the environment. Previously we couldn't slice the hashes and iterate over them, because they could be two-tier or three-tier. By using the database config structs we don't need to care whether we're dealing with a three-tier or two-tier config; we can just parse all the configs based on the environment.

This makes it possible to run `bin/rails db:create` and have it create all the configs for the dev and test environments, just like it does for a two-tier config, where it creates the db for dev and test. Now `db:create` will create `primary` for dev and test, and `animals` for dev and test, if our database.yml looks like:

```
development:
  primary:
    etc
  animals:
    etc
test:
  primary:
    etc
  animals:
    etc
```

This means that `bin/rails db:create`, `bin/rails db:drop`, and `bin/rails db:migrate` will operate on the dev and test envs for both the primary and animals dbs.
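A tier-agnostic iterator in the spirit of the paragraph above might look like this sketch. It assumes that running in `development` also covers `test` (matching the `db:create` behaviour described) and that a two-tier entry is treated as a single `primary` database; both are assumptions, and the signature is not Rails' own.

```ruby
# Hypothetical sketch: iterate every database for the current environment,
# whether the entry is two-tier or three-tier.
def normalize_env_config(config)
  leaf = %w[adapter database url].any? { |key| config.key?(key) }
  leaf ? { "primary" => config } : config
end

def each_current_configuration(configurations, environment)
  return enum_for(__method__, configurations, environment) unless block_given?

  # Assumption: development also covers the test environment.
  environments = [environment]
  environments << "test" if environment == "development"

  environments.each do |env_name|
    next unless configurations.key?(env_name)

    normalize_env_config(configurations[env_name]).each do |spec_name, config|
      yield env_name, spec_name, config
    end
  end
end

MIXED_CONFIGURATIONS = {
  "development" => {
    "primary" => { "database" => "development" },
    "animals" => { "database" => "development_animals" }
  },
  "test" => {
    "primary" => { "database" => "test" },
    "animals" => { "database" => "test_animals" }
  },
  "production" => { "database" => "production" }
}.freeze
```

Because the caller only ever sees `(env_name, spec_name, config)` triples, a create or drop task can loop over them without checking how the YAML was nested.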
Adds the ability to dump the schema or structure files for multiple databases. Loops through the configs for a given env, sets a filename based on the format, then establishes a connection for that config and dumps into the file.
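The "set a filename based on the format" step could be sketched as below. The `<spec>_schema.rb` / `<spec>_structure.sql` naming for non-primary databases and the `db` directory default are assumptions for illustration, not taken from this PR; connecting and dumping are left out.

```ruby
# Hypothetical sketch: pick a dump file per database from the schema format.
def dump_filename(spec_name, format, db_dir = "db")
  base = case format
         when :ruby then "schema.rb"
         when :sql then "structure.sql"
         else raise ArgumentError, "unknown schema format: #{format.inspect}"
         end
  # Assumption: the primary database keeps the default file name.
  file = spec_name == "primary" ? base : "#{spec_name}_#{base}"
  File.join(db_dir, file)
end

# One file per config in the environment; the real task would then
# establish a connection for each config and dump into its file.
def dump_plan(spec_names, format)
  spec_names.map { |spec_name| [spec_name, dump_filename(spec_name, format)] }.to_h
end
```

Prefixing with the spec name keeps each database's dump separate while leaving the primary database's file where a single-database app already expects it.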
Moves `configs_for` and the `DatabaseConfig` struct into their own file. I was considering doing this in a future refactoring, but our setup forced me to move it now. You see, there are `mattr_accessor`s on the `Core` module that have default settings. For example, `schema_format` defaults to Ruby. So if I call `configs_for` or any methods in the `Core` module, it will reset `schema_format` to `:ruby`. By moving it to its own class we can keep the logic contained and avoid this unfortunate issue.

The second change here does a double loop over the YAML files. Bear with me... Our tests dictate that we need to load an environment before our rake tasks, because we could have something in an environment that the database.yml depends on. There are side effects to this, and I think there's a deeper bug that needs to be fixed, but that's for another issue. The gist of the problem is that when I was creating the dynamic rake tasks, if the YAML that a rake task reads evaluates code (like ERB) that calls the environment configs, the code blows up because the environment is not loaded yet. To avoid this issue we added a new method that simply loads the YAML and does not evaluate the ERB or anything else in it. We then use that YAML to create the task name. Inside the task we can then call `load_config` and load the real config to actually run the code internal to the task. I admit this is gross, but refactoring can't always be pretty, and I'm working hard with `@tenderlove` to refactor much more of this code to get to a better place re connection management and rake tasks.
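The first pass of that double loop can be sketched with Ruby's YAML parser alone: the raw file is parsed without running ERB, just to learn the spec names. The helper name is hypothetical. As the comment notes, this only works while the ERB sits inside scalar values; ERB that changes the YAML structure (conditionals, loops) is exactly what later broke this approach.

```ruby
require "yaml"

# Hypothetical sketch of the first pass: parse database.yml without running
# any ERB, only to learn which spec names exist. Inline `<%= ... %>` values
# survive as plain strings; structural ERB is not handled.
def spec_names_without_erb(raw_yaml, env_name)
  parsed = YAML.safe_load(raw_yaml) || {}
  env_config = parsed.fetch(env_name, {})
  env_config.select { |_name, value| value.is_a?(Hash) }.keys
end

RAW_DATABASE_YML = <<~YAML
  development:
    primary:
      database: development
      pool: <%= Rails.config.max_threads %>
    animals:
      database: development_animals
YAML
```

Here `spec_names_without_erb(RAW_DATABASE_YML, "development")` returns `["primary", "animals"]` without ever touching `Rails`; the second pass inside each task would load the real, ERB-evaluated config.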
4e663c1
Can MySQL and PostgreSQL be used together?
Seeing the question from @huskylengcb: I didn't check the code, but that would be perfect. Being able to use multiple databases on the same server, multiple databases on different servers, and even multiple databases with different backends.
We use different MySQL databases across different servers at GitHub already, so it "works", it's just really hard to set up. We're getting there, but it will be a long process before multi db is easy in Rails.
Great work! Are there any preliminary docs around using multi-db with 6.0?
Thanks @schneems. I haven't started docs because I think a lot more is going to change. This revealed some issues with connection management in Rails that I'm working on refactoring. Once I get more of that work done I can start the docs. I didn't want to start them before the code was written, because I think it has a lot of potential to change in the coming months.
Thanks a lot for all the work you've put in so far on managing a multiple db setup. A small heads-up in case this wasn't thought of: loading the …

```
test:
  primary:
    <<: *default
    database: multiple_databases_test
  animals:
    <<: *default
<% if something %> # This will break as it's invalid YAML
    database: multiple_databases_test_animals
    migrations_paths: "db/animals_migrate"
<% end %>
production:
<% all_shards.each do |shard| %> # This will break as it's invalid YAML
  <%= shard.name %>:
    username: ...
<% end %>
```
*sigh* I really think we need to limit what's possible in a database.yml. There are far too many edge cases. Do folks do this in their apps?
That would make our lives easier indeed, but I'm not sure how we can do that: since the config accepts ERB tags, it should be able to handle any form of ERB.
Not entirely sure how broadly this is used, but I have seen it a few times, and a couple of blog posts mention it as a way to DRY up the configuration. We also do that at Shopify in our config.
Oh wait a minute. We don't evaluate it on the first round in …
Er, I see the prod db has different shards and needs the shard name for each task name. Does it actually break and throw an error, or just not create the tasks?
It throws an error when trying to run the task (the YAML can't be parsed).
If we're rethinking the database experience, maybe it makes sense to do a non-YAML-based config? Maybe config objects would help eliminate some of these ambiguities?
I think we could eval the YAML before creating the rake tasks. The problem is that some people add conditionals based on the … IMO we should just run the YAML through ERB, then through the YAML parser, and if any exceptions happen along the way, we emit a warning and don't create the tasks. Does that sound reasonable?
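The proposal above (ERB first, then the YAML parser, and warn instead of failing) can be sketched with Ruby's standard `erb` and `yaml` libraries. The helper name is hypothetical, and returning an empty list stands in for "don't create the tasks".

```ruby
require "erb"
require "yaml"

# Hypothetical sketch: evaluate ERB, parse the YAML, and if either step
# raises, warn and return no spec names so no namespaced tasks get defined.
def spec_names_or_warn(raw_yaml, env_name)
  evaluated = ERB.new(raw_yaml).result
  parsed = YAML.safe_load(evaluated) || {}
  parsed.fetch(env_name, {}).select { |_name, value| value.is_a?(Hash) }.keys
rescue StandardError => e
  # NameError from ERB (e.g. an unloaded constant) and Psych::SyntaxError
  # from the parser are both StandardErrors.
  warn "Not creating multi-db tasks: #{e.class}: #{e.message}"
  []
end

GOOD_DATABASE_YML = <<~YAML
  development:
    primary:
      database: development
      pool: <%= 2 + 3 %>
    animals:
      database: development_animals
YAML

# References a constant that is not loaded outside a Rails app.
BROKEN_DATABASE_YML = <<~YAML
  development:
    primary:
      pool: <%= Rails.config.max_threads %>
YAML
```

The trade-off is that a config which depends on the environment silently loses its namespaced tasks (with only a warning) rather than crashing `rake -T`.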
It occurs to me that we could perform the eval but warn or raise an exception if people try to access …

```
require 'erb'
require 'delegate'

ENV["RAILS_ENV"] = "omg"

# DATA reads the template after __END__ (only when this file is run directly).
template = ERB.new(DATA.read)

class Foo
  # Wraps the real ENV and warns whenever the template reads RAILS_ENV.
  class EnvDelegate < SimpleDelegator
    def [](key)
      warn "do not access RAILS_ENV" if key == "RAILS_ENV"
      super
    end
  end

  # Code evaluated with a binding inside Foo resolves `ENV` to this
  # delegate (Foo::ENV) instead of the global ::ENV.
  ENV = EnvDelegate.new(::ENV)

  def eval_template(template)
    template.result(binding)
  end
end

Foo.new.eval_template(template)

__END__
development:
<% if ENV["RAILS_ENV"] == "omg" %>
  - wow
<% else %>
  - neat
<% end %>
```

Output:
Ah yes, OK, I understand the problem with `RAILS_ENV`, thanks for the snippet. Are we specifically checking `RAILS_ENV` because it might be one of the most common use cases?

```
development:
<% if something %>
  - wow
<% else %>
  - neat
<% end %>
```
@Edouard-chin ya. As long as the conditional doesn't depend on … We don't know what the …
We originally did the whole `load_database_yaml` thing because this test wasn't cooperating and we needed to finish the namespaced rake tasks for multiple databases. However, it turns out that YAML can't eval ERB if you don't tell it it's ERB, so you get Psych parse errors if you're using multi-line ERB or ERB with conditionals. It's a hot mess. After trying a few things and thinking it over, we decided that it wasn't worth band-aiding over; the test needed to be improved. The test was added in rails#31135 to test that the env is loaded in these tasks, but it was blowing up because we were trying to read a database name out of the configuration, and that's not the purpose of this change. We want to read environment files in the rake tasks, but not in the config file. In this PR we changed the test to test what the PR was actually fixing. We've also deleted `load_database_yaml` because it caused more problems than it was worth. This should fix the issues described in rails#32274 (comment). We also had these problems at GitHub.

Co-authored-by: alimi <aibrahim2k2@gmail.com>
This change adds a new method that loads the YAML for the database config without parsing the ERB. This may seem odd, but bear with me:

When we added the ability to have rake tasks for multiple databases, we started looping through the configurations to collect the namespaces so we could do `rake db:create:my_second_db`. See rails#32274.

This caused a problem where, if you had `Rails.config.max_threads` set in your database.yml, it would blow up because the environment that defines `max_threads` isn't loaded during `rake -T`. See rails#35468.

We tried to fix this by adding the ability to just load the YAML and ignore ERB altogether, but that caused a bug in GitHub's YAML loading where, if you used multi-line ERB, the YAML was invalid. That led us to revert some changes in rails#33748.

After trying to resolve this a bunch of ways, `@tenderlove` came up with replacing the ERB values so that we don't need to load the environment but can still load the YAML. This change adds a DummyCompiler for ERB that will replace all the values so we can load the database YAML and create the rake tasks. Nothing else uses this method, so it's "safe". DO NOT use this method in your application.

Fixes rails#35468
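To illustrate the idea of replacing ERB values without running them, here is a text-substitution stand-in. It is not the DummyCompiler from this change (which hooks into ERB compilation); it is a swapped-in, regex-based technique with hypothetical names, and it only strips control tags that start a line.

```ruby
require "yaml"

# Hypothetical, regex-based stand-in for the DummyCompiler idea: rewrite
# the template so no Ruby inside the ERB tags is ever executed.
ERB_CONTROL_LINE = /^[ \t]*<%(?!=).*?%>[ \t]*\n?/m # line-leading `<% if %>`, `<% end %>`
ERB_OUTPUT_TAG = /<%=.*?%>/m                      # `<%= value %>`

def replace_erb_values(raw_yaml)
  raw_yaml
    .gsub(ERB_CONTROL_LINE, "") # drop control-flow lines entirely
    .gsub(ERB_OUTPUT_TAG, "''") # stand in an empty string for each value
end

DATABASE_YML_WITH_ERB = <<~YAML
  development:
    primary:
      database: development
      pool: <%= Rails.config.max_threads %>
    animals:
  <% if something %>
      database: development_animals
  <% end %>
      migrations_paths: db/animals_migrate
YAML
```

`YAML.safe_load(replace_erb_values(DATABASE_YML_WITH_ERB))` parses cleanly and exposes both `primary` and `animals`, even though neither `Rails` nor `something` is defined. Keeping both branches of every conditional is the price: the result is only good enough for collecting task names, which is why it should never be used to load the real config.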
For multi db applications you've always had to create your own rake tasks, which made setting up multi db a major PITA. This PR is Part 1 of many that add the initial underpinnings for supporting multiple databases through the rake db commands. I'm doing this in small PRs so that reviewing is easier.
This app can be used to test out the features here. Just clone it and use the commands below to try out the rake tasks for create, migrate, drop, and dump. https://github.com/eileencodes/multiple_databases_demo
The examples below assume a three-tier database.yml like this:
- `bin/rails db:create`, `bin/rails db:migrate`, `bin/rails db:drop`, and `bin/rails db:schema|structure:dump` that tasks are run for all relevant envs and all databases in that env, so given the above config `bin/rails db:create` will create the dev and test dbs for both the primary and animals configs.
- `bin/rails db:create:primary` or `bin/rails db:create:animals`
- `bin/rails db:drop:primary` or `bin/rails db:drop:animals`
- `bin/rails db:migrate:primary` or `bin/rails db:migrate:animals`
Future parts of this work will:
cc/ @matthewd @tenderlove @dhh