Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Part 1 Easy Multi db in Rails: Add basic rake tasks for multi db setup #32274

Merged

Conversation

@eileencodes
Copy link
Member

@eileencodes eileencodes commented Mar 16, 2018

For multi db applications you always had to create your own rake tasks which made setting up multi db a major PITA. This PR is Part 1 of a many that adds the initial underpinning for supporting multiple databases through the rake db commands. I'm doing this in small PR's so that reviewing is easier.

This app can be used to test out the features here. Just clone and use the commands below to play with testing re rake tasks for create, migrate, drop, and dump. https://github.com/eileencodes/multiple_databases_demo

Examples below are assuming a three-tier database.yml like this:

development:
  primary:
    <<: *default
    database: multiple_databases_development
  animals:
    <<: *default
    database: multiple_databases_development_animals
    migrations_paths: "db/animals_migrate"

test:
  primary:
    <<: *default
    database: multiple_databases_test
  animals:
    <<: *default
    database: multiple_databases_test_animals
    migrations_paths: "db/animals_migrate"
  • Creates internal DatabaseConfig objects so it's easier to pass around the configs. We can then ask a DatabaseConfig for it's env or spec or config directly. This will come in handy for a larger refactoring I'm working on.
  • Ensures when running bin/rails db:create, bin/rails db:migrate, bin/rails db:drop, and bin/rails db:schema|structure:dump that tasks are run for all relevant envs and all databases in that env so given the above config bin/rails db:create will create the dev and test dbs for both primary and animals configs.
  • Adds new Rails db tasks that can perform create/migrate/drop on a specific database in an environment
    • bin/rails db:create:primary or bin/rails db:create:animals
    • bin/rails db:drop:primary or bin/rails db:drop:animals
    • bin/rails db:migrate:primary or bin/rails db:migrate:animals

Future parts of this work will:

  • Add more rake task support for multi db as this only does the first 4 major ones
  • Add documentation for multi-db three-tier database.yml applications
  • Add documentation for the rake tasks
  • Refactoring of the connection management to fix the assumption that the config lives directly under development (see #32271 for more info there on what I'm talking about).
  • Eventually porting these to Rails commands
  • Dealing with rw/ro connection setups

cc/ @matthewd @tenderlove @dhh

@eileencodes eileencodes added this to the 6.0.0 milestone Mar 16, 2018
@eileencodes eileencodes changed the title Part 1: Add rake tasks for multi db setup Part 1 Easy Multi db in Rails: Add basic rake tasks for multi db setup Mar 16, 2018
eileencodes added 5 commits Mar 16, 2018
Passing around and parsing hashes is easy if you know that it's a two
tier config and each key will be named after the environment and each
value will be the config for that environment key.

This falls apart pretty quickly with three-tier configs. We have no idea
what the second tier will be named (we know the first is primary but we
don't know the second), we have no easy way of figuring out
how deep a hash we have without iterating over it, and we'd have to do
this a lot throughout the code since it breaks all of Active Record's
assumptions regarding configurations.

These methods allow us to pass around objects instead. This will allow
us to more easily parse the configs for the rake tasks. Evenually I'd
like to replace the Active Record connection management that passes
around config hashes to use these methods as well but that's much
farther down the road.

`walk_configs` takes an environment, specification name, and a config
and turns them into DatabaseConfig struct objects so we can ask the
configs questions like:

```
db_config.spec_name
=> animals

db_config.env_name
=> development

db_config.config
{ :adapter => mysql etc }
```

`db_configs` loops through all given configurations and returns an array
of DatabaseConfig structs for each config in the yaml file.

and lastly `configs_for` takes an environment and either returns the
spec name and config if a block is given or returns an array of
DatabaseConfig structs just for the given environment.
If we have a three-tier yaml file like this:

```
development:
  primary:
    database: "development"
  animals:
    database: "development_animals"
    migrations_paths: "db/animals_migrate"
```

This will add db create/drop/and migrate tasks for each level of the
config under that environment.

```
bin/rails db:drop:primary
bin/rails db:drop:animals

bin/rails db:create:primary
bin/rails db:create:animals

bin/rails db:migrate:primary
bin/rails db:migrate:animals
```
`each_current_configuration` is used by create, drop, and other methods
to find the configs for a given environment and returning those to the
method calling them.

The change here allows for the database commands to operate on all the
configs in the environment. Previously we couldn't slice the hashes and
iterate over them becasue they could be two tier or could be three
tier. By using the database config structs we don't need to care whether
we're dealing with a three tier or two tier, we can just parse all the
configs based on the environment.

This makes it possible for us to run `bin/rails db:create` and it will
create all the configs for the dev and test environment ust like it does
for a two tier - it creates the db for dev and test. Now `db:create`
will create `primary` for dev and test, and `animals` for dev and test
if our database.yml looks like:

```
development:
  primary:
    etc
  animals:
    etc

test:
  primary:
    etc
  animals:
    etc
```

This means that `bin/rails db:create`, `bin/rails db:drop`, and
`bin/rails db:migrate` will operate on the dev and test env for both
primary and animals ds.
Adds the ability to dump the schema or structure files for mulitple
databases. Loops through the configs for a given env and sets a filename
based on the format, then establishes a connection for that config and
dumps into the file.
@eileencodes eileencodes force-pushed the eileencodes:part-1-add-rake-tasks-for-multi-db branch 5 times, most recently Mar 21, 2018
Moves the configs_for and DatabaseConfig struct into it's own file. I
was considering doing this in a future refactoring but our set up forced
me to move it now. You see there are `mattr_accessor`'s on the Core
module that have default settings. For example the `schema_format`
defaults to Ruby. So if I call `configs_for` or any methods in the Core
module it will reset the `schema_format` to `:ruby`. By moving it to
it's own class we can keep the logic contained and avoid this
unfortunate issue.

The second change here does a double loop over the yaml files. Bear with
me...

Our tests dictate that we need to load an environment before our rake
tasks because we could have something in an environment that the
database.yml depends on. There are side-effects to this and I think
there's a deeper bug that needs to be fixed but that's for another
issue. The gist of the problem is when I was creating the dynamic rake
tasks if the yaml that that rake task is calling evaluates code (like
erb) that calls the environment configs the code will blow up because
the environment is not loaded yet.

To avoid this issue we added a new method that simply loads the yaml and
does not evaluate the erb or anything in it. We then use that yaml to
create the task name. Inside the task name we can then call
`load_config` and load the real config to actually call the code
internal to the task. I admit, this is gross, but refactoring can't all
be pretty all the time and I'm working hard with `@tenderlove` to
refactor much more of this code to get to a better place re connection
management and rake tasks.
@eileencodes eileencodes force-pushed the eileencodes:part-1-add-rake-tasks-for-multi-db branch to 4e663c1 Mar 21, 2018
@eileencodes eileencodes merged commit 93e6b5c into rails:master Mar 26, 2018
2 checks passed
2 checks passed
codeclimate All good!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@eileencodes eileencodes deleted the eileencodes:part-1-add-rake-tasks-for-multi-db branch Mar 26, 2018
@huskylengcb
Copy link

@huskylengcb huskylengcb commented Apr 4, 2018

Can mysql and PGSQL be used together?

@RouL
Copy link

@RouL RouL commented Apr 4, 2018

Seeng the question from @huskylengcb, I didn't check the code, but that would be perfect. Being able to use multiple databases on the same server, multiple databases on different servers and even multiple databases with different backends.
But I would guess at least, thats how it is meant anyways. :)

@eileencodes
Copy link
Member Author

@eileencodes eileencodes commented Apr 4, 2018

We use different mysql databases across different servers at GitHub already, so it "works", it just is really hard to set up. We're getting there but it will be a long process before multi db is easy in Rails.

@schneems
Copy link
Member

@schneems schneems commented Apr 10, 2018

Great work! Are there any preliminary docs around using multi-db with 6.0?

@eileencodes
Copy link
Member Author

@eileencodes eileencodes commented Apr 10, 2018

Thanks @schneems 😄

I haven't started docs because I think a lot more is going to change. This revealed some issues with connection management in Rails that I'm working on refactoring. Once I get more of that work done I can start the docs. I didn't want to start them before the code was written because I think it has a lot of potential to change in the coming months.

@Edouard-chin
Copy link
Member

@Edouard-chin Edouard-chin commented Apr 25, 2018

Thanks a lot for all the work you put so far in managing multiple db setup ♥️ .

A small heads up in case this wasn't thought of; loading the database.yml without evaluating erb first will break in case the database configuration contains erb loop or conditions

test:
  primary:
    <<: *default
    database: multiple_databases_test
  animals:
    <<: *default
    <% if something %> # This will break as it's invalid YML
    database: multiple_databases_test_animals
    migrations_paths: "db/animals_migrate"
    <% end %>

production:
  <% all_shards.each do |shard| %> # This will break as it's invalid YAML
  <%= shard.name %>:
    username: ...
  <% end %>
@eileencodes
Copy link
Member Author

@eileencodes eileencodes commented Apr 25, 2018

sigh I really think we need to limit what's possible in a database.yml. There are far too many edge cases. Do folks do this in their apps?

@Edouard-chin
Copy link
Member

@Edouard-chin Edouard-chin commented Apr 25, 2018

I really think we need to limit what's possible in a database.yml.

That would make our lives easier indeed, but not sure how we can do that since the config accepts erb tags, it should be able to handle any form of erb.

Do folks do this in their apps?

Not entirely sure how broad this is used but I have seen it few times and couple blog posts mentioning it to DRY the configuration. We also do that at Shopify in our config

@eileencodes
Copy link
Member Author

@eileencodes eileencodes commented Apr 25, 2018

Oh wait a minute. We don't evaluate it on the first round in for_each but we evaluate it once the task is created. The only thing we use the un-eval'd yaml for is to create the namespaces tasks like :animals, but then after that we load the eval'd yaml. So I think this is fine.

@eileencodes
Copy link
Member Author

@eileencodes eileencodes commented Apr 25, 2018

Er I see the prod db has different shards and needs the shard name for each task name. Does it actually break and throw an error or just not create the tasks?

@Edouard-chin
Copy link
Member

@Edouard-chin Edouard-chin commented Apr 25, 2018

It throws an error when trying to run the task (the yaml can't be parsed)

Psych::SyntaxError: (/app/config/database.yml): could not find expected ':' while scanning a simple key at line 387 column 3
@schneems
Copy link
Member

@schneems schneems commented Apr 25, 2018

sigh I really think we need to limit what's possible in a database.yml

If we're re-thinking the database experience, maybe it makes sense to do a non-yaml based config? Maybe config objects help eliminate some of these ambiguities?

@tenderlove
Copy link
Member

@tenderlove tenderlove commented Apr 25, 2018

I think we could eval the YAML before creating the rake tasks. The problem is that some people add conditionals based on the RAILS_ENV in the YAML file. Since we're trying to generate rake tasks independent of the RAILS_ENV, we can't set the RAILS_ENV, so their conditionals could be wrong.

IMO we should just run the YAML through ERB, then through the YAML parser, and if any exceptions happen along the way, we emit a warning and don't create the tasks. Does that sound reasonable?

@tenderlove
Copy link
Member

@tenderlove tenderlove commented Apr 25, 2018

It occurs to me that we could perform the eval but warn or raise an exception if people try to access ENV in the YAML:

require 'erb'

ENV["RAILS_ENV"] = "omg"

template = ERB.new(DATA.read)

require 'delegate'

class Foo
  class EnvDelegate < SimpleDelegator
    def [] key
      if key == "RAILS_ENV"
        warn "do not access RAILS_ENV"
        super
      else
        super
      end
    end
  end

  ENV = EnvDelegate.new(::ENV)

  def eval_template template
    template.result binding
  end
end

Foo.new.eval_template(template)

__END__
development:
  <% if ENV["RAILS_ENV"] == "omg" %>
  - wow
  <% else %>
  - neat
  <% end %>

Output:

[aaron@TC git]$ ruby testing.rb
do not access RAILS_ENV
@Edouard-chin
Copy link
Member

@Edouard-chin Edouard-chin commented Apr 25, 2018

Ah yes ok I understand the problem about the RAILS_ENV, thanks for the snippet.
The problem would happen for any condition tough, not only when checking the RAILS_ENV, right?

Are we specifically checking the RAILS_ENV because it might be one of the most common use case?

development:
  <% if something %>
  - wow
  <% else %>
  - neat
  <% end %>
@tenderlove
Copy link
Member

@tenderlove tenderlove commented Apr 25, 2018

@Edouard-chin ya. As long as the conditional doesn't depend on RAILS_ENV, then it should be fine to eval the ERB. I think we only need to protect against code that changes depending on the RAILS_ENV. In your example, as long as the something method call does not depend on the RAILS_ENV, then it's output should not change regardless of when or where we eval the ERB.

We don't know what the RAILS_ENV should be when defining the Rake tasks. However, as long as the ERB code does the same thing regardless of RAILS_ENV, then it should be OK to execute.

@koic koic mentioned this pull request May 22, 2018
7 of 8 tasks complete
eileencodes added a commit to eileencodes/rails that referenced this pull request Aug 29, 2018
We originally did the whole `load_database_yaml` thing because this test
wasn't cooperating and we needed to finish the namespaced rake tasks for
multiple databases.

However, it turns out that YAML can't eval ERB if you don't tell it it's
ERB so you get Pysch parse errors if you're using multi-line ERB or
ERB with conditionals. It's a hot mess.

After trying a few things and thinking it over we decided that it wasn't
worth bandaiding over, the test needed to be improved. The test was
added in rails#31135 to test that the env is loaded in these tasks. But it
was blowing up because we were trying to read a database name out of the
configuration - however that's not the purpose of this change. We want
to read environment files in the rake tasks, but not in the config
file.

In this PR we changed the test to test what the PR was actually fixing.
We've also deleted the `load_database_yaml` because it caused more
problems than it was worth. This should fix the issues described in
rails#32274 (comment). We
also had these problems at GitHub.

Co-authored-by: alimi <aibrahim2k2@gmail.com>
eileencodes added a commit to eileencodes/rails that referenced this pull request Mar 6, 2019
This change adds a new method that loads the YAML for the database
config without parsing the ERB. This may seem odd but bear with me:

When we added the ability to have rake tasks for multiple databases we
started looping through the configurations to collect the namespaces so
we could do `rake db:create:my_second_db`. See rails#32274.

This caused a problem where if you had `Rails.config.max_threads` set in
your database.yml it will blow up because the environment that defines
`max_threads` isn't loaded during `rake -T`. See rails#35468.

We tried to fix this by adding the ability to just load the YAML and
ignore ERB all together but that caused a bug in GitHub's YAML loading
where if you used multi-line ERB the YAML was invalid. That led us to
reverting some changes in rails#33748.

After trying to resolve this a bunch of ways `@tenderlove` came up with
replacing the ERB values so that we don't need to load the environment
but we also can load the YAML.

This change adds a DummyCompiler for ERB that will replace all the
values so we can load the database yaml and create the rake tasks.
Nothing else uses this method so it's "safe".

DO NOT use this method in your application.

Fixes rails#35468
eileencodes added a commit to eileencodes/rails that referenced this pull request Mar 6, 2019
This change adds a new method that loads the YAML for the database
config without parsing the ERB. This may seem odd but bear with me:

When we added the ability to have rake tasks for multiple databases we
started looping through the configurations to collect the namespaces so
we could do `rake db:create:my_second_db`. See rails#32274.

This caused a problem where if you had `Rails.config.max_threads` set in
your database.yml it will blow up because the environment that defines
`max_threads` isn't loaded during `rake -T`. See rails#35468.

We tried to fix this by adding the ability to just load the YAML and
ignore ERB all together but that caused a bug in GitHub's YAML loading
where if you used multi-line ERB the YAML was invalid. That led us to
reverting some changes in rails#33748.

After trying to resolve this a bunch of ways `@tenderlove` came up with
replacing the ERB values so that we don't need to load the environment
but we also can load the YAML.

This change adds a DummyCompiler for ERB that will replace all the
values so we can load the database yaml and create the rake tasks.
Nothing else uses this method so it's "safe".

DO NOT use this method in your application.

Fixes rails#35468
eileencodes added a commit to eileencodes/rails that referenced this pull request Mar 6, 2019
This change adds a new method that loads the YAML for the database
config without parsing the ERB. This may seem odd but bear with me:

When we added the ability to have rake tasks for multiple databases we
started looping through the configurations to collect the namespaces so
we could do `rake db:create:my_second_db`. See rails#32274.

This caused a problem where if you had `Rails.config.max_threads` set in
your database.yml it will blow up because the environment that defines
`max_threads` isn't loaded during `rake -T`. See rails#35468.

We tried to fix this by adding the ability to just load the YAML and
ignore ERB all together but that caused a bug in GitHub's YAML loading
where if you used multi-line ERB the YAML was invalid. That led us to
reverting some changes in rails#33748.

After trying to resolve this a bunch of ways `@tenderlove` came up with
replacing the ERB values so that we don't need to load the environment
but we also can load the YAML.

This change adds a DummyCompiler for ERB that will replace all the
values so we can load the database yaml and create the rake tasks.
Nothing else uses this method so it's "safe".

DO NOT use this method in your application.

Fixes rails#35468
eileencodes added a commit to eileencodes/rails that referenced this pull request Mar 6, 2019
This change adds a new method that loads the YAML for the database
config without parsing the ERB. This may seem odd but bear with me:

When we added the ability to have rake tasks for multiple databases we
started looping through the configurations to collect the namespaces so
we could do `rake db:create:my_second_db`. See rails#32274.

This caused a problem where if you had `Rails.config.max_threads` set in
your database.yml it will blow up because the environment that defines
`max_threads` isn't loaded during `rake -T`. See rails#35468.

We tried to fix this by adding the ability to just load the YAML and
ignore ERB all together but that caused a bug in GitHub's YAML loading
where if you used multi-line ERB the YAML was invalid. That led us to
reverting some changes in rails#33748.

After trying to resolve this a bunch of ways `@tenderlove` came up with
replacing the ERB values so that we don't need to load the environment
but we also can load the YAML.

This change adds a DummyCompiler for ERB that will replace all the
values so we can load the database yaml and create the rake tasks.
Nothing else uses this method so it's "safe".

DO NOT use this method in your application.

Fixes rails#35468
eileencodes added a commit to eileencodes/rails that referenced this pull request Mar 6, 2019
This change adds a new method that loads the YAML for the database
config without parsing the ERB. This may seem odd but bear with me:

When we added the ability to have rake tasks for multiple databases we
started looping through the configurations to collect the namespaces so
we could do `rake db:create:my_second_db`. See rails#32274.

This caused a problem where if you had `Rails.config.max_threads` set in
your database.yml it will blow up because the environment that defines
`max_threads` isn't loaded during `rake -T`. See rails#35468.

We tried to fix this by adding the ability to just load the YAML and
ignore ERB all together but that caused a bug in GitHub's YAML loading
where if you used multi-line ERB the YAML was invalid. That led us to
reverting some changes in rails#33748.

After trying to resolve this a bunch of ways `@tenderlove` came up with
replacing the ERB values so that we don't need to load the environment
but we also can load the YAML.

This change adds a DummyCompiler for ERB that will replace all the
values so we can load the database yaml and create the rake tasks.
Nothing else uses this method so it's "safe".

DO NOT use this method in your application.

Fixes rails#35468
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants
You can’t perform that action at this time.