New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Part 1 Easy Multi db in Rails: Add basic rake tasks for multi db setup #32274

Merged
merged 6 commits into from Mar 26, 2018

Conversation

Projects
None yet
6 participants
@eileencodes
Member

eileencodes commented Mar 16, 2018

For multi db applications you always had to create your own rake tasks which made setting up multi db a major PITA. This PR is Part 1 of a many that adds the initial underpinning for supporting multiple databases through the rake db commands. I'm doing this in small PR's so that reviewing is easier.

This app can be used to test out the features here. Just clone and use the commands below to play with testing re rake tasks for create, migrate, drop, and dump. https://github.com/eileencodes/multiple_databases_demo

Examples below are assuming a three-tier database.yml like this:

development:
  primary:
    <<: *default
    database: multiple_databases_development
  animals:
    <<: *default
    database: multiple_databases_development_animals
    migrations_paths: "db/animals_migrate"

test:
  primary:
    <<: *default
    database: multiple_databases_test
  animals:
    <<: *default
    database: multiple_databases_test_animals
    migrations_paths: "db/animals_migrate"
  • Creates internal DatabaseConfig objects so it's easier to pass around the configs. We can then ask a DatabaseConfig for it's env or spec or config directly. This will come in handy for a larger refactoring I'm working on.
  • Ensures when running bin/rails db:create, bin/rails db:migrate, bin/rails db:drop, and bin/rails db:schema|structure:dump that tasks are run for all relevant envs and all databases in that env so given the above config bin/rails db:create will create the dev and test dbs for both primary and animals configs.
  • Adds new Rails db tasks that can perform create/migrate/drop on a specific database in an environment
    • bin/rails db:create:primary or bin/rails db:create:animals
    • bin/rails db:drop:primary or bin/rails db:drop:animals
    • bin/rails db:migrate:primary or bin/rails db:migrate:animals

Future parts of this work will:

  • Add more rake task support for multi db as this only does the first 4 major ones
  • Add documentation for multi-db three-tier database.yml applications
  • Add documentation for the rake tasks
  • Refactoring of the connection management to fix the assumption that the config lives directly under development (see #32271 for more info there on what I'm talking about).
  • Eventually porting these to Rails commands
  • Dealing with rw/ro connection setups

cc/ @matthewd @tenderlove @dhh

@eileencodes eileencodes added this to the 6.0.0 milestone Mar 16, 2018

@eileencodes eileencodes changed the title from Part 1: Add rake tasks for multi db setup to Part 1 Easy Multi db in Rails: Add basic rake tasks for multi db setup Mar 16, 2018

eileencodes added some commits Mar 16, 2018

Add DatabaseConfig Struct and associated methods
Passing around and parsing hashes is easy if you know that it's a two
tier config and each key will be named after the environment and each
value will be the config for that environment key.

This falls apart pretty quickly with three-tier configs. We have no idea
what the second tier will be named (we know the first is primary but we
don't know the second), we have no easy way of figuring out
how deep a hash we have without iterating over it, and we'd have to do
this a lot throughout the code since it breaks all of Active Record's
assumptions regarding configurations.

These methods allow us to pass around objects instead. This will allow
us to more easily parse the configs for the rake tasks. Evenually I'd
like to replace the Active Record connection management that passes
around config hashes to use these methods as well but that's much
farther down the road.

`walk_configs` takes an environment, specification name, and a config
and turns them into DatabaseConfig struct objects so we can ask the
configs questions like:

```
db_config.spec_name
=> animals

db_config.env_name
=> development

db_config.config
{ :adapter => mysql etc }
```

`db_configs` loops through all given configurations and returns an array
of DatabaseConfig structs for each config in the yaml file.

and lastly `configs_for` takes an environment and either returns the
spec name and config if a block is given or returns an array of
DatabaseConfig structs just for the given environment.
Add create/drop/migrate db tasks for each database in the environment
If we have a three-tier yaml file like this:

```
development:
  primary:
    database: "development"
  animals:
    database: "development_animals"
    migrations_paths: "db/animals_migrate"
```

This will add db create/drop/and migrate tasks for each level of the
config under that environment.

```
bin/rails db:drop:primary
bin/rails db:drop:animals

bin/rails db:create:primary
bin/rails db:create:animals

bin/rails db:migrate:primary
bin/rails db:migrate:animals
```
Add ability to create/drop/migrate all dbs for a given env
`each_current_configuration` is used by create, drop, and other methods
to find the configs for a given environment and returning those to the
method calling them.

The change here allows for the database commands to operate on all the
configs in the environment. Previously we couldn't slice the hashes and
iterate over them becasue they could be two tier or could be three
tier. By using the database config structs we don't need to care whether
we're dealing with a three tier or two tier, we can just parse all the
configs based on the environment.

This makes it possible for us to run `bin/rails db:create` and it will
create all the configs for the dev and test environment ust like it does
for a two tier - it creates the db for dev and test. Now `db:create`
will create `primary` for dev and test, and `animals` for dev and test
if our database.yml looks like:

```
development:
  primary:
    etc
  animals:
    etc

test:
  primary:
    etc
  animals:
    etc
```

This means that `bin/rails db:create`, `bin/rails db:drop`, and
`bin/rails db:migrate` will operate on the dev and test env for both
primary and animals ds.
Update schema/structure dump tasks for multi db
Adds the ability to dump the schema or structure files for mulitple
databases. Loops through the configs for a given env and sets a filename
based on the format, then establishes a connection for that config and
dumps into the file.
Refactor configs_for and friends
Moves the configs_for and DatabaseConfig struct into it's own file. I
was considering doing this in a future refactoring but our set up forced
me to move it now. You see there are `mattr_accessor`'s on the Core
module that have default settings. For example the `schema_format`
defaults to Ruby. So if I call `configs_for` or any methods in the Core
module it will reset the `schema_format` to `:ruby`. By moving it to
it's own class we can keep the logic contained and avoid this
unfortunate issue.

The second change here does a double loop over the yaml files. Bear with
me...

Our tests dictate that we need to load an environment before our rake
tasks because we could have something in an environment that the
database.yml depends on. There are side-effects to this and I think
there's a deeper bug that needs to be fixed but that's for another
issue. The gist of the problem is when I was creating the dynamic rake
tasks if the yaml that that rake task is calling evaluates code (like
erb) that calls the environment configs the code will blow up because
the environment is not loaded yet.

To avoid this issue we added a new method that simply loads the yaml and
does not evaluate the erb or anything in it. We then use that yaml to
create the task name. Inside the task name we can then call
`load_config` and load the real config to actually call the code
internal to the task. I admit, this is gross, but refactoring can't all
be pretty all the time and I'm working hard with `@tenderlove` to
refactor much more of this code to get to a better place re connection
management and rake tasks.

@eileencodes eileencodes merged commit 93e6b5c into rails:master Mar 26, 2018

2 checks passed

codeclimate All good!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@eileencodes eileencodes deleted the eileencodes:part-1-add-rake-tasks-for-multi-db branch Mar 26, 2018

@huskylengcb

This comment has been minimized.

huskylengcb commented Apr 4, 2018

Can mysql and PGSQL be used together?

@RouL

This comment has been minimized.

RouL commented Apr 4, 2018

Seeng the question from @huskylengcb, I didn't check the code, but that would be perfect. Being able to use multiple databases on the same server, multiple databases on different servers and even multiple databases with different backends.
But I would guess at least, thats how it is meant anyways. :)

@eileencodes

This comment has been minimized.

Member

eileencodes commented Apr 4, 2018

We use different mysql databases across different servers at GitHub already, so it "works", it just is really hard to set up. We're getting there but it will be a long process before multi db is easy in Rails.

@schneems

This comment has been minimized.

Member

schneems commented Apr 10, 2018

Great work! Are there any preliminary docs around using multi-db with 6.0?

@eileencodes

This comment has been minimized.

Member

eileencodes commented Apr 10, 2018

Thanks @schneems 😄

I haven't started docs because I think a lot more is going to change. This revealed some issues with connection management in Rails that I'm working on refactoring. Once I get more of that work done I can start the docs. I didn't want to start them before the code was written because I think it has a lot of potential to change in the coming months.

@Edouard-chin

This comment has been minimized.

Contributor

Edouard-chin commented Apr 25, 2018

Thanks a lot for all the work you put so far in managing multiple db setup ♥️ .

A small heads up in case this wasn't thought of; loading the database.yml without evaluating erb first will break in case the database configuration contains erb loop or conditions

test:
  primary:
    <<: *default
    database: multiple_databases_test
  animals:
    <<: *default
    <% if something %> # This will break as it's invalid YML
    database: multiple_databases_test_animals
    migrations_paths: "db/animals_migrate"
    <% end %>

production:
  <% all_shards.each do |shard| %> # This will break as it's invalid YAML
  <%= shard.name %>:
    username: ...
  <% end %>
@eileencodes

This comment has been minimized.

Member

eileencodes commented Apr 25, 2018

sigh I really think we need to limit what's possible in a database.yml. There are far too many edge cases. Do folks do this in their apps?

@Edouard-chin

This comment has been minimized.

Contributor

Edouard-chin commented Apr 25, 2018

I really think we need to limit what's possible in a database.yml.

That would make our lives easier indeed, but not sure how we can do that since the config accepts erb tags, it should be able to handle any form of erb.

Do folks do this in their apps?

Not entirely sure how broad this is used but I have seen it few times and couple blog posts mentioning it to DRY the configuration. We also do that at Shopify in our config

@eileencodes

This comment has been minimized.

Member

eileencodes commented Apr 25, 2018

Oh wait a minute. We don't evaluate it on the first round in for_each but we evaluate it once the task is created. The only thing we use the un-eval'd yaml for is to create the namespaces tasks like :animals, but then after that we load the eval'd yaml. So I think this is fine.

@eileencodes

This comment has been minimized.

Member

eileencodes commented Apr 25, 2018

Er I see the prod db has different shards and needs the shard name for each task name. Does it actually break and throw an error or just not create the tasks?

@Edouard-chin

This comment has been minimized.

Contributor

Edouard-chin commented Apr 25, 2018

It throws an error when trying to run the task (the yaml can't be parsed)

Psych::SyntaxError: (/app/config/database.yml): could not find expected ':' while scanning a simple key at line 387 column 3
@schneems

This comment has been minimized.

Member

schneems commented Apr 25, 2018

sigh I really think we need to limit what's possible in a database.yml

If we're re-thinking the database experience, maybe it makes sense to do a non-yaml based config? Maybe config objects help eliminate some of these ambiguities?

@tenderlove

This comment has been minimized.

Member

tenderlove commented Apr 25, 2018

I think we could eval the YAML before creating the rake tasks. The problem is that some people add conditionals based on the RAILS_ENV in the YAML file. Since we're trying to generate rake tasks independent of the RAILS_ENV, we can't set the RAILS_ENV, so their conditionals could be wrong.

IMO we should just run the YAML through ERB, then through the YAML parser, and if any exceptions happen along the way, we emit a warning and don't create the tasks. Does that sound reasonable?

@tenderlove

This comment has been minimized.

Member

tenderlove commented Apr 25, 2018

It occurs to me that we could perform the eval but warn or raise an exception if people try to access ENV in the YAML:

require 'erb'

ENV["RAILS_ENV"] = "omg"

template = ERB.new(DATA.read)

require 'delegate'

class Foo
  class EnvDelegate < SimpleDelegator
    def [] key
      if key == "RAILS_ENV"
        warn "do not access RAILS_ENV"
        super
      else
        super
      end
    end
  end

  ENV = EnvDelegate.new(::ENV)

  def eval_template template
    template.result binding
  end
end

Foo.new.eval_template(template)

__END__
development:
  <% if ENV["RAILS_ENV"] == "omg" %>
  - wow
  <% else %>
  - neat
  <% end %>

Output:

[aaron@TC git]$ ruby testing.rb
do not access RAILS_ENV
@Edouard-chin

This comment has been minimized.

Contributor

Edouard-chin commented Apr 25, 2018

Ah yes ok I understand the problem about the RAILS_ENV, thanks for the snippet.
The problem would happen for any condition tough, not only when checking the RAILS_ENV, right?

Are we specifically checking the RAILS_ENV because it might be one of the most common use case?

development:
  <% if something %>
  - wow
  <% else %>
  - neat
  <% end %>
@tenderlove

This comment has been minimized.

Member

tenderlove commented Apr 25, 2018

@Edouard-chin ya. As long as the conditional doesn't depend on RAILS_ENV, then it should be fine to eval the ERB. I think we only need to protect against code that changes depending on the RAILS_ENV. In your example, as long as the something method call does not depend on the RAILS_ENV, then it's output should not change regardless of when or where we eval the ERB.

We don't know what the RAILS_ENV should be when defining the Rake tasks. However, as long as the ERB code does the same thing regardless of RAILS_ENV, then it should be OK to execute.

@phildionne phildionne referenced this pull request Apr 26, 2018

Closed

Engine support #12

@koic koic referenced this pull request May 22, 2018

Merged

Add new `Rails/BulkChangeTable` cop #5881

7 of 8 tasks complete

eileencodes added a commit to eileencodes/rails that referenced this pull request Aug 29, 2018

Drop load_database_yaml and fix test
We originally did the whole `load_database_yaml` thing because this test
wasn't cooperating and we needed to finish the namespaced rake tasks for
multiple databases.

However, it turns out that YAML can't eval ERB if you don't tell it it's
ERB so you get Pysch parse errors if you're using multi-line ERB or
ERB with conditionals. It's a hot mess.

After trying a few things and thinking it over we decided that it wasn't
worth bandaiding over, the test needed to be improved. The test was
added in rails#31135 to test that the env is loaded in these tasks. But it
was blowing up because we were trying to read a database name out of the
configuration - however that's not the purpose of this change. We want
to read environment files in the rake tasks, but not in the config
file.

In this PR we changed the test to test what the PR was actually fixing.
We've also deleted the `load_database_yaml` because it caused more
problems than it was worth. This should fix the issues described in
rails#32274 (comment). We
also had these problems at GitHub.

Co-authored-by: alimi <aibrahim2k2@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment