Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an intermediary called RoleManager to manage connections #37622

Merged
merged 2 commits into from Nov 7, 2019

Conversation

@eileencodes
Copy link
Member

eileencodes commented Nov 1, 2019

This commit message is really long 馃槄 but it's got a lot of info in it. We felt it was important to explain what we were trying to solve and why we prefer this approach.


This PR is an alternate solution to #37388. While there are benefits
to merging #37388 it changes the public API and swaps around existing
concepts for how connection management works. The changes are
backwards-incompatible and pretty major. This will have a negative impact
on gems and applications relying on how conn management currently works.

Background:

Shopify and other applications need sharding but Rails has
made it impossible to do this because a handler can only hold one
connection pool per class. Sharded apps need to hold multiple
connections per handler per class.

This PR aims to solve only that problem.

What this PR does:

In this PR we've added a RoleManager class that can hold multiple
Roles. Each Role holds the db_config,
connection_specification_name, schema_cache and pool. By default
the RoleManager holds a single reference from a default key to the
Role instance. A sharded/multi-tenant app can pass an optional second
argument to remove_connection, retrieve_connection_pool,
establish_connection and connected? on the handler, thus allowing
for multiple connections belonging to the same class/handler without
breaking backwards compatibility.

By using the RoleManager we can avoid altering the public API, moving
around handler/role concepts, and achieve the internal needs for
establishing multiple connections per handler per class.

A note about why we opened this PR:

We very much appreciate the work that went into #37388 and in no way mean
to diminish that work. However, it breaks the following public APIs:

  • #retrieve_connection, #connected?, and #remove_connection are
    public methods on handler and can't be changed from taking a spec to a
    role.
  • The knowledge that the handler keys are symbols relating to a role
    (:writing/:reading) is public - changing how handlers are accessed
    will break apps/libraries.

In addition it doesn't solve the problem of mapping a single connection
to a single class since it has a 1:1 mapping of class (handler) -> role (writing) -> db_config. Multiple pools in a writing role can't exist
in that implementation.

The new PR solves this by using the RoleManager to hold multiple connection
objects for the same class. This lets a handler hold a role manager
which can hold as many roles for that writer as the app needs.

Regarding the Role name:

When I originally designed the API for multiple databases, it wasn't
accidental that handler and role are the same concept. Handler is the
internal concept (since that's what was there already) and Role was the
public external concept. Meaning, role and handler were meant to
be the same thing. The concept here means that when you switch a
handler/role, Rails automatically can pick up the connection on the
other role by knowing the specification name. Changing this would mean not
just that we need to rework how GitHub and many many gems work, but also
means retraining users of Rails 6.0 that all these concepts changed.

Since this PR doesn't move around the concepts in connection
management and instead creates an intermediary between handler and
role to manage the connection data (db_config, schema_cache,
pool, and connection_specification) we think that Role and
RoleManager are the wrong name.

We didn't change it yet in this PR because we wanted to keep change
churn low for initial review. We also haven't come up with a better
name yet. 馃槃

What this PR does not solve:

Our PR here solves a small portion of the problem - it allows models to
have multiple connections on a class. It doesn't aim to solve any other
problems than that. Going forward we'll need to still solve the
following problems:

  • DatabaseConfig doesn't support a sharding configuration
  • connects_to/connected_to still needs a way to switch connections for shards
  • Automatic switching of shards
  • connection_specification_name still exists

The End

Thanks for reading this far. These problems aren't easy to solve. John
and I spent a lot of time trying different things and so I hope that
this doesn't come across as if we think we know better. I would have
commented on the other PR what changes to make but we needed to try out
different solutions in order to get here.

Ultimately we're aiming to change as little as the API as possible. Even
if the handler/role -> manager -> db_config/pool/etc isn't how we'd
design connection management if we could start over, we also don't want
to break public APIs. It's important that we make things better while
maintaining compatibility.

The RoleManager class makes it possible for us to fix the underlying
problem while maintaining all the backwards compatibility in the public
API.

We all have the same goal; to add sharding support to Rails. Let me know
your thoughts on this change in lieu of #37388 and if you have questions.

@rails-bot rails-bot bot added the activerecord label Nov 1, 2019
@eileencodes eileencodes force-pushed the eileencodes:multi-role-per-class branch 2 times, most recently from ada74d8 to 302aab0 Nov 1, 2019
@eileencodes

This comment has been minimized.

Copy link
Member Author

eileencodes commented Nov 1, 2019

@eileencodes eileencodes force-pushed the eileencodes:multi-role-per-class branch from 302aab0 to 364a0d7 Nov 2, 2019
@casperisfine

This comment has been minimized.

Copy link

casperisfine commented Nov 4, 2019

So if I understand correctly, that new indirection would be for holding the shards?

e.g.:

>> ActiveRecord::Base.connection_handlers[:writing].instance_variable_get(:@owner_to_role_manager)["primary"].get_role(:shard_42)

Is that what you have in mind?

@eileencodes

This comment has been minimized.

Copy link
Member Author

eileencodes commented Nov 4, 2019

Yup, you can get the role that way, but once this is merged we can expose a public API on connected_to/connects_to that won't remove the connection like the database key and make it possible for multiple connections for the same handler/role.

>> ActiveRecord::Base.connection_handlers[:writing].instance_variable_get(:@owner_to_role_manager)["primary"].get_role(:shard_42)

This makes it really clear how much we need to rename Role now that it won't be an actual role. And since it no longer returns a connection pool I realized that connection_pool_names are also not a good name going forward.

If you're ok with this direction @casperisfine I'll come up with a new name. I didn't want to rename it before opening the PR because I thought it would be more difficult to see the direction we want to go in.

Like I said previously this will fix the underlying issue with sharding while avoiding drastic changes to connection management.

@casperisfine

This comment has been minimized.

Copy link

casperisfine commented Nov 4, 2019

If you're ok with this direction

TBH I don't find it elegant at all because it makes the hierarchy even more backwards than it already is (role (writing) -> class (handler) -> shard (role manager) whereas logically the hierarchy is class (handler) -> shard (role manager) -> role (writing) ).

But I don't think it's possible to do without breaking remove_connection and friends, so the direction you're going with is probably the only pragmatic way forward for the 6.1 milestone.

So long story short: 馃憤

On another note, something from #37388 that we'll have to reproduce anyway. Right now Base.current_role is global, which might already be a problem if you don't have matching roles on all your DBs. I mention this because for sharding we'll need a similar "current_shard".

@casperisfine

This comment has been minimized.

Copy link

casperisfine commented Nov 4, 2019

Also cc @matthewd

This PR is an alternate solution to #37388. While there are benefits
to merging #37388 it changes the public API and swaps around existing
concepts for how connection management works. The changes are
backwards-incompatible and pretty major. This will have a negative impact
on gems and applications relying on how conn management currently works.

**Background:**

Shopify and other applications need sharding but Rails has
made it impossible to do this because a handler can only hold one
connection pool per class. Sharded apps need to hold multiple
connections per handler per class.

This PR aims to solve only that problem.

**What this PR does:**

In this PR we've added a `RoleManager` class that can hold multiple
`Roles`. Each `Role` holds the `db_config`,
`connection_specification_name`, `schema_cache` and `pool`. By default
the `RoleManager` holds a single reference from a `default` key to the
`Role` instance. A sharded/multi-tenant app can pass an optional second
argument to `remove_connection`, `retrieve_connection_pool`,
`establish_connection` and `connected?` on the handler, thus allowing
for multiple connections belonging to the same class/handler without
breaking backwards compatibility.

By using the `RoleManager` we can avoid altering the public API, moving
around handler/role concepts, and achieve the internal needs for
establishing multiple connections per handler per class.

**A note about why we opened this PR:**

We very much appreciate the work that went into #37388 and in no way mean
to diminish that work. However, it breaks the following public APIs:

* `#retrieve_connection`, `#connected?`, and `#remove_connection` are
public methods on handler and can't be changed from taking a spec to a
role.
* The knowledge that the handler keys are symbols relating to a role
(`:writing`/`:reading`) is public - changing how handlers are accessed
will break apps/libraries.

In addition it doesn't solve the problem of mapping a single connection
to a single class since it has a 1:1 mapping of `class (handler) -> role
(writing) -> db_config`. Multiple pools in a writing role can't exist
in that implementation.

The new PR solves this by using the `RoleManager` to hold multiple connection
objects for the same class. This lets a handler hold a role manager
which can hold as many roles for that writer as the app needs.

**Regarding the `Role` name:**

When I originally designed the API for multiple databases, it wasn't
accidental that handler and role are the same concept. Handler is the
internal concept (since that's what was there already) and Role was the
public external concept. Meaning, role and handler were meant to
be the same thing. The concept here means that when you switch a
handler/role, Rails automatically can pick up the connection on the
other role by knowing the specification name. Changing this would mean not
just that we need to rework how GitHub and many many gems work, but also
means retraining users of Rails 6.0 that all these concepts changed.

Since this PR doesn't move around the concepts in connection
management and instead creates an intermediary between `handler` and
`role` to manage the connection data (`db_config`, `schema_cache`,
`pool`, and `connection_specification`) we think that `Role` and
`RoleManager` are the wrong name.

We didn't change it yet in this PR because we wanted to keep change
churn low for initial review. We also haven't come up with a better
name yet. 馃槃

**What this PR does not solve:**

Our PR here solves a small portion of the problem - it allows models to
have multiple connections on a class. It doesn't aim to solve any other
problems than that. Going forward we'll need to still solve the
following problems:

* `DatabaseConfig` doesn't support a sharding configuration
* `connects_to`/`connected_to` still needs a way to switch connections for shards
* Automatic switching of shards
* `connection_specification_name` still exists

**The End**

Thanks for reading this far. These problems aren't easy to solve. John
and I spent a lot of time trying different things and so I hope that
this doesn't come across as if we think we know better. I would have
commented on the other PR what changes to make but we needed to try out
different solutions in order to get here.

Ultimately we're aiming to change as little as the API as possible. Even
if the handler/role -> manager -> db_config/pool/etc isn't how we'd
design connection management if we could start over, we also don't want
to break public APIs. It's important that we make things better while
maintaining compatibility.

The `RoleManager` class makes it possible for us to fix the underlying
problem while maintaining all the backwards compatibility in the public
API.

We all have the same goal; to add sharding support to Rails. Let me know
your thoughts on this change in lieu of #37388 and if you have questions.

Co-authored-by: John Crepezzi <seejohnrun@github.com>
@eileencodes eileencodes force-pushed the eileencodes:multi-role-per-class branch from 364a0d7 to 3a46c3a Nov 5, 2019
@eileencodes

This comment has been minimized.

Copy link
Member Author

eileencodes commented Nov 5, 2019

I added a new commit that changes RoleManager -> PoolManager and Role -> PoolConfig. There's more work to be done to clean up the concepts but I think the new names are going in the right direction.


TBH I don't find it elegant at all because it makes the hierarchy even more backwards than it already is (role (writing) -> class (handler) -> shard (role manager) whereas logically the hierarchy is class (handler) -> shard (role manager) -> role (writing) ).

I've been working with connection management in Rails for over a year. It's basically all I do these days (in Rails and in GitHub). 馃槃

IMO the current system does make sense and is logical because it's easy to follow. This isn't more backwards, it's just a different way of doing it than you would prefer (and that's ok!).

I don't see a future in which we can rewrite connection management without breaking apps - and I also don't agree the concepts are backwards. We can incrementally solve the real problems with conn management without rewriting it.

It's important to me that we don't break apps/libraries existing behavior. For a long time apps have been rolling their own multiple-databases and we should work within the existing system rather than rewrite it and risk breaking those apps.

@eileencodes eileencodes added this to the 6.1.0 milestone Nov 5, 2019
@casperisfine

This comment has been minimized.

Copy link

casperisfine commented Nov 6, 2019

IMO the current system does make sense and is logical because it's easy to follow. This isn't more backwards, it's just a different way of doing it than you would prefer (and that's ok!).

No it's not just a matter of personal preference or habit, this has actual repercussions on how that datastructure is accessed.

When I say backwards I don't mean it in its pejorative sense, but in the sense of it being "upside down".

Like if you'd store a Post(has_many: Comment) as {comment_id => post_id}, well, it works, but now you want to know if a post_id exists, you have a O(n) algo when you could have a O(1).

Same here, if I want to grab the :reading for "AnimalsBase" and I don't find it, is it because "AnimalsBase" doesn't have a :reading role, or because "AnimalsBase" doesn't have any database at all?

But enough beating of a dead horse, I already agreed this was the only way forward for 6.1.

This commit renames `RoleManager` -> `PoolManager` and `Role` ->
`PoolConfig`.

Once we introduced the previous commit, and looking at the existing
code, it's clearer that `Role` and `RoleManager` are not the right names
for these.

Since this PR moves away from swapping the connection handler concepts
around and the role concept will continue existing on the handler level,
we need to rename this.

A `PoolConfig` holds a `connection_specification_name` (we may rename
this down the road), a `db_config`, a `schema_cache`, and a `pool`. It
does feel like `pool` could eventually hold all of these things instead
of having a `PoolConfig` object. This would remove one level of the
object graph and reduce complexity. For now I'm leaving this object to
keep the change churn low and will revisit later.

Co-authored-by: John Crepezzi <seejohnrun@github.com>
@eileencodes eileencodes force-pushed the eileencodes:multi-role-per-class branch from 3a46c3a to 6d81eab Nov 7, 2019
@eileencodes eileencodes merged commit 74688af into rails:master Nov 7, 2019
2 checks passed
2 checks passed
build
Details
buildkite/rails Build #64792 passed (46 minutes, 55 seconds)
Details
@eileencodes eileencodes deleted the eileencodes:multi-role-per-class branch Nov 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can鈥檛 perform that action at this time.