Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a way to rebalance shards #2981

Closed
timmaxw opened this issue Aug 28, 2014 · 20 comments
Closed

Add a way to rebalance shards #2981

timmaxw opened this issue Aug 28, 2014 · 20 comments
Assignees
Milestone

Comments

@timmaxw
Copy link
Member

timmaxw commented Aug 28, 2014

Currently in reql_admin, shards are rebalanced whenever the number of shards increases, but not otherwise. We should add a way to rebalance shards. One option would be a rebalance=True argument to reconfigure(), which forces a rebalance even if the number of shards has decreased or not changed. Another option is an explicit rebalance() command.

@timmaxw timmaxw added this to the reql-admin milestone Aug 28, 2014
@mlucy
Copy link
Member

mlucy commented Aug 28, 2014

The optarg sounds slightly better to me.

@coffeemug
Copy link
Contributor

I think we should consider not adding an optarg or a command. The UX here is pretty confusing because the implications aren't clear. What exactly does it mean to rebalance vs not rebalance? What does it mean to drop the number of shards and not rebalance? Where does the data on the dropped shard go? As a user, when should I call this command and when shouldn't I (or alternatively, set the flag to true)? The whole thing is surprisingly confusing.

Here are a few options:

  • Always rebalance when the user calls reconfigure()/writes to table_config no matter what.
  • Always rebalance when the number of shards changes.

I think either of these two options is fine (I'd have to think a bit about which one is better). The whole idea of rebalancing will go away when we move to consistent hashing, so I think we shouldn't introduce the option now if at all possible.

@coffeemug
Copy link
Contributor

Also note, we might want to rename reconfigure to rebalance because you're "rebalancing" the data. This sounds much clearer to me (though I'd have to think about it).

@timmaxw
Copy link
Member Author

timmaxw commented Aug 29, 2014

It needs to be possible for the user to call reconfigure() even when the table is not available for reads. But we can't calculate a new distribution when the table is not available. If the number of shards is the same or decreased, then computing the new shard boundaries from the old shard boundaries is a good fallback. But this doesn't work well when the number of shard is increased.

Changing the shard boundaries is an expensive operation that causes loss of availability. However, calling the current implementation of reconfigure() without changing shard boundaries is in many cases a very cheap operation. For example, if you call reconfigure() after losing a single machine and the table has enough replicas, I think you will get availability back almost immediately. So I'm reluctant to force the user to change the shard boundaries unnecessarily.

@Tryneus
Copy link
Member

Tryneus commented Oct 22, 2014

@coffeemug: We need to make a decision on this.

Also, this might be related to the question of how to pick split points when the user tries to shard an empty table.

@danielmewes
Copy link
Member

I think we should conceptually distinguish between reconfigure and rebalance. The former should generally replace the configuration of the table completely, without caring about the current configuration too much (apart for maybe picking servers such that backfilling and therefore loss of availability is minimized, see below).
rebalance in my opinion should be a separate command. It should not change any server assignments, but only shift the boundaries between the existing shards to make them balanced.

Another confusing thing here is that reconfigure sometimes does cause loss of availability (like rebalance would), but generally makes an attempt to not do so if it's avoidable.
I think we should make this difference explicit.

I propose we add an opt arg maintain_availability to reconfigure.
Possible values are null, "outdated_read", "read", "write".
If that opt arg is null, I think reconfigure should imply rebalancing shards. Generally it should not care about maintaining availability in this case.
If the user specifies an availability level through that opt arg, reconfigure would make an attempt to maintain availablity by keeping a necessary number of replicas where they are. If it cannot derive a configuration that fulfills the given constraints, it should fail.
The definition of these availability levels would have to depend on the ack configuration.

I think this would give the best user experience, but it might be too complicated to implement?

@danielmewes
Copy link
Member

For the sake of keeping things simple in terms of implementation costs,
@coffeemug's proposal

Always rebalance when the number of shards changes.

in combination with a separate rebalance term (and no new opt args) sounds like a good compromise.

@coffeemug
Copy link
Contributor

I've thought about it a lot, and I'm not sure what to do here. Let's talk about it in person next week when @timmaxw gets back and settle on a good-enough solution for v1.

@deontologician
Copy link
Contributor

Is there any corresponding webui component for this issue?

@coffeemug
Copy link
Contributor

There isn't one now (we used to merge reconfigure and rebalance into a single action), but we should consider adding one.

@timmaxw
Copy link
Member Author

timmaxw commented Oct 29, 2014

After offline discussion, we decided to use a distribution query to calculate new shard points whenever the number of shards changes. The number of shards could change either by the user calling reconfigure() or writing to rethinkdb.table_config. If the distribution query fails, we give the user an error; since we use outdated reads, this can only happen if there are no replicas available for a shard.

In addition, we'll have a rebalance command. I propose the following syntax: r.table("foo").rebalance(). The return value is { rebalanced: 1 }. It can also be called on a database to rebalance all the tables in the database.

@coffeemug
Copy link
Contributor

Would rebalance block and return after everything is done?

@timmaxw
Copy link
Member Author

timmaxw commented Oct 29, 2014

I'd say no. Maybe it should be {rebalancing: 1}.

@coffeemug
Copy link
Contributor

I'd consider returning the table status.

@danielmewes
Copy link
Member

I think having rebalance wait until it's done is a bad idea.
Most clients will have timed out by then, and users will think that the rebalance has failed.

Also users might get confused over what happens if they close the connection in the middle. Is the rebalance interrupted? Is it reversed? Neither will be the case I think, but I think many people will instinctively leave the connection open while it is running at all cost. That in turn comes with the problem that they will have to open another connection for continuing whatever administration job they were doing if the given rebalance wasn't the last thing on their list.

@danielmewes
Copy link
Member

I think returning the status in a way similar to reconfigure would be good. The same considerations as here #3223 would apply, except that we are only interested in the status and not in the config.

I propose we make rebalance return

{
  old_status: {...}
  new_status: {...}
}

new_status will typically have all shards unavailable, but I think that is fine for the same reasons for which it is for reconfigure.

@timmaxw
Copy link
Member Author

timmaxw commented Nov 12, 2014

Do we also need r.db(...).rebalance()? @coffeemug

@coffeemug
Copy link
Contributor

I'd put that in polish. If we can get to it in time -- great. If not, I don't think it's a showstopper.

@timmaxw
Copy link
Member Author

timmaxw commented Nov 12, 2014

OK, branch tim/rebalance-2981 now has rebalancing on both database and tables. It takes no arguments. On a table it returns { old_status: { ... }, new_status: { ... } }. On a database it returns an array of those.

It's in CR 2303.

@timmaxw timmaxw self-assigned this Nov 12, 2014
@timmaxw
Copy link
Member Author

timmaxw commented Nov 13, 2014

Merged into reql_admin as of be97e5b.

@danielmewes danielmewes modified the milestones: 1.16, reql-admin Jan 2, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants