Add a way to rebalance shards #2981

timmaxw · 2014-08-28T21:16:47Z

Currently in reql_admin, shards are rebalanced whenever the number of shards increases, but not otherwise. We should add a way to rebalance shards. One option would be a rebalance=True argument to reconfigure(), which forces a rebalance even if the number of shards has decreased or not changed. Another option is an explicit rebalance() command.


mlucy · 2014-08-28T22:25:19Z

The optarg sounds slightly better to me.

coffeemug · 2014-08-29T19:05:21Z

I think we should consider not adding an optarg or a command. The UX here is pretty confusing because the implications aren't clear. What exactly does it mean to rebalance vs not rebalance? What does it mean to drop the number of shards and not rebalance? Where does the data on the dropped shard go? As a user, when should I call this command and when shouldn't I (or alternatively, set the flag to true)? The whole thing is surprisingly confusing.

Here are a few options:

Always rebalance when the user calls reconfigure()/writes to table_config no matter what.
Always rebalance when the number of shards changes.

I think either of these two options is fine (I'd have to think a bit about which one is better). The whole idea of rebalancing will go away when we move to consistent hashing, so I think we shouldn't introduce the option now if at all possible.

coffeemug · 2014-08-29T19:07:25Z

Also note, we might want to rename reconfigure to rebalance because you're "rebalancing" the data. This sounds much clearer to me (though I'd have to think about it).

timmaxw · 2014-08-29T22:39:55Z

It needs to be possible for the user to call reconfigure() even when the table is not available for reads. But we can't calculate a new distribution when the table is not available. If the number of shards is the same or decreased, then computing the new shard boundaries from the old shard boundaries is a good fallback. But this doesn't work well when the number of shards is increased.
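As a rough illustration of the fallback described above, the following Python sketch derives new boundaries from old ones when the shard count stays the same or shrinks. The function name `merge_boundaries` and the representation of boundaries as a sorted list of split keys are assumptions made for this example, not RethinkDB's actual implementation.

```python
def merge_boundaries(old_splits, new_shard_count):
    """Pick new split points from the existing ones (hypothetical helper).

    old_splits: sorted list of split keys; N shards have N - 1 split keys.
    new_shard_count: desired number of shards, at most the current number.
    """
    old_shard_count = len(old_splits) + 1
    if not 1 <= new_shard_count <= old_shard_count:
        # With more shards there are not enough old boundaries to reuse.
        raise ValueError("fallback only works for the same or fewer shards")

    new_splits = []
    for i in range(1, new_shard_count):
        # Reuse the old boundary closest below the i-th even fraction
        # of the old shards, so adjacent old shards are merged.
        idx = (i * old_shard_count) // new_shard_count - 1
        new_splits.append(old_splits[idx])
    return new_splits


# Five shards merged down to three.
print(merge_boundaries(["d", "h", "m", "r"], 3))  # ['d', 'm']
```

The result is only as balanced as the old shards were, and no table data has to be read to compute it. Increasing the shard count would require new split keys that cannot be derived from the old boundaries alone, which is why the sketch raises an error in that case.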

Changing the shard boundaries is an expensive operation that causes loss of availability. However, calling the current implementation of reconfigure() without changing shard boundaries is in many cases a very cheap operation. For example, if you call reconfigure() after losing a single machine and the table has enough replicas, I think you will get availability back almost immediately. So I'm reluctant to force the user to change the shard boundaries unnecessarily.

Tryneus · 2014-10-22T20:43:53Z

@coffeemug: We need to make a decision on this.

Also, this might be related to the question of how to pick split points when the user tries to shard an empty table.

danielmewes · 2014-10-23T01:07:57Z

I think we should conceptually distinguish between reconfigure and rebalance. The former should generally replace the configuration of the table completely, without caring about the current configuration too much (apart from maybe picking servers such that backfilling and therefore loss of availability is minimized, see below).
rebalance in my opinion should be a separate command. It should not change any server assignments, but only shift the boundaries between the existing shards to make them balanced.

Another confusing thing here is that reconfigure sometimes does cause loss of availability (like rebalance would), but generally makes an attempt to not do so if it's avoidable.
I think we should make this difference explicit.

I propose we add an opt arg maintain_availability to reconfigure.
Possible values are null, "outdated_read", "read", "write".
If that opt arg is null, I think reconfigure should imply rebalancing shards. Generally it should not care about maintaining availability in this case.
If the user specifies an availability level through that opt arg, reconfigure would make an attempt to maintain availability by keeping a necessary number of replicas where they are. If it cannot derive a configuration that fulfills the given constraints, it should fail.
The definition of these availability levels would have to depend on the ack configuration.

I think this would give the best user experience, but it might be too complicated to implement?

danielmewes · 2014-10-23T01:20:40Z

For the sake of keeping things simple in terms of implementation costs, @coffeemug's proposal ("Always rebalance when the number of shards changes.") in combination with a separate rebalance term (and no new opt args) sounds like a good compromise.

coffeemug · 2014-10-24T22:25:19Z

I've thought about it a lot, and I'm not sure what to do here. Let's talk about it in person next week when @timmaxw gets back and settle on a good-enough solution for v1.

deontologician · 2014-10-28T23:06:40Z

Is there any corresponding webui component for this issue?

coffeemug · 2014-10-28T23:22:18Z

There isn't one now (we used to merge reconfigure and rebalance into a single action), but we should consider adding one.

timmaxw · 2014-10-29T00:58:30Z

After offline discussion, we decided to use a distribution query to calculate new shard points whenever the number of shards changes. The number of shards could change either by the user calling reconfigure() or writing to rethinkdb.table_config. If the distribution query fails, we give the user an error; since we use outdated reads, this can only happen if there are no replicas available for a shard.
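To make the idea of deriving shard points from a distribution query concrete, here is a minimal Python sketch. It assumes the distribution is available as a sorted list of `(left_key, approximate_count)` buckets. That format and the function name `split_points_from_distribution` are hypothetical and are not taken from the RethinkDB source.

```python
def split_points_from_distribution(buckets, num_shards):
    """Choose split keys so each shard holds roughly equal counts.

    buckets: list of (left_key, approx_count) sorted by left_key; each
             bucket covers keys from left_key up to the next left_key.
             (Assumed format, for illustration only.)
    num_shards: desired number of shards.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")

    total = sum(count for _, count in buckets)
    splits = []
    running = 0  # documents in buckets strictly before the current one
    for key, count in buckets:
        wanted = len(splits) + 1  # index of the next boundary to place
        # Place a boundary at this bucket's left edge once the data to
        # the left covers wanted/num_shards of the total.
        if (len(splits) < num_shards - 1 and running > 0
                and running * num_shards >= total * wanted):
            splits.append(key)
        running += count
    return splits


buckets = [("a", 10), ("f", 10), ("k", 10), ("p", 10)]
print(split_points_from_distribution(buckets, 2))  # ['k']
print(split_points_from_distribution(buckets, 4))  # ['f', 'k', 'p']
```

Boundaries can only fall on bucket edges, so a coarse or heavily skewed distribution may yield fewer than `num_shards - 1` split keys. An empty table yields none at all, which ties into the open question earlier in the thread about picking split points for an empty table.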

In addition, we'll have a rebalance command. I propose the following syntax: r.table("foo").rebalance(). The return value is { rebalanced: 1 }. It can also be called on a database to rebalance all the tables in the database.

coffeemug · 2014-10-29T04:49:32Z

Would rebalance block and return after everything is done?

timmaxw · 2014-10-29T18:08:04Z

I'd say no. Maybe it should be {rebalancing: 1}.

coffeemug · 2014-10-29T18:10:59Z

I'd consider returning the table status.

danielmewes · 2014-10-29T18:57:19Z

I think having rebalance wait until it's done is a bad idea.
Most clients will have timed out by then, and users will think that the rebalance has failed.

Users might also be confused about what happens if they close the connection in the middle. Is the rebalance interrupted? Is it reversed? I think neither will be the case, but many people will instinctively keep the connection open at all costs while it is running. That in turn means they will have to open another connection to continue whatever administration job they were doing, unless the rebalance was the last thing on their list.

danielmewes · 2014-10-29T19:00:57Z

I think returning the status in a way similar to reconfigure would be good. The same considerations as in #3223 would apply, except that we are only interested in the status and not in the config.

I propose we make rebalance return

{
  old_status: {...},
  new_status: {...}
}

new_status will typically have all shards unavailable, but I think that is fine, for the same reasons it is fine for reconfigure.

timmaxw · 2014-11-12T19:51:37Z

Do we also need r.db(...).rebalance()? @coffeemug

coffeemug · 2014-11-12T19:52:48Z

I'd put that in polish. If we can get to it in time -- great. If not, I don't think it's a showstopper.

timmaxw · 2014-11-12T22:08:01Z

OK, branch tim/rebalance-2981 now has rebalancing on both databases and tables. It takes no arguments. On a table it returns { old_status: { ... }, new_status: { ... } }. On a database it returns an array of those.
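Because the return value is a single object for a table and an array for a database, client code may want to treat both uniformly. The following Python sketch shows one way to do that. The helper name `status_pairs` is hypothetical, and the status documents are opaque placeholders because their contents are not spelled out in this thread.

```python
def status_pairs(result):
    """Return a list of (old_status, new_status) tuples.

    result: either one {old_status, new_status} object (table case)
            or a list of such objects (database case).
    """
    items = result if isinstance(result, list) else [result]
    return [(item["old_status"], item["new_status"]) for item in items]


# Placeholder status documents; real ones would come from the server.
table_result = {"old_status": {"placeholder": "before"},
                "new_status": {"placeholder": "after"}}
db_result = [table_result, table_result]

print(len(status_pairs(table_result)))  # 1
print(len(status_pairs(db_result)))     # 2
```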

It's in CR 2303.

timmaxw · 2014-11-13T01:41:21Z

Merged into reql_admin as of be97e5b.

timmaxw added this to the reql-admin milestone Aug 28, 2014

timmaxw mentioned this issue Aug 28, 2014

New ReQL admin API feedback #2957

Closed

danielmewes mentioned this issue Oct 23, 2014

reconfigure should take existing positional parameters as optargs #3224

Closed

danielmewes mentioned this issue Oct 28, 2014

reconfigure should return an indication of what changed #3223

Closed

danielmewes mentioned this issue Oct 29, 2014

r.db.reconfigure should work #3225

Closed

timmaxw self-assigned this Nov 12, 2014

timmaxw closed this as completed Nov 13, 2014

danielmewes mentioned this issue Nov 20, 2014

Web UI: Change way table reconfiguration is presented #3229

Closed

danielmewes modified the milestones: reql-admin, 1.16 Jan 2, 2015

danielmewes modified the milestones: 1.16, reql-admin Jan 2, 2015
