Delayed allocation can miss a reroute #14445

ywelsch · 2015-11-02T17:17:18Z

The issue I focus on is described in #14010 and #14011. I reproduced it using an integration test, and I think I found a program flow that leads to it.

Assume a cluster where the delayed allocation setting is 1 minute. A node goes down and one of its shards becomes unassigned. In RoutingService.clusterChanged() (on the master node), nextDelaySetting is 1 minute (the smallest delayed allocation setting). The variable registeredNextDelaySetting is initially Long.MAX_VALUE, so we schedule a new reroute in a minute and set registeredNextDelaySetting to 1 minute.

After half a minute, a second node goes down and one of its shards becomes unassigned. At this point registeredNextDelaySetting is still 1 minute and nextDelaySetting is also 1 minute, so nothing happens in RoutingService.clusterChanged().

The previously scheduled reroute gets executed at some point, as follows. First, registeredNextDelaySetting is reset to Long.MAX_VALUE, and then we submit a cluster update task to do the reroute. Let's assume that routingResult.changed() yields true (but only because it is set to true in ReplicaShardAllocator due to the second shard still being delayed). After the reroute is successfully applied, InternalClusterService calls back to RoutingService.clusterChanged(). There we check whether we were the reason for the cluster change event. If so (and that is the case here), we do not schedule a delayed allocation until the next cluster change event. This means that if no new cluster change event happens, the check never gets reevaluated for the remaining delayed shards.
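To make the flow easier to follow, here is a minimal toy model of it in Java. It is a sketch under stated assumptions, not the real RoutingService: the class DelayedRerouteModel, the method runScheduledReroute(), the list scheduledAt, the event source "node-left", and the value given to CLUSTER_UPDATE_TASK_SOURCE are all made up for illustration. Only registeredNextDelaySetting, nextDelaySetting, and clusterChanged() are names taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the described flow. Not the real Elasticsearch RoutingService. */
class DelayedRerouteModel {
    // Assumed placeholder; the real value of this constant is not shown in this thread.
    static final String CLUSTER_UPDATE_TASK_SOURCE = "delayed-reroute";
    static final long DELAY_MILLIS = 60_000L; // the 1 minute delayed allocation setting

    long registeredNextDelaySetting = Long.MAX_VALUE;
    final List<Long> scheduledAt = new ArrayList<>(); // times at which a reroute is due

    /** Called for every cluster change event on the master. */
    void clusterChanged(String source, long nowMillis, int delayedShards) {
        if (source.startsWith(CLUSTER_UPDATE_TASK_SOURCE)) {
            return; // the event was caused by our own reroute, so it is ignored
        }
        if (delayedShards == 0) {
            return; // nothing is waiting on a delay
        }
        long nextDelaySetting = DELAY_MILLIS; // smallest delay among unassigned shards
        if (nextDelaySetting < registeredNextDelaySetting) {
            registeredNextDelaySetting = nextDelaySetting;
            scheduledAt.add(nowMillis + nextDelaySetting);
        }
    }

    /** The scheduled reroute fires: reset the marker, then publish our own change event. */
    void runScheduledReroute(long nowMillis, int shardsStillDelayed) {
        registeredNextDelaySetting = Long.MAX_VALUE;
        clusterChanged(CLUSTER_UPDATE_TASK_SOURCE, nowMillis, shardsStillDelayed);
    }

    /** Replays the scenario and returns every time for which a reroute was scheduled. */
    static List<Long> replay() {
        DelayedRerouteModel service = new DelayedRerouteModel();
        service.clusterChanged("node-left", 0L, 1);      // first node goes down
        service.clusterChanged("node-left", 30_000L, 2); // second node goes down
        service.runScheduledReroute(60_000L, 1);         // second shard is still delayed
        return service.scheduledAt;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [60000]
    }
}
```

In this model only one reroute is ever scheduled (for the 60 second mark), even though one shard is still delayed when that reroute runs.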

dakrone · 2015-11-02T17:34:17Z

Let me see if I understand this:

00:00 Node goes down, 1 shard unassigned (shard A)
00:00 Routing service sees a 1 minute delay in assigning the shard and schedules a reroute in 1 minute
00:30 Another node goes down, 1 additional shard unassigned (shard B)
00:30 Routing service sees a 1 minute configured delay, but does nothing
01:00 Scheduled reroute is executed, shard A is assigned (B still unassigned because it has 30 seconds of "delay" left)
01:05 New cluster state from the reroute, but ignored by RoutingService because it came from routing service
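The "30 seconds of delay left" in the timeline above is simple arithmetic, sketched below in Java. The class RemainingDelaySketch and the method remainingDelay() are hypothetical names used only for this illustration, and the formula is an assumption about how a remaining delay would be computed from the configured 1 minute setting, not code taken from Elasticsearch.

```java
/** Illustrative arithmetic only; names and formula are assumptions, not Elasticsearch code. */
class RemainingDelaySketch {
    static final long DELAY_MILLIS = 60_000L; // configured delayed allocation: 1 minute

    /** Delay left for a shard that became unassigned at unassignedAtMillis; never negative. */
    static long remainingDelay(long unassignedAtMillis, long nowMillis) {
        long elapsed = nowMillis - unassignedAtMillis;
        return Math.max(0L, DELAY_MILLIS - elapsed);
    }

    public static void main(String[] args) {
        long now = 60_000L; // 01:00, when the scheduled reroute runs
        System.out.println("shard A: " + remainingDelay(0L, now));      // prints shard A: 0
        System.out.println("shard B: " + remainingDelay(30_000L, now)); // prints shard B: 30000
    }
}
```

At 01:00 shard A has no delay left and can be assigned, while shard B would need another reroute 30 seconds later.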

> we do not schedule a delayed allocation until the next cluster change event. This means that if no new cluster change event happens, the check never gets reevaluated for the remaining delayed shards.

However, assuming shard A is actually assigned there will be a new cluster change event (a shard started event), so that's why this doesn't look like it occurs very frequently.

I think perhaps we can remove the

        if (event.source().startsWith(CLUSTER_UPDATE_TASK_SOURCE)) {
            // that's us, ignore this event
            return;
        }

from the cluster changed event handler. What do you think?

After a delayed reroute of a shard, RoutingService misses to schedule a new delayed reroute of other delayed shards. Closes elastic#14494 Closes elastic#14010 Closes elastic#14445

ywelsch mentioned this issue Nov 3, 2015

Delayed allocation can miss a reroute #14494

Merged

clintongormley added >bug :Allocation labels Nov 8, 2015

jolynch mentioned this issue Nov 9, 2015

1.7.X Reroutes seem really fragile #14631

Closed

ywelsch closed this as completed in #14494 Nov 12, 2015

ywelsch pushed a commit that referenced this issue Nov 12, 2015

Fix missing reroute in case of multiple delayed shards

f821d50

ywelsch pushed a commit that referenced this issue Nov 12, 2015

Fix missing reroute in case of multiple delayed shards

90d990b

ywelsch pushed a commit that referenced this issue Nov 12, 2015

Fix missing reroute in case of multiple delayed shards

5bf2505

ywelsch pushed a commit that referenced this issue Nov 12, 2015

Fix missing reroute in case of multiple delayed shards

d342200

lcawl added :Distributed/Distributed and removed :Allocation labels Feb 13, 2018

clintongormley added :Distributed/Allocation and removed :Distributed/Distributed labels Feb 14, 2018
