Delayed allocation can miss a reroute #14445

Closed
ywelsch opened this issue Nov 2, 2015 · 1 comment · Fixed by #14494
Labels
>bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes)

Comments

@ywelsch
Contributor

ywelsch commented Nov 2, 2015

This issue is the one described in #14010 and #14011. I reproduced it with an integration test, and I think I found the program flow that leads to it.

Assume a cluster where the delayed allocation setting is 1 minute. A node goes down and a shard becomes unassigned. In RoutingService.clusterChanged() (on the master node), nextDelaySetting is 1 minute (the smallest delayed allocation setting). The variable registeredNextDelaySetting is initially Long.MAX_VALUE, so we schedule a new reroute in one minute and set registeredNextDelaySetting to 1 minute.

After half a minute, a second node goes down, and one of its shards becomes unassigned. At this point registeredNextDelaySetting is still 1 minute and nextDelaySetting is also 1 minute, so nothing happens in RoutingService.clusterChanged().

The previously scheduled reroute gets executed at some point, as follows. First, registeredNextDelaySetting is reset to Long.MAX_VALUE, and then we submit a cluster update task to do the reroute. Let's assume that routingResult.changed() yields true (but only because it is set to true in ReplicaShardAllocator due to the second shard still being delayed). After the reroute is successfully applied, InternalClusterService calls back to RoutingService.clusterChanged(). There we check whether we were the reason for the cluster change event. If so (and that is the case here), we do not schedule a delayed allocation until the next cluster change event. This means that if no new cluster change event happens, the check is never reevaluated for the remaining delayed shards.
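To make the flow above easier to follow, here is a minimal, simplified sketch of it in Java. It is not the real RoutingService: the class `MissedRerouteSketch`, the method `runScheduledReroute`, the event source strings, and the value of `CLUSTER_UPDATE_TASK_SOURCE` are all assumptions made for illustration. Only the field names `registeredNextDelaySetting` and `nextDelaySetting` and the "ignore our own event" check come from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the described flow; not the real RoutingService. */
public class MissedRerouteSketch {
    // Placeholder value (assumption); only the prefix check matters here.
    static final String CLUSTER_UPDATE_TASK_SOURCE = "routing-service-reroute";

    long registeredNextDelaySetting = Long.MAX_VALUE;
    final List<Long> scheduledRerouteTimes = new ArrayList<>();

    /** Called on every cluster change; nextDelaySetting is the smallest delay setting in millis. */
    void clusterChanged(String source, long nowMillis, long nextDelaySetting) {
        if (source.startsWith(CLUSTER_UPDATE_TASK_SOURCE)) {
            return; // that's us, ignore this event
        }
        // Only schedule when the new delay is smaller than the one already registered.
        if (nextDelaySetting > 0 && nextDelaySetting < registeredNextDelaySetting) {
            registeredNextDelaySetting = nextDelaySetting;
            scheduledRerouteTimes.add(nowMillis + nextDelaySetting);
        }
    }

    /** The scheduled reroute fires: reset the registered delay, then the new cluster state calls back. */
    void runScheduledReroute(long nowMillis, long nextDelaySetting) {
        registeredNextDelaySetting = Long.MAX_VALUE;
        clusterChanged(CLUSTER_UPDATE_TASK_SOURCE + " (delayed)", nowMillis, nextDelaySetting);
    }

    /** Replays the scenario and returns every time at which a reroute was scheduled. */
    static List<Long> replay() {
        MissedRerouteSketch service = new MissedRerouteSketch();
        long delay = 60_000L;
        service.clusterChanged("node-left-1", 0L, delay);      // first node goes down
        service.clusterChanged("node-left-2", 30_000L, delay); // second node goes down, nothing scheduled
        service.runScheduledReroute(60_000L, delay);           // callback is ignored
        return service.scheduledRerouteTimes;
    }

    public static void main(String[] args) {
        System.out.println("scheduled reroutes at: " + replay()); // prints [60000]
    }
}
```

In this model only one reroute is ever scheduled (at 60 seconds), so the second shard, whose delay expires at 90 seconds, is not revisited unless some unrelated cluster change event arrives.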

@dakrone
Member

dakrone commented Nov 2, 2015

Let me see if I understand this:

  • 00:00 Node goes down, 1 shard unassigned (shard A)
  • 00:00 Routing service sees a 1 minute delay in assigning the shard and schedules a reroute in 1 minute
  • 00:30 Another node goes down, 1 additional shard unassigned (shard B)
  • 00:30 Routing service sees a 1 minute configured delay, but does nothing
  • 01:00 Scheduled reroute is executed, shard A is assigned (B still unassigned because it has 30 seconds of "delay" left)
  • 01:05 New cluster state from the reroute, but ignored by RoutingService because it came from routing service
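The "30 seconds of delay left" in the timeline is simple arithmetic, sketched below under the assumption that a shard's remaining delay is the time it became unassigned plus the configured delay, minus the current time, clamped at zero. The class and method names are hypothetical and are not taken from the Elasticsearch code base.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper illustrating the timeline arithmetic; not Elasticsearch code. */
public class RemainingDelaySketch {
    /** Remaining delay in millis for a shard, never negative. */
    static long remainingDelayMillis(long unassignedAtMillis, long delayMillis, long nowMillis) {
        return Math.max(0L, unassignedAtMillis + delayMillis - nowMillis);
    }

    public static void main(String[] args) {
        long delay = TimeUnit.MINUTES.toMillis(1);
        long now = TimeUnit.SECONDS.toMillis(60); // the 01:00 reroute
        // Shard A became unassigned at 00:00, shard B at 00:30.
        System.out.println("A: " + remainingDelayMillis(0L, delay, now));
        System.out.println("B: " + remainingDelayMillis(TimeUnit.SECONDS.toMillis(30), delay, now));
    }
}
```

At the 01:00 reroute, shard A has no delay left and can be assigned, while shard B still has 30,000 ms to wait and therefore needs a follow-up reroute.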

> we do not schedule a delayed allocation until the next cluster change event. This means that if no new cluster change event happens, the check never gets reevaluated for the remaining delayed shards.

However, assuming shard A is actually assigned, there will be a new cluster change event (a shard started event), which is why this doesn't seem to occur very frequently.

I think perhaps we can remove the

        if (event.source().startsWith(CLUSTER_UPDATE_TASK_SOURCE)) {
            // that's us, ignore this event
            return;
        }

from the cluster changed event. What do you think?
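As a rough illustration of what this suggestion would change, the sketch below compares the handler's decision with and without the guard. Everything here is an assumption for illustration: the class `GuardRemovalSketch`, the method `followUpDelay`, the `ignoreOwnEvents` flag, and the placeholder value of `CLUSTER_UPDATE_TASK_SOURCE`. It does not model any other effect of removing the check, and the change that was eventually merged may differ from it.

```java
/** Hypothetical comparison of the handler with and without the guard; not Elasticsearch code. */
public class GuardRemovalSketch {
    // Placeholder value (assumption); only the prefix check matters here.
    static final String CLUSTER_UPDATE_TASK_SOURCE = "routing-service-reroute";

    /**
     * Returns the delay in millis after which a follow-up reroute would be scheduled,
     * or -1 if nothing is scheduled. ignoreOwnEvents = true models the existing guard,
     * ignoreOwnEvents = false models the guard being removed.
     */
    static long followUpDelay(String source, long remainingDelayMillis, boolean ignoreOwnEvents) {
        if (ignoreOwnEvents && source.startsWith(CLUSTER_UPDATE_TASK_SOURCE)) {
            return -1L; // that's us, ignore this event
        }
        return remainingDelayMillis > 0 ? remainingDelayMillis : -1L;
    }

    public static void main(String[] args) {
        String ownEvent = CLUSTER_UPDATE_TASK_SOURCE + " (delayed)";
        // Shard B still has 30 seconds of delay left when the reroute's cluster state arrives.
        System.out.println("with guard:    " + followUpDelay(ownEvent, 30_000L, true));
        System.out.println("without guard: " + followUpDelay(ownEvent, 30_000L, false));
    }
}
```

With the guard in place the self-originated event schedules nothing; without it, the same event would schedule a follow-up reroute for the shard that is still delayed.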

ywelsch pushed a commit to ywelsch/elasticsearch that referenced this issue Nov 12, 2015
After a delayed reroute of a shard, RoutingService misses to schedule a new delayed reroute of other delayed shards.

Closes elastic#14494
Closes elastic#14010
Closes elastic#14445
ywelsch pushed a commit that referenced this issue Nov 12, 2015
After a delayed reroute of a shard, RoutingService misses to schedule a new delayed reroute of other delayed shards.

Closes #14494
Closes #14010
Closes #14445
@lcawl lcawl added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Allocation labels Feb 13, 2018
@clintongormley clintongormley added :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) and removed :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. labels Feb 14, 2018