Delayed allocation can miss a reroute #14445

Closed
ywelsch opened this issue Nov 2, 2015 · 1 comment

@ywelsch
Contributor

commented Nov 2, 2015

The issue I focus on is described in #14010 and #14011. I reproduced it using an integration test, and I think I found a program flow that leads to it.

Assume a cluster where the delayed allocation setting is 1 minute. A node goes down and a shard becomes unassigned. In RoutingService.clusterChanged() (on the master node), nextDelaySetting is 1 minute (the smallest delayed allocation setting). The variable registeredNextDelaySetting is initially Long.MAX_VALUE, so we schedule a new reroute in a minute and set registeredNextDelaySetting to 1 minute.

After half a minute, a second node goes down and one of its shards becomes unassigned. At this point registeredNextDelaySetting is still 1 minute and nextDelaySetting is also 1 minute, so nothing happens in RoutingService.clusterChanged().

The previously scheduled reroute gets executed at some point. This goes as follows: first, registeredNextDelaySetting is reset to Long.MAX_VALUE, and then we submit a cluster update task to do the reroute. Let's assume that routingResult.changed() yields true (but only because it is set to true in ReplicaShardAllocator due to the second shard still being delayed). After the reroute is successfully applied, InternalClusterService calls back to RoutingService.clusterChanged(). There we check whether we were the reason for the cluster change event. If so (and that is the case here), we do not schedule a delayed allocation until the next cluster change event. This means that if no new cluster change event happens, the check never gets reevaluated for the remaining delayed shards.
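To make the flow above easier to follow, here is a minimal toy model of it. It is a sketch, not the Elasticsearch implementation: the class, the event source strings, the `delayedSince` list and the single fixed delay are all assumptions made for illustration. Only `registeredNextDelaySetting`, `nextDelaySetting` and `clusterChanged` are names taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the described flow. NOT the Elasticsearch code: the class,
// the source strings and the bookkeeping fields are illustrative assumptions.
class MissedRerouteSketch {
    static final long DELAY_MS = 60_000L;               // delayed allocation setting: 1 minute
    static final String OWN_SOURCE = "routing-service"; // hypothetical tag for our own reroute

    long registeredNextDelaySetting = Long.MAX_VALUE;
    long scheduledAt = -1L;                             // time of the pending reroute, -1 if none
    final List<Long> delayedSince = new ArrayList<>();  // when each delayed shard became unassigned

    // Models the decision taken on every cluster change event.
    void clusterChanged(String source, long now) {
        if (source.startsWith(OWN_SOURCE)) {
            return; // the event was caused by our own reroute, so it is ignored
        }
        long nextDelaySetting = delayedSince.isEmpty() ? 0L : DELAY_MS;
        if (nextDelaySetting > 0 && nextDelaySetting < registeredNextDelaySetting) {
            registeredNextDelaySetting = nextDelaySetting;
            long soonestExpiry = Long.MAX_VALUE;
            for (long since : delayedSince) {
                soonestExpiry = Math.min(soonestExpiry, since + DELAY_MS);
            }
            scheduledAt = Math.max(now, soonestExpiry);
        }
    }

    // Models the scheduled reroute firing.
    void runScheduledReroute(long now) {
        registeredNextDelaySetting = Long.MAX_VALUE;
        scheduledAt = -1L;
        delayedSince.removeIf(since -> now - since >= DELAY_MS); // expired delays are done
        clusterChanged(OWN_SOURCE, now);                         // callback after the reroute
    }

    // Replays the scenario: two nodes leave 30 seconds apart.
    static MissedRerouteSketch replay() {
        MissedRerouteSketch s = new MissedRerouteSketch();
        s.delayedSince.add(0L);
        s.clusterChanged("node-left", 0L);      // schedules a reroute for t = 60 s
        s.delayedSince.add(30_000L);
        s.clusterChanged("node-left", 30_000L); // 1 min is not < 1 min: nothing scheduled
        s.runScheduledReroute(60_000L);         // first shard handled, second still delayed
        return s;
    }

    public static void main(String[] args) {
        MissedRerouteSketch s = replay();
        // prints "still delayed: 1, pending reroute at: -1"
        System.out.println("still delayed: " + s.delayedSince.size()
                + ", pending reroute at: " + s.scheduledAt);
    }
}
```

In this model the second shard stays delayed with no reroute pending, and nothing changes until some unrelated cluster change event arrives.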

@dakrone

Member

commented Nov 2, 2015

Let me see if I understand this:

  • 00:00 Node goes down, 1 shard unassigned (shard A)
  • 00:00 Routing service sees a 1 minute delay in assigning the shard and schedules a reroute in 1 minute
  • 00:30 Another node goes down, 1 additional shard unassigned (shard B)
  • 00:30 Routing service sees a 1 minute configured delay, but does nothing
  • 01:00 Scheduled reroute is executed, shard A is assigned (B still unassigned because it has 30 seconds of "delay" left)
  • 01:05 New cluster state from the reroute, but ignored by RoutingService because it came from routing service
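The arithmetic behind the 01:00 step in this timeline can be sketched as follows. The helper name and its parameters are hypothetical and only illustrate how much delay each shard has left when the scheduled reroute fires.

```java
// Illustrative helper only; the name and signature are assumptions,
// not an Elasticsearch API.
class RemainingDelaySketch {
    // Milliseconds of delay left for a shard that became unassigned at unassignedAtMs.
    static long remainingDelayMs(long unassignedAtMs, long nowMs, long delaySettingMs) {
        return Math.max(0L, unassignedAtMs + delaySettingMs - nowMs);
    }

    public static void main(String[] args) {
        long delay = 60_000L; // 1 minute delayed allocation setting
        long now = 60_000L;   // 01:00, when the scheduled reroute runs
        // prints "shard A: 0 ms left"
        System.out.println("shard A: " + remainingDelayMs(0L, now, delay) + " ms left");
        // prints "shard B: 30000 ms left"
        System.out.println("shard B: " + remainingDelayMs(30_000L, now, delay) + " ms left");
    }
}
```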

> we do not schedule a delayed allocation until the next cluster change event. This means that if no new cluster change event happens, the check never gets reevaluated for the remaining delayed shards.

However, assuming shard A is actually assigned, there will be a new cluster change event (a shard started event), which is probably why this doesn't seem to occur very frequently.

I think perhaps we can remove the

        if (event.source().startsWith(CLUSTER_UPDATE_TASK_SOURCE)) {
            // that's us, ignore this event
            return;
        }

from the cluster changed event. What do you think?
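A rough way to see the effect of that suggestion is a toy model with the guard made switchable. This is only a sketch under simplifying assumptions (one fixed delay, a made-up source tag, invented bookkeeping fields); it is not the Elasticsearch code, it does not examine other side effects of dropping the guard, and it does not show what the eventual fix did.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, NOT the Elasticsearch code. Apart from
// registeredNextDelaySetting and clusterChanged, all names are made up.
class GuardRemovedSketch {
    static final long DELAY_MS = 60_000L;               // 1 minute delayed allocation setting
    static final String OWN_SOURCE = "routing-service"; // hypothetical source tag

    final boolean ignoreOwnEvents;                      // true = guard kept, false = guard removed
    long registeredNextDelaySetting = Long.MAX_VALUE;
    long scheduledAt = -1L;                             // pending reroute time, -1 if none
    final List<Long> delayedSince = new ArrayList<>();

    GuardRemovedSketch(boolean ignoreOwnEvents) {
        this.ignoreOwnEvents = ignoreOwnEvents;
    }

    void clusterChanged(String source, long now) {
        if (ignoreOwnEvents && source.startsWith(OWN_SOURCE)) {
            return; // the guard under discussion
        }
        long nextDelaySetting = delayedSince.isEmpty() ? 0L : DELAY_MS;
        if (nextDelaySetting > 0 && nextDelaySetting < registeredNextDelaySetting) {
            registeredNextDelaySetting = nextDelaySetting;
            long soonestExpiry = Long.MAX_VALUE;
            for (long since : delayedSince) {
                soonestExpiry = Math.min(soonestExpiry, since + DELAY_MS);
            }
            scheduledAt = Math.max(now, soonestExpiry);
        }
    }

    void runScheduledReroute(long now) {
        registeredNextDelaySetting = Long.MAX_VALUE;
        scheduledAt = -1L;
        delayedSince.removeIf(since -> now - since >= DELAY_MS);
        clusterChanged(OWN_SOURCE, now); // callback after the reroute is applied
    }

    // Returns the time of the reroute pending after the 01:00 reroute, or -1 if none.
    static long pendingAfterFirstReroute(boolean ignoreOwnEvents) {
        GuardRemovedSketch s = new GuardRemovedSketch(ignoreOwnEvents);
        s.delayedSince.add(0L);
        s.clusterChanged("node-left", 0L);
        s.delayedSince.add(30_000L);
        s.clusterChanged("node-left", 30_000L);
        s.runScheduledReroute(60_000L);
        return s.scheduledAt;
    }

    public static void main(String[] args) {
        // prints "guard kept: -1"
        System.out.println("guard kept: " + pendingAfterFirstReroute(true));
        // prints "guard removed: 90000"
        System.out.println("guard removed: " + pendingAfterFirstReroute(false));
    }
}
```

With the guard kept, no reroute is pending after 01:00; with it removed, the callback schedules one for 01:30, when shard B's delay expires in this model.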

ywelsch added a commit to ywelsch/elasticsearch that referenced this issue Nov 12, 2015

Fix missing reroute in case of multiple delayed shards
After a delayed reroute of a shard, RoutingService misses to schedule a new delayed reroute of other delayed shards.

Closes elastic#14494
Closes elastic#14010
Closes elastic#14445

ywelsch added a commit that referenced this issue Nov 12, 2015

Fix missing reroute in case of multiple delayed shards
After a delayed reroute of a shard, RoutingService misses to schedule a new delayed reroute of other delayed shards.

Closes #14494
Closes #14010
Closes #14445
