IndicesClusterStateService should clean local started when re-assigns an initializing shard with the same aid #20687

bleskes · 2016-09-28T20:13:23Z

When a node gets disconnected from the cluster and rejoins during a master election, the new master may already have that node in its cluster and will try to assign it shards. If the node hosts started primaries, the new shards will be initializing and will have the same allocation ids as the currently started ones. We do not recognize this situation at the moment. We should clean the current IndexShard instances and initialize new ones.

This also hardens test assertions in the same area.
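To make the scenario concrete, here is a minimal sketch of the decision the description calls for. The types `Routing`, `State`, and `Action` are hypothetical stand-ins invented for illustration, not the Elasticsearch classes, and the first branch (a differing allocation id means a stale copy) is an assumption based on the log message in the diff rather than on the full source.

```java
import java.util.Objects;

// Simplified, hypothetical model; NOT the Elasticsearch ShardRouting/IndexShard API.
final class ReassignmentSketch {

    enum State { INITIALIZING, STARTED }

    enum Action { KEEP, REMOVE_STALE_COPY, REMOVE_AND_REINITIALIZE }

    static final class Routing {
        final String allocationId;
        final State state;

        Routing(String allocationId, State state) {
            this.allocationId = Objects.requireNonNull(allocationId);
            this.state = Objects.requireNonNull(state);
        }

        boolean initializing() { return state == State.INITIALIZING; }

        // Assumption: in this sketch "active" simply means STARTED.
        boolean active() { return state == State.STARTED; }

        boolean isSameAllocation(Routing other) {
            return allocationId.equals(other.allocationId);
        }
    }

    // Decide what to do with the locally held shard when a new routing entry arrives.
    static Action decide(Routing current, Routing incoming) {
        if (!current.isSameAllocation(incoming)) {
            // Assumed condition: a different allocation id means the local copy is stale.
            return Action.REMOVE_STALE_COPY;
        }
        if (incoming.initializing() && current.active()) {
            // Same allocation id, but the master wants the shard to start over:
            // the case described above (node rejoined during a master election).
            return Action.REMOVE_AND_REINITIALIZE;
        }
        return Action.KEEP;
    }

    public static void main(String[] args) {
        Routing localStarted = new Routing("aid-1", State.STARTED);
        Routing reassigned = new Routing("aid-1", State.INITIALIZING);
        System.out.println(decide(localStarted, reassigned));
    }
}
```

The point of the second branch is that matching allocation ids alone are not enough to conclude that the local shard can be kept; the state transition from started back to initializing has to be checked as well.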


bleskes · 2016-09-28T21:10:13Z

retest this please

ywelsch

Left minor comments. Fix and push at will.

ywelsch · 2016-10-03T14:53:50Z

core/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java

+                        currentRoutingEntry, newShardRouting);
+                    indexService.removeShard(shardId.id(), "removing shard (stale copy)");
+                } else if (newShardRouting.initializing() && currentRoutingEntry.active()) {
+                    logger.debug("{} removing shard (not active, current {}, new {})", shardId, currentRoutingEntry, newShardRouting);


can you add a comment here that describes how this situation can occur?

ywelsch · 2016-10-03T14:56:45Z

.../test/java/org/elasticsearch/indices/cluster/AbstractIndicesClusterStateServiceTestCase.java

-        public MockIndexShard(ShardRouting shardRouting) {
-            this.shardRouting = shardRouting;
+        public MockIndexShard(ShardRouting shardRouting, long term) {
+            this.shardRouting = shardRouting; this.term = term;


are we saving some newlines here? 🙄

I guess it was folded by intellij into a one liner and I just added it in there. Anyway - fixed.

ywelsch · 2016-10-03T15:00:17Z

.../test/java/org/elasticsearch/indices/cluster/AbstractIndicesClusterStateServiceTestCase.java

+            assertThat(this.shardId(), equalTo(shardRouting.shardId()));
+            assertTrue("current: " + this.shardRouting + ", got: " + shardRouting, this.shardRouting.isSameAllocation(shardRouting));
+            if (this.shardRouting.active()) {
+                assertTrue("and active shard must state active, current: " + this.shardRouting + ", got: " + shardRouting,


I guess this reads "an active shard must stay active"... ;-)

:) can't get rid of these states.
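For readers without the test file at hand, the hardened checks in the hunk above can be sketched roughly as follows. This is an illustrative stand-in, not the real `MockIndexShard`: the `Routing` class and the `updateRouting` method name are hypothetical, and plain `AssertionError`s replace the JUnit/Hamcrest assertions so the example stays self-contained.

```java
// Hypothetical, simplified mock; the real test uses ShardRouting and JUnit assertions.
final class MockShardSketch {

    static final class Routing {
        final int shardId;
        final String allocationId;
        final boolean active;

        Routing(int shardId, String allocationId, boolean active) {
            this.shardId = shardId;
            this.allocationId = allocationId;
            this.active = active;
        }

        @Override
        public String toString() {
            return "[" + shardId + "][" + allocationId + "][active=" + active + "]";
        }
    }

    private Routing routing;

    MockShardSketch(Routing routing) {
        this.routing = routing;
    }

    Routing routing() {
        return routing;
    }

    // Accept a new routing entry only if it is a legal update of the current one.
    void updateRouting(Routing incoming) {
        if (routing.shardId != incoming.shardId) {
            throw new AssertionError("shard id mismatch, current: " + routing + ", got: " + incoming);
        }
        if (!routing.allocationId.equals(incoming.allocationId)) {
            throw new AssertionError("current: " + routing + ", got: " + incoming);
        }
        if (routing.active && !incoming.active) {
            throw new AssertionError("an active shard must stay active, current: " + routing + ", got: " + incoming);
        }
        this.routing = incoming;
    }

    public static void main(String[] args) {
        MockShardSketch shard = new MockShardSketch(new Routing(0, "aid-1", false));
        shard.updateRouting(new Routing(0, "aid-1", true)); // initializing -> started is fine
        System.out.println(shard.routing());
    }
}
```

With a check like the third one in place, a test that feeds an initializing routing entry to an already active mock shard fails loudly instead of silently accepting the update.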


bleskes · 2016-10-03T15:45:04Z

thx @ywelsch

bleskes added >bug :Allocation v6.0.0-alpha1 v5.1.1 v5.0.0 labels Sep 28, 2016

bleskes assigned ywelsch Sep 28, 2016

ywelsch approved these changes Oct 3, 2016


bleskes added 4 commits October 3, 2016 17:24

Merge remote-tracking branch 'upstream/master' into indices_cluster_state_stricter_checks

9080c02

add a comment to clarify when a node can recieved an initializing shard with the same allocation id as a currently started shard.

3f9114b

extra new line

db4d17f

fix typo

ae72d48

bleskes merged commit 7b5e651 into elastic:master Oct 3, 2016

bleskes deleted the indices_cluster_state_stricter_checks branch October 3, 2016 15:33

clintongormley added v5.0.0-rc1 and removed v5.0.0 labels Oct 7, 2016

clintongormley removed the v5.1.1 label Dec 8, 2016

lcawl added :Distributed/Distributed and removed :Allocation labels Feb 13, 2018
