IndicesClusterStateService should clean local started when re-assigns an initializing shard with the same aid #20687
Conversation
… an initializing shard with the same aID. When a node gets disconnected from the cluster and rejoins during a master election, the new master may already have that node in its cluster state and will try to assign shards to it. If the node hosts started primaries, the new shards will be initializing and will have the same allocation ids as those of the currently started shards. We currently do not recognize this situation. We should clean the current IndexShard instances and initialize new ones. This also hardens test assertions in the same area.
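The condition described above can be sketched as a small standalone check. This is a simplified, hypothetical model of the routing comparison discussed in this PR, not Elasticsearch's actual `IndicesClusterStateService` code: the class, enum, and method names here are illustrative assumptions.

```java
// Hypothetical, simplified model of the routing-entry comparison this PR adds.
// Names (StaleShardCheck, mustRemoveLocalShard) are illustrative only.
public class StaleShardCheck {

    enum State { INITIALIZING, STARTED }

    static class ShardRouting {
        final String allocationId;
        final State state;

        ShardRouting(String allocationId, State state) {
            this.allocationId = allocationId;
            this.state = state;
        }

        boolean initializing() { return state == State.INITIALIZING; }
        boolean active() { return state == State.STARTED; }
        boolean isSameAllocation(ShardRouting other) {
            return allocationId.equals(other.allocationId);
        }
    }

    // A locally started shard must be removed and re-initialized when the new
    // master re-assigns an initializing shard that carries the same allocation
    // id -- the scenario where the node rejoined during a master election.
    static boolean mustRemoveLocalShard(ShardRouting current, ShardRouting incoming) {
        return incoming.initializing()
                && current.active()
                && current.isSameAllocation(incoming);
    }

    public static void main(String[] args) {
        ShardRouting started = new ShardRouting("aid-1", State.STARTED);
        ShardRouting reassigned = new ShardRouting("aid-1", State.INITIALIZING);
        System.out.println(mustRemoveLocalShard(started, reassigned)); // true
    }
}
```

Before this fix, an incoming initializing shard with the same allocation id as a locally started shard went undetected; the check above captures the case the new `else if` branch handles.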
retest this please
Left minor comments. Fix and push at will.
currentRoutingEntry, newShardRouting);
indexService.removeShard(shardId.id(), "removing shard (stale copy)");
} else if (newShardRouting.initializing() && currentRoutingEntry.active()) {
logger.debug("{} removing shard (not active, current {}, new {})", shardId, currentRoutingEntry, newShardRouting);
can you add a comment here that describes how this situation can occur?
added
public MockIndexShard(ShardRouting shardRouting) {
this.shardRouting = shardRouting;
public MockIndexShard(ShardRouting shardRouting, long term) {
this.shardRouting = shardRouting; this.term = term;
are we saving some newlines here? 🙄
I guess it was folded by intellij into a one liner and I just added it in there. Anyway - fixed.
assertThat(this.shardId(), equalTo(shardRouting.shardId()));
assertTrue("current: " + this.shardRouting + ", got: " + shardRouting, this.shardRouting.isSameAllocation(shardRouting));
if (this.shardRouting.active()) {
assertTrue("and active shard must state active, current: " + this.shardRouting + ", got: " + shardRouting,
I guess this reads "an active shard must stay active"... ;-)
:) can't get rid of these states.
…tate_stricter_checks
…rd with the same allocation id as a currently started shard.
… an initializing shard with the same aid (#20687) When a node gets disconnected from the cluster and rejoins during a master election, the new master may already have that node in its cluster state and will try to assign shards to it. If the node hosts started primaries, the new shards will be initializing and will have the same allocation ids as those of the currently started shards. We currently do not recognize this situation. We should clean the current IndexShard instances and initialize new ones. This also hardens test assertions in the same area.
thx @ywelsch
When a node gets disconnected from the cluster and rejoins during a master election, the new master may already have that node in its cluster state and will try to assign shards to it. If the node hosts started primaries, the new shards will be initializing and will have the same allocation ids as those of the currently started shards. We currently do not recognize this situation. We should clean the current IndexShard instances and initialize new ones. This also hardens test assertions in the same area.