Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IndicesClusterStateService should clean local started when re-assigns an initializing shard with the same aid #20687

Merged

Conversation

bleskes
Copy link
Contributor

@bleskes bleskes commented Sep 28, 2016

When a node get disconnected from the cluster and rejoins during a master election, it may be that the new master already has that node in it's cluster and will try to assign it shards. If the node hosts started primaries, the new shards will be initializing and will have the same allocation id as the allocation ids of the current started size. We currently do not recognize this currently. We should clean the current IndexShard instances and initialize new ones.

This also hardens test assertions in the same area.

… an initializing shard with the same aID

When a node get disconnected from the cluster and rejoins during a master election, it may be that the new master already has that node in it's cluster and will try to assign it shards. If the node hosts started primaries, the new shards will be initializing and will have the same allocation id as the allocation ids of the current started size. We currently do not recognize this currently. We should clean the current IndexShard instances and initialize new ones.

This also hardens test assertions in the same area.
@bleskes
Copy link
Contributor Author

bleskes commented Sep 28, 2016

retest this please

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left minor comments. Fix and push at will.

currentRoutingEntry, newShardRouting);
indexService.removeShard(shardId.id(), "removing shard (stale copy)");
} else if (newShardRouting.initializing() && currentRoutingEntry.active()) {
logger.debug("{} removing shard (not active, current {}, new {})", shardId, currentRoutingEntry, newShardRouting);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here that describes how this situation can occur?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

public MockIndexShard(ShardRouting shardRouting) {
this.shardRouting = shardRouting;
public MockIndexShard(ShardRouting shardRouting, long term) {
this.shardRouting = shardRouting; this.term = term;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we saving some newlines here? 🙄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it was folded by intellij into a one liner and I just added it in there. Anyway - fixed.

assertThat(this.shardId(), equalTo(shardRouting.shardId()));
assertTrue("current: " + this.shardRouting + ", got: " + shardRouting, this.shardRouting.isSameAllocation(shardRouting));
if (this.shardRouting.active()) {
assertTrue("and active shard must state active, current: " + this.shardRouting + ", got: " + shardRouting,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this reads "an active shard must stay active"... ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) can't get rid of these states.

@bleskes bleskes merged commit 7b5e651 into elastic:master Oct 3, 2016
@bleskes bleskes deleted the indices_cluster_state_stricter_checks branch October 3, 2016 15:33
bleskes added a commit that referenced this pull request Oct 3, 2016
… an initializing shard with the same aid (#20687)

When a node get disconnected from the cluster and rejoins during a master election, it may be that the new master already has that node in it's cluster and will try to assign it shards. If the node hosts started primaries, the new shards will be initializing and will have the same allocation id as the allocation ids of the current started size. We currently do not recognize this currently. We should clean the current IndexShard instances and initialize new ones.

This also hardens test assertions in the same area.
bleskes added a commit that referenced this pull request Oct 3, 2016
… an initializing shard with the same aid (#20687)

When a node get disconnected from the cluster and rejoins during a master election, it may be that the new master already has that node in it's cluster and will try to assign it shards. If the node hosts started primaries, the new shards will be initializing and will have the same allocation id as the allocation ids of the current started size. We currently do not recognize this currently. We should clean the current IndexShard instances and initialize new ones.

This also hardens test assertions in the same area.
@bleskes
Copy link
Contributor Author

bleskes commented Oct 3, 2016

thx @ywelsch

@lcawl lcawl added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Allocation labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. v5.0.0-rc1 v6.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants