
Trigger replica recovery restarts by master when primary relocation completes #23926

Conversation

ywelsch
Contributor

@ywelsch ywelsch commented Apr 5, 2017

When a primary relocation completes while there are ongoing replica recoveries, the recoveries for these replicas need to be restarted (as a new primary is in charge of replicating changes). Before this PR, the need for a recovery restart was detected by the data nodes that had the replicas, by checking on each cluster state update if the recovery process had completed before the recovery source changed. That code had a race, however, which could lead to a not-fully recovered shard exposing itself as started (see #23904).

This PR takes a different approach: When the primary relocation completes and the master updates the cluster state to move the primary shard from relocating to started, it will reinitialize all initializing replica shards, by giving them a fresh allocation id. Data nodes that have the replica shard will simply detect that the allocation id changed and restart the recovery process (instead of trying to determine the need to restart based on ongoing recoveries).

Note: Removal of the code in IndicesClusterStateService that checks whether the recovery source has changed will not be backported to the 5.x branch. This ensures backward compatibility for the situation where the master node is older and does not have the code changes that have been introduced in this PR.

Closes #23904
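The approach described above can be illustrated with a minimal sketch. The `Replica` class and `reinitialize` method below are hypothetical stand-ins, not the actual Elasticsearch `ShardRouting`/`AllocationId` API; the only point is the mechanism: an initializing replica is rebuilt with a fresh allocation id, so a data node holding the replica sees the id change and restarts recovery.

```java
// Hypothetical sketch only: illustrates "reinitialize with a fresh
// allocation id", not the real Elasticsearch routing classes.
import java.util.UUID;

public class ReplicaReinit {
    /** Minimal stand-in for a replica shard routing entry. */
    static final class Replica {
        final String allocationId;
        final boolean initializing;
        Replica(String allocationId, boolean initializing) {
            this.allocationId = allocationId;
            this.initializing = initializing;
        }
    }

    /** Rebuild an initializing replica with a fresh allocation id. */
    static Replica reinitialize(Replica replica) {
        if (!replica.initializing) {
            throw new IllegalArgumentException("only initializing replicas are reinitialized");
        }
        return new Replica(UUID.randomUUID().toString(), true);
    }

    public static void main(String[] args) {
        Replica before = new Replica(UUID.randomUUID().toString(), true);
        Replica fresh = reinitialize(before);
        // The data node compares ids, sees a change, and restarts recovery.
        System.out.println(fresh.allocationId.equals(before.allocationId)); // prints false
    }
}
```

In the sketch, the id comparison replaces the racy "did the recovery source change before recovery completed?" check that this PR removes.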

@ywelsch ywelsch added :Allocation :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement v5.4.0 v6.0.0-alpha1 labels Apr 5, 2017
@ywelsch ywelsch requested a review from bleskes April 5, 2017 17:46
Contributor

@bleskes bleskes left a comment


Looks great. Left a bunch of nits and questions. Nothing major there

@@ -69,6 +69,11 @@
*/
void replicaPromoted(ShardRouting replicaShard);

/**
* Called when an initializing replica is reinitialized.
Contributor


can you add a little note as to when this happens?

Contributor Author


sure, I've pushed f53cded

@@ -120,6 +125,11 @@ public void startedPrimaryReinitialized(ShardRouting startedPrimaryShard, ShardR
public void replicaPromoted(ShardRouting replicaShard) {

}

@Override
public void initializedReplicaReinitialized(ShardRouting initializingReplica, ShardRouting reinitializedReplica) {
Contributor


wondering (unrelated to this change) - why not use default implementations on the interface?

Contributor


also the parameter naming can be very confusing - can we add java docs? maybe call it oldReplica & reinitializedReplica?

Contributor Author


  1. There is an implementation of the interface (RoutingNodesChangedObserver) where I want to be 100% sure that every method that is added to the interface is properly implemented in that class. With default methods, it's easy to miss this.
  2. I've renamed the method parameter.
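The trade-off in point 1 can be shown with a small sketch. The interface and classes below are simplified stand-ins (not the real `RoutingNodesChangedObserver`): an abstract method forces every implementor to handle a newly added event at compile time, while a default method is silently inherited and easy to miss.

```java
// Illustrative stand-ins for the design choice discussed above; names
// are hypothetical, not the actual Elasticsearch observer interface.
interface RoutingObserver {
    void replicaPromoted(String shard);          // abstract: implementors must handle it
    default void shardStarted(String shard) {}   // default: silently inherited if forgotten
}

// If another abstract method were added to the interface, this class
// would fail to compile; a forgotten default method gives no such signal.
class ChangedObserver implements RoutingObserver {
    int promoted;
    @Override public void replicaPromoted(String shard) { promoted++; }
}

public class ObserverDemo {
    public static void main(String[] args) {
        ChangedObserver o = new ChangedObserver();
        o.replicaPromoted("s0");
        o.shardStarted("s1"); // no-op: the default body was never overridden
        System.out.println(o.promoted); // prints 1
    }
}
```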

public void initializedReplicaReinitialized(ShardRouting initializingReplica, ShardRouting reinitializedReplica) {
assert initializingReplica.initializing() && initializingReplica.primary() == false :
"expected initializing replica shard " + initializingReplica;
assert reinitializedReplica.initializing() && reinitializedReplica.primary() == false :
Contributor


maybe add an assertion that the allocation id is changed?

Contributor Author


sure, I've pushed f53cded

@@ -416,6 +416,9 @@ public void updateRoutingEntry(ShardRouting newRouting) throws IOException {
// active primaries.
throw new IndexShardRelocatedException(shardId(), "Shard is marked as relocated, cannot safely move to state " + newRouting.state());
}
assert newRouting.active() == false || state == IndexShardState.STARTED || state == IndexShardState.RELOCATED ||
Contributor

@bleskes bleskes Apr 10, 2017


to make sure I don't miss something: this PR doesn't make this assertion pass, it was relevant before as well, right?

Contributor Author


yes, it was relevant before, and actually broken: If we had had this assertion, it would have triggered when we had the test failure here: #23904.

@@ -416,6 +416,9 @@ public void updateRoutingEntry(ShardRouting newRouting) throws IOException {
// active primaries.
throw new IndexShardRelocatedException(shardId(), "Shard is marked as relocated, cannot safely move to state " + newRouting.state());
}
assert newRouting.active() == false || state == IndexShardState.STARTED || state == IndexShardState.RELOCATED ||
state == IndexShardState.CLOSED :
"shard state is " + state + ", but routing is active " + newRouting;
Contributor


I'm not sure this is correct.

Contributor Author


can you elaborate? What's not correct?
What I want to check is newRouting.active() ==> state == XYZ, which is rewritten as newRouting.active() == false || state == XYZ
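For readers less used to this rewriting: Java has no implication operator, so the implication A ==> B is encoded as `!A || B`, which is false exactly when A holds and B does not. A minimal illustration (the class name is hypothetical; only the boolean identity matters):

```java
// Sketch of the standard encoding of logical implication in an assert.
public class Implication {
    static boolean implies(boolean a, boolean b) {
        return !a || b; // vacuously true whenever a is false
    }

    public static void main(String[] args) {
        System.out.println(implies(true, true));   // prints true
        System.out.println(implies(true, false));  // prints false (the only violating case)
        System.out.println(implies(false, false)); // prints true
    }
}
```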

Contributor


OK, got it. I got confused. I would prefer a message like "routing is active, but local shard state isn't. routing [], local state []". If you prefer yours, I'm good with keeping it as is.


logger.info("--> building initial cluster state");
AllocationId primaryId = AllocationId.newRelocation(AllocationId.newInitializing());
AllocationId replicaId = AllocationId.newRelocation(AllocationId.newInitializing());
Contributor


This doesn't need to be relocating per se? should we use the relocatingReplica boolean?

Contributor Author


right, I've pushed f53cded

assertNotEquals(replica.allocationId().getId(), startedReplica.allocationId().getId());
}

logger.info("--> test starting of relocating primary shard together with initializing / relocating replica");
Contributor


++

Contributor

@bleskes bleskes left a comment


LGTM. Thanks Yannick.

@@ -416,6 +416,9 @@ public void updateRoutingEntry(ShardRouting newRouting) throws IOException {
// active primaries.
throw new IndexShardRelocatedException(shardId(), "Shard is marked as relocated, cannot safely move to state " + newRouting.state());
}
assert newRouting.active() == false || state == IndexShardState.STARTED || state == IndexShardState.RELOCATED ||
state == IndexShardState.CLOSED :
"shard state is " + state + ", but routing is active " + newRouting;
Contributor


OK, got it. I got confused. I would prefer a message like "routing is active, but local shard state isn't. routing [], local state []". If you prefer yours, I'm good with keeping it as is.

@ywelsch ywelsch merged commit 88a54f1 into elastic:master Apr 11, 2017
ywelsch added a commit that referenced this pull request Apr 11, 2017
…ompletes (#23926)

@lcawl lcawl added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Allocation labels Feb 13, 2018