-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trigger replica recovery restarts by master when primary relocation completes #23926
Trigger replica recovery restarts by master when primary relocation completes #23926
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Left a bunch of nits and questions. Nothing major there
@@ -69,6 +69,11 @@ | |||
*/ | |||
void replicaPromoted(ShardRouting replicaShard); | |||
|
|||
/** | |||
* Called when an initializing replica is reinitialized. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a little note as to when this happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I've pushed f53cded
@@ -120,6 +125,11 @@ public void startedPrimaryReinitialized(ShardRouting startedPrimaryShard, ShardR | |||
public void replicaPromoted(ShardRouting replicaShard) { | |||
|
|||
} | |||
|
|||
@Override | |||
public void initializedReplicaReinitialized(ShardRouting initializingReplica, ShardRouting reinitializedReplica) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering (unrelated to this change) - why not use default implementations on the interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also the parameter naming can be very confusing - can we add java docs? maybe call it oldReplica
& reinitializedReplica
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- There is an implementation of the interface (RoutingNodesChangedObserver) where I want to be 100% sure that every method that is added to the interface is properly implemented in that class. With default methods, it's easy to miss this.
- I've renamed the method parameter.
public void initializedReplicaReinitialized(ShardRouting initializingReplica, ShardRouting reinitializedReplica) { | ||
assert initializingReplica.initializing() && initializingReplica.primary() == false : | ||
"expected initializing replica shard " + initializingReplica; | ||
assert reinitializedReplica.initializing() && reinitializedReplica.primary() == false : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add an assertion that the allocation id is changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I've pushed f53cded
@@ -416,6 +416,9 @@ public void updateRoutingEntry(ShardRouting newRouting) throws IOException { | |||
// active primaries. | |||
throw new IndexShardRelocatedException(shardId(), "Shard is marked as relocated, cannot safely move to state " + newRouting.state()); | |||
} | |||
assert newRouting.active() == false || state == IndexShardState.STARTED || state == IndexShardState.RELOCATED || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make sure I don't miss something: this PR doesn't make this assertion pass, it was relevant before as well, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it was relevant before, and actually broken: If we had had this assertion, it would have triggered when we had the test failure here: #23904.
@@ -416,6 +416,9 @@ public void updateRoutingEntry(ShardRouting newRouting) throws IOException { | |||
// active primaries. | |||
throw new IndexShardRelocatedException(shardId(), "Shard is marked as relocated, cannot safely move to state " + newRouting.state()); | |||
} | |||
assert newRouting.active() == false || state == IndexShardState.STARTED || state == IndexShardState.RELOCATED || | |||
state == IndexShardState.CLOSED : | |||
"shard state is " + state + ", but routing is active " + newRouting; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate? What's not correct?
What I want to check is newRouting.active() ==> state == XYZ
, which is rewritten as newRouting.active() == false || state == XYZ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. got it. I got confusing. I would prefer a message like "routing is active, but local shard state isn't. routing [] , local state [ ]". If you prefer yours I'm got with keeping as is.
|
||
logger.info("--> building initial cluster state"); | ||
AllocationId primaryId = AllocationId.newRelocation(AllocationId.newInitializing()); | ||
AllocationId replicaId = AllocationId.newRelocation(AllocationId.newInitializing()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need to be relocating per se? should we use the relocatingReplica boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, I've pushed f53cded
assertNotEquals(replica.allocationId().getId(), startedReplica.allocationId().getId()); | ||
} | ||
|
||
logger.info("--> test starting of relocating primary shard together with initializing / relocating replica"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks Yannick.
@@ -416,6 +416,9 @@ public void updateRoutingEntry(ShardRouting newRouting) throws IOException { | |||
// active primaries. | |||
throw new IndexShardRelocatedException(shardId(), "Shard is marked as relocated, cannot safely move to state " + newRouting.state()); | |||
} | |||
assert newRouting.active() == false || state == IndexShardState.STARTED || state == IndexShardState.RELOCATED || | |||
state == IndexShardState.CLOSED : | |||
"shard state is " + state + ", but routing is active " + newRouting; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. got it. I got confusing. I would prefer a message like "routing is active, but local shard state isn't. routing [] , local state [ ]". If you prefer yours I'm got with keeping as is.
…ompletes (#23926) When a primary relocation completes while there are ongoing replica recoveries, the recoveries for these replicas need to be restarted (as a new primary is in charge of replicating changes). Before this commit, the need for a recovery restart was detected by the data nodes that had the replicas, by checking on each cluster state update if the recovery process had completed before the recovery source changed. That code had a race, however, which could lead to a not-fully recovered shard exposing itself as started (see #23904). This commit takes a different approach: When the primary relocation completes and the master updates the cluster state to move the primary shard from relocating to started, it will reinitialize all initializing replica shards, by giving them a fresh allocation id. Data nodes that have the replica shard will simply detect that the allocation id changed and restart the recovery process (instead of trying to determine the need to restart based on ongoing recoveries). Note: Removal of the code in IndicesClusterStateService that checks whether the recovery source has changed will not be backported to the 5.x branch. This ensures backward compatibility for the situation where the master node is older and does not have the code changes that have been introduced in this PR. Closes #23904
When a primary relocation completes while there are ongoing replica recoveries, the recoveries for these replicas need to be restarted (as a new primary is in charge of replicating changes). Before this PR, the need for a recovery restart was detected by the data nodes that had the replicas, by checking on each cluster state update if the recovery process had completed before the recovery source changed. That code had a race, however, which could lead to a not-fully recovered shard exposing itself as started (see #23904).
This PR takes a different approach: When the primary relocation completes and the master updates the cluster state to move the primary shard from relocating to started, it will reinitialize all initializing replica shards, by giving them a fresh allocation id. Data nodes that have the replica shard will simply detect that the allocation id changed and restart the recovery process (instead of trying to determine the need to restart based on ongoing recoveries).
Note: Removal of the code in
IndicesClusterStateService
that checks whether the recovery source has changed will not be backported to the 5.x branch. This ensures backward compatibility for the situation where the master node is older and does not have the code changes that have been introduced in this PR.Closes #23904