Skip to content

Commit

Permalink
Fix Overly Optimistic Request Deduplication
Browse files Browse the repository at this point in the history
On master failover we have to resent all the shard failed messages,
but the transport requests remain the same in the eyes of `equals`.
If the master failover is registered and the requests to the new master
are sent before all the callbacks have executed and the request to the
old master removed from the deduplicator then the requuests to the new
master will incorrectly fail and the snapshot get stuck.

Closes elastic#51253
  • Loading branch information
original-brownbear committed Jan 21, 2020
1 parent 4e0e6e8 commit a2c7a31
Show file tree
Hide file tree
Showing 2 changed files with 11 additions and 0 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -358,6 +358,9 @@ private void syncShardStatsOnNewMaster(ClusterChangedEvent event) {
return;
}

// Clear request deduplicator since we need to send all requests that were potentially not handled by the previous
// master again
remoteFailedRequestDeduplicator.clear();
for (SnapshotsInProgress.Entry snapshot : snapshotsInProgress.entries()) {
if (snapshot.state() == State.STARTED || snapshot.state() == State.ABORTED) {
Map<ShardId, IndexShardSnapshotStatus> localShards = currentSnapshotShards(snapshot.snapshot());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,14 @@ public void executeOnce(T request, ActionListener<Void> listener, BiConsumer<T,
}
}

/**
* Remove all tracked requests from this instance so that the first time {@link #executeOnce} is invoked with any request it triggers
* an actual request execution. Use this e.g. for requests to master that need to be sent again on master failover.
*/
public void clear() {
requests.clear();
}

public int size() {
return requests.size();
}
Expand Down

0 comments on commit a2c7a31

Please sign in to comment.