
Adding a refresh listener to a recovering shard should be a noop #26055

Merged

bleskes merged 6 commits into elastic:master from wait_for_refresh_recovery on Aug 4, 2017

Conversation

@bleskes bleskes (Contributor) commented Aug 4, 2017

When `refresh=wait_for` is set on an indexing request, we register listeners on the shards that are called during the next refresh. During the translog recovery phase, when the engine is already open, there is a window of time in which indexing operations succeed and can add their listeners. Those listeners will only be called when the recovery finishes, as we do not refresh during recoveries (unless the indexing buffer is full). Besides being a bad user experience, this can also cause deadlocks with an ongoing peer recovery that may wait for those operations in order to mark the replica in sync (details below).

To fix this, this PR changes refresh listeners to be a noop when the shard is not yet serving reads (which implicitly covers the recovery period). Since nothing can be read from the shard yet, there is nothing for the listener to wait for anyway.
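
Conceptually, the new behavior can be modeled like this. This is a minimal sketch with made-up names (`ShardModel`, `readsAllowed`) standing in for the real `IndexShard`/`RefreshListeners` machinery, not the actual change:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model: a shard parks refresh listeners while it serves reads,
// but completes them immediately as a noop while it is still recovering.
class ShardModel {
    private boolean readsAllowed = false; // false while the shard recovers
    private final List<Consumer<Boolean>> pendingListeners = new ArrayList<>();

    synchronized void addRefreshListener(Consumer<Boolean> listener) {
        if (readsAllowed) {
            pendingListeners.add(listener); // fired on the next refresh
        } else {
            listener.accept(false); // noop: nothing is readable yet anyway
        }
    }

    synchronized void refresh() {
        for (Consumer<Boolean> listener : pendingListeners) {
            listener.accept(false); // false = this listener forced no refresh
        }
        pendingListeners.clear();
    }
}
```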

There is still a small problem I need to think about how to solve: an indexing operation that comes in with `wait_for_refresh` after recovery is finalized but before `markAsDone` is called will not be immediately visible when the shard moves to POST_RECOVERY. I'm going to give it some more thought (I hope a simple refresh will do), but I think we can start reviewing the main change.
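
If a simple refresh does the trick, the transition could look roughly like this; a hypothetical method on the `ShardModel` sketch above, not the real recovery state machine:

```java
// Hypothetical: refresh once when leaving recovery, so an operation that
// slipped in between finalize and markAsDone is visible the moment the
// shard starts serving reads in POST_RECOVERY.
synchronized void moveToPostRecovery() {
    refresh();           // make everything indexed so far visible
    readsAllowed = true; // later listeners wait for real refreshes again
}
```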

Deadlock with recovery:

When finalizing a peer recovery, we mark the peer as "in sync". To do so we wait until the peer's local checkpoint is at least as high as the global checkpoint. If an operation with `refresh=wait_for` is added as a listener on that peer during recovery, it is not completed from the perspective of the primary. The primary may then wait for it to complete before advancing the local checkpoint for that peer. Since that peer is not considered in sync, the global checkpoint on the primary can be higher, causing a deadlock: the operation waits for recovery to finish and a refresh to happen, while the recovery waits on the operation.
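
The blocking side of that cycle can be sketched as follows; `InSyncTracker` and its methods are illustrative names, not the real checkpoint-tracker API:

```java
// Hypothetical sketch of the primary waiting for a recovering peer to
// catch up before marking it "in sync".
class InSyncTracker {
    private final long globalCheckpoint;
    private long peerLocalCheckpoint = -1;

    InSyncTracker(long globalCheckpoint) {
        this.globalCheckpoint = globalCheckpoint;
    }

    synchronized void updatePeerLocalCheckpoint(long checkpoint) {
        peerLocalCheckpoint = Math.max(peerLocalCheckpoint, checkpoint);
        notifyAll();
    }

    // Called while finalizing recovery. If the peer's checkpoint can only
    // advance once a parked refresh listener fires, and that listener only
    // fires after recovery finishes, this wait never returns: the deadlock.
    synchronized void waitForPeerInSync() throws InterruptedException {
        while (peerLocalCheckpoint < globalCheckpoint) {
            wait();
        }
    }
}
```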

@bleskes bleskes added the :Internal, :Distributed/Recovery, >bug, v6.1.0, and v7.0.0 labels on Aug 4, 2017
@bleskes bleskes requested a review from jasontedor August 4, 2017 08:28
@jasontedor jasontedor (Member) left a comment

LGTM.

Can you also revert f154e53 and e1ef3d5?

@@ -848,7 +848,7 @@ public long getWritingBytes() {

public RefreshStats refreshStats() {
// Null refreshListeners means this shard doesn't support them so there can't be any.
@jasontedor (Member)

Drop the comment too?

@bleskes (Contributor, Author)

yep

.settings(settings)
.primaryTerm(0, 1).build();
IndexShard primary = newShard(new ShardId(metaData.getIndex(), 0), true, "n1", metaData, null);
recoveryShardFromStore(primary);
@jasontedor (Member)

While we're here, can you fix the name of this method to be recoverShardFromStore?

@bleskes (Contributor, Author)

fixed.

@bleskes bleskes (Contributor, Author) commented Aug 4, 2017

@jasontedor thanks. Pushed another commit with a solution for the visibility issue. Can you take another look?

@jasontedor jasontedor (Member) left a comment

Visibility change looks good, so still LGTM.

@bleskes bleskes merged commit e11cbed into elastic:master Aug 4, 2017
@bleskes bleskes deleted the wait_for_refresh_recovery branch August 4, 2017 17:51
bleskes added a commit that referenced this pull request on Aug 4, 2017

Adding a refresh listener to a recovering shard should be a noop (#26055)
bleskes added a commit that referenced this pull request on Aug 4, 2017

Adding a refresh listener to a recovering shard should be a noop (#26055)