Use peer recovery retention leases for indices without soft-deletes #50351

dnhatn · 2019-12-19T08:20:27Z

Today, the replica allocator uses peer recovery retention leases (PRRLs) to select the best-matched copies when allocating replicas of indices with soft-deletes. We can employ the same mechanism for indices without soft-deletes because the retaining sequence number of a PRRL is the persisted global checkpoint (plus one) of that copy. If the primary and replica have the same retaining sequence number, then we should be able to perform a noop recovery. The reason is that we must be retaining translog up to the local checkpoint of the safe commit, which is at most the global checkpoint of either copy. The only limitation is that we might not cancel ongoing file-based recoveries with PRRLs in favor of noop recoveries, because we can't make the translog retention policy comply with PRRLs. We also have this problem with soft-deletes if a PRRL is about to expire.
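As a rough illustration of the matching rule described above, here is a minimal, self-contained sketch. It is not the allocator code from this PR: the class name, method names, and the node-id-to-sequence-number map are hypothetical stand-ins, and only the "persisted global checkpoint plus one" rule and the "same retaining sequence number" comparison come from the description.

```java
import java.util.Map;

/** Hypothetical illustration only; not the Elasticsearch replica allocator. */
public class NoopRecoverySketch {

    /** Per the description, a PRRL retains from the persisted global checkpoint plus one. */
    static long retainingSeqNo(long persistedGlobalCheckpoint) {
        return persistedGlobalCheckpoint + 1;
    }

    /**
     * Returns true when both copies hold a lease and the leases retain from the
     * same sequence number, which is the condition the description gives for
     * expecting a noop recovery. The map (node id to retaining seqno) is an
     * assumed simplification of the real retention lease state.
     */
    static boolean canPerformNoopRecovery(Map<String, Long> retainingSeqNoByNode,
                                          String primaryNode, String replicaNode) {
        Long primary = retainingSeqNoByNode.get(primaryNode);
        Long replica = retainingSeqNoByNode.get(replicaNode);
        return primary != null && primary.equals(replica);
    }

    public static void main(String[] args) {
        Map<String, Long> leases = Map.of(
            "node-1", retainingSeqNo(41),  // primary
            "node-2", retainingSeqNo(41),  // in-sync copy
            "node-3", retainingSeqNo(17)); // copy that fell behind
        System.out.println(canPerformNoopRecovery(leases, "node-1", "node-2")); // true
        System.out.println(canPerformNoopRecovery(leases, "node-1", "node-3")); // false
    }
}
```

A copy with no lease at all is treated as a non-match in this sketch; how the real allocator ranks such copies is not covered here.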

A nice side-effect of this is that we can turn off translog retention once all shards have started. However, I prefer leaving translog retention decoupled from PRRLs.

Relates #45136
Relates #46959

This reverts commit 781a4b825a4d429cb07aeb8b51975e6b8573a8a9.

elasticmachine · 2019-12-19T08:20:29Z

Pinging @elastic/es-distributed (:Distributed/Recovery)

ywelsch

This is looking good. I've left one question about strengthening the assertions in the tests.

ywelsch · 2019-12-19T14:39:11Z

test/framework/src/main/java/org/elasticsearch/test/rest/ESRestTestCase.java

                        continue;
                    }
+                    assertNotNull(retentionLeases);
                    for (Map<String, ?> retentionLease : retentionLeases) {
                        if (((String) retentionLease.get("id")).startsWith("peer_recovery/")) {


This does not require that there is always a peer recovery retention lease. Should we require finding such a lease, and for the right node?

++. Adjusted in 6071abd.
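For readers following the thread, the stricter check being asked for could look roughly like the sketch below. It is not the code from 6071abd: the class and method names are hypothetical, and it assumes that the part of the lease id after the `peer_recovery/` prefix is the id of the node holding the copy.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of a stricter test assertion; not the code from the PR. */
public class PeerRecoveryLeaseCheck {

    static final String PREFIX = "peer_recovery/";

    /**
     * Returns the expected node ids that have no peer recovery retention lease.
     * Assumes the suffix after "peer_recovery/" identifies the node of the copy.
     */
    static Set<String> nodesMissingLease(Collection<String> expectedNodeIds,
                                         List<? extends Map<String, ?>> retentionLeases) {
        Set<String> missing = new TreeSet<>(expectedNodeIds);
        for (Map<String, ?> lease : retentionLeases) {
            Object id = lease.get("id");
            if (id instanceof String && ((String) id).startsWith(PREFIX)) {
                missing.remove(((String) id).substring(PREFIX.length()));
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Map<String, String>> leases = List.of(
            Map.of("id", "peer_recovery/node-1"),
            Map.of("id", "some_other_lease"));
        // node-2 holds a copy but has no lease, so a strict test should fail here.
        System.out.println(nodesMissingLease(List.of("node-1", "node-2"), leases)); // [node-2]
    }
}
```

A test would then assert that the returned set is empty, which fails both when no lease exists at all and when a lease exists only for the wrong node.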

ywelsch

LGTM

ywelsch · 2019-12-19T18:31:55Z

qa/full-cluster-restart/src/test/java/org/elasticsearch/upgrades/FullClusterRestartIT.java

@@ -1278,7 +1278,7 @@ public void testOperationBasedRecovery() throws Exception {
                }
            }
            flush(index, true);
-            ensurePeerRecoveryRetentionLeasesRenewedAndSynced(index);
+            ensurePeerRecoveryRetentionLeasesRenewedAndSynced(index, false);


Should we set alwaysExists to true if minimumNodeVersion() is on or after 7.6 (after backport) (here as well as in other places)?

Sure, will do.
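A minimal sketch of the version gate suggested above, assuming versions are compared as plain major/minor integers. The class and method names are hypothetical and this is not the `minimumNodeVersion()` helper from the test framework; the 7.6 threshold is taken from the comment and only holds once the backport has landed.

```java
/** Hypothetical illustration of choosing the alwaysExists flag by cluster version. */
public class AlwaysExistsSketch {

    /** True when (major, minor) is on or after (refMajor, refMinor). */
    static boolean onOrAfter(int major, int minor, int refMajor, int refMinor) {
        return major > refMajor || (major == refMajor && minor >= refMinor);
    }

    /**
     * Leases are only required to exist when the oldest node in the cluster is on
     * or after 7.6 (threshold assumed from the review comment above).
     */
    static boolean leaseMustAlwaysExist(int minNodeMajor, int minNodeMinor) {
        return onOrAfter(minNodeMajor, minNodeMinor, 7, 6);
    }

    public static void main(String[] args) {
        System.out.println(leaseMustAlwaysExist(7, 5)); // false
        System.out.println(leaseMustAlwaysExist(7, 6)); // true
        System.out.println(leaseMustAlwaysExist(8, 0)); // true
    }
}
```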

dnhatn · 2019-12-20T05:39:35Z

Thanks Yannick!

dnhatn added 4 commits December 19, 2019 03:00

Employ prrl for indices without soft-deletes

84d9c01

policy with retaining seqno

bda80cd

Revert "policy with retaining seqno"

8d59445

This reverts commit 781a4b825a4d429cb07aeb8b51975e6b8573a8a9.

fix test

f630cdb

dnhatn added the >enhancement, :Distributed/Recovery, v8.0.0, and v7.6.0 labels Dec 19, 2019

dnhatn requested a review from ywelsch December 19, 2019 08:20

dnhatn added 3 commits December 19, 2019 03:26

adjust hasAllPeerRecoveryRetentionLeases flag

9c0d68b

Do not disable translog retention for indices without soft-deletes

877b686

remove stale test

36d2dd9

ywelsch reviewed Dec 19, 2019

dnhatn added 2 commits December 19, 2019 12:37

make sure every copy has established PRRL

6071abd

Merge branch 'master' into translog-prrl

3481242

dnhatn requested a review from ywelsch December 19, 2019 18:09

ywelsch approved these changes Dec 19, 2019

elasticmachine added 2 commits December 19, 2019 12:26

Merge branch 'master' into translog-prrl

8863dfb

Merge branch 'master' into translog-prrl

6dfaea3

dnhatn merged commit cec6678 into elastic:master Dec 20, 2019

dnhatn deleted the translog-prrl branch December 20, 2019 05:39

dnhatn added the backport pending label Dec 20, 2019

ywelsch mentioned this pull request Dec 20, 2019

RecoverySourceHandlerTests.testCancelRecoveryDuringPhase1 failing #50424

Closed

dnhatn mentioned this pull request Dec 20, 2019

Fix testCancelRecoveryDuringPhase1 #50449

Merged

dnhatn added a commit that referenced this pull request Dec 24, 2019

Adjust BWC for peer recovery retention leases (#50351)

5e0030e

Relates #50351

dnhatn removed the backport pending label Dec 24, 2019

dnhatn added a commit that referenced this pull request Dec 26, 2019

Fix testCancelRecoveryDuringPhase1 (#50449)

50bd584

testCancelRecoveryDuringPhase1 uses a mock of IndexShard, which can't create retention leases. We need to stub method createRetentionLease. Relates #50351 Closes #50424

dnhatn added a commit that referenced this pull request Dec 26, 2019

Ensure relocating shards establish peer recovery retention leases (#50486)

d02afcc

We forgot to establish peer recovery retention leases for relocating primaries without soft-deletes. Relates #50351

dnhatn added a commit that referenced this pull request Dec 26, 2019

Fix testCancelRecoveryDuringPhase1 (#50449)

7713221

testCancelRecoveryDuringPhase1 uses a mock of IndexShard, which can't create retention leases. We need to stub method createRetentionLease. Relates #50351 Closes #50424

dnhatn added a commit that referenced this pull request Dec 26, 2019

Ensure relocating shards establish peer recovery retention leases (#50486)

e7c15a5

We forgot to establish peer recovery retention leases for relocating primaries without soft-deletes. Relates #50351

SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020

Adjust BWC for peer recovery retention leases (elastic#50351)

96c6f30

Relates elastic#50351

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

mkleen mentioned this pull request Oct 29, 2020

Recovery Tests Backport crate/crate-qa#162

Merged

1 task

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
