
Recover retention leases during peer recovery #38435

Merged: jasontedor merged 6 commits into elastic:master from jasontedor:retention-leases-recovery on Feb 5, 2019

Conversation

jasontedor (Member)

This commit integrates retention leases with recovery. With this change, we copy the current retention leases on the primary to the replica during phase two of recovery. At this point, the replica has already been added to the replication group and so is already receiving retention lease sync requests from the primary. This means that if any retention lease syncs are triggered on the primary after we sample the retention leases during phase two, that sync request will also arrive on the replica, ensuring that the replica is, from this point on, up to date with the retention leases on the primary. We have to copy the leases during phase two because we will be applying indexing operations, potentially triggering merges, and therefore must ensure that the correct retention leases are in place beforehand.

Relates #37165
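
To make the ordering described above concrete, here is a rough replica-side sketch. It is only an illustration, not the code in this change: the method names `updateRetentionLeasesOnReplica` and `applyTranslogOperation`, and the overall shape, are assumptions for the sake of the example. The point is simply that the sampled leases are installed before any phase-two operations are replayed, so merges triggered by that replay already see the correct leases.

```java
import java.io.IOException;
import java.util.List;

import org.elasticsearch.index.engine.Engine;
import org.elasticsearch.index.seqno.RetentionLeases;
import org.elasticsearch.index.shard.IndexShard;
import org.elasticsearch.index.translog.Translog;

// Sketch only (assumed method names): install the leases sampled on the primary
// before replaying the phase-two operations on the replica.
final class ReplicaPhaseTwoSketch {
    static void applyPhaseTwo(final IndexShard replica,
                              final RetentionLeases leasesSampledOnPrimary,
                              final List<Translog.Operation> operations) throws IOException {
        // leases first, so any merges triggered by the replay below retain the right history
        replica.updateRetentionLeasesOnReplica(leasesSampledOnPrimary);
        for (final Translog.Operation operation : operations) {
            replica.applyTranslogOperation(operation, Engine.Operation.Origin.PEER_RECOVERY);
        }
    }
}
```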

@jasontedor added the >enhancement, :Distributed/Distributed, v7.0.0, and v6.7.0 labels on Feb 5, 2019
elasticmachine (Collaborator)

Pinging @elastic/es-distributed

@jasontedor changed the title from "Integrate retention leases with recovery" to "Recover retention leases during peer recovery" on Feb 5, 2019
@jasontedor mentioned this pull request on Feb 5, 2019
jasontedor (Member Author)

@elasticmachine run elasticsearch-ci/default-distro

jasontedor (Member Author)

@elasticmachine run elasticsearch-ci/default-distro

jasontedor (Member Author)

@elasticmachine run elasticsearch-ci/1

dnhatn (Member) left a comment

I prefer not to sync RetentionLeases multiple times since syncing the leases once during recovery is enough. However, we can't do that without introducing a new step, because we can't piggyback the leases on either the prepareTranslog step or the finalize step. Moreover, we expect to have only a few leases, so this choice makes a lot of sense to me.

@@ -39,18 +40,26 @@
     private int totalTranslogOps = RecoveryState.Translog.UNKNOWN;
     private long maxSeenAutoIdTimestampOnPrimary;
     private long maxSeqNoOfUpdatesOrDeletesOnPrimary;
+    private RetentionLeases retentionLeases;
dnhatn (Member)

I think we need to initialize retentionLeases with empty leases in a mixed cluster?

jasontedor (Member Author)

Great catch. I pushed 4d9fa1a.
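
For context on the mixed-cluster concern, here is a minimal sketch of the usual shape of such a guard during deserialization. The surrounding request class, the exact version constant, and the `RetentionLeases(StreamInput)` constructor are assumptions for illustration; this is not a reproduction of 4d9fa1a.

```java
import java.io.IOException;

import org.elasticsearch.Version;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.index.seqno.RetentionLeases;

// Sketch only: read the leases when the sender is new enough to ship them,
// otherwise fall back to an explicit empty default so the field is never null
// in a mixed cluster.
final class MixedClusterReadSketch {
    static RetentionLeases readRetentionLeases(final StreamInput in) throws IOException {
        if (in.getVersion().onOrAfter(Version.V_6_7_0)) {    // assumed version gate
            return new RetentionLeases(in);                  // assumed Writeable-style constructor
        } else {
            return RetentionLeases.EMPTY;                    // older primaries do not send leases
        }
    }
}
```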

jasontedor (Member Author)

> I prefer not to sync RetentionLeases multiple times since syncing the leases once during recovery is enough. However, we can't do that without introducing a new step, because we can't piggyback the leases on either the prepareTranslog step or the finalize step. Moreover, we expect to have only a few leases, so this choice makes a lot of sense to me.

Indeed, I went through exactly the same dilemma. We could introduce a new step, but I am not sure that it's worth it. Any thoughts, @ywelsch, before I go ahead with this?
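
To illustrate the trade-off being discussed, here is a toy model in plain Java with invented names and types; it is not Elasticsearch code. As I read the discussion, the leases ride along with the translog-operations requests, which are sent in chunks, so they are effectively synced more than once; a dedicated step would transfer them exactly once at the cost of a new step in the recovery protocol.

```java
import java.util.List;

// Toy model of the piggyback-vs-dedicated-step trade-off; all names are invented.
final class LeaseSyncTradeOffSketch {

    record Lease(String id, long retainingSeqNo, String source) {}
    record Operation(long seqNo) {}
    record ChunkRequest(List<Operation> operations, List<Lease> leases) {}

    interface RecoveryTransport {
        void sendChunk(ChunkRequest request);   // ships a batch of translog operations
        void sendLeases(List<Lease> leases);    // hypothetical dedicated lease step
    }

    // Piggybacked variant: the same sampled leases travel with every chunk,
    // i.e. they are synced multiple times during a single recovery.
    static void phaseTwoPiggybacked(RecoveryTransport transport, List<Operation> ops,
                                    List<Lease> sampledLeases, int chunkSize) {
        for (int from = 0; from < ops.size(); from += chunkSize) {
            List<Operation> chunk = ops.subList(from, Math.min(from + chunkSize, ops.size()));
            transport.sendChunk(new ChunkRequest(chunk, sampledLeases));
        }
    }

    // Dedicated-step variant: the leases are transferred exactly once, up front,
    // and the chunks carry only operations.
    static void phaseTwoWithDedicatedStep(RecoveryTransport transport, List<Operation> ops,
                                          List<Lease> sampledLeases, int chunkSize) {
        transport.sendLeases(sampledLeases);
        for (int from = 0; from < ops.size(); from += chunkSize) {
            transport.sendChunk(new ChunkRequest(
                    ops.subList(from, Math.min(from + chunkSize, ops.size())), List.of()));
        }
    }
}
```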

* master: (23 commits)
  Lift retention lease expiration to index shard (elastic#38380)
  Make Ccr recovery file chunk size configurable (elastic#38370)
  Prevent CCR recovery from missing documents (elastic#38237)
  re-enables awaitsfixed datemath tests (elastic#38376)
  Types removal fix FullClusterRestartIT warnings (elastic#38445)
  Make sure to reject mappings with type _doc when include_type_name is false. (elastic#38270)
  Updates the grok patterns to be consistent with logstash (elastic#27181)
  Ignore type-removal warnings in XPackRestTestHelper (elastic#38431)
  testHlrcFromXContent() should respect assertToXContentEquivalence() (elastic#38232)
  add basic REST test for geohash_grid (elastic#37996)
  Remove DiscoveryPlugin#getDiscoveryTypes (elastic#38414)
  Fix the clock resolution to millis in GetWatchResponseTests (elastic#38405)
  Throw AssertionError when no master (elastic#38432)
  `if_seq_no` and `if_primary_term` parameters aren't wired correctly in REST Client's CRUD API (elastic#38411)
  Enable CronEvalToolTest.testEnsureDateIsShownInRootLocale (elastic#38394)
  Fix failures in BulkProcessorIT#testGlobalParametersAndBulkProcessor. (elastic#38129)
  SQL: Implement CURRENT_DATE (elastic#38175)
  Mute testReadRequestsReturnLatestMappingVersion (elastic#38438)
  [ML] Report index unavailable instead of waiting for lazy node (elastic#38423)
  Update Rollup Caps to allow unknown fields (elastic#38339)
  ...
@jasontedor merged commit 79a45b4 into elastic:master on Feb 5, 2019
@jasontedor deleted the retention-leases-recovery branch on February 5, 2019 at 22:43
jasontedor (Member Author)

@dnhatn and @ywelsch: if we want a new step, we can do it in a follow-up; I am going to keep moving for now.

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 11, 2019
* master:
  Add an authentication cache for API keys (elastic#38469)
  Fix exit code in certutil packaging test (elastic#38393)
  Enable logs for intermittent test failure (elastic#38426)
  Disable BWC to backport recovering retention leases (elastic#38477)
  Enable bwc tests now that elastic#38443 is backported. (elastic#38462)
  Fix Master Failover and DataNode Leave Blocking Snapshot (elastic#38460)
  Recover retention leases during peer recovery (elastic#38435)
  Set update mappings master node timeout to 30 min (elastic#38439)
  Assert job is not null in FullClusterRestartIT (elastic#38218)
  Update ilm-api.asciidoc, point to REMOVE policy (elastic#38235) (elastic#38463)
  SQL: Fix esType for DATETIME/DATE and INTERVALS (elastic#38179)
  Handle deprecation header-AbstractUpgradeTestCase (elastic#38396)
  XPack: core/ccr/Security-cli migration to java-time (elastic#38415)
  Disable bwc tests for elastic#38443 (elastic#38456)
  Bubble-up exceptions from scheduler (elastic#38317)
  Re-enable TasksClientDocumentationIT.testCancelTasks (elastic#38234)
  Allow custom authorization with an authorization engine  (elastic#38358)
  CRUDDocumentationIT fix documentation references
  Remove support for internal versioning for concurrency control (elastic#38254)
Labels: >enhancement, :Distributed/Distributed, v6.7.0, v7.0.0-beta1

4 participants