Use retention lease in peer recovery of closed indices #48430

Merged
dnhatn merged 16 commits into elastic:master from leases-on-closed-index on Nov 21, 2019

Conversation

dnhatn (Member) commented Oct 23, 2019

Today we do not use retention leases in peer recovery for closed indices because we cannot sync retention leases on closed indices. This change adds that ability and adjusts peer recovery to use retention leases for all indices with soft deletes enabled.

Relates #45136
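
For orientation, a peer-recovery retention lease of the kind this PR relies on looks roughly like the sketch below. This is an illustration based only on the lease fields visible in the failure logs later in this thread; it is not code from the PR, and the helper and its parameters are hypothetical.

import org.elasticsearch.index.seqno.RetentionLease;

class PeerRecoveryLeaseSketch {
    // Illustrative only: builds a lease shaped like the peer-recovery leases that
    // appear in the log output below. Parameter names are placeholders, not PR code.
    static RetentionLease peerRecoveryLease(String targetNodeId, long startingSeqNo) {
        return new RetentionLease(
            "peer_recovery/" + targetNodeId,   // id derived from the recovery target's node id
            startingSeqNo,                     // retain operations from the recovery starting point
            System.currentTimeMillis(),        // timestamp
            "peer recovery");                  // source, as shown in the logs
    }
}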

dnhatn added the >enhancement, :Distributed/Distributed, v8.0.0, and v7.6.0 labels on Oct 23, 2019
elasticmachine (Collaborator)
Pinging @elastic/es-distributed (:Distributed/Distributed)

DaveCTurner (Contributor) left a comment

Thanks for picking this up, Nhat; looks good. Does this mean we can remove the checks for indexSettings().getIndexMetaData().getState() == IndexMetaData.State.OPEN throughout ReplicationTracker and simplify useRetentionLeases to shard.indexSettings().isSoftDeleteEnabled() in RecoverySourceHandler?
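
For reference, the simplification being asked about would look roughly like the sketch below, paraphrased from the identifiers quoted in the question; the helper name is hypothetical and this is not the exact code in ReplicationTracker or RecoverySourceHandler.

import org.elasticsearch.index.shard.IndexShard;

class UseRetentionLeasesSketch {
    // Sketch only: once closed indices can sync retention leases, soft deletes alone
    // would decide whether peer recovery uses retention leases.
    static boolean useRetentionLeases(IndexShard shard) {
        // Previously the decision also required the index to be open, i.e.
        // shard.indexSettings().getIndexMetaData().getState() == IndexMetaData.State.OPEN
        return shard.indexSettings().isSoftDeleteEnabled();
    }
}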

tlrx (Member) commented Oct 24, 2019

> Does this mean we can remove the checks for indexSettings().getIndexMetaData().getState() == IndexMetaData.State.OPEN

I don't think those checks can be removed: on 7.6 and 8.x we still need to differentiate open indices and replicated closed indices from non-replicated closed indices.

Edit: Non-replicated closed indices are not instantiated at all and thus have no IndexShard or ReplicationTracker; thanks David for pointing this out.

MetaDataIndexStateService.isIndexVerifiedBeforeClosed can be used to check if the closed index is supposed to be replicated.
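
A rough sketch of that distinction, assuming isIndexVerifiedBeforeClosed takes the index metadata (the exact signature is not shown in this thread):

import org.elasticsearch.cluster.metadata.IndexMetaData;
import org.elasticsearch.cluster.metadata.MetaDataIndexStateService;

class ReplicatedClosedIndexSketch {
    // Sketch only: a closed index is expected to have in-memory shards (and therefore
    // an IndexShard and a ReplicationTracker) only if it was verified before being
    // closed, i.e. it is a replicated closed index.
    static boolean isReplicatedClosedIndex(IndexMetaData indexMetaData) {
        return indexMetaData.getState() == IndexMetaData.State.CLOSE
            && MetaDataIndexStateService.isIndexVerifiedBeforeClosed(indexMetaData);
    }
}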

dnhatn (Member Author) commented Oct 24, 2019

> Does this mean we can remove the checks for indexSettings().getIndexMetaData().getState() == IndexMetaData.State.OPEN throughout ReplicationTracker and simplify useRetentionLeases to shard.indexSettings().isSoftDeleteEnabled() in RecoverySourceHandler?

Yes, that is my plan.

DaveCTurner (Contributor)

When you say that's your plan, do you mean to do it in a follow-up or in this PR?

dnhatn (Member Author) commented Oct 24, 2019

@DaveCTurner In a follow-up. I can make both changes in this PR if you prefer.

DaveCTurner (Contributor)

I would prefer the assertions to be adjusted here, yes, since this PR is strengthening those very invariants.

dnhatn changed the title from "Allow syncing retention leases on closed indices" to "Use retention lease in peer recovery of closed indices" on Oct 30, 2019
DaveCTurner (Contributor) left a comment

Unfortunately I am seeing occasional failures on this PR branch. For instance, this test fails sometimes:

$ ./gradlew :server:integTest --tests "org.elasticsearch.cluster.ClusterHealthIT.testHealthWithClosedIndices" -Dtests.iters=100 -Dtests.failfast=true
...
  2> java.lang.AssertionError:
    Expected: <YELLOW>
         but: was <RED>
        at __randomizedtesting.SeedInfo.seed([2032DCC3D4D6AA6D:92F6D1A1B2C791EF]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.junit.Assert.assertThat(Assert.java:923)
        at org.elasticsearch.cluster.ClusterHealthIT.testHealthWithClosedIndices(ClusterHealthIT.java:165)

It fails when it closes the index while there is an ongoing recovery that has just sent a retention lease sync. The mechanism is a bit tricky but here's what I think is happening. Prior to this change this sync would fail during the reroute phase with an IndexClosedException thrown from the IndexNameExpressionResolver since it wasn't considering closed indices, but with this change we now resolve this index correctly and wait for a minute for the primary to become active:

  1> [2019-10-31T05:04:53,924][WARN ][o.e.i.c.IndicesClusterStateService] [node_s1] [index-1][0] retention lease sync failed
  1> org.elasticsearch.action.UnavailableShardsException: [index-1][0] primary shard is not active Timeout: [1m], request: [RetentionLeaseSyncAction.Request{retentionLeases=RetentionLeases{primaryTerm=1, version=2, leases={peer_recovery/-PtC6-JyTJqmI3p3OeA-5g=RetentionLease{id='peer_recovery/-PtC6-JyTJqmI3p3OeA-5g', retainingSequenceNumber=0, timestamp=1572523433841, source='peer recovery'}, peer_recovery/UW1qUPngQKusN4ZjmnqKCA=RetentionLease{id='peer_recovery/UW1qUPngQKusN4ZjmnqKCA', retainingSequenceNumber=0, timestamp=1572523433841, source='peer recovery'}}}, shardId=[index-1][0], timeout=1m, index='index-1', waitForActiveShards=0}]
  1>    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryBecauseUnavailable(TransportReplicationAction.java:846) [main/:?]
  1>    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryIfUnavailable(TransportReplicationAction.java:725) [main/:?]
  1>    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:677) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onTimeout(TransportReplicationAction.java:806) [main/:?]
  1>    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:325) [main/:?]
  1>    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:592) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:699) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
  1>    at java.lang.Thread.run(Thread.java:835) [?:?]

The wait is futile, however, because the recovery holds the shard lock from the previous assignment of the shard, which prevents us from making another assignment:

  1> [2019-11-01T00:15:46,026][WARN ][o.e.i.c.IndicesClusterStateService] [node_s0] [index-3][2] marking and sending shard failed due to [failed to create shard]
  1> java.io.IOException: failed to obtain in-memory shard lock
  1>    at org.elasticsearch.index.IndexService.createShard(IndexService.java:445) ~[main/:?]
  1>    at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:652) ~[main/:?]
  1>    at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:164) ~[main/:?]
  1>    at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:664) [main/:?]
  1>    at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:640) [main/:?]
  1>    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:252) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:511) [main/:?]
  1>    at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
  1>    at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:508) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:699) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
  1>    at java.lang.Thread.run(Thread.java:835) [?:?]
  1> Caused by: org.elasticsearch.env.ShardLockObtainFailedException: [index-3][2]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]
  1>    at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:860) ~[main/:?]
  1>    at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:775) ~[main/:?]
  1>    at org.elasticsearch.index.IndexService.createShard(IndexService.java:365) ~[main/:?]
  1>    ... 17 more

dnhatn (Member Author) commented Nov 13, 2019

@DaveCTurner Thank you for digging into the test failure. I appreciate that :). I have adjusted the RetentionLeaseSyncAction to skip the ReroutePhase. Can you please take another look?
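
Roughly, skipping the reroute phase means the sync action resolves the primary from the cluster state itself and fails fast if the primary is not active, instead of observing the cluster state for up to a minute. The sketch below illustrates that idea only; it is not the actual change in this PR, and the method, its parameters, and the omitted dispatch step are assumptions.

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.UnavailableShardsException;
import org.elasticsearch.action.support.replication.ReplicationResponse;
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.routing.ShardRouting;
import org.elasticsearch.index.shard.ShardId;

class DirectRetentionLeaseSyncSketch {
    // Sketch only: look up the primary for the target shard once and fail fast if it
    // is not active, rather than letting ReroutePhase wait for it to become active.
    // Assumes the shard's index is present in the routing table.
    static void syncDirectlyToPrimary(ClusterState clusterState, ShardId shardId,
                                      ActionListener<ReplicationResponse> listener) {
        final ShardRouting primary =
            clusterState.routingTable().shardRoutingTable(shardId).primaryShard();
        if (primary == null || primary.active() == false) {
            listener.onFailure(new UnavailableShardsException(shardId, "primary shard is not active"));
            return;
        }
        // ... send the sync request directly to the node identified by
        // primary.currentNodeId() (transport wiring omitted from this sketch)
    }
}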

DaveCTurner (Contributor) left a comment

I left a handful of questions about the bypassing of the reroute phase.

DaveCTurner dismissed their stale review on November 21, 2019: comments addressed

DaveCTurner (Contributor) left a comment

I left a question about a change to the REST client used, but assuming that change was necessary this LGTM. Great work @dnhatn.

@@ -202,7 +202,7 @@ public void testForgetFollower() throws IOException {

        assertOK(client().performRequest(new Request("POST", "/" + forgetFollower + "/_ccr/pause_follow")));

        try (RestClient leaderClient = buildLeaderClient(restClientSettings())) {
DaveCTurner (Contributor):

This change is unexpected to me. Can you explain why it's needed here?

dnhatn (Member Author):

Removing retention leases requires the admin role.

DaveCTurner (Contributor)

I think this will fix the test failure, although I haven't tested it:

diff --git a/test/framework/src/main/java/org/elasticsearch/cluster/coordination/DeterministicTaskQueue.java b/test/framework/src/main/java/org/elasticsearch/cluster/coordination/DeterministicTaskQueue.java
index 0837f431fff..db3818832f6 100644
--- a/test/framework/src/main/java/org/elasticsearch/cluster/coordination/DeterministicTaskQueue.java
+++ b/test/framework/src/main/java/org/elasticsearch/cluster/coordination/DeterministicTaskQueue.java
@@ -383,7 +383,7 @@ public class DeterministicTaskQueue {

             @Override
             public Runnable preserveContext(Runnable command) {
-                throw new UnsupportedOperationException();
+                return command;
             }

             @Override

dnhatn (Member Author) commented Nov 21, 2019

@DaveCTurner Thank you very much for your thoughtful review. This PR should be a joint work :).

dnhatn merged commit 7754e62 into elastic:master on Nov 21, 2019
dnhatn deleted the leases-on-closed-index branch on November 21, 2019 at 19:57
henningandersen (Contributor) left a comment

LGTM.

I added a single question, primarily for my understanding I think.

@@ -91,7 +91,7 @@ public synchronized void rescheduleIfNecessary() {
             if (logger.isTraceEnabled()) {
                 logger.trace("scheduling {} every {}", toString(), interval);
             }
-            cancellable = threadPool.schedule(this, interval, getThreadPool());
+            cancellable = threadPool.schedule(threadPool.preserveContext(this), interval, getThreadPool());
henningandersen (Contributor):

I am curious why this change is necessary.

The concrete tasks all seem to be system-like and thus should not really depend on the caller context. If there is some dependency, could this become problematic if a user triggers the creation of an IndexService?

Contributor:

ThreadPool#schedule does not itself preserve the context of the caller and instead runs the scheduled task in the default context which is not a system context.
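
In other words (an illustrative snippet, assuming a ThreadPool, a Runnable task, and an interval; only the preserveContext wrapper call comes from the diff above):

import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.threadpool.ThreadPool;

class PreserveContextSketch {
    // Illustration of the point above: schedule() runs the task in a fresh default
    // thread context, so the caller's context is only retained if the task is wrapped
    // with preserveContext() before being handed to the scheduler.
    static void scheduleWithCallerContext(ThreadPool threadPool, Runnable task, TimeValue interval) {
        // without the wrapper the task would run in the default context:
        //   threadPool.schedule(task, interval, ThreadPool.Names.GENERIC);
        // with the wrapper the caller's thread context is captured and restored:
        threadPool.schedule(threadPool.preserveContext(task), interval, ThreadPool.Names.GENERIC);
    }
}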

henningandersen (Contributor):

Yeah, I knew about that. The question was more about the motivation for moving from simply setting the system context inside the task to preserving it here. I did some double-checking on the callers and AFAICS it looks OK; it just seemed odd to me to make this change in this specific PR. On second thought, I think it makes sense to preserve the context here, given that AbstractAsyncTask is not specific to IndexService, but would it be good to add an assertion about being in the system context to IndexService?

Contributor:

I see. I prefer preserving an existing context here since the security implications are much clearer - security bugs lurk in areas where privileges change, so the less of that we do the better. IMO it's a bit trappy that EsThreadPoolExecutor#execute preserves the caller's context but ThreadPool#schedule does not, although this is one of the very few places where that matters right now.

It's possible we could assert that we are in system context here, but that seems an overly strong statement to make. We already have tests to assert that we're in a context with appropriate permissions which I think is enough.

dnhatn (Member Author) commented Nov 24, 2019

The backport depends on #49448.

testClosedIndexNoopRecovery fails with an index created and closed before 7.4 because there is no peer recovery retention lease (PRRL) after the cluster is upgraded.

./gradlew ':qa:rolling-upgrade:v7.2.0#upgradedClusterTest' --tests "org.elasticsearch.upgrades.RecoveryIT.testClosedIndexNoopRecovery" -Dtests.seed=E13F6FA3F56203F5 -Dtests.security.manager=true -Dtests.locale=pt-BR -Dtests.timezone=America/Panama -Dcompiler.java=12 -Druntime.java=12

dnhatn added a commit that referenced this pull request Dec 15, 2019
Today we do not use retention leases in peer recovery for closed indices
because we can't sync retention leases on closed indices. This change
allows that ability and adjusts peer recovery to use retention leases
for all indices with soft-deletes enabled.

Relates #45136

Co-authored-by: David Turner <david.turner@elastic.co>
dnhatn added a commit that referenced this pull request Dec 15, 2019
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020