Use retention lease in peer recovery of closed indices #48430

Merged
dnhatn merged 16 commits into elastic:master from leases-on-closed-index on Nov 21, 2019

Conversation

dnhatn (Member) commented Oct 23, 2019

Today we do not use retention leases in peer recovery for closed indices because we cannot sync retention leases on closed indices. This change adds that ability and adjusts peer recovery to use retention leases for all indices with soft deletes enabled.

Relates #45136
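
For orientation, a peer-recovery retention lease of the kind this PR relies on looks roughly like the sketch below. This is an illustration based only on the lease fields visible in the failure logs later in this thread; it is not code from the PR, and the helper and its parameters are hypothetical.

import org.elasticsearch.index.seqno.RetentionLease;

class PeerRecoveryLeaseSketch {
    // Illustrative only: builds a lease shaped like the peer-recovery leases that
    // appear in the log output below. Parameter names are placeholders, not PR code.
    static RetentionLease peerRecoveryLease(String targetNodeId, long startingSeqNo) {
        return new RetentionLease(
            "peer_recovery/" + targetNodeId,   // id derived from the recovery target's node id
            startingSeqNo,                     // retain operations from the recovery starting point
            System.currentTimeMillis(),        // timestamp
            "peer recovery");                  // source, as shown in the logs
    }
}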

dnhatn added the >enhancement, :Distributed/Distributed, v8.0.0, and v7.6.0 labels on Oct 23, 2019
elasticmachine (Collaborator)
Pinging @elastic/es-distributed (:Distributed/Distributed)

DaveCTurner (Contributor) left a comment

Thanks for picking this up, Nhat; looks good. Does this mean we can remove the checks for indexSettings().getIndexMetaData().getState() == IndexMetaData.State.OPEN throughout ReplicationTracker and simplify useRetentionLeases to shard.indexSettings().isSoftDeleteEnabled() in RecoverySourceHandler?
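
For reference, the simplification being asked about would look roughly like the sketch below, paraphrased from the identifiers quoted in the question; the helper name is hypothetical and this is not the exact code in ReplicationTracker or RecoverySourceHandler.

import org.elasticsearch.index.shard.IndexShard;

class UseRetentionLeasesSketch {
    // Sketch only: once closed indices can sync retention leases, soft deletes alone
    // would decide whether peer recovery uses retention leases.
    static boolean useRetentionLeases(IndexShard shard) {
        // Previously the decision also required the index to be open, i.e.
        // shard.indexSettings().getIndexMetaData().getState() == IndexMetaData.State.OPEN
        return shard.indexSettings().isSoftDeleteEnabled();
    }
}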

tlrx (Member) commented Oct 24, 2019

> Does this mean we can remove the checks for indexSettings().getIndexMetaData().getState() == IndexMetaData.State.OPEN

I don't think those checks can be removed: on 7.6 and 8.x we still need to differentiate open indices and replicated closed indices from non-replicated closed indices.

Edit: Non-replicated closed indices are not instantiated at all and thus have no IndexShard or ReplicationTracker; thanks David for pointing this out.

MetaDataIndexStateService.isIndexVerifiedBeforeClosed can be used to check if the closed index is supposed to be replicated.
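
A rough sketch of that distinction, assuming isIndexVerifiedBeforeClosed takes the index metadata (the exact signature is not shown in this thread):

import org.elasticsearch.cluster.metadata.IndexMetaData;
import org.elasticsearch.cluster.metadata.MetaDataIndexStateService;

class ReplicatedClosedIndexSketch {
    // Sketch only: a closed index is expected to have in-memory shards (and therefore
    // an IndexShard and a ReplicationTracker) only if it was verified before being
    // closed, i.e. it is a replicated closed index.
    static boolean isReplicatedClosedIndex(IndexMetaData indexMetaData) {
        return indexMetaData.getState() == IndexMetaData.State.CLOSE
            && MetaDataIndexStateService.isIndexVerifiedBeforeClosed(indexMetaData);
    }
}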

dnhatn (Member Author) commented Oct 24, 2019

> Does this mean we can remove the checks for indexSettings().getIndexMetaData().getState() == IndexMetaData.State.OPEN throughout ReplicationTracker and simplify useRetentionLeases to shard.indexSettings().isSoftDeleteEnabled() in RecoverySourceHandler?

Yes, that is my plan.

DaveCTurner (Contributor)

When you say that's your plan, do you mean to do it in a follow-up or in this PR?

dnhatn (Member Author) commented Oct 24, 2019

@DaveCTurner In a follow-up. I can make both changes in this PR if you prefer.

DaveCTurner (Contributor)

I would prefer the assertions to be adjusted here, yes, since this PR is strengthening those very invariants.

dnhatn changed the title from "Allow syncing retention leases on closed indices" to "Use retention lease in peer recovery of closed indices" on Oct 30, 2019
DaveCTurner (Contributor) left a comment

Unfortunately I am seeing occasional failures on this PR branch. For instance, this test fails sometimes:

$ ./gradlew :server:integTest --tests "org.elasticsearch.cluster.ClusterHealthIT.testHealthWithClosedIndices" -Dtests.iters=100 -Dtests.failfast=true
...
  2> java.lang.AssertionError:
    Expected: <YELLOW>
         but: was <RED>
        at __randomizedtesting.SeedInfo.seed([2032DCC3D4D6AA6D:92F6D1A1B2C791EF]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.junit.Assert.assertThat(Assert.java:923)
        at org.elasticsearch.cluster.ClusterHealthIT.testHealthWithClosedIndices(ClusterHealthIT.java:165)

It fails when it closes the index while there is an ongoing recovery that has just sent a retention lease sync. The mechanism is a bit tricky but here's what I think is happening. Prior to this change this sync would fail during the reroute phase with an IndexClosedException thrown from the IndexNameExpressionResolver since it wasn't considering closed indices, but with this change we now resolve this index correctly and wait for a minute for the primary to become active:

  1> [2019-10-31T05:04:53,924][WARN ][o.e.i.c.IndicesClusterStateService] [node_s1] [index-1][0] retention lease sync failed
  1> org.elasticsearch.action.UnavailableShardsException: [index-1][0] primary shard is not active Timeout: [1m], request: [RetentionLeaseSyncAction.Request{retentionLeases=RetentionLeases{primaryTerm=1, version=2, leases={peer_recovery/-PtC6-JyTJqmI3p3OeA-5g=RetentionLease{id='peer_recovery/-PtC6-JyTJqmI3p3OeA-5g', retainingSequenceNumber=0, timestamp=1572523433841, source='peer recovery'}, peer_recovery/UW1qUPngQKusN4ZjmnqKCA=RetentionLease{id='peer_recovery/UW1qUPngQKusN4ZjmnqKCA', retainingSequenceNumber=0, timestamp=1572523433841, source='peer recovery'}}}, shardId=[index-1][0], timeout=1m, index='index-1', waitForActiveShards=0}]
  1>    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryBecauseUnavailable(TransportReplicationAction.java:846) [main/:?]
  1>    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryIfUnavailable(TransportReplicationAction.java:725) [main/:?]
  1>    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:677) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onTimeout(TransportReplicationAction.java:806) [main/:?]
  1>    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:325) [main/:?]
  1>    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:592) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:699) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
  1>    at java.lang.Thread.run(Thread.java:835) [?:?]

The wait is futile, however, because the recovery holds the shard lock from the previous assignment of the shard, which prevents us from making another assignment:

  1> [2019-11-01T00:15:46,026][WARN ][o.e.i.c.IndicesClusterStateService] [node_s0] [index-3][2] marking and sending shard failed due to [failed to create shard]
  1> java.io.IOException: failed to obtain in-memory shard lock
  1>    at org.elasticsearch.index.IndexService.createShard(IndexService.java:445) ~[main/:?]
  1>    at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:652) ~[main/:?]
  1>    at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:164) ~[main/:?]
  1>    at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:664) [main/:?]
  1>    at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:640) [main/:?]
  1>    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:252) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:511) [main/:?]
  1>    at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
  1>    at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:508) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:699) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
  1>    at java.lang.Thread.run(Thread.java:835) [?:?]
  1> Caused by: org.elasticsearch.env.ShardLockObtainFailedException: [index-3][2]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]
  1>    at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:860) ~[main/:?]
  1>    at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:775) ~[main/:?]
  1>    at org.elasticsearch.index.IndexService.createShard(IndexService.java:365) ~[main/:?]
  1>    ... 17 more

dnhatn (Member Author) commented Nov 13, 2019

@DaveCTurner Thank you for digging into the test failure. I appreciate that :). I have adjusted the RetentionLeaseSyncAction to skip the ReroutePhase. Can you please take another look?
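
Roughly, skipping the reroute phase means the sync action resolves the primary from the cluster state itself and fails fast if the primary is not active, instead of observing the cluster state for up to a minute. The sketch below illustrates that idea only; it is not the actual change in this PR, and the method, its parameters, and the omitted dispatch step are assumptions.

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.UnavailableShardsException;
import org.elasticsearch.action.support.replication.ReplicationResponse;
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.routing.ShardRouting;
import org.elasticsearch.index.shard.ShardId;

class DirectRetentionLeaseSyncSketch {
    // Sketch only: look up the primary for the target shard once and fail fast if it
    // is not active, rather than letting ReroutePhase wait for it to become active.
    // Assumes the shard's index is present in the routing table.
    static void syncDirectlyToPrimary(ClusterState clusterState, ShardId shardId,
                                      ActionListener<ReplicationResponse> listener) {
        final ShardRouting primary =
            clusterState.routingTable().shardRoutingTable(shardId).primaryShard();
        if (primary == null || primary.active() == false) {
            listener.onFailure(new UnavailableShardsException(shardId, "primary shard is not active"));
            return;
        }
        // ... send the sync request directly to the node identified by
        // primary.currentNodeId() (transport wiring omitted from this sketch)
    }
}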

DaveCTurner (Contributor) left a comment

I left a handful of questions about the bypassing of the reroute phase.

DaveCTurner dismissed their stale review on November 21, 2019: comments addressed

DaveCTurner (Contributor) left a comment

I left a question about a change to the REST client used, but assuming that change was necessary this LGTM. Great work @dnhatn.

@@ -202,7 +202,7 @@ public void testForgetFollower() throws IOException {

        assertOK(client().performRequest(new Request("POST", "/" + forgetFollower + "/_ccr/pause_follow")));

        try (RestClient leaderClient = buildLeaderClient(restClientSettings())) {
DaveCTurner (Contributor):

This change is unexpected to me. Can you explain why it's needed here?

dnhatn (Member Author):

Removing retention leases requires the admin role.

DaveCTurner (Contributor)

I think this will fix the test failure, although I haven't tested it:

diff --git a/test/framework/src/main/java/org/elasticsearch/cluster/coordination/DeterministicTaskQueue.java b/test/framework/src/main/java/org/elasticsearch/cluster/coordination/DeterministicTaskQueue.java
index 0837f431fff..db3818832f6 100644
--- a/test/framework/src/main/java/org/elasticsearch/cluster/coordination/DeterministicTaskQueue.java
+++ b/test/framework/src/main/java/org/elasticsearch/cluster/coordination/DeterministicTaskQueue.java
@@ -383,7 +383,7 @@ public class DeterministicTaskQueue {

             @Override
             public Runnable preserveContext(Runnable command) {
-                throw new UnsupportedOperationException();
+                return command;
             }

             @Override

dnhatn (Member Author) commented Nov 21, 2019

@DaveCTurner Thank you very much for your thoughtful review. This PR should be a joint work :).

dnhatn merged commit 7754e62 into elastic:master on Nov 21, 2019
dnhatn deleted the leases-on-closed-index branch on November 21, 2019 at 19:57
henningandersen (Contributor) left a comment

LGTM.

I added a single question, primarily for my understanding I think.

@@ -91,7 +91,7 @@ public synchronized void rescheduleIfNecessary() {
             if (logger.isTraceEnabled()) {
                 logger.trace("scheduling {} every {}", toString(), interval);
             }
-            cancellable = threadPool.schedule(this, interval, getThreadPool());
+            cancellable = threadPool.schedule(threadPool.preserveContext(this), interval, getThreadPool());
henningandersen (Contributor):

I am curious why this change is necessary.

The concrete tasks all seem to be system-like and thus should not really depend on the caller context. If there is some dependency, could this become problematic if a user triggers the creation of an IndexService?

Contributor:

ThreadPool#schedule does not itself preserve the context of the caller and instead runs the scheduled task in the default context which is not a system context.
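
In other words (an illustrative snippet, assuming a ThreadPool, a Runnable task, and an interval; only the preserveContext wrapper call comes from the diff above):

import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.threadpool.ThreadPool;

class PreserveContextSketch {
    // Illustration of the point above: schedule() runs the task in a fresh default
    // thread context, so the caller's context is only retained if the task is wrapped
    // with preserveContext() before being handed to the scheduler.
    static void scheduleWithCallerContext(ThreadPool threadPool, Runnable task, TimeValue interval) {
        // without the wrapper the task would run in the default context:
        //   threadPool.schedule(task, interval, ThreadPool.Names.GENERIC);
        // with the wrapper the caller's thread context is captured and restored:
        threadPool.schedule(threadPool.preserveContext(task), interval, ThreadPool.Names.GENERIC);
    }
}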

henningandersen (Contributor):

Yeah, I knew about that. The question was more about the motivation for moving from simply setting the system context inside the task to preserving it here. I did some double-checking on the callers and AFAICS it looks OK; it just seemed odd to me to make this change in this specific PR. On second thought, I think it makes sense to preserve the context here, given that AbstractAsyncTask is not specific to IndexService, but would it be good to add an assertion about being in the system context to IndexService?

Contributor:

I see. I prefer preserving an existing context here since the security implications are much clearer - security bugs lurk in areas where privileges change, so the less of that we do the better. IMO it's a bit trappy that EsThreadPoolExecutor#execute preserves the caller's context but ThreadPool#schedule does not, although this is one of the very few places where that matters right now.

It's possible we could assert that we are in system context here, but that seems an overly strong statement to make. We already have tests to assert that we're in a context with appropriate permissions which I think is enough.

dnhatn (Member Author) commented Nov 24, 2019

The backport depends on #49448.

testClosedIndexNoopRecovery fails with an index created and closed before 7.4 because there is no peer recovery retention lease (PRRL) after the cluster is upgraded.

./gradlew ':qa:rolling-upgrade:v7.2.0#upgradedClusterTest' --tests "org.elasticsearch.upgrades.RecoveryIT.testClosedIndexNoopRecovery" -Dtests.seed=E13F6FA3F56203F5 -Dtests.security.manager=true -Dtests.locale=pt-BR -Dtests.timezone=America/Panama -Dcompiler.java=12 -Druntime.java=12

dnhatn added a commit that referenced this pull request Dec 15, 2019
Today we do not use retention leases in peer recovery for closed indices
because we can't sync retention leases on closed indices. This change
allows that ability and adjusts peer recovery to use retention leases
for all indices with soft-deletes enabled.

Relates #45136

Co-authored-by: David Turner <david.turner@elastic.co>
dnhatn added a commit that referenced this pull request Dec 15, 2019
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020