Deduplicate Heavy CCR Repository CS Requests #91398
Conversation
We run the same request back to back for each put-follower call during the restore. Also, concurrent put-follower calls will all run the same full CS request concurrently. In older versions, prior to elastic#87235, the concurrency was limited by the size of the snapshot pool. With that fix though, these requests run at almost arbitrary concurrency when many put-follow requests are executed concurrently.
Fixed by using the existing deduplicator to only run a single remote CS request at a time for each CCR repository. Also, this removes the needless forking in the put-follower action, which is no longer necessary now that the CCR repository is non-blocking (we do the same for normal restores, which can safely be started from a transport thread). This should fix some bad-UX situations where the snapshot threads are busy on master, making the put-follower requests not go through in time.
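For illustration, here is a minimal sketch of the naive single-flight idea described above (hypothetical names, not the actual Elasticsearch code): at most one remote cluster state request is in flight, and concurrent callers are all completed by its response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch only: at most one remote cluster state (CS) request is in flight;
// callers arriving meanwhile wait for its response instead of issuing their
// own identical request.
final class DedupedStateFetcher<S> {
    private List<Consumer<S>> waiters = null; // non-null while a request is in flight

    void get(Consumer<S> listener, Consumer<Consumer<S>> sendRemoteRequest) {
        synchronized (this) {
            if (waiters != null) {      // a request is already running, join it
                waiters.add(listener);
                return;
            }
            waiters = new ArrayList<>();
            waiters.add(listener);
        }
        sendRemoteRequest.accept(this::onResponse); // only the first caller sends
    }

    private void onResponse(S state) {
        final List<Consumer<S>> toNotify;
        synchronized (this) {
            toNotify = waiters;
            waiters = null;
        }
        toNotify.forEach(l -> l.accept(state));
    }
}
```

Note that in this naive form a late joiner can receive a state that was requested before it called in, which is exactly the staleness concern raised in the review below; the batching variant sketched further down avoids that.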
Pinging @elastic/es-distributed (Team:Distributed)
Hi @original-brownbear, I've created a changelog YAML for you.
```java
/**
 * Dummy request key for deduplicating all remote cluster state requests via {@link #getRemoteStateDeduplicator}.
 */
private static final Object RESULT_KEY = new Object();
```
This is a little awkward, but I think it's good enough, and I didn't want to build a whole new thing for the deduplication here when the existing deduplicator works just fine for what we need this way ...
LGTM
I think this could introduce a flaw (an edge case, for sure), but I also think the cure is easy enough that we should do it.
```java
// We only allow a single remote cluster state request at a time. The callbacks to the cluster state responses run on the
// transport thread and can safely be assumed to be fast enough that this never leads to seeing substantially outdated
// remote states as a result of a hot loop calling this method.
getRemoteStateDeduplicator.executeOnce(
```
I think the use of the deduplicator risks outdated info (which I see mentioned in the comment, but I'm not sure I follow the hot-ness argument). I think of it mainly like this: if the remote cluster is slow to respond, we risk someone having an application that:
- Creates an index on the remote cluster (leader).
- Invokes put-follow on the local cluster (follower).

The put-follow request could then fail due to seeing an outdated cluster state (in case other concurrent put-follow requests cause this)?
I think refactoring CapacityResponseCache will do what you want here. It seems like a utility we want to have: only do one calculation and collapse queued requests into one, which is what CapacityResponseCache does.
> The put-follow request could then fail due to seeing an outdated cluster state (in case other concurrent put-follow requests cause this)?
Hmm maybe ... you're right here actually I think, let me try refactoring that thing :)
Hmm, CapacityResponseCache turned out to be quite different from what we need here since it deals with a heavy but synchronous action.
I implemented a simple solution now, similar to what we have for deduplicating repository data in the blob store repository, that we could extract and use for e.g. stats as well, like we discussed in the past. Let me know if this is OK with you.
Did some quick benchmarking with this solution, and it's also way superior in performance to the previous one since it deduplicates a lot more requests (with the first call causing all subsequent ones to queue up, it works out quite nicely).
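A minimal sketch of those semantics (hypothetical names, not the actual implementation): the first caller triggers a request, everyone arriving while it is in flight queues up, and the whole queue is collapsed into a single follow-up request, so each caller only ever sees a state requested at or after its own call.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Sketch only: batching deduplication. While a remote CS request is running,
// new callers queue; when it completes, the queued callers are collapsed into
// exactly one follow-up request.
final class BatchingStateFetcher<S> {
    private final Deque<Consumer<S>> queued = new ArrayDeque<>();
    private boolean inFlight = false;
    private final Consumer<Consumer<S>> sendRemoteRequest; // hands the response callback to the transport layer

    BatchingStateFetcher(Consumer<Consumer<S>> sendRemoteRequest) {
        this.sendRemoteRequest = sendRemoteRequest;
    }

    synchronized void get(Consumer<S> listener) {
        queued.add(listener);
        if (inFlight == false) {
            runNextBatch();
        }
    }

    private synchronized void runNextBatch() {
        if (queued.isEmpty()) {
            inFlight = false;
            return;
        }
        inFlight = true;
        final Deque<Consumer<S>> batch = new ArrayDeque<>(queued);
        queued.clear();
        sendRemoteRequest.accept(state -> {
            // every listener in this batch registered before the request was
            // sent, so the state it sees is at least as fresh as its own call
            batch.forEach(l -> l.accept(state));
            runNextBatch(); // collapse whatever queued up meanwhile into one request
        });
    }
}
```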
```
@@ -192,56 +191,44 @@ private void createFollowerIndex(
        threadPool.getThreadContext().getHeaders(),
        clusterService.state()
    );
    threadPool.executor(ThreadPool.Names.SNAPSHOT).execute(new AbstractRunnable() {
```
Not sure I follow why this is important to this PR?
I kinda liked cleaning this up here since it's part of the necessary follow-up fixes for the async behaviour to work neatly, in a sense, but I can pull it out into a separate PR if you want?
EDIT: never mind, if I do the other refactoring this gets messy; moving it out :)
Ah, Henning is right, we need to use a cluster state requested after receiving the put-follow request.
This direction looks good.
Can we add a test verifying that concurrent CcrRepository.getRepositoryData calls only execute one call at a time on the leader and also do the batching (something like: once we have fired all the concurrent calls, we expect only one more invocation on the leader)?
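Something in that shape, sketched here against the hypothetical BatchingStateFetcher from above rather than the real CcrRepository plumbing (run with -ea for the asserts):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Sketch of the suggested test shape: fire N concurrent calls and assert the
// "leader" saw exactly two requests, the initial one plus a single collapsed
// follow-up batch for everything that queued up meanwhile.
public class BatchingStateFetcherTest {
    public static void main(String[] args) {
        AtomicInteger leaderInvocations = new AtomicInteger();
        // capture the response callback so the test controls when the leader answers
        AtomicReference<Consumer<String>> pending = new AtomicReference<>();
        BatchingStateFetcher<String> fetcher = new BatchingStateFetcher<>(cb -> {
            leaderInvocations.incrementAndGet();
            pending.set(cb);
        });

        AtomicInteger responses = new AtomicInteger();
        fetcher.get(s -> responses.incrementAndGet());      // triggers request #1
        for (int i = 0; i < 10; i++) {
            fetcher.get(s -> responses.incrementAndGet());  // all queue behind #1
        }
        pending.get().accept("cs-1"); // leader answers #1 -> one collapsed request #2 fires
        pending.get().accept("cs-2"); // leader answers #2 -> the 10 queued callers complete

        assert leaderInvocations.get() == 2 : leaderInvocations.get();
        assert responses.get() == 11 : responses.get();
    }
}
```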
```java
        response.getNodes().getMaxNodeVersion(),
        SnapshotState.SUCCESS
    );
}), false));
```
Should we now add assert false to the exception block below? It seems like no exceptions should occur anymore, since getRemoteState handles its own exceptions. If one does occur, we may have double-invoked the listener.
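I.e. something in this shape (sketch with hypothetical names; the real listener type here is Elasticsearch's ActionListener):

```java
import java.util.function.Consumer;

// Sketch only: getRemoteState is expected to route its own failures to the
// listener, so an exception escaping it means the listener may be notified
// twice. assert false makes that case fail loudly in tests (-ea) while the
// catch block stays as a defensive fallback in production.
final class AssertNoEscape {
    interface Listener<T> {
        void onResponse(T result);
        void onFailure(Exception e);
    }

    static <T> void fetchSnapshotInfo(Listener<T> listener, Consumer<Listener<T>> getRemoteState) {
        try {
            getRemoteState.accept(listener);
        } catch (Exception e) {
            assert false : e; // no exception should escape getRemoteState anymore
            listener.onFailure(e);
        }
    }
}
```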
++ added
uff I tried something with the ...
EDIT: I guess we could add some unit test of sorts where we call the repo directly ... still quite a bit of work, and this seems like it should be fixed sooner rather than later since it breaks larger users of CCR? I wanted to go for the same approach in other code; maybe it makes more sense to wait for that and just unit-test the new deduplicator?
Jenkins run elasticsearch-ci/part-2
Could we make a less ambitious test that holds a latch in the beginning of the test, which we wait for in the leader request handling behavior for cluster/state, start X>2 ...
This is exactly what I tried. It's not quite as trivial as it seems. The follower will send various requests to the leader (state for just a single index, for example), so I can't simply block a transport thread via a latch, because that will be unstable and potentially lead to other requests getting blocked.
LGTM.
Thanks Henning! Reviewed before I even had the chance to ping ❤️ :)
💔 Backport failed
You can use sqren/backport to manually backport by running |