
Deduplicate Heavy CCR Repository CS Requests #91398

Merged

Conversation

original-brownbear
Member

We run the same request back to back for each put-follower call during the restore. Also, concurrent put-follower calls will all run the same full CS request concurrently.
In older versions, prior to #87235, the concurrency was limited by the size of the snapshot pool. With that fix though, these requests run at almost arbitrary concurrency when many put-follow requests are executed concurrently.
-> fixed by using the existing deduplicator to only run a single remote CS request at a time for each CCR repository.
Also, this removes the needless forking in the put-follower action, which is no longer necessary now that the CCR repository is non-blocking (we do the same for normal restores, which can safely be started from a transport thread). This should fix some bad-UX situations where the snapshot threads are busy on master, causing put-follower requests to not go through in time.
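As a loose before/after illustration of the forking removal mentioned here (the restoreService call and variable names are assumptions, not the exact production code; the removed fork shows up in the diff hunks further down):

```java
// Before: the put-follower action forked to the SNAPSHOT pool just to kick off the restore,
// so busy snapshot threads on the master could delay put-follower requests.
threadPool.executor(ThreadPool.Names.SNAPSHOT).execute(new AbstractRunnable() {
    @Override
    protected void doRun() {
        restoreService.restoreSnapshot(restoreRequest, restoreListener);
    }

    @Override
    public void onFailure(Exception e) {
        restoreListener.onFailure(e);
    }
});

// After: with the CCR repository non-blocking, the restore is started directly on the
// transport thread, just like normal restores.
restoreService.restoreSnapshot(restoreRequest, restoreListener);
```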

@original-brownbear original-brownbear added >bug :Distributed/CCR Issues around the Cross Cluster State Replication features v8.5.1 v8.6.0 v7.17.8 labels Nov 8, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label Nov 8, 2022
@elasticsearchmachine
Collaborator

Hi @original-brownbear, I've created a changelog YAML for you.

/**
* Dummy request key for deduplicating all remote cluster state requests via {@link #getRemoteStateDeduplicator}.
*/
private static final Object RESULT_KEY = new Object();
Member Author

This is a little awkward but I think it's good enough, and I didn't want to build a whole new thing for the deduplication here when the existing deduplicator works just fine for what we need this way ...
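For readers following along, here is a rough, self-contained sketch (plain Java, not the actual Elasticsearch deduplicator) of what keyed deduplication with a single constant dummy key buys: every caller that arrives while a request for that key is in flight simply shares that request's response, so at most one remote cluster state request runs at a time per repository.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Illustrative only; error handling omitted for brevity. Listeners registered under the same key
// while a request is in flight all receive the response of that single in-flight request.
final class KeyedDeduplicator<K, R> {

    private final Map<K, List<Consumer<R>>> inFlight = new HashMap<>();

    public void executeOnce(K key, Consumer<R> listener, BiConsumer<K, Consumer<R>> callback) {
        synchronized (inFlight) {
            final List<Consumer<R>> waiters = inFlight.get(key);
            if (waiters != null) {
                waiters.add(listener); // piggy-back on the request already running for this key
                return;
            }
            inFlight.put(key, new ArrayList<>(List.of(listener)));
        }
        callback.accept(key, response -> {
            final List<Consumer<R>> waiters;
            synchronized (inFlight) {
                waiters = inFlight.remove(key);
            }
            waiters.forEach(w -> w.accept(response));
        });
    }
}
```

Note that with a constant key, a caller that registers just after the request was sent can receive a response computed before it asked, which is the staleness concern raised below.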

DaveCTurner
DaveCTurner previously approved these changes Nov 8, 2022
Contributor

@DaveCTurner DaveCTurner left a comment

LGTM

Contributor

@henningandersen henningandersen left a comment

I think this could introduce a flaw (an edge case, for sure), but I also think the cure is easy enough that we should do it.

// We only allow a single remote cluster state request at a time. The callbacks to the cluster state responses run on the
// transport thread and can safely be assumed to be fast enough that this never leads to seeing substantially outdated
// remote states, even when a hot loop is calling this method.
getRemoteStateDeduplicator.executeOnce(
Contributor

@henningandersen henningandersen Nov 8, 2022

I think the use of the deduplicator risks outdated info (which I see mentioned in the comment, but I'm not sure I follow the hot-ness argument). I think of it mainly as: if the remote cluster is slow to respond, we risk someone having an application that:

  1. Creates an index on the remote cluster (leader).
  2. Invokes put-follow on the local cluster (follower).

The put-follow request could then fail due to seeing an outdated cluster state (in case other concurrent put-follow requests cause this)?

I think refactoring CapacityResponseCache will do what you want here. It seems like a utility we want to have anyway - one that does only a single calculation at a time and collapses queued requests into one, which is what CapacityResponseCache does.

Member Author

The put-follow request could then fail due to seeing an outdated cluster state (in case other concurrent put-follow requests cause this)?

Hmm maybe ... you're right here actually, I think. Let me try refactoring that thing :)

Member Author

Hmm, CapacityResponseCache turned out to be quite different from what we need here since it deals with a heavy but synchronous action.

I implemented a simple solution instead, similar to what we have for deduplicating repository data in the blob store repository, which we could extract and reuse for e.g. stats as well, like we discussed in the past. Let me know if this is ok with you.
I did some quick benchmarking with this solution and it's also far superior in performance to the previous one since it deduplicates a lot more requests (with the first call causing all subsequent ones to queue up, it works out quite nicely).
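A minimal, self-contained sketch of the queuing behaviour described here (plain Java with illustrative names, not the actual class that landed): the first caller triggers the request, callers arriving while it is in flight are parked, and when it completes a single fresh request is started for everyone who queued up, so nobody can observe a result computed before they asked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative only; error handling omitted for brevity.
final class QueuedDeduplicator<T> {

    /** The expensive operation; it reports its result to the supplied callback when done. */
    private final Consumer<Consumer<T>> executeRequest;

    private final Object mutex = new Object();
    private List<Consumer<T>> waiting; // non-null while a request is in flight

    QueuedDeduplicator(Consumer<Consumer<T>> executeRequest) {
        this.executeRequest = executeRequest;
    }

    public void execute(Consumer<T> callback) {
        synchronized (mutex) {
            if (waiting != null) {
                waiting.add(callback); // a request is running: park this caller for the next round
                return;
            }
            waiting = new ArrayList<>(); // mark a request as in flight
        }
        doExecute(List.of(callback));
    }

    private void doExecute(List<Consumer<T>> callbacks) {
        executeRequest.accept(result -> {
            callbacks.forEach(c -> c.accept(result));
            final List<Consumer<T>> next;
            synchronized (mutex) {
                if (waiting.isEmpty()) {
                    waiting = null; // nothing queued up while we ran, go idle
                    return;
                }
                next = waiting; // serve everyone who queued up with one fresh request
                waiting = new ArrayList<>();
            }
            doExecute(next);
        });
    }
}
```

Compared to the constant-key deduplication sketched earlier, the difference is that late arrivals wait for the next request instead of sharing the in-flight one, which addresses the staleness concern while still collapsing N concurrent callers into at most two remote requests.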

@@ -192,56 +191,44 @@ private void createFollowerIndex(
threadPool.getThreadContext().getHeaders(),
clusterService.state()
);
threadPool.executor(ThreadPool.Names.SNAPSHOT).execute(new AbstractRunnable() {
Contributor

Not sure I follow why this is important to this PR?

Member Author

@original-brownbear original-brownbear Nov 8, 2022

I kinda liked cleaning this up here since, in a sense, it's part of the necessary follow-up fixes for the async behaviour to work neatly, but I can pull it out into a separate PR if you want?

EDIT: never mind, if I do the other refactoring this gets messy, so I'm moving it out :)

@DaveCTurner DaveCTurner dismissed their stale review November 8, 2022 13:40

Ah Henning is right, we need to use a cluster state requested after receiving the put-follow request.

Contributor

@henningandersen henningandersen left a comment

This direction looks good.

Can we add a test verifying that concurrent CcrRepository.getRepositoryData calls only execute one call at a time on the leader and also do the batching (something like: once we have fired all the concurrent calls, we expect only one more invocation on the leader)?

response.getNodes().getMaxNodeVersion(),
SnapshotState.SUCCESS
);
}), false));
Contributor

Should we now add assert false to the exception block below? It seems like no exceptions should occur anymore, since getRemoteState handles its own exceptions. If one does occur, we may have double-invoked the listener.
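Something along these lines, assuming hypothetical names rather than the actual code in the PR:

```java
try {
    getRemoteState(listener); // getRemoteState now handles its own exceptions via the listener
} catch (Exception e) {
    // should be unreachable; the assertion makes tests fail loudly if the listener
    // could have been invoked twice
    assert false : e;
    listener.onFailure(e);
}
```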

Member Author

++ added

@original-brownbear
Member Author

original-brownbear commented Nov 10, 2022

Can we add a test verifying that concurrent CcrRepository.getRepositoryData calls only execute one call at a time on the leader and also do the batching (something like: once we have fired all the concurrent calls, we expect only one more invocation on the leader)?

Uff, I tried something with the MockTransport and it's not entirely trivial to get this right timing-wise. We don't have a nice way of checking things from the start to the end of a request via a listener in org.elasticsearch.test.transport.MockTransportService#addRequestHandlingBehavior. I could build out the infrastructure for this but it'll take me quite some time.

EDIT: I guess we could add some unit test of sorts where we call the repo directly ... still quite a bit of work, and this seems like it should be fixed sooner rather than later since it breaks larger users of CCR?

I wanted to go for the same approach in other code; maybe it makes more sense to wait for that and just unit-test the new deduplicator?

@original-brownbear
Member Author

Jenkins run elasticsearch-ci/part-2

@henningandersen
Contributor

it's not entirely trivial to get this right timing-wise. We don't have a nice way of checking things from the start to the end of a request via a listener in

Could we make a less ambitious test that holds a latch at the beginning of the test, which we wait for in the leader's request handling behavior for cluster/state, start X>2 getRepositoryData calls, and then check that all requests are responded to (bare minimum) and, if we can, that there are only 2 requests in the leader cluster (from the follower)?

@original-brownbear
Member Author

Could we make a less ambitious test that holds a latch at the beginning of the test, which we wait for in the leader's request handling behavior for cluster/state, start X>2 getRepositoryData calls, and then check that all requests are responded to (bare minimum) and, if we can, that there are only 2 requests in the leader cluster (from the follower)?

This is exactly what I tried. It's not quite as trivial as it seems. The follower will send various requests to the leader (state for just a single index, for example), so I can't simply block a transport thread via a latch because that will be unstable and could lead to other requests getting blocked.
So you'll ideally need some non-blocking delay here (I had to build that before for another test, I believe), and then, to really only get two requests for X put-follow actions, you will have to wait for exactly when all those requests have gone out and had their responses. None of that is rocket science, but it will probably take me a couple of hours to get really stable.
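For reference, a rough sketch of the kind of counting test being discussed, with the caveats above in mind (helper names and listener shapes are assumptions, and in practice the per-index state requests mentioned above would need to be filtered out of the count):

```java
// Count cluster state requests arriving on the leader while still letting them through.
final AtomicInteger remoteStateRequests = new AtomicInteger();
final MockTransportService leaderTransport = getLeaderMockTransportService(); // assumed helper
leaderTransport.addRequestHandlingBehavior(ClusterStateAction.NAME, (handler, request, channel, task) -> {
    remoteStateRequests.incrementAndGet();
    handler.messageReceived(request, channel, task);
});

// Fire a bunch of concurrent getRepositoryData calls against the follower's CCR repository.
final int concurrentCalls = between(3, 10);
final CountDownLatch done = new CountDownLatch(concurrentCalls);
for (int i = 0; i < concurrentCalls; i++) {
    ccrRepository.getRepositoryData(ActionListener.wrap(repositoryData -> done.countDown(), e -> done.countDown()));
}

// Bare minimum: everyone gets an answer; ideally: at most two remote requests were sent.
assertTrue(done.await(30, TimeUnit.SECONDS));
assertThat(remoteStateRequests.get(), lessThanOrEqualTo(2));
```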

@tmgordeeva tmgordeeva added v8.5.2 and removed v8.5.1 labels Nov 15, 2022
@kingherc kingherc added v8.7.0 and removed v8.6.0 labels Nov 16, 2022
Contributor

@henningandersen henningandersen left a comment

LGTM.

@original-brownbear
Member Author

Thanks Henning! Reviewed before I even had the chance to ping ❤️ :)

@original-brownbear original-brownbear added the auto-backport-and-merge Automatically create backport pull requests and merge when ready label Nov 20, 2022
@original-brownbear original-brownbear merged commit d1c5ca2 into elastic:main Nov 20, 2022
@original-brownbear original-brownbear deleted the fix-ccr-duplicate-cs-requests branch November 20, 2022 18:12
@elasticsearchmachine
Collaborator

💔 Backport failed

Branch  Result
7.17    Commit could not be cherrypicked due to conflicts
8.5     Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 91398

@bpintea bpintea added v8.5.3 and removed v8.5.2 labels Nov 22, 2022
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 19, 2023
elasticsearchmachine pushed a commit that referenced this pull request Apr 19, 2023
Labels
auto-backport-and-merge Automatically create backport pull requests and merge when ready >bug :Distributed/CCR Issues around the Cross Cluster State Replication features Team:Distributed Meta label for distributed team v7.17.10 v8.7.0
7 participants