Fork CCS remote-cluster responses #98124

DaveCTurner · 2023-08-02T07:51:29Z

Today during CCS we process responses from remote clusters on the
transport worker thread, but that's not the right place for this work.
This commit moves it to the SEARCH_COORDINATION pool.

Closes #97997

Today during CCS we process responses from remote clusters on the transport worker thread, but that's not the right place for this work. This commit moves it to the `SEARCH_COORDINATION` pool. Closes elastic#97997

elasticsearchmachine · 2023-08-02T07:51:53Z

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine · 2023-08-02T07:51:53Z

Hi @DaveCTurner, I've created a changelog YAML for you.

quux00 · 2023-08-04T17:55:14Z

I did a little ad-hoc testing of this change one of my CCS branches in progress and it seemed the same as before based on my limited testing of it (e.g., I'm currently dealing with a strange async-search CCS race condition and it happens on both SEARCH_COORDINATION threadpool and the DIRECT/transport pool).

I don't know a lot about ES threadpool details. Are there any limitations or considerations I should be aware of that this SEARCH_COORDINATION threadpool has compared to the "DIRECT" (transport) thread pool?

DaveCTurner · 2023-08-04T18:38:21Z

Are there any limitations or considerations I should be aware of

We have to be very careful about what we run on the transport pool. We still use it far too much, with cluster-destroying consequences sometimes, although these days it's not as bad as it used to be. A lot of the places we use it are because historically this has been the default choice, but it's a really bad default choice: we really should default to another pool unless there's very good reasons to keep work on the transport pool. CCS responses in particular can be too large to reasonably process on the transport worker, and CCS has enough latency that we can afford a few µs of forking overhead, and I believe if DIRECT hadn't been the default up until #98131 then we would have chosen a different pool years ago.

In contrast the SEARCH_COORDINATION pool intended for the bits of searches that are kinda lightweight (IO-free) but not light enough to run directly on the transport pool. Today that mostly means the can-match phase, but processing CCS responses also fits into this category IMO. It has a bounded queue with default length of 1000; the bound does not apply to responses, but if the queue was full of response handling then we would start rejecting requests. One slight concern is that it defaults to having ½#CPUs threads, with a max of 5, whereas there are #CPUs transport threads. But then again there's no work-sharing between transport threads so you can hit bottlenecks at any scale there. I can see a few more things that I think make sense to move to SEARCH_COORDINATION, and with enough things happening here I could see an argument for expanding it to having #CPUs threads.

Similar to elastic#98124, this action involves doing potentially O(#shards) work on both sender and receiver so it'd be best to avoid the transport worker.

quux00

LGTM

Similar to #98124, this action involves doing potentially O(#shards) work on both sender and receiver so it'd be best to avoid the transport worker.

Fork CCS remote-cluster responses

3bee598

Today during CCS we process responses from remote clusters on the transport worker thread, but that's not the right place for this work. This commit moves it to the `SEARCH_COORDINATION` pool. Closes elastic#97997

DaveCTurner added >bug :Search/Search Search-related issues that do not fall into other categories v8.10.0 labels Aug 2, 2023

DaveCTurner requested review from quux00 and javanna August 2, 2023 07:51

elasticsearchmachine added the Team:Search Meta label for search team label Aug 2, 2023

DaveCTurner added 2 commits August 2, 2023 08:51

Update docs/changelog/98124.yaml

4515419

Spotless

b67700c

DaveCTurner mentioned this pull request Aug 4, 2023

Fork CCS search-shards handling #98209

Merged

quux00 approved these changes Aug 4, 2023

View reviewed changes

javanna approved these changes Aug 7, 2023

View reviewed changes

DaveCTurner mentioned this pull request Aug 7, 2023

Remove bound on SEARCH_COORDINATION default size #98264

Merged

DaveCTurner merged commit 837dcc2 into elastic:main Aug 8, 2023
12 checks passed

DaveCTurner added a commit that referenced this pull request Aug 8, 2023

Fork CCS search-shards handling (#98209)

cd48337

Similar to #98124, this action involves doing potentially O(#shards) work on both sender and receiver so it'd be best to avoid the transport worker.

DaveCTurner deleted the 2023/08/02/ccs-response-executor branch August 8, 2023 06:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fork CCS remote-cluster responses #98124

Fork CCS remote-cluster responses #98124

DaveCTurner commented Aug 2, 2023

elasticsearchmachine commented Aug 2, 2023

elasticsearchmachine commented Aug 2, 2023

quux00 commented Aug 4, 2023

DaveCTurner commented Aug 4, 2023

quux00 left a comment

Fork CCS remote-cluster responses #98124

Fork CCS remote-cluster responses #98124

Conversation

DaveCTurner commented Aug 2, 2023

elasticsearchmachine commented Aug 2, 2023

elasticsearchmachine commented Aug 2, 2023

quux00 commented Aug 4, 2023

DaveCTurner commented Aug 4, 2023

quux00 left a comment

Choose a reason for hiding this comment