
Conversation

@original-brownbear
Contributor

Forking to GENERIC makes sense here since we sporadically block for a macroscopic amount of time to protect transport threads, but in almost all cases the operation works on the same data structures and is very fast. Since it's also very frequent, we shouldn't be creating a bunch of generic threads during a burst -> let's throttle to a single thread.
The throttler logic is taken from the indices cluster service; we can maybe deduplicate the two in a follow-up.

relates #83515
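The single-thread throttling described above can be sketched roughly as follows. This is a hypothetical standalone sketch, not the actual indices cluster service throttler; the class and method names are invented. Tasks are queued, and at most one worker is ever forked to the executor at a time, so a burst of submissions never creates a burst of threads:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: drain a queue of tasks with at most one active
// worker on the given executor (e.g. the GENERIC pool).
final class SingleThreadThrottler {
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean running = new AtomicBoolean();
    private final ExecutorService executor;

    SingleThreadThrottler(ExecutorService executor) {
        this.executor = executor;
    }

    void submit(Runnable task) {
        queue.add(task);
        maybeFork();
    }

    private void maybeFork() {
        // Only fork a worker if none is currently active.
        if (running.compareAndSet(false, true)) {
            executor.execute(() -> {
                try {
                    Runnable next;
                    while ((next = queue.poll()) != null) {
                        next.run();
                    }
                } finally {
                    running.set(false);
                    // Re-check: a task may have been enqueued after the
                    // final poll but before we released the flag.
                    if (queue.isEmpty() == false) {
                        maybeFork();
                    }
                }
            });
        }
    }
}
```

The compare-and-set plus re-check in the `finally` block closes the race where a task is enqueued just as the worker is winding down.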

@original-brownbear original-brownbear added >non-issue :Search/Search Search-related issues that do not fall into other categories labels Jul 22, 2024
@elasticsearchmachine elasticsearchmachine added v8.16.0 Team:Search Meta label for search team labels Jul 22, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@original-brownbear original-brownbear added :Search Foundations/Search Catch all for Search Foundations and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 22, 2024
@elasticsearchmachine elasticsearchmachine added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 22, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

Contributor

@DaveCTurner DaveCTurner left a comment


With this change we should be able to wait for all closes to complete in tests without an assertBusy(): just send another task down the queue and wait for it to complete.
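The suggestion works because the throttled queue is drained in FIFO order by a single worker: a sentinel task enqueued after the closes can only run once they have all completed. A minimal sketch of the idea against a plain single-threaded executor (the helper name is hypothetical, not an actual test utility):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class QueueBarrier {
    // Blocks until every task already queued on the (single-worker)
    // executor has run: the sentinel is enqueued last, and FIFO order
    // guarantees it runs only after all of them.
    static void awaitDrained(ExecutorService executor) throws InterruptedException {
        CountDownLatch sentinel = new CountDownLatch(1);
        executor.execute(sentinel::countDown);
        if (sentinel.await(30, TimeUnit.SECONDS) == false) {
            throw new AssertionError("queue did not drain in time");
        }
    }
}
```

Unlike polling, this wait completes the instant the queue is empty and never observes an intermediate state.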

@original-brownbear
Contributor Author

just send another task down the queue and wait for it to complete.

Right, on a per-node basis this is true, but it needs new code in production I guess?
I suppose we could just send a task per node with some non-existent reader id; that would do the trick as well. Let me see what I can do :)

@original-brownbear
Contributor Author

just send another task down the queue and wait for it to complete.

Looking into actually doing this, I wonder what it really gets us? It's a couple of extra lines of code, and in fact a slowdown for almost every test suite, since we very rarely leak the generic task here. Also, I believe it's not good enough for the multi-node case, where the task can be queued on the wire as well as on the generic pool.
To me, not doing anything tricky here seems the fastest and the least code (though I admit to cringing every time I see a busy assert :)). WDYT?

Contributor

@DaveCTurner DaveCTurner left a comment


Looking into actually doing this I wonder what this really gets us?

Using assertBusy() when there's a more direct way to wait for the condition we want is a drag on test performance in thousands of tiny ways. Even if a direct wait is a few µs slower than the first assertBusy() iteration, if that iteration doesn't succeed once every few hundred runs then we lose out in aggregate. Also I read a need for assertBusy() as a suggestion of a deeper design issue - I've burned a lot of time fixing fire-and-forget stuff that suddenly needs to not be so forgetful.

So yeah ok we can do this here, each assertBusy() in isolation doesn't make things that much worse, it's the fact that it's used ≥2k times and counting that is the problem.
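For context, assertBusy()-style polling has roughly the following shape. This is a simplified sketch, not the real ESTestCase implementation: retry the assertion with growing backoff until it passes or a timeout elapses, which is exactly the per-iteration cost under discussion.

```java
final class BusyAssert {
    // Simplified sketch of busy-wait assertion polling: run the check,
    // and on failure sleep with exponential backoff until the deadline.
    static void assertBusy(Runnable check, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long sleep = 1;
        while (true) {
            try {
                check.run();
                return; // assertion passed
            } catch (AssertionError e) {
                if (System.currentTimeMillis() >= deadline) {
                    throw e; // give up and surface the last failure
                }
            }
            Thread.sleep(sleep);
            sleep = Math.min(sleep * 2, 500);
        }
    }
}
```

Every failed iteration costs at least one sleep, and a flaky first iteration multiplies across thousands of test runs, which is the aggregate drag described above.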

@original-brownbear
Contributor Author

Thanks David!

So yeah ok we can do this here, each assertBusy() in isolation doesn't make things that much worse, it's the fact that it's used ≥2k times and counting that is the problem.

Believe me I'm 100% with you here and have actively removed a bunch of them for that very reason.
In this one case though, I'd say it's a non-issue in practice. At least the newly added single node case. It literally fails the first iteration less than 1/10k. But I could see a much higher failure rate in the multi-node case that exists already. Let me see if it's worth the hassle of a fix in a follow-up real quick :)
