
Conversation

@javanna (Member) commented Oct 16, 2024

With recent changes in Lucene 9.12 around not forking execution when it is not necessary (see apache/lucene#13472), we removed the search worker thread pool in #111099. The worker thread pool had an unlimited queue, and we feared that we could see much more queueing on the search thread pool if segment-level searches run on the same thread pool as shard-level searches, because every shard search may take up to one thread per slice when executing the query phase.

We then introduced an additional condition that stops parallelizing whenever there is a queue. That is perhaps a bit extreme: the decision is made when the searcher is created, and the queue may no longer be there once the search actually executes. This has caused some benchmark regressions, because a queue can be a transient state, especially when short-lived segment searches queue up, and we may end up disabling inter-segment concurrency more aggressively than we would like, penalizing requests that do benefit from concurrency. At the same time, we do want some protection against rejections of shard searches caused by excessive slicing. When the queue grows above a certain size, we can turn off slicing and effectively disable inter-segment concurrency. With this commit we set that threshold to the number of threads in the search pool.
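For illustration, a minimal Java sketch of that decision (the `sliceCount` helper and its `maxSlices` parameter are hypothetical stand-ins, not the actual Elasticsearch internals; the executor check mirrors the condition shown in the review diff below):

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;

final class SlicingDecision {

    /**
     * Decide whether a shard search may parallelize across segments.
     * The previous check required an empty queue; the new one keeps
     * slicing enabled as long as the queue holds no more tasks than
     * the pool has threads.
     */
    static boolean enableInterSegmentConcurrency(Executor executor) {
        return executor instanceof ThreadPoolExecutor tpe
            && tpe.getQueue().size() <= tpe.getMaximumPoolSize();
    }

    /**
     * Hypothetical caller: pick the slice count for the query phase.
     * maxSlices stands in for whatever the segment geometry allows;
     * returning 1 effectively disables inter-segment concurrency.
     */
    static int sliceCount(Executor executor, int maxSlices) {
        return enableInterSegmentConcurrency(executor) ? maxSlices : 1;
    }
}
```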

@javanna added the >bug, auto-backport, v8.16.0, and :Search Foundations/Search labels Oct 16, 2024
@elasticsearchmachine added the v9.0.0 and Team:Search Foundations labels Oct 16, 2024
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine (Collaborator) commented:

Hi @javanna, I've created a changelog YAML for you.

@original-brownbear (Contributor) left a comment:

LGTM, just one suggestion on maybe making the calculation a little more stable; feel free to ignore it though ... as I said, I have little to no intuition about this :)

```diff
 ) {
     return executor instanceof ThreadPoolExecutor tpe
-        && tpe.getQueue().isEmpty()
+        && tpe.getQueue().size() <= tpe.getMaximumPoolSize()
```
@original-brownbear (Contributor) commented on the diff:

I wonder if we should give this a little more wiggle room. I have no intuition whatsoever about what percentage of the time a full queue will actually be "full" in the sense of this condition. You'll always have a thread randomly picking a task from the queue, and it seems like this might hardly ever come into action.
Maybe maxSize - thread_count would be more stable?
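For context, a rough sketch of the two thresholds being discussed; the second method is only one possible reading of the `maxSize - thread_count` suggestion, and `queueBound` is an assumed upper bound on the queue size, not something defined in this PR:

```java
import java.util.concurrent.ThreadPoolExecutor;

final class ThresholdVariants {

    // Condition as merged: allow slicing while the queue holds no more
    // tasks than the pool has threads.
    static boolean mergedCondition(ThreadPoolExecutor tpe) {
        return tpe.getQueue().size() <= tpe.getMaximumPoolSize();
    }

    // One possible reading of the "maxSize - thread_count" suggestion
    // (hypothetical): compare against the queue bound minus the thread
    // count, leaving headroom so the check is less sensitive to threads
    // continuously draining the queue.
    static boolean suggestedCondition(ThreadPoolExecutor tpe, int queueBound) {
        return tpe.getQueue().size() <= queueBound - tpe.getMaximumPoolSize();
    }
}
```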

@javanna (Member, Author) replied:

In my test, this has the same effect as removing the conditional. That feels good, because we can maximize the parallelism without losing the protection against search rejections. This obviously depends on what you are testing :) I think I am going to leave it this way for now, and we will eventually iterate over it as we do more testing / collect more info.

@original-brownbear (Contributor) replied:

Sounds good :) thanks!

@javanna (Member, Author) commented Oct 16, 2024

Latency for http_logs track, runtime-fields challenge, before the change:

|                                                         Metric |                                  Task |        Value   |   Unit |
|                                                 Min Throughput |  400s-in-range-from-source-using-grok |    0.3         |  ops/s |
|                                                Mean Throughput |  400s-in-range-from-source-using-grok |    0.3         |  ops/s |
|                                              Median Throughput |  400s-in-range-from-source-using-grok |    0.3         |  ops/s |
|                                                 Max Throughput |  400s-in-range-from-source-using-grok |    0.3         |  ops/s |
|                                        50th percentile latency |  400s-in-range-from-source-using-grok | 1094.78        |     ms |
|                                        90th percentile latency |  400s-in-range-from-source-using-grok | 1168.09        |     ms |
|                                        99th percentile latency |  400s-in-range-from-source-using-grok | 1211.41        |     ms |
|                                       100th percentile latency |  400s-in-range-from-source-using-grok | 1235.26        |     ms |
|                                   50th percentile service time |  400s-in-range-from-source-using-grok | 1090.98        |     ms |
|                                   90th percentile service time |  400s-in-range-from-source-using-grok | 1164.08        |     ms |
|                                   99th percentile service time |  400s-in-range-from-source-using-grok | 1207.09        |     ms |
|                                  100th percentile service time |  400s-in-range-from-source-using-grok | 1232.93        |     ms |
|                                                     error rate |  400s-in-range-from-source-using-grok |    0           |      % |
|                                                 Min Throughput | 400s-in-range-from-keyword-using-grok |    0.3         |  ops/s |
|                                                Mean Throughput | 400s-in-range-from-keyword-using-grok |    0.3         |  ops/s |
|                                              Median Throughput | 400s-in-range-from-keyword-using-grok |    0.3         |  ops/s |
|                                                 Max Throughput | 400s-in-range-from-keyword-using-grok |    0.3         |  ops/s |
|                                        50th percentile latency | 400s-in-range-from-keyword-using-grok | 2187.38        |     ms |
|                                        90th percentile latency | 400s-in-range-from-keyword-using-grok | 2278.73        |     ms |
|                                        99th percentile latency | 400s-in-range-from-keyword-using-grok | 2345.78        |     ms |
|                                       100th percentile latency | 400s-in-range-from-keyword-using-grok | 2353.05        |     ms |
|                                   50th percentile service time | 400s-in-range-from-keyword-using-grok | 2184.1         |     ms |
|                                   90th percentile service time | 400s-in-range-from-keyword-using-grok | 2275.23        |     ms |
|                                   99th percentile service time | 400s-in-range-from-keyword-using-grok | 2341.93        |     ms |
|                                  100th percentile service time | 400s-in-range-from-keyword-using-grok | 2349.96        |     ms |
|                                                     error rate | 400s-in-range-from-keyword-using-grok |    0           |      % |

and after:

|                                                         Metric |                                  Task |        Value   |   Unit |
|                                                 Min Throughput |  400s-in-range-from-source-using-grok |    0.3         |  ops/s |
|                                                Mean Throughput |  400s-in-range-from-source-using-grok |    0.3         |  ops/s |
|                                              Median Throughput |  400s-in-range-from-source-using-grok |    0.3         |  ops/s |
|                                                 Max Throughput |  400s-in-range-from-source-using-grok |    0.3         |  ops/s |
|                                        50th percentile latency |  400s-in-range-from-source-using-grok |  612.345       |     ms |
|                                        90th percentile latency |  400s-in-range-from-source-using-grok |  657.275       |     ms |
|                                        99th percentile latency |  400s-in-range-from-source-using-grok |  674.38        |     ms |
|                                       100th percentile latency |  400s-in-range-from-source-using-grok |  686.795       |     ms |
|                                   50th percentile service time |  400s-in-range-from-source-using-grok |  607.592       |     ms |
|                                   90th percentile service time |  400s-in-range-from-source-using-grok |  652.377       |     ms |
|                                   99th percentile service time |  400s-in-range-from-source-using-grok |  671.428       |     ms |
|                                  100th percentile service time |  400s-in-range-from-source-using-grok |  684.497       |     ms |
|                                                     error rate |  400s-in-range-from-source-using-grok |    0           |      % |
|                                                 Min Throughput | 400s-in-range-from-keyword-using-grok |    0.3         |  ops/s |
|                                                Mean Throughput | 400s-in-range-from-keyword-using-grok |    0.3         |  ops/s |
|                                              Median Throughput | 400s-in-range-from-keyword-using-grok |    0.3         |  ops/s |
|                                                 Max Throughput | 400s-in-range-from-keyword-using-grok |    0.3         |  ops/s |
|                                        50th percentile latency | 400s-in-range-from-keyword-using-grok | 1047.08        |     ms |
|                                        90th percentile latency | 400s-in-range-from-keyword-using-grok | 1085.22        |     ms |
|                                        99th percentile latency | 400s-in-range-from-keyword-using-grok | 1100.75        |     ms |
|                                       100th percentile latency | 400s-in-range-from-keyword-using-grok | 1143.23        |     ms |
|                                   50th percentile service time | 400s-in-range-from-keyword-using-grok | 1044.02        |     ms |
|                                   90th percentile service time | 400s-in-range-from-keyword-using-grok | 1082.76        |     ms |
|                                   99th percentile service time | 400s-in-range-from-keyword-using-grok | 1097.8         |     ms |
|                                  100th percentile service time | 400s-in-range-from-keyword-using-grok | 1140.11        |     ms |
|                                                     error rate | 400s-in-range-from-keyword-using-grok |    0           |      % |

@javanna javanna requested review from a team as code owners October 16, 2024 14:18
javanna and others added 2 commits October 16, 2024 16:31
@javanna javanna force-pushed the fix/queue_size_concurrency branch from 7d82bb1 to 9572002 October 16, 2024 14:32
@javanna javanna removed request for a team October 16, 2024 14:49
@javanna javanna changed the title Replace empty queue conditional from slicing logic Enhance empty queue conditional in slicing logic Oct 16, 2024
@javanna javanna added >non-issue and removed >bug labels Oct 16, 2024
@javanna javanna merged commit 8b87969 into elastic:main Oct 16, 2024
16 checks passed
@javanna javanna deleted the fix/queue_size_concurrency branch October 16, 2024 17:51
javanna added a commit to javanna/elasticsearch that referenced this pull request Oct 16, 2024
@elasticsearchmachine (Collaborator) commented:

💚 Backport successful

Backported to branches: 8.16 and 8.x.

javanna added a commit to javanna/elasticsearch that referenced this pull request Oct 16, 2024
@javanna (Member, Author) commented Oct 16, 2024

I marked this non-issue because it addresses an unreleased regression introduced with #111099. That is why, despite its label, it is a bugfix and needs backporting to all affected branches.

@javanna javanna added v8.16.1 and removed v8.16.0 labels Oct 16, 2024
elasticsearchmachine pushed a commit that referenced this pull request Oct 16, 2024
elasticsearchmachine pushed a commit that referenced this pull request Oct 17, 2024
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Nov 4, 2024

Labels: auto-backport, >non-issue, :Search Foundations/Search, Team:Search Foundations, v8.16.1, v8.17.0, v9.0.0
