Try next replica shard when slow request found in search action #82164
Labels
>enhancement
:Search Foundations/Search
Team:Search Foundations
When a search fans out to the shards, some shards may respond very slowly, which makes the whole query take a long time. The cause may be that a data node's query queue suddenly filled up, a JVM GC pause, or something similar. This produces latency spikes: a simple query that should return within 1 second takes 4 seconds or more. It usually happens under high query QPS.
We have adaptive replica selection (ARS), but it cannot solve this issue. Instead, we can try the next shard replica when most of the shard requests have already returned and the running time of the slow request exceeds a certain threshold, and then use the first result that arrives to call `onShardResult`. Many other systems have similar mechanisms.

I implemented this idea and tested it in our cluster, and found that query latency is significantly reduced. The picture below is a comparison of before and after this mechanism is enabled.
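For readers who want the retry condition in concrete form, here is a minimal sketch of the decision described above. It is not the code in this PR: the class `SlowShardRetry`, the method `shouldTryNextReplica`, the helper `FirstResultGate`, and the parameters `minCompletedPercent` and `slowThresholdMillis` are hypothetical names chosen for illustration, and the real change would hook into the search action's existing shard iteration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical illustration of the "try next replica on slow shard" idea. */
public class SlowShardRetry {

    /**
     * Decides whether a still-pending shard request should also be sent to the next replica.
     * Both conditions from the description must hold: most shard requests have returned,
     * and the pending request has been running longer than the threshold.
     */
    public static boolean shouldTryNextReplica(int totalShards,
                                               int completedShards,
                                               long pendingElapsedMillis,
                                               int minCompletedPercent,   // assumed tunable, e.g. 90
                                               long slowThresholdMillis,  // assumed tunable, e.g. 1000
                                               boolean hasNextReplica) {
        if (!hasNextReplica || totalShards <= 0) {
            return false; // nothing to retry on
        }
        // Integer arithmetic avoids floating-point comparison issues.
        boolean mostReturned = (long) completedShards * 100 >= (long) totalShards * minCompletedPercent;
        boolean isSlow = pendingElapsedMillis > slowThresholdMillis;
        return mostReturned && isSlow;
    }

    /**
     * Ensures only the first response for a shard (original or retried replica)
     * is passed on, for example to an onShardResult-style callback.
     */
    public static final class FirstResultGate {
        private final AtomicBoolean accepted = new AtomicBoolean(false);

        /** Returns true exactly once, for whichever response arrives first. */
        public boolean tryAccept() {
            return accepted.compareAndSet(false, true);
        }
    }
}
```

In this sketch the late response from the slower copy is simply dropped by the gate; whether the slower request should also be cancelled is a separate design choice that the description does not specify.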