Sudden high search latencies with 50% used QueryNodes #33293
-
Some questions:
CPU usage of the querynode is not high, which indicates the bottleneck is in query(), since query() might fetch data from storage.
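For illustration, a minimal sketch (assuming pymilvus; the collection name, filter expression, and output fields are hypothetical) that times a query() call from the client side, which can help confirm whether fetching field data from storage is what dominates the latency:

```python
import time
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
coll = Collection("my_collection")  # hypothetical collection name

start = time.perf_counter()
# Fetching output fields forces the query node to read field data from
# storage, which can be slow even while CPU usage stays low.
res = coll.query(expr="id > 0", output_fields=["id"], limit=100)
print(f"query() took {time.perf_counter() - start:.3f}s for {len(res)} rows")
```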
-
I was thinking about the topK value, which is not the one we requested. Would it be possible that the query nodes ask for a higher topK depending on the targeted bucket? Other than that, I noticed that in the search topK, there are 2 query nodes that are not contributing.
-
Hi!
Now the load is more spread across all query nodes, and I'm starting to wonder if the bottleneck is the fact that all collections have 4 shards: with that many shard delegators and growing segments per collection, plus 10 query nodes, maybe it's a bit too many messages for the querynode delegators to process. Should we recreate all collections with shards_num=2? pprof is something we will do after this move.
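For reference, a minimal sketch (assuming pymilvus; the collection name, field names, and vector dimension are hypothetical) of recreating a collection with shards_num=2; shards_num is fixed at creation time, so moving from 4 to 2 shards means creating a new collection and re-ingesting the data:

```python
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(host="localhost", port="19530")

schema = CollectionSchema(fields=[
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
])

# shards_num cannot be changed on an existing collection, so the data has to
# be re-ingested into the new one.
coll = Collection(name="my_collection_v2", schema=schema, shards_num=2)
```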
-
I don't think reducing the shard number will really help, unless the querynodes are the bottleneck. But you could try with the next release. The segment number distribution seems to be not very balanced; did you check the reason?
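As an illustration, a minimal sketch (assuming pymilvus; the collection name is hypothetical, and attribute names such as nodeID and num_rows may vary across pymilvus versions) that summarizes how loaded segments are spread across query nodes, which is one way to look at the distribution:

```python
from collections import Counter
from pymilvus import connections, utility

connections.connect(host="localhost", port="19530")

# One entry per loaded segment, including which query node serves it.
segments = utility.get_query_segment_info("my_collection")

segs_per_node = Counter()
rows_per_node = Counter()
for seg in segments:
    segs_per_node[seg.nodeID] += 1
    rows_per_node[seg.nodeID] += seg.num_rows

for node_id in sorted(segs_per_node):
    print(f"query node {node_id}: {segs_per_node[node_id]} segments, "
          f"{rows_per_node[node_id]} rows")
```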
-
Hey!
Thanks a lot again for your help and availability, which we really appreciate <3
-
Hey!
It seems we have finally stabilized the situation and understood a lot of things. Here is a summary of everything we did to conclude this issue: