Replies: 1 comment
I tried another backend, tabbyapi with ExLlamaV3. It shows exactly the same issue: a single thread keeps one CPU core fully loaded. Maybe this is not a problem but expected behavior. I'm still trying to figure out what exactly is happening.
-
A few days ago I set up llama.cpp with the Qwen3.5-27B model, running 100% on the GPU. It works well, but I have a strange problem.
During prefill and decode, one CPU thread sits at 100% utilization and generates a lot of heat. Initially I thought this was normal, so I ran the following test.
I downclocked the CPU from 5.45 GHz to 1.9 GHz, expecting a significant drop in performance. To my surprise, performance didn't change at all. Since throughput does not depend on how fast that thread runs, I think this experiment suggests that the busy thread is a bug or side effect rather than a feature.
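For anyone who wants to check for the same pattern, here is a minimal sketch of how the busy thread could be identified. It is Linux-only and reads `/proc/<pid>/task/<tid>/stat` directly. The function names are made up for illustration and are not part of llama.cpp or any other tool. It only shows which thread is busy, not why.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The thread name sits in parentheses and may contain spaces or ')',
    # so split only after the last ')'.
    rest = stat_line[stat_line.rfind(")") + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def sample(pid):
    """Map thread id -> CPU ticks consumed so far, for every thread of pid."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                ticks[int(tid)] = parse_cpu_ticks(f.read())
        except OSError:
            pass  # thread exited between listdir and open
    return ticks


def cpu_percent(before, after, interval, ticks_per_sec):
    """Per-thread CPU usage, as a percentage of one core, over the interval."""
    return {
        tid: 100.0 * (after[tid] - before[tid]) / (ticks_per_sec * interval)
        for tid in after
        if tid in before
    }


def busiest_threads(pid, interval=1.0):
    """Sample twice and return (tid, percent) pairs, busiest first."""
    before = sample(pid)
    time.sleep(interval)
    after = sample(pid)
    usage = cpu_percent(before, after, interval, os.sysconf("SC_CLK_TCK"))
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `busiest_threads(pid)` with the PID of the inference server while it is generating should list one thread near 100% and the rest near 0% if the behavior matches what I describe above.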
Has anyone else experienced this issue, and is there a way to solve it?