100% CPU single core usage while everything is fully offloaded to the GPU #22238

kstoykov · 2026-04-22T08:32:27Z

A few days ago I set up llama.cpp with the Qwen3.5-27B model, fully offloaded to the GPU. It works well, but I have a strange problem.

During prefill or decode, one CPU thread runs a core at 100% utilization, which generates a lot of heat. Initially I thought this was normal, so I ran the following test.

I downclocked the CPU from 5.45GHz to 1.9GHz, expecting a significant drop in performance. To my surprise, the performance didn't change at all. I think this experiment shows that this one thread is a bug or side effect rather than a feature.

Is anyone else experiencing this issue, and is there a way to solve it?
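
One pattern that would be consistent with the downclocking result above is a host thread that busy-waits (polls in a tight loop) for work to finish elsewhere, instead of sleeping until it is notified. This is only a hypothesis, not a confirmed diagnosis of llama.cpp. The sketch below is a toy Python illustration: a timer thread stands in for the GPU, and `measure_wait` is a made-up helper name. Both wait styles finish after about the same wall-clock time, but the spinning one burns CPU for the whole wait.

```python
import threading
import time


def measure_wait(spin: bool, duration: float = 0.2):
    """Wait for an event set by a timer thread.

    Returns (wall_seconds, cpu_seconds) spent by this process while waiting.
    The timer is a stand-in for "the GPU finished"; it is not real GPU work.
    """
    done = threading.Event()
    timer = threading.Timer(duration, done.set)
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    timer.start()
    if spin:
        # Busy-wait: poll the flag in a tight loop, keeping one core busy.
        while not done.is_set():
            pass
    else:
        # Blocking wait: the thread sleeps until the event is set.
        done.wait()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    timer.join()
    return wall, cpu


if __name__ == "__main__":
    for label, spin in (("spin", True), ("block", False)):
        wall, cpu = measure_wait(spin)
        print(f"{label:5s} wall={wall:.3f}s cpu={cpu:.3f}s")
```

If something like this is what the hot thread is doing, a slower CPU clock would just make it poll less often, without changing throughput, which matches what I measured.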

kstoykov · 2026-06-07T12:01:35Z

I tried another backend: tabbyAPI with ExLlamaV3. It has exactly the same issue, with a single thread fully loading one CPU core. Maybe this is not a problem but expected behavior. I'm still trying to figure out what exactly is happening.
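
To narrow down which thread is hot, here is a minimal sketch that samples per-thread CPU time from `/proc/<pid>/task/<tid>/stat` twice and reports the thread with the largest increase. It assumes Linux; the function names (`parse_thread_stat`, `snapshot`, `busiest_thread`) are my own and not part of llama.cpp or tabbyAPI. The field positions follow the proc(5) layout, where `utime` and `stime` are fields 14 and 15.

```python
import os
import sys
import time


def parse_thread_stat(line: str):
    """Parse one /proc/<pid>/task/<tid>/stat line.

    Returns (thread_name, utime_ticks, stime_ticks). The name sits between
    the first '(' and the last ')' and may itself contain spaces.
    """
    name = line[line.index("(") + 1 : line.rindex(")")]
    fields = line[line.rindex(")") + 1 :].split()
    # fields[0] is stat field 3 (state), so fields 14 and 15 are at 11 and 12.
    return name, int(fields[11]), int(fields[12])


def snapshot(pid: int):
    """Map tid -> (name, utime, stime) for every thread of pid (Linux only)."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                result[int(tid)] = parse_thread_stat(f.read())
        except OSError:
            continue  # thread exited between listdir and open
    return result


def busiest_thread(before, after):
    """Return (tid, name, user_ticks, system_ticks) with the largest CPU delta."""
    best = None
    for tid, (name, utime, stime) in after.items():
        if tid not in before:
            continue
        _, utime0, stime0 = before[tid]
        delta = (tid, name, utime - utime0, stime - stime0)
        if best is None or delta[2] + delta[3] > best[2] + best[3]:
            best = delta
    return best


if __name__ == "__main__":
    pid = int(sys.argv[1])  # PID of the inference server process
    before = snapshot(pid)
    time.sleep(2.0)
    after = snapshot(pid)
    print(busiest_thread(before, after))
```

Run it with the server's PID while a request is generating. A thread whose time is almost all user ticks with very few system ticks would be consistent with a spin loop in user space, though that alone does not prove it.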

0 replies
