
On the same device (8295), Qwen3-0.6B and Qwen3-1.7B exhibit different CPU utilization, with Qwen3-0.6B showing higher CPU usage. #15998

@lansexinhu

Description

🚀 The feature, motivation and pitch

Old version, commit 763a474
Model: Qwen3-0.6B
Quantization and conversion settings: same as #15410
seq_mse_candidates = 1000

Run on the 8295 device with: --seq_len 1024 --kv_updater ShiftPointer

Tasks: 482 total, 1 running, 481 sleeping, 0 stopped, 0 zombie
12286 root 20 0 10G 27M 4.6M R 93.0 0.1 0:00.16 qnn_llama_runne+
12286 root 20 0 10G 103M 5.1M R 60.0 0.4 0:01.09 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M S 28.0 4.3 0:01.69 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M R 28.0 4.3 0:01.97 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M S 30.0 4.3 0:02.25 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M R 28.0 4.3 0:02.55 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M S 29.0 4.3 0:02.83 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M R 28.0 4.3 0:03.12 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M R 30.0 4.3 0:03.40 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M R 30.0 4.3 0:03.70 qnn_llama_runne+
12419 root 20 0 10G 10M 4.4M R 97.0 0.0 0:00.03 qnn_llama_runne+
12419 root 20 0 10G 127M 5.1M R 65.0 0.5 0:01.00 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 28.0 4.3 0:01.65 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 26.0 4.3 0:01.93 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 28.0 4.3 0:02.19 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 27.0 4.3 0:02.47 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 26.0 4.3 0:02.74 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 27.0 4.3 0:03.00 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M R 29.0 4.3 0:03.27 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 30.0 4.3 0:03.56 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 27.0 4.3 0:03.86 qnn_llama_runne+
12710 root 20 0 10G 84M 5.2M R 74.0 0.3 0:00.62 qnn_llama_runne+
12710 root 20 0 11G 488M 433M R 54.0 1.9 0:01.36 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M R 28.0 4.2 0:01.90 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M R 26.0 4.3 0:02.18 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M R 28.0 4.3 0:02.44 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M R 26.0 4.3 0:02.72 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M R 25.0 4.3 0:02.98 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M S 29.0 4.3 0:03.23 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M R 29.0 4.3 0:03.52 qnn_llama_runne+
12873 root 20 0 10G 26M 4.5M R 95.0 0.1 0:00.18 qnn_llama_runne+
12873 root 20 0 10G 104M 5.0M R 61.0 0.4 0:01.13 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M S 29.0 4.3 0:01.74 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M S 29.0 4.3 0:02.03 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M R 29.0 4.3 0:02.32 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M S 26.0 4.3 0:02.61 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M R 30.0 4.3 0:02.87 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M S 29.0 4.3 0:03.17 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M S 29.0 4.3 0:03.46 qnn_llama_runne+

Qwen3-1.7B:

Quantization and conversion settings:
num_sharding = 2
# quant config
# ptq = QuantDtype.use_16a4w_block
ptq = QuantDtype.use_16a8w
group_size = None
masked_softmax = True
seq_mse_candidates = 0
r1 = False
r2 = False
r3 = False

Run on the 8295 device with: --seq_len 1024 --kv_updater ShiftPointer

2042 root 20 0 15G 2.3G 2.2G S 14.2 9.8 0:07.83 qnn_llama_runne+
Tasks: 486 total, 2 running, 484 sleeping, 0 stopped, 0 zombie
2042 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:07.87 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:08.32 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G S 15.3 9.8 0:08.77 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G R 14.3 9.8 0:09.23 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G S 15.3 9.8 0:09.66 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G S 14.6 9.8 0:10.12 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G S 16.0 9.8 0:10.56 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G S 18.6 9.8 0:11.04 qnn_llama_runne+
4787 root 20 0 13G 868M 807M R 37.0 3.5 0:01.63 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:02.74 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:03.19 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G R 15.6 9.8 0:03.64 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G R 15.6 9.8 0:04.11 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 13.6 9.8 0:04.58 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G R 15.3 9.8 0:04.99 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:05.45 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:05.90 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.3 9.8 0:06.35 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G R 15.3 9.8 0:06.81 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:07.27 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.6 9.8 0:07.72 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G R 14.6 9.8 0:08.19 qnn_llama_runne+

Once the model is loaded, the runner uses more CPU with Qwen3-0.6B than with Qwen3-1.7B (about 25-30% versus about 14-19% in the top output above). Is this caused by SeqMSE or by something else? Do you have any suggestions or insights that could help me?
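To make the comparison less dependent on eyeballing the logs, here is a minimal Python sketch that averages the %CPU column per PID from pasted top batch output. It assumes the column order shown in the logs (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND). The function name `cpu_by_pid` and the `skip` parameter are hypothetical helpers for illustration, not part of ExecuTorch or top; `skip` drops the first samples of each run, where the model is still loading and CPU usage is much higher.

```python
import re
from collections import defaultdict
from statistics import mean

# One process row of top batch output, e.g.
# "12286 root 20 0 12G 1.0G 897M S 28.0 4.3 0:01.69 qnn_llama_runne+"
# Assumed columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.
# re.search is used (not re.match) so leading terminal escape residue is tolerated.
ROW = re.compile(
    r"(?P<pid>\d+)\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+"   # PID, then 6 fields
    r"(?P<state>[A-Z])\s+(?P<cpu>\d+(?:\.\d+)?)\s+\d+(?:\.\d+)?\s+"
    r"\d+:\d+\.\d+\s+\S+"                                     # TIME+ and COMMAND
)

def cpu_by_pid(text, skip=2):
    """Return {pid: mean %CPU}, ignoring the first `skip` samples of each PID.

    `skip` is an assumption: it approximates the model-loading phase seen
    at the start of each run in the logs.
    """
    samples = defaultdict(list)
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            samples[int(m["pid"])].append(float(m["cpu"]))
    return {pid: mean(v[skip:]) for pid, v in samples.items() if len(v) > skip}
```

Calling `cpu_by_pid(log_text)` on each model's log gives one steady-state average per run, which can then be compared across the two models. Note that %CPU in top is relative to the sampling interval, so both logs should be captured with the same `-d` delay for the numbers to be directly comparable.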

Alternatives

No response

Additional context

No response

RFC (Optional)

No response
