On the same device (8295), Qwen3-0.6B and Qwen3-1.7B exhibit different CPU utilization, with Qwen3-0.6B showing higher CPU usage.

### 🚀 The feature, motivation and pitch

Old version, commit 763a4747b0f23d31a9b568d800954de928603ebd

Qwen3-0.6B:

Quantization and conversion settings are the same as in https://github.com/pytorch/executorch/issues/15410, with:
seq_mse_candidates = 1000

Run on the 8295 device with: `--seq_len 1024 --kv_updater ShiftPointer`

[H[JTasks: 482 total,   1 running, 481 sleeping,   0 stopped,   0 zombie
[mm12286 root         20   0  10G  27M 4.6M R 93.0   0.1   0:00.16 qnn_llama_runne+
[mm12286 root         20   0  10G 103M 5.1M R 60.0   0.4   0:01.09 qnn_llama_runne+
12286 root         20   0  12G 1.0G 897M S 28.0   4.3   0:01.69 qnn_llama_runne+
[mm12286 root         20   0  12G 1.0G 897M R 28.0   4.3   0:01.97 qnn_llama_runne+
12286 root         20   0  12G 1.0G 897M S 30.0   4.3   0:02.25 qnn_llama_runne+
[mm12286 root         20   0  12G 1.0G 897M R 28.0   4.3   0:02.55 qnn_llama_runne+
12286 root         20   0  12G 1.0G 897M S 29.0   4.3   0:02.83 qnn_llama_runne+
[mm12286 root         20   0  12G 1.0G 897M R 28.0   4.3   0:03.12 qnn_llama_runne+
[mm12286 root         20   0  12G 1.0G 897M R 30.0   4.3   0:03.40 qnn_llama_runne+
[mm12286 root         20   0  12G 1.0G 897M R 30.0   4.3   0:03.70 qnn_llama_runne+
[mm12419 root         20   0  10G  10M 4.4M R 97.0   0.0   0:00.03 qnn_llama_runne+
[mm12419 root         20   0  10G 127M 5.1M R 65.0   0.5   0:01.00 qnn_llama_runne+
12419 root         20   0  12G 1.0G 897M S 28.0   4.3   0:01.65 qnn_llama_runne+
12419 root         20   0  12G 1.0G 897M S 26.0   4.3   0:01.93 qnn_llama_runne+
12419 root         20   0  12G 1.0G 897M S 28.0   4.3   0:02.19 qnn_llama_runne+
12419 root         20   0  12G 1.0G 897M S 27.0   4.3   0:02.47 qnn_llama_runne+
12419 root         20   0  12G 1.0G 897M S 26.0   4.3   0:02.74 qnn_llama_runne+
12419 root         20   0  12G 1.0G 897M S 27.0   4.3   0:03.00 qnn_llama_runne+
[mm12419 root         20   0  12G 1.0G 897M R 29.0   4.3   0:03.27 qnn_llama_runne+
12419 root         20   0  12G 1.0G 897M S 30.0   4.3   0:03.56 qnn_llama_runne+
12419 root         20   0  12G 1.0G 897M S 27.0   4.3   0:03.86 qnn_llama_runne+
[mm12710 root         20   0  10G  84M 5.2M R 74.0   0.3   0:00.62 qnn_llama_runne+
[mm12710 root         20   0  11G 488M 433M R 54.0   1.9   0:01.36 qnn_llama_runne+
[mm12710 root         20   0  12G 1.0G 897M R 28.0   4.2   0:01.90 qnn_llama_runne+
[mm12710 root         20   0  12G 1.0G 897M R 26.0   4.3   0:02.18 qnn_llama_runne+
[mm12710 root         20   0  12G 1.0G 897M R 28.0   4.3   0:02.44 qnn_llama_runne+
[mm12710 root         20   0  12G 1.0G 897M R 26.0   4.3   0:02.72 qnn_llama_runne+
[mm12710 root         20   0  12G 1.0G 897M R 25.0   4.3   0:02.98 qnn_llama_runne+
12710 root         20   0  12G 1.0G 897M S 29.0   4.3   0:03.23 qnn_llama_runne+
[mm12710 root         20   0  12G 1.0G 897M R 29.0   4.3   0:03.52 qnn_llama_runne+
[mm12873 root         20   0  10G  26M 4.5M R 95.0   0.1   0:00.18 qnn_llama_runne+
[mm12873 root         20   0  10G 104M 5.0M R 61.0   0.4   0:01.13 qnn_llama_runne+
12873 root         20   0  12G 1.0G 897M S 29.0   4.3   0:01.74 qnn_llama_runne+
12873 root         20   0  12G 1.0G 897M S 29.0   4.3   0:02.03 qnn_llama_runne+
[mm12873 root         20   0  12G 1.0G 897M R 29.0   4.3   0:02.32 qnn_llama_runne+
12873 root         20   0  12G 1.0G 897M S 26.0   4.3   0:02.61 qnn_llama_runne+
[mm12873 root         20   0  12G 1.0G 897M R 30.0   4.3   0:02.87 qnn_llama_runne+
12873 root         20   0  12G 1.0G 897M S 29.0   4.3   0:03.17 qnn_llama_runne+
12873 root         20   0  12G 1.0G 897M S 29.0   4.3   0:03.46 qnn_llama_runne+


Qwen3-1.7B:

Quantization and conversion settings:
`    num_sharding = 2
    # quant config
    # ptq = QuantDtype.use_16a4w_block
    ptq = QuantDtype.use_16a8w
    group_size = None
    masked_softmax = True
    seq_mse_candidates = 0
    r1 = False
    r2 = False
    r3 = False`

Run on the 8295 device with: `--seq_len 1024 --kv_updater ShiftPointer`

 2042 root         20   0  15G 2.3G 2.2G S 14.2   9.8   0:07.83 qnn_llama_runne+
Tasks: 486 total,   2 running, 484 sleeping,   0 stopped,   0 zombie
 2042 root         20   0  15G 2.3G 2.2G S 15.0   9.8   0:07.87 qnn_llama_runne+
 2042 root         20   0  15G 2.3G 2.2G S 15.0   9.8   0:08.32 qnn_llama_runne+
 2042 root         20   0  15G 2.3G 2.2G S 15.3   9.8   0:08.77 qnn_llama_runne+
 2042 root         20   0  15G 2.3G 2.2G R 14.3   9.8   0:09.23 qnn_llama_runne+
 2042 root         20   0  15G 2.3G 2.2G S 15.3   9.8   0:09.66 qnn_llama_runne+
 2042 root         20   0  15G 2.3G 2.2G S 14.6   9.8   0:10.12 qnn_llama_runne+
 2042 root         20   0  15G 2.3G 2.2G S 16.0   9.8   0:10.56 qnn_llama_runne+
 2042 root         20   0  15G 2.3G 2.2G S 18.6   9.8   0:11.04 qnn_llama_runne+
 4787 root         20   0  13G 868M 807M R 37.0   3.5   0:01.63 qnn_llama_runne+
 4787 root         20   0  15G 2.3G 2.2G S 15.0   9.8   0:02.74 qnn_llama_runne+
 4787 root         20   0  15G 2.3G 2.2G S 15.0   9.8   0:03.19 qnn_llama_runne+
 4787 root         20   0  15G 2.3G 2.2G R 15.6   9.8   0:03.64 qnn_llama_runne+
 4787 root         20   0  15G 2.3G 2.2G R 15.6   9.8   0:04.11 qnn_llama_runne+
 4787 root         20   0  15G 2.3G 2.2G S 13.6   9.8   0:04.58 qnn_llama_runne+
 4787 root         20   0  15G 2.3G 2.2G R 15.3   9.8   0:04.99 qnn_llama_runne+
 4787 root         20   0  15G 2.3G 2.2G S 15.0   9.8   0:05.45 qnn_llama_runne+
 4787 root         20   0  15G 2.3G 2.2G S 15.0   9.8   0:05.90 qnn_llama_runne+
 4787 root         20   0  15G 2.3G 2.2G S 15.3   9.8   0:06.35 qnn_llama_runne+
 4787 root         20   0  15G 2.3G 2.2G R 15.3   9.8   0:06.81 qnn_llama_runne+
 4787 root         20   0  15G 2.3G 2.2G S 15.0   9.8   0:07.27 qnn_llama_runne+
 4787 root         20   0  15G 2.3G 2.2G S 15.6   9.8   0:07.72 qnn_llama_runne+
 4787 root         20   0  15G 2.3G 2.2G R 14.6   9.8   0:08.19 qnn_llama_runne+
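
To compare the two runs with numbers instead of by eye, the `top` samples from both logs can be averaged per PID. The following is only a minimal sketch: the function name `steady_state_cpu` and the rule "a sample is steady once RES equals the PID's final RES value" are my own assumptions for illustration, not part of ExecuTorch or `top`. The regular expression assumes the column order seen in the logs (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND) and skips any leftover terminal escape fragments at the start of a line.

```python
import re
from collections import defaultdict
from statistics import mean

# One process row of `top` output. Leading non-digit characters (spaces or
# leftover escape fragments such as "[mm") are skipped before the PID.
ROW = re.compile(
    r"^\D*?(?P<pid>\d+)\s+(?P<user>\S+)\s+\S+\s+\S+\s+(?P<virt>\S+)\s+"
    r"(?P<res>\S+)\s+(?P<shr>\S+)\s+(?P<state>[A-Za-z])\s+"
    r"(?P<cpu>\d+(?:\.\d+)?)\s+(?P<mem>\d+(?:\.\d+)?)\s+"
    r"(?P<time>\S+)\s+(?P<cmd>\S+)"
)


def steady_state_cpu(log: str) -> dict[int, float]:
    """Return the mean %CPU per PID over steady-state samples.

    Assumption (hypothetical rule): a sample counts as steady state when
    its RES column equals the RES of the last sample for that PID, which
    drops the model-loading samples at the start of each run.
    """
    samples = defaultdict(list)
    for line in log.splitlines():
        m = ROW.match(line)
        if m:  # summary lines such as "Tasks: ..." do not match
            samples[int(m["pid"])].append((m["res"], float(m["cpu"])))

    result = {}
    for pid, rows in samples.items():
        final_res = rows[-1][0]
        steady = [cpu for res, cpu in rows if res == final_res]
        result[pid] = mean(steady)
    return result


if __name__ == "__main__":
    # A few rows copied from the logs in this issue.
    sample = """\
12286 root         20   0  10G  27M 4.6M R 93.0   0.1   0:00.16 qnn_llama_runne+
12286 root         20   0  12G 1.0G 897M S 28.0   4.3   0:01.69 qnn_llama_runne+
12286 root         20   0  12G 1.0G 897M S 30.0   4.3   0:02.25 qnn_llama_runne+
 2042 root         20   0  15G 2.3G 2.2G S 14.2   9.8   0:07.83 qnn_llama_runne+
 2042 root         20   0  15G 2.3G 2.2G S 15.0   9.8   0:07.87 qnn_llama_runne+
"""
    for pid, cpu in steady_state_cpu(sample).items():
        print(f"PID {pid}: mean steady-state CPU {cpu:.1f}%")
```

Feeding the full logs to the function gives one average per run, which makes it easier to check whether a change (for example a different `seq_mse_candidates` value) moves the steady-state CPU usage.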


The runner uses more CPU with Qwen3-0.6B than with Qwen3-1.7B: roughly 25-30% versus roughly 14-19% in steady state in the logs above. Is this caused by SeqMSE or by something else? Do you have any suggestions or insights that could help?

### Alternatives

_No response_

### Additional context

_No response_

### RFC (Optional)

_No response_
