Description
🚀 The feature, motivation and pitch
Tested on an older commit, 763a474.
Qwen3-0.6B:
Quantization and conversion settings: same as #15410
seq_mse_candidates = 1000
Run on the 8295 device with: --seq_len 1024 --kv_updater ShiftPointer
Tasks: 482 total, 1 running, 481 sleeping, 0 stopped, 0 zombie
12286 root 20 0 10G 27M 4.6M R 93.0 0.1 0:00.16 qnn_llama_runne+
12286 root 20 0 10G 103M 5.1M R 60.0 0.4 0:01.09 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M S 28.0 4.3 0:01.69 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M R 28.0 4.3 0:01.97 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M S 30.0 4.3 0:02.25 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M R 28.0 4.3 0:02.55 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M S 29.0 4.3 0:02.83 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M R 28.0 4.3 0:03.12 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M R 30.0 4.3 0:03.40 qnn_llama_runne+
12286 root 20 0 12G 1.0G 897M R 30.0 4.3 0:03.70 qnn_llama_runne+
12419 root 20 0 10G 10M 4.4M R 97.0 0.0 0:00.03 qnn_llama_runne+
12419 root 20 0 10G 127M 5.1M R 65.0 0.5 0:01.00 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 28.0 4.3 0:01.65 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 26.0 4.3 0:01.93 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 28.0 4.3 0:02.19 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 27.0 4.3 0:02.47 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 26.0 4.3 0:02.74 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 27.0 4.3 0:03.00 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M R 29.0 4.3 0:03.27 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 30.0 4.3 0:03.56 qnn_llama_runne+
12419 root 20 0 12G 1.0G 897M S 27.0 4.3 0:03.86 qnn_llama_runne+
12710 root 20 0 10G 84M 5.2M R 74.0 0.3 0:00.62 qnn_llama_runne+
12710 root 20 0 11G 488M 433M R 54.0 1.9 0:01.36 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M R 28.0 4.2 0:01.90 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M R 26.0 4.3 0:02.18 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M R 28.0 4.3 0:02.44 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M R 26.0 4.3 0:02.72 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M R 25.0 4.3 0:02.98 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M S 29.0 4.3 0:03.23 qnn_llama_runne+
12710 root 20 0 12G 1.0G 897M R 29.0 4.3 0:03.52 qnn_llama_runne+
12873 root 20 0 10G 26M 4.5M R 95.0 0.1 0:00.18 qnn_llama_runne+
12873 root 20 0 10G 104M 5.0M R 61.0 0.4 0:01.13 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M S 29.0 4.3 0:01.74 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M S 29.0 4.3 0:02.03 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M R 29.0 4.3 0:02.32 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M S 26.0 4.3 0:02.61 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M R 30.0 4.3 0:02.87 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M S 29.0 4.3 0:03.17 qnn_llama_runne+
12873 root 20 0 12G 1.0G 897M S 29.0 4.3 0:03.46 qnn_llama_runne+
Qwen3-1.7B:
Quantization and conversion settings:
num_sharding = 2
# quant config
# ptq = QuantDtype.use_16a4w_block
ptq = QuantDtype.use_16a8w
group_size = None
masked_softmax = True
seq_mse_candidates = 0
r1 = False
r2 = False
r3 = False
Run on the 8295 device with: --seq_len 1024 --kv_updater ShiftPointer
2042 root 20 0 15G 2.3G 2.2G S 14.2 9.8 0:07.83 qnn_llama_runne+
Tasks: 486 total, 2 running, 484 sleeping, 0 stopped, 0 zombie
2042 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:07.87 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:08.32 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G S 15.3 9.8 0:08.77 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G R 14.3 9.8 0:09.23 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G S 15.3 9.8 0:09.66 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G S 14.6 9.8 0:10.12 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G S 16.0 9.8 0:10.56 qnn_llama_runne+
2042 root 20 0 15G 2.3G 2.2G S 18.6 9.8 0:11.04 qnn_llama_runne+
4787 root 20 0 13G 868M 807M R 37.0 3.5 0:01.63 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:02.74 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:03.19 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G R 15.6 9.8 0:03.64 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G R 15.6 9.8 0:04.11 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 13.6 9.8 0:04.58 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G R 15.3 9.8 0:04.99 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:05.45 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:05.90 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.3 9.8 0:06.35 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G R 15.3 9.8 0:06.81 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.0 9.8 0:07.27 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G S 15.6 9.8 0:07.72 qnn_llama_runne+
4787 root 20 0 15G 2.3G 2.2G R 14.6 9.8 0:08.19 qnn_llama_runne+
The runner uses more CPU with Qwen3-0.6B (roughly 25-30% once the model is loaded) than with Qwen3-1.7B (roughly 14-19%). Is this caused by SeqMSE or by something else? Do you have any suggestions or insights that could help me?
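To compare the two runs on numbers rather than by eye, the `top` rows above could be summarized per PID with a small script. This is only a sketch: `parse_samples` and `steady_state_cpu` are hypothetical helper names (not part of the runner or any library), the column order is assumed to be the usual `PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND`, and treating "RES equal to the last RES seen for that PID" as steady state is a heuristic chosen to skip the model-loading samples.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed column order: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.
# `search` (not `match`) is used so leftover escape-sequence debris in front
# of the PID does not prevent a row from being recognized.
ROW = re.compile(
    r"(?P<pid>\d+)\s+\S+\s+\S+\s+\S+\s+\S+\s+(?P<res>\S+)\s+\S+\s+[A-Z]\s+"
    r"(?P<cpu>\d+\.\d+)\s+\d+\.\d+\s+\S+\s+\S+"
)


def parse_samples(lines):
    """Return {pid: [(res, cpu_percent), ...]} for every process row found."""
    samples = defaultdict(list)
    for line in lines:
        m = ROW.search(line)
        if m:  # summary lines such as "Tasks: ..." do not match and are skipped
            samples[int(m["pid"])].append((m["res"], float(m["cpu"])))
    return dict(samples)


def steady_state_cpu(samples):
    """Mean %CPU per PID over the samples whose RES equals the last RES seen.

    Heuristic (an assumption): earlier samples with a smaller RES are taken
    to be model loading and are excluded from the average.
    """
    result = {}
    for pid, rows in samples.items():
        final_res = rows[-1][0]
        result[pid] = mean(cpu for res, cpu in rows if res == final_res)
    return result
```

Feeding the pasted output for each model through `steady_state_cpu(parse_samples(text.splitlines()))` gives one average per run, which makes it easier to compare runs that differ only in `seq_mse_candidates`.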
Alternatives
No response
Additional context
No response
RFC (Optional)
No response