Is this a known issue, or did I do something wrong? Are there guidelines or a strategy for quantizing these models with 16a4w while keeping accuracy at a reasonable level? I would be grateful for any insight. By the way, the non-quantized model produces accurate results.
Full log:
opcode name target args kwargs
------------- ------------------------ --------------------------- ----------------------------- --------
placeholder arg55_1 arg55_1 () {}
get_attr lowered_module_0 lowered_module_0 () {}
call_function executorch_call_delegate executorch_call_delegate (lowered_module_0, arg55_1) {}
call_function getitem <built-in function getitem> (executorch_call_delegate, 0) {}
output output output ((getitem,),) {}
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
./dummy_llama2/dummy_llama2_qnn.pte: 1 file pushed. 12.8 MB/s (611280 bytes in 0.046s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/aar...pushed. 33.4 MB/s (1545776 bytes in 0.044s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/hex...pushed. 37.5 MB/s (7360784 bytes in 0.187s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/aar... pushed. 27.7 MB/s (290504 bytes in 0.010s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/aar...ushed. 38.1 MB/s (26628144 bytes in 0.666s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/aar... pushed. 14.1 MB/s (229024 bytes in 0.015s)
build_android/examples/qualcomm/qnn_executo...shed. 38.0 MB/s (385895152 bytes in 9.695s)
build_android/backends/qualcomm/libqnn_exec...pushed. 36.9 MB/s (8854840 bytes in 0.229s)
/home/anzen/Projects/executorch/dummy_llama... file pushed. 0.0 MB/s (14 bytes in 0.002s)
/home/anzen/Projects/executorch/dummy_llama... file pushed. 0.0 MB/s (12 bytes in 0.002s)
I 00:00:00.003985 executorch:qnn_executor_runner.cpp:81] Model file dummy_llama2_qnn.pte is loaded.
I 00:00:00.004075 executorch:qnn_executor_runner.cpp:90] Using method forward
I 00:00:00.004103 executorch:qnn_executor_runner.cpp:138] Setting up planned buffer 0, size 6160.
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 2
[WARNING] [Qnn ExecuTorch]: <W> Initializing HtpProvider
[WARNING] [Qnn ExecuTorch]: <W> Function not called, PrepareLib isn't loaded!
[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in RESTORE MODE.
[WARNING] [Qnn ExecuTorch]: <W> sg_stubPtr is not null, skip loadRemoteSymbols
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> sg_stubPtr is not null, skip loadRemoteSymbols
[WARNING] [Qnn ExecuTorch]: <W> Function not called, PrepareLib isn't loaded!
[WARNING] [Qnn ExecuTorch]: <W> sg_stubPtr is not null, skip loadRemoteSymbols
[WARNING] [Qnn ExecuTorch]: <W> Function not called, PrepareLib isn't loaded!
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[INFO] [Qnn ExecuTorch]: Running level=3 optimization.
I 00:00:00.205369 executorch:qnn_executor_runner.cpp:161] Method loaded.
I 00:00:00.205507 executorch:qnn_executor_runner.cpp:166] Inputs prepared.
I 00:00:00.205798 executorch:qnn_executor_runner.cpp:171] Number of inputs: 1
I 00:00:00.206130 executorch:qnn_executor_runner.cpp:232] Perform 0 inference for warming up
I 00:00:00.206200 executorch:qnn_executor_runner.cpp:238] Start inference (0)
[WARNING] [Qnn ExecuTorch]: <W> sg_stubPtr is not null, skip loadRemoteSymbols
I 00:00:00.207318 executorch:qnn_executor_runner.cpp:256] 1 inference took 1.069000 ms, avg 1.069000 ms
I 00:00:00.207700 executorch:qnn_executor_runner.cpp:298] 1 inference took 1.069000 ms, avg 1.069000 ms
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> sg_stubPtr is not null, skip loadRemoteSymbols
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> sg_stubPtr is not null, skip loadRemoteSymbols
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
[WARNING] [Qnn ExecuTorch]: <W> qnnOpPackageManager: hexagon unload op package function pointer is nullptr!
[WARNING] [Qnn ExecuTorch]: <W> Function not called, PrepareLib isn't loaded!
/data/local/tmp/executorch/dummy_llama2_qnn...ile pulled. 0.1 MB/s (6144 bytes in 0.087s)
is_close? False
x86_golden tensor([[[ 0.2713, 0.5471, -0.3194, ..., 0.1733, -0.7186, -1.1417],
[ 0.2635, 0.0273, -0.1612, ..., 1.2671, -1.4816, -0.6256],
[ 0.1451, -0.5109, 0.0358, ..., 0.4289, -0.3217, -1.4835]]],
grad_fn=<UnsafeViewBackward0>)
device_out tensor([[[-0.3499, -0.3881, 0.5011, ..., -0.2530, 0.3161, -0.0744],
[ 0.4127, -0.4308, -0.5663, ..., -0.3564, 0.0952, 0.7879],
[-0.2407, -0.5039, 0.3697, ..., -0.1345, 0.5565, 0.1253]]])
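To put a number on the mismatch above, one can compare the two tensors directly. A minimal sketch using only the columns visible in the log (the "..." elided values are omitted, so this is illustrative rather than a full comparison):

```python
import math

# First row's visible columns from the log above (elided "..." values omitted).
x86_golden = [0.2713, 0.5471, -0.3194, 0.1733, -0.7186, -1.1417]
device_out = [-0.3499, -0.3881, 0.5011, -0.2530, 0.3161, -0.0744]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def max_abs_diff(a, b):
    """Largest element-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))

print(f"cosine similarity: {cosine_similarity(x86_golden, device_out):.3f}")
print(f"max abs diff:      {max_abs_diff(x86_golden, device_out):.3f}")
```

On these values the cosine similarity is strongly negative, i.e. the device output is not merely a noisy version of the golden output but essentially uncorrelated with it, which points at a quantization/calibration problem rather than ordinary rounding error.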
Dear @cccclai, @shewu-quic, @chunit-quic
I'm trying to quantize the dummy llama model and run it on my Qualcomm device. However, the output of the quantized model is far from the expected result.
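One likely contributor to the accuracy loss is the very coarse 4-bit weight grid in 16a4w. The sketch below is plain Python fake-quantization, not the actual QnnQuantizer configuration, but it shows how much larger the worst-case rounding error becomes when weights drop from 8 bits to 4 bits under a symmetric per-tensor scheme:

```python
def quantize_symmetric(values, num_bits):
    """Fake-quantize values onto a symmetric per-tensor integer grid, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1                # e.g. 7 for 4-bit, 127 for 8-bit
    scale = max(abs(v) for v in values) / qmax    # per-tensor scale from the max magnitude
    quantized = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quantized]

# A handful of illustrative weight values (made up, small-magnitude like llama weights).
weights = [0.0213, -0.0047, 0.0031, -0.0198, 0.0005, 0.0122]

for bits in (8, 4):
    dequant = quantize_symmetric(weights, bits)
    err = max(abs(w - d) for w, d in zip(weights, dequant))
    print(f"{bits}-bit max error: {err:.5f}")
```

The 4-bit error is more than an order of magnitude larger here, and a single outlier weight inflates the per-tensor scale and wastes most of the 16 available levels. This is why per-channel weight quantization and careful calibration (representative inputs through the observers before convert) are usually needed to make 4-bit weights usable; whether a given QNN SDK config exposes those knobs for this model is something the maintainers can best confirm.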