
QNN: huge accuracy degradation in 16a4w quantization #2590

Closed · salykova opened this issue Mar 22, 2024 · 2 comments
Labels: partner: qualcomm (for backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm)

salykova (Contributor) commented Mar 22, 2024

Dear @cccclai, @shewu-quic, @chunit-quic,

I'm trying to quantize the dummy Llama 2 model and run it on my Qualcomm device:

python ./examples/qualcomm/scripts/dummy_llama2.py --model SM8550 --device *** -b build_android --ptq 16a4w

However, the output of the quantized model is far from where it should be:

/data/local/tmp/executorch/dummy_llama2_qnn...ile pulled. 0.1 MB/s (6144 bytes in 0.087s)
is_close? False
x86_golden tensor([[[ 0.2713,  0.5471, -0.3194,  ...,  0.1733, -0.7186, -1.1417],
         [ 0.2635,  0.0273, -0.1612,  ...,  1.2671, -1.4816, -0.6256],
         [ 0.1451, -0.5109,  0.0358,  ...,  0.4289, -0.3217, -1.4835]]],
       grad_fn=<UnsafeViewBackward0>)
device_out tensor([[[-0.3499, -0.3881,  0.5011,  ..., -0.2530,  0.3161, -0.0744],
         [ 0.4127, -0.4308, -0.5663,  ..., -0.3564,  0.0952,  0.7879],
         [-0.2407, -0.5039,  0.3697,  ..., -0.1345,  0.5565,  0.1253]]])
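
To quantify the gap beyond the binary is_close check, a signal-to-quantization-noise ratio (SQNR) comparison is a common measure. A minimal sketch, assuming x86_golden and device_out are the two tensors printed above (the sqnr helper is my own, not part of the example script):

import torch

def sqnr(golden: torch.Tensor, actual: torch.Tensor) -> float:
    # SQNR in dB; higher is better. Values near 0 dB mean the
    # device output is essentially uncorrelated with the reference.
    signal = torch.mean(golden.float() ** 2)
    noise = torch.mean((golden.float() - actual.float()) ** 2)
    return (10 * torch.log10(signal / noise)).item()

# golden tensor carries a grad_fn, so detach before comparing
print(sqnr(x86_golden.detach(), device_out))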

Is this a known issue, or did I do something wrong? Are there guidelines or a strategy for quantizing these models in 16a4w while keeping accuracy at a reasonable level? I would be grateful for any insight. By the way, the non-quantized model produces accurate results.
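
For reference, below is a minimal sketch of the PT2E quantization flow I believe the script runs internally. The capture/prepare/calibrate/convert steps are the standard torch.ao.quantization PT2E API (torch 2.1–2.3 era); the QnnQuantizer import path and its configuration call are my assumptions about how 16a4w is selected, not verified signatures:

import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
# Assumed import path for the Qualcomm quantizer:
from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer, QuantDtype

quantizer = QnnQuantizer()
# Hypothetical configuration call: 16-bit activations, 4-bit weights.
# The real QnnQuantizer API may spell this differently.
quantizer.set_quant_config(QuantDtype.use_16a4w)

# model: the eager dummy llama module; example_inputs: a tuple of sample inputs
m = capture_pre_autograd_graph(model, example_inputs)
m = prepare_pt2e(m, quantizer)   # insert observers
m(*example_inputs)               # calibrate with representative data
m = convert_pt2e(m)              # fold observers into quant/dequant ops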

Full log:

opcode         name                      target                       args                           kwargs
-------------  ------------------------  ---------------------------  -----------------------------  --------
placeholder    arg55_1                   arg55_1                      ()                             {}
get_attr       lowered_module_0          lowered_module_0             ()                             {}
call_function  executorch_call_delegate  executorch_call_delegate     (lowered_module_0, arg55_1)    {}
call_function  getitem                   <built-in function getitem>  (executorch_call_delegate, 0)  {}
output         output                    output                       ((getitem,),)                  {}
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
./dummy_llama2/dummy_llama2_qnn.pte: 1 file pushed. 12.8 MB/s (611280 bytes in 0.046s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/aar...pushed. 33.4 MB/s (1545776 bytes in 0.044s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/hex...pushed. 37.5 MB/s (7360784 bytes in 0.187s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/aar... pushed. 27.7 MB/s (290504 bytes in 0.010s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/aar...ushed. 38.1 MB/s (26628144 bytes in 0.666s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/aar... pushed. 14.1 MB/s (229024 bytes in 0.015s)
build_android/examples/qualcomm/qnn_executo...shed. 38.0 MB/s (385895152 bytes in 9.695s)
build_android/backends/qualcomm/libqnn_exec...pushed. 36.9 MB/s (8854840 bytes in 0.229s)
/home/anzen/Projects/executorch/dummy_llama... file pushed. 0.0 MB/s (14 bytes in 0.002s)
/home/anzen/Projects/executorch/dummy_llama... file pushed. 0.0 MB/s (12 bytes in 0.002s)
I 00:00:00.003985 executorch:qnn_executor_runner.cpp:81] Model file dummy_llama2_qnn.pte is loaded.
I 00:00:00.004075 executorch:qnn_executor_runner.cpp:90] Using method forward
I 00:00:00.004103 executorch:qnn_executor_runner.cpp:138] Setting up planned buffer 0, size 6160.
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 2
[WARNING] [Qnn ExecuTorch]:  <W> Initializing HtpProvider
[WARNING] [Qnn ExecuTorch]:  <W> Function not called, PrepareLib isn't loaded!
[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in RESTORE MODE.
[WARNING] [Qnn ExecuTorch]:  <W> sg_stubPtr is not null, skip loadRemoteSymbols
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> sg_stubPtr is not null, skip loadRemoteSymbols
[WARNING] [Qnn ExecuTorch]:  <W> Function not called, PrepareLib isn't loaded!
[WARNING] [Qnn ExecuTorch]:  <W> sg_stubPtr is not null, skip loadRemoteSymbols
[WARNING] [Qnn ExecuTorch]:  <W> Function not called, PrepareLib isn't loaded!
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[INFO] [Qnn ExecuTorch]: Running level=3 optimization.
I 00:00:00.205369 executorch:qnn_executor_runner.cpp:161] Method loaded.
I 00:00:00.205507 executorch:qnn_executor_runner.cpp:166] Inputs prepared.
I 00:00:00.205798 executorch:qnn_executor_runner.cpp:171] Number of inputs: 1
I 00:00:00.206130 executorch:qnn_executor_runner.cpp:232] Perform 0 inference for warming up
I 00:00:00.206200 executorch:qnn_executor_runner.cpp:238] Start inference (0)
[WARNING] [Qnn ExecuTorch]:  <W> sg_stubPtr is not null, skip loadRemoteSymbols
I 00:00:00.207318 executorch:qnn_executor_runner.cpp:256] 1 inference took 1.069000 ms, avg 1.069000 ms
I 00:00:00.207700 executorch:qnn_executor_runner.cpp:298] 1 inference took 1.069000 ms, avg 1.069000 ms
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> sg_stubPtr is not null, skip loadRemoteSymbols
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]:  <W> sg_stubPtr is not null, skip loadRemoteSymbols
[WARNING] [Qnn ExecuTorch]:  <W> This META does not have Alloc2 Support
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
[WARNING] [Qnn ExecuTorch]:  <W> qnnOpPackageManager: hexagon unload op package function pointer is nullptr!
[WARNING] [Qnn ExecuTorch]:  <W> Function not called, PrepareLib isn't loaded!
/data/local/tmp/executorch/dummy_llama2_qnn...ile pulled. 0.1 MB/s (6144 bytes in 0.087s)
is_close? False
x86_golden tensor([[[ 0.2713,  0.5471, -0.3194,  ...,  0.1733, -0.7186, -1.1417],
         [ 0.2635,  0.0273, -0.1612,  ...,  1.2671, -1.4816, -0.6256],
         [ 0.1451, -0.5109,  0.0358,  ...,  0.4289, -0.3217, -1.4835]]],
       grad_fn=<UnsafeViewBackward0>)
device_out tensor([[[-0.3499, -0.3881,  0.5011,  ..., -0.2530,  0.3161, -0.0744],
         [ 0.4127, -0.4308, -0.5663,  ..., -0.3564,  0.0952,  0.7879],
         [-0.2407, -0.5039,  0.3697,  ..., -0.1345,  0.5565,  0.1253]]])
cccclai added the "partner: qualcomm" label on Mar 22, 2024

cccclai (Contributor) commented Mar 22, 2024

Yeah, it's an issue we're aware of and are investigating. Maybe we should add a warning for this option.

salykova (Contributor, Author) commented

@cccclai thanks for your response! I will leave the issue open until it's resolved.
