Is this a known issue, or did I do something wrong? Are there guidelines or a strategy for quantizing these models with 16a4w while keeping accuracy at a reasonable level? I would be grateful for any insight. By the way, the non-quantized model produces accurate results.
Full log:
opcode name target args kwargs
------------- ------------------------ --------------------------- ----------------------------- --------
placeholder arg55_1 arg55_1 () {}
get_attr lowered_module_0 lowered_module_0 () {}
call_function executorch_call_delegate executorch_call_delegate (lowered_module_0, arg55_1) {}
call_function getitem <built-in function getitem> (executorch_call_delegate, 0) {}
output output output ((getitem,),) {}
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
./dummy_llama2/dummy_llama2_qnn.pte: 1 file pushed. 12.8 MB/s (611280 bytes in 0.046s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/aar...pushed. 33.4 MB/s (1545776 bytes in 0.044s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/hex...pushed. 37.5 MB/s (7360784 bytes in 0.187s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/aar... pushed. 27.7 MB/s (290504 bytes in 0.010s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/aar...ushed. 38.1 MB/s (26628144 bytes in 0.666s)
/opt/qcom/aistack/qnn/2.19.4.240226/lib/aar... pushed. 14.1 MB/s (229024 bytes in 0.015s)
build_android/examples/qualcomm/qnn_executo...shed. 38.0 MB/s (385895152 bytes in 9.695s)
build_android/backends/qualcomm/libqnn_exec...pushed. 36.9 MB/s (8854840 bytes in 0.229s)
/home/anzen/Projects/executorch/dummy_llama... file pushed. 0.0 MB/s (14 bytes in 0.002s)
/home/anzen/Projects/executorch/dummy_llama... file pushed. 0.0 MB/s (12 bytes in 0.002s)
I 00:00:00.003985 executorch:qnn_executor_runner.cpp:81] Model file dummy_llama2_qnn.pte is loaded.
I 00:00:00.004075 executorch:qnn_executor_runner.cpp:90] Using method forward
I 00:00:00.004103 executorch:qnn_executor_runner.cpp:138] Setting up planned buffer 0, size 6160.
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 2
[WARNING] [Qnn ExecuTorch]: <W> Initializing HtpProvider
[WARNING] [Qnn ExecuTorch]: <W> Function not called, PrepareLib isn't loaded!
[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in RESTORE MODE.
[WARNING] [Qnn ExecuTorch]: <W> sg_stubPtr is not null, skip loadRemoteSymbols
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> sg_stubPtr is not null, skip loadRemoteSymbols
[WARNING] [Qnn ExecuTorch]: <W> Function not called, PrepareLib isn't loaded!
[WARNING] [Qnn ExecuTorch]: <W> sg_stubPtr is not null, skip loadRemoteSymbols
[WARNING] [Qnn ExecuTorch]: <W> Function not called, PrepareLib isn't loaded!
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[INFO] [Qnn ExecuTorch]: Running level=3 optimization.
I 00:00:00.205369 executorch:qnn_executor_runner.cpp:161] Method loaded.
I 00:00:00.205507 executorch:qnn_executor_runner.cpp:166] Inputs prepared.
I 00:00:00.205798 executorch:qnn_executor_runner.cpp:171] Number of inputs: 1
I 00:00:00.206130 executorch:qnn_executor_runner.cpp:232] Perform 0 inference for warming up
I 00:00:00.206200 executorch:qnn_executor_runner.cpp:238] Start inference (0)
[WARNING] [Qnn ExecuTorch]: <W> sg_stubPtr is not null, skip loadRemoteSymbols
I 00:00:00.207318 executorch:qnn_executor_runner.cpp:256] 1 inference took 1.069000 ms, avg 1.069000 ms
I 00:00:00.207700 executorch:qnn_executor_runner.cpp:298] 1 inference took 1.069000 ms, avg 1.069000 ms
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> sg_stubPtr is not null, skip loadRemoteSymbols
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[WARNING] [Qnn ExecuTorch]: <W> sg_stubPtr is not null, skip loadRemoteSymbols
[WARNING] [Qnn ExecuTorch]: <W> This META does not have Alloc2 Support
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
[WARNING] [Qnn ExecuTorch]: <W> qnnOpPackageManager: hexagon unload op package function pointer is nullptr!
[WARNING] [Qnn ExecuTorch]: <W> Function not called, PrepareLib isn't loaded!
/data/local/tmp/executorch/dummy_llama2_qnn...ile pulled. 0.1 MB/s (6144 bytes in 0.087s)
is_close? False
x86_golden tensor([[[ 0.2713, 0.5471, -0.3194, ..., 0.1733, -0.7186, -1.1417],
[ 0.2635, 0.0273, -0.1612, ..., 1.2671, -1.4816, -0.6256],
[ 0.1451, -0.5109, 0.0358, ..., 0.4289, -0.3217, -1.4835]]],
grad_fn=<UnsafeViewBackward0>)
device_out tensor([[[-0.3499, -0.3881, 0.5011, ..., -0.2530, 0.3161, -0.0744],
[ 0.4127, -0.4308, -0.5663, ..., -0.3564, 0.0952, 0.7879],
[-0.2407, -0.5039, 0.3697, ..., -0.1345, 0.5565, 0.1253]]])
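To put a number on the mismatch above, one can compare the two tensors directly. A minimal sketch using only the columns visible in the log (the "..." elided values are omitted, so this is illustrative rather than a full comparison):

```python
import math

# First row's visible columns from the log above (elided "..." values omitted).
x86_golden = [0.2713, 0.5471, -0.3194, 0.1733, -0.7186, -1.1417]
device_out = [-0.3499, -0.3881, 0.5011, -0.2530, 0.3161, -0.0744]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def max_abs_diff(a, b):
    """Largest element-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))

print(f"cosine similarity: {cosine_similarity(x86_golden, device_out):.3f}")
print(f"max abs diff:      {max_abs_diff(x86_golden, device_out):.3f}")
```

On these values the cosine similarity is strongly negative, i.e. the device output is not merely a noisy version of the golden output but essentially uncorrelated with it, which points at a quantization/calibration problem rather than ordinary rounding error.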
Dear @cccclai, @shewu-quic, @chunit-quic
I'm trying to quantize the dummy llama model and run it on my Qualcomm device. However, the output of the quantized model is far from the expected result.
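One likely contributor to the accuracy loss is the very coarse 4-bit weight grid in 16a4w. The sketch below is plain Python fake-quantization, not the actual QnnQuantizer configuration, but it shows how much larger the worst-case rounding error becomes when weights drop from 8 bits to 4 bits under a symmetric per-tensor scheme:

```python
def quantize_symmetric(values, num_bits):
    """Fake-quantize values onto a symmetric per-tensor integer grid, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1                # e.g. 7 for 4-bit, 127 for 8-bit
    scale = max(abs(v) for v in values) / qmax    # per-tensor scale from the max magnitude
    quantized = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quantized]

# A handful of illustrative weight values (made up, small-magnitude like llama weights).
weights = [0.0213, -0.0047, 0.0031, -0.0198, 0.0005, 0.0122]

for bits in (8, 4):
    dequant = quantize_symmetric(weights, bits)
    err = max(abs(w - d) for w, d in zip(weights, dequant))
    print(f"{bits}-bit max error: {err:.5f}")
```

The 4-bit error is more than an order of magnitude larger here, and a single outlier weight inflates the per-tensor scale and wastes most of the 16 available levels. This is why per-channel weight quantization and careful calibration (representative inputs through the observers before convert) are usually needed to make 4-bit weights usable; whether a given QNN SDK config exposes those knobs for this model is something the maintainers can best confirm.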