Hi team,
After deploying the model successfully, the NPU takes almost all of the memory bandwidth during decoding (around 30 GB/s), which is unacceptable when other apps are running on the device.
It seems we can use htp_backend_ext_config.json to set perf_profile to power_saver, as shown in the QNN notebook (we tried it, but it does not work with QNN 2.39):
{
  "graphs": [{
    "num_cores": 1,
    "O": 3.0,
    "vtcm_mb": 16
  }],
  "devices": [{
    "device_id": 0,
    "core_type": 0,
    "core_id": [0],
    "dsp_arch": "v68",
    "soc_id": 39,
    "soc_model": 39,
    "cores": [{
      "perf_profile": "burst"
    }]
  }]
}
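To be explicit about what we tried: we only swapped the perf_profile value, leaving every other field as in the notebook example above. A sketch of just the relevant fragment (assuming the cores entry is where perf_profile belongs, per that example):

```json
{
  "cores": [{
    "perf_profile": "power_saver"
  }]
}
```

With this change, the decode bandwidth did not drop on QNN 2.39.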
So we would like to know: how can we limit the decode rate (NPU bandwidth usage) in ExecuTorch with the QNN backend?
Looking forward to your help :)
cc @cccclai @cbilgin @abhinaykukkadapu @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic