[qnn] How to lower the generation rate to reduce the bandwidth/CPU usage? #18806

@ecccccsgo

Hi team,

After I deploy the model successfully, the NPU takes almost all of the memory bandwidth during decoding, around 30 GB/s, which is unacceptable if other apps are running on the device.

It seems we can use htp_backend_ext_config.json to set perf_profile to power_saver, as in the QNN notebook (I tried it, but it did not work with qnn-2.39):

{
  "graphs": [{
    "num_cores": 1,
    "O": 3.0,
    "vtcm_mb": 16
  }],
  "devices": [{
    "device_id": 0,
    "core_type": 0,
    "core_id": [0],
    "dsp_arch": "v68",
    "soc_id": 39,
    "soc_model": 39,
    "cores": [{
      "perf_profile": "burst"
    }]
  }]
}
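
Note that the config as pasted still has perf_profile set to "burst". To make the variant I tried reproducible, here is a minimal Python sketch that derives the power_saver version from that config. The helper names `with_perf_profile` and `write_config` are hypothetical, the field values are copied from the post without verification, and whether qnn-2.39 actually honors the setting is the open question here.

```python
import json

# Config copied from the post; values are the poster's and are not verified here.
BASE_CONFIG = {
    "graphs": [{"num_cores": 1, "O": 3.0, "vtcm_mb": 16}],
    "devices": [{
        "device_id": 0,
        "core_type": 0,
        "core_id": [0],
        "dsp_arch": "v68",
        "soc_id": 39,
        "soc_model": 39,
        "cores": [{"perf_profile": "burst"}],
    }],
}


def with_perf_profile(config, profile):
    """Return a deep copy of config with every core's perf_profile replaced."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    for device in updated["devices"]:
        for core in device["cores"]:
            core["perf_profile"] = profile
    return updated


def write_config(config, path="htp_backend_ext_config.json"):
    """Write the config as indented JSON (hypothetical helper, not a QNN API)."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
        f.write("\n")


# Build the power_saver variant; the original dict is left untouched.
power_saver_config = with_perf_profile(BASE_CONFIG, "power_saver")
print(json.dumps(power_saver_config["devices"][0]["cores"]))
```

Calling `write_config(power_saver_config)` would then produce the file to pass to the QNN tooling.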

So we would like to know how to limit the decode rate in ExecuTorch with QNN.

Looking forward to your help :)

cc @cccclai @cbilgin @abhinaykukkadapu @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic

Labels

module: qnn (Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/)
