[qnn] How to lower the generation rate to reduce bandwidth/CPU usage? #18806

Hi team,

After I deploy the model successfully, the NPU takes up almost all of the memory bandwidth during decoding, around 30 GB/s, which is unacceptable if other apps are running on the device.

It seems we can use `htp_backend_ext_config.json` from the QNN notebook to set `perf_profile` to `power_saver`. I tried this, but it had no effect with qnn-2.39:
```json
{
  "graphs": [{
    "num_cores": 1,
    "O": 3.0,
    "vtcm_mb": 16
  }],
  "devices": [{
    "device_id": 0,
    "core_type": 0,
    "core_id": [0],
    "dsp_arch": "v68",
    "soc_id": 39,
    "soc_model": 39,
    "cores": [{
      "perf_profile": "burst"
    }]
  }]
}
```
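For reference, the change I attempted can be sketched as a small script that takes the notebook config above and swaps the `perf_profile` value. This is only an illustration: the helper name `with_perf_profile` is made up here, and whether `power_saver` is actually honored by a given QNN version is exactly what is in question.

```python
import copy
import json

# Config copied from the QNN notebook snippet above.
BASE_CONFIG = {
    "graphs": [{"num_cores": 1, "O": 3.0, "vtcm_mb": 16}],
    "devices": [{
        "device_id": 0,
        "core_type": 0,
        "core_id": [0],
        "dsp_arch": "v68",
        "soc_id": 39,
        "soc_model": 39,
        "cores": [{"perf_profile": "burst"}],
    }],
}


def with_perf_profile(config, profile):
    """Return a copy of `config` with every core's perf_profile set to `profile`.

    Hypothetical helper for illustration; it only edits the JSON structure and
    does not check that `profile` is a value the backend accepts.
    """
    updated = copy.deepcopy(config)
    for device in updated.get("devices", []):
        for core in device.get("cores", []):
            core["perf_profile"] = profile
    return updated


if __name__ == "__main__":
    # Print the modified config; redirect to htp_backend_ext_config.json if desired.
    print(json.dumps(with_perf_profile(BASE_CONFIG, "power_saver"), indent=2))
```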

So we would like to know: how can we limit the decode rate in ExecuTorch with QNN?

Looking forward to your help :)
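To make the request concrete, the behavior we are after resembles an application-side throttle like the sketch below. It is an assumption-laden illustration, not an ExecuTorch or QNN API: `decode_step` is a hypothetical callable that runs one decode iteration and returns a token id (or `None` at end of sequence). Sleeping between steps would only lower the average bandwidth by leaving the NPU idle part of the time; it would presumably not reduce the peak bandwidth within a single step.

```python
import time
from typing import Callable, Iterator, Optional


def throttled_decode(
    decode_step: Callable[[], Optional[int]],
    max_tokens_per_sec: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[int]:
    """Yield tokens from `decode_step`, capped at `max_tokens_per_sec`.

    `decode_step` is a hypothetical stand-in for one decode iteration of the
    runner. `sleep` and `clock` are injectable so the pacing can be tested.
    """
    if max_tokens_per_sec <= 0:
        raise ValueError("max_tokens_per_sec must be positive")
    min_interval = 1.0 / max_tokens_per_sec
    while True:
        start = clock()
        token = decode_step()
        if token is None:  # end of sequence
            return
        yield token
        # Time spent on this token, including the consumer handling it.
        elapsed = clock() - start
        if elapsed < min_interval:
            # Idle for the rest of the interval so the accelerator is not
            # kept busy back to back.
            sleep(min_interval - elapsed)
```

For example, `for tok in throttled_decode(step_fn, max_tokens_per_sec=10): ...` would emit at most about ten tokens per second, where `step_fn` is whatever wraps the actual decode call.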

cc @cccclai @cbilgin @abhinaykukkadapu @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic
