Performance degradation of Qwen3-0.6B on the QNN backend #18933

### 🚀 The feature, motivation and pitch

I am facing a significant performance degradation when running this model on a Qualcomm SM8850.

Currently, I get the results below. In an earlier attempt, the prefill speed was about 4000 tokens/s and the decode speed was above 100 tokens/s. I am not sure what changed. Are there any suggestions?
```
I 00:00:05.299159 executorch:prompt_processor.cpp:267] Prompt Processor: total 94 prompt tokens (AR-128 * 1 iters)
I 00:00:05.378665 executorch:runner.cpp:462] RSS after prompt prefill: 770.968750 MiB (0 if unsupported)
I 00:00:10.963822 executorch:token_generator.cpp:356] Warning: Generation stopped at seq_len limit (512) without reaching EOS token. Response may be incomplete.
I 00:00:10.964286 executorch:token_generator.cpp:370] - seq_len (512) is less than compiled max_context_len (1024). Consider increasing --seq_len (up to 1024).
I 00:00:10.964308 executorch:runner.cpp:477] RSS after finishing text generation: 770.968750 MiB (0 if unsupported)
I 00:00:10.964396 executorch:stats.h:161] 	Prompt Tokens: 94    Generated Tokens: 417
I 00:00:10.964407 executorch:stats.h:167] 	Model Load Time:		5.266000 (seconds)
I 00:00:10.964419 executorch:stats.h:177] 	Total inference time:		5.695000 (seconds)		 Rate: 	73.222125 (tokens/second)
I 00:00:10.964432 executorch:stats.h:185] 		Prompt evaluation:	0.109000 (seconds)		 Rate: 	862.385321 (tokens/second)
I 00:00:10.964445 executorch:stats.h:196] 		Generated 417 tokens:	5.586000 (seconds)		 Rate: 	74.650913 (tokens/second)
I 00:00:10.964457 executorch:stats.h:204] 	Time to first generated token:	0.109000 (seconds)
I 00:00:10.964623 executorch:stats.h:211] 	Sampling time over 511 tokens:	0.717000 (seconds)
```
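To put a number on the regression, the rates in the log can be recomputed from the raw counts and compared with the earlier figures. This is only arithmetic on the values printed above; the earlier "about 4000" and "100+" tokens/s are approximate, and 100 tokens/s is used here as an assumed lower bound for the earlier decode rate.

```python
# Counts and timings copied from the stats.h lines in the log above.
prompt_tokens = 94
prompt_eval_s = 0.109
generated_tokens = 417
decode_s = 5.586

# Approximate rates from the earlier attempt (assumed values, see text).
earlier_prefill_tps = 4000.0
earlier_decode_tps = 100.0  # "100+" was observed; 100 is a lower bound

prefill_tps = prompt_tokens / prompt_eval_s
decode_tps = generated_tokens / decode_s

print(f"prefill: {prefill_tps:.1f} tok/s ({prefill_tps / earlier_prefill_tps:.0%} of earlier)")
print(f"decode:  {decode_tps:.1f} tok/s ({decode_tps / earlier_decode_tps:.0%} of earlier)")
```

By this estimate, prefill runs at roughly a fifth of the earlier rate, while decode has lost at least a quarter of its throughput.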
The export command is as follows.

```
python -m examples.qualcomm.oss_scripts.llama.llama \
    -b build-android \
    -m SM8850 \
    --temperature 0 \
    --model_mode hybrid \
    --prefill_ar_len 128 \
    --max_seq_len 1024 \
    --decoder_model qwen3-0_6b \
    --prompt "Hello" \
    --compile_only \
    -a /tmp/qwen3-models/qwen3_0_6B
```

### Alternatives

_No response_

### Additional context

_No response_

### RFC (Optional)

_No response_