Qualcomm AI Engine Direct - Improve GA Qwen 2.5 #14047
Merged
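Summary:
Switching the quantization config from 16a8w to 16a4w_block raises the token rate at the cost of higher perplexity (PPL):
Qwen 2.5 0.5B: Quant Config: 16a8w -> 16a4w_block; PPL: 12.05 -> 13.81; Token Rate: 131 -> 164
Qwen 2.5 1.5B: Quant Config: 16a8w -> 16a4w_block; PPL: 9.33 -> 9.83; Token Rate: 34 -> 50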
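To make the trade-off in the figures above easier to compare across the two model sizes, the relative changes can be computed with a short script. This is only a sketch: `percent_change` is a hypothetical helper written for illustration and is not part of this PR, and the numbers are copied from the summary.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (model, metric, before, after), copied from the summary above.
RESULTS = [
    ("Qwen 2.5 0.5B", "PPL", 12.05, 13.81),
    ("Qwen 2.5 0.5B", "Token Rate", 131, 164),
    ("Qwen 2.5 1.5B", "PPL", 9.33, 9.83),
    ("Qwen 2.5 1.5B", "Token Rate", 34, 50),
]

for model, metric, before, after in RESULTS:
    delta = percent_change(before, after)
    print(f"{model} {metric}: {before} -> {after} ({delta:+.1f}%)")
```

By this calculation the 0.5B model gains roughly 25% in token rate for a PPL increase of roughly 15%, while the 1.5B model gains roughly 47% for a PPL increase of roughly 5%.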
Commands
Qwen 2.5 0.5B
Default example using hybrid mode
Qwen 2.5 1.5B
Default example using kv mode
Test Results:
Qwen 2.5 0.5B
prompt = "I would like to learn python, could you teach me with one simple program?"
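prompt = "请你替我产生一段简单的C++程式码,并从中解释物件导向的概念" (English: "Please generate a simple piece of C++ code for me and use it to explain the concept of object orientation.")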
Qwen 2.5 1.5B
prompt = "I would like to learn python, could you teach me with one simple program?"
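prompt = "请你替我产生一段简单的C++程式码,并从中解释物件导向的概念" (English: "Please generate a simple piece of C++ code for me and use it to explain the concept of object orientation.")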