test: add qwen3 decode A3/A5 PTO cases #491
HecreReed wants to merge 3 commits into hw-native-sys:main from
/run a3 qwen3_decode_incore_0 qwen3_decode_incore_1 qwen3_decode_incore_2 qwen3_decode_incore_3 qwen3_decode_incore_4 qwen3_decode_incore_5 qwen3_decode_incore_6 qwen3_decode_incore_7 qwen3_decode_incore_8 qwen3_decode_incore_9 qwen3_decode_incore_10 qwen3_decode_incore_11 qwen3_decode_incore_12 qwen3_decode_incore_13 qwen3_decode_incore_14 qwen3_decode_incore_15 qwen3_decode_incore_16

/run a5 qwen3_decode_incore_0 qwen3_decode_incore_1 qwen3_decode_incore_2 qwen3_decode_incore_3 qwen3_decode_incore_4 qwen3_decode_incore_5 qwen3_decode_incore_6 qwen3_decode_incore_7 qwen3_decode_incore_8 qwen3_decode_incore_9 qwen3_decode_incore_10 qwen3_decode_incore_11 qwen3_decode_incore_12 qwen3_decode_incore_13 qwen3_decode_incore_14 qwen3_decode_incore_15 qwen3_decode_incore_16 --pto-level=level3
Code Review
This pull request introduces support for Qwen3 decode kernels for A3 and A5 architectures by adding the necessary test case generation logic and PTO kernel fragments. The changes include updates to the test case generation script, the addition of new sample directories, and modifications to the runop.sh script to handle these new targets. My feedback suggests moving the growing configuration dictionary in the generation script to an external file for better maintainability and verifying the glob pattern in the shell script to avoid potential redundant file processing.
    "qwen3_decode_incore_4": {
        "v11": 1,
        "v12": 0,
        "v13": 1,
    },
    "qwen3_decode_incore_5": {
        "v4": 1,
        "v5": 1,
        "v6": 1,
        "v7": 0,
    },
    "qwen3_decode_incore_6": {
        "v5": 1,
        "v6": 1,
        "v7": 0,
    },
    "qwen3_decode_incore_7": {
        "v4": 1,
        "v5": 1,
        "v6": 1,
        "v7": 0,
    },
    "qwen3_decode_incore_8": {
        "v5": 2,
        "v6": 1,
    },
    "qwen3_decode_incore_9": {
        "v4": 1,
        "v5": 64,
    },
    "qwen3_decode_incore_10": {
        "v4": 1,
        "v5": 64,
    },
    "qwen3_decode_incore_12": {
        "v4": 256,
    },
    "qwen3_decode_incore_13": {
        "v4": 256,
    },
    "qwen3_decode_incore_15": {
        "v4": 128,
    },
    "qwen3_decode_incore_16": {
        "v4": 1,
        "v5": 128,
    },
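The review's suggestion to move this growing dictionary into an external file could look roughly like the sketch below. It is only an illustration: the helper name `load_overrides` and the file name `case_overrides.json` are hypothetical, and the sketch assumes that every entry maps a case name to integer values keyed by argument name, as in the fragment above.

```python
import json
import tempfile
from pathlib import Path


def load_overrides(path):
    """Read {case_name: {arg_name: int}} from a JSON file and validate its shape."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    for case, args in data.items():
        if not isinstance(args, dict):
            raise ValueError(f"{case}: expected an object of per-argument values")
        for arg, value in args.items():
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{case}.{arg}: expected an integer, got {value!r}")
    return data


# Two entries copied from the fragment above; the file name is made up.
sample = {
    "qwen3_decode_incore_8": {"v5": 2, "v6": 1},
    "qwen3_decode_incore_12": {"v4": 256},
}
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "case_overrides.json"
    config_path.write_text(json.dumps(sample, indent=2), encoding="utf-8")
    overrides = load_overrides(config_path)
```

Validating on load keeps a malformed entry from surfacing later as a confusing generation failure; whether JSON is the right container for this repository is a judgment call for the maintainers.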
fi

- for asset in "${sample_dir}"/*_golden.py "${sample_dir}"/*_compare.py; do
+ for asset in "${sample_dir}"/*_golden.py "${sample_dir}"/*_compare.py "${sample_dir}"/*_golden_*.py; do
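One way to check the reviewer's concern about redundant processing is to count how many of the three patterns each file name matches. The sketch below uses Python's `fnmatch.fnmatchcase` as an approximation of shell globbing (it ignores shell details such as hidden files and unmatched patterns). Apart from `qwen3_decode_golden_lib.py`, which appears in the validation commands, the file names are hypothetical.

```python
from fnmatch import fnmatchcase

# The three patterns from the updated loop in runop.sh.
PATTERNS = ("*_golden.py", "*_compare.py", "*_golden_*.py")


def match_counts(names, patterns=PATTERNS):
    """Return {name: number of patterns that match it}."""
    return {
        name: sum(fnmatchcase(name, pattern) for pattern in patterns)
        for name in names
    }


names = [
    "qwen3_decode_incore_0_golden.py",   # hypothetical per-case golden script
    "qwen3_decode_incore_0_compare.py",  # hypothetical per-case compare script
    "qwen3_decode_golden_lib.py",        # shared helper named in the PR
    "a_golden_b_golden.py",              # contrived name that matches twice
]
counts = match_counts(names)
duplicates = [name for name, count in counts.items() if count > 1]
```

Under these assumptions only a name that contains `_golden_` and also ends in `_golden.py` or `_compare.py` would be visited twice by the loop, so the overlap is harmless as long as no sample uses such a name.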
Codex Review
This comment is updated automatically by the review bot.
Summary
Review failed at stage
Findings
No structured findings were generated because the review process failed early.
Log Tail
A3 board test passed

A5 board test passed
Summary
- qwen3_decode_incore_*.pto fragments regenerated from pypto-lib/examples/models/qwen3/qwen3_32b_decode.py
- runop.sh and generate_testcase.py updated so these direct .pto samples use the right default flags and golden assets

Validation
- python3 -m py_compile test/npu_validation/scripts/generate_testcase.py test/samples/Qwen3DecodeA3/qwen3_decode_golden_lib.py test/samples/Qwen3DecodeA5/qwen3_decode_golden_lib.py test/samples/Qwen3DecodeA3/*_golden.py test/samples/Qwen3DecodeA5/*_golden.py
- bash -n test/samples/runop.sh
- ptoas -> generate_testcase -> custom golden
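For reference, the `py_compile` step above can also be driven from Python, which makes it easy to collect every failure in one pass instead of stopping at the first. This is a minimal sketch, not part of the PR: the helper name `syntax_errors` and the two temporary files are hypothetical.

```python
import py_compile
import tempfile
from pathlib import Path


def syntax_errors(paths):
    """Byte-compile each file and return {path: message} for those that fail."""
    errors = {}
    with tempfile.TemporaryDirectory() as scratch:
        for index, path in enumerate(paths):
            # Write the .pyc into a scratch directory so the source tree stays clean.
            target = Path(scratch) / f"{index}.pyc"
            try:
                py_compile.compile(str(path), cfile=str(target), doraise=True)
            except py_compile.PyCompileError as exc:
                errors[str(path)] = exc.msg
    return errors


# Demonstration on two throwaway files, one of which has a syntax error.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good_golden.py"
    good.write_text("x = 1\n", encoding="utf-8")
    bad = Path(tmp) / "bad_golden.py"
    bad.write_text("def broken(:\n", encoding="utf-8")
    errors = syntax_errors([good, bad])
    bad_name = str(bad)
```

Like `python3 -m py_compile`, this only checks that the files parse and compile; it does not import or run them.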