Refactor: merge paged_attention examples/st into unified test_*.py (#556)
Merged
ChaoWao merged 1 commit into hw-native-sys:main on Apr 15, 2026
Conversation
Code Review
This pull request refactors the paged attention kernels to support production-scale tile configurations via runtime dispatch, and improves performance through better pipeline overlap and zero-copy UB reshapes using TRESHAPE. It also introduces profiling in the orchestration layer and updates the test suites. Review feedback highlights a critical issue in the dispatch logic: small-scale configurations are handled incorrectly when the tile size is 16. Several newly added comments in the kernel entry points are also misleading, as they reference out-of-bounds arguments.
examples/a2a3/tensormap_and_ringbuffer/paged_attention/kernels/aic/aic_pv_matmul.cpp
examples/a2a3/tensormap_and_ringbuffer/paged_attention/kernels/aic/aic_qk_matmul.cpp
examples/a2a3/tensormap_and_ringbuffer/paged_attention/kernels/aiv/aiv_online_update.cpp
ecac927 → 8f5bc46
ChaoWao approved these changes on Apr 15, 2026
Update: tensormap_and_ringbuffer PA kernels with multi-tile dispatch and profiling

- Add runtime dispatch for 16x128 and 64x128 tile configs in AIC/AIV kernels (qk_matmul, pv_matmul, softmax_prepare, online_update)
- Add ENABLE_PROFILING conditional compilation and platform-isolated cycle count macros in orchestration
- Use TRESHAPE for zero-copy UB scalar layout conversion in online_update, eliminating GM round-trip
- Add pipeline overlap (separate MTE2 events) in qk/pv matmul kernels
- Add production-scale cases (Case1/2/3) to paged_attention and batch_paged_attention tests
- Add multi-round cases to multi_round_paged_attention test
- Fix benchmark_rounds.sh case names for bgemm and batch_paged_attention
tensormap_and_ringbuffer test case inventory
Full case table
Summary
examples/a2a3/tensormap_and_ringbuffer/
tests/st/a2a3/tensormap_and_ringbuffer/
Total
Key differences and notes
Changes after migration
Change details
examples/.../bgemm/test_bgemm.py
examples/.../vector_example/test_vector_example.py
tests/st/.../mixed_example/test_mixed_example.py
examples/.../paged_attention/test_paged_attention.py
examples/.../batch_paged_attention/test_batch_paged_attention.py
tests/st/.../multi_round_paged_attention/test_multi_round_paged_attention.py
examples/.../paged_attention_ringbuffer/test_paged_attention_ringbuffer.py
examples/.../scalar_data_test/test_scalar_data.py
tests/st/.../spmd_basic/test_spmd_basic.py
tests/st/.../spmd_sync_start/test_spmd_sync_start.py
tests/st/.../spmd_sync_start_stress/test_spmd_sync_start_stress.py
tests/st/.../spmd_sync_start_edge/test_spmd_sync_start_edge.py
tests/st/.../spmd_sync_start_aiv/test_spmd_sync_start_aiv.py
tests/st/.../spmd_multiblock_mix/test_spmd_multiblock_mix.py
tests/st/.../spmd_multiblock_aiv/test_spmd_multiblock_aiv.py
tests/st/.../spmd_starvation/test_spmd_starvation.py
tests/st/.../paged_attention_unroll/test_paged_attention_unroll.py
tests/st/.../paged_attention_unroll_4dims/test_paged_attention_unroll_4dims.py
tests/st/.../benchmark_bgemm/test_benchmark_bgemm.py
tests/st/.../alternating_matmul_add/test_alternating_matmul_add.py
tests/st/.../test_explicit_fatal.py
tests/st/.../test_l3_dependency.py
tests/st/.../test_l3_group.py
Post-migration statistics
Migration change summary
golden.py and kernel_config.py were deleted and unified into @scene_test classes; the _PA_KERNELS path still points to examples/.../paged_attention/kernels (unchanged from before the migration).