Add: SPMD paged attention with TPUSH/TPOP MIX kernel pipeline #596
Merged
ChaoWao merged 1 commit into hw-native-sys:main on Apr 20, 2026
Code Review
This pull request implements a Paged Attention MIX kernel using AIC and AIV cores with TPUSH/TPOP communication, including orchestration logic and integration tests. The review identifies critical risks of 32-bit integer overflow during pointer arithmetic in the QK and PV steps when handling large KV caches. Additionally, it is recommended to replace the inefficient scalar loop used for zeroing the destination buffer with a vector-based approach to improve performance on the AIV core.
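The overflow risk flagged above can be illustrated on the host with a minimal sketch. It assumes a KV cache laid out as `[num_blocks, block_size, head_dim]`; the function names and the layout are hypothetical and not taken from the PR. Unsigned arithmetic is used so that the wraparound is well defined for the demonstration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: KV cache stored as [num_blocks, block_size, head_dim].
// All names below are illustrative assumptions, not identifiers from the PR.

// Risky: the whole product is evaluated in 32 bits and wraps for large caches.
inline uint32_t kv_offset_32(uint32_t block_id, uint32_t block_size, uint32_t head_dim) {
    return block_id * block_size * head_dim;
}

// Safer: widen the first operand so the product is computed in 64 bits
// before it is added to a base pointer.
inline uint64_t kv_offset_64(uint32_t block_id, uint32_t block_size, uint32_t head_dim) {
    return static_cast<uint64_t>(block_id) * block_size * head_dim;
}
```

With `block_id = 300000`, `block_size = 128`, and `head_dim = 128`, the true element offset is 4,915,200,000, which exceeds the 32-bit range, so the 32-bit version silently returns a wrapped value that points into the wrong part of the cache.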
Force-pushed from ad15be6 to ac14bd8.
Add a new scene test for paged attention using TPUSH/TPOP inter-core communication between AIC (Cube) and AIV (Vector) cores. The kernel implements a cooperative FlashAttention-style pipeline:
- AIC: QK matmul -> TPUSH(sij) -> TPOP(pij) -> PV matmul -> TPUSH(oi)
- AIV: TPOP(sij) -> online softmax -> TPUSH(pij) -> TPOP(oi) -> update

Uses QK-first / SF-first software pipelining to overlap Cube matmuls with Vector softmax across iterations. Hardware block_num is fixed at 24 with stride-loop work distribution over batch * q_loop logical blocks. Supports q_tile=16 and q_tile=64 static dispatch paths.
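The "online softmax -> update" steps on the AIV side follow the usual FlashAttention-style recurrence. The sketch below is a scalar host-side reference for a single query row, not the kernel code. `RowState`, `online_softmax_update`, and `finalize` are hypothetical names, and it assumes every block contains at least one finite score.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Running state for one query row (hypothetical helper type).
struct RowState {
    float m = -std::numeric_limits<float>::infinity();  // running max of scores
    float l = 0.0f;                                      // running sum of exp(s - m)
    std::vector<float> o;                                // unnormalized output accumulator
    explicit RowState(std::size_t head_dim) : o(head_dim, 0.0f) {}
};

// Fold one KV block into the state: s holds the scores for this block (sij),
// v holds the matching value rows. Assumes at least one finite score.
void online_softmax_update(RowState& st, const std::vector<float>& s,
                           const std::vector<std::vector<float>>& v) {
    assert(!s.empty() && s.size() == v.size());
    const float m_new = std::max(st.m, *std::max_element(s.begin(), s.end()));
    const float alpha = std::exp(st.m - m_new);  // rescales old state; 0 on the first block
    st.l *= alpha;
    for (float& x : st.o) x *= alpha;
    for (std::size_t j = 0; j < s.size(); ++j) {
        const float p = std::exp(s[j] - m_new);  // unnormalized probability (pij)
        st.l += p;
        for (std::size_t d = 0; d < st.o.size(); ++d) st.o[d] += p * v[j][d];
    }
    st.m = m_new;
}

// Normalize once, after the last block has been folded in.
std::vector<float> finalize(const RowState& st) {
    std::vector<float> out(st.o);
    for (float& x : out) x /= st.l;
    return out;
}
```

Because each block only rescales the existing accumulator by `alpha`, the full score row never has to be materialized, which is what allows the Cube and Vector stages to exchange one tile at a time.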
Force-pushed from ac14bd8 to 09840d3.
ChaoWao approved these changes on Apr 20, 2026
Summary
batch * q_loop logical blocks
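A stride loop over `batch * q_loop` logical blocks with a fixed hardware block count of 24 might look like the following host-side sketch. The `LogicalBlock` struct, the `blocks_for` helper, and the row-major mapping from a flat index to `(batch_idx, q_idx)` are assumptions made for illustration; the PR's actual index mapping is not shown in this conversation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kBlockNum = 24;  // fixed hardware block count, per the PR description

// Hypothetical decomposition of a flat logical block index.
struct LogicalBlock {
    uint32_t batch_idx;
    uint32_t q_idx;
};

// Lists the logical blocks that hardware block `block_idx` would process.
// Assumes a row-major mapping: flat index = batch_idx * q_loop + q_idx.
std::vector<LogicalBlock> blocks_for(uint32_t block_idx, uint32_t batch, uint32_t q_loop) {
    assert(block_idx < kBlockNum);
    std::vector<LogicalBlock> out;
    // Widen before multiplying so the total cannot wrap in 32 bits.
    const uint64_t total = static_cast<uint64_t>(batch) * q_loop;
    for (uint64_t i = block_idx; i < total; i += kBlockNum) {
        out.push_back({static_cast<uint32_t>(i / q_loop),
                       static_cast<uint32_t>(i % q_loop)});
    }
    return out;
}
```

With `batch = 10` and `q_loop = 5` there are 50 logical blocks, so hardware blocks 0 and 1 each take three of them and the remaining 22 take two each. Every logical block is visited exactly once regardless of whether the total is a multiple of 24.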