Add: SPMD paged attention with TPUSH/TPOP MIX kernel pipeline #596

Merged

ChaoWao merged 1 commit into hw-native-sys:main from chenshengxin2026:add/spmd-paged-attention-tpush on Apr 20, 2026

Conversation

@chenshengxin2026 (Contributor) commented Apr 20, 2026

Summary

  • Adds a scene test for SPMD paged attention using TPUSH/TPOP inter-core communication between AIC (Cube) and AIV (Vector) cores
  • Implements a cooperative FlashAttention-style pipeline: AIC handles the QK and PV matmuls, while AIV handles the online softmax and output accumulation (see the sketch after this list)
  • Uses QK-first / SF-first software pipelining to overlap Cube matmuls with Vector softmax across KV block iterations
  • Fixes the hardware block_num at 24, with stride-loop distribution over batch * q_loop logical blocks
  • Supports q_tile=16 and q_tile=64 static dispatch paths, selected according to num_heads at runtime
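
A minimal host-side sketch of the online-softmax update that the AIV side performs (an illustration, not code from this PR: names such as online_softmax_step, sij, and vj are hypothetical, and the pij * vj accumulation shown inline here is the matmul that the real pipeline hands back to AIC via TPUSH):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One online-softmax step: fold a new score tile sij into the running
// per-row state (max m, sum l, unnormalized output o), FlashAttention-style.
// Layouts are row-major: o is rows x d, sij is rows x cols, vj is cols x d.
// Initialize m to -infinity, l and o to zero before the first KV tile.
void online_softmax_step(std::vector<float>& m, std::vector<float>& l,
                         std::vector<float>& o,
                         const std::vector<float>& sij,
                         const std::vector<float>& vj,
                         int rows, int cols, int d) {
  for (int r = 0; r < rows; ++r) {
    // Running max over this row, including the new tile.
    float m_new = m[r];
    for (int c = 0; c < cols; ++c) m_new = std::max(m_new, sij[r * cols + c]);
    // Rescale the old accumulators to the new max.
    const float scale = std::exp(m[r] - m_new);
    float l_new = l[r] * scale;
    for (int k = 0; k < d; ++k) o[r * d + k] *= scale;
    // Fold in the new tile: pij = exp(sij - m_new), o += pij * vj.
    for (int c = 0; c < cols; ++c) {
      const float p = std::exp(sij[r * cols + c] - m_new);
      l_new += p;
      for (int k = 0; k < d; ++k) o[r * d + k] += p * vj[c * d + k];
    }
    m[r] = m_new;
    l[r] = l_new;
  }
}
// After the last KV tile, the final output row is o[r] / l[r].
```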

@gemini-code-assist (bot) left a comment

Code Review

This pull request implements a paged attention MIX kernel across the AIC and AIV cores using TPUSH/TPOP communication, including the orchestration logic and integration tests. The review flags a critical risk of 32-bit integer overflow in the pointer arithmetic of the QK and PV steps when handling large KV caches (see the sketch below), and recommends replacing the inefficient scalar loop that zeroes the destination buffer with a vectorized approach to improve performance on the AIV core.
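
A hedged illustration of the overflow fix (hypothetical names, not code from this PR). The key point is widening an operand to int64_t before the multiply chain; casting an already-wrapped 32-bit product afterwards would be too late:

```cpp
#include <cstdint>

// Element offset of one KV-cache block. The product block_id * block_size *
// num_heads * head_dim can exceed INT32_MAX for large caches; casting the
// first operand promotes every subsequent multiply to 64-bit.
inline int64_t kv_block_offset(int32_t block_id, int32_t block_size,
                               int32_t num_heads, int32_t head_dim) {
  return static_cast<int64_t>(block_id) * block_size * num_heads * head_dim;
}
```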

@chenshengxin2026 force-pushed the add/spmd-paged-attention-tpush branch 4 times, most recently from ad15be6 to ac14bd8 on April 20, 2026 at 06:48

Add a new scene test for paged attention using TPUSH/TPOP inter-core
communication between AIC (Cube) and AIV (Vector) cores. The kernel
implements a cooperative FlashAttention-style pipeline:
- AIC: QK matmul -> TPUSH(sij) -> TPOP(pij) -> PV matmul -> TPUSH(oi)
- AIV: TPOP(sij) -> online softmax -> TPUSH(pij) -> TPOP(oi) -> update

Uses QK-first / SF-first software pipelining to overlap Cube matmuls
with Vector softmax across iterations. Hardware block_num is fixed at
24 with stride-loop work distribution over batch * q_loop logical
blocks. Supports q_tile=16 and q_tile=64 static dispatch paths.
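
A minimal sketch of the stride-loop distribution described above (hypothetical names; the real kernel dispatches tiles to the cooperating AIC/AIV pipeline rather than printing):

```cpp
#include <cstdint>
#include <cstdio>

constexpr int kBlockNum = 24;  // hardware block_num, fixed by the target

// Each physical core (block_idx in [0, kBlockNum)) walks the flat logical
// block space with stride kBlockNum, so work stays balanced even when
// batch * q_loop is not a multiple of 24.
void stride_loop(int block_idx, int64_t batch, int64_t q_loop) {
  const int64_t total = batch * q_loop;
  for (int64_t i = block_idx; i < total; i += kBlockNum) {
    const int64_t b = i / q_loop;  // batch index of this logical block
    const int64_t q = i % q_loop;  // q-tile index within that batch entry
    std::printf("core %d -> (batch=%lld, q=%lld)\n",
                block_idx, (long long)b, (long long)q);
  }
}
```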
@chenshengxin2026 force-pushed the add/spmd-paged-attention-tpush branch from ac14bd8 to 09840d3 on April 20, 2026 at 06:52
@ChaoWao merged commit 6c87797 into hw-native-sys:main on Apr 20, 2026
14 checks passed