@LoserCheems
Collaborator

Refactors tensor creation to use pre-calculated offsets instead of local_tile operations, ensuring correct memory addressing for both block table and non-block table scenarios.

Updates pointer advancement logic to maintain proper synchronization between K, ZOH, and active mask tensors during block iteration.

Adds debug output to help diagnose memory addressing issues during development.
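
The synchronization requirement described above can be illustrated with a small host-side sketch. It is not the kernel code from this PR: `Cursor`, `advance`, `advance_all`, and `block_of` are hypothetical names, and the sketch only assumes that each tensor is addressed by a flat element offset and a stride between adjacent key positions. The point is that K, ZOH, and the active mask must all move by the same number of key blocks per iteration, each scaled by its own stride.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cursor: a flat element offset into one tensor plus the number
// of elements separating adjacent key positions in that tensor.
struct Cursor {
    std::int64_t offset;
    std::int64_t key_stride;
};

// Move a cursor by `blocks` key blocks (negative values walk backwards).
inline void advance(Cursor& c, int blocks, int kBlockN) {
    c.offset += static_cast<std::int64_t>(blocks) * kBlockN * c.key_stride;
}

// Advance K, ZOH and the active mask together so that all three keep
// referring to the same key block.
inline void advance_all(Cursor& k, Cursor& zoh, Cursor& am,
                        int blocks, int kBlockN) {
    advance(k, blocks, kBlockN);
    advance(zoh, blocks, kBlockN);
    advance(am, blocks, kBlockN);
}

// Recover the key-block index a cursor points at, given its offset at block 0.
inline std::int64_t block_of(const Cursor& c, std::int64_t base, int kBlockN) {
    const std::int64_t span = static_cast<std::int64_t>(kBlockN) * c.key_stride;
    assert(span > 0 && (c.offset - base) % span == 0);
    return (c.offset - base) / span;
}
```

If one of the three cursors is advanced with the wrong stride or skipped in an iteration, `block_of` returns different indices for the tensors, which is the kind of mismatch this PR is meant to remove.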

@LoserCheems LoserCheems requested review from Evanwu1125, SNHuan, Copilot and wubingheng111 and removed request for Copilot July 1, 2025 13:43
@LoserCheems LoserCheems added the bug Something isn't working label Jul 1, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes tensor addressing for ZOH and active mask in the splitkv kernel, refactoring tensor creation to use pre-calculated offsets and updating pointer advancement logic.

  • Introduces new offset calculations (col_offset_zoh and col_offset_am) to correctly compute tensor pointers.
  • Replaces local_tile calls with direct make_tensor constructs and updates pointer arithmetic for ZOH and active mask tensors.
  • Adds debug print statements for monitoring tensor pointer values during execution.
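
As a rough illustration of what a pre-calculated offset such as `col_offset_zoh` or `col_offset_am` might encode, the host-side sketch below computes the flat element offset of one (m_block, n_block) tile from strides. The exact formula used in the PR is not shown in this conversation, so the `MaskStrides` struct, its `batch_stride` and `head_stride` fields, and both helper functions are assumptions made for illustration. Only the row and column strides correspond to names visible in the diff (`params.active_mask_row_stride`, `params.active_mask_col_stride`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stride bundle for a (batch, head, seqlen_q, seqlen_k) tensor.
struct MaskStrides {
    std::int64_t batch_stride;
    std::int64_t head_stride;
    std::int64_t row_stride;  // elements between adjacent query rows
    std::int64_t col_stride;  // elements between adjacent key columns
};

// Flat offset of the first element of tile (m_block, n_block) for a given
// batch and head. This is an assumed formula, not the one from the PR.
inline std::int64_t tile_offset(const MaskStrides& s, int bidb, int bidh,
                                int m_block, int n_block,
                                int kBlockM, int kBlockN) {
    assert(bidb >= 0 && bidh >= 0 && m_block >= 0 && n_block >= 0);
    return bidb * s.batch_stride
         + bidh * s.head_stride
         + static_cast<std::int64_t>(m_block) * kBlockM * s.row_stride
         + static_cast<std::int64_t>(n_block) * kBlockN * s.col_stride;
}

// Offset of element (i, j) inside a tile whose origin is `tile_off`.
inline std::int64_t element_offset(std::int64_t tile_off, const MaskStrides& s,
                                   int i, int j) {
    return tile_off + i * s.row_stride + j * s.col_stride;
}
```

Computing the offset once and building the tensor directly on `base + offset` makes the addressing explicit, which is convenient when the same code path has to serve both the block table and non-block table cases.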

make_coord(m_block, _)
); // (kBlockM, kBlockN, nblocksN)
make_stride(params.active_mask_row_stride, params.active_mask_col_stride)
);

Copilot AI Jul 1, 2025

Consider wrapping the debug print statements with a compile-time flag (e.g., #ifdef DEBUG) to avoid unintentional output in production builds.

Suggested change
- );
+ );
+ #ifdef DEBUG

Copilot uses AI. Check for mistakes.
Comment on lines 1070 to +1076
        binfo.actual_seqlen_k - n_block * kBlockN);
    if (tidx == 0 && bidh == 0 && bidb == 0) {
        printf("Before copy_ZOH: n_block=%d, seqlen_q_offset=%d, seqlen_k_offset=%d\n",
               n_block,
               binfo.actual_seqlen_q - m_block * kBlockM,
               binfo.actual_seqlen_k - n_block * kBlockN);
    }

Copilot AI Jul 1, 2025

Similarly, consider guarding these debug prints with a compile-time condition to prevent extra overhead in production runs.

Suggested change
-         binfo.actual_seqlen_k - n_block * kBlockN);
-     if (tidx == 0 && bidh == 0 && bidb == 0) {
-         printf("Before copy_ZOH: n_block=%d, seqlen_q_offset=%d, seqlen_k_offset=%d\n",
-                n_block,
-                binfo.actual_seqlen_q - m_block * kBlockM,
-                binfo.actual_seqlen_k - n_block * kBlockN);
-     }
+         binfo.actual_seqlen_k - n_block * kBlockN);
+ #ifdef DEBUG
+     if (tidx == 0 && bidh == 0 && bidb == 0) {
+         printf("Before copy_ZOH: n_block=%d, seqlen_q_offset=%d, seqlen_k_offset=%d\n",
+                n_block,
+                binfo.actual_seqlen_q - m_block * kBlockM,
+                binfo.actual_seqlen_k - n_block * kBlockN);
+     }
+ #endif
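
One possible way to apply the compile-time guard without repeating `#ifdef DEBUG` at every call site is to hide it behind a macro. The sketch below is host-side C++ and uses hypothetical names (`KERNEL_DEBUG_PRINTF`, `kDebugEnabled`) that do not appear in the PR. Inside a CUDA kernel the macro body would presumably call the device-side `printf` instead of `std::printf`.

```cpp
#include <cstdio>

// Hypothetical helper: expands to a printf call only when DEBUG is defined.
// In non-debug builds the arguments are discarded and never evaluated.
#ifdef DEBUG
#define KERNEL_DEBUG_PRINTF(...) std::printf(__VA_ARGS__)
constexpr bool kDebugEnabled = true;
#else
#define KERNEL_DEBUG_PRINTF(...) ((void)0)
constexpr bool kDebugEnabled = false;
#endif
```

A call site would then read `KERNEL_DEBUG_PRINTF("Before copy_ZOH: n_block=%d\n", n_block);` and compile to nothing unless the build passes `-DDEBUG`. Because the arguments are not evaluated when the macro is disabled, they should not carry side effects that the kernel relies on.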

@LoserCheems LoserCheems merged commit d4ab537 into main Jul 1, 2025
@LoserCheems LoserCheems deleted the Fix-bug branch October 27, 2025 08:56
5 participants