
Conversation

@LoserCheems (Collaborator)

This pull request refines the handling of causal masks and predicates in the `compute_attn_1rowblock` function in `csrc/src/flash_attention_fwd_kernel.h`. The changes improve code clarity, maintainability, and correctness by introducing more descriptive comments, handling the optional causal mask consistently, and streamlining predicate allocation.

Improvements to causal mask handling:

  • Simplified the creation of `causal_mask_smem_ptr` by reformatting the ternary operation for better readability. ([csrc/src/flash_attention_fwd_kernel.h, L238-R240](https://github.com/flash-algo/flash-sparse-attention/pull/8/files#diff-f28f785166139b70460fd44ef7644554c0b332c25122b50ef571322572e86237L238-R240))
  • Updated the identity tensor for causal masks (`cCausalMask`) to use dummy 1×1 dimensions when no causal mask is provided, so the tensor is always allocated with a consistent type; see the sketch after this list. ([csrc/src/flash_attention_fwd_kernel.h, L334-R356](https://github.com/flash-algo/flash-sparse-attention/pull/8/files#diff-f28f785166139b70460fd44ef7644554c0b332c25122b50ef571322572e86237L334-R356))
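The sketch below illustrates both items in the CuTe style this kernel uses. It is a minimal approximation under stated assumptions, not the exact diff: identifiers such as `Has_causal_mask`, `Element`, `smem_`, and `smem_causal_mask_offset` are hypothetical placeholder names, and the block sizes are illustrative.

```cpp
#include <cute/tensor.hpp>  // CuTe, shipped with CUTLASS

using namespace cute;

template <bool Has_causal_mask, int kBlockM, int kBlockN, typename Element>
__device__ void setup_causal_mask(char* smem_, int smem_causal_mask_offset) {
    // Reformatted ternary: the shared-memory pointer is either the mask's
    // slice of smem or nullptr, with each branch on its own line.
    Element* causal_mask_smem_ptr = Has_causal_mask
        ? reinterpret_cast<Element*>(smem_ + smem_causal_mask_offset)
        : nullptr;

    // Identity tensor of (row, col) coordinates for the causal-mask tile.
    // With no mask, fall back to dummy 1x1 dimensions (runtime ints, so both
    // branches of the ternary have the same type). The tensor is therefore
    // always allocated, keeping downstream predicate logic branch-free.
    Tensor cCausalMask = make_identity_tensor(
        Has_causal_mask ? make_shape(kBlockM, kBlockN) : make_shape(1, 1));

    (void)causal_mask_smem_ptr;  // consumed by the real kernel's copy logic
    (void)cCausalMask;
}
```

The payoff of the dummy 1×1 fallback is that later code can partition and index `cCausalMask` unconditionally instead of wrapping every use in an `if (Has_causal_mask)` guard.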

Predicate allocation and comments:

  • Added detailed comments describing the purpose of the identity tensors and predicates, such as `cQ`, `cKV`, and `cZeroHold`, improving code readability. ([csrc/src/flash_attention_fwd_kernel.h, L334-R356](https://github.com/flash-algo/flash-sparse-attention/pull/8/files#diff-f28f785166139b70460fd44ef7644554c0b332c25122b50ef571322572e86237L334-R356))
  • Ensured predicates like `tCausalMaskpCausalMask` are always allocated, even when unused, to simplify the logic and avoid conditional tensor creation; a sketch follows this list. ([csrc/src/flash_attention_fwd_kernel.h, L334-R356](https://github.com/flash-algo/flash-sparse-attention/pull/8/files#diff-f28f785166139b70460fd44ef7644554c0b332c25122b50ef571322572e86237L334-R356))
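A hedged sketch of the always-allocated predicate pattern, following the common flash-attention predicate idiom. The name `tCausalMaskpCausalMask` comes from the PR description, but the three-mode partitioned shape, the coordinate tensor `tCausalMaskcCausalMask`, and the `max_col` bound are assumptions made for illustration.

```cpp
#include <cute/tensor.hpp>

using namespace cute;

template <bool Has_causal_mask, typename TensorC>
__device__ void setup_mask_predicate(TensorC const& tCausalMaskcCausalMask,
                                     int max_col) {
    // Predicate tensor for out-of-bounds columns of the causal-mask tile.
    // It is allocated unconditionally -- when no mask exists it merely covers
    // the dummy 1x1 identity tensor -- so callers never branch on whether
    // the predicate tensor exists.
    Tensor tCausalMaskpCausalMask =
        make_tensor<bool>(make_shape(size<2>(tCausalMaskcCausalMask)));

    // Populate the predicate only when a causal mask is actually present;
    // otherwise the (single) entry is simply never consulted.
    if constexpr (Has_causal_mask) {
        #pragma unroll
        for (int k = 0; k < size(tCausalMaskpCausalMask); ++k) {
            tCausalMaskpCausalMask(k) =
                get<1>(tCausalMaskcCausalMask(0, 0, k)) < max_col;
        }
    }
    (void)tCausalMaskpCausalMask;  // would guard predicated copies in the kernel
}
```

Allocating the predicate unconditionally trades a handful of registers for simpler control flow, which is the maintainability win the PR description claims.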

@LoserCheems merged commit 5c3c4f7 into main on May 19, 2025.
