
@LoserCheems
Collaborator

This pull request adds a BlockInfo struct in csrc/src/block_info.h that holds sequence-length and padding information for the query and key sequences. The struct is intended for use inside GPU kernels: it supports variable-length sequences and computes the memory offsets each block needs.

Key Changes:

New BlockInfo Struct:

  • Added a templated BlockInfo struct with a boolean template parameter Varlen to handle variable-length sequences. It includes logic for initializing sequence-related parameters like cumulative sequence lengths, actual sequence lengths, and padding. ([csrc/src/block_info.hR1-R49](https://github.com/flash-algo/flash-sparse-attention/pull/4/files#diff-394b0098afb828601be7aad41de4a4125220493eb2b3bf4e768797a39757beaeR1-R49))
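As a rough illustration of the initialization logic described above, here is a minimal host-side C++ sketch. It is not the code from this PR: the names SketchParams, BlockInfoSketch, cu_seqlens_q, seqlen_q, sum_s_q, and actual_seqlen_q are assumptions chosen for illustration, and only the query side is shown. The idea is that when Varlen is true and a cumulative-length array is present, the per-batch length is the difference of two adjacent cumulative entries; otherwise the fixed length is used.

```cpp
// Hypothetical parameter bundle; field names are assumptions, not the PR's API.
struct SketchParams {
    const int *cu_seqlens_q;  // cumulative query lengths (batch + 1 entries), or nullptr
    int seqlen_q;             // fixed (padded) query length
};

template <bool Varlen = true>
struct BlockInfoSketch {
    int sum_s_q;          // rows preceding this batch in a packed layout, or -1 if unused
    int actual_seqlen_q;  // number of valid query rows for this batch

    // bidb is the batch index handled by the current block.
    BlockInfoSketch(const SketchParams &params, int bidb)
        : sum_s_q(!Varlen || params.cu_seqlens_q == nullptr
                      ? -1
                      : params.cu_seqlens_q[bidb]),
          actual_seqlen_q(!Varlen || params.cu_seqlens_q == nullptr
                              ? params.seqlen_q
                              : params.cu_seqlens_q[bidb + 1] - sum_s_q) {}
};
```

Because Varlen is a compile-time constant, a compiler can drop the cumulative-length branch entirely in the fixed-length instantiation, which is one plausible reason to make it a template parameter rather than a runtime flag.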

Offset Calculation Methods:

  • Introduced q_offset and k_offset methods that compute memory offsets for the query and key sequences from the batch and row strides. Both are declared with the __forceinline__ and __device__ qualifiers, so they are callable from device code and inlined at the call site. ([csrc/src/block_info.hR1-R49](https://github.com/flash-algo/flash-sparse-attention/pull/4/files#diff-394b0098afb828601be7aad41de4a4125220493eb2b3bf4e768797a39757beaeR1-R49))
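The offset computation can be sketched as follows. This is an assumption-laden illustration rather than the PR's actual code: OffsetSketch, SKETCH_INLINE, and the convention that -1 means "no packed layout" are hypothetical, and any key-side padding adjustment is left out. In a padded layout the offset is the batch index times the batch stride; in a packed variable-length layout it is the number of preceding rows times the row stride.

```cpp
#include <cstdint>

// Use the CUDA qualifiers under nvcc, plain inline elsewhere so the sketch
// also compiles as ordinary C++.
#if defined(__CUDACC__)
#define SKETCH_INLINE __forceinline__ __host__ __device__
#else
#define SKETCH_INLINE inline
#endif

// Hypothetical stand-in for the offset-related part of the struct.
struct OffsetSketch {
    int sum_s_q;  // packed start row for queries, or -1 for a padded layout
    int sum_s_k;  // packed start row for keys, or -1 for a padded layout

    template <typename index_t>
    SKETCH_INLINE index_t q_offset(index_t batch_stride, index_t row_stride, int bidb) const {
        return sum_s_q == -1 ? static_cast<index_t>(bidb) * batch_stride
                             : static_cast<index_t>(sum_s_q) * row_stride;
    }

    template <typename index_t>
    SKETCH_INLINE index_t k_offset(index_t batch_stride, index_t row_stride, int bidb) const {
        return sum_s_k == -1 ? static_cast<index_t>(bidb) * batch_stride
                             : static_cast<index_t>(sum_s_k) * row_stride;
    }
};
```

Templating on index_t lets a caller pick a 64-bit index type when tensors are large enough that a 32-bit offset could overflow.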

Namespace and Header Updates:

  • Wrapped the new struct in the FLASH_NAMESPACE namespace and included the headers it depends on, such as namespace_config.h. ([csrc/src/block_info.hR1-R49](https://github.com/flash-algo/flash-sparse-attention/pull/4/files#diff-394b0098afb828601be7aad41de4a4125220493eb2b3bf4e768797a39757beaeR1-R49))

@LoserCheems LoserCheems merged commit fdecf09 into main May 14, 2025