Simplify attention mask and bias parameter naming #76
Conversation
Removes redundant `attn_` prefixes from mask and bias parameter names to improve code readability and consistency. Also removes the unused `keep_window_size` field from the `Mask_params` struct.
Renames attention mask and bias parameters from `attn_mask`/`attn_bias` to `mask`/`bias` for improved clarity and consistency throughout the flash attention API. Removes the unused `keep_window_size` parameter from function signatures and parameter structures to clean up the interface.
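A minimal sketch of what the renamed parameter structs might look like after this change. The member list is an illustrative assumption, not the exact contents of `csrc/src/flash.h`; only the `attn_` prefix removal and the dropped field are from the PR itself:

```cpp
#include <cstdint>

using index_t = int64_t;  // assumed, matching the index type used by the kernels

struct Mask_params {
    void *__restrict__ mask_ptr;   // was attn_mask_ptr
    index_t mask_batch_stride;     // was attn_mask_batch_stride
    index_t mask_head_stride;      // was attn_mask_head_stride
    index_t mask_row_stride;       // was attn_mask_row_stride
    // keep_window_size removed: it was never read by the kernels
};

struct Bias_params {
    void *__restrict__ bias_ptr;   // was attn_bias_ptr
    index_t bias_batch_stride;     // was attn_bias_batch_stride
    index_t bias_head_stride;      // was attn_bias_head_stride
    index_t bias_row_stride;       // was attn_bias_row_stride
};
```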
Renames `attn_mask_offset` to `mask_offset` and `attn_bias_offset` to `bias_offset` to improve code readability and reduce verbosity while maintaining the same functionality.
Simplifies parameter naming by removing the `attn_` prefix from mask- and bias-related variables throughout the flash attention kernel. Updates all references to use the shorter naming convention:
- `attn_mask_*` becomes `mask_*`
- `attn_bias_*` becomes `bias_*`

Improves code readability and maintains consistency across parameter names while preserving all existing functionality.
Changes the `row_stride` and `col_stride` parameters from `int` to the `index_t` template type in the `mask_offset` and `bias_offset` methods. This ensures type consistency across all stride parameters and eliminates potential type mismatches in offset calculations.
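The widening matters in practice: with 32-bit strides, an offset such as `row * row_stride` can exceed `INT_MAX` for long sequences. A self-contained host-side illustration, with made-up sizes, of the truncation an `int` offset would suffer:

```cpp
#include <cstdint>
#include <cstdio>

using index_t = int64_t;  // assumed, matching the kernels' index type

int main() {
    index_t row_stride = 1 << 17;  // e.g. seqlen_k = 131072 elements per row
    index_t row        = 1 << 15;  // query row 32768

    index_t offset  = row * row_stride;              // 2^32: needs 64 bits
    int32_t wrapped = static_cast<int32_t>(offset);  // what an int offset would hold

    std::printf("index_t offset: %lld\n", static_cast<long long>(offset));  // 4294967296
    std::printf("int offset:     %d\n", wrapped);                           // 0 (wrapped)
    return 0;
}
```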
Pull Request Overview
Refactors attention mask and bias parameter names by removing redundant `attn_` prefixes to improve code readability and consistency across the flash attention implementation. Additionally removes the unused `keep_window_size` parameter to streamline the API interface.
- Remove the `attn_` prefix from mask and bias parameter names throughout the codebase
- Update corresponding function signatures and struct member names for consistency
- Eliminate the unused `keep_window_size` parameter from the API
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| csrc/src/flash_fwd_kernel.h | Update parameter references to use the simplified mask/bias naming in kernel functions |
| csrc/src/flash.h | Rename struct members in `Mask_params` and `Bias_params`; remove the unused `keep_window_size` field |
| csrc/src/block_info.h | Rename `attn_mask_offset`/`attn_bias_offset` to `mask_offset`/`bias_offset` |
| csrc/flash_api.cpp | Update function parameters and internal references to use the simplified naming; remove the `keep_window_size` parameter |
```diff
 __forceinline__ __device__ index_t mask_offset(const index_t batch_stride, const index_t row_stride, const index_t col_stride, const int bidb) const {
     index_t offset = sum_s_q == -1 ? bidb * batch_stride : uint32_t(sum_s_q) * row_stride;
     sum_s_k == -1 ? offset += leftpad_k * col_stride : offset += uint32_t(sum_s_k + leftpad_k) * col_stride;
     return offset;
 }

 template <typename index_t>
-__forceinline__ __device__ index_t attn_bias_offset(const index_t batch_stride, const int row_stride, const int col_stride, const int bidb) const {
+__forceinline__ __device__ index_t bias_offset(const index_t batch_stride, const index_t row_stride, const index_t col_stride, const int bidb) const {
```
Copilot AI · Jul 28, 2025
The parameter type for `row_stride` was changed from `int` to `index_t`, but this is inconsistent with the `col_stride` parameter, which remains `int`. For consistency, all stride parameters should have the same type.
```diff
 template <typename index_t>
-__forceinline__ __device__ index_t attn_bias_offset(const index_t batch_stride, const int row_stride, const int col_stride, const int bidb) const {
+__forceinline__ __device__ index_t bias_offset(const index_t batch_stride, const index_t row_stride, const index_t col_stride, const int bidb) const {
```
Copilot AI · Jul 28, 2025
The parameter types for `row_stride` and `col_stride` were changed from `int` to `index_t`, but this creates an inconsistency: `bidb` remains `int` while the other parameters use `index_t`. Consider using consistent types for all index-related parameters.
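A sketch of the fully consistent variant the reviewer is suggesting, shown as a hypothetical standalone struct. The surrounding `BlockInfo` members (`sum_s_q`, `sum_s_k`, `leftpad_k`) are reproduced minimally from the diff context, and widening `bidb` is the reviewer's proposal, not something this PR adopts:

```cuda
#include <cstdint>

struct BlockInfoSketch {
    int sum_s_q;    // -1 when sequences are not packed (assumed semantics)
    int sum_s_k;
    int leftpad_k;

    // Every index-like argument widened to index_t, per the review comment.
    template <typename index_t>
    __forceinline__ __device__ index_t bias_offset(const index_t batch_stride,
                                                   const index_t row_stride,
                                                   const index_t col_stride,
                                                   const index_t bidb) const {
        index_t offset = sum_s_q == -1 ? bidb * batch_stride
                                       : index_t(uint32_t(sum_s_q)) * row_stride;
        offset += sum_s_k == -1 ? index_t(leftpad_k) * col_stride
                                : index_t(uint32_t(sum_s_k + leftpad_k)) * col_stride;
        return offset;
    }
};
```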
Remove redundant prefixes from attention mask and bias parameters to enhance code readability and consistency. Eliminate unused parameters to streamline the interface across the flash attention API.