
Conversation

@LoserCheems (Collaborator)

Enhance clarity and consistency by standardizing parameter names and ordering across various attention functions. Update the API to utilize an auto backend selection function and adjust example configurations to reflect common model setups. Ensure all changes align with established naming conventions and improve overall documentation.

Renames `softmax_scale` to `scale` and `q/k/v` to `query/key/value` for consistency across all flash attention function variants.

Reorders parameters to place `is_causal` before `scale` in function signatures, improving API consistency and alignment with common attention interface patterns.

Updates all function calls, documentation strings, and parameter passing to reflect the standardized naming convention.
Replaces direct function import with auto backend selection approach for better flexibility.

Changes parameter names from abbreviated forms to full descriptive names for improved clarity.

Updates num_heads from 12 to 16 in examples to reflect more common model configurations.

Renames softmax_scale parameter to scale for consistency with standard naming conventions.
Updates parameter names to use consistent naming conventions across CUDA, Triton, and Flex attention implementations.

Changes 'softmax_scale' to 'scale' and converts positional arguments to keyword arguments for better API consistency and clarity.

Fixes tensor dimension ordering in Flex attention by adding transpose operations to match expected input format.
Changes `softmax_scale` to `scale` parameter name for consistency across CUDA, Triton, and Flex attention implementations.

Updates Flex attention to use keyword arguments and adds tensor transposes to match expected input format.

Removes unused return value from Flex attention call to align with other implementations.
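
To make the new convention concrete, here is a minimal usage sketch. It assumes that the auto backend selector `flash_dmattn_func_auto` referenced in the updated READMEs returns a callable attention function; the import path, the selector's behavior, the tensor layout, and the exact keyword signature are assumptions for illustration, not verified library API.

```python
import math
import torch

# Assumed import path for the auto backend selector mentioned in the READMEs.
from flash_dmattn import flash_dmattn_func_auto

# Example configuration, using num_heads = 16 as in the updated examples.
batch, seq_len, num_heads, head_dim = 2, 1024, 16, 64
shape = (batch, seq_len, num_heads, head_dim)  # layout assumed

query = torch.randn(shape, device="cuda", dtype=torch.bfloat16)
key = torch.randn(shape, device="cuda", dtype=torch.bfloat16)
value = torch.randn(shape, device="cuda", dtype=torch.bfloat16)

# Assumption: the selector picks a backend (CUDA/Triton/Flex) and returns a callable.
attn_func = flash_dmattn_func_auto()

# Standardized call: descriptive tensor names, with is_causal ordered before scale.
out = attn_func(
    query=query,
    key=key,
    value=value,
    is_causal=True,
    scale=1.0 / math.sqrt(head_dim),
)
```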
LoserCheems requested review from Evanwu1125, SNHuan, Copilot, and wubingheng111 and removed the request for Copilot (August 7, 2025, 15:13)
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR standardizes parameter naming and ordering across the attention functions by renaming parameters and reordering function signatures: `scale` replaces `softmax_scale`, `query`/`key`/`value` replace `q`/`k`/`v`, and `is_causal` is placed before `scale` in the parameter order.

  • Standardize parameter names: `softmax_scale` → `scale`, `q`/`k`/`v` → `query`/`key`/`value`
  • Reorder function parameters to place `is_causal` before `scale` for consistency
  • Update documentation examples to use auto backend selection and common model configurations
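
As a hedged illustration of these bullets, the reordered signature could look roughly like the stub below; this is not the library's actual code, only a sketch of the naming and ordering described in this PR.

```python
from typing import Optional

import torch

# Hypothetical signature stub: full tensor names, with is_causal placed before
# scale (formerly softmax_scale). Other parameters of the real functions are omitted.
def dmattn_func(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    is_causal: bool = False,
    scale: Optional[float] = None,
) -> torch.Tensor:
    ...
```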

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| flash_dmattn/flash_dmattn_triton.py | Reorders parameters in `FlashDMAttnFunc.forward` and `triton_dmattn_func` |
| flash_dmattn/flash_dmattn_interface.py | Renames `softmax_scale` to `scale` and `q`/`k`/`v` to `query`/`key`/`value` across all attention functions |
| flash_dmattn/flash_dmattn_flex.py | Reorders parameters to place `is_causal` before `scale` |
| benchmarks/benchmark_forward_performance.py | Updates function calls to use the new parameter names and ordering |
| benchmarks/benchmark_forward_equivalence.py | Updates function calls to use the new parameter names and ordering |
| README_zh.md | Updates example code to use `flash_dmattn_func_auto` and standardized parameter names |
| README.md | Updates example code to use `flash_dmattn_func_auto` and standardized parameter names |
