
Conversation

@LoserCheems
Collaborator

Updates terminology throughout the performance benchmark to use "SDPA" (Scaled Dot Product Attention) instead of "Flash Attention" for consistency with PyTorch's official naming convention.

Changes function names, variable references, command-line options, and display text to reflect that the baseline implementation uses PyTorch's SDPA rather than specifically Flash Attention, improving clarity about the actual underlying implementation being benchmarked.

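For context, here is a minimal sketch (toy shapes, not the benchmark's actual code) of the PyTorch SDPA baseline being renamed: torch.nn.functional.scaled_dot_product_attention is a dispatcher that may route to the Flash Attention, memory-efficient, or math backend depending on hardware and inputs, which is why "SDPA" is the more accurate label for the baseline.

    import torch
    import torch.nn.functional as F

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # Toy (batch, heads, seq_len, head_dim) tensors.
    q = torch.randn(1, 8, 128, 64, device=device, dtype=dtype)
    k = torch.randn(1, 8, 128, 64, device=device, dtype=dtype)
    v = torch.randn(1, 8, 128, 64, device=device, dtype=dtype)

    # PyTorch selects the backend (Flash Attention, memory-efficient, or math)
    # automatically, so this call benchmarks SDPA, not Flash Attention per se.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)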
@LoserCheems requested review from Copilot and removed request for Copilot July 10, 2025 12:56
Contributor

Copilot AI left a comment


Pull Request Overview

Updates terminology from “Flash Attention” to “SDPA” across the benchmark suite to align with PyTorch’s official naming and improve clarity.

  • Renamed function, variables, flags, and display text from Flash Attention to SDPA.
  • Adjusted command-line options (--test-type) and titles/headers in output.
  • Updated comments and docstrings to reference SDPA baseline.
Comments suppressed due to low confidence (3)

benchmarks/benchmark_forward_performance.py:535

  • [nitpick] Consider renaming the variable run_flash to run_sdpa (and updating its usage) to better reflect that it's selecting SDPA runs rather than Flash Attention.
    run_flash = test_type in ['all', 'sdpa', 'sdpa-vs-cuda', 'sdpa-vs-triton', 'sdpa-vs-flex']
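  The rename the reviewer suggests would look roughly like this (run_sdpa is the suggested name, not necessarily what was merged):

    run_sdpa = test_type in ['all', 'sdpa', 'sdpa-vs-cuda', 'sdpa-vs-triton', 'sdpa-vs-flex']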

benchmarks/benchmark_forward_performance.py:813

  • [nitpick] The tuple for 'sdpa' still references 'flash_attention'; consider renaming it (e.g., 'sdpa_attention') or matching the new function name scaled_dot_product_attention_cuda for consistency.
            'sdpa': ('flash_attention', results['flash_attention_status'], results['flash_attention_times']),
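  One way the suggested rename could read, assuming the results dict keys are renamed alongside the label (illustrative only; the merged code may differ):

            'sdpa': ('sdpa_attention', results['sdpa_attention_status'], results['sdpa_attention_times']),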

benchmarks/benchmark_forward_performance.py:172

  • Update the docstring to reflect that this function returns a tuple (attn_outputs, time_ms) or ("OOM", 0) rather than just the attention output, so the documentation matches the actual return values.
    Returns:
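  A possible wording for the corrected Returns section, based on the return values the reviewer describes (illustrative, not the merged docstring):

    Returns:
        tuple: (attn_outputs, time_ms) on success, or ("OOM", 0) when the run
            hits an out-of-memory error.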

@LoserCheems merged commit bf4dddc into main Jul 10, 2025