@LoserCheems
Collaborator

Reorders tensor variable declarations to group related functionality together and improves code formatting consistency.

Moves tensor fragment declarations closer to their logical usage groups and reformats conditional statements with proper indentation for enhanced code maintainability.

Copilot AI review requested due to automatic review settings July 11, 2025 09:04
Contributor

Copilot AI left a comment

Pull Request Overview

Refactors variable declarations in flash_fwd_kernel.h to improve readability by grouping related tensor fragments and shared-memory copy operations, and reformats a multi-line call for consistent indentation.

  • Rewrites the std::min call within the Is_causal block into a properly indented multi-line expression.
  • Moves the tOrVt tensor fragment declaration to sit alongside other partition fragments.
  • Relocates the SMEM V copy declarations (smem_tiled_copy_V, smem_thr_copy_V, tOsVt) to group them with the K copy logic.
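
For readers unfamiliar with the clamp being reformatted, the multi-line std::min layout can be sketched in isolation as below. This is a minimal, host-only sketch and not the code from flash_fwd_kernel.h: the helper names ceil_div and n_block_max, the parameter list, and the exact bound expression are assumptions modeled on the usual causal-attention pattern, where a query block only needs key blocks up to its own diagonal.

```cpp
#include <algorithm>

// Hypothetical helper: integer ceiling division for positive operands.
constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Hypothetical stand-alone version of the key-block upper bound.
// m_block is the index of the current query block; kBlockM and kBlockN
// are the query and key tile sizes (assumed names).
template <bool Is_causal>
int n_block_max(int m_block, int kBlockM, int kBlockN, int seqlen_q, int seqlen_k) {
    int n_max = ceil_div(seqlen_k, kBlockN);
    if (Is_causal) {
        // One argument per line keeps the long second operand readable.
        n_max = std::min(
            n_max,
            ceil_div((m_block + 1) * kBlockM + seqlen_k - seqlen_q, kBlockN)
        );
    }
    return n_max;
}
```

With 64-wide tiles and equal sequence lengths of 256, the first query block visits one key block in the causal case and all four otherwise.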

Simplifies the flash forward splitkv kernel by removing the Append_KV template parameter and associated logic. This reduces template instantiation complexity and removes one level of nested switch statements.
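
To illustrate why dropping a boolean template parameter removes a nesting level, here is a minimal sketch of runtime-to-compile-time flag dispatch. It does not reproduce the project's switch macros: bool_switch, run_kernel, dispatch, and the flag Is_even_MN are hypothetical names chosen for the example. Each boolean template parameter doubles the number of kernel instantiations, so removing one (as this change does with Append_KV) halves them and deletes one nested dispatch.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for a kernel specialised on compile-time flags.
template <bool Is_causal, bool Is_even_MN>
std::string run_kernel() {
    return std::string(Is_causal ? "causal" : "full") + "/" +
           (Is_even_MN ? "even" : "uneven");
}

// Hypothetical helper: turns a runtime bool into a compile-time constant
// by calling f with std::true_type or std::false_type.
template <typename F>
auto bool_switch(bool cond, F&& f) {
    if (cond) return f(std::true_type{});
    return f(std::false_type{});
}

// Two flags need two nested switches (four instantiations).
// A third flag would add another nesting level and double the count.
std::string dispatch(bool is_causal, bool is_even_mn) {
    return bool_switch(is_causal, [&](auto causal) {
        return bool_switch(is_even_mn, [&](auto even) {
            return run_kernel<decltype(causal)::value, decltype(even)::value>();
        });
    });
}
```

Calling dispatch(true, false) selects the run_kernel<true, false> instantiation at runtime.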

Also updates the debug printf statements to include function names, which makes log output easier to trace, and adjusts the block size constants for performance.
Updates the block size calculation to use smaller values, reducing memory use.
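
As a rough illustration of tagging debug output with the enclosing function's name, the sketch below uses the standard __func__ identifier. The helper tag_message, the function launch_example, the message layout, and the tile sizes 64 and 32 are all assumptions for the example and are not taken from the changed code.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a debug line prefixed with the caller's name.
std::string tag_message(const char* func, int block_m, int block_n) {
    char buf[128];
    std::snprintf(buf, sizeof(buf), "[%s] kBlockM = %d, kBlockN = %d",
                  func, block_m, block_n);
    return buf;
}

// Hypothetical launch function: __func__ expands to "launch_example" here.
std::string launch_example() {
    // Illustrative tile sizes only; the real constants live in the kernel code.
    return tag_message(__func__, 64, 32);
}
```

Printing the returned string with std::printf or std::puts gives a log line that identifies which function emitted it, which helps when several launch paths print similar values.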

Comments out the split-KV functionality to avoid extra memory overhead, in line with the existing code comment about always setting num_splits back to 1.
@LoserCheems LoserCheems merged commit a04dd1b into main Jul 11, 2025
5 participants