Conversation

@LoserCheems
Collaborator

Uncomments code that forces num_splits to 1 and resets accumulator tensors to avoid extra memory overhead from Split-KV operations.

Updates error messages to reference "FlashDynamicMaskAttention" instead of "FlashAttention" for consistency with the library's actual implementation.

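For context, a minimal sketch of what forcing the non-Split-KV path typically looks like. The names (`num_splits`, `softmax_lse_accum`, `out_accum`) follow flash-attention conventions and are assumptions here, not this PR's exact diff:

```cpp
#include <ATen/ATen.h>

// Sketch only: force the single-split (non-Split-KV) code path and
// drop the per-split accumulators so their memory can be reclaimed.
// The parameter names below are assumed, following flash-attention
// conventions, and may differ from this repository's actual code.
void force_single_split(int &num_splits,
                        at::Tensor &softmax_lse_accum,
                        at::Tensor &out_accum) {
    num_splits = 1;              // one split => no Split-KV combine kernel runs
    softmax_lse_accum.reset();   // release per-split log-sum-exp accumulator
    out_accum.reset();           // release per-split partial-output accumulator
}
```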
Copilot AI review requested due to automatic review settings July 25, 2025 15:44
Contributor

Copilot AI left a comment

Pull Request Overview

This PR disables the Split-KV path by uncommenting code that forces the number of splits to 1 and resets the accumulator tensors, and it updates error messages for consistency with the FlashDynamicMaskAttention library implementation.

  • Uncomments code to force num_splits = 1 and reset accumulator tensors to avoid Split-KV memory overhead
  • Updates error messages to reference "FlashDynamicMaskAttention" instead of "FlashAttention"

```diff
     auto q_dtype = q.dtype();
-    TORCH_CHECK(q_dtype == torch::kFloat16 || q_dtype == torch::kBFloat16,
-                "FlashAttention only support fp16 and bf16 data type");
+    TORCH_CHECK(q_dtype == torch::kFloat16 || q_dtype == torch::kBFloat16, "FlashDynamicMaskAttention only support fp16 and bf16 data type");
```

Copilot AI Jul 25, 2025


Grammar error: "only support" should be "only supports" to maintain subject-verb agreement.

Suggested change

```diff
-TORCH_CHECK(q_dtype == torch::kFloat16 || q_dtype == torch::kBFloat16, "FlashDynamicMaskAttention only support fp16 and bf16 data type");
+TORCH_CHECK(q_dtype == torch::kFloat16 || q_dtype == torch::kBFloat16, "FlashDynamicMaskAttention only supports fp16 and bf16 data type");
```

@LoserCheems LoserCheems self-assigned this Jul 25, 2025
@LoserCheems LoserCheems merged commit 68e6034 into main Jul 25, 2025