Fixes mask comparison and scaling logic in attention kernel #58
Conversation
- Simplifies the mask comparison from `> 0` to a direct boolean evaluation for consistency.
- Removes trailing whitespace for code cleanliness.
- Corrects a duplicate softmax-scale application that was causing incorrect attention computations.
- Improves readability by moving a comment to a more appropriate location.
Pull Request Overview
Fixes inconsistent mask evaluation and corrects softmax scaling in the forward attention kernel, while cleaning up whitespace and refining inline comments.
- Simplify mask check from a boolean comparison to a direct sum-based test
- Ensure softmax scaling is only applied once by conditioning on non-inf entries
- Move and clarify comments; remove trailing whitespace for readability
```diff
 # Check if any element in mask is non-zero
-any_active = tl.sum(mask > 0) > 0
+any_active = tl.sum(mask) > 0
```
Copilot AI commented on Jul 4, 2025:
Consider using `tl.any(mask)` (or `tl.any(mask != 0)`) instead of `tl.sum(mask) > 0` to more directly express "any active elements" and potentially improve efficiency.
Suggested change:
```diff
-any_active = tl.sum(mask) > 0
+any_active = tl.any(mask)
```
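Whether `tl.any` is available depends on the installed Triton version; the sum-based test the PR adopts already expresses "any active element", since a boolean (or 0/1) mask sums to the number of set lanes. As a standalone sanity sketch (hypothetical code, not the project's kernel; assumes a CUDA device and a recent Triton install):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def any_active_kernel(mask_ptr, out_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    mask = tl.load(mask_ptr + offs)             # a 0/1 int32 mask block
    any_active = tl.sum(mask) > 0               # the same test the PR uses
    tl.store(out_ptr, any_active.to(tl.int32))

mask = torch.zeros(128, dtype=torch.int32, device="cuda")
mask[7] = 1
out = torch.zeros(1, dtype=torch.int32, device="cuda")
any_active_kernel[(1,)](mask, out, BLOCK=128)
print(bool(out.item()))  # True: at least one mask element is set
```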
```python
# Slightly faster to multiply the softmax_scale in the tl.exp below since the compiler
# can then fuse the mult and add into an fma instruction. But if we have bias we need
# to multiply with softmax_scale here.
acc_s = acc_s * softmax_scale + bias
```
Copilot AI commented on Jul 4, 2025:
This unconditional scaling duplicates the conditional scaling on the next line, resulting in double application of `softmax_scale`. Remove this line so that the `tl.where` version is the single scaling step.
Suggested change:
```diff
-acc_s = acc_s * softmax_scale + bias
+# Removed duplicate scaling to avoid double application of softmax_scale.
```
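For reference, here is a minimal host-side sketch of the intended semantics in PyTorch rather than Triton (hypothetical; `scaled_scores` and `mask` are illustrative names, only `acc_s`, `softmax_scale`, and `bias` come from the diff above): the scale is applied to the raw scores exactly once before exponentiation, and masked entries stay at `-inf` so they receive zero weight.

```python
import torch

def scaled_scores(acc_s, softmax_scale, bias, mask):
    # Apply softmax_scale exactly once, add the bias, and keep masked
    # positions at -inf so softmax drives their probability to zero.
    neg_inf = torch.full_like(acc_s, float("-inf"))
    scaled = torch.where(mask, acc_s * softmax_scale + bias, neg_inf)
    return torch.softmax(scaled, dim=-1)

scores = torch.randn(2, 4)
bias = torch.randn(2, 4)
mask = torch.tensor([[True, True, False, True],
                     [True, False, True, True]])
probs = scaled_scores(scores, 0.125, bias, mask)
print(probs.sum(dim=-1))  # each row sums to 1; masked positions get zero weight
```

Leaving the removed line in place would effectively scale the scores (and add the bias) twice, skewing the resulting attention distribution.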
```python
# update acc_o
if any_active:
    # load v
```
Copilot AI commented on Jul 4, 2025:
[nitpick] The comment `# load v` is vague; consider expanding it to clarify what data is being loaded and why, e.g., `# load V tensor block for value projection`.
Suggested change:
```diff
-# load v
+# Load the V tensor block for value projection in the attention mechanism.
```
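A hedged sketch of the loop body this comment refers to, assuming a FlashAttention-style Triton forward kernel: `any_active` and `acc_o` follow the diff hunks above, while `p` (the softmax weights), `v_ptrs`, and `kv_mask` are hypothetical names for the attention probabilities, the V-block pointers, and their bounds mask.

```python
# Fragment of the loop over key/value blocks, not a complete kernel.
if any_active:
    # Load the V tile for the current key/value block; out-of-bounds or
    # inactive lanes read 0.0 so they contribute nothing to the output.
    v = tl.load(v_ptrs, mask=kv_mask[:, None], other=0.0)
    # Accumulate the attention-weighted values for this block: acc_o += P @ V.
    acc_o += tl.dot(p.to(v.dtype), v)
```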