
Conversation

@LoserCheems
Collaborator

Simplifies mask comparison from `> 0` to direct boolean evaluation for consistency.

Removes trailing whitespace for code cleanliness.

Corrects duplicate softmax scaling application that was causing incorrect attention computations.

Improves code readability by moving comment to more appropriate location.
Contributor

Copilot AI left a comment


Pull Request Overview

Fixes inconsistent mask evaluation and corrects softmax scaling in the forward attention kernel, while cleaning up whitespace and refining inline comments.

  • Simplify mask check from a boolean comparison to a direct sum-based test
  • Ensure softmax scaling is only applied once by conditioning on non-`-inf` entries
  • Move and clarify comments; remove trailing whitespace for readability (a reference sketch of the fixed block follows this list)
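
For orientation, the fixed block logic corresponds roughly to the following plain-PyTorch reference. This is a sketch, not the Triton kernel: the function name `block_forward` and the tensor shapes are assumptions; only `mask`, `bias`, and `softmax_scale` mirror the kernel's names.

```python
import torch

def block_forward(q, k, v, mask, bias, softmax_scale):
    """Illustrative reference for one K/V block of the forward pass."""
    acc_s = q @ k.transpose(-1, -2)      # raw Q.K^T scores for this block
    any_active = mask.sum() > 0          # direct sum-based activity test
    if not any_active:                   # nothing attends here; skip the V load
        return torch.zeros(q.shape[0], v.shape[-1])
    # softmax_scale is applied exactly once, and only on unmasked entries
    acc_s = torch.where(mask, acc_s * softmax_scale + bias,
                        torch.full_like(acc_s, float("-inf")))
    p = torch.softmax(acc_s, dim=-1)     # fully masked rows would need extra care
    return p @ v                         # weighted sum of values
```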


  # Check if any element in mask is non-zero
- any_active = tl.sum(mask > 0) > 0
+ any_active = tl.sum(mask) > 0

Copilot AI Jul 4, 2025


Consider using `tl.any(mask)` (or `tl.any(mask != 0)`) instead of `tl.sum(mask) > 0` to more directly express 'any active elements' and potentially improve efficiency.

Suggested change
any_active = tl.sum(mask) > 0
any_active = tl.any(mask)
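
One caveat worth noting: `tl.sum(mask) > 0` only coincides with an "any" test when the mask is boolean or non-negative, which the simplification assumes. A quick NumPy check of that equivalence (illustrative only, outside Triton):

```python
import numpy as np

mask = np.array([0, 0, 1, 0], dtype=bool)
assert (mask.sum() > 0) == mask.any()          # equivalent for boolean masks

signed = np.array([-1, 1])                     # signed entries can cancel out,
print(signed.sum() > 0, (signed != 0).any())   # -> False True: sum test fails
```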

# Slightly faster to multiply the softmax_scale in the tl.exp below since the compiler
# can then fuse the mult and add into an fma instruction. But if we have bias we need
# to multiply with softmax_scale here.
acc_s = acc_s * softmax_scale + bias

Copilot AI Jul 4, 2025


This unconditional scaling duplicates the conditional scaling on the next line, resulting in double application of `softmax_scale`. Remove this line so that the `tl.where` version is the single scaling step.

Suggested change
acc_s = acc_s * softmax_scale + bias
# Removed duplicate scaling to avoid double application of softmax_scale.
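
To see why the duplicate scaling matters numerically, here is a small PyTorch illustration (a sketch; the conditional line itself is not shown in this excerpt, so the two paths are reduced to their scaling arithmetic):

```python
import torch

softmax_scale = 0.125
qk = torch.randn(2, 4)          # raw scores; assume all positions active
bias = torch.zeros(2, 4)

# Buggy path: unconditional scaling followed by the conditional scaling step.
buggy = (qk * softmax_scale + bias) * softmax_scale

# Fixed path: softmax_scale is applied exactly once.
fixed = qk * softmax_scale + bias

# Double scaling shrinks the logits, flattening the attention distribution.
print(torch.softmax(buggy, dim=-1))
print(torch.softmax(fixed, dim=-1))
```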


# update acc_o
if any_active:
# load v

Copilot AI Jul 4, 2025


[nitpick] The comment `# load v` is vague; consider expanding it to clarify what data is being loaded and why, e.g., `# load V tensor block for value projection`.

Suggested change
# load v
# Load the V tensor block for value projection in the attention mechanism.

@LoserCheems LoserCheems merged commit 4e5053f into main Jul 4, 2025