Fixes mask comparison and scaling logic in attention kernel #58
Conversation
- Simplifies the mask comparison from `> 0` to a direct boolean evaluation for consistency.
- Removes trailing whitespace for code cleanliness.
- Corrects a duplicate softmax-scale application that was causing incorrect attention computations.
- Improves readability by moving a comment to a more appropriate location.
Pull Request Overview
Fixes inconsistent mask evaluation and corrects softmax scaling in the forward attention kernel, while cleaning up whitespace and refining inline comments.
- Simplify mask check from a boolean comparison to a direct sum-based test
- Ensure softmax scaling is only applied once by conditioning on non-inf entries
- Move and clarify comments; remove trailing whitespace for readability
```diff
 # Check if any element in mask is non-zero
-any_active = tl.sum(mask > 0) > 0
+any_active = tl.sum(mask) > 0
```
Copilot AI commented on Jul 4, 2025:
Consider using `tl.any(mask)` (or `tl.any(mask != 0)`) instead of `tl.sum(mask) > 0` to more directly express "any active elements" and potentially improve efficiency.
Suggested change:
```diff
-any_active = tl.sum(mask) > 0
+any_active = tl.any(mask)
```
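Whether `tl.any` is available depends on the installed Triton version; the sum-based test the PR adopts already expresses "any active element", since a boolean (or 0/1) mask sums to the number of set lanes. As a standalone sanity sketch (hypothetical code, not the project's kernel; assumes a CUDA device and a recent Triton install):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def any_active_kernel(mask_ptr, out_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    mask = tl.load(mask_ptr + offs)             # a 0/1 int32 mask block
    any_active = tl.sum(mask) > 0               # the same test the PR uses
    tl.store(out_ptr, any_active.to(tl.int32))

mask = torch.zeros(128, dtype=torch.int32, device="cuda")
mask[7] = 1
out = torch.zeros(1, dtype=torch.int32, device="cuda")
any_active_kernel[(1,)](mask, out, BLOCK=128)
print(bool(out.item()))  # True: at least one mask element is set
```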
```python
# Slightly faster to multiply the softmax_scale in the tl.exp below since the compiler
# can then fuse the mult and add into an fma instruction. But if we have bias we need
# to multiply with softmax_scale here.
acc_s = acc_s * softmax_scale + bias
```
Copilot AI commented on Jul 4, 2025:
This unconditional scaling duplicates the conditional scaling on the next line, resulting in double application of `softmax_scale`. Remove this line so that the `tl.where` version is the single scaling step.
Suggested change:
```diff
-acc_s = acc_s * softmax_scale + bias
+# Removed duplicate scaling to avoid double application of softmax_scale.
```
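For reference, here is a minimal host-side sketch of the intended semantics in PyTorch rather than Triton (hypothetical; `scaled_scores` and `mask` are illustrative names, only `acc_s`, `softmax_scale`, and `bias` come from the diff above): the scale is applied to the raw scores exactly once before exponentiation, and masked entries stay at `-inf` so they receive zero weight.

```python
import torch

def scaled_scores(acc_s, softmax_scale, bias, mask):
    # Apply softmax_scale exactly once, add the bias, and keep masked
    # positions at -inf so softmax drives their probability to zero.
    neg_inf = torch.full_like(acc_s, float("-inf"))
    scaled = torch.where(mask, acc_s * softmax_scale + bias, neg_inf)
    return torch.softmax(scaled, dim=-1)

scores = torch.randn(2, 4)
bias = torch.randn(2, 4)
mask = torch.tensor([[True, True, False, True],
                     [True, False, True, True]])
probs = scaled_scores(scores, 0.125, bias, mask)
print(probs.sum(dim=-1))  # each row sums to 1; masked positions get zero weight
```

Leaving the removed line in place would effectively scale the scores (and add the bias) twice, skewing the resulting attention distribution.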
```python
# update acc_o
if any_active:
    # load v
```
Copilot AI commented on Jul 4, 2025:
[nitpick] The comment `# load v` is vague; consider expanding it to clarify what data is being loaded and why, e.g., `# load V tensor block for value projection`.
Suggested change:
```diff
-# load v
+# Load the V tensor block for value projection in the attention mechanism.
```
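A hedged sketch of the loop body this comment refers to, assuming a FlashAttention-style Triton forward kernel: `any_active` and `acc_o` follow the diff hunks above, while `p` (the softmax weights), `v_ptrs`, and `kv_mask` are hypothetical names for the attention probabilities, the V-block pointers, and their bounds mask.

```python
# Fragment of the loop over key/value blocks, not a complete kernel.
if any_active:
    # Load the V tile for the current key/value block; out-of-bounds or
    # inactive lanes read 0.0 so they contribute nothing to the output.
    v = tl.load(v_ptrs, mask=kv_mask[:, None], other=0.0)
    # Accumulate the attention-weighted values for this block: acc_o += P @ V.
    acc_o += tl.dot(p.to(v.dtype), v)
```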