Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NVPTX] Fix the error in a pattern match in v4i8 comparisons. #81308

Merged
merged 2 commits into from
Feb 12, 2024

Conversation

Artem-B
Copy link
Member

@Artem-B Artem-B commented Feb 9, 2024

The replacement should've had BFE() as the arguments for the comparison, not the source register.

While at that, tighten the patterns a bit, and expand them no cover variants with immediate arguments. Also change the default lowering of bfe() to use unsigned variant, so the value of the upper bits is predictable.

Unsigned BFE is also cheaper. in SASS, signed BFE expands to UPRMT + USGXT, while unsigned is just UPRMT.

Comment on lines +271 to +280
; CHECK-NEXT: bfe.u32 %r11, %r1, 24, 8;
; CHECK-NEXT: bfe.u32 %r12, %r1, 16, 8;
; CHECK-NEXT: bfe.u32 %r13, %r1, 8, 8;
; CHECK-NEXT: bfe.u32 %r14, %r1, 0, 8;
; CHECK-NEXT: bfe.u32 %r15, %r2, 0, 8;
; CHECK-NEXT: selp.b32 %r16, %r14, %r15, %p4;
; CHECK-NEXT: bfe.u32 %r17, %r2, 8, 8;
; CHECK-NEXT: selp.b32 %r18, %r13, %r17, %p3;
; CHECK-NEXT: bfi.b32 %r19, %r18, %r16, 8, 8;
; CHECK-NEXT: bfe.u32 %r20, %r2, 16, 8;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These additional BFE instructions are a side effect of the signed difference of BFE we use during lowering of extractelement. The pattern-matched comparison folds extract/trunc/sext into a signed BFE, while loading/storing of the elements uses just extractelement value which is lowered as BFE.u32. While PTX is more verbose, ptxas appears to do a reasonably good job here. For sm_80 the difference on the SASS level is that the old code used USEL (select), while the new one uses predicated PRMT. The number of instructions is about the same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines -2262 to -2263
def: Pat<(setgt (sext_inreg (trunc Int32Regs:$a), i8), (sext_inreg (trunc Int32Regs:$b), i8)),
(SETP_s32rr Int32Regs:$a, Int32Regs:$b, CmpGT)>;
Copy link
Member Author

@Artem-B Artem-B Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the root cause of the problem. We should've used bfe($a/$b, 8, 8) instead of passing $a/$b to SETP as is.
The tests worked because $a/$b happened to be the result of bfe() we've lowered for extractement(), so PTX output looked sane.

Copy link
Contributor

@ThomasRaoux ThomasRaoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand all the details of the original bug but the code change looks good to me. Thanks for checking that SASS is not worse for cases where we have more ptx insts!

Copy link
Contributor

@justinfargnoli justinfargnoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

The replacement should've had BFE() as the arguments for the comparison, not the source register.

While at that, tighten the patterns a bit, and expand them no cover variants with immediate arguments.
Also change the default lowering of bfe() to use unsigned variant, so the value of the upper bits is predictable.
@Artem-B Artem-B merged commit 8799d71 into llvm:main Feb 12, 2024
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants