-
Notifications
You must be signed in to change notification settings - Fork 10.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NVPTX] Fix the error in a pattern match in v4i8 comparisons. #81308
Conversation
; CHECK-NEXT: bfe.u32 %r11, %r1, 24, 8; | ||
; CHECK-NEXT: bfe.u32 %r12, %r1, 16, 8; | ||
; CHECK-NEXT: bfe.u32 %r13, %r1, 8, 8; | ||
; CHECK-NEXT: bfe.u32 %r14, %r1, 0, 8; | ||
; CHECK-NEXT: bfe.u32 %r15, %r2, 0, 8; | ||
; CHECK-NEXT: selp.b32 %r16, %r14, %r15, %p4; | ||
; CHECK-NEXT: bfe.u32 %r17, %r2, 8, 8; | ||
; CHECK-NEXT: selp.b32 %r18, %r13, %r17, %p3; | ||
; CHECK-NEXT: bfi.b32 %r19, %r18, %r16, 8, 8; | ||
; CHECK-NEXT: bfe.u32 %r20, %r2, 16, 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These additional BFE instructions are a side effect of the signed difference of BFE we use during lowering of extractelement
. The pattern-matched comparison folds extract/trunc/sext into a signed BFE, while loading/storing of the elements uses just extractelement value which is lowered as BFE.u32. While PTX is more verbose, ptxas appears to do a reasonably good job here. For sm_80 the difference on the SASS level is that the old code used USEL (select), while the new one uses predicated PRMT
. The number of instructions is about the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example: https://godbolt.org/z/qo1YM4xqf
def: Pat<(setgt (sext_inreg (trunc Int32Regs:$a), i8), (sext_inreg (trunc Int32Regs:$b), i8)), | ||
(SETP_s32rr Int32Regs:$a, Int32Regs:$b, CmpGT)>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the root cause of the problem. We should've used bfe($a/$b, 8, 8)
instead of passing $a/$b
to SETP
as is.
The tests worked because $a/$b happened to be the result of bfe()
we've lowered for extractement()
, so PTX output looked sane.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand all the details of the original bug but the code change looks good to me. Thanks for checking that SASS is not worse for cases where we have more ptx insts!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM :)
The replacement should've had BFE() as the arguments for the comparison, not the source register. While at that, tighten the patterns a bit, and expand them no cover variants with immediate arguments. Also change the default lowering of bfe() to use unsigned variant, so the value of the upper bits is predictable.
The replacement should've had BFE() as the arguments for the comparison, not the source register.
While at that, tighten the patterns a bit, and expand them no cover variants with immediate arguments. Also change the default lowering of bfe() to use unsigned variant, so the value of the upper bits is predictable.
Unsigned BFE is also cheaper. in SASS, signed BFE expands to
UPRMT + USGXT
, while unsigned is justUPRMT
.