[PowerPC] Use rldimi/rlwimi to optimize build_vector #67640
Leverage rldimi/rlwimi instructions to generate better code for BUILD_VECTOR:
- For v16i8, four groups of (i8 << 24) | (i8 << 16) | (i8 << 8) | i8 to construct a vector.
- For v8i16, four groups of (i16 << 16) | i16 to construct a vector.

We already have patterns for v4i32 and v2i64 construction.
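The grouping described above can be modelled in scalar C++. This is only a sketch of the arithmetic, not the SelectionDAG lowering in the patch; the helper names `packBytes`, `packHalves`, `rotl32`, and `maskInsert` are hypothetical. `maskInsert` mimics the rotate-then-insert-under-mask behaviour of rlwimi on 32-bit values, assuming a rotate amount below 32.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; none of these exist in the patch.

// Pack four bytes into one 32-bit word; b3 lands in the most significant byte.
constexpr uint32_t packBytes(uint8_t b3, uint8_t b2, uint8_t b1, uint8_t b0) {
  return (uint32_t(b3) << 24) | (uint32_t(b2) << 16) |
         (uint32_t(b1) << 8) | uint32_t(b0);
}

// Pack two halfwords into one 32-bit word; h1 lands in the upper half.
constexpr uint32_t packHalves(uint16_t h1, uint16_t h0) {
  return (uint32_t(h1) << 16) | uint32_t(h0);
}

// Rotate a 32-bit value left by n bits (assumes n < 32).
constexpr uint32_t rotl32(uint32_t x, unsigned n) {
  return n == 0 ? x : (x << n) | (x >> (32 - n));
}

// Scalar model of a rotate-and-insert: rotate src left by sh, then replace
// only the bits of dst selected by mask.
constexpr uint32_t maskInsert(uint32_t dst, uint32_t src, unsigned sh,
                              uint32_t mask) {
  return (rotl32(src, sh) & mask) | (dst & ~mask);
}
```

Under this model, each shift-and-or step of a group corresponds to one insert: `maskInsert(b0, b1, 8, 0x0000FF00)` places `b1` above `b0`, and two further inserts at shifts 16 and 24 complete the word that `packBytes` computes directly.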
; P9LE-NEXT: sldi 4, 4, 1
; P9LE-NEXT: li 7, 16
; P9LE-NEXT: add 6, 3, 4
; P9LE-NEXT: lxsihzx 4, 3, 4
; P9LE-NEXT: addis 3, 2, .LCPI2_0@toc@ha
; P9LE-NEXT: lxsihzx 2, 6, 7
; P9LE-NEXT: li 6, 0
; P9LE-NEXT: addi 3, 3, .LCPI2_0@toc@l
; P9LE-NEXT: mtvsrd 3, 6
; P9LE-NEXT: lxv 0, 0(3)
; P9LE-NEXT: li 3, 0
; P9LE-NEXT: vmrghh 4, 3, 4
; P9LE-NEXT: vmrghh 2, 3, 2
; P9LE-NEXT: vsplth 3, 3, 3
; P9LE-NEXT: xxmrglw 3, 4, 3
; P9LE-NEXT: xxperm 3, 2, 0
; P9LE-NEXT: xxspltw 2, 3, 2
; P9LE-NEXT: vadduwm 2, 3, 2
; P9LE-NEXT: vextuwrx 3, 3, 2
Some of this code should be dead. I ran opt on it and the loop went away; feeding the result to the current llc then removes these instructions.
Why? What is the purpose of a codegen test case that doesn't show the desired codegen?
This may be more generally useful (i.e. in the target independent DAG combiner).
SDLoc dl(Op);

// There are already patterns for v4i32 and v2i64 construction.
if (VT == MVT::v16i8 || VT == MVT::v8i16) {
Please flip the condition and make this an early exit.
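A minimal sketch of the requested shape, using stand-in types so that it compiles on its own. `VecTy` and `eltsPer32` are hypothetical names, and the return value of 0 for "leave it to the existing patterns" is an assumption made for illustration, not what the patch returns.

```cpp
// Hypothetical stand-in for the MVT vector types named in the patch.
enum class VecTy { v16i8, v8i16, v4i32, v2i64 };

// Returns how many elements fit in one 32-bit group, or 0 when the type is
// left to the existing patterns (assumed convention for this sketch).
constexpr int eltsPer32(VecTy VT) {
  // Early exit: the flipped condition handles the uninteresting types first,
  // so the main logic below is not nested inside an if block.
  if (VT != VecTy::v16i8 && VT != VecTy::v8i16)
    return 0;

  // Four i8 or two i16 elements make up one 32-bit group.
  return VT == VecTy::v16i8 ? 4 : 2;
}
```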
for (int i = 0; i < 4; ++i) {
  for (int j = 0; j < EltsFor32; ++j) {
    SDValue Elt = LittleEndian
                      ? Op.getOperand(i * EltsFor32 + EltsFor32 - j - 1)
These expressions are far from obvious. Please assign them to named variables, with comments explaining each expression, and then use those variables. The reader should not be forced to work out the details of this math without context.
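One way the little-endian index expression from the hunk could be broken into named steps is sketched below. `littleEndianEltIndex`, `GroupBase`, and `GroupLast` are hypothetical names; only the expression `i * EltsFor32 + EltsFor32 - j - 1` comes from the patch. The other branch of the conditional is not visible in the hunk, so it is left out here.

```cpp
// Hypothetical helper: operand index used on little-endian targets for
// position Pos (j in the patch) inside 32-bit group Group (i in the patch).
constexpr int littleEndianEltIndex(int Group, int Pos, int EltsFor32) {
  // Index of the first BUILD_VECTOR operand belonging to this group.
  const int GroupBase = Group * EltsFor32;
  // Index of the last operand belonging to this group.
  const int GroupLast = GroupBase + EltsFor32 - 1;
  // As Pos counts up, the index walks from the group's last operand back
  // towards its first; this equals i * EltsFor32 + EltsFor32 - j - 1.
  return GroupLast - Pos;
}
```

With `EltsFor32 == 4`, group 1 visits operands 7, 6, 5, 4 as `Pos` goes from 0 to 3.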
Migrated from https://reviews.llvm.org/D94467