[RISCV] Optimize gather/scatter to unit-stride memop + shuffle #66279
Conversation
If we have a gather or a scatter whose index describes a permutation of the lanes, we can lower this as a shuffle + a unit-strided memory operation. For RISCV, this replaces an indexed load/store with a unit-strided memory operation and a vrgather (at worst). I did not bother to implement the vp.scatter and vp.gather variants of these transforms because they'd only be legal when EVL was VLMAX. Given that, they should have been transformed to the non-vp variants anyways. I haven't checked to see if they actually are.
@llvm/pr-subscribers-backend-risc-v

Patch is 26.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66279.diff 3 Files Affected:
+/// Match the index vector of a scatter or gather node as the shuffle mask
SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
+define <8 x i16> @mgather_shuffle_reverse(ptr %base) {
+define <8 x i16> @mgather_shuffle_rotate(ptr %base) {
+define <8 x i16> @mgather_shuffle_vrgather(ptr %base) {
+define void @mscatter_unit_stride(<8 x i16> %val, ptr %base) {
+define void @mscatter_unit_stride_with_offset(<8 x i16> %val, ptr %base) {
// Create the shuffle mask and check all bits active
assert(ShuffleMask.empty());
BitVector ActiveLanes(NumElems);
for (const auto Idx : enumerate(Index->ops())) {
Why are we using the ops iterator but then discarding the value() part? Can we just use for (unsigned i = 0; i < Index->getNumOperands(); ++i)?
I pushed a change to fix this.
LGTM
LGTM
Pushed as ff2622b.