-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SelectionDAG] Use unaligned store to move AVX registers onto stack for extractelement
#78422
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-backend-x86 Author: Manish Kausik H (Nirhar) ChangesPrior to this patch, SelectionDAG generated aligned move onto stacks for AVX registers when the function was marked as a no-realign-stack function. This lead to misalignment between the stack and the instruction generated. This patch fixes the issue. Fixes #77730 Full diff: https://github.com/llvm/llvm-project/pull/78422.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index adfeea073bff65..4de78f75bbb7df 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -21,6 +21,7 @@
#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/CodeGen/ISDOpcodes.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineJumpTableInfo.h"
#include "llvm/CodeGen/MachineMemOperand.h"
@@ -1425,9 +1426,18 @@ SDValue SelectionDAGLegalize::ExpandExtractFromVectorThroughStack(SDValue Op) {
if (!Ch.getNode()) {
// Store the value to a temporary stack slot, then LOAD the returned part.
+ auto &MF = DAG.getMachineFunction();
+ auto &MFI = MF.getFrameInfo();
StackPtr = DAG.CreateStackTemporary(VecVT);
- Ch = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr,
- MachinePointerInfo());
+ int FI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
+ MachinePointerInfo PtrInfo =
+ MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FI);
+
+ MachineMemOperand *StoreMMO = MF.getMachineMemOperand(PtrInfo, MachineMemOperand::MOStore,
+ MFI.getObjectSize(FI),
+ MFI.getObjectAlign(FI));
+
+ Ch = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, StoreMMO);
}
SDValue NewLoad;
diff --git a/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll b/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
new file mode 100644
index 00000000000000..70d92e0879c1e7
--- /dev/null
+++ b/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
@@ -0,0 +1,23 @@
+; RUN: llc < %s | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"
+
+target triple = "x86_64-unknown-linux-gnu"
+
+;CHECK: %bb.0:
+;CHECK-NEXT: # kill: def $edi killed $edi def $rdi
+;CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0
+;CHECK-NEXT: vmovups %ymm0, -40(%rsp)
+;CHECK-NEXT: andl $31, %edi
+;CHECK-NEXT: movzbl -40(%rsp,%rdi), %eax
+;CHECK-NEXT: vzeroupper
+;CHECK-NEXT: retq
+
+define i32 @foo(i32 %arg1) #0 {
+entry:
+ %a = extractelement <32 x i8> zeroinitializer, i32 %arg1
+ %b = zext i8 %a to i32
+ ret i32 %b
+}
+
+attributes #0 = { "no-realign-stack" "target-cpu"="skylake-avx512" }
\ No newline at end of file
|
@llvm/pr-subscribers-llvm-selectiondag Author: Manish Kausik H (Nirhar) ChangesPrior to this patch, SelectionDAG generated aligned move onto stacks for AVX registers when the function was marked as a no-realign-stack function. This lead to misalignment between the stack and the instruction generated. This patch fixes the issue. Fixes #77730 Full diff: https://github.com/llvm/llvm-project/pull/78422.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index adfeea073bff652..4de78f75bbb7dfa 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -21,6 +21,7 @@
#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/CodeGen/ISDOpcodes.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineJumpTableInfo.h"
#include "llvm/CodeGen/MachineMemOperand.h"
@@ -1425,9 +1426,18 @@ SDValue SelectionDAGLegalize::ExpandExtractFromVectorThroughStack(SDValue Op) {
if (!Ch.getNode()) {
// Store the value to a temporary stack slot, then LOAD the returned part.
+ auto &MF = DAG.getMachineFunction();
+ auto &MFI = MF.getFrameInfo();
StackPtr = DAG.CreateStackTemporary(VecVT);
- Ch = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr,
- MachinePointerInfo());
+ int FI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
+ MachinePointerInfo PtrInfo =
+ MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FI);
+
+ MachineMemOperand *StoreMMO = MF.getMachineMemOperand(PtrInfo, MachineMemOperand::MOStore,
+ MFI.getObjectSize(FI),
+ MFI.getObjectAlign(FI));
+
+ Ch = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, StoreMMO);
}
SDValue NewLoad;
diff --git a/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll b/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
new file mode 100644
index 000000000000000..70d92e0879c1e7a
--- /dev/null
+++ b/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
@@ -0,0 +1,23 @@
+; RUN: llc < %s | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"
+
+target triple = "x86_64-unknown-linux-gnu"
+
+;CHECK: %bb.0:
+;CHECK-NEXT: # kill: def $edi killed $edi def $rdi
+;CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0
+;CHECK-NEXT: vmovups %ymm0, -40(%rsp)
+;CHECK-NEXT: andl $31, %edi
+;CHECK-NEXT: movzbl -40(%rsp,%rdi), %eax
+;CHECK-NEXT: vzeroupper
+;CHECK-NEXT: retq
+
+define i32 @foo(i32 %arg1) #0 {
+entry:
+ %a = extractelement <32 x i8> zeroinitializer, i32 %arg1
+ %b = zext i8 %a to i32
+ ret i32 %b
+}
+
+attributes #0 = { "no-realign-stack" "target-cpu"="skylake-avx512" }
\ No newline at end of file
|
dce9660
to
8f18d84
Compare
cc @RKSimon @danilaml @HaohaiWen @KanRobert for your reviews |
✅ With the latest revision this PR passed the C/C++ code formatter. |
8f18d84
to
3ea928b
Compare
llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
Outdated
Show resolved
Hide resolved
llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
Outdated
Show resolved
Hide resolved
044a00d
to
bc292d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one minor
Can you make a follow up PR that fixes similar issue in ExpandInsertToVectorThroughStack (and possibly ExpandVectorBuildThroughStack, although I don't have a test for that). |
@danilaml Sure, I plan on doing so! |
bc292d3
to
d0cacaf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please can you check the build failure in CodeGen/AArch64/sve-extract-fixed-vector.ll
d0cacaf
to
f8219a7
Compare
@RKSimon I regenerated the test file |
f8219a7
to
0065c31
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @davemgreen suggested - please add basic handling for scalable vector types.
0065c31
to
f2b0b1b
Compare
f2b0b1b
to
80b8940
Compare
80b8940
to
933c3f8
Compare
…or `extractelement` Prior to this patch, SelectionDAG generated aligned move onto stacks for AVX registers when the function was marked as a no-realign-stack function. This lead to misalignment between the stack and the instruction generated. This patch fixes the issue. Fixes llvm#77730
933c3f8
to
a6614ec
Compare
uint64_t ObjectSize = isObjectScalable ? ~UINT64_C(0) : MFI.getObjectSize(FI); | ||
MachineMemOperand *MMO = MF.getMachineMemOperand( | ||
PtrInfo, MachineMemOperand::MOStore, ObjectSize, MFI.getObjectAlign(FI)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can fold to directly return MF.getMachineMemOperand
…or `extractelement` (llvm#78422) Prior to this patch, SelectionDAG generated aligned move onto stacks for AVX registers when the function was marked as a no-realign-stack function. This lead to misalignment between the stack and the instruction generated. This patch fixes the issue. Fixes llvm#77730
Prior to this patch, SelectionDAG generated aligned move onto stacks for AVX registers when the function was marked as a no-realign-stack function. This lead to misalignment between the stack and the instruction generated. This patch fixes the issue.
Fixes #77730