Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SelectionDAG] Use unaligned store to move AVX registers onto stack for extractelement #78422

Merged
merged 1 commit into from
Feb 2, 2024

Conversation

Nirhar
Copy link
Contributor

@Nirhar Nirhar commented Jan 17, 2024

Prior to this patch, SelectionDAG generated aligned move onto stacks for AVX registers when the function was marked as a no-realign-stack function. This lead to misalignment between the stack and the instruction generated. This patch fixes the issue.

Fixes #77730

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added backend:X86 llvm:SelectionDAG SelectionDAGISel as well labels Jan 17, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 17, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-backend-x86

Author: Manish Kausik H (Nirhar)

Changes

Prior to this patch, SelectionDAG generated aligned move onto stacks for AVX registers when the function was marked as a no-realign-stack function. This lead to misalignment between the stack and the instruction generated. This patch fixes the issue.

Fixes #77730


Full diff: https://github.com/llvm/llvm-project/pull/78422.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (+12-2)
  • (added) llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll (+23)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index adfeea073bff65..4de78f75bbb7df 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -21,6 +21,7 @@
 #include "llvm/Analysis/ConstantFolding.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/CodeGen/ISDOpcodes.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/MachineJumpTableInfo.h"
 #include "llvm/CodeGen/MachineMemOperand.h"
@@ -1425,9 +1426,18 @@ SDValue SelectionDAGLegalize::ExpandExtractFromVectorThroughStack(SDValue Op) {
 
   if (!Ch.getNode()) {
     // Store the value to a temporary stack slot, then LOAD the returned part.
+    auto &MF = DAG.getMachineFunction();
+    auto &MFI = MF.getFrameInfo();
     StackPtr = DAG.CreateStackTemporary(VecVT);
-    Ch = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr,
-                      MachinePointerInfo());
+    int FI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
+    MachinePointerInfo PtrInfo =
+      MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FI);
+    
+    MachineMemOperand *StoreMMO = MF.getMachineMemOperand(PtrInfo, MachineMemOperand::MOStore, 
+                            MFI.getObjectSize(FI),
+                            MFI.getObjectAlign(FI));
+
+    Ch = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, StoreMMO);
   }
 
   SDValue NewLoad;
diff --git a/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll b/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
new file mode 100644
index 00000000000000..70d92e0879c1e7
--- /dev/null
+++ b/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
@@ -0,0 +1,23 @@
+; RUN: llc < %s  | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"
+
+target triple = "x86_64-unknown-linux-gnu"
+
+;CHECK: %bb.0:   
+;CHECK-NEXT: # kill: def $edi killed $edi def $rdi                             
+;CHECK-NEXT: vxorps	%xmm0, %xmm0, %xmm0
+;CHECK-NEXT: vmovups	%ymm0, -40(%rsp)
+;CHECK-NEXT: andl	$31, %edi
+;CHECK-NEXT: movzbl	-40(%rsp,%rdi), %eax
+;CHECK-NEXT: vzeroupper
+;CHECK-NEXT: retq
+
+define i32 @foo(i32 %arg1) #0 {
+entry:
+  %a = extractelement <32 x i8> zeroinitializer, i32 %arg1
+  %b = zext i8 %a to i32
+  ret i32 %b
+}
+
+attributes #0 = { "no-realign-stack" "target-cpu"="skylake-avx512" }
\ No newline at end of file

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 17, 2024

@llvm/pr-subscribers-llvm-selectiondag

Author: Manish Kausik H (Nirhar)

Changes

Prior to this patch, SelectionDAG generated aligned move onto stacks for AVX registers when the function was marked as a no-realign-stack function. This lead to misalignment between the stack and the instruction generated. This patch fixes the issue.

Fixes #77730


Full diff: https://github.com/llvm/llvm-project/pull/78422.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (+12-2)
  • (added) llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll (+23)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index adfeea073bff652..4de78f75bbb7dfa 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -21,6 +21,7 @@
 #include "llvm/Analysis/ConstantFolding.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/CodeGen/ISDOpcodes.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/MachineJumpTableInfo.h"
 #include "llvm/CodeGen/MachineMemOperand.h"
@@ -1425,9 +1426,18 @@ SDValue SelectionDAGLegalize::ExpandExtractFromVectorThroughStack(SDValue Op) {
 
   if (!Ch.getNode()) {
     // Store the value to a temporary stack slot, then LOAD the returned part.
+    auto &MF = DAG.getMachineFunction();
+    auto &MFI = MF.getFrameInfo();
     StackPtr = DAG.CreateStackTemporary(VecVT);
-    Ch = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr,
-                      MachinePointerInfo());
+    int FI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
+    MachinePointerInfo PtrInfo =
+      MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FI);
+    
+    MachineMemOperand *StoreMMO = MF.getMachineMemOperand(PtrInfo, MachineMemOperand::MOStore, 
+                            MFI.getObjectSize(FI),
+                            MFI.getObjectAlign(FI));
+
+    Ch = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, StoreMMO);
   }
 
   SDValue NewLoad;
diff --git a/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll b/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
new file mode 100644
index 000000000000000..70d92e0879c1e7a
--- /dev/null
+++ b/llvm/test/CodeGen/X86/unaligned_extract_from_vector_through_stack.ll
@@ -0,0 +1,23 @@
+; RUN: llc < %s  | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"
+
+target triple = "x86_64-unknown-linux-gnu"
+
+;CHECK: %bb.0:   
+;CHECK-NEXT: # kill: def $edi killed $edi def $rdi                             
+;CHECK-NEXT: vxorps	%xmm0, %xmm0, %xmm0
+;CHECK-NEXT: vmovups	%ymm0, -40(%rsp)
+;CHECK-NEXT: andl	$31, %edi
+;CHECK-NEXT: movzbl	-40(%rsp,%rdi), %eax
+;CHECK-NEXT: vzeroupper
+;CHECK-NEXT: retq
+
+define i32 @foo(i32 %arg1) #0 {
+entry:
+  %a = extractelement <32 x i8> zeroinitializer, i32 %arg1
+  %b = zext i8 %a to i32
+  ret i32 %b
+}
+
+attributes #0 = { "no-realign-stack" "target-cpu"="skylake-avx512" }
\ No newline at end of file

@Nirhar
Copy link
Contributor Author

Nirhar commented Jan 17, 2024

cc @RKSimon @danilaml @HaohaiWen @KanRobert for your reviews

Copy link

github-actions bot commented Jan 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@Nirhar Nirhar force-pushed the alignment-error branch 2 times, most recently from 044a00d to bc292d3 Compare January 22, 2024 17:40
@Nirhar Nirhar requested review from arsenm and RKSimon January 22, 2024 17:44
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one minor

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp Outdated Show resolved Hide resolved
@danilaml
Copy link
Collaborator

Can you make a follow up PR that fixes similar issue in ExpandInsertToVectorThroughStack (and possibly ExpandVectorBuildThroughStack, although I don't have a test for that).

@Nirhar
Copy link
Contributor Author

Nirhar commented Jan 23, 2024

@danilaml Sure, I plan on doing so!

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please can you check the build failure in CodeGen/AArch64/sve-extract-fixed-vector.ll

@Nirhar
Copy link
Contributor Author

Nirhar commented Jan 25, 2024

@RKSimon I regenerated the test file sve-extract-fixed-vector.ll with update-llc-test-checks.py to accomodate new changes.

@Nirhar Nirhar requested a review from RKSimon January 25, 2024 13:45
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @davemgreen suggested - please add basic handling for scalable vector types.

…or `extractelement`

Prior to this patch, SelectionDAG generated aligned move onto stacks for AVX registers
when the function was marked as a no-realign-stack function. This lead to misalignment
between the stack and the instruction generated. This patch fixes the issue.

Fixes llvm#77730
uint64_t ObjectSize = isObjectScalable ? ~UINT64_C(0) : MFI.getObjectSize(FI);
MachineMemOperand *MMO = MF.getMachineMemOperand(
PtrInfo, MachineMemOperand::MOStore, ObjectSize, MFI.getObjectAlign(FI));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can fold to directly return MF.getMachineMemOperand

@arsenm arsenm merged commit a768bc6 into llvm:main Feb 2, 2024
4 checks passed
agozillon pushed a commit to agozillon/llvm-project that referenced this pull request Feb 5, 2024
…or `extractelement` (llvm#78422)

Prior to this patch, SelectionDAG generated aligned move onto stacks for
AVX registers when the function was marked as a no-realign-stack
function. This lead to misalignment between the stack and the
instruction generated. This patch fixes the issue.

Fixes llvm#77730
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SelectionDAG: 32 byte aligned store to 16 byte aligned stack generated for no-realign-stack functions
6 participants