Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86] Resolve FIXME: Add FPCW as a rounding control register #82452

Merged
merged 1 commit into from
Mar 5, 2024

Conversation

AtariDreams
Copy link
Contributor

@AtariDreams AtariDreams commented Feb 21, 2024

To prevent tests from breaking, another fix had to be made: Now, we check if the instruction after a waiting instruction is a call, and if so, we insert the wait.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 21, 2024

@llvm/pr-subscribers-backend-x86

Author: AtariDreams (AtariDreams)

Changes

In tests, this seems to only cause the wait instruction to disappear, which I do not see as necessary in these particular cases.


Patch is 29.15 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/82452.diff

10 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLoweringCall.cpp (+1-3)
  • (modified) llvm/test/CodeGen/X86/fp-intrinsics.ll (-19)
  • (modified) llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll (-9)
  • (modified) llvm/test/CodeGen/X86/fp-strict-scalar-cmp.ll (-1)
  • (modified) llvm/test/CodeGen/X86/fp-strict-scalar.ll (-2)
  • (modified) llvm/test/CodeGen/X86/fp128-cast-strict.ll (+1-3)
  • (modified) llvm/test/CodeGen/X86/fp80-strict-libcalls.ll (-48)
  • (modified) llvm/test/CodeGen/X86/half-constrained.ll (-7)
  • (modified) llvm/test/CodeGen/X86/ldexp-f80.ll (-1)
  • (modified) llvm/test/CodeGen/X86/vec-strict-128.ll (-2)
diff --git a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
index be8275c92e11ae..27784dc99202a0 100644
--- a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
+++ b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
@@ -670,9 +670,7 @@ const MCPhysReg *X86TargetLowering::getScratchRegisters(CallingConv::ID) const {
 }
 
 ArrayRef<MCPhysReg> X86TargetLowering::getRoundingControlRegisters() const {
-  // FIXME: We should def X86::FPCW for x87 as well. But it affects a lot of lit
-  // tests at the moment, which is not what we expected.
-  static const MCPhysReg RCRegs[] = {X86::MXCSR};
+  static const MCPhysReg RCRegs[] = {X86::MXCSR, X86::FPCW};
   return RCRegs;
 }
 
diff --git a/llvm/test/CodeGen/X86/fp-intrinsics.ll b/llvm/test/CodeGen/X86/fp-intrinsics.ll
index d2b45ee1e03e63..1387cac7fe1a70 100644
--- a/llvm/test/CodeGen/X86/fp-intrinsics.ll
+++ b/llvm/test/CodeGen/X86/fp-intrinsics.ll
@@ -301,7 +301,6 @@ define double @f6() #0 {
 ; X87-NEXT:    fstpl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll pow
 ; X87-NEXT:    addl $28, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -413,7 +412,6 @@ define double @f8() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll sin
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -464,7 +462,6 @@ define double @f9() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll cos
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -515,7 +512,6 @@ define double @f10() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll exp
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -566,7 +562,6 @@ define double @f11() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll exp2
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -617,7 +612,6 @@ define double @f12() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll log
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -668,7 +662,6 @@ define double @f13() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll log10
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -719,7 +712,6 @@ define double @f14() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll log2
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -770,7 +762,6 @@ define double @f15() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll rint
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -818,7 +809,6 @@ define double @f16() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll nearbyint
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -867,7 +857,6 @@ define double @f19() #0 {
 ; X87-NEXT:    fstpl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fld1
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll fmod
 ; X87-NEXT:    addl $28, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1607,7 +1596,6 @@ define i32 @f23(double %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll lrint
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1655,7 +1643,6 @@ define i32 @f24(float %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstps (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll lrintf
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1703,7 +1690,6 @@ define i64 @f25(double %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll llrint
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1751,7 +1737,6 @@ define i64 @f26(float %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstps (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll llrintf
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1799,7 +1784,6 @@ define i32 @f27(double %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll lround
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1846,7 +1830,6 @@ define i32 @f28(float %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstps (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll lroundf
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1893,7 +1876,6 @@ define i64 @f29(double %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll llround
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1940,7 +1922,6 @@ define i64 @f30(float %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstps (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll llroundf
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
diff --git a/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll b/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
index 1bc308bef8cccf..c9dd8a7374569f 100644
--- a/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
@@ -7,7 +7,6 @@ define float @ceil(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _ceil
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -24,7 +23,6 @@ define float @cos(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _cos
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -41,7 +39,6 @@ define float @exp(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _exp
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -58,7 +55,6 @@ define float @floor(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _floor
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -78,7 +74,6 @@ define float @frem(float %x, float %y) #0 {
 ; CHECK-NEXT:    fxch %st(1)
 ; CHECK-NEXT:    fstpl {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _fmod
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -95,7 +90,6 @@ define float @log(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _log
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -112,7 +106,6 @@ define float @log10(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _log10
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -132,7 +125,6 @@ define float @pow(float %x, float %y) #0 {
 ; CHECK-NEXT:    fxch %st(1)
 ; CHECK-NEXT:    fstpl {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _pow
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -149,7 +141,6 @@ define float @sin(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _sin
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
diff --git a/llvm/test/CodeGen/X86/fp-strict-scalar-cmp.ll b/llvm/test/CodeGen/X86/fp-strict-scalar-cmp.ll
index cb1876fee05aea..47a83e9ae581a2 100644
--- a/llvm/test/CodeGen/X86/fp-strict-scalar-cmp.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-scalar-cmp.ll
@@ -4186,7 +4186,6 @@ define void @foo(float %0, float %1) #0 {
 ; X87-CMOV-NEXT:    flds {{[0-9]+}}(%esp)
 ; X87-CMOV-NEXT:    fucompi %st(1), %st
 ; X87-CMOV-NEXT:    fstp %st(0)
-; X87-CMOV-NEXT:    wait
 ; X87-CMOV-NEXT:    ja bar # TAILCALL
 ; X87-CMOV-NEXT:  # %bb.1:
 ; X87-CMOV-NEXT:    retl
diff --git a/llvm/test/CodeGen/X86/fp-strict-scalar.ll b/llvm/test/CodeGen/X86/fp-strict-scalar.ll
index f1be74f5c3ac4c..5939bb4189b7ab 100644
--- a/llvm/test/CodeGen/X86/fp-strict-scalar.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-scalar.ll
@@ -660,7 +660,6 @@ define double @fma_f64(double %a, double %b, double %c) nounwind strictfp {
 ; X87-NEXT:    fstpl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstpl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll fma
 ; X87-NEXT:    addl $24, %esp
 ; X87-NEXT:    retl
@@ -717,7 +716,6 @@ define float @fma_f32(float %a, float %b, float %c) nounwind strictfp {
 ; X87-NEXT:    fstps {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstps {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstps (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll fmaf
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    retl
diff --git a/llvm/test/CodeGen/X86/fp128-cast-strict.ll b/llvm/test/CodeGen/X86/fp128-cast-strict.ll
index f141153d059acb..a7cc2ae49e005f 100644
--- a/llvm/test/CodeGen/X86/fp128-cast-strict.ll
+++ b/llvm/test/CodeGen/X86/fp128-cast-strict.ll
@@ -28,7 +28,7 @@ define dso_local void @TestFPExtF16_F128() nounwind strictfp {
 ; X64-AVX512-LABEL: TestFPExtF16_F128:
 ; X64-AVX512:       # %bb.0: # %entry
 ; X64-AVX512-NEXT:    pushq %rax
-; X64-AVX512-NEXT:    vmovsh vf16(%rip), %xmm0
+; X64-AVX512-NEXT:    vmovsh {{.*#+}} xmm0 = mem[0],zero,zero,zero,zero,zero,zero,zero
 ; X64-AVX512-NEXT:    callq __extendhftf2@PLT
 ; X64-AVX512-NEXT:    vmovaps %xmm0, vf128(%rip)
 ; X64-AVX512-NEXT:    popq %rax
@@ -167,7 +167,6 @@ define dso_local void @TestFPExtF80_F128() nounwind strictfp {
 ; X64-SSE-NEXT:    subq $24, %rsp
 ; X64-SSE-NEXT:    fldt vf80(%rip)
 ; X64-SSE-NEXT:    fstpt (%rsp)
-; X64-SSE-NEXT:    wait
 ; X64-SSE-NEXT:    callq __extendxftf2@PLT
 ; X64-SSE-NEXT:    movaps %xmm0, vf128(%rip)
 ; X64-SSE-NEXT:    addq $24, %rsp
@@ -178,7 +177,6 @@ define dso_local void @TestFPExtF80_F128() nounwind strictfp {
 ; X64-AVX-NEXT:    subq $24, %rsp
 ; X64-AVX-NEXT:    fldt vf80(%rip)
 ; X64-AVX-NEXT:    fstpt (%rsp)
-; X64-AVX-NEXT:    wait
 ; X64-AVX-NEXT:    callq __extendxftf2@PLT
 ; X64-AVX-NEXT:    vmovaps %xmm0, vf128(%rip)
 ; X64-AVX-NEXT:    addq $24, %rsp
diff --git a/llvm/test/CodeGen/X86/fp80-strict-libcalls.ll b/llvm/test/CodeGen/X86/fp80-strict-libcalls.ll
index 4d50b15e5c185b..9462acd53c4392 100644
--- a/llvm/test/CodeGen/X86/fp80-strict-libcalls.ll
+++ b/llvm/test/CodeGen/X86/fp80-strict-libcalls.ll
@@ -12,7 +12,6 @@ define x86_fp80 @fma(x86_fp80 %x, x86_fp80 %y, x86_fp80 %z) nounwind strictfp {
 ; X86-NEXT:    fstpt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll fmal
 ; X86-NEXT:    addl $36, %esp
 ; X86-NEXT:    retl
@@ -26,7 +25,6 @@ define x86_fp80 @fma(x86_fp80 %x, x86_fp80 %y, x86_fp80 %z) nounwind strictfp {
 ; X64-NEXT:    fstpt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq fmal@PLT
 ; X64-NEXT:    addq $56, %rsp
 ; X64-NEXT:    retq
@@ -43,7 +41,6 @@ define x86_fp80 @frem(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll fmodl
 ; X86-NEXT:    addl $24, %esp
 ; X86-NEXT:    retl
@@ -55,7 +52,6 @@ define x86_fp80 @frem(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq fmodl@PLT
 ; X64-NEXT:    addq $40, %rsp
 ; X64-NEXT:    retq
@@ -70,7 +66,6 @@ define x86_fp80 @ceil(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll ceill
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -80,7 +75,6 @@ define x86_fp80 @ceil(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq ceill@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -95,7 +89,6 @@ define x86_fp80 @cos(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll cosl
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -105,7 +98,6 @@ define x86_fp80 @cos(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq cosl@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -120,7 +112,6 @@ define x86_fp80 @exp(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll expl
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -130,7 +121,6 @@ define x86_fp80 @exp(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq expl@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -145,7 +135,6 @@ define x86_fp80 @exp2(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll exp2l
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -155,7 +144,6 @@ define x86_fp80 @exp2(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq exp2l@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -170,7 +158,6 @@ define x86_fp80 @floor(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll floorl
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -180,7 +167,6 @@ define x86_fp80 @floor(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq floorl@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -195,7 +181,6 @@ define x86_fp80 @log(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll logl
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -205,7 +190,6 @@ define x86_fp80 @log(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq logl@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -220,7 +204,6 @@ define x86_fp80 @log10(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll log10l
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -230,7 +213,6 @@ define x86_fp80 @log10(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq log10l@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -245,7 +227,6 @@ define x86_fp80 @log2(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll log2l
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -255,7 +236,6 @@ define x86_fp80 @log2(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq log2l@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -272,7 +252,6 @@ define x86_fp80 @maxnum(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll fmaxl
 ; X86-NEXT:    addl $24, %esp
 ; X86-NEXT:    retl
@@ -284,7 +263,6 @@ define x86_fp80 @maxnum(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq fmaxl@PLT
 ; X64-NEXT:    addq $40, %rsp
 ; X64-NEXT:    retq
@@ -301,7 +279,6 @@ define x86_fp80 @minnum(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll fminl
 ; X86-NEXT:    addl $24, %esp
 ; X86-NEXT:    retl
@@ -313,7 +290,6 @@ define x86_fp80 @minnum(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq fminl@PLT
 ; X64-NEXT:    addq $40, %rsp
 ; X64-NEXT:    retq
@@ -328,7 +304,6 @@ define x86_fp80 @nearbyint(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll nearbyintl
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -338,7 +313,6 @@ define x86_fp80 @nearbyint(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq nearbyintl@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -355,7 +329,6 @@ define x86_fp80 @pow(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll powl
 ; X86-NEXT:    addl $24, %esp
 ; X86-NEXT:    retl
@@ -367,7 +340,6 @@ define x86_fp80 @pow(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq powl@PLT
 ; X64-NEXT:    addq $40, %rsp
 ; X64-NEXT:    retq
@@ -385,7 +357,6 @@ define x86_fp80 @powi(x86_fp80 %x, i32 %y) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll __powixf2
 ; X86-NEXT:    addl $16, %esp
 ; X86-NEXT:    retl
@@ -395,7 +366,6 @@ define x86_fp80 @powi(x86_fp80 %x, i32 %y) nounwind strictfp {
 ; X64-NEXT:...
[truncated]

Copy link

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Discourse for more information.

@AtariDreams AtariDreams changed the title [X86] Resolve FIXME: Add X86::FPCW as a rounding control register [X86] Resolve FIXME: Add FPCW as a rounding control register Feb 21, 2024
@phoebewang
Copy link
Contributor

In tests, this seems to only cause the wait instruction to disappear, which I do not see as necessary in these particular cases.

I think the wait instructions are intended. The affected tests are all for constrained intrinsics which care the precise exception. Per to SDM

Synchronization problems occur in the time between the moment when the exception is signaled and when it is
actually handled. Because of concurrent execution, integer or system instructions can be executed during this
time.

We don't know what instructions a function call may execute, so should emit wait for conservative.

@AtariDreams
Copy link
Contributor Author

In tests, this seems to only cause the wait instruction to disappear, which I do not see as necessary in these particular cases.

I think the wait instructions are intended. The affected tests are all for constrained intrinsics which care the precise exception. Per to SDM

Synchronization problems occur in the time between the moment when the exception is signaled and when it is
actually handled. Because of concurrent execution, integer or system instructions can be executed during this
time.

We don't know what instructions a function call may execute, so should emit wait for conservative.

In that case, let's fix the wait instruction to do that.

@AtariDreams
Copy link
Contributor Author

In tests, this seems to only cause the wait instruction to disappear, which I do not see as necessary in these particular cases.

I think the wait instructions are intended. The affected tests are all for constrained intrinsics which care the precise exception. Per to SDM

Synchronization problems occur in the time between the moment when the exception is signaled and when it is
actually handled. Because of concurrent execution, integer or system instructions can be executed during this
time.

We don't know what instructions a function call may execute, so should emit wait for conservative.

In that case, I think I have a solution. Question: do you know if implicit-defs for function calls are counted as operands in a call instruction?

@AtariDreams
Copy link
Contributor Author

@phoebewang So what do you think about that now?

@topperc
Copy link
Collaborator

topperc commented Feb 21, 2024

Is it possible to test this change? I don't know what getRoundingControlRegisters is used for.

@AtariDreams
Copy link
Contributor Author

Is it possible to test this change? I don't know what getRoundingControlRegisters is used for.

It's for adding those registers to the usedregs array so they aren't marked as killed between function calls.

This had the side effect of causing the wait insertion pass to not consider call sites as places to insert wait before, so as a result I'm sure other workarounds had to be done.

@topperc
Copy link
Collaborator

topperc commented Feb 21, 2024

Is it possible to test this change? I don't know what getRoundingControlRegisters is used for.

It's for adding those registers to the usedregs array so they aren't marked as killed between function calls.

This had the side effect of causing the wait insertion pass to not consider call sites as places to insert wait before, so as a result I'm sure other workarounds had to be done.

My point is that there should be some other test change that proves that this patch does something useful. Making a change and then another change to undo the only observed effect doesn't seem useful by itself.

@phoebewang
Copy link
Contributor

Agreed with @topperc. The origion code was to solve #59305. The problem seems exist on X87 too. You may modify llvm/test/CodeGen/X86/pr59305.ll by adding ; RUN: llc -mtriple=i686-pc-linux < %s | FileCheck %s

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link
Contributor

@KanRobert KanRobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, sorry for missing phoebe's suggestion

The reason adding fpcr broke tests is because that caused LLVM to no longer kill the instruction before a call, which prevented LLVM from treating x87 as an operand, which meant the call was not eligible for a wait before it as a result.

This patch now has LLVM add wait to the end of a x87 instruction is a call is immediately after.

To prevent tests from breaking, another fix had to be made: Now, we check if the instruction after a waiting instruction is a call, and if so, we insert the wait.
@AtariDreams
Copy link
Contributor Author

@phoebewang @KanRobert I do not have commit permissions.

@phoebewang phoebewang merged commit 3e40c96 into llvm:main Mar 5, 2024
4 checks passed
@AtariDreams AtariDreams deleted the other branch March 5, 2024 18:01
@MaskRay
Copy link
Member

MaskRay commented Mar 5, 2024

This commit is associated with a users.noreply.github.com email address. Per
https://discourse.llvm.org/t/hidden-emails-on-github-should-we-do-something-about-it/74223/32 , proper email addresses are preferred.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants