[X86] Resolve FIXME: Add FPCW as a rounding control register #82452

AtariDreams · 2024-02-21T01:54:33Z

To prevent tests from breaking, another fix had to be made: Now, we check if the instruction after a waiting instruction is a call, and if so, we insert the wait.

llvmbot · 2024-02-21T01:55:04Z

@llvm/pr-subscribers-backend-x86

Author: AtariDreams (AtariDreams)

Changes

In tests, this seems to only cause the wait instruction to disappear, which I do not see as necessary in these particular cases.

Patch is 29.15 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/82452.diff

10 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLoweringCall.cpp (+1-3)
(modified) llvm/test/CodeGen/X86/fp-intrinsics.ll (-19)
(modified) llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll (-9)
(modified) llvm/test/CodeGen/X86/fp-strict-scalar-cmp.ll (-1)
(modified) llvm/test/CodeGen/X86/fp-strict-scalar.ll (-2)
(modified) llvm/test/CodeGen/X86/fp128-cast-strict.ll (+1-3)
(modified) llvm/test/CodeGen/X86/fp80-strict-libcalls.ll (-48)
(modified) llvm/test/CodeGen/X86/half-constrained.ll (-7)
(modified) llvm/test/CodeGen/X86/ldexp-f80.ll (-1)
(modified) llvm/test/CodeGen/X86/vec-strict-128.ll (-2)

diff --git a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
index be8275c92e11ae..27784dc99202a0 100644
--- a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
+++ b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
@@ -670,9 +670,7 @@ const MCPhysReg *X86TargetLowering::getScratchRegisters(CallingConv::ID) const {
 }
 
 ArrayRef<MCPhysReg> X86TargetLowering::getRoundingControlRegisters() const {
-  // FIXME: We should def X86::FPCW for x87 as well. But it affects a lot of lit
-  // tests at the moment, which is not what we expected.
-  static const MCPhysReg RCRegs[] = {X86::MXCSR};
+  static const MCPhysReg RCRegs[] = {X86::MXCSR, X86::FPCW};
   return RCRegs;
 }
 
diff --git a/llvm/test/CodeGen/X86/fp-intrinsics.ll b/llvm/test/CodeGen/X86/fp-intrinsics.ll
index d2b45ee1e03e63..1387cac7fe1a70 100644
--- a/llvm/test/CodeGen/X86/fp-intrinsics.ll
+++ b/llvm/test/CodeGen/X86/fp-intrinsics.ll
@@ -301,7 +301,6 @@ define double @f6() #0 {
 ; X87-NEXT:    fstpl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll pow
 ; X87-NEXT:    addl $28, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -413,7 +412,6 @@ define double @f8() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll sin
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -464,7 +462,6 @@ define double @f9() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll cos
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -515,7 +512,6 @@ define double @f10() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll exp
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -566,7 +562,6 @@ define double @f11() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll exp2
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -617,7 +612,6 @@ define double @f12() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll log
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -668,7 +662,6 @@ define double @f13() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll log10
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -719,7 +712,6 @@ define double @f14() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll log2
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -770,7 +762,6 @@ define double @f15() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll rint
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -818,7 +809,6 @@ define double @f16() #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{\.?LCPI[0-9]+_[0-9]+}}
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll nearbyint
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -867,7 +857,6 @@ define double @f19() #0 {
 ; X87-NEXT:    fstpl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fld1
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll fmod
 ; X87-NEXT:    addl $28, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1607,7 +1596,6 @@ define i32 @f23(double %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll lrint
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1655,7 +1643,6 @@ define i32 @f24(float %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstps (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll lrintf
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1703,7 +1690,6 @@ define i64 @f25(double %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll llrint
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1751,7 +1737,6 @@ define i64 @f26(float %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstps (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll llrintf
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1799,7 +1784,6 @@ define i32 @f27(double %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll lround
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1846,7 +1830,6 @@ define i32 @f28(float %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstps (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll lroundf
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1893,7 +1876,6 @@ define i64 @f29(double %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    fldl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll llround
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
@@ -1940,7 +1922,6 @@ define i64 @f30(float %x) #0 {
 ; X87-NEXT:    .cfi_def_cfa_offset 16
 ; X87-NEXT:    flds {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstps (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll llroundf
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    .cfi_def_cfa_offset 4
diff --git a/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll b/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
index 1bc308bef8cccf..c9dd8a7374569f 100644
--- a/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll
@@ -7,7 +7,6 @@ define float @ceil(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _ceil
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -24,7 +23,6 @@ define float @cos(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _cos
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -41,7 +39,6 @@ define float @exp(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _exp
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -58,7 +55,6 @@ define float @floor(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _floor
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -78,7 +74,6 @@ define float @frem(float %x, float %y) #0 {
 ; CHECK-NEXT:    fxch %st(1)
 ; CHECK-NEXT:    fstpl {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _fmod
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -95,7 +90,6 @@ define float @log(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _log
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -112,7 +106,6 @@ define float @log10(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _log10
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -132,7 +125,6 @@ define float @pow(float %x, float %y) #0 {
 ; CHECK-NEXT:    fxch %st(1)
 ; CHECK-NEXT:    fstpl {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _pow
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
@@ -149,7 +141,6 @@ define float @sin(float %x) #0 {
 ; CHECK-NEXT:    subl $12, %esp
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    fstpl (%esp)
-; CHECK-NEXT:    wait
 ; CHECK-NEXT:    calll _sin
 ; CHECK-NEXT:    fstps {{[0-9]+}}(%esp)
 ; CHECK-NEXT:    flds {{[0-9]+}}(%esp)
diff --git a/llvm/test/CodeGen/X86/fp-strict-scalar-cmp.ll b/llvm/test/CodeGen/X86/fp-strict-scalar-cmp.ll
index cb1876fee05aea..47a83e9ae581a2 100644
--- a/llvm/test/CodeGen/X86/fp-strict-scalar-cmp.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-scalar-cmp.ll
@@ -4186,7 +4186,6 @@ define void @foo(float %0, float %1) #0 {
 ; X87-CMOV-NEXT:    flds {{[0-9]+}}(%esp)
 ; X87-CMOV-NEXT:    fucompi %st(1), %st
 ; X87-CMOV-NEXT:    fstp %st(0)
-; X87-CMOV-NEXT:    wait
 ; X87-CMOV-NEXT:    ja bar # TAILCALL
 ; X87-CMOV-NEXT:  # %bb.1:
 ; X87-CMOV-NEXT:    retl
diff --git a/llvm/test/CodeGen/X86/fp-strict-scalar.ll b/llvm/test/CodeGen/X86/fp-strict-scalar.ll
index f1be74f5c3ac4c..5939bb4189b7ab 100644
--- a/llvm/test/CodeGen/X86/fp-strict-scalar.ll
+++ b/llvm/test/CodeGen/X86/fp-strict-scalar.ll
@@ -660,7 +660,6 @@ define double @fma_f64(double %a, double %b, double %c) nounwind strictfp {
 ; X87-NEXT:    fstpl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstpl {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstpl (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll fma
 ; X87-NEXT:    addl $24, %esp
 ; X87-NEXT:    retl
@@ -717,7 +716,6 @@ define float @fma_f32(float %a, float %b, float %c) nounwind strictfp {
 ; X87-NEXT:    fstps {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstps {{[0-9]+}}(%esp)
 ; X87-NEXT:    fstps (%esp)
-; X87-NEXT:    wait
 ; X87-NEXT:    calll fmaf
 ; X87-NEXT:    addl $12, %esp
 ; X87-NEXT:    retl
diff --git a/llvm/test/CodeGen/X86/fp128-cast-strict.ll b/llvm/test/CodeGen/X86/fp128-cast-strict.ll
index f141153d059acb..a7cc2ae49e005f 100644
--- a/llvm/test/CodeGen/X86/fp128-cast-strict.ll
+++ b/llvm/test/CodeGen/X86/fp128-cast-strict.ll
@@ -28,7 +28,7 @@ define dso_local void @TestFPExtF16_F128() nounwind strictfp {
 ; X64-AVX512-LABEL: TestFPExtF16_F128:
 ; X64-AVX512:       # %bb.0: # %entry
 ; X64-AVX512-NEXT:    pushq %rax
-; X64-AVX512-NEXT:    vmovsh vf16(%rip), %xmm0
+; X64-AVX512-NEXT:    vmovsh {{.*#+}} xmm0 = mem[0],zero,zero,zero,zero,zero,zero,zero
 ; X64-AVX512-NEXT:    callq __extendhftf2@PLT
 ; X64-AVX512-NEXT:    vmovaps %xmm0, vf128(%rip)
 ; X64-AVX512-NEXT:    popq %rax
@@ -167,7 +167,6 @@ define dso_local void @TestFPExtF80_F128() nounwind strictfp {
 ; X64-SSE-NEXT:    subq $24, %rsp
 ; X64-SSE-NEXT:    fldt vf80(%rip)
 ; X64-SSE-NEXT:    fstpt (%rsp)
-; X64-SSE-NEXT:    wait
 ; X64-SSE-NEXT:    callq __extendxftf2@PLT
 ; X64-SSE-NEXT:    movaps %xmm0, vf128(%rip)
 ; X64-SSE-NEXT:    addq $24, %rsp
@@ -178,7 +177,6 @@ define dso_local void @TestFPExtF80_F128() nounwind strictfp {
 ; X64-AVX-NEXT:    subq $24, %rsp
 ; X64-AVX-NEXT:    fldt vf80(%rip)
 ; X64-AVX-NEXT:    fstpt (%rsp)
-; X64-AVX-NEXT:    wait
 ; X64-AVX-NEXT:    callq __extendxftf2@PLT
 ; X64-AVX-NEXT:    vmovaps %xmm0, vf128(%rip)
 ; X64-AVX-NEXT:    addq $24, %rsp
diff --git a/llvm/test/CodeGen/X86/fp80-strict-libcalls.ll b/llvm/test/CodeGen/X86/fp80-strict-libcalls.ll
index 4d50b15e5c185b..9462acd53c4392 100644
--- a/llvm/test/CodeGen/X86/fp80-strict-libcalls.ll
+++ b/llvm/test/CodeGen/X86/fp80-strict-libcalls.ll
@@ -12,7 +12,6 @@ define x86_fp80 @fma(x86_fp80 %x, x86_fp80 %y, x86_fp80 %z) nounwind strictfp {
 ; X86-NEXT:    fstpt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll fmal
 ; X86-NEXT:    addl $36, %esp
 ; X86-NEXT:    retl
@@ -26,7 +25,6 @@ define x86_fp80 @fma(x86_fp80 %x, x86_fp80 %y, x86_fp80 %z) nounwind strictfp {
 ; X64-NEXT:    fstpt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq fmal@PLT
 ; X64-NEXT:    addq $56, %rsp
 ; X64-NEXT:    retq
@@ -43,7 +41,6 @@ define x86_fp80 @frem(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll fmodl
 ; X86-NEXT:    addl $24, %esp
 ; X86-NEXT:    retl
@@ -55,7 +52,6 @@ define x86_fp80 @frem(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq fmodl@PLT
 ; X64-NEXT:    addq $40, %rsp
 ; X64-NEXT:    retq
@@ -70,7 +66,6 @@ define x86_fp80 @ceil(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll ceill
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -80,7 +75,6 @@ define x86_fp80 @ceil(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq ceill@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -95,7 +89,6 @@ define x86_fp80 @cos(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll cosl
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -105,7 +98,6 @@ define x86_fp80 @cos(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq cosl@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -120,7 +112,6 @@ define x86_fp80 @exp(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll expl
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -130,7 +121,6 @@ define x86_fp80 @exp(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq expl@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -145,7 +135,6 @@ define x86_fp80 @exp2(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll exp2l
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -155,7 +144,6 @@ define x86_fp80 @exp2(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq exp2l@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -170,7 +158,6 @@ define x86_fp80 @floor(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll floorl
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -180,7 +167,6 @@ define x86_fp80 @floor(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq floorl@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -195,7 +181,6 @@ define x86_fp80 @log(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll logl
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -205,7 +190,6 @@ define x86_fp80 @log(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq logl@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -220,7 +204,6 @@ define x86_fp80 @log10(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll log10l
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -230,7 +213,6 @@ define x86_fp80 @log10(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq log10l@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -245,7 +227,6 @@ define x86_fp80 @log2(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll log2l
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -255,7 +236,6 @@ define x86_fp80 @log2(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq log2l@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -272,7 +252,6 @@ define x86_fp80 @maxnum(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll fmaxl
 ; X86-NEXT:    addl $24, %esp
 ; X86-NEXT:    retl
@@ -284,7 +263,6 @@ define x86_fp80 @maxnum(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq fmaxl@PLT
 ; X64-NEXT:    addq $40, %rsp
 ; X64-NEXT:    retq
@@ -301,7 +279,6 @@ define x86_fp80 @minnum(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll fminl
 ; X86-NEXT:    addl $24, %esp
 ; X86-NEXT:    retl
@@ -313,7 +290,6 @@ define x86_fp80 @minnum(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq fminl@PLT
 ; X64-NEXT:    addq $40, %rsp
 ; X64-NEXT:    retq
@@ -328,7 +304,6 @@ define x86_fp80 @nearbyint(x86_fp80 %x) nounwind strictfp {
 ; X86-NEXT:    subl $12, %esp
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll nearbyintl
 ; X86-NEXT:    addl $12, %esp
 ; X86-NEXT:    retl
@@ -338,7 +313,6 @@ define x86_fp80 @nearbyint(x86_fp80 %x) nounwind strictfp {
 ; X64-NEXT:    subq $24, %rsp
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq nearbyintl@PLT
 ; X64-NEXT:    addq $24, %rsp
 ; X64-NEXT:    retq
@@ -355,7 +329,6 @@ define x86_fp80 @pow(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X86-NEXT:    fldt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll powl
 ; X86-NEXT:    addl $24, %esp
 ; X86-NEXT:    retl
@@ -367,7 +340,6 @@ define x86_fp80 @pow(x86_fp80 %x, x86_fp80 %y) nounwind strictfp {
 ; X64-NEXT:    fldt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt {{[0-9]+}}(%rsp)
 ; X64-NEXT:    fstpt (%rsp)
-; X64-NEXT:    wait
 ; X64-NEXT:    callq powl@PLT
 ; X64-NEXT:    addq $40, %rsp
 ; X64-NEXT:    retq
@@ -385,7 +357,6 @@ define x86_fp80 @powi(x86_fp80 %x, i32 %y) nounwind strictfp {
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT:    movl %eax, {{[0-9]+}}(%esp)
 ; X86-NEXT:    fstpt (%esp)
-; X86-NEXT:    wait
 ; X86-NEXT:    calll __powixf2
 ; X86-NEXT:    addl $16, %esp
 ; X86-NEXT:    retl
@@ -395,7 +366,6 @@ define x86_fp80 @powi(x86_fp80 %x, i32 %y) nounwind strictfp {
 ; X64-NEXT:...
[truncated]

github-actions · 2024-02-21T01:55:43Z

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Discourse for more information.

phoebewang · 2024-02-21T08:41:17Z

In tests, this seems to only cause the wait instruction to disappear, which I do not see as necessary in these particular cases.

I think the wait instructions are intended. The affected tests are all for constrained intrinsics which care the precise exception. Per to SDM

Synchronization problems occur in the time between the moment when the exception is signaled and when it is
actually handled. Because of concurrent execution, integer or system instructions can be executed during this
time.

We don't know what instructions a function call may execute, so should emit wait for conservative.

AtariDreams · 2024-02-21T19:20:00Z

In tests, this seems to only cause the wait instruction to disappear, which I do not see as necessary in these particular cases.

I think the wait instructions are intended. The affected tests are all for constrained intrinsics which care the precise exception. Per to SDM

Synchronization problems occur in the time between the moment when the exception is signaled and when it is
actually handled. Because of concurrent execution, integer or system instructions can be executed during this
time.

We don't know what instructions a function call may execute, so should emit wait for conservative.

In that case, let's fix the wait instruction to do that.

AtariDreams · 2024-02-21T19:27:36Z

In tests, this seems to only cause the wait instruction to disappear, which I do not see as necessary in these particular cases.

I think the wait instructions are intended. The affected tests are all for constrained intrinsics which care the precise exception. Per to SDM

Synchronization problems occur in the time between the moment when the exception is signaled and when it is
actually handled. Because of concurrent execution, integer or system instructions can be executed during this
time.

We don't know what instructions a function call may execute, so should emit wait for conservative.

In that case, I think I have a solution. Question: do you know if implicit-defs for function calls are counted as operands in a call instruction?

AtariDreams · 2024-02-21T20:48:04Z

@phoebewang So what do you think about that now?

llvm/lib/Target/X86/X86RegisterInfo.td

topperc · 2024-02-21T21:27:11Z

Is it possible to test this change? I don't know what getRoundingControlRegisters is used for.

AtariDreams · 2024-02-21T22:06:33Z

Is it possible to test this change? I don't know what getRoundingControlRegisters is used for.

It's for adding those registers to the usedregs array so they aren't marked as killed between function calls.

This had the side effect of causing the wait insertion pass to not consider call sites as places to insert wait before, so as a result I'm sure other workarounds had to be done.

topperc · 2024-02-21T22:41:41Z

Is it possible to test this change? I don't know what getRoundingControlRegisters is used for.

It's for adding those registers to the usedregs array so they aren't marked as killed between function calls.

This had the side effect of causing the wait insertion pass to not consider call sites as places to insert wait before, so as a result I'm sure other workarounds had to be done.

My point is that there should be some other test change that proves that this patch does something useful. Making a change and then another change to undo the only observed effect doesn't seem useful by itself.

phoebewang · 2024-02-22T03:21:55Z

Agreed with @topperc. The origion code was to solve #59305. The problem seems exist on X87 too. You may modify llvm/test/CodeGen/X86/pr59305.ll by adding ; RUN: llc -mtriple=i686-pc-linux < %s | FileCheck %s

llvm/lib/Target/X86/X86InsertWait.cpp

phoebewang

LGTM, thanks!

llvm/lib/Target/X86/X86InstrInfo.cpp

KanRobert

LGTM, sorry for missing phoebe's suggestion

The reason adding fpcr broke tests is because that caused LLVM to no longer kill the instruction before a call, which prevented LLVM from treating x87 as an operand, which meant the call was not eligible for a wait before it as a result. This patch now has LLVM add wait to the end of a x87 instruction is a call is immediately after. To prevent tests from breaking, another fix had to be made: Now, we check if the instruction after a waiting instruction is a call, and if so, we insert the wait.

AtariDreams · 2024-03-04T16:54:25Z

@phoebewang @KanRobert I do not have commit permissions.

MaskRay · 2024-03-05T18:39:40Z

This commit is associated with a users.noreply.github.com email address. Per
https://discourse.llvm.org/t/hidden-emails-on-github-should-we-do-something-about-it/74223/32 , proper email addresses are preferred.

llvmbot added the backend:X86 label Feb 21, 2024

AtariDreams force-pushed the other branch from a227ca1 to 6c2452a Compare February 21, 2024 01:59

AtariDreams changed the title ~~[X86] Resolve FIXME: Add X86::FPCW as a rounding control register~~ [X86] Resolve FIXME: Add FPCW as a rounding control register Feb 21, 2024

AtariDreams force-pushed the other branch from 6c2452a to 7e29bab Compare February 21, 2024 02:28

RKSimon requested review from phoebewang and RKSimon February 21, 2024 09:40

AtariDreams force-pushed the other branch 5 times, most recently from 09d9ee3 to 16fcef2 Compare February 21, 2024 19:19

AtariDreams force-pushed the other branch 5 times, most recently from 93c408b to 60ac384 Compare February 21, 2024 20:41

topperc reviewed Feb 21, 2024

View reviewed changes

llvm/lib/Target/X86/X86RegisterInfo.td Outdated Show resolved Hide resolved

AtariDreams force-pushed the other branch 2 times, most recently from 4b4552e to 986e88a Compare February 22, 2024 00:50

phoebewang reviewed Feb 22, 2024

View reviewed changes

llvm/lib/Target/X86/X86InsertWait.cpp Outdated Show resolved Hide resolved

AtariDreams force-pushed the other branch 9 times, most recently from 3d0c0f6 to 542ce93 Compare February 23, 2024 00:36

AtariDreams force-pushed the other branch 3 times, most recently from 48d68f5 to 82e6233 Compare March 1, 2024 20:09

AtariDreams requested review from phoebewang and topperc March 1, 2024 20:54

phoebewang reviewed Mar 2, 2024

View reviewed changes

llvm/lib/Target/X86/X86InsertWait.cpp Outdated Show resolved Hide resolved

AtariDreams force-pushed the other branch from 82e6233 to 882a31b Compare March 3, 2024 16:35

AtariDreams requested a review from phoebewang March 3, 2024 16:36

AtariDreams force-pushed the other branch 2 times, most recently from 9938093 to 7df048c Compare March 3, 2024 16:41

phoebewang approved these changes Mar 4, 2024

View reviewed changes

KanRobert requested changes Mar 4, 2024

View reviewed changes

llvm/lib/Target/X86/X86InstrInfo.cpp Show resolved Hide resolved

KanRobert approved these changes Mar 4, 2024

View reviewed changes

AtariDreams force-pushed the other branch from 7df048c to 637777e Compare March 4, 2024 16:48

phoebewang merged commit 3e40c96 into llvm:main Mar 5, 2024
4 checks passed

AtariDreams deleted the other branch March 5, 2024 18:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[X86] Resolve FIXME: Add FPCW as a rounding control register #82452

[X86] Resolve FIXME: Add FPCW as a rounding control register #82452

AtariDreams commented Feb 21, 2024 •

edited

llvmbot commented Feb 21, 2024

github-actions bot commented Feb 21, 2024

phoebewang commented Feb 21, 2024

AtariDreams commented Feb 21, 2024

AtariDreams commented Feb 21, 2024

AtariDreams commented Feb 21, 2024

topperc commented Feb 21, 2024

AtariDreams commented Feb 21, 2024

topperc commented Feb 21, 2024

phoebewang commented Feb 22, 2024

phoebewang left a comment

KanRobert left a comment

AtariDreams commented Mar 4, 2024

MaskRay commented Mar 5, 2024

[X86] Resolve FIXME: Add FPCW as a rounding control register #82452

[X86] Resolve FIXME: Add FPCW as a rounding control register #82452

Conversation

AtariDreams commented Feb 21, 2024 • edited

llvmbot commented Feb 21, 2024

github-actions bot commented Feb 21, 2024

phoebewang commented Feb 21, 2024

AtariDreams commented Feb 21, 2024

AtariDreams commented Feb 21, 2024

AtariDreams commented Feb 21, 2024

topperc commented Feb 21, 2024

AtariDreams commented Feb 21, 2024

topperc commented Feb 21, 2024

phoebewang commented Feb 22, 2024

phoebewang left a comment

Choose a reason for hiding this comment

KanRobert left a comment

Choose a reason for hiding this comment

AtariDreams commented Mar 4, 2024

MaskRay commented Mar 5, 2024

AtariDreams commented Feb 21, 2024 •

edited