
Don't use libcalls for i128<->fp conversion on 32-bit targets #196330

Open
nikic wants to merge 3 commits into llvm:main from nikic:i128tof

nikic (Contributor) commented May 7, 2026

The libcalls for i128<->fp conversion are only available on 64-bit targets.

I wonder whether I should be dropping some of the test coverage affected here. The generated implementations are very large, and I'm not sure all of these tests really provide value. It's not like anyone will be able to determine correctness based on the generated assembly.

Fixes #196107.
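
As a rough illustration of the rule this patch changes (not LLVM's actual code), the new default can be modelled as below. The names `defaultMaxLargeFPConvertBitWidth` and `expandsInline` are hypothetical. The sketch assumes that a conversion is expanded inline in IR whenever its integer width exceeds the limit stored in `MaxLargeFPConvertBitWidthSupported`.

```cpp
#include <cassert>

// Hypothetical stand-in for the default set in TargetLoweringBase after this
// patch: 128 on 64-bit architectures, 64 everywhere else.
constexpr unsigned defaultMaxLargeFPConvertBitWidth(bool IsArch64Bit) {
  return IsArch64Bit ? 128 : 64;
}

// Assumed rule: an fp<->int conversion is expanded inline when the integer
// type is wider than the limit; otherwise it follows the target's usual
// lowering (native instruction or libcall).
constexpr bool expandsInline(unsigned IntBits, bool IsArch64Bit) {
  return IntBits > defaultMaxLargeFPConvertBitWidth(IsArch64Bit);
}

// i128 conversions are expanded inline on 32-bit targets, but not on 64-bit.
static_assert(expandsInline(128, /*IsArch64Bit=*/false), "32-bit: inline");
static_assert(!expandsInline(128, /*IsArch64Bit=*/true), "64-bit: not inline");
```

Under this model, i64 conversions are unaffected on either kind of target; only the i128 case moves from a libcall to inline code on 32-bit.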

github-actions Bot commented May 7, 2026

🪟 Windows x64 Test Results

  • 134521 tests passed
  • 3260 tests skipped

✅ The build succeeded and all tests passed.

github-actions Bot commented May 7, 2026

🐧 Linux x64 Test Results

  • 195139 tests passed
  • 5193 tests skipped

✅ The build succeeded and all tests passed.

llvmorg-github-actions Bot commented May 7, 2026

@llvm/pr-subscribers-backend-webassembly

@llvm/pr-subscribers-backend-risc-v

Author: Nikita Popov (nikic)

Changes

The libcalls are only available on 64-bit.

I wonder whether I should be dropping some of the test coverage affected here. The generated implementations are very large, and I'm not sure all of these tests really provide value. It's not like anyone will be able to determine correctness based on the generated assembly.

Fixes #196107.


Patch is 1.29 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/196330.diff

18 Files Affected:

  • (modified) llvm/lib/CodeGen/ExpandIRInsts.cpp (+6-1)
  • (modified) llvm/lib/CodeGen/TargetLoweringBase.cpp (+2-1)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (-2)
  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (-2)
  • (modified) llvm/test/CodeGen/ARM/fpclamptosat.ll (+4957-1257)
  • (modified) llvm/test/CodeGen/ARM/fpclamptosat_vec.ll (+2804-968)
  • (modified) llvm/test/CodeGen/ARM/fptosi-sat-scalar.ll (+1895-961)
  • (modified) llvm/test/CodeGen/ARM/fptoui-sat-scalar.ll (+947-679)
  • (modified) llvm/test/CodeGen/Thumb2/mve-fpclamptosat_vec.ll (+2406-569)
  • (modified) llvm/test/CodeGen/Thumb2/mve-fptosi-sat-vector.ll (+2136-1845)
  • (modified) llvm/test/CodeGen/Thumb2/mve-fptoui-sat-vector.ll (+1075-1011)
  • (modified) llvm/test/CodeGen/Thumb2/mve-vqdmulh.ll (+224-47)
  • (modified) llvm/test/CodeGen/WebAssembly/fpclamptosat.ll (+1494-254)
  • (modified) llvm/test/CodeGen/WebAssembly/fpclamptosat_vec.ll (+1990-594)
  • (modified) llvm/test/CodeGen/X86/avx512fp16-cvt.ll (+367-58)
  • (modified) llvm/test/CodeGen/X86/fp128-cast.ll (+771-77)
  • (modified) llvm/test/CodeGen/X86/fptosi-sat-scalar.ll (+2316-912)
  • (modified) llvm/test/CodeGen/X86/fptoui-sat-scalar.ll (+1207-731)
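
The `scalarize` change in `ExpandIRInsts.cpp` handles vector `fptoui_sat`/`fptosi_sat` intrinsics element by element. A minimal sketch of that idea in plain C++ follows, assuming the usual saturating semantics (NaN becomes 0, out-of-range values clamp). The names `fptosiSat` and `scalarizeFptosiSat` are hypothetical, and `i32` elements are used here only to keep the example portable; the patch itself concerns wider integer types.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Scalar model of a saturating float -> i32 conversion: NaN maps to 0 and
// out-of-range inputs clamp to the nearest representable integer.
std::int32_t fptosiSat(float X) {
  if (std::isnan(X))
    return 0;
  if (X >= 2147483648.0f) // 2^31 and above
    return std::numeric_limits<std::int32_t>::max();
  if (X <= -2147483648.0f) // -2^31 and below
    return std::numeric_limits<std::int32_t>::min();
  return static_cast<std::int32_t>(X); // in range: truncate toward zero
}

// Scalarization: extract each element, apply the scalar operation, and
// insert the result at the same index of the output vector.
template <std::size_t N>
std::array<std::int32_t, N> scalarizeFptosiSat(const std::array<float, N> &V) {
  std::array<std::int32_t, N> Result{};
  for (std::size_t Idx = 0; Idx < N; ++Idx)
    Result[Idx] = fptosiSat(V[Idx]);
  return Result;
}
```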
diff --git a/llvm/lib/CodeGen/ExpandIRInsts.cpp b/llvm/lib/CodeGen/ExpandIRInsts.cpp
index be7400fac1daf..c7946417cf1b8 100644
--- a/llvm/lib/CodeGen/ExpandIRInsts.cpp
+++ b/llvm/lib/CodeGen/ExpandIRInsts.cpp
@@ -1133,7 +1133,12 @@ static void scalarize(Instruction *I,
     else if (auto *CastI = dyn_cast<CastInst>(I))
       NewOp = Builder.CreateCast(CastI->getOpcode(), Ext,
                                  I->getType()->getScalarType());
-    else
+    else if (auto *II = dyn_cast<IntrinsicInst>(I)) {
+      assert(II->getIntrinsicID() == Intrinsic::fptoui_sat ||
+             II->getIntrinsicID() == Intrinsic::fptosi_sat);
+      NewOp = Builder.CreateIntrinsic(VTy->getElementType(),
+                                      II->getIntrinsicID(), Ext);
+    } else
       llvm_unreachable("Unsupported instruction type");
 
     Result = Builder.CreateInsertElement(Result, NewOp, Idx);
diff --git a/llvm/lib/CodeGen/TargetLoweringBase.cpp b/llvm/lib/CodeGen/TargetLoweringBase.cpp
index 7dcdf7f952e27..0d9e64b3aba03 100644
--- a/llvm/lib/CodeGen/TargetLoweringBase.cpp
+++ b/llvm/lib/CodeGen/TargetLoweringBase.cpp
@@ -1048,7 +1048,8 @@ TargetLoweringBase::TargetLoweringBase(const TargetMachine &tm,
   // division.
   MaxDivRemBitWidthSupported = 128;
 
-  MaxLargeFPConvertBitWidthSupported = 128;
+  MaxLargeFPConvertBitWidthSupported =
+      TM.getTargetTriple().isArch64Bit() ? 128 : 64;
 
   MinCmpXchgSizeInBits = 0;
   SupportsUnalignedAtomics = false;
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index d126616312748..3d469e018c25d 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1895,8 +1895,6 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
 
   setMaxDivRemBitWidthSupported(Subtarget.is64Bit() ? 128 : 64);
 
-  setMaxLargeFPConvertBitWidthSupported(Subtarget.is64Bit() ? 128 : 64);
-
   // Disable strict node mutation.
   IsStrictFPEnabled = true;
   EnableExtLdPromotion = true;
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 5a1171b2b4ee6..7954102fbd91f 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -174,8 +174,6 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
 
   setMaxDivRemBitWidthSupported(Subtarget.is64Bit() ? 128 : 64);
 
-  setMaxLargeFPConvertBitWidthSupported(128);
-
   // Set up the register classes.
   addRegisterClass(MVT::i8, &X86::GR8RegClass);
   addRegisterClass(MVT::i16, &X86::GR16RegClass);
diff --git a/llvm/test/CodeGen/ARM/fpclamptosat.ll b/llvm/test/CodeGen/ARM/fpclamptosat.ll
index a6f0a03fc7e5b..3cb1c30f63a7b 100644
--- a/llvm/test/CodeGen/ARM/fpclamptosat.ll
+++ b/llvm/test/CodeGen/ARM/fpclamptosat.ll
@@ -943,126 +943,536 @@ entry:
 
 define i64 @stest_f64i64(double %x) {
 ; SOFT-LABEL: stest_f64i64:
-; SOFT:       @ %bb.0: @ %entry
+; SOFT:       @ %bb.0: @ %entryfp-to-i-entry
 ; SOFT-NEXT:    .save {r4, r5, r6, r7, lr}
 ; SOFT-NEXT:    push {r4, r5, r6, r7, lr}
-; SOFT-NEXT:    .pad #4
-; SOFT-NEXT:    sub sp, #4
-; SOFT-NEXT:    bl __fixdfti
+; SOFT-NEXT:    .pad #68
+; SOFT-NEXT:    sub sp, #68
+; SOFT-NEXT:    movs r7, #1
 ; SOFT-NEXT:    movs r4, #0
-; SOFT-NEXT:    mvns r5, r4
-; SOFT-NEXT:    ldr r6, .LCPI18_0
-; SOFT-NEXT:    adds r7, r0, #1
+; SOFT-NEXT:    ldr r3, .LCPI18_1
+; SOFT-NEXT:    mov r5, r3
+; SOFT-NEXT:    subs r5, #52
+; SOFT-NEXT:    mov r6, r4
+; SOFT-NEXT:    sbcs r6, r4
+; SOFT-NEXT:    lsls r2, r1, #1
+; SOFT-NEXT:    lsrs r2, r2, #21
+; SOFT-NEXT:    subs r5, r2, r5
+; SOFT-NEXT:    mov r5, r4
+; SOFT-NEXT:    sbcs r5, r6
+; SOFT-NEXT:    str r7, [sp, #28] @ 4-byte Spill
+; SOFT-NEXT:    bhs .LBB18_2
+; SOFT-NEXT:  @ %bb.1:
+; SOFT-NEXT:    mov r0, r4
+; SOFT-NEXT:    mov r6, r4
+; SOFT-NEXT:    mov r2, r4
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    b .LBB18_5
+; SOFT-NEXT:  .LBB18_2: @ %fp-to-i-if-check.exp.size
+; SOFT-NEXT:    asrs r5, r1, #31
+; SOFT-NEXT:    str r5, [sp, #24] @ 4-byte Spill
+; SOFT-NEXT:    orrs r5, r7
+; SOFT-NEXT:    str r5, [sp, #20] @ 4-byte Spill
+; SOFT-NEXT:    ldr r5, .LCPI18_0
+; SOFT-NEXT:    mvns r6, r5
+; SOFT-NEXT:    mvns r1, r1
+; SOFT-NEXT:    orrs r1, r6
+; SOFT-NEXT:    subs r1, r5, r1
+; SOFT-NEXT:    mvns r0, r0
+; SOFT-NEXT:    mvns r0, r0
+; SOFT-NEXT:    cmp r3, r2
+; SOFT-NEXT:    bls .LBB18_4
+; SOFT-NEXT:  @ %bb.3: @ %fp-to-i-if-exp.small
+; SOFT-NEXT:    subs r2, r3, r2
+; SOFT-NEXT:    bl __aeabi_llsr
 ; SOFT-NEXT:    mov r7, r1
-; SOFT-NEXT:    sbcs r7, r6
+; SOFT-NEXT:    str r1, [sp, #12] @ 4-byte Spill
+; SOFT-NEXT:    mov r5, r0
+; SOFT-NEXT:    str r0, [sp, #8] @ 4-byte Spill
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    ldr r6, [sp, #20] @ 4-byte Reload
+; SOFT-NEXT:    mov r2, r6
+; SOFT-NEXT:    mov r3, r4
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    str r0, [sp, #16] @ 4-byte Spill
+; SOFT-NEXT:    str r1, [sp, #4] @ 4-byte Spill
+; SOFT-NEXT:    mov r0, r7
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    mov r2, r6
+; SOFT-NEXT:    mov r3, r4
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    mov r7, r1
+; SOFT-NEXT:    ldr r1, [sp, #4] @ 4-byte Reload
+; SOFT-NEXT:    adds r6, r0, r1
+; SOFT-NEXT:    adcs r7, r4
+; SOFT-NEXT:    mov r0, r5
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    ldr r5, [sp, #24] @ 4-byte Reload
+; SOFT-NEXT:    mov r2, r5
+; SOFT-NEXT:    mov r3, r4
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    adds r6, r0, r6
+; SOFT-NEXT:    adcs r1, r4
+; SOFT-NEXT:    adds r0, r7, r1
+; SOFT-NEXT:    str r0, [sp, #20] @ 4-byte Spill
+; SOFT-NEXT:    mov r7, r4
+; SOFT-NEXT:    adcs r7, r4
+; SOFT-NEXT:    ldr r0, [sp, #12] @ 4-byte Reload
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    mov r2, r5
+; SOFT-NEXT:    mov r3, r4
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    mov r5, r1
+; SOFT-NEXT:    ldr r1, [sp, #20] @ 4-byte Reload
+; SOFT-NEXT:    adds r0, r0, r1
+; SOFT-NEXT:    str r0, [sp, #20] @ 4-byte Spill
+; SOFT-NEXT:    adcs r5, r7
+; SOFT-NEXT:    ldr r0, [sp, #24] @ 4-byte Reload
+; SOFT-NEXT:    mov r1, r0
+; SOFT-NEXT:    ldr r2, [sp, #8] @ 4-byte Reload
+; SOFT-NEXT:    ldr r3, [sp, #12] @ 4-byte Reload
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    mov r2, r0
+; SOFT-NEXT:    ldr r0, [sp, #16] @ 4-byte Reload
+; SOFT-NEXT:    ldr r3, [sp, #20] @ 4-byte Reload
+; SOFT-NEXT:    adds r2, r3, r2
+; SOFT-NEXT:    adcs r1, r5
+; SOFT-NEXT:    b .LBB18_5
+; SOFT-NEXT:  .LBB18_4: @ %fp-to-i-if-exp.large
+; SOFT-NEXT:    str r4, [sp, #60]
+; SOFT-NEXT:    str r4, [sp, #44]
+; SOFT-NEXT:    str r4, [sp, #40]
+; SOFT-NEXT:    str r4, [sp, #36]
+; SOFT-NEXT:    str r4, [sp, #32]
+; SOFT-NEXT:    add r3, sp, #48
+; SOFT-NEXT:    stm r3!, {r0, r1, r4}
+; SOFT-NEXT:    ldr r0, .LCPI18_2
+; SOFT-NEXT:    adds r6, r2, r0
+; SOFT-NEXT:    movs r7, #31
+; SOFT-NEXT:    lsrs r0, r6, #3
+; SOFT-NEXT:    ands r6, r7
+; SOFT-NEXT:    movs r1, #12
+; SOFT-NEXT:    ands r1, r0
+; SOFT-NEXT:    add r0, sp, #32
+; SOFT-NEXT:    adds r0, #16
+; SOFT-NEXT:    subs r5, r0, r1
+; SOFT-NEXT:    ldr r0, [r5, #4]
+; SOFT-NEXT:    str r0, [sp, #16] @ 4-byte Spill
+; SOFT-NEXT:    ldr r0, [r5, #8]
+; SOFT-NEXT:    ldr r1, [r5, #12]
+; SOFT-NEXT:    lsrs r3, r0, #1
+; SOFT-NEXT:    lsls r0, r6
+; SOFT-NEXT:    eors r7, r6
+; SOFT-NEXT:    ldr r2, [sp, #16] @ 4-byte Reload
+; SOFT-NEXT:    lsrs r2, r2, #1
+; SOFT-NEXT:    lsrs r2, r7
+; SOFT-NEXT:    orrs r2, r0
+; SOFT-NEXT:    lsls r1, r6
+; SOFT-NEXT:    lsrs r3, r7
+; SOFT-NEXT:    orrs r3, r1
+; SOFT-NEXT:    ldr r0, [sp, #20] @ 4-byte Reload
+; SOFT-NEXT:    ldr r1, [sp, #24] @ 4-byte Reload
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    str r0, [sp, #12] @ 4-byte Spill
+; SOFT-NEXT:    str r1, [sp, #8] @ 4-byte Spill
+; SOFT-NEXT:    ldr r1, [sp, #16] @ 4-byte Reload
+; SOFT-NEXT:    lsls r1, r6
+; SOFT-NEXT:    ldr r5, [r5]
+; SOFT-NEXT:    lsrs r0, r5, #1
+; SOFT-NEXT:    lsrs r0, r7
+; SOFT-NEXT:    orrs r0, r1
+; SOFT-NEXT:    mov r3, r0
+; SOFT-NEXT:    lsls r5, r6
+; SOFT-NEXT:    ldr r0, [sp, #24] @ 4-byte Reload
+; SOFT-NEXT:    mov r1, r0
+; SOFT-NEXT:    mov r2, r5
+; SOFT-NEXT:    mov r6, r3
+; SOFT-NEXT:    str r3, [sp, #4] @ 4-byte Spill
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    ldr r2, [sp, #12] @ 4-byte Reload
+; SOFT-NEXT:    adds r0, r0, r2
+; SOFT-NEXT:    str r0, [sp, #12] @ 4-byte Spill
+; SOFT-NEXT:    ldr r0, [sp, #8] @ 4-byte Reload
+; SOFT-NEXT:    adcs r1, r0
+; SOFT-NEXT:    str r1, [sp, #8] @ 4-byte Spill
+; SOFT-NEXT:    mov r0, r6
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    ldr r6, [sp, #20] @ 4-byte Reload
+; SOFT-NEXT:    mov r2, r6
+; SOFT-NEXT:    mov r3, r4
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    str r0, [sp] @ 4-byte Spill
+; SOFT-NEXT:    mov r7, r1
+; SOFT-NEXT:    mov r0, r5
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    mov r2, r6
+; SOFT-NEXT:    mov r3, r4
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    str r0, [sp, #16] @ 4-byte Spill
+; SOFT-NEXT:    ldr r0, [sp] @ 4-byte Reload
+; SOFT-NEXT:    adds r6, r0, r1
+; SOFT-NEXT:    adcs r7, r4
+; SOFT-NEXT:    mov r0, r5
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    ldr r2, [sp, #24] @ 4-byte Reload
+; SOFT-NEXT:    mov r3, r4
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    adds r6, r0, r6
+; SOFT-NEXT:    adcs r1, r4
+; SOFT-NEXT:    adds r5, r7, r1
+; SOFT-NEXT:    mov r7, r4
+; SOFT-NEXT:    adcs r7, r4
+; SOFT-NEXT:    ldr r0, [sp, #4] @ 4-byte Reload
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    ldr r2, [sp, #24] @ 4-byte Reload
+; SOFT-NEXT:    mov r3, r4
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    mov r2, r0
+; SOFT-NEXT:    ldr r0, [sp, #16] @ 4-byte Reload
+; SOFT-NEXT:    adds r2, r2, r5
+; SOFT-NEXT:    adcs r1, r7
+; SOFT-NEXT:    ldr r3, [sp, #12] @ 4-byte Reload
+; SOFT-NEXT:    adds r2, r2, r3
+; SOFT-NEXT:    ldr r3, [sp, #8] @ 4-byte Reload
+; SOFT-NEXT:    adcs r1, r3
+; SOFT-NEXT:  .LBB18_5: @ %fp-to-i-cleanup
+; SOFT-NEXT:    mvns r3, r4
+; SOFT-NEXT:    ldr r5, .LCPI18_3
+; SOFT-NEXT:    adds r7, r0, #1
+; SOFT-NEXT:    mov r7, r6
+; SOFT-NEXT:    sbcs r7, r5
 ; SOFT-NEXT:    mov r7, r2
 ; SOFT-NEXT:    sbcs r7, r4
-; SOFT-NEXT:    mov r7, r3
+; SOFT-NEXT:    mov r7, r1
 ; SOFT-NEXT:    sbcs r7, r4
-; SOFT-NEXT:    bge .LBB18_8
-; SOFT-NEXT:  @ %bb.1: @ %entry
-; SOFT-NEXT:    bge .LBB18_9
-; SOFT-NEXT:  .LBB18_2: @ %entry
-; SOFT-NEXT:    bge .LBB18_10
-; SOFT-NEXT:  .LBB18_3: @ %entry
-; SOFT-NEXT:    blt .LBB18_5
-; SOFT-NEXT:  .LBB18_4: @ %entry
-; SOFT-NEXT:    mov r0, r5
-; SOFT-NEXT:  .LBB18_5: @ %entry
-; SOFT-NEXT:    movs r6, #1
-; SOFT-NEXT:    lsls r6, r6, #31
+; SOFT-NEXT:    bge .LBB18_15
+; SOFT-NEXT:  @ %bb.6: @ %fp-to-i-cleanup
+; SOFT-NEXT:    ldr r7, [sp, #28] @ 4-byte Reload
+; SOFT-NEXT:    bge .LBB18_16
+; SOFT-NEXT:  .LBB18_7: @ %fp-to-i-cleanup
+; SOFT-NEXT:    bge .LBB18_17
+; SOFT-NEXT:  .LBB18_8: @ %fp-to-i-cleanup
+; SOFT-NEXT:    blt .LBB18_10
+; SOFT-NEXT:  .LBB18_9: @ %fp-to-i-cleanup
+; SOFT-NEXT:    mov r0, r3
+; SOFT-NEXT:  .LBB18_10: @ %fp-to-i-cleanup
+; SOFT-NEXT:    lsls r5, r7, #31
 ; SOFT-NEXT:    rsbs r7, r0, #0
-; SOFT-NEXT:    mov r7, r6
-; SOFT-NEXT:    sbcs r7, r1
 ; SOFT-NEXT:    mov r7, r5
+; SOFT-NEXT:    sbcs r7, r6
+; SOFT-NEXT:    mov r7, r3
 ; SOFT-NEXT:    sbcs r7, r2
-; SOFT-NEXT:    sbcs r5, r3
-; SOFT-NEXT:    bge .LBB18_11
-; SOFT-NEXT:  @ %bb.6: @ %entry
-; SOFT-NEXT:    bge .LBB18_12
-; SOFT-NEXT:  .LBB18_7: @ %entry
-; SOFT-NEXT:    add sp, #4
-; SOFT-NEXT:    pop {r4, r5, r6, r7, pc}
-; SOFT-NEXT:  .LBB18_8: @ %entry
-; SOFT-NEXT:    mov r3, r4
-; SOFT-NEXT:    blt .LBB18_2
-; SOFT-NEXT:  .LBB18_9: @ %entry
-; SOFT-NEXT:    mov r2, r4
-; SOFT-NEXT:    blt .LBB18_3
-; SOFT-NEXT:  .LBB18_10: @ %entry
-; SOFT-NEXT:    mov r1, r6
-; SOFT-NEXT:    bge .LBB18_4
-; SOFT-NEXT:    b .LBB18_5
-; SOFT-NEXT:  .LBB18_11: @ %entry
+; SOFT-NEXT:    sbcs r3, r1
+; SOFT-NEXT:    blt .LBB18_12
+; SOFT-NEXT:  @ %bb.11: @ %fp-to-i-cleanup
 ; SOFT-NEXT:    mov r0, r4
-; SOFT-NEXT:    blt .LBB18_7
-; SOFT-NEXT:  .LBB18_12: @ %entry
+; SOFT-NEXT:  .LBB18_12: @ %fp-to-i-cleanup
+; SOFT-NEXT:    blt .LBB18_14
+; SOFT-NEXT:  @ %bb.13: @ %fp-to-i-cleanup
+; SOFT-NEXT:    mov r6, r5
+; SOFT-NEXT:  .LBB18_14: @ %fp-to-i-cleanup
 ; SOFT-NEXT:    mov r1, r6
-; SOFT-NEXT:    add sp, #4
+; SOFT-NEXT:    add sp, #68
 ; SOFT-NEXT:    pop {r4, r5, r6, r7, pc}
+; SOFT-NEXT:  .LBB18_15: @ %fp-to-i-cleanup
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    ldr r7, [sp, #28] @ 4-byte Reload
+; SOFT-NEXT:    blt .LBB18_7
+; SOFT-NEXT:  .LBB18_16: @ %fp-to-i-cleanup
+; SOFT-NEXT:    mov r2, r4
+; SOFT-NEXT:    blt .LBB18_8
+; SOFT-NEXT:  .LBB18_17: @ %fp-to-i-cleanup
+; SOFT-NEXT:    mov r6, r5
+; SOFT-NEXT:    bge .LBB18_9
+; SOFT-NEXT:    b .LBB18_10
 ; SOFT-NEXT:    .p2align 2
-; SOFT-NEXT:  @ %bb.13:
+; SOFT-NEXT:  @ %bb.18:
 ; SOFT-NEXT:  .LCPI18_0:
+; SOFT-NEXT:    .long 1048575 @ 0xfffff
+; SOFT-NEXT:  .LCPI18_1:
+; SOFT-NEXT:    .long 1075 @ 0x433
+; SOFT-NEXT:  .LCPI18_2:
+; SOFT-NEXT:    .long 4294966221 @ 0xfffffbcd
+; SOFT-NEXT:  .LCPI18_3:
 ; SOFT-NEXT:    .long 2147483647 @ 0x7fffffff
 ;
 ; VFP2-LABEL: stest_f64i64:
-; VFP2:       @ %bb.0: @ %entry
-; VFP2-NEXT:    .save {r4, r5, r7, lr}
-; VFP2-NEXT:    push {r4, r5, r7, lr}
-; VFP2-NEXT:    bl __fixdfti
-; VFP2-NEXT:    subs.w r4, r0, #-1
-; VFP2-NEXT:    mvn lr, #-2147483648
-; VFP2-NEXT:    sbcs.w r4, r1, lr
+; VFP2:       @ %bb.0: @ %entryfp-to-i-entry
+; VFP2-NEXT:    .save {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; VFP2-NEXT:    push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; VFP2-NEXT:    .pad #52
+; VFP2-NEXT:    sub sp, #52
+; VFP2-NEXT:    vmov r3, r1, d0
 ; VFP2-NEXT:    mov.w r12, #0
-; VFP2-NEXT:    sbcs r4, r2, #0
-; VFP2-NEXT:    sbcs r4, r3, #0
+; VFP2-NEXT:    movw r2, #1023
+; VFP2-NEXT:    ubfx r0, r1, #20, #11
+; VFP2-NEXT:    cmp r0, r2
+; VFP2-NEXT:    bhs .LBB18_2
+; VFP2-NEXT:  @ %bb.1:
+; VFP2-NEXT:    movs r0, #0
+; VFP2-NEXT:    movs r3, #0
+; VFP2-NEXT:    mov.w lr, #0
+; VFP2-NEXT:    movs r2, #0
+; VFP2-NEXT:    b .LBB18_5
+; VFP2-NEXT:  .LBB18_2: @ %fp-to-i-if-check.exp.size
+; VFP2-NEXT:    movs r2, #1
+; VFP2-NEXT:    mov r4, r1
+; VFP2-NEXT:    orr.w r10, r2, r1, asr #31
+; VFP2-NEXT:    bfi r4, r2, #20, #12
+; VFP2-NEXT:    asrs r1, r1, #31
+; VFP2-NEXT:    movw r2, #1074
+; VFP2-NEXT:    cmp r0, r2
+; VFP2-NEXT:    bhi .LBB18_4
+; VFP2-NEXT:  @ %bb.3: @ %fp-to-i-if-exp.small
+; VFP2-NEXT:    movw r2, #1075
+; VFP2-NEXT:    subs r2, r2, r0
+; VFP2-NEXT:    lsr.w lr, r3, r2
+; VFP2-NEXT:    rsb.w r3, r2, #32
+; VFP2-NEXT:    lsr.w r2, r4, r2
+; VFP2-NEXT:    lsl.w r3, r4, r3
+; VFP2-NEXT:    orr.w r5, lr, r3
+; VFP2-NEXT:    movw r3, #1043
+; VFP2-NEXT:    subs r0, r3, r0
+; VFP2-NEXT:    it pl
+; VFP2-NEXT:    lsrpl.w r5, r4, r0
+; VFP2-NEXT:    umull r0, r3, r5, r10
 ; VFP2-NEXT:    mov.w r4, #0
+; VFP2-NEXT:    it pl
+; VFP2-NEXT:    movpl r2, #0
+; VFP2-NEXT:    mov.w lr, #0
+; VFP2-NEXT:    umlal r3, r4, r2, r10
+; VFP2-NEXT:    umull r8, r7, r5, r1
+; VFP2-NEXT:    adds.w r6, r8, r3
+; VFP2-NEXT:    adcs.w r6, r4, r7
+; VFP2-NEXT:    umlal r3, r4, r5, r1
+; VFP2-NEXT:    adc r6, lr, #0
+; VFP2-NEXT:    mla r7, r1, r2, r7
+; VFP2-NEXT:    umlal r4, r6, r2, r1
+; VFP2-NEXT:    mla r1, r1, r5, r7
+; VFP2-NEXT:    adds.w lr, r4, r8
+; VFP2-NEXT:    adc.w r2, r6, r1
+; VFP2-NEXT:    b .LBB18_5
+; VFP2-NEXT:  .LBB18_4: @ %fp-to-i-if-exp.large
+; VFP2-NEXT:    add r2, sp, #32
+; VFP2-NEXT:    subw r0, r0, #1075
+; VFP2-NEXT:    mov.w lr, #0
+; VFP2-NEXT:    stm.w r2, {r3, r4, lr}
+; VFP2-NEXT:    movs r2, #12
+; VFP2-NEXT:    add r3, sp, #16
+; VFP2-NEXT:    and.w r2, r2, r0, lsr #3
+; VFP2-NEXT:    adds r3, #16
+; VFP2-NEXT:    str.w lr, [sp, #44]
+; VFP2-NEXT:    strd lr, lr, [sp, #24]
+; VFP2-NEXT:    subs r2, r3, r2
+; VFP2-NEXT:    strd lr, lr, [sp, #16]
+; VFP2-NEXT:    ldrd r3, r4, [r2]
+; VFP2-NEXT:    str r4, [sp, #4] @ 4-byte Spill
+; VFP2-NEXT:    ldr r7, [r2, #8]
+; VFP2-NEXT:    str r7, [sp, #12] @ 4-byte Spill
+; VFP2-NEXT:    lsrs r7, r3, #1
+; VFP2-NEXT:    ldr r2, [r2, #12]
+; VFP2-NEXT:    str r2, [sp, #8] @ 4-byte Spill
+; VFP2-NEXT:    and r2, r0, #31
+; VFP2-NEXT:    eor r12, r2, #31
+; VFP2-NEXT:    lsl.w r5, r3, r2
+; VFP2-NEXT:    lsr.w r0, r7, r12
+; VFP2-NEXT:    lsl.w r7, r4, r2
+; VFP2-NEXT:    orr.w r11, r7, r0
+; VFP2-NEXT:    umull r0, r3, r5, r10
+; VFP2-NEXT:    movs r7, #0
+; VFP2-NEXT:    umull r4, r8, r5, r1
+; VFP2-NEXT:    umlal r3, r7, r11, r10
+; VFP2-NEXT:    str r4, [sp] @ 4-byte Spill
+; VFP2-NEXT:    adds.w r9, r4, r3
+; VFP2-NEXT:    adcs.w r6, r7, r8
+; VFP2-NEXT:    umlal r3, r7, r5, r1
+; VFP2-NEXT:    ldr r6, [sp, #8] @ 4-byte Reload
+; VFP2-NEXT:    adc r9, lr, #0
+; VFP2-NEXT:    umlal r7, r9, r11, r1
+; VFP2-NEXT:    lsl.w lr, r6, r2
+; VFP2-NEXT:    ldr r6, [sp, #12] @ 4-byte Reload
+; VFP2-NEXT:    lsrs r4, r6, #1
+; VFP2-NEXT:    lsl.w r2, r6, r2
+; VFP2-NEXT:    lsr.w r4, r4, r12
+; VFP2-NEXT:    orr.w lr, lr, r4
+; VFP2-NEXT:    ldr r4, [sp, #4] @ 4-byte Reload
+; VFP2-NEXT:    ldr r6, [sp] @ 4-byte Reload
+; VFP2-NEXT:    lsrs.w r4, r4, #1
+; VFP2-NEXT:    lsr.w r4, r4, r12
+; VFP2-NEXT:    orrs r2, r4
+; VFP2-NEXT:    umull r12, r4, r10, r2
+; VFP2-NEXT:    mla r4, r10, lr, r4
+; VFP2-NEXT:    mla r2, r1, r2, r4
+; VFP2-NEXT:    mla r4, r1, r11, r8
+; VFP2-NEXT:    mla r1, r1, r5, r4
+; VFP2-NEXT:    adds.w r5, r6, r12
+; VFP2-NEXT:    mov.w r12, #0
+; VFP2-NEXT:    adcs r1, r2
+; VFP2-NEXT:    adds.w lr, r7, r5
+; VFP2-NEXT:    adc.w r2, r9, r1
+; VFP2-NEXT:  .LBB18_5: @ %fp-to-i-cleanup
+; VFP2-NEXT:    subs.w r7, r0, #-1
+; VFP2-NEXT:    mvn r1, #-2147483648
+; VFP2-NEXT:    sbcs.w r7, r3, r1
+; VFP2-NEXT:    mov.w r6, #-2147483648
+; VFP2-NEXT:    sbcs r7, lr, #0
+; VFP2-NEXT:    sbcs r7, r2, #0
 ; VFP2-NEXT:    it lt
-; VFP2-NEXT:    movlt r4, #1
-; VFP2-NEXT:    cmp r4, #0
-; VFP2-NEXT:    itet eq
-; VFP2-NEXT:    moveq r3, r4
-; VFP2-NEXT:    movne r4, r2
-; VFP2-NEXT:    moveq r1, lr
-; VFP2-NEXT:    mov.w r2, #-1
-; VFP2-NEXT:    it eq
-; VFP2-NEXT:    moveq r0, r2
+; VFP2-NEXT:    movlt r1, r3
+; VFP2-NEXT:    mov.w r7, #0
+; VFP2-NEXT:    mov.w r3, #-1
+; VFP2-NEXT:    it ge
+; VFP2-NEXT:    movge r0, r3
+; VFP2-NEXT:    it lt
+; VFP2-NEXT:    movlt r7, #1
+; VFP2-NEXT:    cmp r7, #0
+; VFP2-NEXT:    ite eq
+; VFP2-NEXT:    moveq r2, r7
+; VFP2-NEXT:    movne r7, lr
 ; VFP2-NEXT:    rsbs r5, r0, #0
-; VFP2-NEXT:    mov.w lr, #-2147483648
-; VFP2-NEXT:    sbcs.w r5, lr, r1
-; VFP2-NEXT:    sbcs.w r4, r2, r4
-; VFP2-NEXT:    sbcs r2, r3
+; VFP2-NEXT:    sbcs.w r5, r6, r1
+; VFP2-NEXT:    sbcs.w r7, r3, r7
+; VFP2-NEXT:    sbcs.w r2, r3, r2
 ; VFP2-NEXT:    itt ge
 ; VFP2-NEXT:    movge r0, r12
-; VFP2-NEXT:    movge r1, lr
-; VFP2-NEXT:    pop {r4, r5, r7, pc}
+; VFP2-NEXT:    movge r1, r6
+; VFP2-NEXT:    add sp, #52
+; VFP2-NEXT:    pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}
 ;
 ; FULL-LABEL: stest_f64i64:
-; FULL:       @ %bb.0: @ %entry
-; FULL-NEXT:    .save {r4, r5, r7, lr}
-; FULL-NEXT:    push {r4, r5, r7, lr}
-; FULL-NEXT:    bl __fixdfti
-; FULL-NEXT:    subs.w lr, r0, #-1
-; FULL-NEXT:    mvn r12, #-2147483648
-; FULL-NEXT:    sbcs.w lr, r1, r12
-; FULL-NEXT:    sbcs lr, r2, #0
-; FULL-NEXT:    sbcs lr, r3, #0
-; FULL-NEXT:    cset lr, lt
-; FULL-NEXT:    cmp.w lr, #0
-; FULL-NEXT:    csel r5, r3, lr, ne
-; FULL-NEXT:    mov.w r3, #-1
-; FULL-NEXT:    csel r0, r0, r3, ne
-; FULL-NEXT:    csel r1, r1, r12, ne
-; FULL-NEXT:    csel r2, r2, lr, ne
-; FULL-NEXT:    rsbs r4, r0, #0
-; FULL-NEXT:    mov.w r12, #-2147483648
-; FULL-NEXT:    sbcs.w r4, r12, r1
-; FULL-NEXT:    sbcs.w r2, r3, r2
-; FULL-NEXT:    sbcs.w r2, r3, r5
-; FULL-NEXT:    it ge
-; FULL-NEXT:    movge r0, #0
-; FULL-NEXT:    csel r1, r1, r12, lt
-; FULL-NEXT:    pop {r4, r5, r7, pc}
+; FULL:       @ %bb.0: @ %entryfp-to-i-entry
+; FULL-NEXT:    .save {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; FULL-NEXT:    push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; FULL-NEXT:    .pad #52
+; FULL-NEXT:    sub sp, #52
+; FULL-NEXT:    vmov r3, r1, d0
+; FULL-NEXT:    mov.w r12, #0
+; FULL-NEXT:    movw r0, #1023
+; FULL-NEXT:    ubfx r2, r1, #20, #11
+; FULL-NEXT:    cmp r2, r0
+; FULL-NEXT:    bhs .LBB18_2
+; FULL-NEXT:  @ %bb.1:
+; FULL-NEXT:    movs r2, #0
+; FULL-NEXT:    movs r3, #0
+; FULL-NEXT:    movs r0, #0
+; FULL-NEXT:    movs r1, #0
+; FULL-NEXT:    b .LBB18_5
+; FULL-NEXT:  .LBB18_2: @ %fp-to-i-if-check.exp.size
+; FULL-NEXT:    movs r0, #1
+; FULL-NEXT:    mov r4,...
[truncated]
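
The block labels in the new check lines (`fp-to-i-entry`, `fp-to-i-if-check.exp.size`, `fp-to-i-if-exp.small`, `fp-to-i-if-exp.large`, `fp-to-i-cleanup`) and the constants 1023 and 1075 suggest the shape of the inline expansion. Below is a minimal C++ sketch under that reading, not the code the pass emits. It assumes a host compiler with the `__int128` extension (GCC or Clang on a 64-bit host). `fpToI128` and `clampToI64` are hypothetical names, and saturating out-of-range inputs in `fpToI128` is a choice made here only to keep the sketch well-defined.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: the host compiler provides the __int128 extension.
using i128 = __int128;

// Truncating double -> i128 conversion, laid out like the labelled blocks.
i128 fpToI128(double X) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));

  // fp-to-i-entry: split into sign, biased exponent and 53-bit significand.
  const i128 Sign = (Bits >> 63) ? -1 : 1;
  const unsigned Exp = static_cast<unsigned>((Bits >> 52) & 0x7ff);
  const std::uint64_t Sig = (Bits & 0xfffffffffffffULL) | (1ULL << 52);

  // Biased exponent below 1023 means |X| < 1, which truncates to zero.
  if (Exp < 1023)
    return 0;

  // Sketch-only guard: magnitudes of 2^127 or more (and NaN/infinity) do
  // not fit, so saturate instead of shifting out of range.
  const i128 Max =
      static_cast<i128>((static_cast<unsigned __int128>(1) << 127) - 1);
  if (Exp > 1149)
    return Sign > 0 ? Max : -Max - 1;

  // fp-to-i-if-check.exp.size: 1075 = 1023 (bias) + 52 (fraction bits).
  i128 Magnitude;
  if (Exp < 1075)
    Magnitude = static_cast<i128>(Sig >> (1075 - Exp)); // exp.small
  else
    Magnitude = static_cast<i128>(Sig) << (Exp - 1075); // exp.large

  // fp-to-i-cleanup: apply the sign by multiplying with +1 or -1.
  return Magnitude * Sign;
}

// Clamp to the i64 range, which is what the compare-and-select sequence
// after the conversion appears to do in stest_f64i64.
std::int64_t clampToI64(i128 V) {
  const i128 Hi = std::numeric_limits<std::int64_t>::max();
  const i128 Lo = std::numeric_limits<std::int64_t>::min();
  const i128 Clamped = V > Hi ? Hi : (V < Lo ? Lo : V);
  assert(Lo <= Clamped && Clamped <= Hi);
  return static_cast<std::int64_t>(Clamped);
}
```

On a 32-bit target there is no native 128-bit type, so each shift and multiply above becomes a multi-word sequence, which is consistent with the growth in the generated assembly.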

@llvm/pr-subscribers-backend-x86

+; SOFT-NEXT:    mov r2, r6
+; SOFT-NEXT:    mov r3, r4
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    str r0, [sp] @ 4-byte Spill
+; SOFT-NEXT:    mov r7, r1
+; SOFT-NEXT:    mov r0, r5
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    mov r2, r6
+; SOFT-NEXT:    mov r3, r4
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    str r0, [sp, #16] @ 4-byte Spill
+; SOFT-NEXT:    ldr r0, [sp] @ 4-byte Reload
+; SOFT-NEXT:    adds r6, r0, r1
+; SOFT-NEXT:    adcs r7, r4
+; SOFT-NEXT:    mov r0, r5
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    ldr r2, [sp, #24] @ 4-byte Reload
+; SOFT-NEXT:    mov r3, r4
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    adds r6, r0, r6
+; SOFT-NEXT:    adcs r1, r4
+; SOFT-NEXT:    adds r5, r7, r1
+; SOFT-NEXT:    mov r7, r4
+; SOFT-NEXT:    adcs r7, r4
+; SOFT-NEXT:    ldr r0, [sp, #4] @ 4-byte Reload
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    ldr r2, [sp, #24] @ 4-byte Reload
+; SOFT-NEXT:    mov r3, r4
+; SOFT-NEXT:    bl __aeabi_lmul
+; SOFT-NEXT:    mov r2, r0
+; SOFT-NEXT:    ldr r0, [sp, #16] @ 4-byte Reload
+; SOFT-NEXT:    adds r2, r2, r5
+; SOFT-NEXT:    adcs r1, r7
+; SOFT-NEXT:    ldr r3, [sp, #12] @ 4-byte Reload
+; SOFT-NEXT:    adds r2, r2, r3
+; SOFT-NEXT:    ldr r3, [sp, #8] @ 4-byte Reload
+; SOFT-NEXT:    adcs r1, r3
+; SOFT-NEXT:  .LBB18_5: @ %fp-to-i-cleanup
+; SOFT-NEXT:    mvns r3, r4
+; SOFT-NEXT:    ldr r5, .LCPI18_3
+; SOFT-NEXT:    adds r7, r0, #1
+; SOFT-NEXT:    mov r7, r6
+; SOFT-NEXT:    sbcs r7, r5
 ; SOFT-NEXT:    mov r7, r2
 ; SOFT-NEXT:    sbcs r7, r4
-; SOFT-NEXT:    mov r7, r3
+; SOFT-NEXT:    mov r7, r1
 ; SOFT-NEXT:    sbcs r7, r4
-; SOFT-NEXT:    bge .LBB18_8
-; SOFT-NEXT:  @ %bb.1: @ %entry
-; SOFT-NEXT:    bge .LBB18_9
-; SOFT-NEXT:  .LBB18_2: @ %entry
-; SOFT-NEXT:    bge .LBB18_10
-; SOFT-NEXT:  .LBB18_3: @ %entry
-; SOFT-NEXT:    blt .LBB18_5
-; SOFT-NEXT:  .LBB18_4: @ %entry
-; SOFT-NEXT:    mov r0, r5
-; SOFT-NEXT:  .LBB18_5: @ %entry
-; SOFT-NEXT:    movs r6, #1
-; SOFT-NEXT:    lsls r6, r6, #31
+; SOFT-NEXT:    bge .LBB18_15
+; SOFT-NEXT:  @ %bb.6: @ %fp-to-i-cleanup
+; SOFT-NEXT:    ldr r7, [sp, #28] @ 4-byte Reload
+; SOFT-NEXT:    bge .LBB18_16
+; SOFT-NEXT:  .LBB18_7: @ %fp-to-i-cleanup
+; SOFT-NEXT:    bge .LBB18_17
+; SOFT-NEXT:  .LBB18_8: @ %fp-to-i-cleanup
+; SOFT-NEXT:    blt .LBB18_10
+; SOFT-NEXT:  .LBB18_9: @ %fp-to-i-cleanup
+; SOFT-NEXT:    mov r0, r3
+; SOFT-NEXT:  .LBB18_10: @ %fp-to-i-cleanup
+; SOFT-NEXT:    lsls r5, r7, #31
 ; SOFT-NEXT:    rsbs r7, r0, #0
-; SOFT-NEXT:    mov r7, r6
-; SOFT-NEXT:    sbcs r7, r1
 ; SOFT-NEXT:    mov r7, r5
+; SOFT-NEXT:    sbcs r7, r6
+; SOFT-NEXT:    mov r7, r3
 ; SOFT-NEXT:    sbcs r7, r2
-; SOFT-NEXT:    sbcs r5, r3
-; SOFT-NEXT:    bge .LBB18_11
-; SOFT-NEXT:  @ %bb.6: @ %entry
-; SOFT-NEXT:    bge .LBB18_12
-; SOFT-NEXT:  .LBB18_7: @ %entry
-; SOFT-NEXT:    add sp, #4
-; SOFT-NEXT:    pop {r4, r5, r6, r7, pc}
-; SOFT-NEXT:  .LBB18_8: @ %entry
-; SOFT-NEXT:    mov r3, r4
-; SOFT-NEXT:    blt .LBB18_2
-; SOFT-NEXT:  .LBB18_9: @ %entry
-; SOFT-NEXT:    mov r2, r4
-; SOFT-NEXT:    blt .LBB18_3
-; SOFT-NEXT:  .LBB18_10: @ %entry
-; SOFT-NEXT:    mov r1, r6
-; SOFT-NEXT:    bge .LBB18_4
-; SOFT-NEXT:    b .LBB18_5
-; SOFT-NEXT:  .LBB18_11: @ %entry
+; SOFT-NEXT:    sbcs r3, r1
+; SOFT-NEXT:    blt .LBB18_12
+; SOFT-NEXT:  @ %bb.11: @ %fp-to-i-cleanup
 ; SOFT-NEXT:    mov r0, r4
-; SOFT-NEXT:    blt .LBB18_7
-; SOFT-NEXT:  .LBB18_12: @ %entry
+; SOFT-NEXT:  .LBB18_12: @ %fp-to-i-cleanup
+; SOFT-NEXT:    blt .LBB18_14
+; SOFT-NEXT:  @ %bb.13: @ %fp-to-i-cleanup
+; SOFT-NEXT:    mov r6, r5
+; SOFT-NEXT:  .LBB18_14: @ %fp-to-i-cleanup
 ; SOFT-NEXT:    mov r1, r6
-; SOFT-NEXT:    add sp, #4
+; SOFT-NEXT:    add sp, #68
 ; SOFT-NEXT:    pop {r4, r5, r6, r7, pc}
+; SOFT-NEXT:  .LBB18_15: @ %fp-to-i-cleanup
+; SOFT-NEXT:    mov r1, r4
+; SOFT-NEXT:    ldr r7, [sp, #28] @ 4-byte Reload
+; SOFT-NEXT:    blt .LBB18_7
+; SOFT-NEXT:  .LBB18_16: @ %fp-to-i-cleanup
+; SOFT-NEXT:    mov r2, r4
+; SOFT-NEXT:    blt .LBB18_8
+; SOFT-NEXT:  .LBB18_17: @ %fp-to-i-cleanup
+; SOFT-NEXT:    mov r6, r5
+; SOFT-NEXT:    bge .LBB18_9
+; SOFT-NEXT:    b .LBB18_10
 ; SOFT-NEXT:    .p2align 2
-; SOFT-NEXT:  @ %bb.13:
+; SOFT-NEXT:  @ %bb.18:
 ; SOFT-NEXT:  .LCPI18_0:
+; SOFT-NEXT:    .long 1048575 @ 0xfffff
+; SOFT-NEXT:  .LCPI18_1:
+; SOFT-NEXT:    .long 1075 @ 0x433
+; SOFT-NEXT:  .LCPI18_2:
+; SOFT-NEXT:    .long 4294966221 @ 0xfffffbcd
+; SOFT-NEXT:  .LCPI18_3:
 ; SOFT-NEXT:    .long 2147483647 @ 0x7fffffff
 ;
 ; VFP2-LABEL: stest_f64i64:
-; VFP2:       @ %bb.0: @ %entry
-; VFP2-NEXT:    .save {r4, r5, r7, lr}
-; VFP2-NEXT:    push {r4, r5, r7, lr}
-; VFP2-NEXT:    bl __fixdfti
-; VFP2-NEXT:    subs.w r4, r0, #-1
-; VFP2-NEXT:    mvn lr, #-2147483648
-; VFP2-NEXT:    sbcs.w r4, r1, lr
+; VFP2:       @ %bb.0: @ %entryfp-to-i-entry
+; VFP2-NEXT:    .save {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; VFP2-NEXT:    push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; VFP2-NEXT:    .pad #52
+; VFP2-NEXT:    sub sp, #52
+; VFP2-NEXT:    vmov r3, r1, d0
 ; VFP2-NEXT:    mov.w r12, #0
-; VFP2-NEXT:    sbcs r4, r2, #0
-; VFP2-NEXT:    sbcs r4, r3, #0
+; VFP2-NEXT:    movw r2, #1023
+; VFP2-NEXT:    ubfx r0, r1, #20, #11
+; VFP2-NEXT:    cmp r0, r2
+; VFP2-NEXT:    bhs .LBB18_2
+; VFP2-NEXT:  @ %bb.1:
+; VFP2-NEXT:    movs r0, #0
+; VFP2-NEXT:    movs r3, #0
+; VFP2-NEXT:    mov.w lr, #0
+; VFP2-NEXT:    movs r2, #0
+; VFP2-NEXT:    b .LBB18_5
+; VFP2-NEXT:  .LBB18_2: @ %fp-to-i-if-check.exp.size
+; VFP2-NEXT:    movs r2, #1
+; VFP2-NEXT:    mov r4, r1
+; VFP2-NEXT:    orr.w r10, r2, r1, asr #31
+; VFP2-NEXT:    bfi r4, r2, #20, #12
+; VFP2-NEXT:    asrs r1, r1, #31
+; VFP2-NEXT:    movw r2, #1074
+; VFP2-NEXT:    cmp r0, r2
+; VFP2-NEXT:    bhi .LBB18_4
+; VFP2-NEXT:  @ %bb.3: @ %fp-to-i-if-exp.small
+; VFP2-NEXT:    movw r2, #1075
+; VFP2-NEXT:    subs r2, r2, r0
+; VFP2-NEXT:    lsr.w lr, r3, r2
+; VFP2-NEXT:    rsb.w r3, r2, #32
+; VFP2-NEXT:    lsr.w r2, r4, r2
+; VFP2-NEXT:    lsl.w r3, r4, r3
+; VFP2-NEXT:    orr.w r5, lr, r3
+; VFP2-NEXT:    movw r3, #1043
+; VFP2-NEXT:    subs r0, r3, r0
+; VFP2-NEXT:    it pl
+; VFP2-NEXT:    lsrpl.w r5, r4, r0
+; VFP2-NEXT:    umull r0, r3, r5, r10
 ; VFP2-NEXT:    mov.w r4, #0
+; VFP2-NEXT:    it pl
+; VFP2-NEXT:    movpl r2, #0
+; VFP2-NEXT:    mov.w lr, #0
+; VFP2-NEXT:    umlal r3, r4, r2, r10
+; VFP2-NEXT:    umull r8, r7, r5, r1
+; VFP2-NEXT:    adds.w r6, r8, r3
+; VFP2-NEXT:    adcs.w r6, r4, r7
+; VFP2-NEXT:    umlal r3, r4, r5, r1
+; VFP2-NEXT:    adc r6, lr, #0
+; VFP2-NEXT:    mla r7, r1, r2, r7
+; VFP2-NEXT:    umlal r4, r6, r2, r1
+; VFP2-NEXT:    mla r1, r1, r5, r7
+; VFP2-NEXT:    adds.w lr, r4, r8
+; VFP2-NEXT:    adc.w r2, r6, r1
+; VFP2-NEXT:    b .LBB18_5
+; VFP2-NEXT:  .LBB18_4: @ %fp-to-i-if-exp.large
+; VFP2-NEXT:    add r2, sp, #32
+; VFP2-NEXT:    subw r0, r0, #1075
+; VFP2-NEXT:    mov.w lr, #0
+; VFP2-NEXT:    stm.w r2, {r3, r4, lr}
+; VFP2-NEXT:    movs r2, #12
+; VFP2-NEXT:    add r3, sp, #16
+; VFP2-NEXT:    and.w r2, r2, r0, lsr #3
+; VFP2-NEXT:    adds r3, #16
+; VFP2-NEXT:    str.w lr, [sp, #44]
+; VFP2-NEXT:    strd lr, lr, [sp, #24]
+; VFP2-NEXT:    subs r2, r3, r2
+; VFP2-NEXT:    strd lr, lr, [sp, #16]
+; VFP2-NEXT:    ldrd r3, r4, [r2]
+; VFP2-NEXT:    str r4, [sp, #4] @ 4-byte Spill
+; VFP2-NEXT:    ldr r7, [r2, #8]
+; VFP2-NEXT:    str r7, [sp, #12] @ 4-byte Spill
+; VFP2-NEXT:    lsrs r7, r3, #1
+; VFP2-NEXT:    ldr r2, [r2, #12]
+; VFP2-NEXT:    str r2, [sp, #8] @ 4-byte Spill
+; VFP2-NEXT:    and r2, r0, #31
+; VFP2-NEXT:    eor r12, r2, #31
+; VFP2-NEXT:    lsl.w r5, r3, r2
+; VFP2-NEXT:    lsr.w r0, r7, r12
+; VFP2-NEXT:    lsl.w r7, r4, r2
+; VFP2-NEXT:    orr.w r11, r7, r0
+; VFP2-NEXT:    umull r0, r3, r5, r10
+; VFP2-NEXT:    movs r7, #0
+; VFP2-NEXT:    umull r4, r8, r5, r1
+; VFP2-NEXT:    umlal r3, r7, r11, r10
+; VFP2-NEXT:    str r4, [sp] @ 4-byte Spill
+; VFP2-NEXT:    adds.w r9, r4, r3
+; VFP2-NEXT:    adcs.w r6, r7, r8
+; VFP2-NEXT:    umlal r3, r7, r5, r1
+; VFP2-NEXT:    ldr r6, [sp, #8] @ 4-byte Reload
+; VFP2-NEXT:    adc r9, lr, #0
+; VFP2-NEXT:    umlal r7, r9, r11, r1
+; VFP2-NEXT:    lsl.w lr, r6, r2
+; VFP2-NEXT:    ldr r6, [sp, #12] @ 4-byte Reload
+; VFP2-NEXT:    lsrs r4, r6, #1
+; VFP2-NEXT:    lsl.w r2, r6, r2
+; VFP2-NEXT:    lsr.w r4, r4, r12
+; VFP2-NEXT:    orr.w lr, lr, r4
+; VFP2-NEXT:    ldr r4, [sp, #4] @ 4-byte Reload
+; VFP2-NEXT:    ldr r6, [sp] @ 4-byte Reload
+; VFP2-NEXT:    lsrs.w r4, r4, #1
+; VFP2-NEXT:    lsr.w r4, r4, r12
+; VFP2-NEXT:    orrs r2, r4
+; VFP2-NEXT:    umull r12, r4, r10, r2
+; VFP2-NEXT:    mla r4, r10, lr, r4
+; VFP2-NEXT:    mla r2, r1, r2, r4
+; VFP2-NEXT:    mla r4, r1, r11, r8
+; VFP2-NEXT:    mla r1, r1, r5, r4
+; VFP2-NEXT:    adds.w r5, r6, r12
+; VFP2-NEXT:    mov.w r12, #0
+; VFP2-NEXT:    adcs r1, r2
+; VFP2-NEXT:    adds.w lr, r7, r5
+; VFP2-NEXT:    adc.w r2, r9, r1
+; VFP2-NEXT:  .LBB18_5: @ %fp-to-i-cleanup
+; VFP2-NEXT:    subs.w r7, r0, #-1
+; VFP2-NEXT:    mvn r1, #-2147483648
+; VFP2-NEXT:    sbcs.w r7, r3, r1
+; VFP2-NEXT:    mov.w r6, #-2147483648
+; VFP2-NEXT:    sbcs r7, lr, #0
+; VFP2-NEXT:    sbcs r7, r2, #0
 ; VFP2-NEXT:    it lt
-; VFP2-NEXT:    movlt r4, #1
-; VFP2-NEXT:    cmp r4, #0
-; VFP2-NEXT:    itet eq
-; VFP2-NEXT:    moveq r3, r4
-; VFP2-NEXT:    movne r4, r2
-; VFP2-NEXT:    moveq r1, lr
-; VFP2-NEXT:    mov.w r2, #-1
-; VFP2-NEXT:    it eq
-; VFP2-NEXT:    moveq r0, r2
+; VFP2-NEXT:    movlt r1, r3
+; VFP2-NEXT:    mov.w r7, #0
+; VFP2-NEXT:    mov.w r3, #-1
+; VFP2-NEXT:    it ge
+; VFP2-NEXT:    movge r0, r3
+; VFP2-NEXT:    it lt
+; VFP2-NEXT:    movlt r7, #1
+; VFP2-NEXT:    cmp r7, #0
+; VFP2-NEXT:    ite eq
+; VFP2-NEXT:    moveq r2, r7
+; VFP2-NEXT:    movne r7, lr
 ; VFP2-NEXT:    rsbs r5, r0, #0
-; VFP2-NEXT:    mov.w lr, #-2147483648
-; VFP2-NEXT:    sbcs.w r5, lr, r1
-; VFP2-NEXT:    sbcs.w r4, r2, r4
-; VFP2-NEXT:    sbcs r2, r3
+; VFP2-NEXT:    sbcs.w r5, r6, r1
+; VFP2-NEXT:    sbcs.w r7, r3, r7
+; VFP2-NEXT:    sbcs.w r2, r3, r2
 ; VFP2-NEXT:    itt ge
 ; VFP2-NEXT:    movge r0, r12
-; VFP2-NEXT:    movge r1, lr
-; VFP2-NEXT:    pop {r4, r5, r7, pc}
+; VFP2-NEXT:    movge r1, r6
+; VFP2-NEXT:    add sp, #52
+; VFP2-NEXT:    pop.w {r4, r5, r6, r7, r8, r9, r10, r11, pc}
 ;
 ; FULL-LABEL: stest_f64i64:
-; FULL:       @ %bb.0: @ %entry
-; FULL-NEXT:    .save {r4, r5, r7, lr}
-; FULL-NEXT:    push {r4, r5, r7, lr}
-; FULL-NEXT:    bl __fixdfti
-; FULL-NEXT:    subs.w lr, r0, #-1
-; FULL-NEXT:    mvn r12, #-2147483648
-; FULL-NEXT:    sbcs.w lr, r1, r12
-; FULL-NEXT:    sbcs lr, r2, #0
-; FULL-NEXT:    sbcs lr, r3, #0
-; FULL-NEXT:    cset lr, lt
-; FULL-NEXT:    cmp.w lr, #0
-; FULL-NEXT:    csel r5, r3, lr, ne
-; FULL-NEXT:    mov.w r3, #-1
-; FULL-NEXT:    csel r0, r0, r3, ne
-; FULL-NEXT:    csel r1, r1, r12, ne
-; FULL-NEXT:    csel r2, r2, lr, ne
-; FULL-NEXT:    rsbs r4, r0, #0
-; FULL-NEXT:    mov.w r12, #-2147483648
-; FULL-NEXT:    sbcs.w r4, r12, r1
-; FULL-NEXT:    sbcs.w r2, r3, r2
-; FULL-NEXT:    sbcs.w r2, r3, r5
-; FULL-NEXT:    it ge
-; FULL-NEXT:    movge r0, #0
-; FULL-NEXT:    csel r1, r1, r12, lt
-; FULL-NEXT:    pop {r4, r5, r7, pc}
+; FULL:       @ %bb.0: @ %entryfp-to-i-entry
+; FULL-NEXT:    .save {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; FULL-NEXT:    push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; FULL-NEXT:    .pad #52
+; FULL-NEXT:    sub sp, #52
+; FULL-NEXT:    vmov r3, r1, d0
+; FULL-NEXT:    mov.w r12, #0
+; FULL-NEXT:    movw r0, #1023
+; FULL-NEXT:    ubfx r2, r1, #20, #11
+; FULL-NEXT:    cmp r2, r0
+; FULL-NEXT:    bhs .LBB18_2
+; FULL-NEXT:  @ %bb.1:
+; FULL-NEXT:    movs r2, #0
+; FULL-NEXT:    movs r3, #0
+; FULL-NEXT:    movs r0, #0
+; FULL-NEXT:    movs r1, #0
+; FULL-NEXT:    b .LBB18_5
+; FULL-NEXT:  .LBB18_2: @ %fp-to-i-if-check.exp.size
+; FULL-NEXT:    movs r0, #1
+; FULL-NEXT:    mov r4,...
[truncated]
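
For readers who do not want to trace the assembly: the block labels in the new output (`fp-to-i-if-exp.small`, `fp-to-i-if-exp.large`, `fp-to-i-cleanup`) and the constants 1023, 1074 and 1075 suggest the usual integer-only decomposition of a double. The following is a minimal Python sketch of that idea, not the code LLVM emits. The names `fptosi_f64` and `clamp_to_i64` are hypothetical, and the clamp models the saturation to the i64 range that the `stest_f64i64` test appears to apply after the wide conversion.

```python
import struct

INT64_MAX = (1 << 63) - 1
INT64_MIN = -(1 << 63)


def fptosi_f64(x: float, bits: int = 128) -> int:
    """Convert a double to a signed integer using only integer operations.

    Hypothetical model of an inline fp-to-int expansion; truncates toward zero.
    """
    raw = struct.unpack("<Q", struct.pack("<d", x))[0]
    negative = bool(raw >> 63)
    biased_exp = (raw >> 52) & 0x7FF                 # 11-bit exponent field
    mantissa = (raw & ((1 << 52) - 1)) | (1 << 52)   # restore the implicit leading 1

    if biased_exp < 1023:            # |x| < 1 (including subnormals) truncates to 0
        return 0
    if biased_exp <= 1074:           # "exp.small": drop fractional bits
        magnitude = mantissa >> (1075 - biased_exp)
    else:                            # "exp.large": value is already an integer
        magnitude = mantissa << (biased_exp - 1075)

    result = -magnitude if negative else magnitude
    # Plain fptosi yields poison in LLVM when the value does not fit;
    # this sketch raises instead (assumption made for illustration only).
    if not -(1 << (bits - 1)) <= result < (1 << (bits - 1)):
        raise OverflowError("value does not fit in the target integer type")
    return result


def clamp_to_i64(x: float) -> int:
    """Convert through a 128-bit integer, then saturate to the i64 range."""
    wide = fptosi_f64(x, bits=128)
    return max(INT64_MIN, min(INT64_MAX, wide))
```

The two shift branches correspond to exponents below and above the 52-bit mantissa width. NaN and infinity fall into the overflow case here because their exponent field is 2047.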

; CHECK-NEXT: # %bb.0: # %entry
; CHECK-NEXT: local.get 0
; CHECK-NEXT: i64.trunc_sat_f64_s
; CHECK-NEXT: .local i32, i64, i64, i64, i64
Member

What's being replaced here with an expansion isn't a libcall; it's a regular instruction.

Contributor Author

For wasm in particular we can actually always allow int128 libcalls, because the type is supported even on wasm32:

bool hasInt128Type() const final { return true; }

Contributor

@davemgreen davemgreen left a comment

My understanding is that i128 libcalls are missing because i128 was not a supported type in gcc/clang for most 32bit targets, and so they were not really supported. If we are now changing to supporting i128 then wouldn't it make sense to start emitting and using libcalls?

@kito-cheng
Member

> My understanding is that i128 libcalls are missing because i128 was not a supported type in gcc/clang for most 32bit targets, and so they were not really supported. If we are now changing to supporting i128 then wouldn't it make sense to start emitting and using libcalls?

libgcc compatibility would be a concern if we change to using libcalls by default, IMO.

@topperc
Contributor

topperc commented May 8, 2026

> > My understanding is that i128 libcalls are missing because i128 was not a supported type in gcc/clang for most 32bit targets, and so they were not really supported. If we are now changing to supporting i128 then wouldn't it make sense to start emitting and using libcalls?
>
> libgcc compatibility would be a concern if we change to using libcalls by default, IMO.

The bug mentioned _BitInt(128). Does libgcc have separate functions for _BitInt, distinct from those for __int128?

@nikic
Contributor Author

nikic commented May 9, 2026

libgcc has __mulbitint3 and __divmodbitint4 for _BitInt, but it does not define builtins for conversion functions between _BitInt and FP. (Incidentally compiler-rt does not implement these builtins and we currently emit inline expansions for mul/div instead.)

@nikic
Contributor Author

nikic commented May 11, 2026

> My understanding is that i128 libcalls are missing because i128 was not a supported type in gcc/clang for most 32bit targets, and so they were not really supported. If we are now changing to supporting i128 then wouldn't it make sense to start emitting and using libcalls?

The libcalls are defined in terms of __int128 rather than _BitInt(128), which are different and ABI-incompatible types :(

@kito-cheng
Member

> libgcc has __mulbitint3 and __divmodbitint4 for _BitInt, but it does not define builtins for conversion functions between _BitInt and FP. (Incidentally compiler-rt does not implement these builtins and we currently emit inline expansions for mul/div instead.)

FYI: There are __floatbitint[s|d|t]f [1] and __fix[s|d|t]fbitint [2] in libgcc, and GCC will generate calls to them [3].*1

*1: I used RISC-V as the example since I am porting that, but other targets such as x86 and AArch64 have also been ported.

[1] https://github.com/gcc-mirror/gcc/blob/master/libgcc/soft-fp/floatbitintsf.c
[2] https://github.com/gcc-mirror/gcc/blob/master/libgcc/soft-fp/fixsfbitint.c
[3] https://godbolt.org/z/YP4dedTh1

@nikic
Contributor Author

nikic commented May 11, 2026

> > libgcc has __mulbitint3 and __divmodbitint4 for _BitInt, but it does not define builtins for conversion functions between _BitInt and FP. (Incidentally compiler-rt does not implement these builtins and we currently emit inline expansions for mul/div instead.)
>
> FYI: There are __floatbitint[s|d|t]f [1] and __fix[s|d|t]fbitint [2] in libgcc, and GCC will generate calls to them [3].*1
>
> *1: I used RISC-V as the example since I am porting that, but other targets such as x86 and AArch64 have also been ported.
>
> [1] https://github.com/gcc-mirror/gcc/blob/master/libgcc/soft-fp/floatbitintsf.c
> [2] https://github.com/gcc-mirror/gcc/blob/master/libgcc/soft-fp/fixsfbitint.c
> [3] https://godbolt.org/z/YP4dedTh1

Oh, very nice. I was looking at the docs at https://gcc.gnu.org/onlinedocs/gccint/Integer-library-routines.html#Bit-precise-integer-arithmetic-functions, but of course that was the wrong place to look for FP builtins. These are documented at https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html#index-_005f_005ffixsfbitint.

So yeah, once we have all these builtins in compiler-rt, we can stop using the big IR expansions.
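
The opposite direction (integer to floating point, the job of a routine such as __floatbitintdf) can be modelled in the same integer-only style. The sketch below is an illustration in Python under stated assumptions, not libgcc's or compiler-rt's implementation: `sitofp_f64` is a hypothetical name, and the code assumes round-to-nearest, ties-to-even, and inputs small enough that the exponent cannot overflow (true for any 128-bit value).

```python
import struct


def sitofp_f64(value: int) -> float:
    """Convert a signed integer to a double using only integer operations.

    Hypothetical model; assumes |value| < 2**1024 so the exponent fits.
    """
    if value == 0:
        return 0.0
    sign = 1 if value < 0 else 0
    magnitude = -value if value < 0 else value

    nbits = magnitude.bit_length()
    exponent = nbits - 1                       # unbiased exponent of the leading bit
    if nbits > 53:
        # Too many bits for the 53-bit significand: shift right and round.
        shift = nbits - 53
        mantissa = magnitude >> shift
        remainder = magnitude & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        if remainder > half or (remainder == half and (mantissa & 1)):
            mantissa += 1                      # round up (ties go to even)
            if mantissa >> 53:                 # rounding carried into a new bit
                mantissa >>= 1
                exponent += 1
    else:
        # Exactly representable: left-align to 53 significant bits.
        mantissa = magnitude << (53 - nbits)

    raw = (sign << 63) | ((exponent + 1023) << 52) | (mantissa & ((1 << 52) - 1))
    return struct.unpack("<d", struct.pack("<Q", raw))[0]
```

A library routine has to do the same normalise-and-round work, which is part of why the inline IR expansions are so large on targets with 32-bit registers.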

Development

Successfully merging this pull request may close these issues.

Using _BitInt(128) with double conversion generates call to unimplemented builtin for 32-bit targets

5 participants