[compiler-rt][ARM] Optimized mulsf3 and divsf3 #168394

statham-arm · 2025-11-17T16:13:03Z

(Reland of #161546, fixing three build and test issues)

This commit adds optimized assembly versions of single-precision float multiplication and division. Both functions are implemented in a style that can be assembled as either Arm or Thumb2; for multiplication, a separate implementation is provided for Thumb1. Also, extensive new tests are added for multiplication and division.

These implementations can be removed from the build by defining the cmake variable COMPILER_RT_ARM_OPTIMIZED_FP=OFF.

Outlying parts of the functionality which are not on the fast path, such as NaN handling and underflow, are handled in helper functions written in C. These can be shared between the Arm/Thumb2 and Thumb1 implementations, and also reused by other optimized assembly functions we hope to add in future.
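As a rough illustration of the fast-path/helper split described above, the following C sketch classifies raw IEEE 754 single-precision bit patterns and decides whether an operand pair would need special handling. The function names (`exp_field`, `is_nan_bits`, `needs_slow_path`) are hypothetical and are not taken from this PR; the real code makes this decision in assembly, and its exact conditions may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Extract the 8-bit biased exponent of a single-precision bit pattern. */
static uint32_t exp_field(uint32_t bits) {
    return (bits >> 23) & 0xFFu;
}

/* A NaN has an all-ones exponent and a nonzero mantissa. */
static bool is_nan_bits(uint32_t bits) {
    return exp_field(bits) == 0xFFu && (bits & 0x007FFFFFu) != 0;
}

/*
 * An operand with exponent 0 (zero or denormal) or 0xFF (infinity or NaN)
 * is an outlying input. In this sketch, any such input sends the whole
 * operation to a slower helper instead of the fast path.
 */
static bool needs_slow_path(uint32_t a, uint32_t b) {
    uint32_t ea = exp_field(a);
    uint32_t eb = exp_field(b);
    return ea == 0 || ea == 0xFFu || eb == 0 || eb == 0xFFu;
}
```

Result-side conditions such as underflow can only be detected after the exponents have been combined, so a real implementation would need a second check later on; that part is not shown here.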

In the earliest version of cmake supported by LLVM, `try_compile` doesn't understand the convenient `SOURCE_FROM_CONTENT` option, so we must manually write our (empty) assembly source file and pass it to `try_compile` by filename. It also doesn't understand `NO_CACHE`, so in order to avoid making a spurious cache entry called `success` which would confuse the next run, I'm putting the result directly into the user-specified output variable. This leads to a less helpful comment in CMakeCache.txt, but what can you do.

A buildbot reported a failure in a hardfp build, related to ABI: the test was calling __mulsf3 and passing arguments in s0/s1, but the code inside __mulsf3 was reading them out of r0/r1. The ABI using GPRs is correct for __aeabi_fmul, but not for __mulsf3, which takes float arguments in accordance with whatever the normal ABI is. So in hardfp, the two functions behave differently. The obvious question is why anyone is linking this function in to a hardfp build in the first place - surely in a hardfp context clients would just use a vmul instruction instead of calling either of these entry points? But there seems to be no provision in builtins/CMakeLists.txt for leaving things out of hardfp builds. The generic __mulsf3.c is still included in a hardfp builtins library. So I've stuck with those basic premises, and just corrected my replacement functions to get the ABIs right.
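The ABI distinction above can be sketched in portable C. Here `mul_bits` is a hypothetical stand-in for an entry point that sees its operands as raw 32-bit patterns (the way a GPR-based interface like `__aeabi_fmul` receives them), and `mul_float` is a float-typed wrapper in the style of `__mulsf3`. Both names are assumptions made for illustration. In a hardfp build the real fix would move values between FP and integer registers in assembly, which this sketch only imitates with `memcpy`.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical core routine operating on raw IEEE 754 bit patterns.
 * The host's float multiply stands in for the optimized assembly.
 */
static uint32_t mul_bits(uint32_t a, uint32_t b) {
    float fa, fb, fr;
    uint32_t r;
    memcpy(&fa, &a, sizeof fa);
    memcpy(&fb, &b, sizeof fb);
    fr = fa * fb;
    memcpy(&r, &fr, sizeof r);
    return r;
}

/*
 * Float-typed wrapper: arguments arrive however the platform ABI passes
 * floats (FP registers under hardfp), are reinterpreted as integers,
 * handed to the bit-pattern routine, and the result is converted back.
 */
static float mul_float(float a, float b) {
    uint32_t ia, ib, ir;
    float r;
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);
    ir = mul_bits(ia, ib);
    memcpy(&r, &ir, sizeof r);
    return r;
}
```

Under a soft-float ABI the two interfaces coincide, so the wrapper costs nothing there; only hardfp builds need the extra register moves.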

The current functions depend on the MLS instruction, and future ones will depend on CLZ too.
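For readers unfamiliar with the two instructions named above, this C sketch models their architectural behaviour on 32-bit values. The function names `mls` and `clz` are just illustrative wrappers; the sketch says nothing about how the assembly in this PR actually uses them.

```c
#include <stdint.h>

/*
 * MLS Rd, Rn, Rm, Ra: multiply and subtract.
 * Rd = Ra - (Rn * Rm), keeping the low 32 bits.
 */
static uint32_t mls(uint32_t rn, uint32_t rm, uint32_t ra) {
    return ra - rn * rm; /* unsigned arithmetic wraps modulo 2^32 */
}

/*
 * CLZ Rd, Rm: count leading zero bits.
 * Returns 32 when the input is zero.
 */
static uint32_t clz(uint32_t x) {
    uint32_t n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}
```

On a target that lacks either instruction, the same result takes several instructions to compute, which is presumably why the optimized sources are simply excluded for older architectures rather than adapted.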

statham-arm · 2025-11-17T16:14:47Z

This PR already starts out as a sequence of four commits. The first is #161546 unchanged; the next three fix build and test failures, two of which were found by buildbots after the previous attempt, and the third of which I found myself during re-testing. I know they'll all be squashed together into one commit for landing, but I've kept them separate to make review easier.

smithp35

The fixups look good to me. Can confirm that the hard to soft-float conversion is per the hard-float AAPCS, and that the new files are excluded from the "base_SOURCES".

compnerd · 2025-11-17T17:02:01Z

(I've only reviewed the new changes, given that the first commit is just the original work)

statham-arm added 4 commits November 17, 2025 16:10

Exclude the optimized sources from older Arm architectures

2b8d363

The current functions depend on the MLS instruction, and future ones will depend on CLZ too.

statham-arm requested review from compnerd, petrhosek and smithp35 November 17, 2025 16:13

llvmbot added compiler-rt compiler-rt:builtins labels Nov 17, 2025

smithp35 approved these changes Nov 17, 2025

compnerd approved these changes Nov 17, 2025

statham-arm merged commit 5efce73 into llvm:main Nov 18, 2025
13 checks passed

statham-arm deleted the arm-mulsf3-divsf3-reland branch November 18, 2025 11:21
