@statham-arm
Collaborator

(Reland of #161546, fixing three build and test issues)

This commit adds optimized assembly versions of single-precision float multiplication and division. Both functions are implemented in a style that can be assembled as either Arm or Thumb2; for multiplication, a separate implementation is provided for Thumb1. Extensive new tests are also added for multiplication and division.

These implementations can be removed from the build by setting the CMake variable COMPILER_RT_ARM_OPTIMIZED_FP=OFF.

Outlying parts of the functionality which are not on the fast path, such as NaN handling and underflow, are handled in helper functions written in C. These can be shared between the Arm/Thumb2 and Thumb1 implementations, and also reused by other optimized assembly functions we hope to add in future.
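
As a rough illustration of the fast-path/helper split described above, here is a minimal C sketch of how inputs might be classified by their exponent field. The names `float_bits`, `exponent_field` and `input_needs_slow_path` are hypothetical and are not taken from the PR; the real dispatch is done in assembly, and cases such as result underflow can only be detected after the exponents have been combined, which this sketch does not cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its IEEE 754 binary32 bit pattern. */
static uint32_t float_bits(float f) {
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Extract the 8-bit biased exponent field (bits 30..23). */
static unsigned exponent_field(uint32_t bits) {
    return (bits >> 23) & 0xFFu;
}

/* Hypothetical check: true if either operand has an exponent field of 0
 * (zero or denormal) or 0xFF (infinity or NaN). Such inputs are the kind
 * of uncommon case that could be handed off to a helper function. */
static bool input_needs_slow_path(float a, float b) {
    unsigned ea = exponent_field(float_bits(a));
    unsigned eb = exponent_field(float_bits(b));
    return ea == 0 || ea == 0xFF || eb == 0 || eb == 0xFF;
}
```

With this sketch, `input_needs_slow_path(1.5f, 2.0f)` is false, while a zero or NaN operand makes it true.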

In the earliest version of CMake supported by LLVM, `try_compile`
doesn't understand the convenient `SOURCE_FROM_CONTENT` option, so we
must manually write our (empty) assembly source file and pass it to
`try_compile` by filename.

It also doesn't understand `NO_CACHE`, so in order to avoid making a
spurious cache entry called `success` which would confuse the next
run, I'm putting the result directly into the user-specified output
variable. This leads to a less helpful comment in CMakeCache.txt, but
what can you do.

A buildbot reported a failure in a hardfp build, related to ABI: the
test was calling __mulsf3 and passing arguments in s0/s1, but the code
inside __mulsf3 was reading them out of r0/r1.

The ABI using GPRs is correct for __aeabi_fmul, but not for __mulsf3,
which takes float arguments in accordance with whatever the normal ABI
is. So in hardfp, the two functions behave differently.
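
To make the distinction concrete, here is a minimal C sketch of the idea: one core routine that works on raw bit patterns (as an entry point taking its arguments in GPRs would), and a float-typed wrapper that converts to and from that representation. The names `mul_core_bits` and `mul_float_entry` are hypothetical, and the core simply uses the compiler's own float multiply so that the sketch runs on a host; it is not the assembly from this PR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a core routine that receives and returns raw
 * binary32 bit patterns, in the way __aeabi_fmul receives its operands
 * in integer registers. Uses the compiler's multiply for illustration. */
static uint32_t mul_core_bits(uint32_t a_bits, uint32_t b_bits) {
    float a, b, r;
    uint32_t r_bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&a, &a_bits, sizeof a);
    memcpy(&b, &b_bits, sizeof b);
    r = a * b;
    memcpy(&r_bits, &r, sizeof r_bits);
    return r_bits;
}

/* Hypothetical float-typed entry point, analogous to __mulsf3: its
 * arguments arrive however the platform ABI passes floats (s0/s1 under
 * hardfp), so it must move the bits into integer form before calling
 * the core, and move the result back afterwards. */
static float mul_float_entry(float a, float b) {
    uint32_t a_bits, b_bits, r_bits;
    float r;
    memcpy(&a_bits, &a, sizeof a_bits);
    memcpy(&b_bits, &b, sizeof b_bits);
    r_bits = mul_core_bits(a_bits, b_bits);
    memcpy(&r, &r_bits, sizeof r);
    return r;
}
```

In a soft-float build the two entry points coincide, because floats are passed in integer registers anyway; only under hardfp does the wrapper step do any real work.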

The obvious question is why anyone is linking this function in to a
hardfp build in the first place - surely in a hardfp context clients
would just use a vmul instruction instead of calling either of these
entry points? But there seems to be no provision in
builtins/CMakeLists.txt for leaving things out of hardfp builds. The
generic __mulsf3.c is still included in a hardfp builtins library. So
I've stuck with those basic premises, and just corrected my
replacement functions to get the ABIs right.

The current functions depend on the MLS instruction, and future ones
will depend on CLZ too.
@statham-arm
Collaborator Author

This PR currently consists of four commits. The first is #161546 unchanged; the next three fix build and test failures, two of which were found by buildbots after the previous attempt and one of which I found myself during re-testing. I know they'll all be squashed into one commit for landing, but I've kept them separate to make review easier.

Collaborator

@smithp35 smithp35 left a comment

The fixups look good to me. Can confirm that the hard-float to soft-float conversion is per the hard-float AAPCS, and that the new files are excluded from the "base_SOURCES".

@compnerd
Member

(I've only reviewed the new changes, given that the first commit is just the original work)

@statham-arm statham-arm merged commit 5efce73 into llvm:main Nov 18, 2025
13 checks passed
@statham-arm statham-arm deleted the arm-mulsf3-divsf3-reland branch November 18, 2025 11:21