[compiler-rt][ARM] Optimized mulsf3 and divsf3 #161546

statham-arm · 2025-10-01T16:24:26Z

This commit adds optimized assembly versions of single-precision float multiplication and division. Both functions are implemented in a style that can be assembled as either Arm or Thumb2; for multiplication, a separate implementation is provided for Thumb1. Also, extensive new tests are added for multiplication and division.

These implementations can be removed from the build by defining the cmake variable COMPILER_RT_ARM_OPTIMIZED_FP=OFF.

Outlying parts of the functionality which are not on the fast path, such as NaN handling and underflow, are handled in helper functions written in C. These can be shared between the Arm/Thumb2 and Thumb1 implementations, and also reused by other optimized assembly functions we hope to add in future.


statham-arm · 2025-10-01T16:24:43Z

This is the second PR in my planned series to upstream optimized AArch32 FP implementations, as discussed on Discourse in August. (Sorry for the delay.)

The first PR is #154093, which is replacing an existing assembly implementation with (we think) a better one. This one is adding new assembly implementations, for functions which don't have them already. The two PRs conflict, but benignly, in that they both add the same supporting C functions; whichever one lands first, I'll update the other one.

This PR is not quite in a committable state yet, because I'd like advice on what to do about the new tests. At the moment, they're using compareResultF to check the answers, which forgives differences of opinion in NaN handling. Our assembly routines have well-specified NaN handling (designed to match the behavior of Arm's hardware FP), and a set of tests to check it. So when the tests are run against the new versions of the functions, I'd like them to check the output NaNs exactly.

But other architectures, and the existing C implementations in compiler-rt, can't be expected to pass those tests in their strict form. So those tests will have to be reverted to use compareResultF on any other architecture, or when the new config option COMPILER_RT_ARM_OPTIMIZED_FP=OFF is set. Any thoughts on the best thing to do about that?
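To make the two checking modes concrete, here is a minimal sketch of a lenient check (any NaN is accepted where a NaN is expected) next to a strict bit-exact check, selected at compile time. This is not compiler-rt's compareResultF; the function names and the `HYPOTHETICAL_STRICT_NAN_TESTS` macro are invented for illustration, on the assumption that the build would define such a macro only when the optimized Arm routines are in use.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits without undefined behaviour. */
static inline uint32_t float_bits(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return u;
}

/* True if the bit pattern encodes a NaN (exponent all ones, mantissa non-zero). */
static inline int is_nan_bits(uint32_t x) {
  return (x & 0x7fffffffu) > 0x7f800000u;
}

/* Lenient (hypothetical): exact match, or both expected and actual are some NaN. */
static inline int result_matches_lenient(float result, uint32_t expected) {
  uint32_t rep = float_bits(result);
  if (rep == expected)
    return 1;
  return is_nan_bits(expected) && is_nan_bits(rep);
}

/* Strict (hypothetical): the exact bit pattern must match, NaN sign and payload included. */
static inline int result_matches_strict(float result, uint32_t expected) {
  return float_bits(result) == expected;
}

/* HYPOTHETICAL_STRICT_NAN_TESTS is an assumed build-time switch, not a real option. */
static inline int result_matches(float result, uint32_t expected) {
#ifdef HYPOTHETICAL_STRICT_NAN_TESTS
  return result_matches_strict(result, expected);
#else
  return result_matches_lenient(result, expected);
#endif
}
```

Keeping both predicates and choosing between them in one place means the test tables themselves would not need to change between configurations.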

aykevl

Can't give much of a review here, but superficially this looks fine to me.

aykevl · 2025-10-01T16:53:16Z

compiler-rt/lib/builtins/CMakeLists.txt

 )

+option(COMPILER_RT_ARM_OPTIMIZED_FP
+  "On 32-bit Arm, use optimized assembly implementations of FP arithmetic" ON)


I believe this is a code size vs speed tradeoff, right?
I think it would be a good idea to say that explicitly. (And IMHO if the new assembly routines are both smaller and faster they should just be replaced instead of having two options).

I've done that, with a "likely" in it to cover the fact that until we've gone through all of the available functions we won't know for sure whether all of them trade off size for speed.

(It's also difficult to judge, since when you compare assembly against C, the C is more likely to vary with compile options, so the answer might turn out to be "in this configuration but not that one".)

compnerd · 2025-10-01T17:47:46Z

compiler-rt/lib/builtins/arm/thumb1/mulsf3.S

+
+DEFINE_AEABI_FUNCTION_ALIAS(__aeabi_fmul, __mulsf3)
+
+DEFINE_COMPILERRT_FUNCTION(__mulsf3)


Given the previous .thumb, I think that we should use DEFINE_COMPILERRT_THUMB_FUNCTION instead.

Done. (I'm not sure it makes any difference when assembling for a Thumb-only architecture, but it keeps things consistent with existing files.)

compnerd · 2025-10-01T17:51:08Z

compiler-rt/lib/builtins/arm/thumb1/mulsf3.S

+  LSLS    r3, r2, #23
+  ADDS    r0, r0, r3    // put on the biased exponent
+
+  BL      __funder


Should this be SYMBOL_NAME(__compiler_rt_funder)?
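The quoted Thumb1 code forms its result by shifting the biased exponent up to bit 23 and adding it onto the mantissa ("put on the biased exponent"). As an illustration of that general packing technique, and not a transcription of the PR's code, here is a sketch in C. `pack_float` is a hypothetical name, and the sketch assumes the significand carries its leading 1 at bit 23, so that leading 1 itself contributes one to the exponent field.

```c
#include <stdint.h>

/* Hypothetical helper: build float bits from a sign (0 or 1), a biased exponent
   and a significand whose leading 1 sits at bit 23. Because that leading 1
   lands in the exponent field, (biased_exp - 1) is added rather than biased_exp.
   A rounding carry that pushes the significand to 0x01000000 simply bumps the
   exponent by one, with no extra code. Assumes biased_exp >= 1, i.e. no
   underflow; results that underflow need separate handling. */
static uint32_t pack_float(uint32_t sign, int32_t biased_exp, uint32_t sig24) {
  uint32_t bits = sig24 + ((uint32_t)(biased_exp - 1) << 23);
  return (sign << 31) | bits;
}
```

For example, `pack_float(0, 127, 0x00800000)` gives `0x3f800000` (1.0f), and `pack_float(0, 127, 0x01000000)` gives `0x40000000` (2.0f), showing the carry into the exponent.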

compnerd · 2025-10-01T17:51:50Z

compiler-rt/lib/builtins/arm/thumb1/mulsf3.S

+LOCAL_LABEL(denorm):
+  PUSH    {r0,r1,r2,r3}
+  MOV     r0, sp
+  BL      __fnorm2


Should this be SYMBOL_NAME(__compiler_rt_fnorm2)?

How embarrassing. Not only that, but I hadn't actually added the helper functions to the build in the Thumb1 case. Apparently forgot to re-test both architectures before pushing! Fixed now.

compnerd · 2025-10-01T17:52:14Z

compiler-rt/lib/builtins/arm/thumb1/mulsf3.S

+  // propagates an appropriate NaN to the output, dealing with the special
+  // cases of signalling/quiet NaNs.
+LOCAL_LABEL(nan):
+  BL      __fnan2


Should this be SYMBOL_NAME(__compiler_rt_fnan2)?

compnerd · 2025-10-01T17:54:26Z

compiler-rt/lib/builtins/arm/divsf3.S

+
+*/
+
+  .p2align 2  // make sure we start on a 32-bit boundary, even in Thumb


I think that saying "4-byte boundary" here would be better than "32-bit boundary", as the latter can be confusing when scanning over the comment and code.

compnerd · 2025-10-01T17:55:22Z

compiler-rt/lib/builtins/arm/fnan2.c

+  if (aadj < 0x00800000)   // a is a quiet NaN?
+    return a;              // if so, return it
+  else                     // expect (badj < 0x00800000)
+    return b;              // in that case b must be a quiet NaN


I think that this should be either of the following:

return (aadj < 0x00800000) ? a : b;

or

if (aadj < 0x00800000) return a; return b;
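For context on what this final return is choosing between, here is a minimal sketch of two-operand NaN selection in the ternary style suggested above. It follows the operand priority of the Arm architecture's FPProcessNaNs pseudocode as I understand it (default-NaN mode off): a signalling NaN wins, first operand before second, and is returned quietened; otherwise the first quiet NaN is returned. `choose_nan` and its helpers are hypothetical names and do not reproduce fnan2.c's `aadj`/`badj` computation; the sketch also does not raise the Invalid Operation exception.

```c
#include <stdint.h>

#define QUIET_BIT 0x00400000u /* top mantissa bit: set for quiet NaNs */

static inline int is_nan32(uint32_t x) { return (x & 0x7fffffffu) > 0x7f800000u; }
static inline int is_snan32(uint32_t x) { return is_nan32(x) && !(x & QUIET_BIT); }

/* Hypothetical: pick the NaN result of a two-operand operation.
   Precondition: at least one of a, b is a NaN. */
uint32_t choose_nan(uint32_t a, uint32_t b) {
  if (is_snan32(a))
    return a | QUIET_BIT;   /* signalling NaN in a: quieten and return it */
  if (is_snan32(b))
    return b | QUIET_BIT;   /* signalling NaN in b: quieten and return it */
  return is_nan32(a) ? a : b; /* otherwise the first quiet NaN */
}
```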



statham-arm requested review from aykevl, compnerd, petrhosek and smithp35 October 1, 2025 16:24

llvmbot added compiler-rt compiler-rt:builtins labels Oct 1, 2025

aykevl reviewed Oct 1, 2025


compnerd reviewed Oct 1, 2025


statham-arm added 6 commits October 2, 2025 13:51

Fix the Thumb1 build which I forgot to test

8c7228f

Use DEFINE_COMPILERRT_THUMB_FUNCTION in Thumb1

4edb28b

Tweak final return in fnan2 as suggested

c73dfea

Clarify comment about 4-byte boundary

b236372

Lowercase instruction mnemonics and shifter operands

7a24535

This was Petr Hosek's comment on llvm#154093, but if we're doing that, we should do it consistently.

Mention size/speed tradeoff in the cmake option help

40e3621

statham-arm added a commit to statham-arm/llvm-project that referenced this pull request Oct 2, 2025

Changes to fnan2 to be consistent with llvm#161546

a6f6263

Update build and test setup

66a3bcb

Now we should only test the extra NaN faithfulness in cases where it's provided by the library. Also tweaked the cmake setup to make it easier to add more assembly files later. Plus a missing piece of comment in fnan2.c.

