runtime: software floating point for GOARM=6, 7 (not only GOARM=5) #61588

Open
ludi317 opened this issue Jul 26, 2023 · 33 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FeatureRequest Proposal Proposal-Accepted
Milestone

Comments

@ludi317
Contributor

ludi317 commented Jul 26, 2023

I want to run a go binary on an ARMv7 target that doesn't have a hardware floating point unit (FPU). (The ARMv7 specification does not require a hardware FPU; it is optional.) Currently, the only way to use software floating point on ARM targets is to set GOARM=5, regardless of the actual ARM version of the target, whether 5, 6, or 7. If the decision of using software or hardware floating point were decoupled from the ARM version, then there would be no need to fall back to the ARMv5 instruction set on ARMv7 chips lacking a hardware FPU.

I request a new go environment variable (perhaps GOARMFP=soft or hard) that could be used alongside GOARCH=arm and either GOARM=6 or GOARM=7 to specify software ("soft") or hardware ("hard") floating point. GOARM=5 would always imply software floating point.

Because this addresses an immediate business need, I have developed a working prototype for GOARM=7 with software floating point, and could make contributions toward this new setting.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 26, 2023
@cherrymui
Member

You can try using go build -gcflags=all=-d=softfloat, which should make all compiled code use softfloat. There might be some assembly code that uses floating point, which you might need to rewrite.

@mknyszek mknyszek added this to the Backlog milestone Jul 26, 2023
@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 26, 2023
@mknyszek mknyszek changed the title runtime: request for software floating point for GOARM=6, 7 (not only GOARM=5) proposal: runtime: software floating point for GOARM=6, 7 (not only GOARM=5) Jul 26, 2023
@mknyszek
Contributor

In triage, we think this needs to be a proposal. Since this isn't explicitly supported (and we don't have hardware for CI to test this configuration, or a test to make sure there aren't any FP instructions when setting the softfloat configuration) we'd have to make a decision to support it.

@mknyszek mknyszek added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jul 26, 2023
@gopherbot gopherbot removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jul 26, 2023
@ianlancetaylor ianlancetaylor modified the milestones: Backlog, Proposal Jul 26, 2023
@gopherbot

Change https://go.dev/cl/514907 mentions this issue: all: add GOARMFP env var for ARM floating point mode

@ludi317
Contributor Author

ludi317 commented Aug 1, 2023

In this comment, another user is forced to downgrade to GOARM=5 on an ARMv7 chip just to get soft floating point (#58686 comment).

An ARMv7 chip should execute ARMv7 instructions. Anything less leaves the CPU underutilized, and is a waste of resources.

To support this proposal, I have submitted a CL that can build GOARM=7 and GOARMFP=soft. Even if this proposal is not approved, I would greatly appreciate a review, or any feedback, on the CL. Thanks.

@cherrymui
Member

@ludi317 Have you tried the compiler flag -gcflags=all=-d=softfloat? If there is some assembly code that needs to be adjusted we could introduce a macro like -D softfloat that you can pass as -asmflags.

If we really want an environment variable for the go command, my counter proposal: use an existing variable, either GOARM=7,softfloat (see also #60072), or GOEXPERIMENT=softfloat (our softfloat implementation is largely architecture independent (except a small amount of assembly code), so may as well use an architecture independent flag).

@ludi317
Contributor Author

ludi317 commented Aug 3, 2023

@cherrymui I did try building with the compiler flag -gcflags=all=-d=softfloat (and commenting out this check). Unfortunately, the binary crashed with signal SIGILL. The assembly code does indeed need to be modified in a few places, as seen in my CL.

I am not particular about the API used to specify soft float for ARM, as long as there is one. If I were to choose, I'd suggest that if GOMIPS64 accepts a comma-separated list of options (as proposed in #60072), then it would make sense for GOARM to do the same. Your proposal to use GOARM=7,softfloat seems very reasonable.

@randall77
Contributor

Can you tell us what this chip is that is armv7 but without floating point? I am curious.

Since you have the change prototyped, what performance differences are you seeing between GOARM=5 and GOARM=7,softfloat? In the compiler at least, the differences I see are mostly bit manipulation instructions (find first bit, etc.). There may be some more in the runtime (memmove?).

@MDr164

MDr164 commented Aug 9, 2023

Can you tell us what this chip is that is armv7 but without floating point? I am curious.

The Aspeed AST2500, for example, supports the ARMv6K instruction set but has no floating point unit, so we need to fall back to GOARM=5 for it. Another is the Broadcom BCM4708A0, an ARMv7 SoC that lacks floating point hardware. In general, many of the cheaper WiFi/AP/network appliances and deeply embedded SoCs come without an FPU, since it is often not needed for the system's limited use case.

@ludi317
Contributor Author

ludi317 commented Aug 9, 2023

Can you tell us what this chip is that is armv7 but without floating point? I am curious.

The chip is a BCM56160, and is found in a network switch. sysctl shows that the CPU is an ARM Cortex-A9, without an FPU:

root@martini48t-p2a-sys04:RE:0% sysctl hw.model hw.floatingpoint
hw.model: ARM Cortex-A9 r4p1 (ECO: 0x00000000)
hw.floatingpoint: 0

Since you have the change prototyped, what performance differences are you seeing between GOARM=5 and GOARM=7,softfloat?

I never measured the performance of our Go program when GOARM=5. Since the network switch is already CPU-bound, I was concerned that downgrading would only hurt performance.

In the compiler at least, the differences I see are mostly bit manipulation instructions (find first bit, etc.). There may be some more in the runtime (memmove?).

Yes, the runtime leverages ARMv7 features. One example is that when GOARM=7, the runtime opts for ARM-specific atomic operations (armCas64, armXadd64, armXchg64, armLoad64, armStore64).

MOVB runtime·goarm(SB), R11
CMP $7, R11
BLT 2(PC)
JMP armCas64<>(SB)
JMP ·goCas64(SB)
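
For readers less familiar with Go assembly, the dispatch above can be paraphrased in Go. This is only an illustrative sketch: the variable `goarm` and the functions `cas64`, `nativeCas64`, and `lockedCas64` are hypothetical stand-ins, and the lock-based fallback is an assumption about how a portable 64-bit compare-and-swap can be written, not a copy of the runtime's code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// goarm stands in for the value the binary was built with.
// Hypothetical for this sketch.
var goarm uint8 = 7

// fallbackMu guards the portable path below.
var fallbackMu sync.Mutex

// nativeCas64 models the path taken when GOARM >= 7 (armCas64 in the
// assembly above): a hardware compare-and-swap.
func nativeCas64(addr *uint64, old, newVal uint64) bool {
	return atomic.CompareAndSwapUint64(addr, old, newVal)
}

// lockedCas64 models a portable fallback (the role goCas64 plays in the
// assembly above): a lock protects the read-compare-write sequence.
func lockedCas64(addr *uint64, old, newVal uint64) bool {
	fallbackMu.Lock()
	defer fallbackMu.Unlock()
	if *addr != old {
		return false
	}
	*addr = newVal
	return true
}

// cas64 mirrors the CMP/BLT/JMP sequence: choose by the build-time value.
func cas64(addr *uint64, old, newVal uint64) bool {
	if goarm >= 7 {
		return nativeCas64(addr, old, newVal)
	}
	return lockedCas64(addr, old, newVal)
}

func main() {
	var x uint64 = 1
	fmt.Println(cas64(&x, 1, 2), x)
}
```

The point of the sketch is that the choice depends only on the value baked in at build time, which is why a GOARM=5 binary takes the slower path even on an ARMv7 chip.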

FWIW, the prototype has matured into a feature implementation that takes GOARM=7,softfloat as an argument. Using this new option, we have built binaries that work as expected on the switch. Please see the CL for the implementation.

Finally, I came across a comment from Russ Cox indicating that back in 2011, Go supported software floating point for GOARM > 5, by setting the -F flag.

@cherrymui
Member

Finally, I came across a comment from Russ Cox indicating that back in 2011, Go supported software floating point for GOARM > 5, by setting the -F flag.

The softfloat support in Go has been reworked since then. We used to handle it in the linker (5l at the time), at instruction level, which means it would also handle (Go) assembly code (but not cgo). Now we handle it in the compiler, with -gcflags=-d=softfloat, which means it doesn't handle assembly code. So we need a way for that.

@randall77
Contributor

I'd really like to see some performance numbers of the difference between GOARM=5 and GOARM=7,softfloat. If there is little or no difference the whole point of this proposal is kind of moot.
It doesn't have to be on these strange chips. Any GOARM=7 capable chip could run some benchmarks in both modes and see. (You'd need to patch in the proposed CL for 7,softfloat support.)

@ludi317
Contributor Author

ludi317 commented Aug 26, 2023

@randall77 Please find the requested benchmarks comparing GOARM=5 and GOARM=7,softfloat below. Full source code here.

The benchmarks show many significant performance improvements, and only a few minor degradations. On the AtomicOperationsInt64 benchmark, GOARM=7,softfloat is more than 3x faster than GOARM=5 .

goarch: arm
pkg: github.com/ludi317/arm-wrestle
                                  │ armv5_1cpu_raw.txt │       armv7soft_1cpu_raw.txt       │
                                  │       sec/op       │   sec/op     vs base               │
Float32Arithmetic                          4.944µ ± 1%   4.678µ ± 0%   -5.37% (p=0.002 n=6)
Int32Arithmetic                            15.67n ± 3%   15.65n ± 0%        ~ (p=0.318 n=6)
Float64Arithmetic                          3.905µ ± 0%   3.876µ ± 0%   -0.74% (p=0.002 n=6)
Int64Arithmetic                            29.06n ± 0%   29.07n ± 0%   +0.03% (p=0.015 n=6)
ANDconstBICconst                           52.53n ± 0%   52.55n ± 0%   +0.03% (p=0.035 n=6)
Uint64Move                                 22.35n ± 0%   22.36n ± 0%        ~ (p=1.000 n=6)
ADD                                        1.049µ ± 0%   1.009µ ± 0%   -3.81% (p=0.002 n=6)
ADDBICconst                                20.12n ± 0%   19.00n ± 0%   -5.57% (p=0.002 n=6)
ADDBICconstInt64                           29.07n ± 0%   27.94n ± 0%   -3.87% (p=0.002 n=6)
WithMulDAndMulF                           1029.0n ± 0%   986.2n ± 0%   -4.16% (p=0.002 n=6)
BitwiseInt32                               8.942n ± 0%   8.942n ± 0%        ~ (p=0.773 n=6)
BitwiseInt64                               13.42n ± 0%   13.42n ± 0%        ~ (p=1.000 n=6)
TrailingZeros                              43.59n ± 0%   30.18n ± 0%  -30.76% (p=0.002 n=6)
ProducerConsumerBufferedCh                 3.894µ ± 0%   3.603µ ± 0%   -7.46% (p=0.002 n=6)
ProducerConsumerBufferedChInt64            3.961µ ± 1%   3.631µ ± 0%   -8.33% (p=0.002 n=6)
ProducerConsumerUnBufferedCh               5.099µ ± 0%   4.701µ ± 0%   -7.81% (p=0.002 n=6)
ProducerConsumerUnBufferedChInt64          5.073µ ± 0%   4.634µ ± 0%   -8.65% (p=0.002 n=6)
GetCntxct                                  3.851µ ± 0%   3.578µ ± 0%   -7.10% (p=0.002 n=6)
CASInt32                                   158.9n ± 0%   160.9n ± 0%   +1.26% (p=0.002 n=6)
CASInt64                                   502.1n ± 0%   157.3n ± 3%  -68.66% (p=0.002 n=6)
CASUint64                                  502.1n ± 0%   157.5n ± 0%  -68.64% (p=0.002 n=6)
CASUint32                                  158.9n ± 0%   166.7n ± 0%   +4.91% (p=0.002 n=6)
CASUintptr                                 158.9n ± 0%   167.8n ± 3%   +5.60% (p=0.002 n=6)
AtomicOperationsInt64                      931.1n ± 0%   268.6n ± 0%  -71.15% (p=0.002 n=6)
AtomicOperationsInt32                      306.6n ± 0%   297.6n ± 0%   -2.92% (p=0.002 n=6)
AtomicOperationsUint64                     928.8n ± 0%   270.8n ± 0%  -70.84% (p=0.002 n=6)
AtomicOperationsUint32                     306.6n ± 0%   297.6n ± 0%   -2.92% (p=0.002 n=6)
AtomicOperationsUintptr                    308.8n ± 0%   304.4n ± 0%   -1.42% (p=0.002 n=6)
AtomicOperationsBool                       537.1n ± 0%   494.5n ± 0%   -7.93% (p=0.002 n=6)
geomean                                    300.4n        245.5n       -18.28%
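
To relate the "vs base" percentages to the "3x faster" wording, here is a small sketch of the arithmetic. The helper names `speedup` and `percentChange` are made up for illustration; the input numbers are taken from the AtomicOperationsInt64 row above.

```go
package main

import "fmt"

// speedup returns how many times faster the new timing is than the base.
func speedup(baseNs, newNs float64) float64 {
	return baseNs / newNs
}

// percentChange returns the relative change of the new timing against the
// base, in percent. Negative values mean the new build is faster.
func percentChange(baseNs, newNs float64) float64 {
	return (newNs - baseNs) / baseNs * 100
}

func main() {
	// AtomicOperationsInt64: 931.1ns with GOARM=5, 268.6ns with GOARM=7,softfloat.
	base, next := 931.1, 268.6
	fmt.Printf("%.2fx faster, %+.2f%%\n", speedup(base, next), percentChange(base, next))
}
```

A change of about -71% therefore corresponds to a speedup of roughly 3.5x, and a change of about -50% to 2x.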

@randall77
Contributor

So it looks like math/bits and 64-bit atomics are the regressions.

The math/bits one is pretty minor, GOARM=5 is missing the RBIT instruction so getting trailing bits takes 2 more instructions. I think ReverseBytes is similar. (Reverse32 should be a lot faster on GOARM=7, but no one has optimized that function to use RBIT.)
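
As a rough illustration of why RBIT helps, the sketch below computes trailing zeros two ways using only math/bits. It reflects my understanding of the two strategies (bit reversal plus count-leading-zeros versus isolating the lowest set bit first); the function names are hypothetical and the compiler's actual instruction selection may differ.

```go
package main

import (
	"fmt"
	"math/bits"
)

// tzWithRBIT models the short sequence available when a bit-reverse
// instruction exists: reverse the word, then count leading zeros.
func tzWithRBIT(x uint32) int {
	return bits.LeadingZeros32(bits.Reverse32(x))
}

// tzWithoutRBIT models a longer sequence that needs only count-leading-zeros:
// isolate the lowest set bit (x & -x), subtract one to get a mask covering
// the trailing zeros, then subtract that mask's leading zeros from 32.
func tzWithoutRBIT(x uint32) int {
	return 32 - bits.LeadingZeros32((x&-x)-1)
}

func main() {
	for _, x := range []uint32{0, 1, 8, 0x80000000, 0xfff0} {
		fmt.Println(x, bits.TrailingZeros32(x), tzWithRBIT(x), tzWithoutRBIT(x))
	}
}
```

Both helpers agree with bits.TrailingZeros32, including for zero; the second simply takes more operations to get there.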

The 64-bit atomic costs are more substantial. The arm atomics already do a runtime check, but they just use the GOARM value the binary was built with. If we can detect the presence of the atomic instructions we need (LDREXD/STREXD, maybe also DMB?) at runtime, then we can base the runtime check on the actual hardware we're running on.

@randall77
Contributor

LDREXD/STREXD can be detected using the lpae feature bit. (Particularly, detecting that they will be 64-bit atomic.)
It looks like we also need to make sure the DMB instruction is available. It is only available starting in v7, so we need a way to detect that the chip is v7. Anyone know how to get that from feature bits? Currently we check vfp and vfpv3, but of course that's too strict if we're trying to run on fp-less chips.
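
A minimal sketch of the feature-bit idea, assuming Linux on 32-bit ARM, where the kernel reports capabilities through the AT_HWCAP auxiliary vector entry. The function `canUseV7Atomics` is hypothetical, and the sketch deliberately leaves the DMB question raised above unanswered: it only tests the lpae bit.

```go
package main

import "fmt"

// Bit positions in AT_HWCAP on 32-bit ARM Linux.
const (
	hwcapVFP   uint32 = 1 << 6
	hwcapVFPv3 uint32 = 1 << 13
	hwcapLPAE  uint32 = 1 << 20
)

// canUseV7Atomics reports whether the native 64-bit atomic path may be taken.
// With goarm >= 7 the instructions are guaranteed by the build target;
// otherwise the decision is based on what the running CPU reports.
// Assumption: lpae alone is treated as sufficient in this sketch.
func canUseV7Atomics(goarm int, hwcap uint32) bool {
	if goarm >= 7 {
		return true
	}
	return hwcap&hwcapLPAE != 0
}

func main() {
	// An FPU-less chip that still reports lpae: no VFP bits set.
	fmt.Println(canUseV7Atomics(5, hwcapLPAE))
	// A chip with VFP but without lpae.
	fmt.Println(canUseV7Atomics(5, hwcapVFP|hwcapVFPv3))
}
```

Checking lpae rather than the VFP bits is what lets the check succeed on chips without floating point hardware.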

@ludi317
Contributor Author

ludi317 commented Aug 30, 2023

@randall77 I thought the performance deltas in the channel-backed ProducerConsumer benchmarks (-8%) were also interesting, even though they were not as large as those of the math/bits and 64-bit atomic benchmarks.

Based on that finding, I wrote more benchmarks to compare the performance of synchronization primitives between the two builds. Please find the results below. The Mutex benchmarks that acquire a mutex lock, do some work, then release the lock are ~2x faster on GOARM=7,softfloat.

goos: linux
goarch: arm
pkg: github.com/ludi317/arm-wrestle
                                  │ armv5_1cpu_raw.txt │       armv7soft_1cpu_raw.txt       │
                                  │       sec/op       │   sec/op     vs base               │
                                  ...
Mutex                                      44.94µ ± 0%   22.60µ ± 0%  -49.70% (p=0.002 n=6)
RWMutex_Read                               45.00µ ± 0%   22.65µ ± 0%  -49.67% (p=0.002 n=6)
RWMutex_Write                              45.22µ ± 0%   22.87µ ± 0%  -49.42% (p=0.002 n=6)
WaitGroup                                  90.63m ± 4%   77.13m ± 4%  -14.89% (p=0.002 n=6)
Channel                                    8.781m ± 0%   8.383m ± 0%   -4.54% (p=0.002 n=6)
AtomicAdd                                 259.40n ± 0%   73.86n ± 0%  -71.53% (p=0.002 n=6)
Once                                       67.11n ± 0%   64.87n ± 0%   -3.33% (p=0.002 n=6)
Cond                                      11.126µ ± 0%   9.781µ ± 0%  -12.09% (p=0.002 n=6)
Pool                                       774.5n ± 1%   723.2n ± 1%   -6.62% (p=0.002 n=6)
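
For context, a lock/work/unlock benchmark of the kind described can be sketched as follows. This is not the code from the linked repository; the amount of work inside the critical section and the names `lockedWork`, `runLockedWork`, and `benchmarkMutex` are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// lockedWork acquires mu, does a small fixed amount of work, and releases mu.
func lockedWork(mu *sync.Mutex, counter *int) {
	mu.Lock()
	for i := 0; i < 100; i++ {
		*counter += i
	}
	mu.Unlock()
}

// runLockedWork calls lockedWork n times and returns the accumulated counter.
func runLockedWork(n int) int {
	var mu sync.Mutex
	counter := 0
	for i := 0; i < n; i++ {
		lockedWork(&mu, &counter)
	}
	return counter
}

// benchmarkMutex measures the uncontended lock/work/unlock cycle.
func benchmarkMutex(b *testing.B) {
	runLockedWork(b.N)
}

func main() {
	// testing.Benchmark lets the benchmark run from a plain program.
	fmt.Println("Mutex", testing.Benchmark(benchmarkMutex))
}
```

Building this once with GOARM=5 and once with GOARM=7,softfloat, then comparing the outputs, is the kind of comparison the table above reports.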

@randall77
Contributor

I suspect that the channel differences are all due to the synchronization primitives that channels use, for which we know there is already a sizable performance difference.

@gopherbot

Change https://go.dev/cl/525637 mentions this issue: runtime: on arm32, detect whether we have sync instructions

@ludi317
Contributor Author

ludi317 commented Oct 13, 2023

Is there any additional information needed to move this proposal to "Active" status? To summarize,

  • In 2011, Go supported software floating point for GOARM 6 and 7. Support was dropped in a refactoring.
  • Some ARMv6 and ARMv7 chips don't have FPUs. They must now fall back to GOARM=5 to run Go binaries.
  • This CL reintroduces soft float support for GOARM 6 and 7. It's been successfully tested for several months.
  • The soft float option is specified in a comma-separated list, eg GOARM=7,softfloat. The format is consistent with the API defined in the accepted GOMIPS64 proposal, eg GOMIPS64=iii,softfloat.
  • Benchmarks show 3x speedups for 64-bit atomics and 2x speedups for other synchronization primitives in GOARM=7,softfloat as compared to GOARM=5.
  • @randall77 authored a CL that uses CPU feature detection at runtime to see if ARMv7 64-bit atomics can be used on any ARM target. This would allow binaries compiled with GOARM=5 to leverage some ARMv7 features.

@rsc
Contributor

rsc commented Oct 25, 2023

It sounds like https://go.dev/cl/525637 is the right thing to try first, since it is not a visible API change and does not require a proposal at all. @ludi317 can you please rerun your GOARM=5 benchmarks with Keith's CL patched in?

@ludi317
Contributor Author

ludi317 commented Oct 25, 2023

@rsc Please find the requested benchmarks comparing GOARM=5 with Keith's CL applied and GOARM=7,softfloat.

  • 64-bit atomics performance is the same between the two builds (as expected)
  • Mutex benchmarks are still 2x faster and math/bits benchmarks are up to 1.4x faster in GOARM=7,softfloat

goos: linux
goarch: arm
pkg: github.com/ludi317/arm-wrestle
cpu: ARMv7 Processor rev 5 (v7l)
                                  │ raw/round2/armv5keith_1cpu_raw.txt │ raw/round2/armv7soft_1cpu_raw.txt  │
                                  │               sec/op               │   sec/op     vs base               │
Float32Arithmetic                                          4.761µ ± 0%   4.715µ ± 0%   -0.98% (p=0.002 n=6)
Int32Arithmetic                                            15.70n ± 0%   15.74n ± 0%   +0.25% (p=0.002 n=6)
Float64Arithmetic                                          3.923µ ± 0%   3.898µ ± 0%   -0.64% (p=0.002 n=6)
Int64Arithmetic                                            29.15n ± 0%   29.15n ± 0%        ~ (p=0.396 n=6)
ANDconstBICconst                                           52.69n ± 0%   52.65n ± 0%        ~ (p=0.266 n=6)
Uint64Move                                                 22.42n ± 0%   22.42n ± 0%        ~ (p=1.000 n=6)
ADD                                                        1.060µ ± 0%   1.007µ ± 0%   -5.09% (p=0.002 n=6)
ADDBICconst                                                20.18n ± 0%   18.95n ± 0%   -6.12% (p=0.002 n=6)
ADDBICconstInt64                                           29.15n ± 0%   27.88n ± 0%   -4.36% (p=0.002 n=6)
WithMulDAndMulF                                           1040.5n ± 0%   984.7n ± 0%   -5.36% (p=0.002 n=6)
BitwiseInt32                                               8.968n ± 0%   8.919n ± 0%   -0.55% (p=0.002 n=6)
BitwiseInt64                                               13.46n ± 0%   13.38n ± 0%   -0.56% (p=0.002 n=6)
TrailingZeros                                              43.72n ± 0%   30.10n ± 0%  -31.15% (p=0.002 n=6)
LeadingZeros                                               42.60n ± 0%   39.14n ± 0%   -8.13% (p=0.002 n=6)
RotateLeft                                                 114.4n ± 0%   111.0n ± 0%   -2.97% (p=0.002 n=6)
OnesCount                                                  150.2n ± 0%   144.7n ± 0%   -3.73% (p=0.002 n=6)
ProducerConsumerBufferedCh                                 3.602µ ± 0%   3.480µ ± 0%   -3.39% (p=0.002 n=6)
ProducerConsumerBufferedChInt64                            3.687µ ± 0%   3.529µ ± 0%   -4.29% (p=0.002 n=6)
ProducerConsumerUnBufferedCh                               4.683µ ± 0%   4.492µ ± 0%   -4.09% (p=0.002 n=6)
ProducerConsumerUnBufferedChInt64                          4.653µ ± 0%   4.490µ ± 0%   -3.50% (p=0.002 n=6)
GetCntxct                                                  3.580µ ± 1%   3.496µ ± 1%   -2.36% (p=0.002 n=6)
CASInt32                                                   158.4n ± 0%   160.5n ± 0%   +1.33% (p=0.002 n=6)
CASInt64                                                   152.6n ± 2%   154.8n ± 0%   +1.44% (p=0.015 n=6)
CASUint64                                                  152.6n ± 1%   151.8n ± 1%        ~ (p=0.117 n=6)
CASUint32                                                  159.3n ± 0%   160.3n ± 0%   +0.63% (p=0.002 n=6)
CASUintptr                                                 159.3n ± 0%   164.8n ± 0%   +3.45% (p=0.002 n=6)
AtomicOperationsInt64                                      269.4n ± 0%   270.0n ± 0%   +0.24% (p=0.002 n=6)
AtomicOperationsInt32                                      307.5n ± 0%   296.8n ± 0%   -3.50% (p=0.002 n=6)
AtomicOperationsUint64                                     269.3n ± 0%   267.7n ± 0%   -0.59% (p=0.002 n=6)
AtomicOperationsUint32                                     307.5n ± 0%   296.9n ± 0%   -3.46% (p=0.002 n=6)
AtomicOperationsUintptr                                    309.7n ± 0%   301.2n ± 1%   -2.74% (p=0.002 n=6)
AtomicOperationsBool                                       539.7n ± 0%   498.2n ± 0%   -7.68% (p=0.002 n=6)
Mutex                                                      45.09µ ± 0%   22.68µ ± 0%  -49.69% (p=0.002 n=6)
RWMutex_Read                                               45.13µ ± 0%   22.71µ ± 1%  -49.69% (p=0.002 n=6)
RWMutex_Write                                              45.37µ ± 0%   22.90µ ± 0%  -49.52% (p=0.002 n=6)
WaitGroup                                                  77.64m ± 6%   77.01m ± 5%        ~ (p=0.394 n=6)
Channel                                                    8.816m ± 0%   8.344m ± 1%   -5.35% (p=0.002 n=6)
AtomicAdd                                                  71.80n ± 0%   73.67n ± 0%   +2.61% (p=0.002 n=6)
Once                                                       67.31n ± 0%   64.98n ± 0%   -3.45% (p=0.002 n=6)
Cond                                                      10.316µ ± 0%   9.796µ ± 0%   -5.04% (p=0.002 n=6)
Pool                                                       787.1n ± 1%   741.6n ± 2%   -5.79% (p=0.002 n=6)
MutexContended                                             233.6n ± 1%   235.1n ± 0%   +0.64% (p=0.002 n=6)
RWMutexContendedRead                                       276.3n ± 0%   278.6n ± 0%   +0.80% (p=0.002 n=6)
RWMutexContendedWrite                                      499.2n ± 0%   521.5n ± 0%   +4.48% (p=0.002 n=6)
Semaphore                                                  970.6n ± 0%   967.6n ± 0%   -0.30% (p=0.002 n=6)
Mutex2                                                     213.1n ± 0%   210.9n ± 0%   -1.03% (p=0.002 n=6)
RWMutex                                                    253.5n ± 0%   242.2n ± 0%   -4.46% (p=0.002 n=6)
Channel2                                                   970.7n ± 0%   968.0n ± 0%   -0.28% (p=0.002 n=6)
MapRWMutex/Write                                           807.4n ± 0%   790.3n ± 0%   -2.12% (p=0.002 n=6)
MapRWMutex/Read                                            335.1n ± 0%   334.7n ± 0%   -0.10% (p=0.002 n=6)
MapMutex                                                   503.9n ± 0%   520.6n ± 0%   +3.32% (p=0.002 n=6)
geomean                                                    586.8n        550.0n        -6.27%

@rsc
Contributor

rsc commented Oct 27, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@ludi317
Contributor Author

ludi317 commented Oct 30, 2023

I came across this comment from @minux about GOARM:

We made a mistake when defining GOARM: VFP status and ARM architecture
version really are two separate property. For example, ARMv5E chips could
also have FPU (at least in theory) and there are ARMv6 chips without VFP.

I drew a table to help clarify my understanding of the relationship between a target's ARM architecture version and FPU status, and the right GOARM value to use to generate its binary. (Could be wrong; please correct any errors.) It shows what kind of instructions each GOARM value emits. The ARM architecture version {5, 6, 7} and FPU status {No FPU, VFPv1, VFPv3} are separate properties, on separate axes, as the comment says.

         No FPU     VFPv1      VFPv3
ARMv5    GOARM=5               -
ARMv6               GOARM=6    -
ARMv7                          GOARM=7

  • GOARM=5 emits ARMv5 instructions using softfloat emulation
  • GOARM=6 emits ARMv6 instructions using VFPv1
  • GOARM=7 emits ARMv7 instructions using VFPv3

Many ARM targets are located on the main diagonal. Off-diagonal targets fall back to a GOARM that underutilizes their hardware. Arrows point in the direction of the fallback. For example, an ARMv7 device with no FPU drops down to a binary with ARMv5 instructions. (Dashes represent invalid combinations of architecture versions and FPUs; VFPv3 is not implemented on ARM architectures v5 and v6.)

This proposal aims to avoid the 2 fallback cases in the No FPU column, allowing them to leverage the features of their respective architecture versions.

         No FPU               VFPv1      VFPv3
ARMv5    GOARM=5                         -
ARMv6    GOARM=6,softfloat    GOARM=6    -
ARMv7    GOARM=7,softfloat               GOARM=7
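
The No FPU column of the two tables can be restated as a small lookup. This is a sketch only: `goarmForNoFPU` is a hypothetical helper, and it covers just the FPU-less targets, since those are the cases this proposal changes.

```go
package main

import "fmt"

// goarmForNoFPU returns the GOARM value to build with for a target that has
// no FPU. withProposal selects between the behaviour before the proposal
// (always fall back to 5) and after it (keep the architecture version and
// add the softfloat attribute).
func goarmForNoFPU(armVersion int, withProposal bool) (string, error) {
	if armVersion < 5 || armVersion > 7 {
		return "", fmt.Errorf("unsupported ARM version %d", armVersion)
	}
	if armVersion == 5 || !withProposal {
		return "5", nil
	}
	return fmt.Sprintf("%d,softfloat", armVersion), nil
}

func main() {
	for _, v := range []int{5, 6, 7} {
		before, _ := goarmForNoFPU(v, false)
		after, _ := goarmForNoFPU(v, true)
		fmt.Printf("ARMv%d without FPU: GOARM=%s before, GOARM=%s after\n", v, before, after)
	}
}
```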

I wanted to frame this problem in the larger context of all fallbacks, to help guide the selection of a new set of GOARM options. For example, one advantage of the proposed softfloat / hardfloat naming scheme is that it is expressive enough to select GOARM=5,hardfloat and redress another fallback case. This is not to say GOARM=5,hardfloat ought to be implemented, only that the options generalize well enough to permit the possibility.

gopherbot pushed a commit that referenced this issue Oct 31, 2023
Make the choice of using these instructions dynamic (triggered by cpu
feature detection) rather than static (triggered by GOARM setting).

if GOARM>=7, we know we have them.
For GOARM=5/6, dynamically dispatch based on auxv information.

Update #17082
Update #61588

Change-Id: I8a50481d942f62cf36348998a99225d0d242f8af
Reviewed-on: https://go-review.googlesource.com/c/go/+/525637
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
@rsc
Contributor

rsc commented Nov 2, 2023

Thanks for the numbers showing that 7,softfloat is still better than 5 with checks.

@rsc
Contributor

rsc commented Nov 2, 2023

Have all remaining concerns about this proposal been addressed?

GOARM changes to have the form [567](,attrs)?.
That is, there is now an optional attribute list.
The only two defined attributes are softfloat and hardfloat, specifying software and hardware floating point (same names as for GOMIPS).
It is an error to specify both softfloat and hardfloat.
The leading number cannot be omitted.
softfloat is the default for GOARM=5 and hardfloat is the default for GOARM=6 and GOARM=7.

When compiled with GOARM=7,softfloat, code will assume ARMv7 non-FP instructions like atomics but will use software floating point.
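
The rules above are compact enough to sketch as a parser. The following is an illustration of the stated grammar, not the code from the CL; `parseGOARM` is a hypothetical name, and rejecting unknown or empty attributes is an assumption that goes slightly beyond what the comment spells out.

```go
package main

import (
	"fmt"
	"strings"
)

// parseGOARM parses a value of the form [567](,attrs)? and reports the
// architecture version and whether software floating point is selected.
func parseGOARM(s string) (version int, softFloat bool, err error) {
	num, attrs, hasAttrs := strings.Cut(s, ",")
	switch num {
	case "5":
		version, softFloat = 5, true // softfloat is the default for 5
	case "6":
		version = 6 // hardfloat is the default for 6 and 7
	case "7":
		version = 7
	default:
		return 0, false, fmt.Errorf("invalid GOARM %q: must start with 5, 6, or 7", s)
	}
	if !hasAttrs {
		return version, softFloat, nil
	}
	seen := ""
	for _, a := range strings.Split(attrs, ",") {
		switch a {
		case "softfloat", "hardfloat":
			if seen != "" && seen != a {
				return 0, false, fmt.Errorf("invalid GOARM %q: softfloat and hardfloat are mutually exclusive", s)
			}
			seen = a
			softFloat = a == "softfloat"
		default:
			return 0, false, fmt.Errorf("invalid GOARM %q: unknown attribute %q", s, a)
		}
	}
	return version, softFloat, nil
}

func main() {
	for _, s := range []string{"5", "7", "7,softfloat", "5,hardfloat", "7,softfloat,hardfloat", ",softfloat"} {
		v, soft, err := parseGOARM(s)
		fmt.Printf("%q -> version=%d softfloat=%v err=%v\n", s, v, soft, err)
	}
}
```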

@MDr164

MDr164 commented Nov 3, 2023

Looks good. I'm looking forward to creating some real-world benchmarks, as this feature might greatly boost performance by finally allowing the v6 and v7 ISAs to be used on non-FP chips 🎉
I'm also in favor of the new optional attribute as this allows aot compilation with optimized asm instead of autodetection via cpu feature bits which aren't always reliable. And it keeps code size down.

@rsc
Contributor

rsc commented Nov 10, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

GOARM changes to have the form [567](,attrs)?.
That is, there is now an optional attribute list.
The only two defined attributes are softfloat and hardfloat, specifying software and hardware floating point (same names as for GOMIPS).
It is an error to specify both softfloat and hardfloat.
The leading number cannot be omitted.
softfloat is the default for GOARM=5 and hardfloat is the default for GOARM=6 and GOARM=7.

When compiled with GOARM=7,softfloat, code will assume ARMv7 non-FP instructions like atomics but will use software floating point.

@cherrymui
Member

I assume GOARM=5,hardfloat will be an unsupported configuration?

@MDr164

MDr164 commented Nov 12, 2023

I assume GOARM=5,hardfloat will be an unsupported configuration?

To quote Ludi from earlier:

For example, one advantage of the proposed softfloat / hardfloat naming scheme is that it is expressive enough to select GOARM=5,hardfloat and redress another fallback case. This is not to say GOARM=5,hardfloat ought to be implemented, only that the options generalize well enough to permit the possibility.

So I'd say GOARM=5,hardfloat should be generally supported, as VFP is technically supported on ARMv5, but I never came across a chip that actually implements this combination (while the other way around, having a higher ISA but no VFP, is more common than one might think). And to streamline the flags and quote Russ:

GOARM changes to have the form [567](,attrs)?. [...] softfloat is the default for GOARM=5 and hardfloat is the default for GOARM=6 and GOARM=7.

So there should not be a difference of attrs supported for each number imo.

@rsc
Contributor

rsc commented Nov 14, 2023

I think it's fine to support 5,hardfloat and easier to support it than to reject it. Maybe people on chips with broken atomics will want it.

@ludi317
Contributor Author

ludi317 commented Nov 14, 2023

I updated my CL to support GOARM=5,hardfloat. I marked the parts of code that require the eye of a Go compiler team member as todo. I assume it's too late for this change to make it into the upcoming 1.22 release?

@randall77
Contributor

It is not too late yet. The freeze is Nov 21.

@rsc
Contributor

rsc commented Nov 16, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

GOARM changes to have the form [567](,attrs)?.
That is, there is now an optional attribute list.
The only two defined attributes are softfloat and hardfloat, specifying software and hardware floating point (same names as for GOMIPS).
It is an error to specify both softfloat and hardfloat.
The leading number cannot be omitted.
softfloat is the default for GOARM=5 and hardfloat is the default for GOARM=6 and GOARM=7.

When compiled with GOARM=7,softfloat, code will assume ARMv7 non-FP instructions like atomics but will use software floating point.

@rsc rsc changed the title proposal: runtime: software floating point for GOARM=6, 7 (not only GOARM=5) runtime: software floating point for GOARM=6, 7 (not only GOARM=5) Nov 16, 2023
@rsc rsc modified the milestones: Proposal, Backlog Nov 16, 2023
gopherbot pushed a commit that referenced this issue Nov 20, 2023
This change introduces new options to set the floating point
mode on ARM targets. The GOARM version number can optionally be
followed by ',hardfloat' or ',softfloat' to select whether to
use hardware instructions or software emulation for floating
point computations, respectively. For example,
GOARM=7,softfloat.

Previously, software floating point support was limited to
GOARM=5. With these options, software floating point is now
extended to all ARM versions, including GOARM=6 and 7. This
change also extends hardware floating point to GOARM=5.

GOARM=5 defaults to softfloat and GOARM=6 and 7 default to
hardfloat.

For #61588

Change-Id: I23dc86fbd0733b262004a2ed001e1032cf371e94
Reviewed-on: https://go-review.googlesource.com/c/go/+/514907
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
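
With the change described in this commit message, a cross-build for an FPU-less ARMv7 target could be driven from Go roughly as sketched below. The helper `armBuildEnv` is hypothetical, GOOS=linux is an assumption about the target, and a toolchain that includes this change (the issue is milestoned for Go 1.22) is assumed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// armBuildEnv returns the current environment extended with the settings
// for a 32-bit ARM cross-build using the given GOARM value.
// GOOS=linux is an assumption for this sketch.
func armBuildEnv(goarm string) []string {
	return append(os.Environ(), "GOOS=linux", "GOARCH=arm", "GOARM="+goarm)
}

func main() {
	cmd := exec.Command("go", "build", "./...")
	cmd.Env = armBuildEnv("7,softfloat")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	// Only print the command here; call cmd.Run() to actually build.
	fmt.Println("GOOS=linux GOARCH=arm GOARM=7,softfloat", cmd.String())
}
```

The same settings can of course be supplied directly in the shell environment when invoking go build by hand.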
@MDr164

MDr164 commented Nov 20, 2023

The CL has been merged, I guess this can be marked as resolved then?

@dmitshur dmitshur modified the milestones: Backlog, Go1.22 Nov 20, 2023
Projects
Status: Todo
Status: Accepted