My last CL, https://go-review.googlesource.com/c/go/+/94901, improved floating-point computation performance on ARM64 by about 9%, but it also introduced a slight loss of accuracy.
The main idea is to fuse a pair of FMUL/FADD instructions into a single FMADD instruction. The benefits are:
- it saves a register for the intermediate multiplication result
- it saves CPU cycles
However, it also introduces some accuracy loss, for example:
float32(0.6046603 * 0.9405091) + 0.6645601, expected 1.2332485, got 1.2332486
float32(0.67908466 * 0.21855305) + 0.20318687, expected 0.3516029, got 0.35160288
...
As a result, the test case go/src/cmd/compile/internal/gc/testdata/fp.go fails.
There are two possible solutions:
- Roll back to the less optimized FMUL/FADD sequence.
- Modify the test case to use something like pattern matching, changing
  float32(0.6046603 * 0.9405091) + 0.6645601 == 1.2332485
  to
  float32(0.6046603 * 0.9405091) + 0.6645601 == 1.233248*
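The pattern-matching option could look roughly like this sketch. `matchFloat32` is a hypothetical test helper, not something that exists in fp.go. It formats the result with the shortest decimal representation that round-trips as float32 (`strconv.FormatFloat` with precision -1 and bitSize 32) and, when the pattern ends in `*`, compares only the leading digits.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// matchFloat32 reports whether got matches pattern. A trailing '*' means
// "any remaining digits"; otherwise the shortest decimal form of got must
// equal the pattern exactly.
func matchFloat32(got float32, pattern string) bool {
	s := strconv.FormatFloat(float64(got), 'g', -1, 32)
	if strings.HasSuffix(pattern, "*") {
		return strings.HasPrefix(s, strings.TrimSuffix(pattern, "*"))
	}
	return s == pattern
}

func main() {
	x, y, z := float32(0.6046603), float32(0.9405091), float32(0.6645601)
	got := float32(x*y) + z
	fmt.Println(got, matchFloat32(got, "1.233248*"))
}
```

One caveat of matching decimal prefixes: two values one ulp apart can straddle a digit boundary (for example ...2479 versus ...2480), so a prefix can still reject a result that is numerically very close. Comparing against an ulp-based tolerance avoids that edge case.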
What is your opinion?