proposal: cmd/go: add GOARM=8 for further optimization on armv7/aarch32 #29373
Currently, an ARM program built by Go calls runtime.udiv() for each division. That function detects whether a hardware divider is available and falls back to software division if it is not.
The main reason is that a hardware divider is an optional component on an ARMv7 machine. In practice, though, most ARMv7 SoCs have one, such as the one in the Raspberry Pi 2.
GOARM=8 implies that the program will run in the aarch32 mode (ARMv7 compatible) of an arm64 machine, on which a hardware divider is a must. So
The go1 benchmark does show some improvement from generating SDIV/UDIV directly instead of relying on runtime detection.
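For readers unfamiliar with the dispatch described above, here is a rough sketch of its shape in plain Go. It is not the runtime's actual code: `hasHWDiv`, `softUdiv`, and `udiv` are hypothetical names, and the native `/` operator merely stands in for a single UDIV instruction.

```go
package main

import "fmt"

// hasHWDiv stands in for a CPU feature flag filled in at startup.
// The name is hypothetical, chosen for this sketch only.
var hasHWDiv = false

// softUdiv is a shift-and-subtract unsigned division, the kind of
// fallback used when no hardware divider exists.
func softUdiv(n, d uint32) uint32 {
	if d == 0 {
		panic("division by zero")
	}
	var q uint32
	var r uint64 // wider than 32 bits so r<<1 cannot overflow
	for i := 31; i >= 0; i-- {
		r = r<<1 | uint64((n>>uint(i))&1)
		if r >= uint64(d) {
			r -= uint64(d)
			q |= 1 << uint(i)
		}
	}
	return q
}

// udiv mirrors the check-then-dispatch pattern: every division pays
// for a call and a feature test before doing the real work.
func udiv(n, d uint32) uint32 {
	if hasHWDiv {
		return n / d // stands in for one UDIV instruction
	}
	return softUdiv(n, d)
}

func main() {
	fmt.Println(udiv(100, 7))
}
```

With a GOARM value that guarantees the divider, the compiler could skip both the call and the test and emit the instruction inline.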
I don't think it is a good idea to introduce another GOARM value for only two instructions and a performance gain of less than 1% in geomean. I think dynamic feature detection is still better in this case. If the overhead of a runtime call is larger than we thought, we could generate the feature test and conditional branch inline. Compared to a division operation, I don't think the overhead of a conditional branch is unacceptable.
It sounds like the proposal is GOARM=8 means ARMv7+hwdivide.
It looks like this usually doesn't matter, performance-wise. Maybe the compiler should emit code to do the check + branch like we already do for write barriers? That would at least allow a bit more optimization of the hw code path (not doing unnecessary spills/reloads/etc).
And a very division-heavy function could just have two copies, with the compiler hoisting the check out of the loop or other body to amortize it, all without a new GOARM value.
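To make the two-copies idea concrete, here is a hand-written sketch of what the hoisted form might look like if expressed in Go source. The compiler would do this on generated code, not on source; `hasHWDiv`, `softUdiv`, and `sumQuotients` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// hasHWDiv is a hypothetical feature flag, assumed to be set once at startup.
var hasHWDiv = true

// softUdiv is a shift-and-subtract fallback for CPUs without a divider.
func softUdiv(n, d uint32) uint32 {
	if d == 0 {
		panic("division by zero")
	}
	var q uint32
	var r uint64 // wider than 32 bits so r<<1 cannot overflow
	for i := 31; i >= 0; i-- {
		r = r<<1 | uint64((n>>uint(i))&1)
		if r >= uint64(d) {
			r -= uint64(d)
			q |= 1 << uint(i)
		}
	}
	return q
}

// sumQuotients tests the feature flag once, then runs one of two
// loop copies, so the check is amortized over all iterations.
func sumQuotients(xs []uint32, d uint32) uint32 {
	var sum uint32
	if hasHWDiv {
		for _, x := range xs {
			sum += x / d // stands in for an inline UDIV
		}
		return sum
	}
	for _, x := range xs {
		sum += softUdiv(x, d)
	}
	return sum
}

func main() {
	fmt.Println(sumQuotients([]uint32{10, 20, 35}, 5))
}
```

The cost is code size: every such function carries two loop bodies, so this would presumably only be worth doing where division dominates.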
Another option is to follow the GOMIPS approach and have GOARM=7+divide. But the faster branch check à la write barriers seems better to try first.