Fused multiply-add (FMA) floating point instructions typically provide improved accuracy and performance when compared to independent floating point multiply and add instructions. However, they may change the result of such an operation because they omit the rounding step that would normally take place between a multiply instruction and an add instruction.
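To make the difference concrete, here is a minimal sketch of an input where the fused and unfused results disagree. It assumes a Go release that provides `math.FMA` (Go 1.14 or later), which is not part of this proposal; the helper names `unfused` and `fused` and the chosen operands are illustrative only.

```go
package main

import (
	"fmt"
	"math"
)

// unfused computes x*y + z with two roundings. The explicit float64
// conversion rounds the product before the addition.
func unfused(x, y, z float64) float64 {
	return float64(x*y) + z
}

// fused computes x*y + z with a single rounding at the end.
func fused(x, y, z float64) float64 {
	return math.FMA(x, y, z)
}

func main() {
	// The exact product is 1 - 2^-60, which is not representable in
	// float64 and rounds to 1.
	x := 1 + math.Ldexp(1, -30)
	y := 1 - math.Ldexp(1, -30)
	z := -1.0

	fmt.Println(unfused(x, y, z)) // the rounded product is 1, so the sum is 0
	fmt.Println(fused(x, y, z))   // the exact product is kept, so the sum is -2^-60
}
```

The operands are picked so that the low-order bits of the product are lost by the intermediate rounding; for most inputs the two results agree or differ only in the last bit.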
This proposal seeks to clarify the guarantees that Go provides as to when rounding to float32 or float64 must be performed so that FMA operations can be safely extracted by SSA rules. I assume that complex{64,128} casts will be lowered to float{32,64} casts for the purposes of this proposal.
The consensus from previous discussions on the subject is that explicit casts should force rounding, as is already specified for constants:
❌ a := float64(x * y) + z (1)
❌ z += float64(x * y) (2)
There is also consensus that parentheses should not force rounding. So in the following cases, the intermediate rounding stage can be omitted and an FMA used:
✅ a := x * y + z (3)
✅ a := (x * y) + z (4)
✅ z += x * y (5)
✅ z += (x * y) (6)
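Examples (1) to (6) can be written out as small functions to compare the two groups. This is only a sketch: the function names `example1` to `example6` and the operands are hypothetical, and whether the unconverted forms are actually fused depends on the compiler and the target architecture.

```go
package main

import (
	"fmt"
	"math"
)

// Examples (1) and (2): the explicit conversion forces the product to be
// rounded to float64, so an FMA must not be used.
func example1(x, y, z float64) float64 { a := float64(x*y) + z; return a }
func example2(x, y, z float64) float64 { z += float64(x * y); return z }

// Examples (3) to (6): no explicit conversion, so the compiler would be
// free to omit the intermediate rounding and emit an FMA.
func example3(x, y, z float64) float64 { a := x*y + z; return a }
func example4(x, y, z float64) float64 { a := (x * y) + z; return a }
func example5(x, y, z float64) float64 { z += x * y; return z }
func example6(x, y, z float64) float64 { z += (x * y); return z }

func main() {
	// The exact product is 1 - 2^-60, which rounds to 1 in float64.
	x := 1 + math.Ldexp(1, -30)
	y := 1 - math.Ldexp(1, -30)
	z := -1.0

	// Always 0: the product is rounded before the addition.
	fmt.Println(example1(x, y, z), example2(x, y, z))

	// Either 0 (separate multiply and add) or -2^-60 (fused),
	// depending on what the compiler chooses to emit.
	fmt.Println(example3(x, y, z), example4(x, y, z),
		example5(x, y, z), example6(x, y, z))
}
```

On a target without FMA instructions, or where the compiler does not fuse, all six functions would be expected to return the same value.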
It is also proposed that assignments to local variables should not force rounding to take place:
I also propose that an assignment to a memory location should force rounding (I lean towards forcing rounding whenever an intermediate result is visible to the program):
❌ *a = x * y; t := *a + z (8)
(SSA rules could optimize example 8 because they will replace the load from a with a reuse of the result of x * y.)
I think the only real complexity in the implementation is how we plumb the casts from the compiler to the SSA backend so that optimization rules can be blocked as appropriate. I’m not sure if there is a pre-existing mechanism we can use.
See these links for previous discussion of this proposal on golang-dev:
https://groups.google.com/d/topic/golang-dev/qvOqcmAkKnA/discussion
https://groups.google.com/d/topic/golang-dev/cVnE1K08Aks/discussion