Join GitHub today
GitHub is home to over 36 million developers working together to host and review code, manage projects, and build software together.Sign up
runtime: add mul intrinsic for overflow checks #21588
This is neither a language change nor introducing or changing a public api but discussing a runtime implementation detail.
Profiling over production programs reveals that growslice and makemap are hot functions (even outside the malloc part). I am working on to improve their performance and correctness for go1.10.
One part of these functions is at least one check for
(Update: the check can also have variants like
We have resorted to switches with constant divisions in growslice to improve performance for common cases. This introduces branch mis predicts and a larger function size which is bad for icaches.
Elsewhere we use maxSliceCap with precomputed values (
An alternative to the above optimizations that avoids data cache lookups and a division is to check
To be able to check overflow quickly and safely i propose the following:
Add a new unexported runtime function runtime.mul:
The function can be used like:
Let the compiler recognize runtime.mul as an intrinsic and replace it with optimized inline instructions where possible.
On architectures that provide overflow/carryflags this can be a mul and setting overflow according to the flag. Otherwise use the provided generic implementation. (Note that a*b is always returned also in the generic implementation to match optimizations in machine code.)
Thereby we can eliminate maxSliceCap and shrink the switch cases without loss of (much) performance but better branch prediction and cache use and shrinking the functions away from special cases again.
Possible future directions and ideas for later extra proposals
If the introduction of the above function and optimization looks ok i would like to work on implementing it and then replace existing runtime codepaths that would benefit from the new function.
@griesemer: As far as i know currently runtime can only import runtime, unsafe and runtime/internal/... not math. Since everything depends on runtime we would create an import cycle if we use math.mul in runtime. (https://github.com/golang/go/blob/master/src/cmd/go/internal/load/pkg.go)
Being runtime only also avoids introducing a new public api for now and runtime.mul can be easily deleted again or changed since it is private to the runtime.
math.Bits might be simple enough to include in runtime but this would need checks that this does not introduce any circular dependencies.
If we find it works well with runtime we can use the then existing compiler support to discuss a public api as a future independent proposal.
@robpike i think keeping runtime.mul simple leaves it more generally applicable to other cases of overflow checking at the same time as being used for _MaxMem checks.
Not all checks in runtime are against _MaxMem but can also be _MaxMem-someAmount. Where someAmount is known to be smaller than _MaxMem. Some uses round up len*elemSize before checking against _MaxMem. These cases are faster to check with help of mul when not directly comparing against _MaxMem.
In the later case depending on roundup a round up overflow to 0 needs to be prevented in roundup.
Updated the description to include that there can be variants of the _MaxMem check.
Seems unobjectionable to add such a runtime function, although it should probably go in runtime/internal/sys, with the other runtime intrinsics.
If we decide to add something like it to math/bits as well, we might decide to give the runtime flavor looser semantics for performance (like we have done with other runtime intrinsics that also appear in math/bits). FWIW, I recently wanted math/bits to have something like
Longer term, though, it seems perhaps better to fix this either via:
Note that with either of these, we should be able to get equivalent performance, with enough compiler help. (1) In the arbitrary precision case, the compiler could in theory recognize that the integer is in use only in one expression, which bounds its range, and replace it under the hood with a 128 bit version. (2) Working with a 128 bit multiply, the compiler can recognize that the top half of a multiply is being checked against 0, so it can substitute a pure overflow check there.