math/big: optimize amd64 asm shlVU and shrVU for shift==0 case #31097
This adds branches for s == 0 and for s == 0 && z.base == x.base to shlVU and shrVU. In the first case runtime.memmove is called; in the second case we simply return. Tests are also added for the new branches.

Running `go test -bench Float` 20 times and comparing with benchstat shows no meaningful difference in performance.

Fixes golang#31097
This adds branches for s == 0 and for s == 0 && z.base == x.base to shlVU and shrVU. In the first case runtime.memmove is called; in the second case we simply return. Tests and benchmarks are also added for the new branches.

Benchmarked on linux/amd64 on an i5-8300H:

name            old time/op  new time/op  delta
ShlVUCopy1e7-8  16.0ms ± 0%  11.1ms ± 1%   -30.79%  (p=0.000 n=10+19)
ShlVUNop1e7-8   10.5ms ± 1%   0.0ms ± 0%  -100.00%  (p=0.000 n=9+20)
ShrVUCopy1e7-8  15.5ms ± 0%  11.1ms ± 1%   -28.55%  (p=0.000 n=8+18)
ShrVUNop1e7-8   10.3ms ± 2%   0.0ms ± 0%  -100.00%  (p=0.000 n=9+20)

Fixes golang#31097
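In Go terms, the new fast paths behave like the minimal sketch below. This is illustrative only: the actual change is in the amd64 assembly, and the sketch assumes len(z) == len(x), as math/big guarantees for these routines. The pure Go shlVU_g already has the s == 0 short-circuit as of CL 164967.

```go
package big

// Word mirrors math/big's machine-word type.
type Word uintptr

// shlVU shifts x left by s bits, writing the result to z and
// returning the bits shifted out of the top word. Only the new
// shift==0 fast paths are sketched; the general case (assembly in
// the real implementation) is elided.
func shlVU(z, x []Word, s uint) (c Word) {
	if s == 0 {
		if len(z) == 0 || &z[0] == &x[0] {
			// Same backing array (or nothing to do): no-op.
			return 0
		}
		copy(z, x) // distinct arrays: compiles down to a runtime.memmove call
		return 0
	}
	// ... general shift path, not shown in this sketch ...
	panic("general case elided")
}
```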
DO NOT MAIL

TODO: shrVU too
TODO: benchmarks
TODO: fuzz for confidence
TODO: better commit message

When shift == 0, shlVU and shrVU reduce to a memcopy. When z.ptr == x.ptr, it further reduces to a no-op. The pure Go implementation has these optimizations, as of https://go-review.googlesource.com/c/go/+/164967. The arm64 implementation has one of them (see golang#31084 (comment)). We should add both to the amd64 implementation.

cc @griesemer

Fixes golang#31097

Change-Id: I3979d7c82a63e1840c8191636a8947e8f440af3b
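The 1e7-word benchmarks reported above might look roughly like this sketch. It reuses the Word and shlVU definitions from the sketch earlier in the thread; the names mirror the benchstat output, but the exact harness in the CL may differ.

```go
package big

import "testing"

const benchLen = 1e7 // ten million words, matching the 1e7 benchmarks above

// benchmarkShlVUShift0 measures shlVU with shift == 0. When sameSlice
// is true, z and x share a backing array, exercising the no-op branch;
// otherwise the memmove branch is exercised.
func benchmarkShlVUShift0(b *testing.B, sameSlice bool) {
	x := make([]Word, benchLen)
	z := x
	if !sameSlice {
		z = make([]Word, benchLen)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		shlVU(z, x, 0)
	}
}

func BenchmarkShlVUCopy1e7(b *testing.B) { benchmarkShlVUShift0(b, false) }
func BenchmarkShlVUNop1e7(b *testing.B)  { benchmarkShlVUShift0(b, true) }
```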
Good question. As of this moment there aren't any pure Go wrappers for these functions; they all go straight to the assembly implementations. Now that we have mid-stack inlining, it might make sense to change that and do optimizations like this in the wrappers, so callers can skip the assembly call entirely. Want to experiment and send a CL for 1.16 if appropriate?
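One hedged sketch of what such a wrapper could look like. The name shlVU_asm is hypothetical (the assembly routine would need to be renamed so the wrapper can take the exported name), and any actual CL may structure this differently.

```go
// Hypothetical pure Go wrapper around the assembly implementation.
// With mid-stack inlining, the s == 0 checks can be inlined into the
// caller, so the common fast paths never pay for the assembly call.
func shlVU(z, x []Word, s uint) (c Word) {
	if s == 0 {
		if len(z) > 0 && &z[0] != &x[0] {
			copy(z, x) // distinct backing arrays: plain memmove
		}
		return 0 // same backing array (or empty slice): no-op
	}
	return shlVU_asm(z, x, s) // general case stays in assembly
}

// shlVU_asm is the hypothetically renamed assembly routine, declared
// in Go and implemented in arith_amd64.s (as math/big already does
// for its assembly functions in arith_decl.go).
func shlVU_asm(z, x []Word, s uint) (c Word)
```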