math/big: optimize amd64 asm shlVU and shrVU for shift==0 case #31097

Open
josharian opened this issue Mar 28, 2019 · 3 comments · May be fixed by #31171

@josharian
Contributor

@josharian josharian commented Mar 28, 2019

When shift == 0, shlVU and shrVU reduce to a memcopy. When z.ptr == x.ptr, it further reduces to a no-op. The pure Go implementation has these optimizations, as of https://go-review.googlesource.com/c/go/+/164967. The arm64 implementation has one of them (see #31084 (comment)). We should add both to the amd64 implementation.

cc @griesemer
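To make the two fast paths concrete, here is a minimal pure Go sketch of a left shift over a word vector. It is not the code from the linked CL or from the assembly; the names `Word`, `wordBits`, and `shlVUSketch` are assumptions made for illustration, and the sketch assumes `len(x) >= len(z)` and `s` smaller than the word size.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Word stands in for math/big's Word type (an assumption of this sketch).
type Word uint

// wordBits is the width of a Word in bits.
const wordBits = bits.UintSize

// shlVUSketch sets z = x << s and returns the bits shifted out of the top word.
// It is a hypothetical helper, not the math/big implementation.
// Preconditions (assumed): len(x) >= len(z) and s < wordBits.
func shlVUSketch(z, x []Word, s uint) (c Word) {
	if len(z) == 0 {
		return 0
	}
	if s == 0 {
		// Shift by zero: nothing is shifted out, so the result is a plain copy.
		// If z and x start at the same address, even the copy is unnecessary.
		if &z[0] != &x[0] {
			copy(z, x)
		}
		return 0
	}
	s &= wordBits - 1                    // keep the shift count in range
	r := (wordBits - s) & (wordBits - 1) // complementary shift count
	c = x[len(z)-1] >> r
	// Walk from the top word down so the in-place case (z aliases x) stays correct.
	for i := len(z) - 1; i > 0; i-- {
		z[i] = x[i]<<s | x[i-1]>>r
	}
	z[0] = x[0] << s
	return c
}

func main() {
	x := []Word{1, 2, 3}
	z := make([]Word, len(x))

	shlVUSketch(z, x, 0) // copy path
	fmt.Println(z)       // [1 2 3]

	shlVUSketch(x, x, 0) // no-op path: same backing array
	fmt.Println(x)       // [1 2 3]

	c := shlVUSketch(z, x, 1) // general path
	fmt.Println(z, c)         // [2 4 6] 0
}
```

The built-in `copy` already handles overlapping slices, so the pointer comparison in the sketch only avoids doing the work when source and destination coincide.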

@josharian josharian added this to the Go1.13 milestone Mar 28, 2019
@josharian josharian self-assigned this Mar 29, 2019
nsajko added a commit to nsajko/go-1 that referenced this issue Mar 31, 2019
This adds branches for s == 0 and s == 0 && z.base == x.base to shlVU
and shrVU. In the first case runtime.memmove is called, while in the
second case we just return.

Tests are also added for the new branches.

Running go test -bench Float 20 times and comparing the results with
benchstat shows no meaningful difference in performance.

Fixes golang#31097
@gopherbot

@gopherbot gopherbot commented Mar 31, 2019

Change https://golang.org/cl/170257 mentions this issue: math/big: optimize amd64 asm shlVU and shrVU for shift==0 case

nsajko added a commit to nsajko/go-1 that referenced this issue Mar 31, 2019
This adds branches for s == 0 and s == 0 && z.base == x.base to shlVU
and shrVU. In the first case runtime.memmove is called, while in the
second case we just return.

Tests and benchmarks are also added for the new branches.

Benchmarked on AMD64 Linux on i5-8300H:

name            old time/op  new time/op  delta
ShlVUCopy1e7-8  16.0ms ± 0%  11.1ms ± 1%   -30.79%  (p=0.000 n=10+19)
ShlVUNop1e7-8   10.5ms ± 1%   0.0ms ± 0%  -100.00%  (p=0.000 n=9+20)
ShrVUCopy1e7-8  15.5ms ± 0%  11.1ms ± 1%   -28.55%  (p=0.000 n=8+18)
ShrVUNop1e7-8   10.3ms ± 2%   0.0ms ± 0%  -100.00%  (p=0.000 n=9+20)

Fixes golang#31097
josharian added a commit to josharian/go that referenced this issue May 7, 2019
DO NOT MAIL

TODO: shrVU too
TODO: benchmarks
TODO: fuzz for confidence
TODO: better commit message

When shift == 0, shlVU and shrVU reduce to a memcopy. When z.ptr == x.ptr, it further reduces to a no-op. The pure Go implementation has these optimizations, as of https://go-review.googlesource.com/c/go/+/164967. The arm64 implementation has one of them (see golang#31084 (comment)). We should add both to the amd64 implementation.

cc @griesemer

Fixes golang#31097

Change-Id: I3979d7c82a63e1840c8191636a8947e8f440af3b
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@nightlyone
Contributor

@nightlyone nightlyone commented May 7, 2020

Can this be done in the wrappers/callers instead, so that the per-arch assembly as well as the generic implementation can simply assume this optimization has already been applied?

This also allows SSA to see where these conditions might be constant either now or in the future.

@josharian
Contributor Author

@josharian josharian commented May 7, 2020

Good question. As of this moment there aren’t any pure Go wrappers for these functions; they all go straight to the assembly implementations. Now that we have mid-stack inlining, it might make sense to change that and do optimizations like this in the wrappers, so they can skip the call entirely. Want to experiment and send a CL for 1.16 if appropriate?
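A rough sketch of what such a wrapper could look like follows. Everything here is hypothetical: `shlVUImpl` stands in for the per-architecture assembly routine (written in Go so the example runs), `shlVU` is the small wrapper that an inliner could fold into callers, and the `calls` counter exists only to show when the underlying routine is skipped.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Word stands in for math/big's Word type (an assumption of this sketch).
type Word uint

// calls counts invocations of the stand-in "assembly" routine (demo only).
var calls int

// shlVUImpl is a hypothetical stand-in for the per-arch assembly routine.
// Because the wrapper filters out s == 0, it may assume 0 < s < word size.
// It also assumes len(x) >= len(z).
func shlVUImpl(z, x []Word, s uint) (c Word) {
	calls++
	if len(z) == 0 {
		return 0
	}
	r := bits.UintSize - s
	c = x[len(z)-1] >> r
	for i := len(z) - 1; i > 0; i-- {
		z[i] = x[i]<<s | x[i-1]>>r
	}
	z[0] = x[0] << s
	return c
}

// shlVU is a hypothetical pure Go wrapper. It is small enough that the
// compiler may inline it, letting callers skip the call when s == 0.
func shlVU(z, x []Word, s uint) Word {
	if s == 0 {
		if len(z) > 0 && &z[0] != &x[0] {
			copy(z, x) // shift by zero with distinct buffers: plain copy
		}
		return 0 // nothing is shifted out
	}
	return shlVUImpl(z, x, s)
}

func main() {
	x := []Word{5, 6}
	z := make([]Word, len(x))

	shlVU(z, x, 0)
	fmt.Println(z, calls) // [5 6] 0

	shlVU(z, x, 1)
	fmt.Println(z, calls) // [10 12] 1
}
```

Whether the real wrapper would actually be inlined depends on the compiler's inlining budget, so this is something to measure rather than assume.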
