Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: intrinsify math/bits.RotateLeft{32,64} on all architectures #31265

Open
mundaym opened this issue Apr 4, 2019 · 9 comments
Open

cmd/compile: intrinsify math/bits.RotateLeft{32,64} on all architectures #31265

mundaym opened this issue Apr 4, 2019 · 9 comments

Comments

@mundaym
Copy link
Member

@mundaym mundaym commented Apr 4, 2019

#21536 (comment) reminded me about this. Currently the intrinsic is only implemented on amd64, arm64, ppc64 and s390x. We should enable it everywhere. It will need tests in tests/codegen. Could be a good beginner compiler issue to tackle.

CLs so far:

@mundaym mundaym added this to the Unplanned milestone Apr 4, 2019
@josharian

This comment has been minimized.

Copy link
Contributor

@josharian josharian commented Apr 4, 2019

cc @benshi001 for arm

@bcmills bcmills added the NeedsFix label Apr 12, 2019
@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Apr 28, 2019

Kindly paging @bmkessler who might also be interested in this issue.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Jun 14, 2019

Change https://golang.org/cl/182359 mentions this issue: cmd/compile: intrinsify RotateLeft32 on wasm

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Aug 8, 2019

Change https://golang.org/cl/188697 mentions this issue: cmd/compile: optimize ARM's math.bits.RotateLeft32

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Aug 8, 2019

Change https://golang.org/cl/189277 mentions this issue: cmd/compile: optimize 386's math.bits.RotateLeft32

gopherbot pushed a commit that referenced this issue Aug 28, 2019
This CL optimizes math.bits.RotateLeft32 to inline
"MOVW Rx@>Ry, Rd" on ARM.

The benchmark results of math/bits show some improvements.
name               old time/op  new time/op  delta
RotateLeft-4       9.42ns ± 0%  6.91ns ± 0%  -26.66%  (p=0.000 n=40+33)
RotateLeft8-4      8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+31)
RotateLeft16-4     8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+32)
RotateLeft32-4     8.16ns ± 0%  7.54ns ± 0%   -7.68%  (p=0.000 n=40+40)
RotateLeft64-4     15.7ns ± 0%  15.7ns ± 0%     ~     (all equal)

updates #31265

Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712
Reviewed-on: https://go-review.googlesource.com/c/go/+/188697
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
gopherbot pushed a commit that referenced this issue Aug 31, 2019
wasm has 32-bit versions of all integer operations. This change
lowers RotateLeft32 to i32.rotl on wasm and intrinsifies the math/bits
call.  Benchmarking on amd64 under node.js this is ~25% faster.

node v10.15.3/amd64
name          old time/op  new time/op  delta
RotateLeft    8.37ns ± 1%  8.28ns ± 0%   -1.05%  (p=0.029 n=4+4)
RotateLeft8   11.9ns ± 1%  11.8ns ± 0%     ~     (p=0.167 n=5+5)
RotateLeft16  11.8ns ± 0%  11.8ns ± 0%     ~     (all equal)
RotateLeft32  11.9ns ± 1%   8.7ns ± 0%  -26.32%  (p=0.008 n=5+5)
RotateLeft64  8.31ns ± 1%  8.43ns ± 2%     ~     (p=0.063 n=5+5)

Updates #31265

Change-Id: I5b8e155978faeea536c4f6427ac9564d2f096a46
Reviewed-on: https://go-review.googlesource.com/c/go/+/182359
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
tomocy added a commit to tomocy/go that referenced this issue Sep 1, 2019
This CL optimizes math.bits.RotateLeft32 to inline
"MOVW Rx@>Ry, Rd" on ARM.

The benchmark results of math/bits show some improvements.
name               old time/op  new time/op  delta
RotateLeft-4       9.42ns ± 0%  6.91ns ± 0%  -26.66%  (p=0.000 n=40+33)
RotateLeft8-4      8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+31)
RotateLeft16-4     8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+32)
RotateLeft32-4     8.16ns ± 0%  7.54ns ± 0%   -7.68%  (p=0.000 n=40+40)
RotateLeft64-4     15.7ns ± 0%  15.7ns ± 0%     ~     (all equal)

updates golang#31265

Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712
Reviewed-on: https://go-review.googlesource.com/c/go/+/188697
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
tomocy added a commit to tomocy/go that referenced this issue Sep 1, 2019
wasm has 32-bit versions of all integer operations. This change
lowers RotateLeft32 to i32.rotl on wasm and intrinsifies the math/bits
call.  Benchmarking on amd64 under node.js this is ~25% faster.

node v10.15.3/amd64
name          old time/op  new time/op  delta
RotateLeft    8.37ns ± 1%  8.28ns ± 0%   -1.05%  (p=0.029 n=4+4)
RotateLeft8   11.9ns ± 1%  11.8ns ± 0%     ~     (p=0.167 n=5+5)
RotateLeft16  11.8ns ± 0%  11.8ns ± 0%     ~     (all equal)
RotateLeft32  11.9ns ± 1%   8.7ns ± 0%  -26.32%  (p=0.008 n=5+5)
RotateLeft64  8.31ns ± 1%  8.43ns ± 2%     ~     (p=0.063 n=5+5)

Updates golang#31265

Change-Id: I5b8e155978faeea536c4f6427ac9564d2f096a46
Reviewed-on: https://go-review.googlesource.com/c/go/+/182359
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
t4n6a1ka added a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
This CL optimizes math.bits.RotateLeft32 to inline
"MOVW Rx@>Ry, Rd" on ARM.

The benchmark results of math/bits show some improvements.
name               old time/op  new time/op  delta
RotateLeft-4       9.42ns ± 0%  6.91ns ± 0%  -26.66%  (p=0.000 n=40+33)
RotateLeft8-4      8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+31)
RotateLeft16-4     8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+32)
RotateLeft32-4     8.16ns ± 0%  7.54ns ± 0%   -7.68%  (p=0.000 n=40+40)
RotateLeft64-4     15.7ns ± 0%  15.7ns ± 0%     ~     (all equal)

updates golang#31265

Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712
Reviewed-on: https://go-review.googlesource.com/c/go/+/188697
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
t4n6a1ka added a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
wasm has 32-bit versions of all integer operations. This change
lowers RotateLeft32 to i32.rotl on wasm and intrinsifies the math/bits
call.  Benchmarking on amd64 under node.js this is ~25% faster.

node v10.15.3/amd64
name          old time/op  new time/op  delta
RotateLeft    8.37ns ± 1%  8.28ns ± 0%   -1.05%  (p=0.029 n=4+4)
RotateLeft8   11.9ns ± 1%  11.8ns ± 0%     ~     (p=0.167 n=5+5)
RotateLeft16  11.8ns ± 0%  11.8ns ± 0%     ~     (all equal)
RotateLeft32  11.9ns ± 1%   8.7ns ± 0%  -26.32%  (p=0.008 n=5+5)
RotateLeft64  8.31ns ± 1%  8.43ns ± 2%     ~     (p=0.063 n=5+5)

Updates golang#31265

Change-Id: I5b8e155978faeea536c4f6427ac9564d2f096a46
Reviewed-on: https://go-review.googlesource.com/c/go/+/182359
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Oct 17, 2019

Alright, and doing a checkup here on the statuses of the various GOARCHes

GOARCH CLs Done
386 89277
amd64 132435
arm 188697
arm64 122542
mips
mips64
mips64le
mipsle
ppc64 163760
ppc64le 163760
s390x 133035
wasm 182359

From the table above, we are currently lacking for the GOARCH variants of mips* and ppc64le.
Kindly pinging you @ceseo, might you have some bandwidth to work on a CL for ppc64le? @cherrymui @randall77 might y'all be available for mips* or have suggestions of other experts? I think we can get this in for Go1.14 :) And thank you everyone who has submitted CLs and reviewed or discussed getting CLs to solve this, great stuff!

@cherrymui

This comment has been minimized.

Copy link
Contributor

@cherrymui cherrymui commented Oct 17, 2019

CL 163760 should cover both ppc64 and ppc64le, right?

On MIPS, the rotate instruction is rather new. We still support older machines. I guess we could do something based on GOMIPS(64) value, but we need to introduce the values first. Two shifts + an OR doesn't sounds too bad.

@laboger

This comment has been minimized.

Copy link
Contributor

@laboger laboger commented Oct 17, 2019

@mengzhuo

This comment has been minimized.

Copy link
Contributor

@mengzhuo mengzhuo commented Oct 17, 2019

CL 163760 should cover both ppc64 and ppc64le, right?

On MIPS, the rotate instruction is rather new. We still support older machines. I guess we could do something based on GOMIPS(64) value, but we need to introduce the values first. Two shifts + an OR doesn't sounds too bad.

The minimal requirement of MIPS64X is MIPS III, which is introduced in 1991 according to Wikipedia.

I'm thinking update Go minimal requirement of MIPS64X to MIPS64 R2 (not MIPS II) as MIPS32X do.

And we can adding compile flag

GOMIPS64=r*

r2 = release 2
r6 = release 6

For the future MIPS64 r6, which is whole new instruction set.

We already had GOMIPS64=hardfloat and won't conflict with r* flag by separate them with comma, i.e. GOMIPS64=hardfloat,r2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.