New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: generate MOVZX instead of AND where applicable #15105

Open
martisch opened this Issue Apr 4, 2016 · 6 comments

Comments

Projects
None yet
6 participants
@martisch
Member

martisch commented Apr 4, 2016

I noticed that for uint&0xFF go tip on amd64 generates e.g.:

0x001d 00029 (main.go:10) MOVQ "".u+8(FP), DX
0x0022 00034 (main.go:10) ANDQ $255, DX

http://play.golang.org/p/CKhi_nt615

It might be possible to make these a zero latency MOV that does not require an ALU by using MOVZX for the applicable AND values or it might be possible to change the previous load to be MOVZX.

0x0022 00034 (main.go:10) MOVBQZX DL, DX

See "Intel 64 and IA-32 Architectures Optimization Reference Manual" Section: "3.5.1.13 Zero-Latency MOV Instructions"

@brtzsnr

This comment has been minimized.

Contributor

brtzsnr commented Apr 4, 2016

MOVQ "".u+8(FP), DX is a LoadReg of the argument into the DX register. I don't think we can remove this anytime soon.

@gopherbot

This comment has been minimized.

gopherbot commented Apr 4, 2016

CL https://golang.org/cl/21508 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 6, 2016

cmd/compile: replaces ANDQ with MOV?ZX
Where possible replace ANDQ with MOV?ZX.
Takes care that we don't regress wrt bounds checking,
for example [1000]int{}[i&255].

According to "Intel 64 and IA-32 Architectures Optimization Reference
Manual" Section: "3.5.1.13 Zero-Latency MOV Instructions"
MOV?ZX instructions have zero latency on newer processors.

Updates #15105

Change-Id: I63539fdbc5812d5563aa1ebc49eca035bd307997
Reviewed-on: https://go-review.googlesource.com/21508
Reviewed-by: Айнар Гарипов <gugl.zadolbal@gmail.com>
Reviewed-by: David Chase <drchase@google.com>

@bradfitz bradfitz added the Performance label Apr 7, 2016

@bradfitz bradfitz added this to the Unplanned milestone Apr 7, 2016

@randall77

This comment has been minimized.

Contributor

randall77 commented Aug 30, 2016

func f(x uint) uint {
    return x & 0xff
}
func g(x *uint) uint {
    return *x & 0xff
}

code from f:

    0x0000 00000 (/usr/local/google/home/khr/go/tmp1.go:4)  MOVQ    "".x+8(FP), AX
    0x0005 00005 (/usr/local/google/home/khr/go/tmp1.go:4)  MOVBLZX AL, AX

code from g:

    0x0000 00000 (/usr/local/google/home/khr/go/tmp1.go:7)  MOVQ    "".x+8(FP), AX
    0x0005 00005 (/usr/local/google/home/khr/go/tmp1.go:7)  MOVQ    (AX), CX
    0x0008 00008 (/usr/local/google/home/khr/go/tmp1.go:7)  MOVBLZX CL, CX

So we are now rewriting the AND to a MOVZX.

Alexandru is right, combining the extension with the load in f is harder because the MOVQ is generated from a LoadReg which doesn't appear in the SSA intermediate representation until regalloc. For g we could combine the extension and load. I'll send out a CL. It won't handle all cases, but the common case like in g it should be simple. (The harder cases are indexed loads and such where we would need lots more SSA opcodes to cover all the addressing modes.)

@josharian

This comment has been minimized.

Contributor

josharian commented Aug 30, 2016

The LoadReg case reminds me a lot of #15300.

I know you have serious reservations about a post-regalloc peep pass, Keith, but it does seem like it might be the right way to handle this small, specific class of optimizations. And perhaps with docs and code review we could keep it from growing past the bare minimum necessary.

@gopherbot

This comment has been minimized.

gopherbot commented Aug 30, 2016

CL https://golang.org/cl/28190 mentions this issue.

@gopherbot

This comment has been minimized.

gopherbot commented Oct 22, 2016

CL https://golang.org/cl/31639 mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 24, 2016

cmd/compile: replace ANDL with MOV?ZX
According to "Intel 64 and IA-32 Architectures Optimization Reference
Manual" Section: "3.5.1.13 Zero-Latency MOV Instructions"
MOV?ZX instructions have zero latency on newer processors.

during make.bash:
(ANDLconst [0xFF] x) -> (MOVBQZX x)
applies 422 times
(ANDLconst [0xFFFF] x) -> (MOVWQZX x)
applies 114 times

Updates #15105

Change-Id: I10933af599de3c26449c52f4b5cd859331028f39
Reviewed-on: https://go-review.googlesource.com/31639
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>

gopherbot pushed a commit that referenced this issue Oct 27, 2016

cmd/compile: combine some extensions with loads
For cases where we already have the ops, combine
sign or zero extension with the previous load
(even if the load is larger width).

Update #15105

Change-Id: I76c5ddd69e1f900d2a17d35503083bd3b4978e48
Reviewed-on: https://go-review.googlesource.com/28190
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment