cmd/compile: eliminate redundant zeroing after lower pass #47107

Open
tdakkota opened this issue Jul 9, 2021 · 6 comments
Comments

tdakkota commented Jul 9, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16.4 windows/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GO111MODULE=on
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\tdakkota\AppData\Local\go-build
set GOENV=C:\Users\tdakkota\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=C:\Users\tdakkota\go\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=C:\Users\tdakkota\go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=C:\Go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.16.4
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=C:\Users\tdakkota\GolandProjects\gh\abc\go.mod
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0

What did you do?

https://go.godbolt.org/z/64qsjGTcd

package foo

func foo(v uint64) (b [8]byte) {
	b[0] = byte(v)
	b[1] = byte(v >> 8)
	b[2] = byte(v >> 16)
	b[3] = byte(v >> 24)
	b[4] = byte(v >> 32)
	b[5] = byte(v >> 40)
	b[6] = byte(v >> 48)
	b[7] = byte(v >> 56)
	return b
}
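
For reference, this hand-written sequence is the same byte-shift pattern that binary.LittleEndian.PutUint64 uses internally, so the stdlib form hits the same lowering rules (a minimal sketch; fooStdlib is an illustrative name):

package foo

import "encoding/binary"

// fooStdlib is the stdlib equivalent of foo above. PutUint64 performs
// the same eight byte-shift stores, which the compiler combines into a
// single 8-byte store.
func fooStdlib(v uint64) (b [8]byte) {
	binary.LittleEndian.PutUint64(b[:], v)
	return b
}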

What did you expect to see?

The compiler should generate:

        MOVQ    "".v+8(SP), AX
        MOVQ    AX, "".b+16(SP)

What did you see instead?

The compiler generates:

        MOVQ    $0, "".b+16(SP) // redundant zeroing
        MOVQ    "".v+8(SP), AX
        MOVQ    AX, "".b+16(SP)

According to the SSA dump and the rewrite rules, the one-byte shift-and-store operations are folded into a single MOVQ during the lower pass, so the zeroing cannot be eliminated during the opt passes. The relevant lowering rules:

(MOVLstore [i] {s} p (SHRQconst [32] w) x:(MOVLstore [i-4] {s} p w mem))
&& x.Uses == 1
&& clobber(x)
=> (MOVQstore [i-4] {s} p w mem)
(MOVLstore [i] {s} p (SHRQconst [j] w) x:(MOVLstore [i-4] {s} p w0:(SHRQconst [j-32] w) mem))
&& x.Uses == 1
&& clobber(x)
=> (MOVQstore [i-4] {s} p w0 mem)
(MOVLstore [i] {s} p1 (SHRQconst [32] w) x:(MOVLstore [i] {s} p0 w mem))
&& x.Uses == 1
&& sequentialAddresses(p0, p1, 4)
&& clobber(x)
=> (MOVQstore [i] {s} p0 w mem)
(MOVLstore [i] {s} p1 (SHRQconst [j] w) x:(MOVLstore [i] {s} p0 w0:(SHRQconst [j-32] w) mem))
&& x.Uses == 1
&& sequentialAddresses(p0, p1, 4)
&& clobber(x)
=> (MOVQstore [i] {s} p0 w0 mem)
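
These are the 4-byte-to-8-byte step; analogous MOVBstore-to-MOVWstore and MOVWstore-to-MOVLstore rules combine the narrower stores first, so the eight byte stores collapse pairwise into the single MOVQstore. The fold can be reproduced by dumping the final assembly, or the per-pass SSA, for the file above (assuming it is saved as foo.go; on Windows set GOSSAFUNC as an environment variable first):

        go tool compile -S foo.go

        GOSSAFUNC=foo go tool compile foo.go    # writes ssa.html showing every pass, including lower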

tdakkota changed the title cmd/compile: elimate redundant zeroing after lower pass → cmd/compile: eliminate redundant zeroing after lower pass Jul 9, 2021
mknyszek added this to the Backlog milestone Jul 9, 2021
mknyszek (Contributor) commented Jul 9, 2021

randall77 (Contributor) commented Jul 9, 2021

The zeroing could still be detected as unnecessary in the current dead store pass before the writes are folded into one MOVQ.
I think that pass is just confused by the LocalAddr reading the memory state between the byte writes. Maybe that's all that needs fixing.

mengzhuo (Contributor) commented Jul 14, 2021

I don't think dse can elide this.
dse uses a shadowed set and only kills a store when the shadow covering it is at least as large as the store. Each b[i] = byte(a >> x) shadows only one byte, while eliding the Zero of the [8]byte would need all 8 bytes to be in shadow.
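
Roughly, the check behaves like this (a simplified, illustrative sketch of the shadow-size rule, not the actual cmd/compile/internal/ssa/deadstore.go code):

package main

import "fmt"

// store models a write of size bytes at offset off within one object.
type store struct {
	off, size int64
}

// deadStores walks stores from last to first, as dse does, and marks a
// store dead only when a later store already shadows at least as many
// bytes at the same offset.
func deadStores(stores []store) []bool {
	dead := make([]bool, len(stores))
	shadowed := map[int64]int64{} // offset -> shadowed size
	for i := len(stores) - 1; i >= 0; i-- {
		s := stores[i]
		if sz, ok := shadowed[s.off]; ok && sz >= s.size {
			dead[i] = true
			continue
		}
		if shadowed[s.off] < s.size {
			shadowed[s.off] = s.size
		}
	}
	return dead
}

func main() {
	// The Zero of [8]byte followed by the eight one-byte stores from foo.
	stores := []store{
		{0, 8},
		{0, 1}, {1, 1}, {2, 1}, {3, 1},
		{4, 1}, {5, 1}, {6, 1}, {7, 1},
	}
	// Prints all false: only 1 byte is shadowed at offset 0, so the
	// 8-byte zeroing is never considered dead.
	fmt.Println(deadStores(stores))
}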

mengzhuo (Contributor) commented Jul 14, 2021

I tried this rule, which elides the redundant zero store.
@randall77 is this OK with you, and can I submit a CL for this issue?

(MOVQstore [0] {sym2} destptr2 val a:(MOVQstoreconst [makeValAndOff(0,0)] {sym1} destptr1 mem))
        && sym2 == sym1
        && destptr2 == destptr1
        => (MOVQstore [0] {sym2} destptr2 val mem)
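
(The rule drops the zeroing MOVQstoreconst from the memory chain whenever a full-width MOVQstore to the same address and symbol immediately follows it, so the zero write is never emitted.)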

randall77 (Contributor) commented Jul 14, 2021

I don't think dse can elide this.
dse uses a shadowed set and only kills a store when the shadow covering it is at least as large as the store.

Yes, that's the current problem. It's not combining the ranges of the individual byte writes, so when it gets to the 8-byte Zero, the Zero is larger than anything in the shadow set. We need a more robust computation of what is shadowed; we need to combine byte writes into larger regions.

v10 (4) = LocalAddr <*[8]byte> {b} v2 v8
v13 (4) = OffPtr <*byte> [0] v10
v76 (4) = Store <mem> {byte} v13 v9 v8
v18 (5) = LocalAddr <*[8]byte> {b} v2 v76
v20 (5) = OffPtr <*byte> [1] v18
v75 (5) = Store <mem> {byte} v20 v17 v76

This should shadow 2 bytes at v10/v18. Currently dse treats those LocalAddrs as reads, which prevents combination. Then we'll need to implement combination as well.
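
The combining step could be an ordinary interval merge over shadowed byte ranges within an object (a rough illustrative sketch, not actual compiler code):

package main

import (
	"fmt"
	"sort"
)

// interval is a shadowed byte range [lo, hi) within one object.
type interval struct{ lo, hi int64 }

// merge coalesces overlapping or adjacent shadowed ranges, so eight
// one-byte stores at offsets 0..7 become the single range [0, 8).
func merge(ivs []interval) []interval {
	sort.Slice(ivs, func(i, j int) bool { return ivs[i].lo < ivs[j].lo })
	var out []interval
	for _, iv := range ivs {
		if n := len(out); n > 0 && iv.lo <= out[n-1].hi {
			if iv.hi > out[n-1].hi {
				out[n-1].hi = iv.hi
			}
			continue
		}
		out = append(out, iv)
	}
	return out
}

func main() {
	// The eight byte stores from foo, each shadowing one byte of b.
	var ivs []interval
	for off := int64(0); off < 8; off++ {
		ivs = append(ivs, interval{off, off + 1})
	}
	fmt.Println(merge(ivs)) // prints [{0 8}]: the whole [8]byte, covering the Zero
}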

@randall77 is this OK with you, and can I submit a CL for this issue?

I'd really rather fix dse. Your rule works only for 8-byte constant stores, and only for amd64. We'd need a lot more rules to fix it for all cases.
