Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: double zeroing and unnecessary copying/stack use #67957

Closed
dominikh opened this issue Jun 12, 2024 · 10 comments
Closed

cmd/compile: double zeroing and unnecessary copying/stack use #67957

dominikh opened this issue Jun 12, 2024 · 10 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@dominikh
Copy link
Member

Go version

go version devel go1.23-722d59436b Thu May 16 02:00:26 2024 +0000 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/dominikh/.cache/go-build'
GOENV='/home/dominikh/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/dominikh/prj/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/dominikh/prj'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/dominikh/prj/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/dominikh/prj/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='devel go1.23-722d59436b Thu May 16 02:00:26 2024 +0000'
GODEBUG=''
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/dominikh/prj/src/example.com/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1152336647=/tmp/go-build -gno-record-gcc-switches'

What did you do?

go build -gcflags=-S the code at https://play.golang.org/p/tt0T6oOsQvr.go and look at the assembly generated for main

What did you see happen?

  • the same memory gets zeroed twice
  • we write to the stack only to copy from it to the global
main.main STEXT nosplit size=195 args=0x0 locals=0x50 funcid=0x0 align=0x0
        0x0000 00000 (foo.go:47)     TEXT    main.main(SB), NOSPLIT|ABIInternal, $80-0
        0x0000 00000 (foo.go:47)     PUSHQ   BP
        0x0001 00001 (foo.go:47)     MOVQ    SP, BP
        0x0004 00004 (foo.go:47)     SUBQ    $72, SP
        0x0008 00008 (foo.go:47)     FUNCDATA        $0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
        0x0008 00008 (foo.go:47)     FUNCDATA        $1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
        0x0008 00008 (foo.go:48)     MOVUPS  X15, main.~r0(SP)
        0x000d 00013 (foo.go:48)     MOVUPS  X15, main.~r0+8(SP)
        0x0013 00019 (foo.go:48)     MOVUPS  X15, main.~r0+24(SP)
        0x0019 00025 (foo.go:48)     MOVUPS  X15, main.~r0+40(SP)
        0x001f 00031 (foo.go:48)     MOVUPS  X15, main.~r0+56(SP)
        0x0025 00037 (<unknown line number>)    NOP
        0x0025 00037 (foo.go:48)     MOVUPS  X15, main.~r0(SP)
        0x002a 00042 (foo.go:48)     MOVUPS  X15, main.~r0+8(SP)
        0x0030 00048 (foo.go:48)     MOVUPS  X15, main.~r0+24(SP)
        0x0036 00054 (foo.go:48)     MOVUPS  X15, main.~r0+40(SP)
        0x003c 00060 (foo.go:48)     MOVUPS  X15, main.~r0+56(SP)
        0x0042 00066 (foo.go:37)     MOVQ    $1, main.~r0(SP)
        0x004a 00074 (foo.go:37)     MOVSD   $f64.4000000000000000(SB), X0
        0x0052 00082 (foo.go:37)     MOVSD   X0, main.~r0+8(SP)
        0x0058 00088 (foo.go:37)     MOVSD   $f64.4008000000000000(SB), X0
        0x0060 00096 (foo.go:37)     MOVSD   X0, main.~r0+16(SP)
        0x0066 00102 (foo.go:37)     MOVSD   $f64.4010000000000000(SB), X0
        0x006e 00110 (foo.go:37)     MOVSD   X0, main.~r0+24(SP)
        0x0074 00116 (foo.go:37)     MOVSD   $f64.4014000000000000(SB), X0
        0x007c 00124 (foo.go:37)     MOVSD   X0, main.~r0+32(SP)
        0x0082 00130 (foo.go:48)     MOVQ    main.~r0(SP), AX
        0x0086 00134 (foo.go:48)     MOVQ    AX, main.Sink(SB)
        0x008d 00141 (foo.go:48)     MOVUPS  main.~r0+8(SP), X0
        0x0092 00146 (foo.go:48)     MOVUPS  X0, main.Sink+8(SB)
        0x0099 00153 (foo.go:48)     MOVUPS  main.~r0+24(SP), X0
        0x009e 00158 (foo.go:48)     MOVUPS  X0, main.Sink+24(SB)
        0x00a5 00165 (foo.go:48)     MOVUPS  main.~r0+40(SP), X0
        0x00aa 00170 (foo.go:48)     MOVUPS  X0, main.Sink+40(SB)
        0x00b1 00177 (foo.go:48)     MOVUPS  main.~r0+56(SP), X0
        0x00b6 00182 (foo.go:48)     MOVUPS  X0, main.Sink+56(SB)
        0x00bd 00189 (foo.go:49)     ADDQ    $72, SP
        0x00c1 00193 (foo.go:49)     POPQ    BP
        0x00c2 00194 (foo.go:49)     RET

What did you expect to see?

At a minimum, we shouldn't zero the same memory twice. Ideally, we'd zero Sink and store constants to it, not using any stack.

/cc @golang/compiler

@dominikh dominikh added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 12, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jun 12, 2024
@dominikh dominikh changed the title cmd/compile: double zeroeing and unnecessary copying/stack use cmd/compile: double zeroing and unnecessary copying/stack use Jun 12, 2024
@gabyhelp
Copy link

@randall77
Copy link
Contributor

Hmm, I do not get the same generated code as you do. At 722d594, right?
In many ways, the generated code I'm seeing is worse.
The extra zeroing is not there, so that's good.
But the type switch in main has not been constant-propagated away, which yours apparently has. As a result my main is quite a bit longer than the result you showed (522 bytes, vs your code which is only 195 bytes).

Could you double-check the code you pasted and the Go stdlib git commit hash?

@dominikh
Copy link
Member Author

I've double-checked (and rebuild). I'm definitely on 722d594, with a clean working directory. And I can reproduce it by downloading the code I shared.

I've also tried with a newer master, ca5ba14, and am getting the same output. GODEBUG, GOEXPERIMENT, and GOAMD64 are all unset.

chulak ~ ^$ mkdir meow2
chulak ~ ^$ cd meow2
chulak ~/meow2 ^$ go version
go version devel go1.23-ca5ba146da Wed Jun 12 20:40:04 2024 +0000 linux/amd64
chulak ~/meow2 ^$ wget -q https://play.golang.org/p/tt0T6oOsQvr.go  
chulak ~/meow2 ^$ go build -gcflags=-S tt0T6oOsQvr.go |& head -40
# command-line-arguments
main.main STEXT nosplit size=195 args=0x0 locals=0x50 funcid=0x0 align=0x0
	0x0000 00000 (/home/dominikh/meow2/tt0T6oOsQvr.go:47)	TEXT	main.main(SB), NOSPLIT|ABIInternal, $80-0
	0x0000 00000 (/home/dominikh/meow2/tt0T6oOsQvr.go:47)	PUSHQ	BP
	0x0001 00001 (/home/dominikh/meow2/tt0T6oOsQvr.go:47)	MOVQ	SP, BP
	0x0004 00004 (/home/dominikh/meow2/tt0T6oOsQvr.go:47)	SUBQ	$72, SP
	0x0008 00008 (/home/dominikh/meow2/tt0T6oOsQvr.go:47)	FUNCDATA	$0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
	0x0008 00008 (/home/dominikh/meow2/tt0T6oOsQvr.go:47)	FUNCDATA	$1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
	0x0008 00008 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X15, main.~r0(SP)
	0x000d 00013 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X15, main.~r0+8(SP)
	0x0013 00019 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X15, main.~r0+24(SP)
	0x0019 00025 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X15, main.~r0+40(SP)
	0x001f 00031 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X15, main.~r0+56(SP)
	0x0025 00037 (<unknown line number>)	NOP
	0x0025 00037 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X15, main.~r0(SP)
	0x002a 00042 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X15, main.~r0+8(SP)
	0x0030 00048 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X15, main.~r0+24(SP)
	0x0036 00054 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X15, main.~r0+40(SP)
	0x003c 00060 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X15, main.~r0+56(SP)
	0x0042 00066 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)	MOVQ	$1, main.~r0(SP)
	0x004a 00074 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)	MOVSD	$f64.4000000000000000(SB), X0
	0x0052 00082 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)	MOVSD	X0, main.~r0+8(SP)
	0x0058 00088 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)	MOVSD	$f64.4008000000000000(SB), X0
	0x0060 00096 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)	MOVSD	X0, main.~r0+16(SP)
	0x0066 00102 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)	MOVSD	$f64.4010000000000000(SB), X0
	0x006e 00110 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)	MOVSD	X0, main.~r0+24(SP)
	0x0074 00116 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)	MOVSD	$f64.4014000000000000(SB), X0
	0x007c 00124 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)	MOVSD	X0, main.~r0+32(SP)
	0x0082 00130 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVQ	main.~r0(SP), AX
	0x0086 00134 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVQ	AX, main.Sink(SB)
	0x008d 00141 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	main.~r0+8(SP), X0
	0x0092 00146 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X0, main.Sink+8(SB)
	0x0099 00153 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	main.~r0+24(SP), X0
	0x009e 00158 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X0, main.Sink+24(SB)
	0x00a5 00165 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	main.~r0+40(SP), X0
	0x00aa 00170 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X0, main.Sink+40(SB)
	0x00b1 00177 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	main.~r0+56(SP), X0
	0x00b6 00182 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)	MOVUPS	X0, main.Sink+56(SB)
	0x00bd 00189 (/home/dominikh/meow2/tt0T6oOsQvr.go:49)	ADDQ	$72, SP
	0x00c1 00193 (/home/dominikh/meow2/tt0T6oOsQvr.go:49)	POPQ	BP

@randall77
Copy link
Contributor

Ah, for some reason go build -gcflags=-S and go tool compile -S are resulting in different assembly output.
I will look further later.

@dominikh
Copy link
Member Author

Passing -p main (or any other value for -p) to go tool compile seems to produce output identical to go build.

@randall77
Copy link
Contributor

Ok, a bunch of things here.

  1. -p main matters because somehow that makes the dictionary entry constant folding happen, whereas without that the compiler doesn't realize the type switch is deadcodeable. Annoying, but might not be worth fixing as it only applies to raw tool compile use.
  2. The double zeroing is just the dead store elim pass being confused by an inline mark. Will fix.
  3. The storing of float constants could be better on amd64, also have a small CL for that.
  4. The reason why we materialize on the stack and then copy is somewhat subtle. Consider this code
type T struct{ a, b, c, d, e int }

func f(t *T, z int) {
	*t = T{1, z, z, z, z}
}

This code makes a temporary T on the stack, initializes it, then copies it to the destination.
Replace the 1 with a 0, and it writes everything to the destination directly.

The reason this happens is that we have rules (starting at cmd/compile/internal/ssa/_gen/generic.rules, line 2397) that break a Move down into individual stores if the source involved 4 or fewer stores. When the first field is a 0, its store gets eliminated because it is storing to a freshly zeroed object. So there are only 4 stores remaining and the rule kicks in. When the first field is a 1, there are 5 stores and the Move persists. When the Move is broken down into individual stores then further optimizations (store->load forwarding, I think) eventually eliminate the temporary altogether.

I don't see anything obvious to fix for number 4. We could change the constant, sure, but the rule description says, and I mostly agree, that we don't want to do this for arbitrary sized objects because we'd end up having to spill/restore all its individual components anyway.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/592596 mentions this issue: cmd/compile: store constant floats using integer constants

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/592615 mentions this issue: cmd/compile: don't treat an InlMark as a read during deadstore

@dominikh
Copy link
Member Author

Regarding 4 and "we'd end up having to spill/restore all its individual components anyway", maybe the count of stores shouldn't include constants being stored?

@randall77
Copy link
Contributor

@dominikh Sure, that would be an improvement. Maybe difficult to do though given how the rules are currently structured.

@randall77 randall77 modified the milestone: Go1.24 Jun 26, 2024
@randall77 randall77 self-assigned this Jun 26, 2024
gopherbot pushed a commit that referenced this issue Jul 23, 2024
x86 is better at storing constant ints than constant floats.
(It uses a constant directly in the instruction stream, instead of
loading it from a constant global memory.)

Noticed as part of #67957

Change-Id: I9b7b586ad8e0fe9ce245324f020e9526f82b209d
Reviewed-on: https://go-review.googlesource.com/c/go/+/592596
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Development

No branches or pull requests

4 participants