cmd/compile: don't generate newobject call for 0-sized types #29446

Closed
quasilyte opened this issue Dec 28, 2018 · 10 comments

@quasilyte
Contributor

commented Dec 28, 2018

Sometimes the compiler generates a runtime.newobject(t) call where the size of t is statically known to be 0.

That call would return &runtime.zerobase:

go/src/runtime/malloc.go

Lines 809 to 816 in c043fc4

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	if gcphase == _GCmarktermination {
		throw("mallocgc called with gcphase == _GCmarktermination")
	}
	if size == 0 {
		return unsafe.Pointer(&zerobase)
	}

While the new(zeroSizedType) case is not very interesting, empty slice literals also emit a call to newobject (see below).

Instead of generating a runtime.newobject call, the compiler could emit the returned expression, &runtime.zerobase, directly.
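As a rough illustration of why the size is known statically, the sketch below uses unsafe.Sizeof, which yields a compile-time constant for these types. The names emptyStructSize, emptyArraySize, newEmptyStruct and newEmptySlice are made up for this example; [0]int is the backing array type that shows up as type.[0]int in the generated code further down. The sketch only checks observable properties; it does not show how the compiler performs the rewrite.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Both sizes are constant expressions, so they are known at compile time.
// The names are hypothetical and exist only for this illustration.
const (
	emptyStructSize = unsafe.Sizeof(struct{}{})
	emptyArraySize  = unsafe.Sizeof([0]int{})
)

// newEmptyStruct allocates a zero-sized value with new.
func newEmptyStruct() *struct{} { return new(struct{}) }

// newEmptySlice builds an empty slice literal; its backing array has type [0]int.
func newEmptySlice() []int { return []int{} }

func main() {
	fmt.Println(emptyStructSize, emptyArraySize) // 0 0

	p := newEmptyStruct()
	s := newEmptySlice()
	// Both results are non-nil even though no memory is needed for them.
	fmt.Println(p != nil, s != nil, len(s), cap(s)) // true true 0 0
}
```

Note that the Go specification does not promise that pointers to distinct zero-size variables are equal or unequal, so the example deliberately avoids comparing such pointers.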

The performance impact can be measured with this simple benchmark:

package benchmark

import (
	"testing"
)

var sinkStruct *struct{}
var sinkSlice []int

func BenchmarkNew(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sinkStruct = new(struct{})
	}
}

func BenchmarkSliceLit(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sinkSlice = []int{}
	}
}

name        old time/op    new time/op    delta
New-8         8.39ns ± 0%    1.29ns ± 6%  -84.59%  (p=0.000 n=9+10)
SliceLit-8    8.80ns ± 0%    1.88ns ± 0%  -78.63%  (p=0.000 n=9+9)

The impact on code size is also positive: in the listings below, newSlice shrinks from 80 to 21 bytes.

func newSlice() []int { return []int{} }

Old generated code for newSlice (amd64/linux):

"".newSlice STEXT size=80 args=0x18 locals=0x18
	0x0000 00000 (foo.go:11)	TEXT	"".newSlice(SB), ABIInternal, $24-24
	0x0000 00000 (foo.go:11)	MOVQ	(TLS), CX
	0x0009 00009 (foo.go:11)	CMPQ	SP, 16(CX)
	0x000d 00013 (foo.go:11)	JLS	73
	0x000f 00015 (foo.go:11)	SUBQ	$24, SP
	0x0013 00019 (foo.go:11)	MOVQ	BP, 16(SP)
	0x0018 00024 (foo.go:11)	LEAQ	16(SP), BP
	0x001d 00029 (foo.go:11)	FUNCDATA	$0, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
	0x001d 00029 (foo.go:11)	FUNCDATA	$1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
	0x001d 00029 (foo.go:11)	FUNCDATA	$3, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
	0x001d 00029 (foo.go:12)	PCDATA	$2, $1
	0x001d 00029 (foo.go:12)	PCDATA	$0, $0
	0x001d 00029 (foo.go:12)	LEAQ	type.[0]int(SB), AX
	0x0024 00036 (foo.go:12)	PCDATA	$2, $0
	0x0024 00036 (foo.go:12)	MOVQ	AX, (SP)
	0x0028 00040 (foo.go:12)	CALL	runtime.newobject(SB)
	0x002d 00045 (foo.go:12)	PCDATA	$2, $1
	0x002d 00045 (foo.go:12)	MOVQ	8(SP), AX
	0x0032 00050 (foo.go:12)	PCDATA	$2, $0
	0x0032 00050 (foo.go:12)	PCDATA	$0, $1
	0x0032 00050 (foo.go:12)	MOVQ	AX, "".~r0+32(SP)
	0x0037 00055 (foo.go:12)	XORPS	X0, X0
	0x003a 00058 (foo.go:12)	MOVUPS	X0, "".~r0+40(SP)
	0x003f 00063 (foo.go:12)	MOVQ	16(SP), BP
	0x0044 00068 (foo.go:12)	ADDQ	$24, SP
	0x0048 00072 (foo.go:12)	RET
	0x0049 00073 (foo.go:12)	NOP
	0x0049 00073 (foo.go:11)	PCDATA	$0, $-1
	0x0049 00073 (foo.go:11)	PCDATA	$2, $-1
	0x0049 00073 (foo.go:11)	CALL	runtime.morestack_noctxt(SB)
	0x004e 00078 (foo.go:11)	JMP	0

New generated code for newSlice:

"".newSlice STEXT nosplit size=21 args=0x18 locals=0x0
	0x0000 00000 (foo.go:10)	TEXT	"".newSlice(SB), NOSPLIT|ABIInternal, $0-24
	0x0000 00000 (foo.go:10)	FUNCDATA	$0, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
	0x0000 00000 (foo.go:10)	FUNCDATA	$1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
	0x0000 00000 (foo.go:10)	FUNCDATA	$3, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
	0x0000 00000 (foo.go:11)	PCDATA	$2, $1
	0x0000 00000 (foo.go:11)	PCDATA	$0, $1
	0x0000 00000 (foo.go:11)	LEAQ	runtime.zerobase(SB), AX
	0x0007 00007 (foo.go:11)	PCDATA	$2, $0
	0x0007 00007 (foo.go:11)	MOVQ	AX, "".~r0+8(SP)
	0x000c 00012 (foo.go:11)	XORPS	X0, X0
	0x000f 00015 (foo.go:11)	MOVUPS	X0, "".~r0+16(SP)
	0x0014 00020 (foo.go:11)	RET

The important part is that the call to runtime.newobject(SB) is gone; as a result, the function also becomes NOSPLIT and needs no stack frame.

I'll send a CL with that optimization applied.

@gopherbot


commented Dec 28, 2018

Change https://golang.org/cl/155840 mentions this issue: cmd/compile: don't generate newobject call for 0-sized types

@mvdan

Member

commented Dec 28, 2018

What's the size impact on a large Go binary like cmd/go? Please include that little stat in the commit message too.

@mvdan added the Performance label Dec 28, 2018

@quasilyte

Contributor Author

commented Dec 28, 2018

@mvdan, for cmd/go there is no change in code size.
The optimization does trigger several times during compilation, but it doesn't seem to have an effect on that particular binary.

@mvdan

Member

commented Dec 28, 2018

Huh, I'd expect this to happen often and to shave off at least a few kilobytes from most large binaries. Did you check if the compiled binary changes at all?

@quasilyte

Contributor Author

commented Dec 28, 2018

cmd/go does not have any diff.
cmd/gofmt, however, does have some.

bincmp gofmt_old gofmt
binary  delta  old      new
gofmt   0      3467274  3467274        0.00%

symbol name           delta  old     new
bytes.Join            -14    971     957          -1.44%
os.runtime_args       -2     249     247          -0.80%
runtime.pclntab       -32    805225  805193       -0.00%
runtime.typelink      -8     4704    4696         -0.17%
sync.poolCleanup      -7     545     538          -1.28%
syscall.runtime_envs  -2     249     247          -0.80%
total                 -65    811943  811878       -0.01%

name            delta  old      new
.gopclntab      -32    805225   805193        -0.00%
.rodata         -224   489886   489662        -0.05%
.text           -32    1073489  1073457       -0.00%
.typelink       -8     4704     4696          -0.17%
.zdebug_frame   -34    41610    41576         -0.08%
.zdebug_info    -52    295260   295208        -0.02%
.zdebug_line    3      148607   148610         0.00%
.zdebug_loc     -21    162363   162342        -0.01%
.zdebug_ranges  14     53946    53960          0.03%
total           -386   3075090  3074704       -0.01%

@josharian

Contributor

commented Dec 29, 2018

I have a handful of newobject changes, including this one, in my tree. I didn’t mail this one because it basically never triggers. Others include specializing newobject in various ways. I didn’t mail the rest because I think we should move newobject to SSA construction world first.

@josharian

Contributor

commented Dec 30, 2018

To be clear, I’m game to see the optimization go in. But I do think we should move that code before it gets too complex.

I peeked at the other things I was playing with. One was a specialized newstring, which doesn’t need a typ arg. (Use SoleComponent for best effect.) Another was for newobject for SSA-able types containing no pointers. In that case you can allocate without zeroing and then zero on the caller side, in the hope that that zeroing will be optimized away in favor of later writes. Just in case you wanted to see either of those through. :)

One minor complication is that newobject is treated as special throughout SSA world.
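For illustration only, here is the kind of code the pointer-free idea above would target. The point type and the newPoint function are hypothetical and do not come from any of the changes mentioned in this thread. Every field of the fresh allocation is overwritten before it is read, so zeroing it first is redundant work that caller-side zeroing could, in principle, let the compiler drop.

```go
package main

import "fmt"

// point is a hypothetical pointer-free type used only for this illustration.
type point struct {
	x, y int
}

// newPoint allocates a point and immediately overwrites every field.
// Any zeroing done as part of new(point) is dead work here, because
// both x and y are written before the value is ever read.
func newPoint(x, y int) *point {
	p := new(point)
	p.x = x
	p.y = y
	return p
}

func main() {
	p := newPoint(3, 4)
	fmt.Println(p.x, p.y) // 3 4
}
```

Whether new(point) turns into a runtime.newobject call at all depends on inlining and escape analysis, so treat this as a sketch of the pattern rather than a guaranteed trigger.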

@mvdan

Member

commented Dec 30, 2018

> I didn’t mail this one because it basically never triggers.

I'm a bit confused. I would imagine that statements like sinkSlice = []int{} in the benchmark above would be very common.

Also, if this optimization basically never triggers, how come gofmt got a bit smaller?

@quasilyte

Contributor Author

commented Dec 30, 2018

> Also, if this optimization basically never triggers, how come gofmt got a bit smaller?

It triggers for empty slice literals as well as for new(T) calls where the size of T is 0.
The latter may be the case that almost never occurs, but maybe empty slices were not reduced to a newobject call previously?

Off-topic: 2019 is coming 🎄 :)

@josharian

Contributor

commented Mar 17, 2019

https://go-review.googlesource.com/c/go/+/167957 moves newobject handling to SSA conversion.
