cmd/compile: don't generate newobject call for 0-sized types #29446

Closed
@quasilyte

Sometimes the compiler generates a runtime.newobject(t) call where the size of t is statically known to be 0.

That call would return &runtime.zerobase:

go/src/runtime/malloc.go

Lines 809 to 816 in c043fc4

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	if gcphase == _GCmarktermination {
		throw("mallocgc called with gcphase == _GCmarktermination")
	}
	if size == 0 {
		return unsafe.Pointer(&zerobase)
	}

While the new(zeroSizedType) case is not very interesting, empty slice literals also emit a call to newobject (see below).
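To make "statically known to be 0" concrete, here is a small sketch. It only shows that the sizes of struct{} and [0]int (the backing array type of []int{}, as seen in the assembly further down) are compile-time constants, and that an empty slice literal is still a non-nil slice. The helper name emptySliceIsNonNil is made up for this illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafe.Sizeof yields a compile-time constant for these types,
// so the compiler knows the allocation size without running anything.
const (
	emptyStructSize = unsafe.Sizeof(struct{}{})
	emptyArraySize  = unsafe.Sizeof([0]int{})
)

// emptySliceIsNonNil (hypothetical helper) reports whether []int{} is a
// non-nil slice with zero length and capacity. Because the slice must be
// non-nil, its data pointer has to point somewhere, even though the
// backing array holds no bytes.
func emptySliceIsNonNil() bool {
	s := []int{}
	return s != nil && len(s) == 0 && cap(s) == 0
}

func main() {
	fmt.Println(emptyStructSize, emptyArraySize) // 0 0
	fmt.Println(emptySliceIsNonNil())            // true
}
```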

Instead of generating a runtime.newobject call, the compiler could emit the expression that the call returns, &runtime.zerobase, directly.

The impact on performance can be measured with this simple benchmark:

package benchmark

import (
	"testing"
)

var sinkStruct *struct{}
var sinkSlice []int

func BenchmarkNew(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sinkStruct = new(struct{})
	}
}

func BenchmarkSliceLit(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sinkSlice = []int{}
	}
}

Results before and after the change:

name        old time/op    new time/op    delta
New-8         8.39ns ± 0%    1.29ns ± 6%  -84.59%  (p=0.000 n=9+10)
SliceLit-8    8.80ns ± 0%    1.88ns ± 0%  -78.63%  (p=0.000 n=9+9)

The impact on code size is also positive. Consider this function:

func newSlice() []int { return []int{} }

Old generated code for newSlice (amd64/linux):

"".newSlice STEXT size=80 args=0x18 locals=0x18
	0x0000 00000 (foo.go:11)	TEXT	"".newSlice(SB), ABIInternal, $24-24
	0x0000 00000 (foo.go:11)	MOVQ	(TLS), CX
	0x0009 00009 (foo.go:11)	CMPQ	SP, 16(CX)
	0x000d 00013 (foo.go:11)	JLS	73
	0x000f 00015 (foo.go:11)	SUBQ	$24, SP
	0x0013 00019 (foo.go:11)	MOVQ	BP, 16(SP)
	0x0018 00024 (foo.go:11)	LEAQ	16(SP), BP
	0x001d 00029 (foo.go:11)	FUNCDATA	$0, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
	0x001d 00029 (foo.go:11)	FUNCDATA	$1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
	0x001d 00029 (foo.go:11)	FUNCDATA	$3, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
	0x001d 00029 (foo.go:12)	PCDATA	$2, $1
	0x001d 00029 (foo.go:12)	PCDATA	$0, $0
	0x001d 00029 (foo.go:12)	LEAQ	type.[0]int(SB), AX
	0x0024 00036 (foo.go:12)	PCDATA	$2, $0
	0x0024 00036 (foo.go:12)	MOVQ	AX, (SP)
	0x0028 00040 (foo.go:12)	CALL	runtime.newobject(SB)
	0x002d 00045 (foo.go:12)	PCDATA	$2, $1
	0x002d 00045 (foo.go:12)	MOVQ	8(SP), AX
	0x0032 00050 (foo.go:12)	PCDATA	$2, $0
	0x0032 00050 (foo.go:12)	PCDATA	$0, $1
	0x0032 00050 (foo.go:12)	MOVQ	AX, "".~r0+32(SP)
	0x0037 00055 (foo.go:12)	XORPS	X0, X0
	0x003a 00058 (foo.go:12)	MOVUPS	X0, "".~r0+40(SP)
	0x003f 00063 (foo.go:12)	MOVQ	16(SP), BP
	0x0044 00068 (foo.go:12)	ADDQ	$24, SP
	0x0048 00072 (foo.go:12)	RET
	0x0049 00073 (foo.go:12)	NOP
	0x0049 00073 (foo.go:11)	PCDATA	$0, $-1
	0x0049 00073 (foo.go:11)	PCDATA	$2, $-1
	0x0049 00073 (foo.go:11)	CALL	runtime.morestack_noctxt(SB)
	0x004e 00078 (foo.go:11)	JMP	0

New generated code for newSlice:

"".newSlice STEXT nosplit size=21 args=0x18 locals=0x0
	0x0000 00000 (foo.go:10)	TEXT	"".newSlice(SB), NOSPLIT|ABIInternal, $0-24
	0x0000 00000 (foo.go:10)	FUNCDATA	$0, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
	0x0000 00000 (foo.go:10)	FUNCDATA	$1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
	0x0000 00000 (foo.go:10)	FUNCDATA	$3, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
	0x0000 00000 (foo.go:11)	PCDATA	$2, $1
	0x0000 00000 (foo.go:11)	PCDATA	$0, $1
	0x0000 00000 (foo.go:11)	LEAQ	runtime.zerobase(SB), AX
	0x0007 00007 (foo.go:11)	PCDATA	$2, $0
	0x0007 00007 (foo.go:11)	MOVQ	AX, "".~r0+8(SP)
	0x000c 00012 (foo.go:11)	XORPS	X0, X0
	0x000f 00015 (foo.go:11)	MOVUPS	X0, "".~r0+16(SP)
	0x0014 00020 (foo.go:11)	RET

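The important part is that the call to runtime.newobject(SB) is gone. As a side effect, the function shrinks from 80 to 21 bytes, no longer needs a stack frame (locals=0x0), and becomes NOSPLIT, so the stack-growth check and the runtime.morestack_noctxt call disappear as well.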

I'll send a CL with that optimization applied.
