Sometimes the compiler generates a runtime.newobject(t) call where the size of t is statically known to be 0.
That call returns &runtime.zerobase:
Lines 809 to 816 in c043fc4
While the new(zeroSizedType) case is not very interesting, empty slice literals also emit a call to newobject (see below).
Instead of generating a runtime.newobject call, the compiler could insert the returned expression, &runtime.zerobase, directly.
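To make the two zero-size cases concrete, here is a minimal sketch. It only shows which types are involved and does not reproduce the compiler change itself. The helper names zeroSizes and emptyLit are hypothetical and chosen for illustration. [0]int is the backing array type that appears in the generated code for an empty []int literal.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Package-level sinks make the values escape to the heap,
// like the sink variables used in the benchmark.
var (
	sinkA, sinkB *struct{}
	sinkSlice    []int
)

// zeroSizes reports the static sizes of the two types involved:
// struct{} and [0]int, the backing array type of an empty []int literal.
func zeroSizes() (uintptr, uintptr) {
	return unsafe.Sizeof(struct{}{}), unsafe.Sizeof([0]int{})
}

// emptyLit returns an empty slice literal: non-nil, with len and cap 0.
func emptyLit() []int {
	return []int{}
}

func main() {
	s, a := zeroSizes()
	fmt.Println(s, a) // both sizes are 0

	sinkA, sinkB = new(struct{}), new(struct{})
	// The addresses are printed only for inspection. The language spec does
	// not promise whether pointers to distinct zero-size variables are equal.
	fmt.Printf("%p %p\n", sinkA, sinkB)

	sinkSlice = emptyLit()
	fmt.Println(len(sinkSlice), cap(sinkSlice), sinkSlice != nil)
}
```

Because both sizes are compile-time constants equal to 0, the compiler has everything it needs to skip the allocation call in these cases.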
The performance impact can be measured with this simple benchmark:
package benchmark

import (
	"testing"
)

var sinkStruct *struct{}
var sinkSlice []int

func BenchmarkNew(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sinkStruct = new(struct{})
	}
}

func BenchmarkSliceLit(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sinkSlice = []int{}
	}
}
name         old time/op  new time/op  delta
New-8        8.39ns ± 0%  1.29ns ± 6%  -84.59%  (p=0.000 n=9+10)
SliceLit-8   8.80ns ± 0%  1.88ns ± 0%  -78.63%  (p=0.000 n=9+9)
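The delta column is the relative change in time per operation. As a rough sketch of that arithmetic, the snippet below recomputes it from the rounded means in the table. The percentDelta helper is hypothetical. Its results differ from the reported deltas in the second decimal place, presumably because the reported values are computed from unrounded measurements.

```go
package main

import "fmt"

// percentDelta returns the relative change from oldNs to newNs, in percent.
// Negative values mean the new measurement is smaller (faster).
func percentDelta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	// Rounded means taken from the table above.
	fmt.Printf("New:      %.2f%%\n", percentDelta(8.39, 1.29)) // about -84.62%
	fmt.Printf("SliceLit: %.2f%%\n", percentDelta(8.80, 1.88)) // about -78.64%
}
```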
The impact on code size is also positive: for the function below, the generated code shrinks from 80 to 21 bytes.
func newSlice() []int { return []int{} }
Old generated code for newSlice (amd64/linux):
"".newSlice STEXT size=80 args=0x18 locals=0x18
0x0000 00000 (foo.go:11) TEXT "".newSlice(SB), ABIInternal, $24-24
0x0000 00000 (foo.go:11) MOVQ (TLS), CX
0x0009 00009 (foo.go:11) CMPQ SP, 16(CX)
0x000d 00013 (foo.go:11) JLS 73
0x000f 00015 (foo.go:11) SUBQ $24, SP
0x0013 00019 (foo.go:11) MOVQ BP, 16(SP)
0x0018 00024 (foo.go:11) LEAQ 16(SP), BP
0x001d 00029 (foo.go:11) FUNCDATA $0, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
0x001d 00029 (foo.go:11) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
0x001d 00029 (foo.go:11) FUNCDATA $3, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
0x001d 00029 (foo.go:12) PCDATA $2, $1
0x001d 00029 (foo.go:12) PCDATA $0, $0
0x001d 00029 (foo.go:12) LEAQ type.[0]int(SB), AX
0x0024 00036 (foo.go:12) PCDATA $2, $0
0x0024 00036 (foo.go:12) MOVQ AX, (SP)
0x0028 00040 (foo.go:12) CALL runtime.newobject(SB)
0x002d 00045 (foo.go:12) PCDATA $2, $1
0x002d 00045 (foo.go:12) MOVQ 8(SP), AX
0x0032 00050 (foo.go:12) PCDATA $2, $0
0x0032 00050 (foo.go:12) PCDATA $0, $1
0x0032 00050 (foo.go:12) MOVQ AX, "".~r0+32(SP)
0x0037 00055 (foo.go:12) XORPS X0, X0
0x003a 00058 (foo.go:12) MOVUPS X0, "".~r0+40(SP)
0x003f 00063 (foo.go:12) MOVQ 16(SP), BP
0x0044 00068 (foo.go:12) ADDQ $24, SP
0x0048 00072 (foo.go:12) RET
0x0049 00073 (foo.go:12) NOP
0x0049 00073 (foo.go:11) PCDATA $0, $-1
0x0049 00073 (foo.go:11) PCDATA $2, $-1
0x0049 00073 (foo.go:11) CALL runtime.morestack_noctxt(SB)
0x004e 00078 (foo.go:11) JMP 0
New generated code for newSlice:
"".newSlice STEXT nosplit size=21 args=0x18 locals=0x0
0x0000 00000 (foo.go:10) TEXT "".newSlice(SB), NOSPLIT|ABIInternal, $0-24
0x0000 00000 (foo.go:10) FUNCDATA $0, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
0x0000 00000 (foo.go:10) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
0x0000 00000 (foo.go:10) FUNCDATA $3, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
0x0000 00000 (foo.go:11) PCDATA $2, $1
0x0000 00000 (foo.go:11) PCDATA $0, $1
0x0000 00000 (foo.go:11) LEAQ runtime.zerobase(SB), AX
0x0007 00007 (foo.go:11) PCDATA $2, $0
0x0007 00007 (foo.go:11) MOVQ AX, "".~r0+8(SP)
0x000c 00012 (foo.go:11) XORPS X0, X0
0x000f 00015 (foo.go:11) MOVUPS X0, "".~r0+16(SP)
0x0014 00020 (foo.go:11) RET
The important part is that there is no longer a call to runtime.newobject(SB).
I'll send a CL with that optimization applied.