-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Description
This patch aims to extend this optimization c9d89f6 to slice of structs.
I have written some benchmarks in the encoding/binary package to measure the impact of the change when encoding a slice of structs.
Profiling those benchmarks show that there is an absurd number of allocations when writing a slice of structs caused by reflection. Allocations at reflect.(*structType).Field account for ~75% of the total allocs made as shown below.
Allocations without patch
root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# ../../../bin/go test -run='^$' -memprofile memprofile.out -benchmem -bench BenchmarkWriteSlice1000Structs -count=20
root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# ../../../bin/go tool pprof --alloc_objects memprofile.out
warning: GOPATH set to GOROOT (/root/go) has no effect
File: binary.test
Type: alloc_objects
Time: Mar 11, 2024 at 8:09pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 10 -cum
Showing nodes accounting for 130688, 100% of 130735 total
Dropped 4 nodes (cum <= 653)
flat flat% sum% cum cum%
94 0.072% 0.072% 130735 100% encoding/binary.BenchmarkWriteSlice1000Structs
0 0% 0.072% 130735 100% testing.(*B).runN
0 0% 0.072% 130659 99.94% testing.(*B).launch
32289 24.70% 24.77% 130641 99.93% encoding/binary.Write
0 0% 24.77% 98305 75.19% encoding/binary.dataSize
0 0% 24.77% 98305 75.19% encoding/binary.sizeof
0 0% 24.77% 98305 75.19% reflect.(*rtype).Field
98305 75.19% 100% 98305 75.19% reflect.(*structType).Field
This patch essentially also caches the result of binary.dataSize in the same sync.Map being used in the struct case. This leads to huge allocation savings in writing a slice of structs.
Allocations with patch
root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# ../../../bin/go test -run='^$' -memprofile memprofile.out -benchmem -bench BenchmarkWriteSlice1000Structs -count=20
root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# ../../../bin/go tool pprof --alloc_objects memprofile.out
File: binary.test
Type: alloc_objects
Time: Mar 11, 2024 at 6:31pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 10 -cum
Showing nodes accounting for 32228, 99.81% of 32289 total
Dropped 4 nodes (cum <= 161)
flat flat% sum% cum cum%
56 0.17% 0.17% 32289 100% encoding/binary.BenchmarkWriteSlice1000Structs
0 0% 0.17% 32289 100% testing.(*B).runN
32172 99.64% 99.81% 32233 99.83% encoding/binary.Write
0 0% 99.81% 32224 99.80% testing.(*B).launch
(pprof)
Running benchstat on both implementations shows these savings for writes and reads.
root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# benchstat old.txt new.txt
goos: linux
goarch: amd64
pkg: encoding/binary
cpu: DO-Regular
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
WriteSlice1000Structs-2 846.7µ ± 4% 856.4µ ± 3% ~ (p=0.602 n=20)
│ old.txt │ new.txt │
│ B/s │ B/s vs base │
WriteSlice1000Structs-2 84.48Mi ± 4% 83.52Mi ± 3% ~ (p=0.602 n=20)
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
WriteSlice1000Structs-2 80.18Ki ± 0% 80.06Ki ± 0% -0.15% (p=0.000 n=20)
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
WriteSlice1000Structs-2 16.000 ± 0% 1.000 ± 0% -93.75% (p=0.000 n=20)
root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# benchstat old.txt new.txt
goos: linux
goarch: amd64
pkg: encoding/binary
cpu: DO-Regular
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
ReadSlice1000Structs-2 847.4µ ± 4% 821.1µ ± 3% -3.10% (p=0.012 n=20)
│ old.txt │ new.txt │
│ B/s │ B/s vs base │
ReadSlice1000Structs-2 84.40Mi ± 4% 87.11Mi ± 3% +3.20% (p=0.012 n=20)
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
ReadSlice1000Structs-2 80.12Ki ± 0% 80.00Ki ± 0% -0.15% (p=0.000 n=20)
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
ReadSlice1000Structs-2 16.000 ± 0% 1.000 ± 0% -93.75% (p=0.000 n=20)