
encoding/binary: cache dataSize result across invocations of Read and Write for slice of structs #66253

@kwakubiney

Description

This patch extends the optimization from c9d89f6 to slices of structs.
I have written benchmarks in the encoding/binary package to measure the impact of the change when encoding a slice of structs.

Profiling those benchmarks shows that writing a slice of structs incurs a large number of allocations caused by reflection. Allocations in reflect.(*structType).Field account for ~75% of all allocations, as shown below.

Allocations without patch

root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# ../../../bin/go test -run='^$' -memprofile memprofile.out -benchmem -bench BenchmarkWriteSlice1000Structs -count=20
root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# ../../../bin/go tool pprof --alloc_objects memprofile.out
warning: GOPATH set to GOROOT (/root/go) has no effect
File: binary.test
Type: alloc_objects
Time: Mar 11, 2024 at 8:09pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 10 -cum
Showing nodes accounting for 130688, 100% of 130735 total
Dropped 4 nodes (cum <= 653)
      flat  flat%   sum%        cum   cum%
        94 0.072% 0.072%     130735   100%  encoding/binary.BenchmarkWriteSlice1000Structs
         0     0% 0.072%     130735   100%  testing.(*B).runN
         0     0% 0.072%     130659 99.94%  testing.(*B).launch
     32289 24.70% 24.77%     130641 99.93%  encoding/binary.Write
         0     0% 24.77%      98305 75.19%  encoding/binary.dataSize
         0     0% 24.77%      98305 75.19%  encoding/binary.sizeof
         0     0% 24.77%      98305 75.19%  reflect.(*rtype).Field
     98305 75.19%   100%      98305 75.19%  reflect.(*structType).Field

This patch caches the result of binary.dataSize for slices of structs in the same sync.Map that the struct case already uses. This greatly reduces allocations when writing a slice of structs.

Allocations with patch

root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# ../../../bin/go test -run='^$' -memprofile memprofile.out -benchmem -bench BenchmarkWriteSlice1000Structs -count=20
root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# ../../../bin/go tool pprof --alloc_objects memprofile.out
File: binary.test
Type: alloc_objects
Time: Mar 11, 2024 at 6:31pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 10 -cum
Showing nodes accounting for 32228, 99.81% of 32289 total
Dropped 4 nodes (cum <= 161)
      flat  flat%   sum%        cum   cum%
        56  0.17%  0.17%      32289   100%  encoding/binary.BenchmarkWriteSlice1000Structs
         0     0%  0.17%      32289   100%  testing.(*B).runN
     32172 99.64% 99.81%      32233 99.83%  encoding/binary.Write
         0     0% 99.81%      32224 99.80%  testing.(*B).launch
(pprof) 

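Running benchstat on both implementations shows the following results for writes and reads: allocations drop from 16 to 1 per operation in both cases, write time is statistically unchanged, and reads are about 3% faster.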

root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# benchstat old.txt new.txt
goos: linux
goarch: amd64
pkg: encoding/binary
cpu: DO-Regular
                        │   old.txt   │            new.txt            │
                        │   sec/op    │   sec/op     vs base          │
WriteSlice1000Structs-2   846.7µ ± 4%   856.4µ ± 3%  ~ (p=0.602 n=20)

                        │   old.txt    │            new.txt             │
                        │     B/s      │     B/s       vs base          │
WriteSlice1000Structs-2   84.48Mi ± 4%   83.52Mi ± 3%  ~ (p=0.602 n=20)

                        │   old.txt    │               new.txt               │
                        │     B/op     │     B/op      vs base               │
WriteSlice1000Structs-2   80.18Ki ± 0%   80.06Ki ± 0%  -0.15% (p=0.000 n=20)

                        │   old.txt   │              new.txt               │
                        │  allocs/op  │ allocs/op   vs base                │
WriteSlice1000Structs-2   16.000 ± 0%   1.000 ± 0%  -93.75% (p=0.000 n=20)
root@ubuntu-s-2vcpu-2gb-fra1-01:~/go/src/encoding/binary# benchstat old.txt new.txt
goos: linux
goarch: amd64
pkg: encoding/binary
cpu: DO-Regular
                       │   old.txt   │              new.txt               │
                       │   sec/op    │   sec/op     vs base               │
ReadSlice1000Structs-2   847.4µ ± 4%   821.1µ ± 3%  -3.10% (p=0.012 n=20)

                       │   old.txt    │               new.txt               │
                       │     B/s      │     B/s       vs base               │
ReadSlice1000Structs-2   84.40Mi ± 4%   87.11Mi ± 3%  +3.20% (p=0.012 n=20)

                       │   old.txt    │               new.txt               │
                       │     B/op     │     B/op      vs base               │
ReadSlice1000Structs-2   80.12Ki ± 0%   80.00Ki ± 0%  -0.15% (p=0.000 n=20)

                       │   old.txt   │              new.txt               │
                       │  allocs/op  │ allocs/op   vs base                │
ReadSlice1000Structs-2   16.000 ± 0%   1.000 ± 0%  -93.75% (p=0.000 n=20)


Labels: FrozenDueToAge, NeedsInvestigation
