
reduce the number of allocations in the WatchServer during objects serialisation #108186

Conversation

@p0lyn0mial (Contributor) commented Feb 17, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

The WatchServer is largely responsible for streaming data received from the storage layer. It turns out that sending a single event per consumer requires 4 memory allocations, visualized in the following image. Two of them deserve special attention, namely allocations 1 and 3, because they don't reuse memory and rely on the GC for cleanup. In other words, the more events we need to send, the more (temporary) memory will be used. In contrast, the other two allocations are already optimized: they reuse memory instead of creating new buffers for every single event.

For better memory utilization, this PR changes the protobuf encoders to accept a memory allocator and changes the WatchServer to allocate a single buffer (combining allocations 1, 3 and 4) for the entire watch session and pass it to the encoders during object serialization.

[image: watch_server_allocs]
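A minimal sketch in Go of the buffer-reuse idea described above. The `MemoryAllocator` and `ReusingAllocator` names here only approximate the types this PR adds to apimachinery and are not the merged code:

```go
package main

import "fmt"

// MemoryAllocator reserves memory for serialization. Implementations may hand
// back a previously allocated buffer, so the returned slice is not zeroed.
type MemoryAllocator interface {
	Allocate(n uint64) []byte
}

// ReusingAllocator keeps a single growable buffer that can serve every event
// of a watch session instead of allocating a fresh buffer per event.
type ReusingAllocator struct {
	buf []byte
}

func (a *ReusingAllocator) Allocate(n uint64) []byte {
	if uint64(cap(a.buf)) >= n {
		a.buf = a.buf[:n] // reuse the existing backing array
		return a.buf
	}
	a.buf = make([]byte, n) // grow once; later, smaller requests reuse this block
	return a.buf
}

func main() {
	var alloc MemoryAllocator = &ReusingAllocator{}
	first := alloc.Allocate(1024) // first event: allocates
	second := alloc.Allocate(512) // later event: reuses the same backing array
	fmt.Println(len(first), len(second), cap(second)) // 1024 512 1024
}
```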

I am attaching the results from the benchmarks included in this PR that show an improvement.

The results for protobuf.Serializer:

BenchmarkProtobufEncoder/an_obj_with_1kB_payload-12         	  499345	      2270 ns/op	    1192 B/op	       3 allocs/op
BenchmarkProtobufEncoder/an_obj_with_10kB_payload-12        	  210517	      6327 ns/op	   10280 B/op	       3 allocs/op
BenchmarkProtobufEncoder/an_obj_with_100kB_payload-12       	   27073	     41799 ns/op	  106536 B/op	       3 allocs/op
BenchmarkProtobufEncoder/an_obj_with_1MB_payload-12         	    2787	    372108 ns/op	 1007658 B/op	       3 allocs/op

BenchmarkProtobufEncodeWithAllocator/an_obj_with_1kB_payload-12         	  800178	      1512 ns/op	      40 B/op	       2 allocs/op
BenchmarkProtobufEncodeWithAllocator/an_obj_with_10kB_payload-12        	  719036	      1740 ns/op	      40 B/op	       2 allocs/op
BenchmarkProtobufEncodeWithAllocator/an_obj_with_100kB_payload-12       	  201190	      5908 ns/op	      40 B/op	       2 allocs/op
BenchmarkProtobufEncodeWithAllocator/an_obj_with_1MB_payload-12         	   16674	     73044 ns/op	     100 B/op	       2 allocs/op

The results for protobuf.RawSerializer:

BenchmarkRawProtobufEncoder/an_obj_with_1kB_payload-12                  	  669680	      1978 ns/op	    1192 B/op	       3 allocs/op
BenchmarkRawProtobufEncoder/an_obj_with_10kB_payload-12                 	  188064	      6155 ns/op	   10280 B/op	       3 allocs/op
BenchmarkRawProtobufEncoder/an_obj_with_100kB_payload-12                	   29367	     41180 ns/op	  106536 B/op	       3 allocs/op
BenchmarkRawProtobufEncoder/an_obj_with_1MB_payload-12                  	    3354	    370200 ns/op	 1007656 B/op	       3 allocs/op

BenchmarkRawProtobufEncodeWithAllocator/an_obj_with_1kB_payload-12      	 1000000	      1128 ns/op	      40 B/op	       2 allocs/op
BenchmarkRawProtobufEncodeWithAllocator/an_obj_with_10kB_payload-12     	  785414	      1377 ns/op	      40 B/op	       2 allocs/op
BenchmarkRawProtobufEncodeWithAllocator/an_obj_with_100kB_payload-12    	  209011	      5314 ns/op	      40 B/op	       2 allocs/op
BenchmarkRawProtobufEncodeWithAllocator/an_obj_with_1MB_payload-12      	   22004	     46844 ns/op	      85 B/op	       2 allocs/op
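For context, benchmarks of this kind roughly follow the standard Go shape below. These are not the PR's benchmark functions, just a self-contained illustration (reusing the `ReusingAllocator` from the sketch above) of why a reused buffer drops the per-op bytes and allocations:

```go
// encoder_bench_test.go (illustrative only; lives alongside the sketch above)
package main

import "testing"

func encodeInto(dst, payload []byte) int { return copy(dst, payload) }

func BenchmarkEncodeAllocatingPerEvent(b *testing.B) {
	payload := make([]byte, 100*1024)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		dst := make([]byte, len(payload)) // fresh buffer per event: GC pressure
		encodeInto(dst, payload)
	}
}

func BenchmarkEncodeWithReusedBuffer(b *testing.B) {
	payload := make([]byte, 100*1024)
	alloc := &ReusingAllocator{} // reused across iterations
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		dst := alloc.Allocate(uint64(len(payload))) // reuses the backing array
		encodeInto(dst, payload)
	}
}
```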

Which issue(s) this PR fixes:

kubernetes/enhancements#3157

Special notes for your reviewer:

You will find more info in kubernetes/enhancements#3142.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/pull/3142

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 17, 2022
@fedebongio (Contributor)

/assign @aojea @wojtek-t
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 17, 2022
@wojtek-t (Member) left a comment

Just two quick comments - I will take a deeper look in the upcoming days, but I think we should address them before that.

@p0lyn0mial p0lyn0mial force-pushed the watch-list-reduce-allocations-in-watch-server branch from 6e3c32a to a6bd72c Compare February 21, 2022 11:24
@p0lyn0mial p0lyn0mial force-pushed the watch-list-reduce-allocations-in-watch-server branch from a6bd72c to bc76be4 Compare February 21, 2022 16:57
@wojtek-t (Member) left a comment

This is great - I added some comments but those are relatively small.

The only bigger one is the missing support for JSON (for CRD purposes).

staging/src/k8s.io/apimachinery/pkg/runtime/interfaces.go (review thread, outdated)
b.Fatal(err)
}
}
}
Member

Can you paste the results of those benchmarks into the PR description?

Member

Ping

Contributor Author

heh, so my first idea was to create an HTML table to present the results. That turned out to be labor-intensive for me. So I wanted to use https://pkg.go.dev/golang.org/x/perf/cmd/benchstat but haven't found a good way to present the results either.

Would it be okay to just copy/paste the results from my terminal?

Member

Yes - I'm fine with anything as long as I can read it :)

Contributor Author

done

@p0lyn0mial (Contributor Author)

> Also @p0lyn0mial - is this still WIP?

Two things from my side: some benchmarks for the RawSerializer, and making sure EncodeNestedObjects doesn't require memory allocator support. WDYT?

…ialization.

It allows us to allocate a single buffer for the entire watch session and release it when a watch connection is closed.
Previously, memory was allocated for every object serialization, putting a lot of pressure on the GC and consuming more memory than needed.
The new method is implemented by the protobuf serializer and helps to reduce memory footprint during object serialization.
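A rough sketch of the per-session usage the commit message describes. This is not the actual WatchServer code: the event channel and encoder interface below are hypothetical placeholders, and the allocator types come from the earlier sketch in the description:

```go
package main

import (
	"io"
	"sync"
)

// encoderWithAllocator is a hypothetical stand-in for an encoder that accepts
// a caller-provided allocator for each serialization.
type encoderWithAllocator interface {
	EncodeWithAllocator(obj interface{}, w io.Writer, memAlloc MemoryAllocator) error
}

// allocatorPool lets finished watch sessions hand their buffer back for reuse.
var allocatorPool = sync.Pool{
	New: func() interface{} { return &ReusingAllocator{} },
}

// serveWatch serializes every event of one watch session with a single,
// session-scoped allocator and releases it when the connection closes.
func serveWatch(events <-chan interface{}, w io.Writer, enc encoderWithAllocator) error {
	alloc := allocatorPool.Get().(*ReusingAllocator)
	defer allocatorPool.Put(alloc)
	for obj := range events {
		if err := enc.EncodeWithAllocator(obj, w, alloc); err != nil {
			return err
		}
	}
	return nil
}
```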
@p0lyn0mial p0lyn0mial force-pushed the watch-list-reduce-allocations-in-watch-server branch from c97797a to 31ff8eb Compare February 23, 2022 10:19
@p0lyn0mial p0lyn0mial changed the title WIP: reduce memory allocations in the watch server reduce the number of allocations in the WatchServer during objects serialisation Feb 23, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 23, 2022
@p0lyn0mial (Contributor Author)

OK, so I have addressed the recent comments and added some benchmarks and NewEncoderWithAllocator for the RawSerializer. PTAL.

@p0lyn0mial p0lyn0mial force-pushed the watch-list-reduce-allocations-in-watch-server branch from 31ff8eb to 73f3a7d Compare February 23, 2022 10:25
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 23, 2022
@p0lyn0mial (Contributor Author)

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Feb 23, 2022
@wojtek-t (Member) left a comment

Just some minor nits - other than that LGTM

// before:
func (s *Serializer) doEncode(obj runtime.Object, w io.Writer) error {
// after:
func (s *Serializer) doEncode(obj runtime.Object, w io.Writer, memAlloc runtime.MemoryAllocator) error {
	if memAlloc == nil {
		return fmt.Errorf("a memory allocator must be provided")
Member

I would prefer logging an error and falling back to the SimpleAllocator in this case rather than failing the whole encoding.

Contributor Author

It cannot happen. I added it as a safety net; doEncode is an internal method.

Contributor Author

memAlloc is provided by the Encode and EncodeWithAllocator methods.

Member

Yes - and one can call EncodeWithAllocator and pass a nil memory allocator.

Contributor Author

In that case, a caller should use the Encode method or provide a noop allocator.
I would prefer to make it explicit as it is less error-prone, e.g. it catches the case where some middle layer doesn't pass the provided allocator down.

Member

They should use Encode, but suppose they introduce a bug. I think it's still better to mitigate the problem and not fail the request, at the cost of worse performance.
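To make the trade-off concrete, here is a small self-contained sketch of the fallback behaviour suggested above; the helper names and logging are placeholders, and the merged code keeps the explicit error from the snippet above instead. `MemoryAllocator` comes from the earlier sketch:

```go
package main

import (
	"io"
	"log"
)

// simpleAllocator allocates a fresh buffer on every call, like the pre-PR path.
type simpleAllocator struct{}

func (simpleAllocator) Allocate(n uint64) []byte { return make([]byte, n) }

// encodeWithFallback mirrors the suggestion above: a nil allocator is logged
// and replaced with a per-call allocation, so a buggy caller pays with worse
// performance rather than a failed request.
func encodeWithFallback(w io.Writer, payload []byte, memAlloc MemoryAllocator) error {
	if memAlloc == nil {
		log.Println("no memory allocator provided, falling back to a simple allocator")
		memAlloc = simpleAllocator{}
	}
	buf := memAlloc.Allocate(uint64(len(payload)))
	copy(buf, payload)
	_, err := w.Write(buf)
	return err
}
```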

// before:
func (s *RawSerializer) doEncode(obj runtime.Object, w io.Writer) error {
// after:
func (s *RawSerializer) doEncode(obj runtime.Object, w io.Writer, memAlloc runtime.MemoryAllocator) error {
	if memAlloc == nil {
		return fmt.Errorf("a memory allocator must be provided")
Member

Same here

@wojtek-t (Member)

I was actually expecting this, but to close the loop - the benchmark results are amazing!

The new method allows for providing a memory allocator for efficient memory usage during object serialization.
The primary use case for the allocator is to reduce the cost of object serialization.
Initially, it will be used by the protobuf serializer.
This approach puts less load on the GC and leads to less fragmented memory in general.
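For reference, the commit message above describes roughly the following interface shape. This is a paraphrase with a placeholder package name and simplified types, not the verbatim definition from staging/src/k8s.io/apimachinery/pkg/runtime/interfaces.go:

```go
package runtimesketch // illustrative; the real types live in k8s.io/apimachinery/pkg/runtime

import "io"

// Object is a simplified stand-in for runtime.Object.
type Object interface{}

// Encoder writes objects to a serialized form, as before this PR.
type Encoder interface {
	Encode(obj Object, w io.Writer) error
}

// MemoryAllocator reserves memory; implementations may reuse buffers across calls.
type MemoryAllocator interface {
	Allocate(n uint64) []byte
}

// EncoderWithAllocator is an Encoder that additionally accepts a caller-provided
// allocator, so one buffer can serve many serializations (e.g. a watch session).
type EncoderWithAllocator interface {
	Encoder
	EncodeWithAllocator(obj Object, w io.Writer, memAlloc MemoryAllocator) error
}
```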
@p0lyn0mial p0lyn0mial force-pushed the watch-list-reduce-allocations-in-watch-server branch from 73f3a7d to 9dd77ac Compare February 23, 2022 13:39
@wojtek-t (Member)

/lgtm
/approve

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 23, 2022
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 23, 2022
@wojtek-t (Member)

/retest

1 similar comment
@wojtek-t (Member)

/retest

@k8s-ci-robot k8s-ci-robot merged commit b435061 into kubernetes:master Feb 23, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Feb 23, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.