
reduce the number of allocations in the WatchServer during objects serialisation #108186

Conversation

@p0lyn0mial (Contributor) commented Feb 17, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

The WatchServer is largely responsible for streaming data received from the storage layer. It turns out that sending a single event per consumer requires 4 memory allocations, visualized in the following image. Two of them deserve special attention, namely allocations 1 and 3, because they don't reuse memory and rely on the GC for cleanup. In other words, the more events we need to send, the more (temporary) memory will be used. In contrast, the other two allocations are already optimized: they reuse memory instead of creating new buffers for every single event.

For better memory utilization, this PR changes the protobuf encoders to accept a memory allocator and changes the WatchServer to allocate a single buffer (combining allocations 1, 3 and 4) for the entire watch session and pass it to the encoders during object serialization.

[image: watch_server_allocs]
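A minimal sketch in Go of the buffer-reuse idea described above. The `MemoryAllocator` and `ReusingAllocator` names here only approximate the types this PR adds to apimachinery and are not the merged code:

```go
package main

import "fmt"

// MemoryAllocator reserves memory for serialization. Implementations may hand
// back a previously allocated buffer, so the returned slice is not zeroed.
type MemoryAllocator interface {
	Allocate(n uint64) []byte
}

// ReusingAllocator keeps a single growable buffer that can serve every event
// of a watch session instead of allocating a fresh buffer per event.
type ReusingAllocator struct {
	buf []byte
}

func (a *ReusingAllocator) Allocate(n uint64) []byte {
	if uint64(cap(a.buf)) >= n {
		a.buf = a.buf[:n] // reuse the existing backing array
		return a.buf
	}
	a.buf = make([]byte, n) // grow once; later, smaller requests reuse this block
	return a.buf
}

func main() {
	var alloc MemoryAllocator = &ReusingAllocator{}
	first := alloc.Allocate(1024) // first event: allocates
	second := alloc.Allocate(512) // later event: reuses the same backing array
	fmt.Println(len(first), len(second), cap(second)) // 1024 512 1024
}
```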

I am attaching the results from the benchmarks included in this PR that show an improvement.

The results for protobuf.Serializer:

BenchmarkProtobufEncoder/an_obj_with_1kB_payload-12         	  499345	      2270 ns/op	    1192 B/op	       3 allocs/op
BenchmarkProtobufEncoder/an_obj_with_10kB_payload-12        	  210517	      6327 ns/op	   10280 B/op	       3 allocs/op
BenchmarkProtobufEncoder/an_obj_with_100kB_payload-12       	   27073	     41799 ns/op	  106536 B/op	       3 allocs/op
BenchmarkProtobufEncoder/an_obj_with_1MB_payload-12         	    2787	    372108 ns/op	 1007658 B/op	       3 allocs/op

BenchmarkProtobufEncodeWithAllocator/an_obj_with_1kB_payload-12         	  800178	      1512 ns/op	      40 B/op	       2 allocs/op
BenchmarkProtobufEncodeWithAllocator/an_obj_with_10kB_payload-12        	  719036	      1740 ns/op	      40 B/op	       2 allocs/op
BenchmarkProtobufEncodeWithAllocator/an_obj_with_100kB_payload-12       	  201190	      5908 ns/op	      40 B/op	       2 allocs/op
BenchmarkProtobufEncodeWithAllocator/an_obj_with_1MB_payload-12         	   16674	     73044 ns/op	     100 B/op	       2 allocs/op

The results for protobuf.RawSerializer:

BenchmarkRawProtobufEncoder/an_obj_with_1kB_payload-12                  	  669680	      1978 ns/op	    1192 B/op	       3 allocs/op
BenchmarkRawProtobufEncoder/an_obj_with_10kB_payload-12                 	  188064	      6155 ns/op	   10280 B/op	       3 allocs/op
BenchmarkRawProtobufEncoder/an_obj_with_100kB_payload-12                	   29367	     41180 ns/op	  106536 B/op	       3 allocs/op
BenchmarkRawProtobufEncoder/an_obj_with_1MB_payload-12                  	    3354	    370200 ns/op	 1007656 B/op	       3 allocs/op

BenchmarkRawProtobufEncodeWithAllocator/an_obj_with_1kB_payload-12      	 1000000	      1128 ns/op	      40 B/op	       2 allocs/op
BenchmarkRawProtobufEncodeWithAllocator/an_obj_with_10kB_payload-12     	  785414	      1377 ns/op	      40 B/op	       2 allocs/op
BenchmarkRawProtobufEncodeWithAllocator/an_obj_with_100kB_payload-12    	  209011	      5314 ns/op	      40 B/op	       2 allocs/op
BenchmarkRawProtobufEncodeWithAllocator/an_obj_with_1MB_payload-12      	   22004	     46844 ns/op	      85 B/op	       2 allocs/op
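For context, benchmarks of this kind roughly follow the standard Go shape below. These are not the PR's benchmark functions, just a self-contained illustration (reusing the `ReusingAllocator` from the sketch above) of why a reused buffer drops the per-op bytes and allocations:

```go
// encoder_bench_test.go (illustrative only; lives alongside the sketch above)
package main

import "testing"

func encodeInto(dst, payload []byte) int { return copy(dst, payload) }

func BenchmarkEncodeAllocatingPerEvent(b *testing.B) {
	payload := make([]byte, 100*1024)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		dst := make([]byte, len(payload)) // fresh buffer per event: GC pressure
		encodeInto(dst, payload)
	}
}

func BenchmarkEncodeWithReusedBuffer(b *testing.B) {
	payload := make([]byte, 100*1024)
	alloc := &ReusingAllocator{} // reused across iterations
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		dst := alloc.Allocate(uint64(len(payload))) // reuses the backing array
		encodeInto(dst, payload)
	}
}
```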

Which issue(s) this PR fixes:

kubernetes/enhancements#3157

Special notes for your reviewer:

You will find more info in kubernetes/enhancements#3142.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/pull/3142

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 17, 2022
@fedebongio (Contributor)

/assign @aojea @wojtek-t
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 17, 2022
@wojtek-t (Member) left a comment

Just two quick comments - I will take a deeper look in the upcoming days, but I think we should address them before that.

@p0lyn0mial p0lyn0mial force-pushed the watch-list-reduce-allocations-in-watch-server branch from 6e3c32a to a6bd72c Compare February 21, 2022 11:24
@p0lyn0mial p0lyn0mial force-pushed the watch-list-reduce-allocations-in-watch-server branch from a6bd72c to bc76be4 Compare February 21, 2022 16:57
@wojtek-t (Member) left a comment

This is great - I added some comments but those are relatively small.

The only bigger one is the missing support for JSON (for CRD purposes).

staging/src/k8s.io/apimachinery/pkg/runtime/interfaces.go (review thread, outdated)
b.Fatal(err)
}
}
}
Member

Can you paste the results of those benchmarks into the PR description?

Member

Ping

Contributor Author

heh, so my first idea was to create an HTML table to present the results. That turned out to be labor-intensive for me. So I wanted to use https://pkg.go.dev/golang.org/x/perf/cmd/benchstat but haven't found a good way to present the results either.

Would it be okay to just copy/paste the results from my terminal?

Member

Yes - I'm fine with anything as long as I can read it :)

Contributor Author

done

@p0lyn0mial (Contributor Author)

> Also @p0lyn0mial - is this still WIP?

Two things from my side: some benchmarks for the RawSerializer, and making sure EncodeNestedObjects doesn't require memory allocator support. WDYT?

…ialization.

It allows us to allocate a single buffer for the entire watch session and release it when a watch connection is closed.
Previously, memory was allocated for every object serialization, putting a lot of pressure on the GC and consuming more memory than needed.
The new method is implemented by the protobuf serializer and helps to reduce memory footprint during object serialization.
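A rough sketch of the per-session usage the commit message describes. This is not the actual WatchServer code: the event channel and encoder interface below are hypothetical placeholders, and the allocator types come from the earlier sketch in the description:

```go
package main

import (
	"io"
	"sync"
)

// encoderWithAllocator is a hypothetical stand-in for an encoder that accepts
// a caller-provided allocator for each serialization.
type encoderWithAllocator interface {
	EncodeWithAllocator(obj interface{}, w io.Writer, memAlloc MemoryAllocator) error
}

// allocatorPool lets finished watch sessions hand their buffer back for reuse.
var allocatorPool = sync.Pool{
	New: func() interface{} { return &ReusingAllocator{} },
}

// serveWatch serializes every event of one watch session with a single,
// session-scoped allocator and releases it when the connection closes.
func serveWatch(events <-chan interface{}, w io.Writer, enc encoderWithAllocator) error {
	alloc := allocatorPool.Get().(*ReusingAllocator)
	defer allocatorPool.Put(alloc)
	for obj := range events {
		if err := enc.EncodeWithAllocator(obj, w, alloc); err != nil {
			return err
		}
	}
	return nil
}
```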
@p0lyn0mial p0lyn0mial force-pushed the watch-list-reduce-allocations-in-watch-server branch from c97797a to 31ff8eb Compare February 23, 2022 10:19
@p0lyn0mial p0lyn0mial changed the title WIP: reduce memory allocations in the watch server reduce the number of allocations in the WatchServer during objects serialisation Feb 23, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 23, 2022
@p0lyn0mial (Contributor Author)

OK, so I have addressed the recent comments and added some benchmarks and NewEncoderWithAllocator for the RawSerializer. PTAL.

@p0lyn0mial p0lyn0mial force-pushed the watch-list-reduce-allocations-in-watch-server branch from 31ff8eb to 73f3a7d Compare February 23, 2022 10:25
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 23, 2022
@p0lyn0mial (Contributor Author)

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Feb 23, 2022
@wojtek-t (Member) left a comment

Just some minor nits - other than that LGTM

// before:
func (s *Serializer) doEncode(obj runtime.Object, w io.Writer) error {
// after:
func (s *Serializer) doEncode(obj runtime.Object, w io.Writer, memAlloc runtime.MemoryAllocator) error {
	if memAlloc == nil {
		return fmt.Errorf("a memory allocator must be provided")
Member

I would prefer logging an error and falling back to the SimpleAllocator in this case rather than failing the whole encoding.

Contributor Author

It cannot happen. I added it as a safety net; doEncode is an internal method.

Contributor Author

memAlloc is provided by the Encode and EncodeWithAllocator methods.

Member

Yes - and one can call EncodeWithAllocator and pass a nil memory allocator.

Contributor Author

In that case, a caller should use the Encode method or provide a noop allocator.
I would prefer to make it explicit as it is less error-prone, e.g. it catches the case where some middle layer doesn't pass the provided allocator down.

Member

They should use Encode, but suppose they introduce a bug. I think it's still better to mitigate the problem and not fail the request, at the cost of worse performance.
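To make the trade-off concrete, here is a small self-contained sketch of the fallback behaviour suggested above; the helper names and logging are placeholders, and the merged code keeps the explicit error from the snippet above instead. `MemoryAllocator` comes from the earlier sketch:

```go
package main

import (
	"io"
	"log"
)

// simpleAllocator allocates a fresh buffer on every call, like the pre-PR path.
type simpleAllocator struct{}

func (simpleAllocator) Allocate(n uint64) []byte { return make([]byte, n) }

// encodeWithFallback mirrors the suggestion above: a nil allocator is logged
// and replaced with a per-call allocation, so a buggy caller pays with worse
// performance rather than a failed request.
func encodeWithFallback(w io.Writer, payload []byte, memAlloc MemoryAllocator) error {
	if memAlloc == nil {
		log.Println("no memory allocator provided, falling back to a simple allocator")
		memAlloc = simpleAllocator{}
	}
	buf := memAlloc.Allocate(uint64(len(payload)))
	copy(buf, payload)
	_, err := w.Write(buf)
	return err
}
```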

// before:
func (s *RawSerializer) doEncode(obj runtime.Object, w io.Writer) error {
// after:
func (s *RawSerializer) doEncode(obj runtime.Object, w io.Writer, memAlloc runtime.MemoryAllocator) error {
	if memAlloc == nil {
		return fmt.Errorf("a memory allocator must be provided")
Member

Same here

@wojtek-t (Member)

I was actually expecting this, but to close the loop - the benchmark results are amazing!

The new method allows for providing a memory allocator for efficient memory usage during object serialization.
The primary use case for the allocator is to reduce the cost of object serialization.
Initially, it will be used by the protobuf serializer.
This approach puts less load on the GC and leads to less fragmented memory in general.
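For reference, the commit message above describes roughly the following interface shape. This is a paraphrase with a placeholder package name and simplified types, not the verbatim definition from staging/src/k8s.io/apimachinery/pkg/runtime/interfaces.go:

```go
package runtimesketch // illustrative; the real types live in k8s.io/apimachinery/pkg/runtime

import "io"

// Object is a simplified stand-in for runtime.Object.
type Object interface{}

// Encoder writes objects to a serialized form, as before this PR.
type Encoder interface {
	Encode(obj Object, w io.Writer) error
}

// MemoryAllocator reserves memory; implementations may reuse buffers across calls.
type MemoryAllocator interface {
	Allocate(n uint64) []byte
}

// EncoderWithAllocator is an Encoder that additionally accepts a caller-provided
// allocator, so one buffer can serve many serializations (e.g. a watch session).
type EncoderWithAllocator interface {
	Encoder
	EncodeWithAllocator(obj Object, w io.Writer, memAlloc MemoryAllocator) error
}
```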
@p0lyn0mial p0lyn0mial force-pushed the watch-list-reduce-allocations-in-watch-server branch from 73f3a7d to 9dd77ac Compare February 23, 2022 13:39
@wojtek-t (Member)

/lgtm
/approve

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 23, 2022
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 23, 2022
@wojtek-t (Member)

/retest

1 similar comment
@wojtek-t (Member)

/retest

@k8s-ci-robot k8s-ci-robot merged commit b435061 into kubernetes:master Feb 23, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Feb 23, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.