optimize watch-cache getlist #116327

sxllwx · 2023-03-07T13:26:40Z

What type of PR is this?

What this PR does / why we need it:

Makes GetList requests served from the watch cache faster and reduces their memory allocations.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

To make verification easier, I split it into two commits:

1. Add benchmarks
2. Optimize GetList

Here are my own benchmark results, FYI:

# before
$ go test -bench=BenchmarkCacher_GetList  -count 5 . -run=none --benchmem
goos: linux
goarch: amd64
pkg: k8s.io/apiserver/pkg/storage/cacher
cpu: AMD EPYC 7K62 48-Core Processor
BenchmarkCacher_GetList-16    	14	 137743757 ns/op	129016510 B/op	   67975 allocs/op
BenchmarkCacher_GetList-16    	15	 121673849 ns/op	151072125 B/op	   66776 allocs/op
BenchmarkCacher_GetList-16    	15	  88535782 ns/op	151072186 B/op	   66774 allocs/op
BenchmarkCacher_GetList-16    	18	  83706540 ns/op	126228029 B/op	   63987 allocs/op
BenchmarkCacher_GetList-16    	19	 124967934 ns/op	149816737 B/op	   63252 allocs/op
PASS

# after

$ go test -bench=BenchmarkCacher_GetList  . -run=none --benchmem --count=5
goos: linux
goarch: amd64
pkg: k8s.io/apiserver/pkg/storage/cacher
cpu: AMD EPYC 7K62 48-Core Processor
BenchmarkCacher_GetList-16    	      44	  24302640 ns/op	 2528573 B/op	    5749 allocs/op
BenchmarkCacher_GetList-16    	      45	  23997310 ns/op	 2490390 B/op	    5623 allocs/op
BenchmarkCacher_GetList-16    	      46	  23521017 ns/op	 2453680 B/op	    5501 allocs/op
BenchmarkCacher_GetList-16    	      48	  23682351 ns/op	 2384704 B/op	    5273 allocs/op
BenchmarkCacher_GetList-16    	      49	  23772900 ns/op	 2352476 B/op	    5166 allocs/op
PASS

# compare
$ benchcmp before.text after.text
benchmark                      old ns/op     new ns/op     delta
BenchmarkCacher_GetList-16     137743757     24302640      -82.36%
BenchmarkCacher_GetList-16     121673849     23997310      -80.28%
BenchmarkCacher_GetList-16     88535782      23521017      -73.43%
BenchmarkCacher_GetList-16     83706540      23682351      -71.71%
BenchmarkCacher_GetList-16     124967934     23772900      -80.98%

benchmark                      old allocs     new allocs     delta
BenchmarkCacher_GetList-16     67975          5749           -91.54%
BenchmarkCacher_GetList-16     66776          5623           -91.58%
BenchmarkCacher_GetList-16     66774          5501           -91.76%
BenchmarkCacher_GetList-16     63987          5273           -91.76%
BenchmarkCacher_GetList-16     63252          5166           -91.83%

benchmark                      old bytes     new bytes     delta
BenchmarkCacher_GetList-16     129016510     2528573       -98.04%
BenchmarkCacher_GetList-16     151072125     2490390       -98.35%
BenchmarkCacher_GetList-16     151072186     2453680       -98.38%
BenchmarkCacher_GetList-16     126228029     2384704       -98.11%
BenchmarkCacher_GetList-16     149816737     2352476       -98.43%
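For anyone checking the compare tables: each delta is (new - old) / old for one pair of runs. The sketch below recomputes that for the ns/op column and also compares the means of the five runs, since the "before" timings vary quite a bit from run to run. The helper names `pctDelta` and `mean` are illustrative only and are not part of benchcmp.

```go
package main

import "fmt"

// pctDelta returns the relative change from oldV to newV in percent.
// Negative values mean the new measurement is smaller.
func pctDelta(oldV, newV float64) float64 {
	return (newV - oldV) / oldV * 100
}

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// ns/op values copied from the before/after runs above.
	before := []float64{137743757, 121673849, 88535782, 83706540, 124967934}
	after := []float64{24302640, 23997310, 23521017, 23682351, 23772900}

	// Pairwise delta, matching how the table pairs run i with run i.
	for i := range before {
		fmt.Printf("run %d: %.2f%%\n", i+1, pctDelta(before[i], after[i]))
	}
	// Delta of the means, a single summary number for the five runs.
	fmt.Printf("mean: %.2f%%\n", pctDelta(mean(before), mean(after)))
}
```

The first pair reproduces the -82.36% shown in the ns/op table.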

Does this PR introduce a user-facing change?

Kube-apiserver: Improved memory use when performing GetList on the cache.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2023-03-07T13:26:42Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

liggitt · 2023-03-07T18:23:21Z

cc @wojtek-t

wojtek-t · 2023-03-07T20:49:06Z

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_whitebox_test.go

+	pred := storage.SelectionPredicate{
+		Label: labels.Everything(),
+		Field: fields.Everything(),
+	}


This benchmark assumes that we actually return all objects.

The other extremely important use case is where most objects are filtered out (e.g. a Kubelet listing its own pods).
Can you add a second benchmark that simulates this (i.e. only, say, 50 out of those 50,000 pods are returned)?

sxllwx · 2023-03-08

Thank you for the prompt reply 😄! The use cases have been added, and I posted the latest benchmark results at #116327 (comment). PTAL.

sxllwx · 2023-03-13

Fixed, PTAL @wojtek-t
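A minimal, stdlib-only sketch of the two cases discussed in this thread: one predicate that accepts every object and one that keeps only 50 of 50,000. The `pod` type, `makePods`, and `list` are hypothetical stand-ins, not the cacher's real types and not the benchmark added in this PR, which builds a `storage.SelectionPredicate` rather than a plain function.

```go
package main

import (
	"fmt"
	"testing"
)

// pod is a hypothetical stand-in for a cached Pod. Only the field used for
// filtering is modeled.
type pod struct {
	name     string
	nodeName string
}

// makePods builds total pods; the first onNode of them are assigned to node,
// the rest to "other-node".
func makePods(total, onNode int, node string) []pod {
	pods := make([]pod, total)
	for i := range pods {
		pods[i] = pod{name: fmt.Sprintf("pod-%d", i), nodeName: "other-node"}
		if i < onNode {
			pods[i].nodeName = node
		}
	}
	return pods
}

// list scans the whole cache and returns the pods accepted by match.
func list(cache []pod, match func(pod) bool) []pod {
	var out []pod
	for _, p := range cache {
		if match(p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	cache := makePods(50000, 50, "node-a")
	cases := []struct {
		name  string
		match func(pod) bool
	}{
		{"everything", func(pod) bool { return true }},
		{"one-node", func(p pod) bool { return p.nodeName == "node-a" }},
	}
	for _, tc := range cases {
		n := len(list(cache, tc.match))
		// AllocsPerRun reports the average number of allocations per call.
		allocs := testing.AllocsPerRun(10, func() { list(cache, tc.match) })
		fmt.Printf("%s: returned %d of %d, %.0f allocs per list\n", tc.name, n, len(cache), allocs)
	}
}
```

Both cases scan the full cache; they differ only in how many objects end up in the result, which is why they stress the list path differently.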

fedebongio · 2023-03-07T21:29:14Z

/triage accepted

sxllwx · 2023-04-03T12:26:44Z

/ping @lavalamp


sxllwx · 2023-04-04T04:51:43Z

/retest

lavalamp · 2023-04-04T16:05:41Z

/lgtm
/approve

Thank you! (this should merge when we open up for 1.28)

k8s-ci-robot · 2023-04-04T16:05:53Z

LGTM label has been added.

Git tree hash: e7fd8ed2f9067a86cc64347e394289cd6ade3bc1

k8s-ci-robot · 2023-04-04T16:06:14Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp, sxllwx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~staging/src/k8s.io/apiserver/pkg/storage/OWNERS~~ [lavalamp]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sxllwx · 2023-04-05T04:35:13Z

> /lgtm
> /approve
>
> Thank you! (this should merge when we open up for 1.28)

OK, thank you for your help. It has been a pleasure to work with you.

wojtek-t · 2023-04-05T07:20:25Z

Sorry for the delay; this LGTM too.

/lgtm
/label tide/merge-method-squash

* ftr(watch-cache): add benchmarks
* ftr(kube-apiserver): faster watch-cache getlist
* refine: testcase name
* - refine var name make it easier to convey meaning
  - add comment to explain why we need to apply for a slice of runtime.Object instead of making a slice of ListObject.Items directly.

sxllwx · 2023-06-01T07:07:34Z

I want to add a release note here. I hope it's not too late.

k8s-ci-robot requested review from deads2k and enj March 7, 2023 13:27

sxllwx changed the title ~~[apiserver] Optimize watch-cache getlist~~ Optimize watch-cache getlist Mar 7, 2023

sxllwx changed the title ~~Optimize watch-cache getlist~~ optimize watch-cache getlist Mar 7, 2023

sxllwx force-pushed the optimize/watch-cache-getlist branch from 40dcba9 to ca8a72f Compare March 7, 2023 13:44

sxllwx marked this pull request as ready for review March 7, 2023 13:48

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 7, 2023

k8s-ci-robot requested review from caesarxuchao and lavalamp March 7, 2023 13:48

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 7, 2023

wojtek-t reviewed Mar 7, 2023

wojtek-t self-assigned this Mar 7, 2023

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 7, 2023

sxllwx force-pushed the optimize/watch-cache-getlist branch from ca8a72f to 3073b0c Compare March 8, 2023 03:33

k8s-ci-robot removed the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 8, 2023

lavalamp reviewed Apr 3, 2023

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher.go (outdated, resolved)

lavalamp reviewed Apr 3, 2023

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher_whitebox_test.go (outdated, resolved)

lavalamp reviewed Apr 3, 2023

staging/src/k8s.io/apiserver/pkg/storage/cacher/cacher.go (resolved)

sxllwx added 2 commits April 4, 2023 11:36

d0aa37e: refine: testcase name

44420d3:
- refine var name make it easier to convey meaning
- add comment to explain why we need to apply for a slice of runtime.Object instead of making a slice of ListObject.Items directly.

k8s-ci-robot assigned lavalamp Apr 4, 2023

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 4, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 4, 2023

k8s-ci-robot added the tide/merge-method-merge Denotes a PR that should use a standard merge by tide when it merges. label Apr 5, 2023

wojtek-t added tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. and removed tide/merge-method-merge Denotes a PR that should use a standard merge by tide when it merges. labels Apr 5, 2023

k8s-ci-robot merged commit 75f17eb into kubernetes:master Apr 11, 2023
12 checks passed

k8s-ci-robot added this to the v1.28 milestone Apr 11, 2023

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jun 1, 2023

bertinatto mentioned this pull request Jun 20, 2023

ocp next openshift/kubernetes#1612

Closed
