[prototype] Make apiserver admission plugins retrieve k8s objects dir… #121979

Closed
wants to merge 1 commit

Conversation

linxiulei
Contributor

Make apiserver admission plugins retrieve k8s objects directly from its storage

Before this commit, admission plugins use informers to sync k8s objects, and as a result there is an additional copy of each k8s object in kube-apiserver. This commit allows referencing k8s objects from storage without actually copying them, which significantly reduces memory usage in both serving and storage.

Also observed a visible reduction in CPU usage during kube-apiserver startup, since the admission plugins no longer need to fetch objects from the HTTP endpoints and kube-apiserver no longer needs to serve those requests.

Memory measurements and comparisons

```
# create about 9k pods
kubectl get --raw /metrics | grep go_memstats_alloc_bytes
```

Origin:

Peak memory alloc: go_memstats_alloc_bytes 7.16614552e+08
Stabilized memory alloc: go_memstats_alloc_bytes 4.2804752e+08

Optimized:

Peak memory alloc: go_memstats_alloc_bytes 3.46012872e+08
Stabilized memory alloc: go_memstats_alloc_bytes 2.82520896e+08

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This commit allows referencing k8s objects from storage without actually copying them, which significantly reduces memory and CPU usage in both serving and storage.

Which issue(s) this PR fixes:

Fixes #121657

Special notes for your reviewer:

The current PR contains only minimal changes, for easier review and discussion.

The implementation contains three parts:

  1. Populating the restStorage object in the kube-apiserver initialization code, which is needed for retrieving k8s objects
  2. Creating a StorageInformers factory, reusing the existing code generation for staging/src/k8s.io/client-go/informers/factory.go, that is initialized with restStorage instead of a Kubernetes client. The new factory exposes exactly the same interfaces as the current informer factory
  3. Adding an interface that allows apiserver admission plugins to be initialized with informers created by StorageInformers

As for plugin migration, the changes should be minimal: adding an interface and replacing the informers with the ones created by StorageInformers, as sketched below. We will likely migrate all plugins to the new informers for lower overhead. For a safe rollout, we could guard the change with a feature gate.
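For illustration only, a minimal sketch of what that plugin-side interface could look like, modeled on the existing WantsExternalKubeInformerFactory pattern; the names StorageInformerFactory and WantsStorageInformerFactory are hypothetical and not taken from this PR:

```go
// Hypothetical sketch only: StorageInformerFactory and
// WantsStorageInformerFactory are illustrative names, not code from this PR.
package initializer

import (
	"k8s.io/apiserver/pkg/admission"
	"k8s.io/client-go/informers"
)

// StorageInformerFactory would expose the same interface as the generated
// SharedInformerFactory, but would be backed by the apiserver's REST storage
// rather than an HTTP client.
type StorageInformerFactory interface {
	informers.SharedInformerFactory
}

// WantsStorageInformerFactory would be implemented by admission plugins that
// can consume storage-backed informers, analogous to the existing
// WantsExternalKubeInformerFactory interface.
type WantsStorageInformerFactory interface {
	SetStorageInformerFactory(StorageInformerFactory)
	admission.InitializationValidator
}
```

Plugin migration would then amount to implementing the extra setter and using whichever informer factory the initializer provides.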

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

@k8s-ci-robot
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected; please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 21, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Nov 21, 2023
@linxiulei linxiulei marked this pull request as draft November 21, 2023 11:59
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 21, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: linxiulei
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 21, 2023
@linxiulei
Contributor Author

/cc @deads2k @jpbetz @liggitt @wojtek-t

@jpbetz
Contributor

jpbetz commented Nov 21, 2023

Thanks for prototyping this @linxiulei!

Have we considered an alternative approach where, instead of introducing an alternate implementation of informers, we implement an alternate implementation of the client? I'm imagining an "internal client" that is optimized to minimize copies. This client would probably only be optimized for reads, since writes MUST go through the admission chain. Would this be viable? Would it eliminate the need to generate alternate informers?

It also just occurred to me that bypassing the normal client flow also bypasses auth..

@linxiulei
Contributor Author

re: internal client, I am not sure I understand it correctly, but if the internal client doesn't have the same API as the informers, then it would be a larger effort to migrate the code in all those admission plugins (30+ plugins, although not all use informers) to the new client. It would be even more non-trivial to keep both the new client and the current design if we need to handle safe fallback for special cases.

This was the major reason I wanted to implement the informer API. Also, please correct me if I'm wrong, but I believe admission plugins don't write.

re: bypassing auth. Yeah, it bypasses the HTTP interface and therefore everything attached to it. I guess admission plugins don't need it in common cases.

@jpbetz
Contributor

jpbetz commented Nov 21, 2023

re: internal client, I am not sure I understand it correctly, but if the internal client doesn't have the same API as the informers, then it would be a larger effort to migrate the code in all those admission plugins (30+ plugins, although not all use informers) to the new client. It would be even more non-trivial to keep both the new client and the current design if we need to handle safe fallback for special cases.

My thought was along the lines of: What if we could use the informers exactly as they exist today, but instead of wiring in the normal client, we somehow wire in a client (implementation of k8s.io/client-go/kubernetes.Interface) that retrieves data directly from storage. Then, if we use that client here to perform reads directly from storage, would this optimization "just work"?
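For illustration only, a rough sketch of that wiring, assuming a hypothetical newStorageBackedClient that implements k8s.io/client-go/kubernetes.Interface with reads served from storage; the generated informers are reused unchanged:

```go
// Rough sketch of the idea above. newStorageBackedClient is hypothetical: it
// would implement kubernetes.Interface with Get/List/Watch served from the
// apiserver's storage and writes delegated to a normal client.
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

func newStorageBackedClient() kubernetes.Interface {
	// Illustrative placeholder only; not implemented here.
	return nil
}

func main() {
	client := newStorageBackedClient()

	// The stock factory only needs a kubernetes.Interface, so the generated
	// informers and listers would work unchanged on top of the new client.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	_ = factory.Core().V1().Pods().Lister()
}
```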

This was the major reason I wanted to implement the informer API. Also, please correct me if I'm wrong, but I believe admission plugins don't write.

Yes, that's right. I brought up writes because, if we reimplement k8s.io/client-go/kubernetes.Interface, we'd have to decide what the implementation of writes does. I think I'd be in favor of having writes delegate to the normal client.

re: bypassing auth. Yeah, it bypasses the HTTP interface and therefore everything attached to it. I guess admission plugins don't need it in common cases.

This makes me uncomfortable. If we do this, I'd want to keep this optimization as narrow as possible. It might make my client idea useless, since we'd be handling just a very few special cases. Do we know which admission plugins benefit most from this optimization? Everything that touches pod?

@wojtek-t
Member

re: bypassing auth. Yeah, it bypasses the HTTP interface and therefore everything attached to it. I guess admission plugins don't need it in common cases.

This makes me uncomfortable. If we do this, I'd want to keep this optimization as narrow as possible. It might make my client idea useless, since we'd be handling just a very few special cases. Do we know which admission plugins benefit most from this optimization? Everything that touches pod?

This makes me uncomfortable too, because pretty much everything we're doing in the handler chain now no longer works. Auth seems to be the most serious example (@liggitt @enj - please chime in), but there are other consequences too, e.g. auditing, APF (rate-limiting), timeout handling, the whole observability stack (logging/monitoring), etc.

I understand the motivation and I admit that the savings are pretty significant, but I'm wondering how we can better ensure safety. Can we somehow hack around it differently to preserve the whole handler chain? Without putting deeper thought into it, what if the model were something like:

  • we create a simple wrapper around rest storage that, instead of returning the data, puts it into the context under some well-known key
  • we wrap this with the whole set of existing handlers from the handler chain
  • we implement only the ListWatch interface for that, and the implementation uses the above and gets the data from the context...

I admit it sounds much more hacky, but at least it seems safer to me. That's just brainstorming, though - I would like to hear better options...
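For illustration only, a rough sketch of the ListWatch piece of that idea, assuming hypothetical listThroughHandlerChain/watchThroughHandlerChain helpers that run the request through the full handler chain in-process and pull the result back out of the context:

```go
// Rough sketch of the brainstormed idea above: an in-process ListerWatcher
// whose List and Watch calls are served by a wrapped rest storage plus the
// existing handler chain (auth, audit, APF, ...). listThroughHandlerChain and
// watchThroughHandlerChain are hypothetical names for that wrapped path.
package storageinformer

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/tools/cache"
)

func listThroughHandlerChain(resource string, opts metav1.ListOptions) (runtime.Object, error) {
	panic("illustrative only: not implemented")
}

func watchThroughHandlerChain(resource string, opts metav1.ListOptions) (watch.Interface, error) {
	panic("illustrative only: not implemented")
}

// NewInProcessListWatch plugs into the standard reflector/informer machinery,
// which only needs a cache.ListerWatcher.
func NewInProcessListWatch(resource string) *cache.ListWatch {
	return &cache.ListWatch{
		ListFunc: func(opts metav1.ListOptions) (runtime.Object, error) {
			return listThroughHandlerChain(resource, opts)
		},
		WatchFunc: func(opts metav1.ListOptions) (watch.Interface, error) {
			return watchThroughHandlerChain(resource, opts)
		},
	}
}
```

The reflector/informer machinery would consume this like any other ListerWatcher.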

@wojtek-t wojtek-t self-assigned this Nov 22, 2023
@jpbetz
Contributor

jpbetz commented Nov 22, 2023

This makes me uncomfortable too, because pretty much everything we're doing in the handler chain now no longer works.

I was thinking about this last night and was finding it very difficult to reason through all the implications of skipping the handler chain. I agree we just shouldn't do that.

  • we create a simple wrapper around rest storage that, instead of returning the data, puts it into the context under some well-known key
  • we wrap this with the whole set of existing handlers from the handler chain
  • we implement only the ListWatch interface for that, and the implementation uses the above and gets the data from the context...

Do we still end up losing any defaulting? I'm worried there might be some old objects in etcd that need defaulting and would break the admission chain when it doesn't happen.

@enj
Member

enj commented Nov 22, 2023

I haven't looked at the code, but we used to have tons of direct etcd calls back in the etcd2 days in OpenShift. It was fast, but keeping the behavior consistent between the REST API and direct-to-storage paths was near impossible. Any changes we make for performance should keep the code paths for all calls as identical as possible. Audit, auth and similar layers should not be bypassed. I think we do some of the conversion and defaulting in the REST layers, so that would need to be preserved.

cc @deads2k @soltysh

@enj
Member

enj commented Nov 22, 2023

Also, don't we need a copy of one of everything for the watch cache anyway? Can we not rely on that instead?

Nevermind, I think this has the same issues if used directly without the REST layer.

@logicalhan
Member

Audit, auth and similar layers should not be bypassed. I think we do some of the conversion and defaulting in the REST layers so that would need to be preserved.

+1000 (skipping those handlers seems pretty sketchy to me)

@logicalhan
Member

/hold

I think we need sig-auth to really scrutinize this.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 22, 2023
@linxiulei
Contributor Author

Everything that touches pod?

Yes, pod is the biggest factor in memory usage for most use cases.

Apologies in advance for asking more questions rather than addressing comments.

re: auth, the current kubeClientConfig for apiserver admission plugins uses either an ephemeral token created at initialization time or a self-signed cert; does this make any difference?

re: handler chain, this only does READ requests from restStorage and no writes; I wonder whether any handlers in the chain do mutations on READ requests?

re: audit, is it by design that we want to audit the apiserver's own READ requests?

@wojtek-t
Member

I think we do some of the conversion and defaulting in the REST layers so that would need to be preserved.

I think conversion isn't a problem here, but yeah, I forgot about defaulting, which isn't part of the chain itself...
So my idea doesn't address it, or at least not in the obvious way I had hoped.

@wojtek-t
Member

I think at this point it seems that, if it's even possible at all, this would require a KEP.
But I wouldn't recommend writing a KEP without first having a discussion with the people on this PR...

@linxiulei
Contributor Author

I think we do some of the conversion and defaulting in the REST layers so that would need to be preserved.

@enj https://github.com/kubernetes/kubernetes/pull/121979/files#diff-05cd394e18da9e91dbf8b3d82e1e0813be670ee37db43ca5af6f91a39f78d08fR704 does reference the REST object, which performs defaulting in List()/Get(), so this should be transparently preserved.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 21, 2024
@linxiulei
Contributor Author

Update after bringing this to sig-auth (please see the meeting notes).

Relevant to

/hold

I think we need sig-auth to really scrutinize this.

  • Auth is not a big risk here, given that the current implementation uses a hardcoded method just to let admission plugins fetch k8s objects
  • Audit might be a problem, but given that most objects are fetched via watches, the auditing is skipped

We also discussed major risks on which we did not reach consensus, because the overall risk is high:

  • There is no mechanism to thoroughly test that the objects returned from the new code path are identical to those returned by informers, which are well tested across k8s use cases; although for now, the objects returned under the proposal should theoretically be the same.
  • With the proposed design, potential modifications made by the clients (i.e. admission plugins) would have a larger blast radius, since those modifications would be applied to the apiserver REST storage, essentially changing the k8s objects and resulting in incorrect objects.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 6, 2024
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Development

Successfully merging this pull request may close these issues.

apiserver admission plugins should zero-copy k8s objects