[prototype] Make apiserver admission plugins retrieve k8s objects directly from its storage #121979

Conversation

Before this commit, admission plugins use informers to sync k8s objects and, as a result, there is a copy of each k8s object in kube-apiserver. This commit allows referencing k8s objects from storage without actually copying them, which significantly reduces memory usage in both serving and storage. A visible reduction in CPU usage during kube-apiserver startup was also observed, as the admission plugins no longer need to fetch objects from the HTTP endpoints, nor does kube-apiserver need to serve them.

Memory measurements and comparisons:

```
kubectl get --raw /metrics | grep go_memstats_alloc_bytes
```

Origin:
Peak memory alloc: go_memstats_alloc_bytes 7.16614552e+08
Stabilized memory alloc: go_memstats_alloc_bytes 4.2804752e+08

Optimized:
Peak memory alloc: go_memstats_alloc_bytes 3.46012872e+08
Stabilized memory alloc: go_memstats_alloc_bytes 2.82520896e+08
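The core idea (serving informer-style reads straight from the apiserver's own storage instead of a second cache synced over HTTP) can be sketched in miniature. All type and interface names below are hypothetical simplifications, not the PR's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// Object stands in for a Kubernetes runtime.Object (hypothetical simplification).
type Object struct {
	Name string
	Spec string
}

// Getter mirrors the read side of the apiserver's REST storage.
type Getter interface {
	Get(name string) (*Object, error)
}

// memoryStorage plays the role of the storage layer (e.g. the watch cache).
type memoryStorage struct {
	mu      sync.RWMutex
	objects map[string]*Object
}

func (s *memoryStorage) Get(name string) (*Object, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	obj, ok := s.objects[name]
	if !ok {
		return nil, fmt.Errorf("object %q not found", name)
	}
	// Hand back the stored pointer directly: no deep copy is made.
	return obj, nil
}

// storageLister satisfies an informer-lister-style Get by delegating to
// storage instead of maintaining a second, informer-owned cache.
type storageLister struct {
	storage Getter
}

func (l *storageLister) Get(name string) (*Object, error) {
	return l.storage.Get(name)
}

// sharesStorageCopy reports whether the lister returns the exact instance
// held by storage, i.e. no duplicate copy exists in memory.
func sharesStorageCopy() bool {
	store := &memoryStorage{objects: map[string]*Object{
		"pod-a": {Name: "pod-a", Spec: "nginx"},
	}}
	lister := &storageLister{storage: store}

	fromLister, _ := lister.Get("pod-a")
	fromStore, _ := store.Get("pod-a")
	return fromLister == fromStore
}

func main() {
	fmt.Println(sharesStorageCopy()) // prints "true": one object, one copy
}
```

The memory savings in the measurements above come from exactly this property: the informer no longer holds its own copy of every object it watches.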
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected; please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate triage label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: linxiulei The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Thanks for prototyping this @linxiulei! Have we considered an alternative approach where, instead of introducing an alternate implementation of informers, we implement an alternate implementation of the client? I'm imagining an "internal client" that is optimized to minimize copies. This client would probably only be optimized for reads, since writes MUST go through the admission chain. Would this be viable? Would it eliminate the need to generate alternate informers? It also just occurred to me that bypassing the normal client flow also bypasses auth.
re: internal client, I am not too sure if I understand it correctly, but if the internal client doesn't have the same API as the informers have, then it will be a larger effort to migrate the code in all those admission plugins (30+ plugins, although not all use informers) to use the new client. It'd be even more non-trivial to keep both the new client and the current design if we need to handle safe fallback for special cases. This was the major reason I wanted to implement the informer API. Also, please correct me if I'm wrong, but I believe admission plugins don't write.

re: bypassing auth. Yeah, it bypasses the HTTP interface and so every function related to it. I guess admission plugins don't need it in common cases.
My thought was along the lines of: What if we could use the informers exactly as they exist today, but instead of wiring in the normal client, we somehow wire in a client (implementation of
Yes, that's right. I brought up writes because, if we reimplement
This makes me uncomfortable. If we do this, I'd want to keep this optimization as narrow as possible. It might make my client idea useless, since we'd be handling just a very few special cases. Do we know which admission plugins benefit most from this optimization? Everything that touches pods?
This makes me uncomfortable too, because pretty much everything we're doing in the handler chain now no longer works. Auth seems to be the most serious example (@liggitt @enj - please chime in), but there are other consequences too, e.g. auditing, APF (rate-limiting), timeout handling, the whole observability stack (logging/monitoring), etc. I understand the motivation and I admit that the savings are pretty significant, but I'm wondering how we can better ensure safety. Can we somehow hack around it differently to preserve the whole handler chain? Without putting deeper thought into it, what if the model were like:
I admit it sounds much more hacky, but at least it seems safer to me. But that's just brainstorming - I would like to hear better options...
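The "preserve the whole handler chain" direction discussed above can be sketched as an in-process loopback transport: requests still traverse every filter (auth, audit, APF, ...), but never leave the process. This is a generic net/http illustration with made-up names, not the PR's code:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// loopbackTransport serves each request by invoking the server's full
// handler chain in-process instead of opening a real connection.
type loopbackTransport struct {
	handler http.Handler
}

func (t *loopbackTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	rec := httptest.NewRecorder()
	t.handler.ServeHTTP(rec, req) // every middleware in the chain still runs
	return rec.Result(), nil
}

// auditMiddleware stands in for one link of the filter chain (e.g. audit).
func auditMiddleware(log *[]string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		*log = append(*log, r.Method+" "+r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

// fetchThroughChain issues a read through the loopback client and returns
// the response body plus the number of audit entries recorded.
func fetchThroughChain() (string, int) {
	var auditLog []string
	final := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "pod-list")
	})
	chain := auditMiddleware(&auditLog, final)

	client := &http.Client{Transport: &loopbackTransport{handler: chain}}
	resp, err := client.Get("http://apiserver.internal/api/v1/pods")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return string(body), len(auditLog)
}

func main() {
	body, audits := fetchThroughChain()
	fmt.Println(body, audits) // the read succeeded and the audit layer saw it
}
```

This shape keeps the safety properties of the chain at the cost of still paying for serialization, which is part of the tension debated in this thread.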
I was thinking about this last night and was finding it very difficult to reason through all the implications of skipping the handler chain. I agree we just shouldn't do that.
Do we still end up losing any defaulting? I'm worried there might be some old objects in etcd that need defaulting and would break the admission chain when it doesn't happen.
I haven't looked at the code, but we used to have tons of direct etcd calls back in the etcd2 days in OpenShift. It was fast, but keeping the behavior consistent between the REST API and direct-to-storage paths was near impossible. Any changes we make for performance should keep the code paths for all calls as identical as possible. Audit, auth, and similar layers should not be bypassed. I think we do some of the conversion and defaulting in the REST layers, so that would need to be preserved.
Nevermind, I think this has the same issues if used directly without the REST layer.

+1000 (skipping those handlers seems pretty sketch to me)

/hold
I think we need sig-auth to really scrutinize this.
Yes, pod is the biggest factor in memory usage for most use cases. Apologies in advance for asking more questions rather than addressing comments.

re: auth, the current

re: handler chain, this only does READ requests from restStorage and no writes. I wonder if there are handler chains doing mutations in READ requests?

re: audit, was it by design that we want to audit self READ requests?
I think conversion isn't a problem here, but yeah, I forgot about defaulting, which isn't part of the chain itself...
I think at this point it seems that if it's even possible, this would require a KEP. |
@enj https://github.com/kubernetes/kubernetes/pull/121979/files#diff-05cd394e18da9e91dbf8b3d82e1e0813be670ee37db43ca5af6f91a39f78d08fR704 does reference the REST object, which has the defaulting in List()/Get(). So this should have been transparently preserved.
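The defaulting concern can be made concrete with a minimal sketch: as long as reads go through a REST-layer Get() that applies defaults before returning (rather than hitting raw storage), old objects persisted without newer fields still come back fully defaulted. All types and names below are hypothetical simplifications:

```go
package main

import "fmt"

// Pod is a hypothetical simplified object; RestartPolicy may be empty in
// storage for objects written before the field (or its default) existed.
type Pod struct {
	Name          string
	RestartPolicy string
}

// rawStorage returns objects exactly as persisted, without defaulting.
type rawStorage struct {
	objects map[string]Pod
}

func (s *rawStorage) Get(name string) (Pod, bool) {
	p, ok := s.objects[name]
	return p, ok
}

// restGetter mimics the REST layer's Get(): it wraps raw storage and fills
// in unset fields before returning, so callers never see a
// half-initialized object.
type restGetter struct {
	storage *rawStorage
}

func (g *restGetter) Get(name string) (Pod, bool) {
	p, ok := g.storage.Get(name)
	if !ok {
		return Pod{}, false
	}
	// Defaulting happens on the value returned to the caller; the
	// persisted record is left untouched.
	if p.RestartPolicy == "" {
		p.RestartPolicy = "Always"
	}
	return p, true
}

func main() {
	store := &rawStorage{objects: map[string]Pod{
		"old-pod": {Name: "old-pod"}, // persisted without RestartPolicy
	}}
	rest := &restGetter{storage: store}

	p, _ := rest.Get("old-pod")
	fmt.Println(p.RestartPolicy) // "Always": defaulting preserved on reads
}
```

A path that bypassed this wrapper and read raw storage directly would hand admission plugins the undefaulted object, which is exactly the failure mode raised earlier in the thread.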
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Update after bringing this to sig-auth (please see the meeting notes). Relevant to
Also discussed major risks; consensus had not been reached because the overall risk is high.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This commit allows referencing k8s objects from storage without actually copying them, which reduces significant memory and CPU usage in both serving and storage.
Which issue(s) this PR fixes:
Fixes #121657
Special notes for your reviewer:
The current PR only has minimal changes, for easier review and discussion.
The implementation contains three parts:

1. Expose the `restStorage` object from the kube-apiserver initialization code, which is needed for retrieving k8s objects.
2. A `StorageInformers` factory, reusing the current code-autogen of `staging/src/k8s.io/client-go/informers/factory.go`, to automatically generate informers that take `restStorage` instead of a k8s client to initialize. This new factory will have exactly the same interfaces as the current informer factory.
3. Plugin migration: the migration code should be as minimal as just adding an interface and replacing informers with the ones created by `StorageInformers`. Likely, we will migrate all plugins to use the new informers for lower overheads. To roll out safely, we could use a feature gate to guard the change.

Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
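The migration story hinges on the `StorageInformers` factory exposing exactly the same interfaces as the current informer factory, so a plugin migrates by swapping which factory is wired in. A toy Go sketch of that shape, with all names as hypothetical stand-ins for the generated code:

```go
package main

import "fmt"

// PodLister is the informer-style read interface admission plugins depend on.
type PodLister interface {
	Get(name string) (string, error)
}

// InformerFactory is the shared factory surface both implementations expose.
type InformerFactory interface {
	PodLister() PodLister
}

// clientBackedLister models today's informers: a local cache synced via a client.
type clientBackedLister struct{ cache map[string]string }

func (l *clientBackedLister) Get(name string) (string, error) {
	spec, ok := l.cache[name]
	if !ok {
		return "", fmt.Errorf("pod %q not found", name)
	}
	return spec, nil
}

type clientFactory struct{ cache map[string]string }

func (f *clientFactory) PodLister() PodLister {
	return &clientBackedLister{cache: f.cache}
}

// storageBackedLister models the proposed storage-backed informers: reads
// are served straight from storage, with no informer-owned cache.
type storageBackedLister struct{ storage map[string]string }

func (l *storageBackedLister) Get(name string) (string, error) {
	spec, ok := l.storage[name]
	if !ok {
		return "", fmt.Errorf("pod %q not found", name)
	}
	return spec, nil
}

type storageFactory struct{ storage map[string]string }

func (f *storageFactory) PodLister() PodLister {
	return &storageBackedLister{storage: f.storage}
}

// admissionPlugin only depends on the shared interface, so "migration" is
// purely a wiring change, and a feature gate could pick the factory.
func admissionPlugin(f InformerFactory) (string, error) {
	return f.PodLister().Get("pod-a")
}

func main() {
	data := map[string]string{"pod-a": "nginx"}

	fromClient, _ := admissionPlugin(&clientFactory{cache: data})
	fromStorage, _ := admissionPlugin(&storageFactory{storage: data})
	fmt.Println(fromClient == fromStorage) // same answer from either factory
}
```

Because both factories satisfy the same interface, a feature gate (as suggested in the rollout note above) can select between them without touching plugin logic.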