Only create one etcd client per transport #111559
Conversation
Hi @negz. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: negz. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing /approve in a comment.
/ok-to-test Let's get a taste of the CI. Can you undraft this to run one round of tests and see the results?
Two review threads on staging/src/k8s.io/apiserver/pkg/storage/storagebackend/factory/etcd3.go (outdated, resolved).
/cc @wojtek-t
Review threads on staging/src/k8s.io/apiserver/pkg/storage/storagebackend/factory/etcd3.go (outdated, resolved) and staging/src/k8s.io/apiserver/pkg/storage/storagebackend/factory/factory.go (resolved).
Does anyone know why we create one etcd client per type?
```go
	return fmt.Sprintf("%v", tc) // gives: {[server1 server2] keyFile certFile caFile}
}
```
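The snippet above flattens the transport config into a string with `fmt.Sprintf("%v", tc)`. One reason to do that: a struct containing a slice (`ServerList`) is not comparable in Go and can't be used directly as a map key, while the formatted string can. A minimal, self-contained sketch — `transportConfig` and its fields are simplified stand-ins for the apiserver's `storagebackend.TransportConfig`, not the real type:

```go
package main

import "fmt"

// transportConfig is a simplified stand-in for the apiserver's
// storagebackend.TransportConfig (hypothetical fields for illustration).
type transportConfig struct {
	ServerList                []string
	KeyFile, CertFile, CAFile string
}

// cacheKey derives a map key from the config, mirroring the
// fmt.Sprintf("%v", tc) approach quoted above. The struct itself can't
// be a map key because its slice field makes it non-comparable.
func cacheKey(tc transportConfig) string {
	return fmt.Sprintf("%v", tc)
}

func main() {
	a := transportConfig{
		ServerList: []string{"server1", "server2"},
		KeyFile:    "keyFile",
		CertFile:   "certFile",
		CAFile:     "caFile",
	}
	b := a // an identical config produces an identical key
	fmt.Println(cacheKey(a) == cacheKey(b))
	fmt.Println(cacheKey(a)) // {[server1 server2] keyFile certFile caFile}
}
```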
Suggested change to the doc comment in etcd3.go:
```diff
-// For returns a compacted etcd v3 client for the supplied transport config. One
+// For returns a compacting etcd v3 client for the supplied transport config. One
```
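For context, the caching behaviour under discussion can be sketched as a map from a transport-config key to a single shared client. This is a hypothetical illustration of the "one client per transport" idea only — `client`, `forConfig`, and the key scheme are made up for the sketch, not the PR's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// client is a stand-in for *clientv3.Client (the real code would hold
// an etcd v3 client plus its periodic-compaction goroutine).
type client struct{ endpoints []string }

var (
	mu      sync.Mutex
	clients = map[string]*client{} // keyed by a transport-config string
)

// forConfig returns a shared client for the supplied transport key,
// creating it on first use, so N resource types over one transport
// share one client instead of holding N clients.
func forConfig(key string, endpoints []string) *client {
	mu.Lock()
	defer mu.Unlock()
	if c, ok := clients[key]; ok {
		return c
	}
	c := &client{endpoints: endpoints}
	clients[key] = c
	return c
}

func main() {
	a := forConfig("etcd-main", []string{"https://etcd-1:2379"})
	b := forConfig("etcd-main", []string{"https://etcd-1:2379"})
	fmt.Println(a == b) // same transport -> same client instance
}
```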
@wojtek-t Worth running any of the bigger scaling tests?
/test pull-kubernetes-kubemark-e2e-gce-scale
/triage accepted
Hey folks, just checking in. Should I update this PR with some new tests? I'd like to get some signal on what we'd need to feel comfortable moving forward with this.
/test pull-kubernetes-dependencies
@negz: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The lifecycle of etcd clients held by various storage implementations came up during review of #112050. I wanted to make sure that, by making the CRD etcd clients shared, we weren't going to tear down the shared etcd client when we update storage in response to a CRD update or removal, at kubernetes/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go, lines 522 to 523 in 4d2128b.
/assign
Will take a look at it.
If you still need this PR then please rebase; if not, please close it.
I wouldn't say I need this per se, but it seems like a good thing for the API server not to create potentially hundreds of superfluous etcd clients (with the associated network and memory overhead). I'll find time to rebase this week, but more broadly I'm not sure I'll have the bandwidth in the near future to proactively lobby to get this fixed, which I imagine will be needed to drive this PR to completion given it's been sitting open for months with little momentum.
I'm +1 on the goal of not creating duplicate etcd clients. I'm -1 on adding a new client cache... the ones we have created for client-go have been sort of a nightmare, and I'm not eager to repeat/expand that experience.

Since we control all construction of etcd clients (as opposed to client-go clients, which are constructed by callers from The Internet), I would expect us to be able to move the client construction earlier in the server startup so a single instance can be shared by multiple registries. @enj @ritazh, would any of the storage cleanup work you did in 1.26 around unifying encryption make it easier to ensure we construct etcd clients for a single etcd server a single time?

As an aside, I do wonder if this will change the performance characteristics of large servers with lots of requests going to parallel resources at the same time, which previously got their own connections and now would multiplex over a single (?) connection. @wojtek-t, do our scale tests exercise saturating the apiserver <-> etcd connection?
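The cache-free alternative described above — construct the client once, early in server startup, then hand the same instance to every registry — could look roughly like this. All names here are hypothetical and the wiring is heavily simplified; it's a sketch of the dependency-injection shape, not apiserver code:

```go
package main

import "fmt"

// etcdClient stands in for *clientv3.Client (hypothetical).
type etcdClient struct{ endpoints []string }

// registry stands in for a storage registry that previously built its
// own client; here it simply receives the shared one.
type registry struct {
	name   string
	client *etcdClient
}

func main() {
	// Construct the client once, early in server startup...
	shared := &etcdClient{endpoints: []string{"https://etcd-1:2379"}}

	// ...then inject the same instance into every registry. No cache
	// map is needed because all construction happens inside the server,
	// unlike client-go, where arbitrary callers construct clients.
	pods := registry{name: "pods", client: shared}
	crds := registry{name: "customresourcedefinitions", client: shared}

	fmt.Println(pods.client == crds.client) // both registries share one client
}
```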
💯% agree here, let's not make the same mistake twice.
#114458 might work.
Thanks for the direction folks. I'll close this PR and we can follow up on the issue that tracks the problem (#111622). @enj unfortunately I'm oversubscribed at the moment (and about to take some vacation) so I won't get a chance to dig into this until early January. Definitely happy to do so then. If you want to take a shot in the meantime what I'd be looking for is:
My personal motivation is driving down memory usage. Currently, when you load too many @crossplane providers (and thus too many CRDs) on an EKS or GKE cluster, the API server does not scale up elegantly. It seems to be OOM killed, and it takes some time for the control plane to recover and become available (presumably it's eventually given more memory). If using fewer etcd clients does indeed result in the kind of improvement I saw (i.e. ~30% less memory with 1,800 CRDs), that's ~30% more CRDs we can load before things go south. (CC @apelisse, who has been working to drive down API server memory and CPU usage under CRD load.)
I think the etcd client uses gRPC, and gRPC should load balance, no?
Sure, but multiplexing unrelated resources over a single connection at least opens the door to contention or capacity issues that separate connections would have side-stepped, right?
In practice I believe all connections were made to the same backend previously. I will be a little surprised if gRPC's multiplexing (over single HTTP/2 connection) is worse than the kernel/network fabric's multiplexing (over multiple connections). |
I don't think we're really saturating it. Most of the problems we were and are suffering from are at the kube-apiserver level, and the tests were focusing on that. It shouldn't be hard to slightly extend the tests to saturate it (e.g. by adding N pods running LIST calls), but we don't have that as part of the default tests. My intuition matches what @lavalamp wrote above, but we should probably test that anyway before making such a change. I will queue up taking a look at the other PR and at the quickest way of testing it.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Previously we would create one etcd client per type of custom resource, resulting in a TCP connection to the etcd cluster per type of resource, and excessive memory usage (mostly because we also created an expensive logger per client).
With ~1,900 CRDs loaded on an otherwise idle kind cluster, I'm seeing a ~35% reduction in RSS memory with this change. Garbage collection appears to be kicking in at ~5.2GiB RSS rather than ~8GiB RSS. Here's a profile (image omitted): you can see that we're using dramatically fewer etcd connections.
Which issue(s) this PR fixes:
Fixes #111622
Special notes for your reviewer:
I'm opening this as a draft to illustrate my thinking. Notably it's missing any new tests at the moment - I'd like to get some signal that this direction looks good before I spend time on that.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: