Consider providing separate etcd destination for CRDs #118858
Comments
similar to #4432
/triage accepted
@sttts Can you please comment on what is the best way forward here? I can start investigating an implementation of an etcd cluster override for CRDs, but I would like to verify that it is aligned with the long-term direction. /cc @serathius
Formally, SIG API Machinery is responsible for this topic. The SIG meeting, held every second Wednesday, might be a good place to bring it up. There is an agenda document; just put it on there. cc @fedebongio
Thanks @sttts. I will attend the next meeting to discuss this with the community.
cc @jpbetz
This is a good idea, except that it would also need to include a way to deploy the second etcd cluster. Would we adopt a standard operator?
Proliferation of CRDs, along with the behavior of their controllers, is definitely causing scalability ceilings for single clusters. This idea sounds helpful and would cover one part of the equation: prioritising and maintaining the availability of the core etcd cluster while providing additional capacity. The other side of the equation we need to address is ensuring that massive API server memory growth and spikes can also be mitigated when dealing with vast numbers of objects.
Binary protocols for CRDs (@benluddy is planning to submit a KEP for 1.29) should help a lot with CRD scalability. I'd love to see what scale limits clusters with lots of CRDs hit after that is available. I'm also very curious what limit clusters are hitting today. Is it apiserver CPU? etcd CPU or storage space? Depending on the limit hit, a separate etcd may or may not help.
This is https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4222-cbor-serializer |
What would you like to be added?
As of now, the Kubernetes API server provides a mechanism to push events to a separate etcd cluster using the --etcd-servers-overrides="/events#<etcd-servers>" flag. This issue requests a similar mechanism for sending custom resources to a separate etcd cluster.
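For reference, the existing override is configured on the kube-apiserver command line with the `<group>/<resource>#<etcd-servers>` syntax. The first invocation below shows the real events override; the second is a purely hypothetical sketch of what a CRD override might look like under the same syntax (the `argoproj.io/workflows` resource and the flag value routing a custom resource are illustrative assumptions, not an existing capability):

```shell
# Existing behavior: route core-group Event objects to a dedicated etcd cluster.
# Override format: <group>/<resource>#<comma-separated etcd client URLs>.
kube-apiserver \
  --etcd-servers=https://etcd-main-0:2379,https://etcd-main-1:2379 \
  --etcd-servers-overrides=/events#https://etcd-events-0:2379

# Hypothetical (NOT implemented today): the same flag syntax applied to a
# custom resource, here an example Argo Workflow CRD, routed to its own etcd.
kube-apiserver \
  --etcd-servers=https://etcd-main-0:2379,https://etcd-main-1:2379 \
  --etcd-servers-overrides=argoproj.io/workflows#https://etcd-crd-0:2379
```

Whether a per-CRD override like this is the right shape (versus, say, a wildcard for all custom resources) would be part of the design discussion.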
Why is this needed?
Primary motivation is to keep the main etcd cluster performant.
CRD listing - Some workloads use their CRDs for events (for example, Argo). These custom events cause issues similar to Kubernetes-native events: they have spiky writes and they receive constant LIST calls, typically from monitoring tools. The motivation for moving them out of the main etcd cluster is the same as the reasoning for moving Kubernetes events out.
CRD count - Some CRDs produce millions of objects, which affects the performance of the main etcd cluster.