Consider providing separate etcd destination for CRDs #118858

geetasg · 2023-06-25T21:14:01Z

What would you like to be added?

As of now, the Kubernetes API server provides a mechanism to push events to a separate etcd cluster using the --etcd-servers-overrides="/events#" flag. This issue requests a similar mechanism for sending CRDs to a separate etcd cluster.
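For readers unfamiliar with the flag, the kube-apiserver documentation describes its value as a comma-separated list of overrides, each of the form group/resource#servers, where servers are semicolon-separated URLs, and notes that it applies only to resources compiled into the server binary. The following is a minimal sketch of that format, not the API server's actual parser; the function name and the etcd hostnames are hypothetical and used only for illustration.

```python
from typing import Dict, List


def parse_etcd_overrides(value: str) -> Dict[str, List[str]]:
    """Parse an --etcd-servers-overrides style value.

    Returns a mapping of "group/resource" to a list of etcd server URLs.
    Illustrative sketch only; this is not the kube-apiserver implementation.
    """
    overrides: Dict[str, List[str]] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Each override is "group/resource#servers"; the core group is the
        # empty string, which is why events appear as "/events".
        resource, sep, servers = entry.partition("#")
        urls = [s for s in servers.split(";") if s]
        if not sep or "/" not in resource or not urls:
            raise ValueError(f"malformed override: {entry!r}")
        overrides[resource] = urls
    return overrides


if __name__ == "__main__":
    # Hypothetical hostnames for a dedicated events etcd cluster.
    flag = "/events#http://etcd-events-0:2379;http://etcd-events-1:2379"
    print(parse_etcd_overrides(flag))
```

Under this format, the request in this issue would amount to accepting an entry whose group/resource names a custom resource rather than a built-in one.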

Why is this needed?

The primary motivation is to keep the main etcd cluster performant.

CRD listing - Some workloads use their CRDs for events (for example, Argo). These events cause issues similar to those of Kubernetes native events: they have spiky writes and they receive frequent LIST calls, typically from monitoring tools. The motivation for moving them out of the main etcd cluster is the same as the reasoning for moving Kubernetes events out of the main etcd cluster.
CRD count - Some CRDs produce millions of objects and affect the performance of the main etcd cluster.

geetasg · 2023-06-25T21:14:40Z

similar to #4432

aojea · 2023-06-26T11:11:11Z

/sig api-machinery
/cc @wojtek-t @sttts

sttts · 2023-06-26T12:41:02Z

--etcd-servers-overrides could certainly be extended to cover CRDs too. I remember that there was such a discussion in the past. If I remember right, the only reason against it was that we are not really confident --etcd-servers-overrides is the right long-term solution. But that discussion was years ago. I don't think there has been much progress on finding a more abstract way to configure storage.

alexzielenski · 2023-06-27T19:59:59Z

/triage accepted

geetasg · 2023-07-05T21:36:24Z

@sttts Can you please comment on what the best way forward is here? I can start investigating an implementation of the etcd cluster override for CRDs, but I would like to verify that it is aligned with the long-term direction. /cc @serathius

sttts · 2023-07-06T12:08:04Z

Formally, SIG API Machinery is responsible for this topic. The SIG meeting every second Wednesday might be a good place to bring it up. There is an agenda document. Just put it on there. cc @fedebongio

geetasg · 2023-07-19T16:57:37Z

Thanks @sttts. I will attend the next meeting to discuss this with the community.

wenjiaswe · 2023-08-02T06:24:28Z

cc @jpbetz

jberkus · 2023-08-24T18:06:40Z

This is a good idea, except that it would also need to include a way to deploy the 2nd etcd cluster. Would we adopt a standard operator?

jmhbnz · 2023-08-24T18:28:08Z

The proliferation of CRDs, along with the behavior of their controllers, is definitely causing scalability ceilings for single clusters.

This idea sounds helpful, and it would cover one part of the equation: prioritising and maintaining availability of the core etcd cluster while providing additional capacity. The other side of the equation we need to address is mitigating the API server's massive memory growth and spikes when dealing with vast numbers of objects.

jpbetz · 2023-08-24T18:37:50Z

Binary protocols for CRD (@benluddy is planning to submit a KEP for 1.29) should help a lot with CRD scalability. I'd love to see what scale limits clusters with lots of CRDs hit after that is available.

I'm also very curious what limit clusters are hitting. Is it apiserver CPU? etcd CPU or storage space? Depending on the limit hit, a separate etcd may or may not help.

liangyuanpeng · 2024-01-16T05:41:37Z

> Binary protocols for CRD (@benluddy is planning to submit a KEP for 1.29) should help a lot with CRD scalability

This is https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4222-cbor-serializer

geetasg added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 25, 2023

k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 25, 2023

k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 26, 2023

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 27, 2023

liangyuanpeng mentioned this issue Jan 16, 2024

Reduce overhead of work objects for numerous member clusters scenario karmada-io/karmada#4549

Open
