Resource Management of ETCD Load #120781
/sig api-machinery
/assign @wenjiaswe
@jiahuif I don't really think this is an etcd problem - it's as efficient as it's going to get. Kubernetes needs to be responsible for not overloading it.
Agree. Interestingly, and somewhat related, @logicalhan recently proposed his extensible-etcd project: https://docs.google.com/document/d/16XEGyPBisZvmmoIHSZzv__LoyOeluC5a4x353CX0SIM/edit#bookmark=id.n170uancbaqb. It could potentially help with etcd's performance limits.
We have seen projects like this, but etcd is really very nice the way it is, and we don't have much interest in switching to something else. That is why I think a Resource Quota is an interesting idea. Or maybe something like the Kubelet being able to "queue" etcd requests to stop overloading it.
cc @serathius for thoughts
/sig etcd
If we pursued this, it would be in-tree for etcd.
Okay, so should I re-create the ticket there? I would be happy to work on etcd too. That said, I am not convinced we want to handle this in etcd itself: I don't want to hold back the creation of key-value pairs, I want to stop the key-value pairs from being sent to etcd in the first place.
This sounds like a flow control issue. Are you using https://kubernetes.io/docs/concepts/cluster-administration/flow-control/?
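For reference, API Priority and Fairness can throttle a noisy batch client before its requests reach etcd, by routing it into a limited priority level with queuing. A minimal sketch, assuming the API is at `v1beta3` and using illustrative names (`batch-limited`, `batch-jobs`, and the `batch-runner` service account are not from this issue):

```yaml
# PriorityLevelConfiguration: a limited concurrency bucket with queuing,
# so excess requests wait (or are rejected) instead of hammering etcd.
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: PriorityLevelConfiguration
metadata:
  name: batch-limited          # illustrative name
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 20
    limitResponse:
      type: Queue
      queuing:
        queues: 64
        queueLengthLimit: 50
        handSize: 6
---
# FlowSchema: route Job-creating traffic from a specific service account
# into the limited priority level above.
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: batch-jobs             # illustrative name
spec:
  priorityLevelConfiguration:
    name: batch-limited
  matchingPrecedence: 1000
  distinguisherMethod:
    type: ByUser
  rules:
    - subjects:
        - kind: ServiceAccount
          serviceAccount:
            name: batch-runner # illustrative service account
            namespace: batch
      resourceRules:
        - verbs: ["create"]
          apiGroups: ["batch"]
          resources: ["jobs"]
          namespaces: ["batch"]
```

Note this shapes load at the API server, which is upstream of etcd, matching the goal of never sending the writes in the first place.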
What would you like to be added?
There should be Resource Management for tracking etcd load. This can then be used as a Resource Quota for Jobs to ensure that they (and potentially kubelet) do not spin up Pods faster than etcd can handle.
K8s Docs:
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#extended-resources
https://kubernetes.io/docs/concepts/policy/resource-quotas/
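As a partial stopgap today, object-count quotas can already cap how many Jobs and Pods a namespace stores in etcd, even though they don't track write rate. A sketch (the `batch` namespace and the specific limits are illustrative, not from this issue):

```yaml
# ResourceQuota using object-count quota keys (count/<resource>.<group>)
# to bound the number of etcd-backed objects in one namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts   # illustrative name
  namespace: batch      # illustrative namespace
spec:
  hard:
    count/jobs.batch: "500"
    count/pods: "2000"
```

This bounds steady-state etcd object count per namespace; it does not address burst write throughput, which is what the proposed etcd-load resource would add.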
Why is this needed?
For people dealing with extremely high-throughput batch work (e.g. thousands of jobs per second, each lasting 1-2 minutes), etcd starts to become a real problem. There should be an in-Kubernetes solution to this.
Whether it is through Resource Quotas or some other mechanism, I think this is worth exploring. I am happy to add detail to this ticket as required, and to start a KEP if there is interest.
Links to back up this point: