Pod status changes on one worker node can be leveraged to mount a flooding attack against all other nodes in the Kubernetes cluster #110596
Comments
@younaman: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
/cc @wangyysde
/sig security network
Knock knock! Are there any updates or comments?
@aojea @chrisohaver @neolit123 @pacoxu Are there any suggestions or comments about my questions? Looking forward to your reply :)
@wangyysde @aojea @chrisohaver @neolit123 @pacoxu It has been 5 days; are there any suggestions or comments about my questions? Looking forward to your reply :)
I’m not knowledgeable enough on the subject to answer.
@chrisohaver Thanks for your reply! Since you are a Kubernetes contributor, do you know someone who could give me some comments? Looking forward to your reply!
For security issues, I think you can submit them to https://hackerone.com/kubernetes/thanks?type=team. As for what we can do about this:
@pacoxu I reported a similar DoS issue to hackerone.com; however, HackerOne told me that "DoS attacks is out of scope." Are you sure that my report will not get a similar "out of scope"? By the way, thanks for your mitigation suggestions! However, I still want to know the answers to my questions:
I'm not sure. 😓
It's hard to call this an "issue". We need to be able to fail readiness on pods, and we need to be able to update the endpoints on every client (node) in a reasonably short time window. Otherwise we end up routing traffic to dead endpoints. EndpointSlice was designed to mitigate some of the impact here, but ultimately the updates must flow. kube-proxy has a rate-limited write to iptables, so it can only do that every so often (though looking at it, that default may be too low). The best we could do would be to rate-limit updates per namespace or something like that. I'll leave this open to discuss, but it's not a super compelling option.
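For illustration, a minimal Go sketch of the coalescing/holdoff pattern behind that rate-limited write. This is only the idea, not kube-proxy's actual code (which uses an internal bounded-frequency runner); all names and intervals here are invented:

```go
package main

import (
	"fmt"
	"time"
)

// minIntervalRunner coalesces bursts of change notifications and enforces
// a minimum spacing between expensive syncs (e.g. iptables rewrites).
type minIntervalRunner struct {
	minInterval time.Duration
	requests    chan struct{}
	sync        func()
}

func newMinIntervalRunner(min time.Duration, sync func()) *minIntervalRunner {
	return &minIntervalRunner{
		minInterval: min,
		requests:    make(chan struct{}, 1), // one pending slot: bursts coalesce
		sync:        sync,
	}
}

// Request asks for a sync; if one is already pending, the request is merged.
func (r *minIntervalRunner) Request() {
	select {
	case r.requests <- struct{}{}:
	default:
	}
}

func (r *minIntervalRunner) Run() {
	var last time.Time
	for range r.requests {
		if wait := r.minInterval - time.Since(last); wait > 0 {
			time.Sleep(wait) // hold off; requests arriving now keep coalescing
		}
		r.sync()
		last = time.Now()
	}
}

func main() {
	runner := newMinIntervalRunner(2*time.Second, func() {
		fmt.Println(time.Now().Format("15:04:05"), "syncing iptables rules")
	})
	go runner.Run()
	for i := 0; i < 100; i++ { // simulate a flood of endpoint updates
		runner.Request()
		time.Sleep(50 * time.Millisecond)
	}
	time.Sleep(3 * time.Second) // let the final coalesced sync run
}
```

However hard the pods flap, a loop like this rewrites iptables at most once per minInterval, which bounds the per-node CPU cost at the price of slower convergence.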
@thockin Thanks for your comments and suggestions!
Looking forward to your reply!
#110268 may help this by making each …
The problem is, I think, that it's a purely static config. We could consider something more dynamic, like the holdoff period being proportional to how long the run took. If it takes 10 seconds to sync, we should probably hold off more than if it takes 0.3 seconds. I think.

It could also be dynamic based on offender - we could consider "rare" events to be more urgent and "common" ones to be eligible for backoff. We could even get really smart and say "this particular endpoint seems to be flapping, so we leave it disabled longer".

This is a significant change - it will require local decision making and caching and expiry. It's not clear that this is justified, yet. I have not seen a lot of reports of this being a real problem in the wild. I'm in favor of considering options, but I haven't yet come up with one that is obviously right.
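A hedged sketch of what the "holdoff proportional to run time" idea could look like; the function name, the factor of 4, and the bounds are invented for illustration and are not kube-proxy values:

```go
package main

import (
	"fmt"
	"time"
)

// nextHoldoff makes the holdoff before the next sync proportional to how
// long the previous sync took, clamped to a configured range: slow syncs
// earn proportionally longer holdoffs.
func nextHoldoff(lastSync, minHold, maxHold time.Duration) time.Duration {
	h := 4 * lastSync
	if h < minHold {
		return minHold
	}
	if h > maxHold {
		return maxHold
	}
	return h
}

func main() {
	fmt.Println(nextHoldoff(300*time.Millisecond, time.Second, time.Minute)) // 1.2s
	fmt.Println(nextHoldoff(10*time.Second, time.Second, time.Minute))       // 40s
}
```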
@thockin Thanks for your reply! It is a hard choice between flexibility and security :) Besides, please note the second point in my comment: "The information channel leveraged by this flooding attack will exist forever if you need to leverage kube-proxy to sync endpoint status to other worker nodes, no matter how often the interval is." Perhaps this problem is intrinsic to the kube-proxy design, and there is no silver bullet that solves it fundamentally?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
To add: pod status update is a privileged API. Having said that, it is unlikely that a pod's status will change every second unless somebody is presenting itself to the API server with kubelet-like privileges and patching the pod object in a tight loop. The kubelet (the upstream patching side), the API server, and kube-proxy all run with rate limits that should prevent that.
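For reference, a sketch of the client-side throttling this refers to, using client-go's standard QPS/Burst settings (5 QPS with a burst of 10 are client-go's historical defaults; the kubeconfig path is hypothetical, and the API server adds its own server-side limits on top, e.g. API Priority and Fairness):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// buildThrottledConfig returns a client config with an explicit client-side
// rate limit, so even a component patching objects in a tight loop is
// throttled before its requests reach the API server.
func buildThrottledConfig(kubeconfig string) (*rest.Config, error) {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		return nil, err
	}
	cfg.QPS = 5    // steady-state requests per second
	cfg.Burst = 10 // short bursts allowed above QPS
	return cfg, nil
}

func main() {
	cfg, err := buildThrottledConfig("/path/to/kubeconfig") // hypothetical path
	if err != nil {
		panic(err)
	}
	fmt.Printf("client limited to %v QPS (burst %d)\n", cfg.QPS, cfg.Burst)
}
```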
/close
There is no evidence of a DoS attack, just a way to generate more status changes and consume more resources.
@aojea: Closing this issue. In response to this:
What happened?
In a Kubernetes cluster, each node runs the kube-proxy and kubelet components. Kube-proxy watches for service/endpoints modifications reported by the API server and updates the local iptables rules; the kubelet sends messages to the API server whenever the status of a local pod changes.
So, when the status of a pod backing a service/endpoints changes on a worker node, that node's kubelet reports the change to the kube-apiserver on the control plane. The kube-apiserver updates the status of the related endpoint in etcd and pushes the endpoint change to kube-proxy on all other nodes, each of which then modifies its local iptables rules.
As a result, a malicious user's service can repeatedly change its pods' status, trigger the kube-apiserver to push endpoint changes to all other nodes, and make processes on those nodes (iptables, Calico, etc.) consume CPU/memory resources, mounting a flooding attack.
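To make the fan-out concrete, here is a minimal client-go sketch of a watcher like the one every node's kube-proxy effectively keeps open (on current versions the data flows through EndpointSlices; the kubeconfig path is hypothetical, and kube-proxy's real code path is more involved):

```go
package main

import (
	"fmt"
	"time"

	discoveryv1 "k8s.io/api/discovery/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Hypothetical kubeconfig path; kube-proxy itself uses in-cluster config.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Every node holds a watch like this, so one pod's readiness flip is
	// pushed to every node in the cluster.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	inf := factory.Discovery().V1().EndpointSlices().Informer()
	inf.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, newObj interface{}) {
			es := newObj.(*discoveryv1.EndpointSlice)
			// In kube-proxy, an update here eventually triggers an iptables resync.
			fmt.Printf("EndpointSlice %s/%s changed (%d endpoints)\n",
				es.Namespace, es.Name, len(es.Endpoints))
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // watch until killed
}
```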
What did you expect to happen?
We have reported a similar DoS issue to Kubernetes on hackerone.com; however, HackerOne said that "all Denial-of-Service findings are out-of-scope". So I am writing up my issue on GitHub, and I want to know:
Is it a real issue?
If it is a real issue, are there any ways to defend against it fundamentally?
As far as I am concerned, the problem is intrinsic to the kube-proxy design; if it is a real issue, does Kubernetes at least plan to mitigate it?
How can we reproduce it (as minimally and precisely as possible)?
1. Deploy 10 pods, all on the younaman-thinkpad worker node, and change each pod's ready status once per second (a client-go sketch of this loop appears after these steps).
2. Deploy one Deployment with 90 pods; these 90 pods and the former 10 pods all carry the label nginx-1, so they back the same service/endpoints.
On our local testbed (1 master and 2 worker nodes, each with a 4-core CPU and 8 GB of memory), this makes the api-server/etcd/calico processes on the control-plane node consume about 60% CPU, the calico/kube-proxy/iptables processes on the other worker node consume about 20% CPU, and the incoming network bandwidth on the other worker node reach about 1 MB/s. All of this amounts to a flooding attack.
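For concreteness, a hedged client-go sketch of the readiness-flapping loop in step 1. The namespace, pod name, and kubeconfig path are invented; note that this requires RBAC rights to patch pods/status (normally held by kubelets), and the kubelet will keep rewriting the condition, which itself adds churn:

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // hypothetical path
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	ready := true
	for {
		status := "False"
		if ready {
			status = "True"
		}
		// Strategic-merge-patch the Ready condition on the status subresource;
		// PodCondition entries merge by their "type" key.
		patch := []byte(fmt.Sprintf(
			`{"status":{"conditions":[{"type":"Ready","status":%q,"lastTransitionTime":%q}]}}`,
			status, time.Now().UTC().Format(time.RFC3339)))
		if _, err := client.CoreV1().Pods("default").Patch(
			context.TODO(), "nginx-1-pod-0", // hypothetical pod name
			types.StrategicMergePatchType, patch, metav1.PatchOptions{}, "status",
		); err != nil {
			fmt.Println("patch failed:", err)
		}
		ready = !ready
		time.Sleep(time.Second) // one flip per second, as in the report
	}
}
```

Each flip marks the endpoint (un)ready, and the resulting endpoints/EndpointSlice update is fanned out to every node's kube-proxy, which is the amplification this issue describes.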
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)