Figure out what to do about external IPs #97110

tallclair · 2020-12-07T17:47:04Z

For CVE-2020-8554, #97076, we decided we couldn't patch it in-tree, and instead provided workarounds to disable (or allowlist) the feature through admission controls. Now that the issue is public, I'd like to open the conversation about a long-term fix.

Unless we missed something (entirely possible), I see a few possible paths forward:

Decide to stop supporting external IPs, deprecate and eventually remove the feature
Attempt to redesign the external IP feature, deprecate the old behavior, and manage the migration to the new feature
Accept the current state, and promote the externalip-webhook to a built-in admission controller
Accept the current state, and support the externalip-webhook as a long-term solution

/sig network architecture
/area security
/priority important-soon

k8s-ci-robot · 2020-12-07T17:47:11Z

@tallclair: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

PushkarJ · 2020-12-07T19:10:35Z

@tallclair : Thanks for sharing the details. Commenting here, since the original issue is closed. This maybe a novice question, but does default deny CNI network policy and / or Mutual Auth (e.g. mTLS) act as a mitigating control?

chadswen · 2020-12-07T19:44:58Z

Does this have the same root cause as the issue discussed in #22650 and fixed in openshift with an admission controller? (openshift issue: openshift/origin#7808 and PR: openshift/origin#7810) There is some existing discussion in the k/k thread if the added context helps form a strategy here.

I first encountered the issue a couple years ago when a tenant accidentally used an external IP incorrectly in one of our clusters, which I discovered while debugging an unreachable etcd node. The external IP Service was specifying the same IP as an etcd node's host IP, and that prevented all nodes running kube-proxy from reaching the etcd node since traffic was routed to the Service instead. Luckily we had other etcd nodes in the cluster and by debugging kube-proxy logs along with ipvsadm it didn't take too long to find the external IP as the root cause and shut it down.

I tried to report the issue via HackerOne several months ago, but I was never able to get the steps to reproduce working consistently in all clusters to meet the HackerOne report requirements, which may be similar to @champtar's experience in this comment. The OPA Gatekeeper based solution is similar to the workaround I came to as well, but it would be great to fix it upstream.

I'm sure there are users that depend on external IP, but for what it's worth I have rarely seen it used correctly.

bowei · 2020-12-07T22:00:04Z

/cc @bowei

tallclair · 2020-12-07T22:50:35Z

Thanks for the additional context @chadswen, this does in fact look like a duplicate of #22650. Since the conversation on that issue didn't go anywhere, hopefully the added visibility of having a CVE associated will help drive progress.

champtar · 2020-12-08T03:00:33Z

@chadswen thanks for digging those old tickets, now I'm not sure I even searched for related open issues before reporting ...
Also I have a similar story, crashed a cluster to find this bug :)

Here is my write up with 18 tests on 3 clusters: https://blog.champtar.fr/K8S_MITM_LoadBalancer_ExternalIPs/
In those tests (that are a bit old now) kube-proxy IPVS is the most affected, and patching the LoadBalancer is the most reliable.

thockin · 2020-12-08T05:52:53Z

#97076 as a proposal for k/k

champtar · 2020-12-08T12:35:15Z

I never understood why the data plane is different between LoadBalancer and ExternalIPs.
That being said, you can completely remove ExternalIP and replace it by a small static LoadBalancer controller with a simple mapping as config namespace/servicename -> ip, or have people run MetalLB controller without MetalLB speaker.

thockin · 2020-12-08T16:06:19Z

you can completely remove ExternalIP and replace it by a small static

LoadBalancer controller A user could. We can't unilaterally rip out a feature. We don't want to pull the rug out from underneath the (relatively small) number of legitimate users.

…

On Tue, Dec 8, 2020 at 4:35 AM Etienne Champetier ***@***.***> wrote: I never understood why the data plane is different between LoadBalancer and ExternalIPs. That being said, you can completely remove ExternalIP and replace it by a small static LoadBalancer controller with a simple mapping as config namespace/servicename -> ip, or have people run MetalLB controller without MetalLB speaker. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#97110 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABKWAVHTCVZVCAK4G33DTLTSTYMRJANCNFSM4UQZNYYA> .

champtar · 2020-12-08T16:10:12Z

I mean make it configurable, then at some point disable it by default, then even later remove it.

thockin · 2020-12-08T16:13:24Z

I'm proposing to give users a choice - completely off, completely on (default, legacy), or enabled subject to your own policy. I haven't seen enough use-cases with enough commonality to warrant trying to add more standard APIs to control it. Users are adopting more general policy control, and I don't think we should swimp against that current.

…

On Tue, Dec 8, 2020 at 8:10 AM Etienne Champetier ***@***.***> wrote: I mean make it configurable, then at some point disable it by default, then even later remove it. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

danwinship · 2020-12-08T16:27:57Z

Accept the current state, and promote the externalip-webhook to a built-in admission controller

Accept the current state, and support the externalip-webhook as a long-term solution

Note that externalip-webhook's allowed-external-ip-cidrs solves the problem from the CVE but is still insecure if there are mutually-untrusting users, because it doesn't stop one user from creating a Service using the same externalIP+port as another user's existing Service.

thockin · 2020-12-08T16:32:19Z

Right. To *really* solve this we need a much more robust control API and I am not (yet) convinced that is justified.

…

On Tue, Dec 8, 2020 at 8:28 AM Dan Winship ***@***.***> wrote: 1. Accept the current state, and promote the externalip-webhook <https://github.com/kubernetes-sigs/externalip-webhook> to a built-in admission controller 2. Accept the current state, and support the externalip-webhook <https://github.com/kubernetes-sigs/externalip-webhook> as a long-term solution Note that externalip-webhook's allowed-external-ip-cidrs solves the problem from the CVE but is still insecure if there are mutually-untrusting users, because it doesn't stop one user from creating a Service using the same externalIP+port as another user's existing Service. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#97110 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABKWAVHVFAFQR6KVU3DSE6LSTZH2DANCNFSM4UQZNYYA> .

tgraf · 2020-12-10T15:30:45Z

As a potential reference on how to fix this: We have fixed this in Cilium a while ago [0] by ignoring ExternalIP for any traffic from a pod unless the destination IP is mapped to a node IP (GCE case). It appears that this code [1] is trying to do something similar but is specific to bridge based network implementations.

[0] cilium/cilium@1d8589a
[1]

kubernetes/pkg/proxy/ipvs/proxier.go

Lines 1740 to 1752 in 018942a

    
           externalIPRules := func(args []string) { 
        
           	// Allow traffic for external IPs that does not come from a bridge (i.e. not from a container) 
        
           	// nor from a local process to be forwarded to the service. 
        
           	// This rule roughly translates to "all traffic from off-machine". 
        
           	// This is imperfect in the face of network plugins that might not use a bridge, but we can revisit that later. 
        
           	externalTrafficOnlyArgs := append(args, 
        
           		"-m", "physdev", "!", "--physdev-is-in", 
        
           		"-m", "addrtype", "!", "--src-type", "LOCAL") 
        
           	writeLine(proxier.natRules, append(externalTrafficOnlyArgs, "-j", "ACCEPT")...) 
        
           	dstLocalOnlyArgs := append(args, "-m", "addrtype", "--dst-type", "LOCAL") 
        
           	// Allow traffic bound for external IPs that happen to be recognized as local IPs to stay local. 
        
           	// This covers cases like GCE load-balancers which get added to the local routing table. 
        
           	writeLine(proxier.natRules, append(dstLocalOnlyArgs, "-j", "ACCEPT")...)

thockin · 2020-12-10T17:12:40Z

The problem is that the feature is too loosely defined. Something like that feels right but is, strictly, a breaking change *and* doesn't solve the whole problem (i.e. I can still steal your IP address). I just want to break it completely and let people who *need* it define their own policies in their own ways...

…

On Thu, Dec 10, 2020 at 7:31 AM Thomas Graf ***@***.***> wrote: As a potential reference on how to fix this: We have fixed this in Cilium a while ago [0] by ignoring ExternalIP for any traffic from a pod unless the destination IP is mapped to a node IP (GCE case). It appears that this code [1] is trying to do something similar but is specific to bridge based network implementations. [0] ***@***.*** <cilium/cilium@1d8589a> [1] https://github.com/kubernetes/kubernetes/blob/018942a92a90f91261ba63971827f2ec6b64587e/pkg/proxy/ipvs/proxier.go#L1740-L1752 — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#97110 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABKWAVBNRHHASIV5SVTE7E3SUDSTRANCNFSM4UQZNYYA> .

tgraf · 2020-12-10T18:00:25Z

I'm not sure I understand how you can still steal the IP outside of what ExternalIP was intended for unless you are referring to a potential restriction for ExternalIPs to not cover PodIP and ServiceIP CIDR which would potentially be even better. I'm not sure what we can better at the k8s service implementation layer. I agree with your overall statement though. We often see users try and patch over this with network policies. Maybe it would be cleaner to have separate k8s resources for cluster internal and external services.

champtar · 2020-12-10T18:06:07Z

You can have multiple Services with the same EXTERNAL-IP:PORT

NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
mitm-external-eip-dns   ClusterIP      10.233.55.182   8.8.8.8       53/UDP         4m10s
mitm-external-lb-dns    LoadBalancer   10.233.35.158   8.8.8.8       53:31303/UDP   64s

# kubectl get svc -n kubeproxy-mitm
NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
mitm-external-lb-dns1   LoadBalancer   10.233.47.145   8.8.8.8       53:31545/UDP   4m37s
mitm-external-lb-dns2   LoadBalancer   10.233.40.23    8.8.8.8       53:31556/UDP   4m37s
mitm-external-lb-dns3   LoadBalancer   10.233.28.107   8.8.8.8       53:31206/UDP   4m37s

vassilvk · 2020-12-18T23:13:03Z

@danwinship

Note that externalip-webhook's allowed-external-ip-cidrs solves the problem from the CVE but is still insecure if there are mutually-untrusting users, because it doesn't stop one user from creating a Service using the same externalIP+port as another user's existing Service.

As an alternative to using external-webhook one can use a KubeMod reject rule like this one.

KubeMod rules apply only to the namespace they are deployed to.
This makes it possible to have different policy rules with different allowed CIDRs and IP ranges for each tenant in a multi-tenant cluster.

w8mej · 2020-12-28T16:26:13Z

I concur with @tgraf . It would be much cleaner to separate k8s resources for internal / external cluster services.

pmoust · 2021-01-04T08:40:17Z

It would be much cleaner to separate k8s resources for internal / external cluster services.

It's good to have that classification.
But it doesn't solve for the intra-cluster MITM case, unless I am missing something.
Perhaps added controls (like namespace classification) would help.

thockin · 2021-01-04T23:05:43Z

Gateway should be the basis for the distinction between internal and external.

thockin · 2021-01-04T23:31:30Z

Proposd impl: #97395

aojea · 2021-03-13T21:36:38Z

should we close it now @thockin ?

zhousam · 2021-04-07T07:22:34Z

Proposd impl: #97395

is this PR the final solution to CVE-2020-8554？Do we have plans to imple the second path "Attempt to redesign the external IP feature, deprecate the old behavior, and manage the migration to the new feature" in future version of kubernetes?

fejta-bot · 2021-07-06T08:07:48Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

k8s-triage-robot · 2021-08-05T09:08:08Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

cbron · 2021-08-05T14:55:48Z

The question by @zhousam still needs to be addressed. If the PR is merged and already live in the now released k8s 1.21, should this issue be resolved or is there additional work needed ? cc @thockin @tallclair

cbron · 2021-08-05T15:00:59Z

/remove-lifecycle rotten

aojea · 2021-08-09T10:41:46Z

is this PR the final solution to CVE-2020-8554？Do we have plans to imple the second path "Attempt to redesign the external IP feature, deprecate the old behavior, and manage the migration to the new feature" in future version of kubernetes?

The field is part of Services API that is GA, is not likely to be removed, see comment #97110 (comment)

Gateway should be the basis for the distinction between internal and external.

@thockin refers to https://gateway-api.sigs.k8s.io/ , I think that he means that any attempt to redesign will come from that new API

Hope that answers the remaining questions
/close

k8s-ci-robot · 2021-08-09T10:41:52Z

@aojea: Closing this issue.

In response to this:

is this PR the final solution to CVE-2020-8554？Do we have plans to imple the second path "Attempt to redesign the external IP feature, deprecate the old behavior, and manage the migration to the new feature" in future version of kubernetes?

The field is part of Services API that is GA, is not likely to be removed, see comment #97110 (comment)

Gateway should be the basis for the distinction between internal and external.

@thockin refers to https://gateway-api.sigs.k8s.io/ , I think that he means that any attempt to redesign will come from that new API

Hope that answers the remaining questions
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tallclair added the kind/bug Categorizes issue or PR as related to a bug. label Dec 7, 2020

tallclair mentioned this issue Dec 7, 2020

CVE-2020-8554: Man in the middle using LoadBalancer or ExternalIPs #97076

Closed

JimBugwadia mentioned this issue Dec 7, 2020

Policy to restrict external IPs and load balancer IP kyverno/kyverno#1365

Closed

aojea mentioned this issue Dec 7, 2020

kube-proxy treat ExternalIPs as ClusterIPs #96296

Merged

thockin added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 8, 2020

thockin mentioned this issue Dec 8, 2020

KEP 2200: Block service ExternalIPs via admission kubernetes/enhancements#2176

Merged

bsctl mentioned this issue Dec 9, 2020

Policy to restrict External IPs in Services [CVE-2020-8554] projectcapsule/capsule#159

Closed

thockin mentioned this issue Dec 18, 2020

Add denyserviceexternalips admission (KEP 2200) #97395

Merged

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 6, 2021

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 5, 2021

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 5, 2021

k8s-ci-robot closed this as completed Aug 9, 2021

diogopcx mentioned this issue Jun 27, 2023

CVE-2020-8554 @ Go-k8s.io/api-v0.23.5 diogopcx/CheckmarxDemo#103

Open

tnqn mentioned this issue Jul 13, 2023

Allow specifying ExternalTrafficPolicy for Services with ExternalIPs #119150

Merged

apcxtest mentioned this issue Aug 17, 2023

CVE-2020-8554 @ Go-k8s.io/api-v0.23.5 apcxtest/test-repo-pub#139

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Figure out what to do about external IPs #97110

Figure out what to do about external IPs #97110

tallclair commented Dec 7, 2020

k8s-ci-robot commented Dec 7, 2020

PushkarJ commented Dec 7, 2020

chadswen commented Dec 7, 2020

bowei commented Dec 7, 2020

tallclair commented Dec 7, 2020

champtar commented Dec 8, 2020

thockin commented Dec 8, 2020

champtar commented Dec 8, 2020

thockin commented Dec 8, 2020 via email

champtar commented Dec 8, 2020

thockin commented Dec 8, 2020 via email

danwinship commented Dec 8, 2020

thockin commented Dec 8, 2020 via email

tgraf commented Dec 10, 2020

thockin commented Dec 10, 2020 via email

tgraf commented Dec 10, 2020

champtar commented Dec 10, 2020

vassilvk commented Dec 18, 2020

w8mej commented Dec 28, 2020

pmoust commented Jan 4, 2021 •

edited

thockin commented Jan 4, 2021

thockin commented Jan 4, 2021

aojea commented Mar 13, 2021

zhousam commented Apr 7, 2021

fejta-bot commented Jul 6, 2021

k8s-triage-robot commented Aug 5, 2021

cbron commented Aug 5, 2021 •

edited

cbron commented Aug 5, 2021

aojea commented Aug 9, 2021

k8s-ci-robot commented Aug 9, 2021

Figure out what to do about external IPs #97110

Figure out what to do about external IPs #97110

Comments

tallclair commented Dec 7, 2020

k8s-ci-robot commented Dec 7, 2020

PushkarJ commented Dec 7, 2020

chadswen commented Dec 7, 2020

bowei commented Dec 7, 2020

tallclair commented Dec 7, 2020

champtar commented Dec 8, 2020

thockin commented Dec 8, 2020

champtar commented Dec 8, 2020

thockin commented Dec 8, 2020 via email

champtar commented Dec 8, 2020

thockin commented Dec 8, 2020 via email

danwinship commented Dec 8, 2020

thockin commented Dec 8, 2020 via email

tgraf commented Dec 10, 2020

thockin commented Dec 10, 2020 via email

tgraf commented Dec 10, 2020

champtar commented Dec 10, 2020

vassilvk commented Dec 18, 2020

w8mej commented Dec 28, 2020

pmoust commented Jan 4, 2021 • edited

thockin commented Jan 4, 2021

thockin commented Jan 4, 2021

aojea commented Mar 13, 2021

zhousam commented Apr 7, 2021

fejta-bot commented Jul 6, 2021

k8s-triage-robot commented Aug 5, 2021

cbron commented Aug 5, 2021 • edited

cbron commented Aug 5, 2021

aojea commented Aug 9, 2021

k8s-ci-robot commented Aug 9, 2021

pmoust commented Jan 4, 2021 •

edited

cbron commented Aug 5, 2021 •

edited