
Figure out what to do about external IPs #97110

Closed
tallclair opened this issue Dec 7, 2020 · 30 comments
Labels
area/security
kind/bug: Categorizes issue or PR as related to a bug.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/architecture: Categorizes an issue or PR as relevant to SIG Architecture.
sig/network: Categorizes an issue or PR as relevant to SIG Network.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@tallclair
Member

For CVE-2020-8554, #97076, we decided we couldn't patch it in-tree, and instead provided workarounds to disable (or allowlist) the feature through admission controls. Now that the issue is public, I'd like to open the conversation about a long-term fix.

Unless we missed something (entirely possible), I see a few possible paths forward:

  1. Decide to stop supporting external IPs, deprecate and eventually remove the feature
  2. Attempt to redesign the external IP feature, deprecate the old behavior, and manage the migration to the new feature
  3. Accept the current state, and promote the externalip-webhook to a built-in admission controller
  4. Accept the current state, and support the externalip-webhook as a long-term solution

/sig network architecture
/area security
/priority important-soon

@tallclair tallclair added the kind/bug Categorizes issue or PR as related to a bug. label Dec 7, 2020
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. area/security priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 7, 2020
@k8s-ci-robot
Contributor

@tallclair: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@PushkarJ
Member

PushkarJ commented Dec 7, 2020

@tallclair: Thanks for sharing the details. Commenting here, since the original issue is closed. This may be a novice question, but does a default-deny CNI network policy and/or mutual auth (e.g. mTLS) act as a mitigating control?

@chadswen
Member

chadswen commented Dec 7, 2020

Does this have the same root cause as the issue discussed in #22650 and fixed in OpenShift with an admission controller? (OpenShift issue: openshift/origin#7808 and PR: openshift/origin#7810.) There is some existing discussion in the k/k thread, in case the added context helps form a strategy here.

I first encountered the issue a couple of years ago when a tenant accidentally misused an external IP in one of our clusters, which I discovered while debugging an unreachable etcd node. The external IP Service specified the same IP as an etcd node's host IP, which prevented all nodes running kube-proxy from reaching the etcd node, since traffic was routed to the Service instead. Luckily, we had other etcd nodes in the cluster, and by debugging kube-proxy logs along with ipvsadm it didn't take too long to identify the external IP as the root cause and shut it down.

I tried to report the issue via HackerOne several months ago, but I was never able to get the steps to reproduce working consistently in all clusters to meet the HackerOne report requirements, which may be similar to @champtar's experience in this comment. The OPA Gatekeeper based solution is similar to the workaround I came to as well, but it would be great to fix it upstream.

I'm sure there are users that depend on external IP, but for what it's worth I have rarely seen it used correctly.
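
To make the failure mode above concrete, here is a minimal client-go sketch of the kind of Service involved. Everything in it is hypothetical (the names, namespace, documentation IP, and etcd client port are made up, not taken from the report); the point is only that any user who can create Services can put an arbitrary address into spec.externalIPs, and kube-proxy on every node will then redirect in-cluster traffic for that address to the Service's endpoints.

package main

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    svc := &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{Name: "oops", Namespace: "tenant-a"},
        Spec: corev1.ServiceSpec{
            Selector: map[string]string{"app": "something"},
            Ports: []corev1.ServicePort{{
                Port:       2379,
                TargetPort: intstr.FromInt(2379),
                Protocol:   corev1.ProtocolTCP,
            }},
            // Nothing validates this address or ties it to anything the user owns.
            // If it happens to be an etcd node's host IP, kube-proxy starts routing
            // in-cluster traffic for that address to this Service instead.
            ExternalIPs: []string{"192.0.2.10"},
        },
    }
    if _, err := client.CoreV1().Services("tenant-a").Create(context.TODO(), svc, metav1.CreateOptions{}); err != nil {
        panic(err)
    }
}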

@bowei
Member

bowei commented Dec 7, 2020

/cc @bowei

@tallclair
Member Author

Thanks for the additional context @chadswen, this does in fact look like a duplicate of #22650. Since the conversation on that issue didn't go anywhere, hopefully the added visibility of having a CVE associated will help drive progress.

@champtar
Contributor

champtar commented Dec 8, 2020

@chadswen thanks for digging up those old tickets; now I'm not sure I even searched for related open issues before reporting ...
I also have a similar story: crashed a cluster to find this bug :)

Here is my write-up with 18 tests on 3 clusters: https://blog.champtar.fr/K8S_MITM_LoadBalancer_ExternalIPs/
In those tests (which are a bit old now), kube-proxy in IPVS mode is the most affected, and patching the LoadBalancer is the most reliable.

@thockin
Member

thockin commented Dec 8, 2020

#97076 as a proposal for k/k

@thockin thockin added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 8, 2020
@champtar
Contributor

champtar commented Dec 8, 2020

I never understood why the data plane is different between LoadBalancer and ExternalIPs.
That being said, you could completely remove ExternalIP and replace it with a small static LoadBalancer controller driven by a simple namespace/servicename -> IP mapping in its config, or have people run the MetalLB controller without the MetalLB speaker.
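
For illustration, a rough sketch of that "small static LoadBalancer controller" idea (entirely hypothetical; the names, polling loop, and mapping format are made up): it publishes a statically configured IP into the status of matching LoadBalancer Services, the way a cloud provider or the MetalLB controller would. One nice property of this split is that writing Service status requires RBAC on services/status, which ordinary tenants normally don't have, unlike spec.externalIPs, which anyone who can create a Service can set.

package main

import (
    "context"
    "time"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

// Static config: namespace/servicename -> IP (in a real controller this would come
// from a file or ConfigMap rather than being hard-coded).
var assignments = map[string]string{
    "default/my-gateway": "192.0.2.40",
}

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)
    for {
        reconcile(context.TODO(), client)
        time.Sleep(30 * time.Second) // a real controller would use informers instead of polling
    }
}

func reconcile(ctx context.Context, client kubernetes.Interface) {
    svcs, err := client.CoreV1().Services(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
    if err != nil {
        return
    }
    for i := range svcs.Items {
        svc := &svcs.Items[i]
        if svc.Spec.Type != corev1.ServiceTypeLoadBalancer {
            continue
        }
        ip, ok := assignments[svc.Namespace+"/"+svc.Name]
        if !ok {
            continue
        }
        if len(svc.Status.LoadBalancer.Ingress) == 1 && svc.Status.LoadBalancer.Ingress[0].IP == ip {
            continue // already published
        }
        // Publish the assigned IP in the Service status, like a cloud LB controller would.
        svc.Status.LoadBalancer.Ingress = []corev1.LoadBalancerIngress{{IP: ip}}
        _, _ = client.CoreV1().Services(svc.Namespace).UpdateStatus(ctx, svc, metav1.UpdateOptions{}) // best effort; retried on the next pass
    }
}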

@thockin
Member

thockin commented Dec 8, 2020 via email

@champtar
Contributor

champtar commented Dec 8, 2020

I mean make it configurable, then at some point disable it by default, then even later remove it.

@thockin
Member

thockin commented Dec 8, 2020 via email

@danwinship
Contributor

  3. Accept the current state, and promote the externalip-webhook to a built-in admission controller
  4. Accept the current state, and support the externalip-webhook as a long-term solution

Note that externalip-webhook's allowed-external-ip-cidrs solves the problem from the CVE but is still insecure if there are mutually-untrusting users, because it doesn't stop one user from creating a Service using the same externalIP+port as another user's existing Service.
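
Concretely, the missing piece is a cross-Service conflict check, not just a range check. Below is a rough sketch of what such validation could look like; it is purely illustrative (this is not how externalip-webhook is implemented, and in a real webhook the existing Services would come from an informer rather than a slice).

package main

import (
    "fmt"
    "net"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hypothetical allowlist of ranges tenants may use for externalIPs.
var allowedCIDRs = []string{"192.0.2.0/24"}

// validateExternalIPs rejects a Service whose externalIPs are outside the allowed
// ranges or whose externalIP:port pairs are already claimed by another Service.
func validateExternalIPs(newSvc *corev1.Service, existing []corev1.Service) error {
    claimed := map[string]string{} // "ip:port" -> "namespace/name"
    for _, svc := range existing {
        for _, ip := range svc.Spec.ExternalIPs {
            for _, p := range svc.Spec.Ports {
                claimed[fmt.Sprintf("%s:%d", ip, p.Port)] = svc.Namespace + "/" + svc.Name
            }
        }
    }
    for _, ip := range newSvc.Spec.ExternalIPs {
        if !inAllowedRange(ip) {
            return fmt.Errorf("externalIP %s is outside the allowed CIDRs", ip)
        }
        for _, p := range newSvc.Spec.Ports {
            key := fmt.Sprintf("%s:%d", ip, p.Port)
            if owner, ok := claimed[key]; ok && owner != newSvc.Namespace+"/"+newSvc.Name {
                return fmt.Errorf("externalIP %s is already claimed by Service %s", key, owner)
            }
        }
    }
    return nil
}

func inAllowedRange(ip string) bool {
    parsed := net.ParseIP(ip)
    if parsed == nil {
        return false
    }
    for _, cidr := range allowedCIDRs {
        if _, ipNet, err := net.ParseCIDR(cidr); err == nil && ipNet.Contains(parsed) {
            return true
        }
    }
    return false
}

func main() {
    existing := []corev1.Service{{
        ObjectMeta: metav1.ObjectMeta{Namespace: "tenant-a", Name: "dns"},
        Spec: corev1.ServiceSpec{
            ExternalIPs: []string{"192.0.2.53"},
            Ports:       []corev1.ServicePort{{Port: 53}},
        },
    }}
    attacker := &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{Namespace: "tenant-b", Name: "dns-mitm"},
        Spec: corev1.ServiceSpec{
            ExternalIPs: []string{"192.0.2.53"}, // same IP:port as tenant-a's Service
            Ports:       []corev1.ServicePort{{Port: 53}},
        },
    }
    // In range, but conflicts with tenant-a/dns, so it is rejected.
    fmt.Println(validateExternalIPs(attacker, existing))
}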

@thockin
Member

thockin commented Dec 8, 2020 via email

@tgraf
Contributor

tgraf commented Dec 10, 2020

As a potential reference on how to fix this: we fixed this in Cilium a while ago [0] by ignoring ExternalIP for any traffic from a pod unless the destination IP maps to a node IP (the GCE case). It appears that this code [1] is trying to do something similar, but it is specific to bridge-based network implementations.

[0] cilium/cilium@1d8589a
[1]

externalIPRules := func(args []string) {
    // Allow traffic for external IPs that does not come from a bridge (i.e. not from a container)
    // nor from a local process to be forwarded to the service.
    // This rule roughly translates to "all traffic from off-machine".
    // This is imperfect in the face of network plugins that might not use a bridge, but we can revisit that later.
    externalTrafficOnlyArgs := append(args,
        "-m", "physdev", "!", "--physdev-is-in",
        "-m", "addrtype", "!", "--src-type", "LOCAL")
    writeLine(proxier.natRules, append(externalTrafficOnlyArgs, "-j", "ACCEPT")...)
    dstLocalOnlyArgs := append(args, "-m", "addrtype", "--dst-type", "LOCAL")
    // Allow traffic bound for external IPs that happen to be recognized as local IPs to stay local.
    // This covers cases like GCE load-balancers which get added to the local routing table.
    writeLine(proxier.natRules, append(dstLocalOnlyArgs, "-j", "ACCEPT")...)
}

@thockin
Member

thockin commented Dec 10, 2020 via email

@tgraf
Contributor

tgraf commented Dec 10, 2020

I'm not sure I understand how you can still steal the IP outside of what ExternalIP was intended for, unless you are referring to a potential restriction that ExternalIPs must not cover the PodIP and ServiceIP CIDRs, which would potentially be even better. I'm not sure what we can do better at the k8s Service implementation layer. I agree with your overall statement, though. We often see users try to patch over this with network policies. Maybe it would be cleaner to have separate k8s resources for cluster-internal and external services.

@champtar
Contributor

You can have multiple Services with the same EXTERNAL-IP:PORT

NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
mitm-external-eip-dns   ClusterIP      10.233.55.182   8.8.8.8       53/UDP         4m10s
mitm-external-lb-dns    LoadBalancer   10.233.35.158   8.8.8.8       53:31303/UDP   64s

# kubectl get svc -n kubeproxy-mitm
NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
mitm-external-lb-dns1   LoadBalancer   10.233.47.145   8.8.8.8       53:31545/UDP   4m37s
mitm-external-lb-dns2   LoadBalancer   10.233.40.23    8.8.8.8       53:31556/UDP   4m37s
mitm-external-lb-dns3   LoadBalancer   10.233.28.107   8.8.8.8       53:31206/UDP   4m37s

@vassilvk

@danwinship

Note that externalip-webhook's allowed-external-ip-cidrs solves the problem from the CVE but is still insecure if there are mutually-untrusting users, because it doesn't stop one user from creating a Service using the same externalIP+port as another user's existing Service.

As an alternative to using externalip-webhook, one can use a KubeMod reject rule like this one.

KubeMod rules apply only to the namespace they are deployed to.
This makes it possible to have different policy rules with different allowed CIDRs and IP ranges for each tenant in a multi-tenant cluster.

@w8mej

w8mej commented Dec 28, 2020

I concur with @tgraf. It would be much cleaner to have separate k8s resources for internal and external cluster services.

@pmoust

pmoust commented Jan 4, 2021

It would be much cleaner to separate k8s resources for internal / external cluster services.

It's good to have that classification.
But it doesn't solve the intra-cluster MITM case, unless I am missing something.
Perhaps added controls (like namespace classification) would help.

@thockin
Member

thockin commented Jan 4, 2021

Gateway should be the basis for the distinction between internal and external.

@thockin
Member

thockin commented Jan 4, 2021

Proposed impl: #97395

@aojea
Member

aojea commented Mar 13, 2021

Should we close it now, @thockin?

@zhousam

zhousam commented Apr 7, 2021

Proposed impl: #97395

Is this PR the final solution to CVE-2020-8554? Do we have plans to implement the second path ("Attempt to redesign the external IP feature, deprecate the old behavior, and manage the migration to the new feature") in a future version of Kubernetes?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 6, 2021
@k8s-triage-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 5, 2021
@cbron

cbron commented Aug 5, 2021

The question by @zhousam still needs to be addressed. If the PR is merged and already live in the now-released k8s 1.21, should this issue be resolved, or is there additional work needed? cc @thockin @tallclair

@cbron

cbron commented Aug 5, 2021

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 5, 2021
@aojea
Member

aojea commented Aug 9, 2021

Is this PR the final solution to CVE-2020-8554? Do we have plans to implement the second path ("Attempt to redesign the external IP feature, deprecate the old behavior, and manage the migration to the new feature") in a future version of Kubernetes?

The field is part of the Services API, which is GA, so it is not likely to be removed; see comment #97110 (comment).

Gateway should be the basis for the distinction between internal and external.

@thockin refers to https://gateway-api.sigs.k8s.io/; I think he means that any attempt at a redesign will come from that new API.

Hope that answers the remaining questions
/close

@k8s-ci-robot
Contributor

@aojea: Closing this issue.

In response to this:

Is this PR the final solution to CVE-2020-8554? Do we have plans to implement the second path ("Attempt to redesign the external IP feature, deprecate the old behavior, and manage the migration to the new feature") in a future version of Kubernetes?

The field is part of the Services API, which is GA, so it is not likely to be removed; see comment #97110 (comment).

Gateway should be the basis for the distinction between internal and external.

@thockin refers to https://gateway-api.sigs.k8s.io/; I think he means that any attempt at a redesign will come from that new API.

Hope that answers the remaining questions
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
