Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why kube-proxy add external-lb's address to node local iptables rule? #66607

Closed
BSWANG opened this issue Jul 25, 2018 · 62 comments · Fixed by #92312
Closed

Why kube-proxy add external-lb's address to node local iptables rule? #66607

BSWANG opened this issue Jul 25, 2018 · 62 comments · Fixed by #92312
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@BSWANG
Copy link
Contributor

BSWANG commented Jul 25, 2018

/kind friction

What happened:
I have a LoadBalancer type service A of address 1.1.1.1. The external loadbalancer of service A is a TLS decoder, it will convert https requests to http hostport and endpoint. But since the kube-proxy add the external-lb's address to local iptables rule. Requests of https//1.1.1.1 will hijack to local http endpoints. Then https request failed.

What you expected to happen:
Kube-proxy don't add external-lb's address to local iptables. And the request will go through external-lb.

Environment:

  • Kubernetes version (use kubectl version):
    1.10.4
  • Cloud provider or hardware configuration:
    Alibaba Cloud
  • OS (e.g. from /etc/os-release):
    Centos 7.4
  • Kernel (e.g. uname -a):
    3.10.0-693
  • Install tools:
    kubeadm
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. kind/bug Categorizes issue or PR as related to a bug. labels Jul 25, 2018
@BSWANG
Copy link
Contributor Author

BSWANG commented Jul 25, 2018

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. kind/friction and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 25, 2018
@BSWANG BSWANG changed the title [Question]Why kube-proxy add external-lb's address to node local iptables rule? Why kube-proxy add external-lb's address to node local iptables rule? Jul 25, 2018
@Lion-Wei
Copy link

I think the reason is, traffic from in cluster have no need to go out side(lb) then come back.
And probably lb should not do TLS decoder work.

@BSWANG
Copy link
Contributor Author

BSWANG commented Jul 27, 2018

@Lion-Wei Will it be an option to config this behavior?Some functions like TLS,Monitor,Logging doing on lb are more reasonable and high performance.
If kube-proxy add this option, I would like to commit a PR to it.

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. and removed kind/friction labels Aug 23, 2018
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 22, 2018
@BSWANG
Copy link
Contributor Author

BSWANG commented Nov 28, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 28, 2018
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 26, 2019
@BSWANG
Copy link
Contributor Author

BSWANG commented Mar 4, 2019

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2019
@thockin thockin added the triage/unresolved Indicates an issue that can not or will not be resolved. label Mar 8, 2019
@jcodybaker
Copy link

We're also seeing this as an issue at DigitalOcean. It's a concern not just for load-balancer TLS termination, but also for supporting proxy protocol as encouraged in ( https://kubernetes.io/docs/tutorials/services/source-ip/ ). Traffic addressed to the LB ip from within the cluster never reaches the load-balancer, and the required proxy header isn't applied, causing a protocol violation.

The in-tree AWS service type loadbalancer supports proxy protocol and TLS termination, but because they populate status.loadbalancer.ingress.hostname rather than .ip they avoid this bug/optimization.

We're willing to put together a PR to address this there's interest from sig-network to accept it. We've considered a kube-proxy flag to disable the optimization, or the more complex option of extending v1.LoadBalancerIngress to include feedback from the cloud provider.

@bowei
Copy link
Member

bowei commented Mar 25, 2019

It seems like what is being implemented here is not modeled all that well in the current kube-proxy construct; your LB is not transparent from an L3/4 perspective and lives somewhere other than the node (it is terminating TLS, adding a proxy protocol field). It seems possible for nothing to break if we remove the external LB IPs from the node as a provider-specific setting, but it will require some thought.

@snuxoll
Copy link

snuxoll commented Apr 19, 2019

If I’m being quite honest this behavior goes against the principle of least surprise. If I want kube-proxy to route my traffic I would use a ClusterIP service or similar and configure the application to use it, if I am hitting a LoadBalancer provided by a cloud provider there is likely to be a specific reason for that and kube-proxy shouldn’t try to second guess it.

@johnl
Copy link

johnl commented Apr 23, 2019

We hit this same problem at Brightbox with our cloud controller manager. Our Load Balancers can terminate SSL and support the PROXY protocol. Confirmed with both ipvs and iptables kube-proxy modes. Any IPs we provide as LoadBalancerIngress are used to redirect outgoing connections and keep them within the cluster, breaking things.

To put it more clearly (imo), the problem is that kube-proxy assumes all load balancer services are straight TCP/UDP and that it can optimize them internally but that is not the case.

So either the assumption is wrong and kube-proxy needs fixing, or the assumption is right and external load balancers should not be doing these fancy things. I obviously think kube-proxy is wrong here :)

As @jcodybaker says above, AWS avoids this problem simply but not listing IP addresses as LoadBalancerIngress, only hostnames, which kube-proxy doesn't resolve.

We've changed our cloud controller to do the same, listing only hostnames in there, and it's fixed our problem too.

This feels a bit like a hack, but maybe it could be argued that's the way for a cloud controller to enable/disable the optimisation - a bit misguiding though.

We were also exposing the IPv6 addresses of our load balancers too, which actually didn't look properly supported further down the stack at all, so things do seem to be a bit of a mess here generally.

So a kube-proxy flag to disable the optimization would be a great start, but I think this all needs a closer look!

Plus, it's worth noting that in ipvs mode, the IP address is actually added locally to each node so it's not possible to selectively support this per-port. As soon as a load balancer lists the IP, it's completely unreachable directly.

@freehan freehan added kind/feature Categorizes issue or PR as related to a new feature. and removed triage/unresolved Indicates an issue that can not or will not be resolved. labels May 2, 2019
@whereisaaron
Copy link

If I create an external loadbalancer I don't expect kube-proxy to hijack traffic addressed to that external loadbalancer.

That completely disregards any functionality the load balancer provides, including TLS termination, proxy protocol, logging, metrics, DPI firewalling, and... the actual the load balancing algorithm!

I'm perplexed that this is default behavior? It only seems useful if your external load balancer doesn't actually do anything useful 😄

As @jcodybaker says above, AWS avoids this problem simply but not listing IP addresses as LoadBalancerIngress, only hostnames, which kube-proxy doesn't resolve. I'd like to be able to lock this out on kube-proxy in general for cluster deployments. This behavior is only going to break things.

Thanks for raising this @BSWANG. This submerged iceberg could have spoiled my day one day, but now I know to watch out for this 👀

@k8s-ci-robot k8s-ci-robot reopened this Apr 28, 2021
@k8s-ci-robot
Copy link
Contributor

@BSWANG: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 28, 2021
@BSWANG
Copy link
Contributor Author

BSWANG commented Apr 28, 2021

reopen due this pr #96454 reverted

@aojea
Copy link
Member

aojea commented Apr 30, 2021

reopen due this pr #96454 reverted

the revert was done because that PR merged accidentally, to avoid misunderstandings, is not that the project doesn't want to implement, is that there are some process to add new features and that PR didn't follow them.
The KEP to implement the change has been approved https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/1860-kube-proxy-IP-node-binding

@Sh4d1 are you still working on this #97681? if you want to include this in 1.22, the process has changed, please see https://groups.google.com/g/kubernetes-sig-network/c/3NYVEgvE1Uc/m/o68JAIt0DAAJ for more details

@Sh4d1
Copy link
Member

Sh4d1 commented Apr 30, 2021

@aojea yep! Layed the groundwork on #101027 but I was waiting on Tim's review, tough I understood he's been a bit busy! ( You're welcome if you want to review!).
Once this one is merged, I can rebase #97681 and it should be good.

Thanks, I'll read that 😄

Miniland1333 added a commit to kreut/libretext that referenced this issue May 2, 2021
Bugfix for OAuth redirect_url not being http. This is due to an upstream proxy issue with kubernetes/kubernetes#66607
@ryanlelek
Copy link

Commenting for support/+1 that self-referencing the cluster from within would be useful

@mkreidenweis-schulmngr 's hair-pin solution (kind of) worked but didn't pick up the correct name of the pod(s) to reroute to and would have worked with further digging.
Setting up a separate cluster solved the issue in the meantime.

Thanks for your work on this

@thockin
Copy link
Member

thockin commented Jul 8, 2021

#101027 was not assigned to me, so I missed it.

@thockin
Copy link
Member

thockin commented Aug 5, 2021

I'm going to close this as a dup, since we do have the KEP open

@thockin thockin closed this as completed Aug 5, 2021
@liudongbo123
Copy link

how is the progress ?

@dcardellino
Copy link

Is there any solution for this?

@thockin
Copy link
Member

thockin commented Oct 25, 2022 via email

@timoreimann
Copy link
Contributor

Hey @thockin 👋

I'd be happy to help pushing this over the finishing line. Is there a place to refresh my memory on where we stand right now and what remains to be done?

I vaguely remember that time was spent on a KEP but don't recall the details.

Appreciate any starting pointers.

@Sh4d1
Copy link
Member

Sh4d1 commented Nov 2, 2022

@timoreimann 👋 you can read kubernetes/enhancements#1860 (comment)
and Tim opened #106242 instead of #101027 .

So if nothing really changed, my last message still stands I guess!

hrak added a commit to hrak/cloudstack-kubernetes-provider that referenced this issue May 5, 2023
This fixes problems with kube-proxy in ipvs mode considering the lb IP as local to the node
See kubernetes/kubernetes#66607
This can also be used to access PROXY proto service from the inside
hrak added a commit to Leaseweb/cloudstack-kubernetes-provider that referenced this issue Mar 12, 2024
This fixes problems with kube-proxy in ipvs mode considering the lb IP as local to the node
See kubernetes/kubernetes#66607
This can also be used to access PROXY proto service from the inside
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/network Categorizes an issue or PR as relevant to SIG Network.
Projects
None yet