
iptables proxier: route local traffic to LB IPs to service chain #77523

Merged
merged 2 commits on May 31, 2019

Conversation

@andrewsykim (Member) commented May 6, 2019

Signed-off-by: Andrew Sy Kim <kiman@vmware.com>

What type of PR is this?
/kind bug

What this PR does / why we need it:
For any traffic to an LB IP that originates from the local node, re-route that traffic to the Kubernetes service chain. This makes an external LB reachable from inside the cluster. The implication is that internal traffic to an LB IP will need to go through SNAT. This is likely okay since source IP preservation with externalTrafficPolicy=Local only applies to external traffic anyway. The fix was spelled out in more detail by Tim in #65387.

I think the correct behavior is to actually route the traffic to the LB instead of intercepting it with iptables, but with the current set of rules I'm not sure this is possible. We also already have rules that route pods in the cluster CIDR that want to reach LB IPs to the service chain:

-A KUBE-XLB-ECF5TUORC5E2ZCRD -s 10.8.0.0/14 -m comment --comment "Redirect pods trying to reach external loadbalancer VIP to clusterIP" -j KUBE-SVC-ECF5TUORC5E2ZCRD

Allowing traffic with --src-type LOCAL to do the same makes sense to me.
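
Concretely, the new rule would look something like the following in iptables-save output (a sketch only: the chain suffix and the service name default/foo are illustrative, reconstructed from the Go snippet quoted in review below):

# Sketch: node-local traffic to the LB IP is redirected to the regular service
# chain, mirroring the pod-CIDR rule above.
-A KUBE-XLB-ECF5TUORC5E2ZCRD -m comment --comment "route LOCAL traffic for default/foo LB IP to service chain" -m addrtype --src-type LOCAL -j KUBE-SVC-ECF5TUORC5E2ZCRD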

Which issue(s) this PR fixes:
Fixes #65387 #66607

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

iptables proxier: route local traffic to LB IPs to service chain
@andrewsykim (Member, Author) commented May 6, 2019

@m1093782566 @Lion-Wei any ideas whether we need this for the IPVS proxier?

@andrewsykim (Member, Author) commented May 7, 2019

/priority important-soon

andrewsykim added 2 commits May 7, 2019
Signed-off-by: Andrew Sy Kim <kiman@vmware.com>
Signed-off-by: Andrew Sy Kim <kiman@vmware.com>
@andrewsykim andrewsykim force-pushed the fix-xlb-from-local branch from 0cf0983 to 8dfd4de May 7, 2019
@andrewsykim (Member, Author) commented May 7, 2019

Validated and tested this on a kind cluster with MetalLB (thanks @mauilion!) and on GKE by applying the rules manually. @jcodybaker can you test this on DOKS please (re: #66607)?

@jcodybaker commented May 8, 2019

@andrewsykim - I've tested this and unfortunately it doesn't seem to have changed the behavior seen in #66607. I've left my test cluster up and can provide any debug that's helpful--just message me on Slack. I've started debugging by looking through the iptables counters. I didn't get the full picture tonight, but it does seem to be taking a different path than we were seeing before.

@andrewsykim (Member, Author) commented May 8, 2019

@jcodybaker can you share the output of sudo iptables-save please?

@mauilion commented May 8, 2019

@andrewsykim
In my testing I was able to reproduce this case:
Bring up a 3-node cluster and schedule pods on 2 of them.
Define a service of type LoadBalancer and ensure that the load balancer provides an IP address.
Modify the service and set externalTrafficPolicy to Local.
Bring up a pod with hostNetwork: true on the node where there are no endpoints.
Before your change I am not able to connect to the service; after your change I can.
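
A minimal sketch of those repro steps with kubectl (the deployment name, image, and ports are hypothetical; the LB IP placeholder must be filled in):

# Two-replica backend plus a LoadBalancer service with externalTrafficPolicy=Local.
kubectl create deployment echo --image=k8s.gcr.io/echoserver:1.4
kubectl scale deployment echo --replicas=2
kubectl expose deployment echo --type=LoadBalancer --port=80 --target-port=8080
kubectl patch svc echo -p '{"spec":{"externalTrafficPolicy":"Local"}}'
# From a hostNetwork: true pod (or the host itself) on the node with no echo endpoints:
curl http://<LB-IP>/   # fails before this change, succeeds after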

@kinolaev commented May 29, 2019

Shouldn't the same policy apply to nodes then?

For externalTrafficPolicy=Local we could remove the FW chain on nodes that don't have local endpoints, sending all traffic to the LB. That solution looks more consistent, but it requires network access from the node to the LB for traffic from pods and has no benefit (the source IP will be lost). I don't think we need to choose it just for consistency.
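
For context, a rough sketch of the dispatch being discussed, as it looks when loadBalancerSourceRanges is set (addresses and chain suffixes are illustrative, not taken from this PR):

-A KUBE-SERVICES -d 203.0.113.10/32 -p tcp -m comment --comment "default/foo loadbalancer IP" -m tcp --dport 80 -j KUBE-FW-ECF5TUORC5E2ZCRD
-A KUBE-FW-ECF5TUORC5E2ZCRD -s 198.51.100.0/24 -j KUBE-XLB-ECF5TUORC5E2ZCRD
-A KUBE-FW-ECF5TUORC5E2ZCRD -j KUBE-MARK-DROP

Removing that interception on nodes without local endpoints would let traffic fall through to wherever the LB IP actually routes.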

@andrewsykim (Member, Author) commented May 29, 2019

I don't think we need to choose it just for consistency.

Agreed that it shouldn't be just for consistency's sake. The question I have boils down to: if we know the destination IPs for a given external LB IP, do we route back to the external IP or route directly to the backend? We route directly to the service backend for the pod CIDR case; I'm wondering if there was a valid reason for that which also applies to nodes. Need to test all the scenarios first.

@kinolaev commented May 29, 2019

I think routing directly to the backend is a good solution (and personally I prefer it). But if the maintainers decide that the source IP must be preserved, they will already have a solution for this.

Please see my comment above about moving the routing to the SVC chain after checking that the node doesn't have local endpoints, because otherwise traffic must be routed locally. Do you agree?

@andrewsykim (Member, Author) commented May 29, 2019

Please see my comment above about moving the routing to the SVC chain after checking that the node doesn't have local endpoints, because otherwise traffic must be routed locally. Do you agree?

I'm not sure; I need to put more thought into this, because the existing local-endpoint logic only applies to external traffic coming into the cluster (hence externalTrafficPolicy=Local). Preferring local endpoints if they exist and falling back to the service chain sounds right, but I don't think it's that simple: we would possibly be changing the expectations for internal traffic in ways that can be unexpected. This would be addressed by the ongoing work on service topology, which I would say is out of scope for what we're trying to fix here. I'm not the authoritative source on this though :)

@kinolaev commented May 30, 2019

I see now the root of our discussion: we have different boundaries for "internal" :) For me, "internal" means pods and services, but you also include nodes in "internal". So for me a node's outgoing traffic is "external" traffic and must be routed to local endpoints or the load balancer. But if a node's outgoing traffic is "internal" traffic, then your solution is the most logical.
@thockin, waiting for you :)

@andrewsykim (Member, Author) commented May 30, 2019

Err, sorry, I forgot to put a milestone on this one. Putting the milestone label on so it's on the radar at least. It's a bit last minute, so feel free to bump it to the next release (or we can cherry-pick it later).

/milestone v1.15

@k8s-ci-robot k8s-ci-robot added this to the v1.15 milestone May 30, 2019
@thockin (Member) left a comment

Thanks. I'll approve now, but can you please send a minor fixup?

writeLine(proxier.natRules, append(args,
"-m", "comment", "--comment", fmt.Sprintf(`"route LOCAL traffic for %s LB IP to service chain"`, svcNameString),
"-m", "addrtype", "--src-type", "LOCAL", "-j", string(svcChain))...)

// First rule in the chain redirects all pod -> external VIP traffic to the
@thockin (Member) May 31, 2019

Minor point: the comment here says "first rule", but you broke that :)

Can you put the new block after this block, and word the comments in a similar fashion?

@andrewsykim (Member, Author) May 31, 2019

Will do, thanks Tim!
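
A sketch of what that fixup might look like, reusing the variables from the quoted snippet (args, svcNameString, svcChain) and assuming the rule is simply moved below the pod-CIDR block with a comment worded to match:

// Next, redirect all src-type=LOCAL -> LB IP traffic to the service chain,
// so node-local clients can reach the LB VIP in-cluster.
writeLine(proxier.natRules, append(args,
	"-m", "comment", "--comment", fmt.Sprintf(`"route LOCAL traffic for %s LB IP to service chain"`, svcNameString),
	"-m", "addrtype", "--src-type", "LOCAL", "-j", string(svcChain))...)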

@thockin (Member) commented May 31, 2019

Thanks!

/lgtm
/approve

@k8s-ci-robot (Contributor) commented May 31, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@snuxoll commented Jun 20, 2019

@andrewsykim @kinolaev It's not just preserving the source IP that is the issue; the current behavior for routing traffic to LoadBalancer services within the cluster makes a false assumption that the LoadBalancer is always transparent. In the case of DigitalOcean, as an example, this is not the case: the LoadBalancer supports the PROXY protocol as well as TLS termination, so sending traffic directly to a pod will often result in unexpected behavior.

This is only an issue when LoadBalancers provide an IP address in the status spec, since k8s does not attempt to magic the LoadBalancer away when a hostname is provided instead, as is the case with ELBs from AWS, for example. If your cloud provider supports network-transparent load balancing then I don't think the optimization needs to be thrown away, but cloud controllers should have a way to indicate to the kubelet that a LoadBalancer service is not transparent and shouldn't be routed internally.
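
For example, a backend behind a PROXY-protocol load balancer expects every connection to begin with a header line like the one below (addresses illustrative); a connection short-circuited by iptables never includes it, so the backend misparses the stream:

PROXY TCP4 203.0.113.7 198.51.100.22 54321 80
GET / HTTP/1.1
Host: example.com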

adolli commented on b926fb9 Jul 7, 2019

link ba19451

@kevin-wangzefeng (Member) commented Aug 28, 2019

/cc @RainbowMango

@k8s-ci-robot (Contributor) commented Aug 28, 2019

@kevin-wangzefeng: GitHub didn't allow me to request PR reviews from the following users: RainbowMango.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @RainbowMango

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ironcladlou added a commit to ironcladlou/origin that referenced this issue Aug 28, 2019

Kubernetes does not currently support routing packets from the host network
interface to LoadBalancer Service external IPs. Although such routing tends to
work on AWS with our current topology, it only works by coincidence. On other
platforms like Azure and GCP, packets will not route to the load balancer except
when the node is part of the load balancer target pool. At best, the level of
support is undefined and all behavior in this context is considered
unintentional by upstream.

The practical implication is that connectivity to Routes on cloud platforms is
only supported from public and container network clients.

Refactor the e2e tests to exercise Route connectivity only through the container
network. This eliminates a class of test flake whereby tests pass when the test
pod (e.g. running curl against a Route) lands on a node hosting an
ingresscontroller but fails if the pod happens to land on a node that's not
hosting an ingresscontroller (which means the node is not part of the LB target
pool).

kubernetes/kubernetes#77523
kubernetes/kubernetes#65387