Networking: external LB in 1.4.0-beta.8 blackholes external ip traffic #33081

Closed
kdima opened this issue Sep 20, 2016 · 7 comments · Fixed by #33587

kdima commented Sep 20, 2016

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):


Is this a BUG REPORT or FEATURE REQUEST? (choose one):

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"4+", GitVersion:"v1.4.0-beta.8", GitCommit:"3040f87c570a772ce94349b379f41f329494a4f7", GitTreeState:"clean", BuildDate:"2016-09-18T21:06:37Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4+", GitVersion:"v1.4.0-beta.8", GitCommit:"3040f87c570a772ce94349b379f41f329494a4f7", GitTreeState:"clean", BuildDate:"2016-09-18T21:00:36Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: GCE+CoreOS+custom k8s
  • OS (e.g. from /etc/os-release): CoreOS
  • Kernel (e.g. uname -a): 4.6.3-coreos
  • Install tools:
  • Others:

What happened:
Could not access a service via its external LB from a Kubernetes pod on a node that does not host a pod for that service.

What you expected to happen:
Traffic should have gone out to the external LB, which would pick a node that is running a pod for that service.

TL;DR: The external load balancer kube-proxy iptables rules should only match traffic originating from outside the cluster/pod networks, i.e. from the external load balancer.
Otherwise, traffic originating from within the k8s cluster to externally load-balanced IP addresses can get blackholed.

We are having another issue with OnlyLocal.
I have a service with the following definition:

{
   "apiVersion": "v1",
   "kind": "Service",
   "metadata": {
      "annotations": {
         "service.alpha.kubernetes.io/external-traffic": "OnlyLocal"
      },
      "labels": {
         "k8s-app": "my-app",
         "kubernetes.io/cluster-service": "true"
      },
      "name": "my-app",
      "namespace": "default"
   },
   "spec": {
      "loadBalancerIP": "1.2.3.4",
      "ports": [
         {
            "name": "http",
            "port": 80,
            "targetPort": 80
         }
      ],
      "selector": {
         "k8s-app": "my-app"
      },
      "type": "LoadBalancer"
   }
}

If I run

curl http://1.2.3.4/test 

From my local box that is outside of the k8s cluster I get a response.

When I run the same command from a k8s node that is not the node that is hosting the pod for my-app I get no response.

curl http://1.2.3.4/test -v
*   Trying 1.2.3.4

Checking the iptables rules. Some stuff truncated:

-A KUBE-SERVICES -d 1.2.3.4/32 -p tcp -m comment --comment "default/my-app:http loadbalancer IP" -m tcp --dport 80 -j KUBE-FW-ABZR2FBH2NLOJM2K
-A KUBE-FW-ABZR2FBH2NLOJM2K -m comment --comment "default/my-app:http loadbalancer IP" -j KUBE-XLB-ABZR2FBH2NLOJM2K
-A KUBE-XLB-ABZR2FBH2NLOJM2K -m comment --comment "default/my-app:http has no local endpoints" -j KUBE-MARK-DROP

So it looks like the traffic gets dropped by kube-proxy.
The KUBE-SERVICES chain seems to match packets that originate from pods on this machine, not only external traffic. This means that if a pod on this machine wants to call a service through the external LB
(e.g. when using external DNS registrations of services), the traffic will be blackholed if the node does not host a pod for the service in question.
A proposed solution would be to augment the KUBE-SERVICES rule to filter out traffic originating from the cluster/pod private networks.
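For illustration, a sketch of what the augmented rule could look like, reusing the chain names from the listing above (the 10.180.0.0/14 pod CIDR is an assumption, borrowed from the cluster-IP rules quoted later in this thread):

```shell
# Hypothetical rule: only traffic NOT sourced from the pod network is sent
# to the load-balancer firewall chain; intra-cluster traffic would instead
# fall through to the ordinary clusterIP service rules rather than being
# marked for drop by KUBE-XLB-.
-A KUBE-SERVICES ! -s 10.180.0.0/14 -d 1.2.3.4/32 -p tcp -m comment --comment "default/my-app:http loadbalancer IP" -m tcp --dport 80 -j KUBE-FW-ABZR2FBH2NLOJM2K
```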

This is related to #29409
@girishkalele

@thockin
Member

thockin commented Sep 22, 2016

Yeah, this is catching local traffic. The problem is that even if we just omit those rules entirely, the VM is configured by GCE to receive the LB VIP, so it still won't do what you expect (which is to egress the network, hit the LB, and come back). We don't want to make this rule do the old-style DNAT+SNAT because the service is expecting to get valid client IPs.

Maybe we can do this case as if it hit the normal service VIP path?

@bprashanth
Contributor

meaning, we effectively install a service with 2 clusterIPs, one with the 10-dot address and another with the public LB VIP. Currently the public LB is not much smarter than clusterIP for services with type=LoadBalancer, so I don't see any immediate benefit in bouncing out?

@thockin
Member

thockin commented Sep 22, 2016

The advantage of bouncing out is that it is what people expect to happen. But I don't think it is possible to circumvent the local interface having that IP...

Doing it as a second entry to the "normal" VIP logic seems easier.

@girishkalele

Would a rule that matches on input interface == cbr0 and sends traffic to the clusterIP load-balancing chain work? We just need to inhibit the SNAT but still do the DNAT across all service endpoints.

For a service with clusterIP=10.0.240.83 and external IP=104.198.18.46, we get the following three rules in the KUBE-SERVICES chain. We need to split rule #3 into two rules: one which matches input-if == cbr0 and jumps to the KUBE-SVC- chain, and one which matches != cbr0 and jumps to the KUBE-FW- chain.

-A KUBE-SERVICES ! -s 10.180.0.0/14 -d 10.0.240.83/32 -p udp -m comment --comment "default/distributor:distributor cluster IP" -m udp --dport 10001 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.0.240.83/32 -p udp -m comment --comment "default/distributor:distributor cluster IP" -m udp --dport 10001 -j KUBE-SVC-UUSCPQ5WCIPX4DAV
-A KUBE-SERVICES -d 104.198.18.46/32 -p udp -m comment --comment "default/distributor:distributor loadbalancer IP" -m udp --dport 10001 -j KUBE-FW-UUSCPQ5WCIPX4DAV
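A sketch of how that split of rule #3 might look, assuming cbr0 is the pod bridge (note that an -i match only takes effect on the PREROUTING path; locally generated traffic traversing OUTPUT has no input interface, so it would still need a source-CIDR variant):

```shell
# Pod-originated traffic: DNAT across all endpoints via the service chain,
# skipping the KUBE-FW- firewall/drop logic (and its SNAT mark).
-A KUBE-SERVICES -i cbr0 -d 104.198.18.46/32 -p udp -m comment --comment "default/distributor:distributor loadbalancer IP" -m udp --dport 10001 -j KUBE-SVC-UUSCPQ5WCIPX4DAV
# Everything else keeps the existing load-balancer firewall path.
-A KUBE-SERVICES ! -i cbr0 -d 104.198.18.46/32 -p udp -m comment --comment "default/distributor:distributor loadbalancer IP" -m udp --dport 10001 -j KUBE-FW-UUSCPQ5WCIPX4DAV
```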

@mwitkow

mwitkow commented Sep 23, 2016

With the GCE L3 load balancer, the VIP is not technically an interface IP. The L3 load balancer IP is configured on the receiving machines through something we used to call "route 66" (a marvelous hack; see coreos/bugs#1195 for a bug with context).
We could, technically, remove it and thus let the traffic escape. Or maybe there is some iptables magic you can do to force a route to the outside?

We'd prefer not to "short-circuit" it into our cluster (using a private IP of the caller and not going through the load balancer): a) we're using DNS for draining/undraining at the moment; b) we'd rather not see "external" traffic come from internal IP ranges.

@thockin
Member

thockin commented Sep 23, 2016

Interesting note on requirements. I have not found a way to force it to route outside and back in through the LB, but that doesn't mean there isn't one.

To get rid of route66, we'd need to not run the standard GCE address manager stuff, and I am nervous about what other implications that would have.


@bprashanth
Contributor

Hmm, I think you can set a fwmark on packets with an internal source and the public VIP as destination, then feed them through a routing table at a higher priority than the local table to skip the local redirection. I tried this experiment at one point, but I can't remember the outcome. I think it mostly worked but had some fishy behavior that I didn't dig into.
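A rough sketch of that fwmark + policy-routing experiment (the CIDR, VIP, gateway, mark value, and table number are all illustrative; the local-table shuffle is needed because the local table normally sits at rule priority 0, ahead of everything else):

```shell
# Mark pod-sourced packets aimed at the public VIP (addresses illustrative).
iptables -t mangle -A PREROUTING -s 10.180.0.0/14 -d 1.2.3.4/32 \
  -j MARK --set-mark 0x4000

# Re-add the local table at a lower priority so a fwmark rule can run first.
ip rule add from all lookup local priority 100
ip rule del priority 0

# Marked packets consult table 66 first, forcing egress via the node's
# default gateway instead of the locally-programmed VIP route.
ip rule add fwmark 0x4000 lookup 66 priority 50
ip route add default via 10.128.0.1 dev eth0 table 66
```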

k8s-github-robot pushed a commit that referenced this issue Oct 1, 2016
Automatic merge from submit-queue

OnlyLocal nodeports

90% unittests.
Code changes: 
* Jump to XLB from nodePorts for OnlyLocal nodeports
* Jump to services chain from XLB for clusterCIDR (partially fixes #33081)

NodePorts still don't get firewalls: #33586
7 participants