Networking: external LB in 1.4.0-beta.8 blackholes external ip traffic #33081

Closed
kdima opened this issue Sep 20, 2016 · 7 comments · Fixed by #33587

kdima commented Sep 20, 2016

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):


Is this a BUG REPORT or FEATURE REQUEST? (choose one):

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"4+", GitVersion:"v1.4.0-beta.8", GitCommit:"3040f87c570a772ce94349b379f41f329494a4f7", GitTreeState:"clean", BuildDate:"2016-09-18T21:06:37Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4+", GitVersion:"v1.4.0-beta.8", GitCommit:"3040f87c570a772ce94349b379f41f329494a4f7", GitTreeState:"clean", BuildDate:"2016-09-18T21:00:36Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: GCE+CoreOS+custom k8s
  • OS (e.g. from /etc/os-release): CoreOS
  • Kernel (e.g. uname -a): 4.6.3-coreos
  • Install tools:
  • Others:

What happened:
Could not access a service via its external LB from a Kubernetes pod on a node that does not host a pod for that service.

What you expected to happen:
Traffic should have gone out to the external LB, which would pick a node that is running a pod for that service.

TL;DR: The external load balancer kube-proxy iptables rules should only match traffic originating from outside the cluster/pod networks, i.e. from the external load balancer.
Otherwise, traffic originating from within the k8s cluster to externally load-balanced IP addresses can get blackholed.

We are having another issue with OnlyLocal.
I have a service with the following definition:

{
   "apiVersion": "v1",
   "kind": "Service",
   "metadata": {
      "annotations": {
         "service.alpha.kubernetes.io/external-traffic": "OnlyLocal"
      },
      "labels": {
         "k8s-app": "my-app",
         "kubernetes.io/cluster-service": "true"
      },
      "name": "my-app",
      "namespace": "default"
   },
   "spec": {
      "loadBalancerIP": "1.2.3.4",
      "ports": [
         {
            "name": "http",
            "port": 80,
            "targetPort": 80
         }
      ],
      "selector": {
         "k8s-app": "my-app"
      },
      "type": "LoadBalancer"
   }
}

If I run

curl http://1.2.3.4/test 

From my local box that is outside of the k8s cluster I get a response.

When I run the same command from a k8s node that is not the node that is hosting the pod for my-app I get no response.

curl http://1.2.3.4/test -v
*   Trying 1.2.3.4

Checking the iptables rules. Some stuff truncated:

-A KUBE-SERVICES -d 1.2.3.4/32 -p tcp -m comment --comment "default/my-app:http loadbalancer IP" -m tcp --dport 80 -j KUBE-FW-ABZR2FBH2NLOJM2K
-A KUBE-FW-ABZR2FBH2NLOJM2K -m comment --comment "default/my-app:http loadbalancer IP" -j KUBE-XLB-ABZR2FBH2NLOJM2K
-A KUBE-XLB-ABZR2FBH2NLOJM2K -m comment --comment "default/my-app:http has no local endpoints" -j KUBE-MARK-DROP

So it looks like the traffic gets dropped by kube-proxy.
The KUBE-SERVICES chain seems to match packets that originate from pods on this machine, not only external traffic. This means that if a pod on this machine wants to call a service through the external LB
(e.g. when using external DNS registrations of services), the traffic will be blackholed if the node does not host a pod for the service in question.
A proposed solution would be to augment the KUBE-SERVICES rule to filter out traffic originating from the cluster/pod private networks.
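For illustration, a sketch of what the augmented rule could look like, reusing the chain names from the listing above (the 10.180.0.0/14 pod CIDR is an assumption, borrowed from the cluster-IP rules quoted later in this thread):

```shell
# Hypothetical rule: only traffic NOT sourced from the pod network is sent
# to the load-balancer firewall chain; intra-cluster traffic would instead
# fall through to the ordinary clusterIP service rules rather than being
# marked for drop by KUBE-XLB-.
-A KUBE-SERVICES ! -s 10.180.0.0/14 -d 1.2.3.4/32 -p tcp -m comment --comment "default/my-app:http loadbalancer IP" -m tcp --dport 80 -j KUBE-FW-ABZR2FBH2NLOJM2K
```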

This is related to #29409
@girishkalele

@thockin
Member

thockin commented Sep 22, 2016

Yeah, this is catching local traffic. The problem is that even if we just omit those rules entirely, the VM is configured by GCE to receive the LB VIP, so it still won't do what you expect (which is to egress the network, hit the LB, and come back). We don't want to make this rule do the old-style DNAT+SNAT because the service is expecting to get valid client IPs.

Maybe we can do this case as if it hit the normal service VIP path?

@bprashanth
Contributor

meaning, we effectively install a service with 2 clusterIPs, one with the 10-dot address and another with the public LB VIP. Currently the public LB is not much smarter than clusterIP for services with type=LoadBalancer, so I don't see any immediate benefit in bouncing out?

@thockin
Member

thockin commented Sep 22, 2016

The advantage of bouncing out is that it is what people expect to happen. But I don't think it is possible to circumvent the local interface having that IP...

Doing it as a second entry to the "normal" VIP logic seems easier.

@girishkalele

Would a rule that matches on input interface == cbr0 and sends traffic to the clusterIP load-balancing chain work? We just need to inhibit the SNAT but still do the DNAT across all service endpoints.

For a service with clusterIP=10.0.240.83 and external IP=104.198.18.46, we get the following three rules in the KUBE-SERVICES chain. We need to split rule #3 into two rules: one which matches input-if == cbr0 and jumps to the KUBE-SVC- chain, and one which matches != cbr0 and jumps to the KUBE-FW- chain.

-A KUBE-SERVICES ! -s 10.180.0.0/14 -d 10.0.240.83/32 -p udp -m comment --comment "default/distributor:distributor cluster IP" -m udp --dport 10001 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.0.240.83/32 -p udp -m comment --comment "default/distributor:distributor cluster IP" -m udp --dport 10001 -j KUBE-SVC-UUSCPQ5WCIPX4DAV
-A KUBE-SERVICES -d 104.198.18.46/32 -p udp -m comment --comment "default/distributor:distributor loadbalancer IP" -m udp --dport 10001 -j KUBE-FW-UUSCPQ5WCIPX4DAV
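A sketch of how that split of rule #3 might look, assuming cbr0 is the pod bridge (note that an -i match only takes effect on the PREROUTING path; locally generated traffic traversing OUTPUT has no input interface, so it would still need a source-CIDR variant):

```shell
# Pod-originated traffic: DNAT across all endpoints via the service chain,
# skipping the KUBE-FW- firewall/drop logic (and its SNAT mark).
-A KUBE-SERVICES -i cbr0 -d 104.198.18.46/32 -p udp -m comment --comment "default/distributor:distributor loadbalancer IP" -m udp --dport 10001 -j KUBE-SVC-UUSCPQ5WCIPX4DAV
# Everything else keeps the existing load-balancer firewall path.
-A KUBE-SERVICES ! -i cbr0 -d 104.198.18.46/32 -p udp -m comment --comment "default/distributor:distributor loadbalancer IP" -m udp --dport 10001 -j KUBE-FW-UUSCPQ5WCIPX4DAV
```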

@mwitkow

mwitkow commented Sep 23, 2016

With the GCE L3 load balancer, the VIP is not technically an interface IP. The L3 load balancer IP is configured on the receiving machines through something we used to call "route 66" (a marvelous hack; see coreos/bugs#1195 for a bug with context).
We could, technically, remove it and thus let the traffic escape. Or maybe there is some iptables magic you can do to force a route to the outside?

We'd prefer not to "short-circuit" it into our cluster (using a private IP of the caller and not going through the load balancer): a) we're using DNS for draining/undraining at the moment; b) we'd rather not see "external" traffic come from internal IP ranges.

@thockin
Member

thockin commented Sep 23, 2016

Interesting note on requirements. I have not found a way to force it to route outside and back in through the LB, but that doesn't mean there isn't one.

To get rid of route66, we'd need to not run the standard GCE address manager stuff, and I am nervous about what other implications that would have.


@bprashanth
Contributor

Hmm, I think you can set a fwmark on packets with an internal source and the public VIP as destination, then feed them through a routing table at a higher priority than the local table to skip the local redirection. I tried this experiment at one point, but I can't remember the outcome. I think it mostly worked but had some fishy behavior that I didn't dig into.
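A rough sketch of that fwmark + policy-routing experiment (the CIDR, VIP, gateway, mark value, and table number are all illustrative; the local-table shuffle is needed because the local table normally sits at rule priority 0, ahead of everything else):

```shell
# Mark pod-sourced packets aimed at the public VIP (addresses illustrative).
iptables -t mangle -A PREROUTING -s 10.180.0.0/14 -d 1.2.3.4/32 \
  -j MARK --set-mark 0x4000

# Re-add the local table at a lower priority so a fwmark rule can run first.
ip rule add from all lookup local priority 100
ip rule del priority 0

# Marked packets consult table 66 first, forcing egress via the node's
# default gateway instead of the locally-programmed VIP route.
ip rule add fwmark 0x4000 lookup 66 priority 50
ip route add default via 10.128.0.1 dev eth0 table 66
```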

k8s-github-robot pushed a commit that referenced this issue Oct 1, 2016
Automatic merge from submit-queue

OnlyLocal nodeports

90% unittests.
Code changes: 
* Jump to XLB from nodePorts for OnlyLocal nodeports
* Jump to services chain from XLB for clusterCIDR (partially fixes #33081)

NodePorts still don't get firewalls: #33586
7 participants