
Pod to external traffic is not masqueraded #40761

Closed
ravishivt opened this issue Jan 31, 2017 · 6 comments

@ravishivt commented Jan 31, 2017

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version): 1.5.1

Environment:

  • Cloud provider or hardware configuration: 3x Ubuntu 16.04 nodes running in VMware vSphere
  • OS (e.g. from /etc/os-release): Ubuntu 16.04
  • Kernel (e.g. uname -a): 4.4.0-57-generic
  • Install tools: kubeadm
  • Others: CNI: Flannel

What happened:

I have a bare-metal cluster with the Flannel CNI. I was originally having issues with pods reaching other cluster services, e.g. kubernetes-dashboard trying to reach kube-apiserver. The root cause was filed in #39823: traffic originating from a pod on the flannel virtual interface and destined for addresses outside the flannel range was getting masqueraded but then dropped by the iptables FORWARD chain. My workaround was to set the default FORWARD policy to ACCEPT.
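For anyone hitting the same thing, that workaround boils down to one command (note that it opens forwarding for all traffic, so treat it as a stopgap rather than a fix):

$ sudo iptables -P FORWARD ACCEPT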

Now I'm hitting a related issue. A pod on a flannel virtual interface can't reach any external resource except the specific services kube-proxy is masquerading with --masquerade-all. Running nslookup google.com or ping 8.8.8.8 both time out. My understanding of --masquerade-all is that it adds -j KUBE-MARK-MASQ to all Kubernetes services. That works for pods trying to reach k8s services, but it does nothing for traffic headed elsewhere; without also masquerading that traffic, the upstream switch doesn't know how to route the response back to the node/pod. I manually worked around the issue by adding iptables -t nat -A POSTROUTING -s 10.96.0.0/16 -o ens160 -j MASQUERADE, where ens160 is my primary network interface and 10.96.0.0/16 is my flannel overlay network. I then removed the now-unnecessary --masquerade-all flag from kube-proxy.
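For reference, a slightly tighter variant of the same rule (a sketch, with my CIDR and interface substituted in) masquerades only traffic actually leaving the cluster network, skipping pod-to-pod traffic:

$ sudo iptables -t nat -A POSTROUTING -s 10.96.0.0/16 ! -d 10.96.0.0/16 -o ens160 -j MASQUERADE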

My test pod:

$ kubectl get pods busybox
NAME      READY     STATUS    RESTARTS   AGE
busybox   1/1       Running   3          22h
deploy@ravi-kube196:~$ kubectl exec -it busybox -- sh
/ # ping 8.8.8.8

Without the iptables masquerade rule in place, note that the source IP on the wire is the pod's flannel IP, 10.96.2.50.

$ sudo tcpdump -n -tttt -i ens160 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
2017-01-31 21:54:08.145366 IP 10.96.2.50 > 8.8.8.8: ICMP echo request, id 9728, seq 0, length 64
2017-01-31 21:54:09.145593 IP 10.96.2.50 > 8.8.8.8: ICMP echo request, id 9728, seq 1, length 64
2017-01-31 21:54:10.145739 IP 10.96.2.50 > 8.8.8.8: ICMP echo request, id 9728, seq 2, length 64
2017-01-31 21:54:10.149795 IP 10.163.148.2 > 10.163.148.197: ICMP redirect 15.111.203.185 to net 10.163.148.254, length 36
2017-01-31 21:54:11.145928 IP 10.96.2.50 > 8.8.8.8: ICMP echo request, id 9728, seq 3, length 64

I then added sudo iptables -t nat -A POSTROUTING -s 10.96.0.0/16 -o ens160 -j MASQUERADE. Note that the source IP is now the node IP, 10.163.148.197.

$ sudo tcpdump -n -tttt -i ens160 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
2017-01-31 21:52:13.733076 IP 10.163.148.197 > 8.8.8.8: ICMP echo request, id 9216, seq 0, length 64
2017-01-31 21:52:13.735091 IP 8.8.8.8 > 10.163.148.197: ICMP echo reply, id 9216, seq 0, length 64
2017-01-31 21:52:14.733208 IP 10.163.148.197 > 8.8.8.8: ICMP echo request, id 9216, seq 1, length 64
2017-01-31 21:52:14.735207 IP 8.8.8.8 > 10.163.148.197: ICMP echo reply, id 9216, seq 1, length 64

What you expected to happen:

I'm hoping we can make this clearer to end users: either A) document that on bare-metal clusters the external switches/routers must be aware of the CNI networks and able to route replies back to the virtual interfaces, or B) document that a general masquerade rule needs to be added to iptables if pod -> external access is desired.

How to reproduce it (as minimally and precisely as possible):

Create a cluster with:

  • Flannel CNI enabled
  • kube-proxy set to --masquerade-all
  • Run a pod on the flannel network and try to reach an external IP.
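For example, with a busybox pod already running on the flannel network (like the test pod above), the failure looks like this (a sketch; both commands hang and eventually time out until a masquerade rule is in place):

$ kubectl exec busybox -- ping -c 3 8.8.8.8
$ kubectl exec busybox -- nslookup google.com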

Anything else we need to know:

Off-topic mild rant: With the combination of #39823 and this, I really wonder how people are getting bare-metal clusters working properly without extensive networking and iptables knowledge. Either I'm doing something wrong in my k8s setup (fairly standard kubeadm deploy), choosing a host OS with "bad" defaults (Ubuntu 16.04), wanting something non-standard in k8s (pod -> external access), or bare-metal support is in its infancy and needs some better hand-holding in the setup docs.

@jbeda (Contributor) commented Feb 2, 2017
I think this is a dupe of #40182. But not 100% sure.

@ravishivt (Author) commented Feb 2, 2017

@jbeda I don't believe this is a duplicate. #39823 and #40182 are likely duplicates of each other, but this is separate IMO. Those are about working around the iptables FORWARD DROP policy, and this one is about kube-proxy masquerading all pod outbound traffic so the response can be routed back to the pod.

@dcbw (Member) commented Mar 20, 2017

I'm not sure that masquerading pod -> external traffic (initiated by the pod, not incoming traffic to a given service) is kube-proxy's job. That's typically the job of the network plugin since the network plugin is what knows about pod networking, sets up the IP addresses of the pods, and knows what CIDRs are internal and which are external to the cluster.

Usually you'd do this with the --ip-masq flag to flannel, which is false by default and is documented as "setup IP masquerade rule for traffic destined outside of overlay network". That sounds like what you want.
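For example, if flannel runs as the usual kube-flannel DaemonSet, enabling it means adding the flag to the flanneld command in the manifest (a sketch; the binary path and other arguments depend on your manifest version):

command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]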

@thockin added sig/network and removed sig/network labels May 16, 2017

@caseydavenport (Member) commented May 18, 2017

Another option here is the ip-masq-agent.
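The agent masquerades traffic to any destination not listed in its non-masquerade CIDRs, which it reads from a config file typically mounted from a ConfigMap. A sketch, assuming the flannel CIDR from this issue:

nonMasqueradeCIDRs:
  - 10.96.0.0/16
masqLinkLocal: false
resyncInterval: 60s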

I agree with @dcbw though that this shouldn't be the kube-proxy's responsibility (although it does provide similar behavior in the other direction when --cluster-cidr is set).
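For reference, that behavior is enabled by passing the pod network CIDR to kube-proxy (a sketch using the CIDR from this issue; kubeadm deployments set this in the kube-proxy manifest):

$ kube-proxy --cluster-cidr=10.96.0.0/16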

@caseydavenport (Member) commented May 18, 2017

/assign

@caseydavenport (Member) commented May 26, 2017

/close
