
GKE: Cannot connect from container to subnet connected via VPN #6545

Closed
chaten opened this Issue Apr 7, 2015 · 52 comments

chaten commented Apr 7, 2015

I have a GCE network connected to an AWS network over a GCP VPN. I can connect from a Kubernetes node to a box in AWS, but I can't connect from inside a container. Traffic seems to stop at the container bridge. The cluster was created using Google Container Engine (GKE).

@vishh vishh added the team/cluster label Apr 7, 2015


roberthbailey (Member) commented Apr 7, 2015

What version of GKE are you running?

roberthbailey (Member) commented Apr 7, 2015

Older GKE deployments added a tag to the cluster firewall rule (e.g. k8s--all) that only allowed traffic between the VMs in the cluster. Newer deployments don't have this restriction. If you have an older deployment, you should be able to edit the firewall rule (in the Cloud Console, go to Networks and click on your network name to see the firewall rules) and remove the tag that restricts the rule to the GKE VMs.

chaten (Author) commented Apr 7, 2015

This cluster was created today and is using 0.14.1.

The node VMs can talk across the VPN, the containers cannot.

chaten (Author) commented Apr 8, 2015

I'd also add that we tried running a plain Docker container on the node and talking to the services in AWS. That didn't work either.

chaten (Author) commented Apr 8, 2015

The 10.20.0.0/16 network is connected via VPN.

chaten@k8s-dev1-f-node-1:~$ telnet 10.20.200.114 22
Trying 10.20.200.114...
Connected to 10.20.200.114.
Escape character is '^]'.
SSH-2.0-OpenSSH_6.6p1 Ubuntu-2ubuntu1
^Cquit
Connection closed by foreign host.
chaten@k8s-dev1-f-node-1:~$ ip route
default via 10.240.0.1 dev eth0 
10.24.1.0/24 dev cbr0  proto kernel  scope link  src 10.24.1.1 
10.240.0.1 dev eth0  scope link 
chaten@k8s-dev1-f-node-1:~$ sudo docker run -ti ubuntu bash
root@cae943d77939:/# telnet 10.20.200.114 22
Trying 10.20.200.114...
^C
root@cae943d77939:/# ip route
default via 10.24.1.1 dev eth0 
10.24.1.0/24 dev eth0  proto kernel  scope link  src 10.24.1.4 
chaten (Author) commented Apr 8, 2015

Removing "iptables=false" and "ip-masq=false" from /etc/default/docker fixes the issue. Why are these disabled?

thockin (Member) commented Apr 8, 2015

Just to catch up, I have some questions.

What is the IP of the VM in GCE's network? 10.240.0.1?

What is the IP of the VM in AWS's network? 10.20.200.114?

What is the CIDR assigned to cbr0 on the GCE VM? 10.24.1.0/24?

Can you run (as root) iptables-save on your VM? I want to understand why those docker flags would make a difference.


chaten (Author) commented Apr 8, 2015

eth0: 10.245.70.189
cbr0: 10.24.1.1

10.20.200.114 is in AWS

I have two clusters spun up; this is from the cluster where I modified /etc/default/docker (dev1-f):

# Generated by iptables-save v1.4.14 on Wed Apr  8 00:32:06 2015
*filter
:INPUT ACCEPT [13209:40328411]
:FORWARD ACCEPT [1161:86778]
:OUTPUT ACCEPT [7973:1024920]
:DOCKER - [0:0]
-A FORWARD -o cbr0 -j DOCKER
-A FORWARD -o cbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i cbr0 ! -o cbr0 -j ACCEPT
-A FORWARD -i cbr0 -o cbr0 -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 8080 -j ACCEPT
COMMIT
# Completed on Wed Apr  8 00:32:06 2015
# Generated by iptables-save v1.4.14 on Wed Apr  8 00:32:06 2015
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:DOCKER - [0:0]
:KUBE-PORTALS-CONTAINER - [0:0]
:KUBE-PORTALS-HOST - [0:0]
-A PREROUTING -j KUBE-PORTALS-CONTAINER
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -j KUBE-PORTALS-HOST
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 10.24.1.0/24 ! -o cbr0 -j MASQUERADE
-A POSTROUTING ! -d 10.0.0.0/8 -o eth0 -j MASQUERADE
-A KUBE-PORTALS-CONTAINER -d 10.27.240.2/32 -p tcp -m comment --comment kubernetes -m tcp --dport 443 -j REDIRECT --to-ports 41598
-A KUBE-PORTALS-CONTAINER -d 10.27.240.1/32 -p tcp -m comment --comment kubernetes-ro -m tcp --dport 80 -j REDIRECT --to-ports 37401
-A KUBE-PORTALS-CONTAINER -d 10.27.240.10/32 -p udp -m comment --comment kube-dns -m udp --dport 53 -j REDIRECT --to-ports 44606
-A KUBE-PORTALS-HOST -d 10.27.240.2/32 -p tcp -m comment --comment kubernetes -m tcp --dport 443 -j DNAT --to-destination 10.245.70.189:41598
-A KUBE-PORTALS-HOST -d 10.27.240.1/32 -p tcp -m comment --comment kubernetes-ro -m tcp --dport 80 -j DNAT --to-destination 10.245.70.189:37401
-A KUBE-PORTALS-HOST -d 10.27.240.10/32 -p udp -m comment --comment kube-dns -m udp --dport 53 -j DNAT --to-destination 10.245.70.189:44606
COMMIT
# Completed on Wed Apr  8 00:32:06 2015

And this is from an unmodified cluster (dev1-c)

# Generated by iptables-save v1.4.14 on Wed Apr  8 00:33:28 2015
*filter
:INPUT ACCEPT [38695:92065746]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [31758:6770431]
:DOCKER - [0:0]
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 8080 -j ACCEPT
COMMIT
# Completed on Wed Apr  8 00:33:28 2015
# Generated by iptables-save v1.4.14 on Wed Apr  8 00:33:28 2015
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:KUBE-PORTALS-CONTAINER - [0:0]
:KUBE-PORTALS-HOST - [0:0]
-A PREROUTING -j KUBE-PORTALS-CONTAINER
-A OUTPUT -j KUBE-PORTALS-HOST
-A POSTROUTING ! -d 10.0.0.0/8 -o eth0 -j MASQUERADE
-A KUBE-PORTALS-CONTAINER -d 10.195.240.2/32 -p tcp -m comment --comment "default/kubernetes" -m tcp --dport 443 -j REDIRECT --to-ports 51122
-A KUBE-PORTALS-CONTAINER -d 10.195.240.1/32 -p tcp -m comment --comment "default/kubernetes-ro" -m tcp --dport 80 -j REDIRECT --to-ports 55413
-A KUBE-PORTALS-CONTAINER -d 10.195.240.10/32 -p udp -m comment --comment "default/kube-dns" -m udp --dport 53 -j REDIRECT --to-ports 60616
-A KUBE-PORTALS-HOST -d 10.195.240.2/32 -p tcp -m comment --comment "default/kubernetes" -m tcp --dport 443 -j DNAT --to-destination 10.254.32.136:51122
-A KUBE-PORTALS-HOST -d 10.195.240.1/32 -p tcp -m comment --comment "default/kubernetes-ro" -m tcp --dport 80 -j DNAT --to-destination 10.254.32.136:55413
-A KUBE-PORTALS-HOST -d 10.195.240.10/32 -p udp -m comment --comment "default/kube-dns" -m udp --dport 53 -j DNAT --to-destination 10.254.32.136:60616
COMMIT
# Completed on Wed Apr  8 00:33:28 2015
ghost commented Apr 8, 2015

Curious. I have no idea why those would matter.

Willing to experiment?

One by one, run these iptables commands and see when the ping to AWS breaks down.

iptables -t filter -D DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 8080 -j ACCEPT
iptables -t filter -D FORWARD -o docker0 -j DOCKER
iptables -t filter -D FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -t filter -D FORWARD -i docker0 ! -o docker0 -j ACCEPT
iptables -t filter -D FORWARD -i docker0 -o docker0 -j ACCEPT

I expect it to still be working when you get here:

iptables -t filter -D FORWARD -o cbr0 -j DOCKER
iptables -t filter -D FORWARD -o cbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -t filter -D FORWARD -i cbr0 -o cbr0 -j ACCEPT
iptables -t filter -D FORWARD -i cbr0 ! -o cbr0 -j ACCEPT


ghost commented Apr 8, 2015

ifconfig -a for me?

On Tue, Apr 7, 2015 at 5:55 PM, Michael Chaten wrote:

> It broke on the last rule

chaten (Author) commented Apr 8, 2015

None of the rules above broke it; removing the masquerade does, however:

iptables -t nat -D POSTROUTING -s 10.24.1.0/24 ! -o cbr0 -j MASQUERADE

ifconfig -a
cbr0      Link encap:Ethernet  HWaddr 16:a8:0b:0b:ee:83  
          inet addr:10.24.1.1  Bcast:0.0.0.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1460  Metric:1
          RX packets:54688 errors:0 dropped:0 overruns:0 frame:0
          TX packets:52205 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:10148375 (9.6 MiB)  TX bytes:12174408 (11.6 MiB)

docker0   Link encap:Ethernet  HWaddr 56:84:7a:fe:97:99  
          inet addr:172.17.42.1  Bcast:0.0.0.0  Mask:255.255.0.0
          BROADCAST MULTICAST  MTU:1460  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:536 (536.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr 42:01:0a:f5:46:bd  
          inet addr:10.245.70.189  Bcast:10.245.70.189  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1460  Metric:1
          RX packets:154517 errors:0 dropped:0 overruns:0 frame:0
          TX packets:82675 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:377240451 (359.7 MiB)  TX bytes:13816416 (13.1 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:331 errors:0 dropped:0 overruns:0 frame:0
          TX packets:331 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:25660 (25.0 KiB)  TX bytes:25660 (25.0 KiB)

veth00937f2 Link encap:Ethernet  HWaddr 92:d0:92:59:08:d6  
          UP BROADCAST RUNNING  MTU:1460  Metric:1
          RX packets:3079 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3147 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:380421 (371.5 KiB)  TX bytes:359382 (350.9 KiB)

veth148bd12 Link encap:Ethernet  HWaddr 16:a8:0b:0b:ee:83  
          UP BROADCAST RUNNING  MTU:1460  Metric:1
          RX packets:3195 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3028 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1062751 (1.0 MiB)  TX bytes:1048884 (1.0 MiB)

veth4958a70 Link encap:Ethernet  HWaddr a2:71:90:68:db:ce  
          UP BROADCAST RUNNING  MTU:1460  Metric:1
          RX packets:20 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1412 (1.3 KiB)  TX bytes:1210 (1.1 KiB)
chaten (Author) commented Apr 8, 2015

Adding iptables -t nat -A POSTROUTING -s 10.192.1.0/24 ! -o cbr0 -j MASQUERADE to a fresh cluster fixes the problem. (10.192.1.0/24 is the subnet for this new node.)

I tried setting --ip-masq=true in /etc/default/docker, but it had no effect while --iptables=false was set.

Seems like we need a masquerade rule.

thockin (Member) commented Apr 8, 2015

Fascinating. I have no idea what that rule is doing.

iptables -t filter -D FORWARD -i cbr0 ! -o cbr0 -j ACCEPT

It should be looking at each packet coming from cbr0 (from a container) and not destined for cbr0 (to a container) and accepting the packet. But without a default drop rule, that should be a no-op.


thockin (Member) commented Apr 8, 2015

On Tue, Apr 7, 2015 at 6:06 PM, Michael Chaten wrote:

> None of the rules above broke it; removing the masquerade does, however:
> iptables -t nat -D POSTROUTING -s 10.24.1.0/24 ! -o cbr0 -j MASQUERADE

Wait, which is it - this masquerade rule, or the last one I asked you to remove?

chaten (Author) commented Apr 8, 2015

Masquerade rule. We initially thought it was the last one you posted, but it was just some network latency.

thockin (Member) commented Apr 8, 2015

It makes more sense if it is this masquerade rule. And yet, we cannot keep this masquerade rule as it is.

What this says is that any traffic from a container on this machine which is going off the machine should masquerade as the machine. This means that any pod-to-pod traffic will be masqueraded. You don't want that.

I don't know the new VPN product. Looking at the instructions, there are a lot of steps that I could imagine affecting this. I'm going to see if I can get someone who knows the VPN product internals to help...


mikedanese (Member) commented Apr 8, 2015

As a workaround, what if salt created a rule:

iptables -t nat -A POSTROUTING -s 10.192.1.0/24 ! -o cbr0 ! -d 10.192.0.0/14 -j MASQUERADE

where 10.192.1.0/24 is the local pod subnet and 10.192.0.0/14 is the cluster pod subnet? No masquerading pod to pod.

ghost commented Apr 8, 2015

If we know the cluster's subnet this might work. I have someone from networking involved; we will see what they say.

thockin (Member) commented Apr 8, 2015

From our networking folks:

The first thing is to confirm the AWS to GCE path is working correctly. From 10.20.200.114 (AWS), can you please ping 10.24.1.1? Please make sure your firewall rules allow ICMP traffic from the correct source address. While that's running, tcpdump on eth0 of the VM: are the packets arriving correctly?

Once we've confirmed that, we can test in the opposite direction.


chaten (Author) commented Apr 8, 2015

src        target     tcpdump on target   ping
VM         AWS        yes                 yes
AWS        VM         yes                 yes
Container  VM         yes                 yes
Container  AWS        yes                 no
AWS        Container  no                  no
chaten (Author) commented Apr 8, 2015

AWS -> container traceroute goes through the VPN, but ends in a black hole inside GCE.
Container -> AWS traceroute shows 10.192.1.1 and then a black hole (the cluster is based on 10.192.0.0/14 this time).

I have a GCE firewall rule allowing 10.0.0.0/8 -> node & master for all TCP, UDP and ICMP traffic.
I have routes set up through the VPN in AWS and a security group allowing all traffic in from 10.0.0.0/8 and all traffic out.

mikedanese (Member) commented Apr 8, 2015

Seems like the pod network might need to be added as a "left" (local) subnet to the GCP VPN. Right now it probably only adds the subnet of the network it's created for to its left subnets. According to the openswan docs (which you guys might use? the logs from the GCP VPN look familiar): "The configured subnets of the peers may differ, the protocol narrows it to the greatest common subnet."

thockin (Member) commented Apr 10, 2015

Unfortunately, based on some investigation, it sounds like this will not work with the VPN product as it exists today. The downside of pushing the envelope is that sometimes you get ahead of what the rest of the ecosystem is ready for. We'll get an internal bug going and see how we can triage this.

Sorry for the let-down.

Tim

@thockin thockin closed this Apr 10, 2015

jasonmoo commented Aug 2, 2016

@thockin was this ever triaged internally?

thockin (Member) commented Aug 2, 2016

This should be fixed with the current GCE VPN product, but I have not manually verified it.

bobveznat (Contributor) commented Aug 30, 2016

I encountered this problem today using the current VPN product (with BGP no less). I found that a specific NAT rule in iptables was a bit overzealous:

-A POSTROUTING ! -d 10.0.0.0/8 -m comment --comment "kubenet: SNAT for outbound traffic from cluster" -m addrtype ! --dst-type LOCAL -j MASQUERADE

That basically says NAT anything that's not going to 10.0.0.0/8. However, my cluster IP space is only 10.48.0.0/14.

I changed the rule to this:
-A POSTROUTING ! -d 10.48.0.0/14 -m comment --comment "kubenet: SNAT for outbound traffic from cluster" -m addrtype ! --dst-type LOCAL -j MASQUERADE

And at least so far this is happier. I believe that I have seen:

  • pods talk to each other across hosts
  • pods talk to resources on my corporate network (across the VPN)
  • pods talk to resources on the Internet
ghost commented Aug 31, 2016

If you do that, then pod IPs are lost on traffic that goes to your own network but isn't in your kube cluster. Basically, that rule is designed to appease the one-to-one edge NAT, which requires the source IP to be the VM's.

Are you saying that GCE's VPN is dropping pod traffic?


bobveznat (Contributor) commented Aug 31, 2016

I want the pods to get NATed before going back to my corporate network. Not sure about everyone else, but I'm unwilling to take an entire /14 and dedicate it to a single k8s cluster (there are only 64 /14s in 10/8), and so I do not route it anywhere. It's an island.

If we had ipv6 (we do, a /48 in total) my opinion might be different.


jasonmoo commented Aug 31, 2016

Just to confirm what worked for me:

Setting up a /16 subnet in GCP and deploying a kube cluster to it with its own /16:

10.10.0.0/16 - subnet
10.11.0.0/16 - kube cluster IP range

Then adding 10.10.0.0/15 to the routing table on the AWS side of the VPN.

thockin (Member) commented Aug 31, 2016

In that case, you need to manage the additional masquerade rule yourself. You can write a pretty trivial DaemonSet that simply ensures that the iptables rule you want is installed. It will run on every machine. We've used this pattern for other "tweak the system" sorts of things.

I can help you craft that DaemonSet if you need, but it sounds like you have a good sense of what to do. Just run privileged and hostNetwork :)
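For illustration, a minimal sketch of such a DaemonSet (the name and image are placeholders, the 10.48.0.0/14 CIDR is the one from earlier in the thread, and extensions/v1beta1 was the DaemonSet API group of that era; this is not an exact manifest from anyone in this thread):

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: masq-fixup
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        app: masq-fixup
    spec:
      hostNetwork: true              # act on the node's own interfaces
      containers:
      - name: masq-fixup
        image: alpine:3.4            # any small image that can install iptables
        securityContext:
          privileged: true           # needed to edit the node's NAT table
        command:
        - /bin/sh
        - -c
        - |
          apk add --no-cache iptables
          iptables -t nat -A POSTROUTING ! -d 10.48.0.0/14 -o eth0 -j MASQUERADE
          while true; do sleep 3600; done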


jayme-github (Contributor) commented Sep 27, 2016

I ran into this too, as corporate subnets are within 10.0.0.0/8 here. Masquerading of pods' IPs worked here (and still works on some nodes) because of two additional NAT rules, and I do not know where they come from:

-A POSTROUTING -d 10.192.52.0/22 -o eth0 -j MASQUERADE
-A POSTROUTING -d 10.0.0.0/24 -o eth0 -j MASQUERADE

Routes to those networks are defined as GCE routes with CloudVPN tunnels as next hop.

@thockin how would you do a "one-shot" DaemonSet to add/change some iptables rules? Just run privileged and with hostNetwork, run iptables ... and then sleep infinity?

roberthbailey (Member) commented Sep 27, 2016

@mikedanese has documentation about how to run a daemonset to do node configuration in kubernetes/contrib#892

thockin (Member) commented Sep 27, 2016

I wouldn't sleep inf; I would actually write it to check that the rules you need are present and, if not, add them, then sleep for 1 minute and repeat. Self-healing FTW.
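As a sketch under the same assumptions as the earlier DaemonSet example (the CIDR is still a placeholder), the one-shot command could be replaced with a self-healing loop:

# -C checks whether the rule exists; append it only when the check fails
while true; do
  iptables -t nat -C POSTROUTING ! -d 10.48.0.0/14 -o eth0 -j MASQUERADE 2>/dev/null \
    || iptables -t nat -A POSTROUTING ! -d 10.48.0.0/14 -o eth0 -j MASQUERADE
  sleep 60
done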


tonglil (Contributor) commented Oct 3, 2016

Is using the daemonset the recommended/only way to move forward with this?

thockin (Member) commented Oct 4, 2016

There's a flag that controls which single range gets exempted from masquerading, but in GKE flags don't persist well. A DS is your best bet.


tonglil (Contributor) commented Oct 4, 2016

OK, thank you. How do you recommend getting the cluster IP range programmatically to pass to the iptables command in the DS?

To be clear, for the container address range 10.68.0.0/14, I am running sudo iptables -t nat -A POSTROUTING ! -d 10.68.0.0/14 -o eth0 -j MASQUERADE, which then gives me connectivity.

roberthbailey (Member) commented Oct 4, 2016

The GKE API will provide you the cluster CIDR. You can use gcloud or a raw API call to get it (see the example after this list). I'd recommend making it a parameter on your DS and fetching/configuring it once, rather than letting each pod do it dynamically, for a couple of reasons:

  1. The value never changes over the life of the cluster so it's unnecessary to keep asking for it (and then implement proper retry / failure handling)
  2. It would require the cloud platform scope on your VMs which you may not otherwise need.
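For example, a one-time fetch with gcloud (the cluster name and zone here are placeholders) whose output can then be substituted into the DaemonSet's iptables rule:

# fetch the cluster CIDR once, from a machine with the right scopes
CLUSTER_CIDR=$(gcloud container clusters describe my-cluster --zone us-central1-a \
  --format 'value(clusterIpv4Cidr)')
echo "${CLUSTER_CIDR}"   # e.g. 10.68.0.0/14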
Smana commented Oct 28, 2016

Hi!
I faced the same issue. The following command was required in order to get my pods to connect through the VPN on GKE:

iptables -t nat -A POSTROUTING ! -d 10.184.1.0/24 -o eth0 -j MASQUERADE

I would like to know how I can automate this task at node startup, for instance when I scale up the k8s cluster. What's the best way to achieve that, please?

tonglil (Contributor) commented Nov 15, 2016

@roberthbailey what is the raw API call to get it?

I attempted the following methods by running gcloud compute ssh <node> first to test, but I am stuck:

  • Running gcloud container clusters describe test-proxy --format 'value(clusterIpv4Cidr)' gives ERROR: (gcloud.container.clusters.describe) ResponseError: code=403, message=Request had insufficient authentication scopes.. And on the new GCI image VMs, gcloud is not installed.
  • kubectl cluster-info dump or kubectl proxy to try to parse the output yields The connection to the server localhost:8080 was refused - did you specify the right host or port?. How can I determine what port to set to connect to Kubernetes?
roberthbailey (Member) commented Nov 21, 2016

Hi @tonglil - sorry for the slow response, I've been out of the office this past week.

Running gcloud container clusters describe test-proxy --format 'value(clusterIpv4Cidr)' gives ERROR: (gcloud.container.clusters.describe) ResponseError: code=403, message=Request had insufficient authentication scopes.

You'd need to add the cloud platform scope to your VM for it to have permissions to access the GKE API (or a service account with the clusters.read IAM permission). Alternatively, you could fetch the kube-env metadata entry and grep for CLUSTER_IP_RANGE to find the CIDR (although this isn't a stable API and may break/change in future versions).

And on the new GCI image VMs, gcloud is not installed.

Yeah, this is annoying. You can install it into the toolbox (run toolbox and then install gcloud). I've also had success running the google/cloud-sdk docker image to execute gcloud commands on GCI.

kubectl cluster-info dump or kubectl proxy to try to parse the output yields The connection to the server localhost:8080 was refused - did you specify the right host or port?. How can I determine what port to set to connect to Kubernetes?

Since you are running this from a node you'll need to pass kubectl a proper kubeconfig file. The easiest way is to impersonate the kubelet -- copy /var/lib/kubelet/kubeconfig and add in the address of the apiserver (can be found by running ps -elf | grep kubelet and looking for the --api-servers flag). Then pass that file to kubectl as a flag (--kubeconfig).
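For instance, the kube-env approach from a node would look roughly like this (again, not a stable interface):

# read the node's kube-env instance attribute from the metadata server
curl -s -H 'Metadata-Flavor: Google' \
  http://metadata/computeMetadata/v1/instance/attributes/kube-env \
  | grep CLUSTER_IP_RANGE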

tonglil (Contributor) commented Nov 21, 2016

@roberthbailey no worries, I appreciate your response!

I was able to get the docker image method to run gcloud commands!

ZONE=$(curl -s -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/zone | cut -d/ -f4)

docker run -it google/cloud-sdk:latest gcloud container clusters describe test-proxy --format 'value(clusterIpv4Cidr)' --zone $ZONE
10.72.0.0/14

I avoided the apt-get method, in case apt-get is no longer available at some point.

Thank you for providing details on the config steps to get kubectl running; I was able to use kubectl via the described method, in case someone else wants to access kubectl on GCI. This method doesn't seem very "automatable". I am able to parse kubectl cluster-info dump after manually editing the kubeconfig file, though:

kubectl --kubeconfig kubeconfig cluster-info dump | grep -oE "cluster-cidr=([0-9]{1,3}\.){3}[0-9]{1,3}\/([1-2][0-9]|3[0-2]|[0-9])\b" | cut -d'=' -f2 | uniq

Some questions:

  1. are there plans to include gcloud on GCI images? It seems like a sane default/include?
  2. would it be sensible to ask for a GCE instance's --zone to be available as an environment variable by default?
  3. what are your thoughts on including the cluster-cidr as an environment variable in a pod's containers?
roberthbailey (Member) commented Nov 22, 2016

  1. are there plans to include gcloud on GCI images? It seems like a sane default/include?

Yes. I filed an internal feature request about this the first time I had to jump through the hoops that you just documented. :)

  2. would it be sensible to ask for a GCE instance's --zone to be available as an environment variable by default?

You could do this in your own shell by adding a curl to the metadata server when the shell starts.
If this is just for gcloud, it's also conceivable that gcloud could be smart enough to default to the zone in the local metadata server if another zone isn't specified. This would be in line with the way it deals with credentials, and might be handy in other cases too.

  3. what are your thoughts on including the cluster-cidr as an environment variable in a pod's containers?

You could do this yourself if you wanted (although it might require some pre-processing of yaml/json files to customize per cluster).

We already inject a bunch of environment variables automatically so I suppose it would be possible, but we'd want to be careful about how we do it. We are also talking about making a cluster config as part of cluster bootstrapping in the cluster lifecycle SIG, so that could be another well known place to grab the value from (e.g. a well known config map or API object).

As an aside, I expect that the cluster-cidr will end up turning into cidrs (plural) as we will want to add functionality for a cluster to grow into non-contiguous IP space. So we'd need to be careful to make the mechanism of producing the cidr allow for expansion in the future.

thockin (Member) commented Nov 22, 2016

a) We don't want to inject more automatic variables. We should prefer to publish the configuration via a ConfigMap and let users ingest it if they want it.

b) Please don't assume a single CIDR for the whole cluster. It's a bad assumption, and it will break in the future.


tonglil (Contributor) commented Nov 22, 2016

Thanks @thockin. I believe we are going to change it so that it only NATs the traffic heading for the VPN so we won't need to find the cluster CIDR(s).

tonglil (Contributor) commented Jan 27, 2017

For those who have implemented this, has anyone encountered MTU issues with the VPN + GCP L4 internal load balancers + GKE setup?

We need to lower the MTU on the GKE nodes (running GCI) to 1400, but there doesn't seem to be a good way to do this. We can't do this from the running containers (because app teams use their own containers and shouldn't be concerned with MTU settings), and we don't want to override kubelet options (because overriding platform components seems like a bad idea™).

pydevops commented Sep 5, 2017

I use the same DS as specified in https://blog.mrtrustor.net/post/iptables-kubernetes/. It works.

baracoder commented Oct 25, 2017

I am facing the same issue trying to connect to some VMs on Azure from GKE via VPN. Can someone clarify why the MASQUERADE is required in the first place? Shouldn't declaring a route on the Azure side for the podCidr be enough?

pydevops commented Nov 2, 2017

@baracoder If you don't add MASQUERADE (i.e. SNAT), the VPN won't allow the egress traffic from GKE to Azure. Alternatively, https://cloud.google.com/container-engine/docs/ip-aliases#using_an_existing_subnetwork_with_secondary_ranges can come to the rescue.

pdecat commented Feb 2, 2018

Update for those still coming here for solutions.

Nowadays, the proper way to achieve this is to deploy Kubernetes' IP Masquerade Agent and configure it so that only truly cluster-local traffic (the pod and node CIDRs) is excluded from masquerading, rather than the entire RFC1918 ranges.
That way, traffic to other destinations in private networks is masqueraded properly and accepted by VPN gateways.

Example ip-masq-agent ConfigMap:

nonMasqueradeCIDRs:
  - 10.184.0.0/14  # The IPv4 CIDR the cluster is using for Pods (required)
  - 172.16.32.0/20 # The IPv4 CIDR of the subnetwork the cluster is using for Nodes (optional; it works without this, but it's probably better to include it)
masqLinkLocal: false
resyncInterval: 60s

Note: the ip-masq-agent is installed by default since version 1.7.0 with Network Policy enabled.
It is also supposed to be installed by default when using a cluster CIDR not in the 10.0.0.0/8 range, but that was not the case for some 1.8.5 clusters I checked where Network Policy is disabled.

For reference, see https://cloud.google.com/kubernetes-engine/docs/how-to/ip-masquerade-agent

This eliminates the need to use a startup-script to setup the iptables rules.
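Following the linked documentation, applying a ConfigMap like the one above looks roughly like this (the agent reads a ConfigMap named ip-masq-agent in kube-system):

# save the nonMasqueradeCIDRs YAML above to a file named "config", then:
kubectl create configmap ip-masq-agent --from-file config --namespace kube-system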
