
kubernetes proxy iptables rules lost after restarting iptables+node #1922

Closed
thoraxe opened this issue Apr 26, 2015 · 10 comments

@thoraxe
Contributor

thoraxe commented Apr 26, 2015

Effectively restarting iptables and then openshift-node is like rebooting the node host.

This gist contains some relevant information:
https://gist.github.com/thoraxe/a875b22f42051d64c267

Essentially, I am seeing that iptables rules are not re-initialized on my node hosts, specifically those related to kubernetes proxying.
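For reference, a quick way to check whether the kube-proxy rules are present (this check is not part of the original report; the KUBE-PORTALS-* chain names assume the userspace proxy of this era):

# Dump the nat table and look for the kube-proxy managed chains/rules;
# a healthy node should show KUBE-PORTALS-* entries for each service
iptables-save -t nat | grep -i kube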

I think this is kind of related to #1089 except that I am not having any issue reaching Docker containers over the SDN.

I'm not sure if this is a "wait for a minute" kind of thing, but it's been more than a minute or two since I restarted the node (~ 3 minutes) and the iptables rules have not reappeared.

Should I be opening this issue upstream (kubernetes)?

@thoraxe
Contributor Author

thoraxe commented Apr 26, 2015

So, the interesting thing is that when I actually rebooted the node, the rules came back.

However, restarting a node host that is also a master definitely does not result in getting the iptables rules back.

So I think there may be an issue with service ordering / start order when it comes to getting the iptables rules back, and it may manifest on nodes that are also running a master...

@brenton
Contributor

brenton commented Apr 27, 2015

Here's what I'm seeing in my 1 master / 3 node environment:

# on node1
systemctl restart iptables
systemctl restart openshift-node

The iptables rules come back quickly and I can curl a service that has a pod running on that node. However, my other nodes haven't picked up the change after 5 minutes or so.

I then did a clean reboot of all the VMs and 2 of the Nodes came up with the kube-proxy working as expected. I'm still waiting on the 3rd though. It's running the pods as expected, but none of the kube-proxy nat'ing was set up after waiting several minutes. Forcefully restarting openshift-node caused the nat'ing to be reloaded and things worked as expected.

I think there are at least two problems here:

  • It's easy to get into a split brain state where the kube-proxies don't know how to route traffic
  • There's probably something wrong with the systemd units. The machine that didn't come up correctly was also running a Master. There could be race conditions with start ordering that trip up OS reboots. I know for a fact I've had OS reboots return a working environment, so I suspect an edge case (see the ordering sketch below).
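
As a sketch of the ordering idea above (the unit names and drop-in path are assumptions for illustration, not taken from this thread), a systemd drop-in could force openshift-node to start only after openshift-sdn-node:

# Hypothetical drop-in; adjust unit names to the actual installation
mkdir -p /etc/systemd/system/openshift-node.service.d
cat > /etc/systemd/system/openshift-node.service.d/10-order.conf <<'EOF'
[Unit]
After=openshift-sdn-node.service
Wants=openshift-sdn-node.service
EOF
systemctl daemon-reload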

@brenton
Contributor

brenton commented Apr 27, 2015

In my environment it definitely seems that if I don't restart openshift-sdn-node as well the other Nodes' kube-proxies never start working.

@brenton
Contributor

brenton commented Apr 27, 2015

@thoraxe, could you see if you are able to reproduce what I'm seeing:

# On the Node hosting the docker registry pod
systemctl restart iptables
systemctl restart openshift-sdn-node
systemctl restart openshift-node

If I run that all other Nodes will be able to access the registry via the Service IP a few seconds after those commands finish.
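
A quick way to check this from one of the other nodes (the Service IP and port below are placeholders, not values from this environment):

# Placeholder values; substitute the docker-registry Service IP and port
curl -sS -o /dev/null -w '%{http_code}\n' http://172.30.0.10:5000/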

FWIW, I also did an OS reboot again and all the services came up correctly this time. I'll have to keep looking into this problem.

@sdodson
Member

sdodson commented Apr 27, 2015

Looking at the logs when this fails, I suspect that openshift-sdn-node is altering iptables rules at roughly the same time that openshift-node is attempting to do so. We should alter both k8s and openshift-sdn-node to use iptables -w and add retry logic, since without -w iptables immediately returns an error if it can't get the lock.

Here are the logs that lead me to believe it's a contention issue between the two:

Apr 27 11:40:20 ose3-master.example.com openshift-sdn-node[2058]: + iptables -t nat -D POSTROUTING -s 10.1.0.0/16 '!' -d 10.1.0.0/16 -j MASQUERADE
...
Apr 27 11:40:20 ose3-master.example.com openshift-sdn-node[2058]: + iptables -I INPUT 2 -i lbr0 -m comment --comment 'traffic from docker' -j ACCEPT
Apr 27 11:40:21 ose3-master.example.com openshift-node[2237]: E0427 11:40:21.661746    2237 proxier.go:337] Failed to initialize iptables: error checking rule: exit status 4: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Apr 27 11:40:21 ose3-master.example.com openshift-node[2237]: E0427 11:40:21.661912    2237 node.go:245] WARNING: Could not modify iptables.  iptables must be mutable by this process to use services.  Do you have root permissions?
Apr 27 11:40:22 ose3-master.example.com openshift-node[2237]: I0427 11:40:22.147405    2237 failing_service_config_proxy.go:19] Failed to properly wire up services.  This can happen if you forget to launch with permissions to iptables.  Access to the following services will be impaired: "kubernetes-ro, router, database, frontend, ruby-example, docker-registry, kubernetes"

So, at minimum there's 661ms and at most 1.661s between the time openshift-sdn-node last issued an iptables command and openshift-node attempts to provision the proxies. Given this is happening after a reboot the system may be busy with other things and take longer than expected to commit iptables rules.

@smarterclayton, should we file an upstream k8s issue? While openshift-sdn-node may be exacerbating the situation here, there are likely other scenarios where an external process is altering iptables rules (e.g., puppet runs), so we should use -w and retry.

openshift-sdn-node doesn't contain the sdNotify support that would allow it to signal when it has completed its startup. We should probably add that; it may help the situation, but it's not the solution.
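
As a rough illustration of the -w-plus-retry idea (a sketch only, not the actual patch; the rule is copied from the logs above and the retry count/sleep are arbitrary):

# Prefer -w (wait for the xtables lock); fall back to a short retry loop
# on iptables builds that don't support -w
rule=(-I INPUT 2 -i lbr0 -m comment --comment 'traffic from docker' -j ACCEPT)
if ! iptables -w "${rule[@]}" 2>/dev/null; then
    for attempt in 1 2 3 4 5; do
        iptables "${rule[@]}" && break
        sleep 1
    done
fi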

@smarterclayton
Contributor

Ok, so everyone should be using -w then.

We should have two issues, one for kube and one for openshift-sdn. We can repurpose this one for openshift-sdn and assign it to @rajatchopra; can you file the upstream one?

@brenton
Contributor

brenton commented Apr 27, 2015

@sdodson, I think it's correct to say that when using '-w' (which not all distros may have) we need a reasonable default timeout. I think we'd still want retry logic as well, if it isn't too messy.

@smarterclayton, would you want retry logic added?

@rajatchopra
Contributor

FWIW, I am putting together the code so that openshift-sdn will run as a goroutine inside openshift-node, so we can coordinate this through events easily if we want a certain order to be enforced.

@smarterclayton
Contributor

Let's have the retry discussion upstream for the kube-proxy (for any Go code). Retrying in ansible etc. seems like we'd potentially have races as well.

@brenton
Contributor

brenton commented Apr 29, 2015

@thoraxe, I think we can close this now. We're tracking the iptables locking race condition upstream.

@brenton brenton closed this as completed Apr 29, 2015