
kubernetes proxy iptables rules lost after restarting iptables+node #1922

Closed
thoraxe opened this issue Apr 26, 2015 · 10 comments

@thoraxe
Contributor

thoraxe commented Apr 26, 2015

Effectively restarting iptables and then openshift-node is like rebooting the node host.

This gist contains some relevant information:
https://gist.github.com/thoraxe/a875b22f42051d64c267

Essentially, I am seeing that iptables rules are not re-initialized on my node hosts, specifically those related to kubernetes proxying.
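For reference, a quick way to check whether the kube-proxy rules are present (this check is not part of the original report; the KUBE-PORTALS-* chain names assume the userspace proxy of this era):

# Dump the nat table and look for the kube-proxy managed chains/rules;
# a healthy node should show KUBE-PORTALS-* entries for each service
iptables-save -t nat | grep -i kube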

I think this is kind of related to #1089 except that I am not having any issue reaching Docker containers over the SDN.

I'm not sure if this is a "wait for a minute" kind of thing, but it's been more than a minute or two since I restarted the node (~ 3 minutes) and the iptables rules have not reappeared.

Should I be opening this issue upstream (kubernetes)?

@thoraxe
Contributor Author

thoraxe commented Apr 26, 2015

So, the interesting thing is that when I actually rebooted the node, the rules came back.

However, restarting a node host that is also a master definitely does not result in getting the iptables rules back.

So I think there may be an issue with service ordering / start order when it comes to getting the iptables rules back, and it may manifest on nodes that are also running a master...

@brenton
Contributor

brenton commented Apr 27, 2015

Here's what I'm seeing in my 1 master / 3 node environment:

# on node1
systemctl restart iptables
systemctl restart openshift-node

The iptables rules come back quickly and I can curl a service that has a pod running on that node. However, my other nodes haven't picked up the change after 5 minutes or so.

I then did a clean reboot of all the VMs and 2 of the Nodes came up with the kube-proxy working as expected. I'm still waiting on the 3rd though. It's running the pods as expected, but none of the kube-proxy nat'ing was set up after waiting several minutes. Forcefully restarting openshift-node caused the nat'ing to be reloaded and things worked as expected.

I think there are at least two problems here:

  • It's easy to get into a split brain state where the kube-proxies don't know how to route traffic
  • There's probably something wrong with the systemd units. The machine that didn't come up correctly was also running a Master. There could be race conditions with start ordering that trip up OS reboots. I know for a fact I've had OS reboots return a working environment, so I suspect an edge case (see the ordering sketch below).
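
As a sketch of the ordering idea above (the unit names and drop-in path are assumptions for illustration, not taken from this thread), a systemd drop-in could force openshift-node to start only after openshift-sdn-node:

# Hypothetical drop-in; adjust unit names to the actual installation
mkdir -p /etc/systemd/system/openshift-node.service.d
cat > /etc/systemd/system/openshift-node.service.d/10-order.conf <<'EOF'
[Unit]
After=openshift-sdn-node.service
Wants=openshift-sdn-node.service
EOF
systemctl daemon-reload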

@brenton
Contributor

brenton commented Apr 27, 2015

In my environment it definitely seems that if I don't restart openshift-sdn-node as well the other Nodes' kube-proxies never start working.

@brenton
Contributor

brenton commented Apr 27, 2015

@thoraxe, could you see if you are able to reproduce what I'm seeing:

# On the Node hosting the docker registry pod
systemctl restart iptables
systemctl restart openshift-sdn-node
systemctl restart openshift-node

If I run that all other Nodes will be able to access the registry via the Service IP a few seconds after those commands finish.
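
A quick way to check this from one of the other nodes (the Service IP and port below are placeholders, not values from this environment):

# Placeholder values; substitute the docker-registry Service IP and port
curl -sS -o /dev/null -w '%{http_code}\n' http://172.30.0.10:5000/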

FWIW, I also did an OS reboot again and all the services came up correctly this time. I'll have to keep looking into this problem.

@sdodson
Member

sdodson commented Apr 27, 2015

Looking at the logs when this fails, I suspect that openshift-sdn-node is altering iptables rules at roughly the same time that openshift-node is attempting to do so. We should alter both k8s and openshift-sdn-node to use iptables -w and add retry logic, since without -w iptables immediately returns an error if it can't get the lock.

Here are the logs that lead me to believe it's a contention issue between the two:

Apr 27 11:40:20 ose3-master.example.com openshift-sdn-node[2058]: + iptables -t nat -D POSTROUTING -s 10.1.0.0/16 '!' -d 10.1.0.0/16 -j MASQUERADE
...
Apr 27 11:40:20 ose3-master.example.com openshift-sdn-node[2058]: + iptables -I INPUT 2 -i lbr0 -m comment --comment 'traffic from docker' -j ACCEPT
Apr 27 11:40:21 ose3-master.example.com openshift-node[2237]: E0427 11:40:21.661746    2237 proxier.go:337] Failed to initialize iptables: error checking rule: exit status 4: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Apr 27 11:40:21 ose3-master.example.com openshift-node[2237]: E0427 11:40:21.661912    2237 node.go:245] WARNING: Could not modify iptables.  iptables must be mutable by this process to use services.  Do you have root permissions?
Apr 27 11:40:22 ose3-master.example.com openshift-node[2237]: I0427 11:40:22.147405    2237 failing_service_config_proxy.go:19] Failed to properly wire up services.  This can happen if you forget to launch with permissions to iptables.  Access to the following services will be impaired: "kubernetes-ro, router, database, frontend, ruby-example, docker-registry, kubernetes"

So, at minimum there's 661ms and at most 1.661s between the time openshift-sdn-node last issued an iptables command and openshift-node attempts to provision the proxies. Given this is happening after a reboot the system may be busy with other things and take longer than expected to commit iptables rules.

@smarterclayton, should we file an upstream k8s issue? While openshift-sdn-node may be exacerbating the situation here, there are likely other scenarios where an external process is altering iptables rules (e.g., puppet runs), so we should use -w and retry.

openshift-sdn-node doesn't contain the sdNotify support that would allow it to signal when it has completed its startup. We should probably add that; it may help the situation, but it's not the solution.
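
As a rough illustration of the -w-plus-retry idea (a sketch only, not the actual patch; the rule is copied from the logs above and the retry count/sleep are arbitrary):

# Prefer -w (wait for the xtables lock); fall back to a short retry loop
# on iptables builds that don't support -w
rule=(-I INPUT 2 -i lbr0 -m comment --comment 'traffic from docker' -j ACCEPT)
if ! iptables -w "${rule[@]}" 2>/dev/null; then
    for attempt in 1 2 3 4 5; do
        iptables "${rule[@]}" && break
        sleep 1
    done
fi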

@smarterclayton
Contributor

Ok, so everyone should be using -w then.

We should have two issues, one for kube and one for openshift-sdn. We can repurpose this one for openshift-sdn and assign it to @rajatchopra; can you file the upstream one?

@brenton
Contributor

brenton commented Apr 27, 2015

@sdodson, I think it's correct to say that when using '-w' (which not all distros may have) we need a reasonable default timeout. I think we'd still want retry logic as well, if it isn't too messy.

@smarterclayton, would you want retry logic added?

@rajatchopra
Contributor

FWIW, I am putting together the code so that openshift-sdn will run as a goroutine inside openshift-node, so we can coordinate this through events easily if we want a certain order to be enforced.

@smarterclayton
Contributor

Let's have the retry discussion upstream for the kube-proxy (for any Go code). Retrying in ansible etc. seems like we'd potentially have races as well.

@brenton
Contributor

brenton commented Apr 29, 2015

@thoraxe, I think we can close this now. We're tracking the iptables locking race condition upstream.

@brenton brenton closed this as completed Apr 29, 2015