Flannel fails after upgrade because of lock: iptables: Resource temporarily unavailable. #18637
Comments
@dhrp Double-checking: this cluster was configured to use the proper Flannel CNI, i.e. not Canal, correct?
This seems to be a combination of a race condition in iptables access between flannel and kube-proxy, caused by the latter not mounting the common flock file from the host (PR), and an upstream Flannel issue where flannel fails to recover from a failed attempt to ensure its iptables rules: flannel-io/flannel#988. A sketch of the lock-file mount is shown below.
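For context, here is a minimal sketch of what the kube-proxy side of that fix looks like, assuming the standard xtables lock path /run/xtables.lock; the image tag and surrounding fields are illustrative, not the actual PR contents:

```yaml
# Illustrative excerpt of a kube-proxy container spec, not the actual PR diff.
# Mounting the host's xtables lock file lets kube-proxy and flannel serialize
# their iptables updates through the same flock(2) lock instead of racing.
containers:
  - name: kube-proxy
    image: rancher/hyperkube:v1.11.3   # hypothetical image/tag for this cluster
    command: ["kube-proxy"]
    volumeMounts:
      - name: xtables-lock
        mountPath: /run/xtables.lock
volumes:
  - name: xtables-lock
    hostPath:
      path: /run/xtables.lock
      type: FileOrCreate
```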
We should also update Flannel to >= v0.11, which makes the iptables update interval configurable rather than hardcoded at 5 seconds, a value that can lead to lock contention. See flannel-io/flannel#933 and flannel-io/flannel#935. An illustrative DaemonSet excerpt follows.
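As an illustration, with Flannel >= v0.11 the resync period can be raised via the --iptables-resync flag in the kube-flannel DaemonSet; the value 15 below is an arbitrary example, not a recommendation:

```yaml
# Illustrative excerpt of the kube-flannel container args (flannel >= v0.11).
containers:
  - name: kube-flannel
    image: quay.io/coreos/flannel:v0.11.0
    command: ["/opt/bin/flanneld"]
    args:
      - --ip-masq
      - --kube-subnet-mgr
      - --iptables-resync=15   # the default was a hardcoded 5s before v0.11
```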
➤ Jack Luo commented: The bug fix is validated in Rancher: master d0d79b0. Steps:
➤ Jack Luo commented: [~jan] Is the test in my previous comment sufficient to validate the bug fix? Are there any other cases that need to be covered?
@JacieChao Yes, what you tested should cover the scenarios in which the race is most likely to manifest itself.
Original issue

What kind of request is this (question/bug/enhancement/feature request):
Bug
Steps to reproduce (least amount of steps as possible):
Setup where this occurred:
Result:
Networking between (some) pods stops working. In my case specifically, I could ping *some* containers from one of the host OSes, but not others.
When checking the logs of kube-flannel, it showed several lines of the error from the title (iptables: Resource temporarily unavailable).
Other details that may be helpful:
Here is what I believe caused this, and why I'm making this issue on the rancher repository:
A reboot of each node resolved the issue. (But I lost a day figuring this out).
I believe it has to do with:
Environment information

Rancher version (rancher/rancher / rancher/server image tag or shown bottom left in the UI): Rancher 2.1.1
OS (all nodes): Ubuntu 16.04.5 LTS (GNU/Linux 4.4.0-1075-aws x86_64)
Installation option (single install/HA): Single install

Cluster information

Cluster type (Hosted/Infrastructure Provider/Custom/Imported): Rancher-created AWS EC2 cluster
Machine type (cloud/VM/metal) and specifications (CPU/memory): AWS t3.medium instances
Kubernetes version (use `kubectl version`): Kubernetes 1.11.3
Docker version (use `docker version`):