kube-proxy generates the wrong iptables dnat rule #114537
Comments
@cyclinder: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
/sig network
how is this generated? /priority important-soon?
@aojea: The label(s) could not be applied. In response to this: /priority important-soon?
/cc @danwinship
are we setting the '--persistent' flag somewhere?
I'm not sure. This rule looks rather strange and only appears on the worker nodes; the controller node works fine.
I didn't find any trace of the '--persistent' flag being set in the code, and it seems iptables doesn't support setting this flag (per man iptables).
Most of the iptables docs are in the iptables-extensions man page, not iptables(8). What other software is running on the worker nodes in this cluster? This looks like the bug where kube-router was accidentally messing up kube-proxy's iptables rules due to iptables binary version mismatches.
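(For context, '--persistent' is a real option of the DNAT target, documented in the iptables-extensions man page rather than in iptables(8), but as noted above it is not something kube-proxy generates. A hand-written rule using it would look roughly like the sketch below; the chain name and addresses are purely illustrative.)

    # Illustrative only -- not a rule kube-proxy would ever emit.
    iptables -t nat -N EXAMPLE-CHAIN
    iptables -t nat -A EXAMPLE-CHAIN -d 10.233.0.1/32 -p tcp --dport 443 \
        -j DNAT --to-destination 192.168.1.10:6443 --persistent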
Oh thanks.
It looks like that could be the reason, but I haven't found any other such software on the worker nodes.
No special software here 🤔
you should check with the Calico folks
@aojea In fact, it was because Calico failed to start on the worker nodes that I discovered this problem. The calico-node logs show a failed connection to "10.233.0.1".
From the paste there seem to be only 2 components that fiddle with iptables: kube-proxy and Calico. We have practically ruled out kube-proxy, so we have to rule out Calico now ... maybe @caseydavenport can help us here
I think it has nothing to do with Calico. I stopped Calico and restarted kube-proxy, with the same result. I suspect the problem is caused by an incompatibility between the iptables binary and the kernel version.
Well, there seem to be 2 players touching the iptables rules, and this strange '--persistent' flag... I can't see how kube-proxy could append it, and kube-proxy uses its own iptables binary if it runs containerized. The last report mentioned by @danwinship due to binary incompatibility was caused by kube-router #112477 (comment). I will try to find where this flag comes from.
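(One quick way to confirm that kube-proxy never emits the flag is to grep the proxy and iptables helper code in a kubernetes/kubernetes checkout; the paths below are a sketch and may move between releases.)

    # Search the proxy code and iptables wrapper for the flag:
    grep -rn -e '--persistent' pkg/proxy/ pkg/util/iptables/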
I seem to have found the cause of the problem. After raising kube-proxy's log level, I found that the iptables versions on the node and inside the kube-proxy container are not the same, and that the iptables rules look fine from inside the kube-proxy container.
So this looks like a different version of the iptables binary on the host, which is causing the iptables-save output there to be abnormal.
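(A quick way to compare the two binaries is something like the following; the kube-proxy pod name is illustrative and should be one scheduled on the affected worker node.)

    # On the worker node itself:
    iptables --version
    # Inside the kube-proxy pod running on that node:
    kubectl -n kube-system exec kube-proxy-xxxxx -- iptables --version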
:/
Yay RHEL 7. Oof. It looks like the RHEL 7 iptables (1.4.x) is the one mangling the output. Can you check the exact iptables versions on the host and in the container? And can you confirm that if you add a new rule from the container, it shows up mangled on the host? e.g.:
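(One possible way to run that test is sketched below; the chain name and backend address are made up, and the kube-proxy pod name will differ.)

    # Add a throwaway DNAT rule from inside the kube-proxy container:
    kubectl -n kube-system exec kube-proxy-xxxxx -- iptables -t nat -N TEST-MANGLE
    kubectl -n kube-system exec kube-proxy-xxxxx -- \
        iptables -t nat -A TEST-MANGLE -p tcp -j DNAT --to-destination 192.0.2.10:6443
    # Then check how the host's own iptables binary renders it:
    iptables-save -t nat | grep TEST-MANGLE
    # Clean up afterwards (flush, then delete the chain):
    kubectl -n kube-system exec kube-proxy-xxxxx -- iptables -t nat -F TEST-MANGLE
    kubectl -n kube-system exec kube-proxy-xxxxx -- iptables -t nat -X TEST-MANGLE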
It looks like the iptables rules on the node are still broken
So did this used to work? Did you previously have a cluster with modern-ish Kubernetes running on a CentOS 7 base, and it worked, and then you changed something and it broke? (If so, what?) Or is this a brand-new install that failed right out of the box?
I'm not sure. The kernel was upgraded from the CentOS 7 base version, then I did a fresh install of the cluster and found the iptables problem. Is this related to upgrading the kernel version?
Any idea what the previous kernel version was? (I can't remember... does yum keep a record of that?)
@danwinship sorry for the delay, I've been busy with work these days... the previous kernel version should be 3.10
I mean the exact version, so we can try to bisect whether some kernel patch introduced a regression. (You might be able to find it in the yum/rpm history.)
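(On a CentOS/RHEL host, the installed kernel packages and the transactions that touched them can usually be listed like this; output format varies by version.)

    # Kernel packages currently installed, with install dates (newest first):
    rpm -q --last kernel
    # Yum transactions involving the kernel package:
    yum history list kernel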
@danwinship It seems there's nothing special here; I'm not sure how the kernel was upgraded.
The kernel folks suggest that this is purely an iptables-1.4 display problem; they believe the kernel has the correct representation of the rule (as seen by the fact that iptables 1.8 consistently shows it correctly), it's just that iptables 1.4 isn't displaying it correctly. So in that case, whatever bug you're hitting is somewhere else, and is unrelated to this particular iptables rule...
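(A way to double-check that conclusion is to dump the same rules with both binaries and compare; the kube-proxy pod name is illustrative.)

    # Host view (iptables 1.4.x) of the kube-proxy endpoint rules:
    iptables-save -t nat | grep KUBE-SEP
    # View from the newer iptables inside the kube-proxy container:
    kubectl -n kube-system exec kube-proxy-xxxxx -- iptables-save -t nat | grep KUBE-SEP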
Thanks @danwinship for getting to the bottom of this. I agree that the problem I'm having shouldn't be related to the iptables rule, I just found this issue by accident. Now let me close this issue. /close |
What happened?
kube-proxy generates a wrong iptables DNAT rule, as shown in the reproduction steps below.
Because of this incorrect rule, the worker node is unable to access the apiserver.
What did you expect to happen?
kube-proxy should generate correct iptables rules.
How can we reproduce it (as minimally and precisely as possible)?
I found that some hostNetwork Pods on work1 failed to start. Since these pods failed to reach the apiserver, I checked that the firewall had been disabled.
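A quick connectivity check from the worker node looks something like the following; the service IP is the one mentioned in the comments above and may differ per cluster.

    # Probe the kubernetes service VIP; any HTTP response (even 401/403) means the
    # DNAT path works, while a connection timeout points at the broken rule.
    curl -k --connect-timeout 5 https://10.233.0.1:443/version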
I found that kube-proxy generated an incorrect iptables rule:
    --to-destination :0 --persistent --to-destination :0 --persistent --to-destination 0.0.0.0 --persistent
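For comparison, a healthy endpoint DNAT rule from kube-proxy normally looks roughly like this (chain hash, comment, and backend address are illustrative):

    -A KUBE-SEP-ABCDEFGHIJKLMNOP -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination 192.168.1.10:6443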
On controller-node-1, it works fine.
I tried to create a new service, but the iptables rules generated by kube-proxy have the same problem.
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
kubespray
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)