Default the kubelet IPTablesMasqueradeBit to the same value as the kube-proxy IPTablesMasqueradeBit #19005
Conversation
@sjenning Now that we just build the kubelet args, I think this is the sort of wiring PR that @openshift/sig-pod can reasonably own now. @danwinship This appears to be coupling the kubelet flags to config that is logically for a different process. Have we regressed from a previous release, or is this just to make life a little easier? If it's the latter, I'm more inclined to try a warning or an error in validation, which could move to a higher-level process when we have structured config changes.
The PR is to fix a bug in OpenShift. The "coupling" part is to avoid breaking people who were previously working around that bug. The core of the PR is just the kubelet-side override sketched below. (Upstream has always required that if you override kube-proxy's iptables-masquerade-bit (which we do), then you have to override kubelet's iptables-masquerade-bit too (which we didn't). So this just fixes that.) But some people might have already worked around the mismatch themselves. If they fixed it by adding the kubelet override to their node-config, that's not a problem; it just becomes redundant with our fix. But if they fixed it by setting kube-proxy's iptables-masquerade-bit back to the upstream default, then our change to set kubelet's iptables-masquerade-bit to 0 would end up breaking their previously-working configuration. So we check whether they're doing that, and if so, we leave everything configured the way it was. Other than that one check, there's no coupling between the kubelet and kube-proxy args; if the user wants to change the mark bit to any other value, they need to set that value in both kubeletArguments and proxyArguments, just like they would have to upstream. I guess an alternative way to avoid breaking people would be to fix up their node-config via ansible at upgrade time. Then we could get rid of the proxyArguments check in the node setup code.
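Here is a minimal, self-contained sketch of that logic, assuming origin-style `map[string][]string` argument maps (the `nodeArgs` type and function name are illustrative, not the actual origin code):

```go
package main

import "fmt"

// nodeArgs mimics the map-of-flag-values shape of kubeletArguments /
// proxyArguments in node-config; the type name is illustrative.
type nodeArgs map[string][]string

// defaultKubeletMasqueradeBit applies the PR's logic: default the kubelet
// bit to kube-proxy's overridden value ("0") unless the user has already
// worked around the bug by reverting kube-proxy to the upstream default ("14").
func defaultKubeletMasqueradeBit(kubeletArgs, proxyArgs nodeArgs) {
	proxyBit := proxyArgs["iptables-masquerade-bit"]
	if len(proxyBit) == 1 && proxyBit[0] == "14" {
		return // user restored the upstream default; don't re-break them
	}
	if len(kubeletArgs["iptables-masquerade-bit"]) == 0 {
		kubeletArgs["iptables-masquerade-bit"] = []string{"0"}
	}
}

func main() {
	kubeletArgs, proxyArgs := nodeArgs{}, nodeArgs{}
	defaultKubeletMasqueradeBit(kubeletArgs, proxyArgs)
	fmt.Println(kubeletArgs["iptables-masquerade-bit"]) // [0]
}
```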
/lgtm
```go
// Override kubelet iptables-masquerade-bit value to match overridden kube-proxy
// iptables-masquerade-bit value, UNLESS the user has overridden kube-proxy to match the
// previously-not-overridden kubelet value, in which case we don't want to re-break them.
if len(options.ProxyArguments["iptables-masquerade-bit"]) != 1 || options.ProxyArguments["iptables-masquerade-bit"][0] != "14" {
```
Don't we want to override this flag only when they are using openshift SDN? (i.e. when `network.IsOpenShiftNetworkPlugin(options.NetworkConfig.NetworkPluginName) == true`)
@pravisankar no, looks like the defaulting to 0 is done unconditionally for the kube-proxy config in `pkg/cmd/server/kubernetes/network/options/options.go`, which is called from `start_node.go`'s `StartNode()` whether openshift-sdn is enabled or not.
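For illustration, a minimal sketch of that unconditional defaulting (the function shape here is hypothetical; the real code lives in `options.go` as noted above):

```go
package main

import "fmt"

// buildProxyArguments sketches the defaulting referenced above: the
// kube-proxy iptables-masquerade-bit is forced to "0" whenever the user
// hasn't set it, regardless of which network plugin is in use.
// (Hypothetical shape; not the actual origin function.)
func buildProxyArguments(proxyArgs map[string][]string) map[string][]string {
	if len(proxyArgs["iptables-masquerade-bit"]) == 0 {
		proxyArgs["iptables-masquerade-bit"] = []string{"0"}
	}
	return proxyArgs
}

func main() {
	args := buildProxyArguments(map[string][]string{})
	fmt.Println(args["iptables-masquerade-bit"]) // [0]
}
```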
dcbw left a comment:
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, dcbw, sjenning
/hold
I filed openshift/openshift-ansible#7589 to do the fix in ansible, but I think it would be a lot simpler to just commit this as-is (since it already got approved and lgtm'd anyway) and backport it, and then remove the extra check (in master only) in the future once we have the corresponding fix in ansible.

/hold cancel
/retest

/retest

Please review the full test history for this PR and help us cut down flakes.
/test all

[submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue (batch tested with PRs 19005, 19032). |
Should we just fail node startup if the two IPTablesMasqueradeBit values differ? Or should we log a warning to the system log rather than fixing it silently?
In theory, the kubelet code should not be looking at the kube-proxy config, and vice versa. (And in the future it's possible that they won't even be able to; that's true of the upstream code too.) We probably need to do something better upstream. I'll file a bug.
TL;DR: OpenShift is overriding IPTablesMasqueradeBit in the proxy configuration, but upstream requires that if you override it there, you have to override it in the kubelet config too, which we were not doing.
Upstream kubernetes has two separate codepaths for dealing with iptables masquerade and drop rules; one in kubelet for hostports, and one in kube-proxy for services. They attempt to coordinate with each other (by both using the same iptables chain names), but since kubelet and kube-proxy are configured separately, they have to both be given the same alternative configuration if you want to change something.
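Roughly speaking, both daemons derive the iptables mark from the configured bit in the same way; a simplified sketch of that computation (not the exact upstream code):

```go
package main

import "fmt"

// masqueradeMark sketches how the configured masquerade bit is turned into
// the iptables mark/mask string used in the shared chains.
func masqueradeMark(bit int) string {
	value := 1 << uint(bit)
	return fmt.Sprintf("%#08x/%#08x", value, value)
}

func main() {
	fmt.Println(masqueradeMark(14)) // 0x004000/0x004000 (upstream default)
	fmt.Println(masqueradeMark(0))  // 0x000001/0x000001 (OpenShift's kube-proxy override)
}
```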
We have been overriding the IPTablesMasqueradeBit value in kube-proxy from bit 14 to bit 0 for a long time now. (I don't know why.) But we weren't overriding the corresponding value in kubelet. This means that kubelet will periodically insert a rule into the KUBE-MARK-MASQ chain to masquerade any packet with bit 14 set in its mark, but then kube-proxy will delete that rule the next time it resyncs iptables. This doesn't affect service or hostport functioning (it just means that at certain times, packets destined to be masqueraded get marked with two separate bits, and then we masquerade them if they have either of those two bits set), but it can cause problems for other people trying to use the mark for other things (like egress IPs).
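To make the mismatch concrete: bit 14 corresponds to mark `0x4000` and bit 0 to mark `0x1`, so while kubelet's stray rule is present, a to-be-masqueraded packet can end up carrying both bits. A small illustrative snippet:

```go
package main

import "fmt"

func main() {
	kubeletMark := uint32(1) << 14 // what un-overridden kubelet matches: 0x4000
	proxyMark := uint32(1) << 0    // what our overridden kube-proxy uses: 0x1

	// While kubelet's rule survives, a packet destined to be masqueraded
	// can be marked with both bits; masquerading then fires on either one.
	mark := kubeletMark | proxyMark
	fmt.Printf("%#x\n", mark)          // 0x4001
	fmt.Println(mark&kubeletMark != 0) // true
	fmt.Println(mark&proxyMark != 0)   // true
}
```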
This patch fixes things by overriding the kubelet default value to match what we're overriding the kube-proxy default value to. Except that, in theory, some users might have already worked around the bug in the opposite direction, by overriding kube-proxy to use the kubelet/upstream-kube-proxy default value. So if they're doing that, we use that value for both kube-proxy and kubelet to avoid breaking them. Other than that, though, if they explicitly set the kube-proxy and kubelet arguments to different values, then they lose, just like before and just like upstream. (Though maybe we shouldn't let them?)
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1552869