Selectively disable iptables proxy's bridge-nf-call-iptables=1 behavior #20647
@thockin you added the bridge-nf-call-iptables=1 stuff originally in 3a5c23d, so I assume you might have some comments here. The back-story is that openshift-sdn uses an OVS vswitch, not a Linux bridge, but at the moment we still use docker for IPAM, and the IP address assigned to the docker bridge (from which containers are IPAMed) is the same as the IP address assigned to the vswitch port where container traffic exits the OVS vswitch. openshift-sdn removes the container veth from the docker bridge and puts it into the OVS vswitch, so the docker bridge is no longer involved and bridge-nf-call-iptables=1 is no longer useful. But it confuses iptables horribly, since two interfaces have the same IP address. This really should be a plugin-by-plugin decision, and in the future we need a better way to plumb that through. For now, though, this PR will allow us to remove some awful hacks in openshift.
Labelling this PR as size/S
GCE e2e build/test failed for commit bed53f09f9cbed13c694ebb94131162fd6a6be48.
Test flake is #20633
bed53f0 to 3cb0ced
// to a bridge, as is usually the case. But when containers are attached
// to an SDN switch this is not useful (since there is no bridge involved)
// and it may interfere with that SDN's operation
if containersBridged {
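A minimal Go sketch of the containersBridged check, assuming the caller already knows whether containers are attached to a Linux bridge. maybeEnableBridgeNF, bridgeNFPath, and the procSysRoot parameter are hypothetical names, not the PR's code; the root is parameterized only so the example can run against a scratch directory instead of the real /proc/sys.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// bridgeNFPath returns the file backing net.bridge.bridge-nf-call-iptables
// under procSysRoot (normally "/proc/sys"; parameterized here for testing).
func bridgeNFPath(procSysRoot string) string {
	return filepath.Join(procSysRoot, "net", "bridge", "bridge-nf-call-iptables")
}

// maybeEnableBridgeNF writes 1 to bridge-nf-call-iptables only when
// containers are attached to a Linux bridge. When they sit on an SDN switch
// such as OVS, the sysctl is left untouched. The bool reports whether the
// sysctl was written.
func maybeEnableBridgeNF(containersBridged bool, procSysRoot string) (bool, error) {
	if !containersBridged {
		return false, nil
	}
	path := bridgeNFPath(procSysRoot)
	if err := os.WriteFile(path, []byte("1\n"), 0o644); err != nil {
		return false, fmt.Errorf("can't set sysctl %s: %v", path, err)
	}
	return true, nil
}

func main() {
	// Use a scratch directory instead of the real /proc/sys.
	root, err := os.MkdirTemp("", "procsys")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)
	if err := os.MkdirAll(filepath.Join(root, "net", "bridge"), 0o755); err != nil {
		panic(err)
	}

	wrote, err := maybeEnableBridgeNF(true, root)
	fmt.Println("linux bridge:", wrote, err)
	wrote, err = maybeEnableBridgeNF(false, root)
	fmt.Println("sdn switch:", wrote, err)
}
```

Keeping the decision as a plain boolean input leaves the question raised in the review (who computes containersBridged: kube-proxy, kubelet, or the plugin) outside this function.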
What if we moved this to kubelet and looked at the network plugin instead? Otherwise, what is the evolution plan for this? Kube-proxy doesn't really understand network plugins.
If this depends on the network plugin, it seems like something the plugin interface should be able to answer, and something that whatever component holds the network plugin should ask before setting the sysctl. Is the kubelet the right place for that to happen currently?
GCE e2e build/test failed for commit 3cb0cedac03be7f2e4f6f729576bf4c0bcbc0653.
PR needs rebase
3cb0ced to fbf39fe
Labelling this PR as size/M
GCE e2e test build/test passed for commit fbf39fe10888767efede618f6704a2498a581059.
The author of this PR is not in the whitelist for merge, can one of the admins add the 'ok-to-merge' label?
@thockin how about this approach?
@thockin @bprashanth PTAL, thanks!
// Load the module. It's OK if this fails (e.g. the module is not present)
// because we'll catch the error on the sysctl, which is what we actually
// care about.
exec.Command("modprobe", "br-netfilter").CombinedOutput()
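The reasoning in that comment (try the load, but judge success by the sysctl) can be sketched in Go as below. This is a minimal sketch, not the PR's code: loadBridgeNetfilter and realModprobe are hypothetical names, modprobe is injected as a function so the logic can be exercised without root, and an os.Stat on the sysctl file stands in for the sysctl write the proxier actually performs.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// loadBridgeNetfilter tries to load br-netfilter and then checks for the
// sysctl that actually matters. A modprobe failure is not fatal on its own:
// on older kernels the same code lives in the bridge module, so the sysctl
// can exist even when modprobe fails.
func loadBridgeNetfilter(modprobe func() error, sysctlPath string) error {
	modErr := modprobe()
	if _, err := os.Stat(sysctlPath); err != nil {
		return fmt.Errorf("sysctl %s unavailable (modprobe error: %v): %w", sysctlPath, modErr, err)
	}
	return nil
}

// realModprobe runs "modprobe br-netfilter" and discards its output.
func realModprobe() error {
	_, err := exec.Command("modprobe", "br-netfilter").CombinedOutput()
	return err
}

func main() {
	err := loadBridgeNetfilter(realModprobe, "/proc/sys/net/bridge/bridge-nf-call-iptables")
	if err != nil {
		fmt.Println("bridge netfilter not available:", err)
		return
	}
	fmt.Println("bridge netfilter available")
}
```

Including the modprobe error in the final message keeps the reason visible without treating an expected modprobe failure as an error by itself.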
Can you log something here in case someone already has a plugin that needs br-netfilter and they're getting it for free?
Ok, but I'm not quite sure what you're asking for... logging that the load failed? It's expected to fail on a bunch of systems because br-netfilter used to be built into the bridge module, but got split out sometime in the late 3.1x timeframe (Linux 3.18). So new systems need this; older ones don't.
Can you just run modinfo and log something like: br-netfilter module [not] loaded?
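One way to produce the suggested "br-netfilter module [not] loaded" line, sketched in Go. Note a swapped-in technique: modinfo reports whether a module is available on disk, not whether it is loaded, so this sketch reads /proc/modules instead. moduleLoaded is a hypothetical helper. A module compiled into the kernel (or, on older kernels, into the bridge module) never appears in /proc/modules, so "not loaded" does not by itself mean the sysctl is missing.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// moduleLoaded reports whether name appears in /proc/modules-formatted
// content, where each line starts with the module name followed by a space.
func moduleLoaded(procModules, name string) bool {
	// The kernel lists modules with underscores; modprobe accepts either form.
	name = strings.ReplaceAll(name, "-", "_")
	scanner := bufio.NewScanner(strings.NewReader(procModules))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) > 0 && fields[0] == name {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/modules")
	if err != nil {
		fmt.Println("can't read /proc/modules:", err)
		return
	}
	if moduleLoaded(string(data), "br-netfilter") {
		fmt.Println("br-netfilter module loaded")
	} else {
		fmt.Println("br-netfilter module not loaded")
	}
}
```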
+1
Yeah, I'll do that.
LGTM but for the logging nit
fbf39fe to efe2ee9
@bprashanth updated with more logging as requested, PTAL, thanks!
@@ -190,14 +189,6 @@ func NewProxier(ipt utiliptables.Interface, exec utilexec.Interface, syncPeriod
return nil, fmt.Errorf("can't set sysctl %s: %v", sysctlRouteLocalnet, err)
Is this sysctl call still appropriate for the proxier to set, rather than the network plugin?
We can move it to a plugin when we start using plugins by default. For now I'd like to keep things as they are unless absolutely necessary. The no-op plugin should really be a no-op.
Worse, there was a separate PR, somewhere, that made the no-op plugin go away and be a nil pointer.
GCE e2e test build/test passed for commit efe2ee902334ce59b1eb013439eab78bd10e9e09.
if _, err := utilexec.New().Command("modprobe", "br-netfilter").CombinedOutput(); err != nil {
glog.V(3).Infof("Module br-netfilter not loaded")
} else {
glog.V(3).Infof("Module br-netfilter loaded")
This is actually not what I meant. I meant that if someone has a CNI plugin that needs br-netfilter, and it currently works only because kube-proxy loads the module, it will stop working after this change. I was asking for the output of modinfo in kube-proxy as a clue that the module isn't loaded. This logging statement won't run in the case I'm describing, because we'll init CNI instead.
I'm kind of sad that the no-op plugin has side effects now. But this appears to be the easiest workaround for a potentially disruptive scenario, so LGTM modulo that logging comment (or you can explain why you don't want to do it).
efe2ee9 to 32f6f9d
@bprashanth @thockin is the latest more in line with what you were thinking?
GCE e2e test build/test passed for commit 32f6f9db0faf362dfb12e73323129c492e889a99.
warnBrNetfilter = true
}
if warnBrNetfilter {
glog.V(3).Infof("missing br-netfilter module or unset br-nf-call-iptables; proxy may not work as intended")
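A minimal sketch of the warning logic in the excerpt above, assuming the check is simply whether /proc/sys/net/bridge/bridge-nf-call-iptables exists and reads 1. bridgeNFEnabled and shouldWarnBrNetfilter are hypothetical names, and fmt.Println stands in for glog so the example has no dependencies; per the review, the real message would be logged once at default verbosity rather than behind V(3).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const bridgeNFCallIptables = "/proc/sys/net/bridge/bridge-nf-call-iptables"

// bridgeNFEnabled reports whether the sysctl file contents equal "1".
func bridgeNFEnabled(contents string) bool {
	return strings.TrimSpace(contents) == "1"
}

// shouldWarnBrNetfilter returns true when the sysctl file is missing (for
// example because br-netfilter is not loaded) or is present but not set to 1.
func shouldWarnBrNetfilter(path string) bool {
	data, err := os.ReadFile(path)
	if err != nil {
		return true
	}
	return !bridgeNFEnabled(string(data))
}

func main() {
	// Checked once at startup; the message is printed unconditionally
	// rather than hidden behind a verbosity level.
	if shouldWarnBrNetfilter(bridgeNFCallIptables) {
		fmt.Println("missing br-netfilter module or unset br-nf-call-iptables; proxy may not work as intended")
	}
}
```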
Can you make this an Infof? It should be a one-time thing, and no one runs V(3) in prod.
@bprashanth done, PTAL
…ugins bridge-nf-call-iptables appears to only be relevant when the containers are attached to a Linux bridge, which is usually the case with default Kubernetes setups, docker, and flannel. That ensures that the container traffic is actually subject to the iptables rules, since it traverses a Linux bridge and bridged traffic is only subject to iptables when bridge-nf-call-iptables=1. But with other networking solutions (like openshift-sdn) that don't use Linux bridges, bridge-nf-call-iptables may not be relevant, because iptables is invoked at other points not involving a Linux bridge. The decision to set bridge-nf-call-iptables should be influenced by networking plugins, so push the responsibility out to them. If no network plugin is specified, fall back to the existing bridge-nf-call-iptables=1 behavior.
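The division of responsibility in this commit message (plugins decide; with no plugin configured, keep bridge-nf-call-iptables=1) can be sketched as follows. Everything here is hypothetical and heavily simplified: NetworkPlugin is not the kubelet's real interface, sdnPlugin stands in for something like openshift-sdn, and the sysctl path is injected so the example can write to a scratch file.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// NetworkPlugin is a pared-down, hypothetical stand-in for a kubelet
// network plugin interface; only the part relevant here is modeled.
type NetworkPlugin interface {
	Name() string
	Init() error
}

// noopPlugin is used when no network plugin is configured. It preserves the
// old behavior by setting bridge-nf-call-iptables=1.
type noopPlugin struct{ sysctlPath string }

func (p *noopPlugin) Name() string { return "no-op" }
func (p *noopPlugin) Init() error {
	return os.WriteFile(p.sysctlPath, []byte("1\n"), 0o644)
}

// sdnPlugin models a plugin that does not attach containers to a Linux
// bridge and therefore leaves the sysctl alone.
type sdnPlugin struct{}

func (sdnPlugin) Name() string { return "example/sdn" }
func (sdnPlugin) Init() error  { return nil }

// selectPlugin returns the named plugin, falling back to the no-op plugin
// when no name is configured.
func selectPlugin(name string, plugins []NetworkPlugin, fallback NetworkPlugin) (NetworkPlugin, error) {
	if name == "" {
		return fallback, nil
	}
	for _, p := range plugins {
		if p.Name() == name {
			return p, nil
		}
	}
	return nil, fmt.Errorf("network plugin %q not found", name)
}

func main() {
	dir, err := os.MkdirTemp("", "procsys")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "bridge-nf-call-iptables")
	noop := &noopPlugin{sysctlPath: path}
	available := []NetworkPlugin{sdnPlugin{}}

	// Configure the SDN plugin first, then no plugin at all.
	for _, name := range []string{"example/sdn", ""} {
		p, err := selectPlugin(name, available, noop)
		if err != nil {
			panic(err)
		}
		if err := p.Init(); err != nil {
			panic(err)
		}
		_, statErr := os.Stat(path)
		fmt.Printf("plugin %q -> %s, sysctl written: %v\n", name, p.Name(), statErr == nil)
	}
}
```

Putting the side effect in the fallback plugin keeps existing bridge-based setups working unchanged, which is the trade-off the review accepts when noting that the no-op plugin now has side effects.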
32f6f9d to 6248939
GCE e2e test build/test passed for commit 6248939.
LGTM
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
GCE e2e test build/test passed for commit 6248939.
Automatic merge from submit-queue
…tables Auto commit by PR queue bot
https://trello.com/c/vnvUCQPG/112-3-remove-the-bridge-nf-call-iptables-hack-sdn-techdebt Effectively reverts d510a76 "Fix up net.bridge.bridge-nf-call-iptables after kubernetes breaks it" now that upstream kube PR kubernetes/kubernetes#20647 got merged. The hack is no longer necessary.
bridge-nf-call-iptables appears to only be relevant when the containers are
attached to a Linux bridge, which is usually the case with default Kubernetes
setups, docker, and flannel. That ensures that the container traffic is
actually subject to the iptables rules since it traverses a Linux bridge
and bridged traffic is only subject to iptables when bridge-nf-call-iptables=1.
But with other networking solutions (like openshift-sdn) that don't use Linux
bridges, bridge-nf-call-iptables may not be relevant, because iptables is
invoked at other points not involving a Linux bridge.
The decision to set bridge-nf-call-iptables should be influenced by networking
plugins in some way, but until that's plumbed through somehow, allow it to
be disabled easily.