Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Selectively disable iptables proxy's bridge-nf-call-iptables=1 behavior #20647

Conversation

dcbw
Copy link
Member

@dcbw dcbw commented Feb 4, 2016

bridge-nf-call-iptables appears to only be relevant when the containers are
attached to a Linux bridge, which is usually the case with default Kubernetes
setups, docker, and flannel. That ensures that the container traffic is
actually subject to the iptables rules since it traverses a Linux bridge
and bridged traffic is only subject to iptables when bridge-nf-call-iptables=1.

But with other networking solutions (like openshift-sdn) that don't use Linux
bridges, bridge-nf-call-iptables may not be not relevant, because iptables is
invoked at other points not involving a Linux bridge.

The decision to set bridge-nf-call-iptables should be influenced by networking
plugins in some way, but until that's plumbed through somehow, allow it to
be disabled easily.

@dcbw
Copy link
Member Author

dcbw commented Feb 4, 2016

@thockin you added the bridge-nf-call-iptables=1 stuff originally in 3a5c23d, so I assume you might have some comments here.

The back-story is that openshift-sdn uses an OVS vswitch not a Linux bridge, but at the moment we still use docker for IPAM, and the IP address assigned to the docker bridge (from which containers are IPAMed) is the same as the IP address assigned to the vswitch port where container traffic exits the OVS vswitch. openshift-sdn removes the container veth from the docker bridge and puts it into the OVS vswitch, so the docker bridge is no longer involved, and thus bridge-nf-call-iptables=1 is no longer useful. But it confuses iptables horribly since two interfaces have the same IP address.

This really should be a plugin-by-plugin decision, but we need some way in the future to plumb this through better. For now though, this PR will allow us to remove some awful hacks in openshift.

@k8s-github-robot
Copy link

Labelling this PR as size/S

@k8s-github-robot k8s-github-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 4, 2016
@k8s-bot
Copy link

k8s-bot commented Feb 4, 2016

GCE e2e build/test failed for commit bed53f09f9cbed13c694ebb94131162fd6a6be48.

@dcbw
Copy link
Member Author

dcbw commented Feb 4, 2016

Test flake is #20633

@dcbw dcbw force-pushed the allow-disabling-bridge-nf-call-iptables branch from bed53f0 to 3cb0ced Compare February 4, 2016 17:21
// to a bridge, as is usually the case. But when containers are attached
// to an SDN switch this is not useful (since there is no bridge involved)
// and it may interfere with that SDN's operation
if containersBridged {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we moved this to kubelet and looked at the network plugin instead? Or otherwise - what is the evolution plan for this? Kube-proxy doesn't really understand network plugins.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this depends on the network plugin, it seems like something the plugin interface should be able to answer, and something that has the network plugin should ask before setting. is the kubelet the right place for that to happen currently?

@thockin thockin added this to the v1.2 milestone Feb 4, 2016
@thockin thockin assigned thockin and unassigned bgrant0607 Feb 4, 2016
@k8s-bot
Copy link

k8s-bot commented Feb 4, 2016

GCE e2e build/test failed for commit 3cb0cedac03be7f2e4f6f729576bf4c0bcbc0653.

@k8s-github-robot
Copy link

PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 8, 2016
@dcbw
Copy link
Member Author

dcbw commented Feb 9, 2016

@thockin @liggitt yeah it should definitely have some interaction with networking plugins. I'll rework for that.

@dcbw dcbw force-pushed the allow-disabling-bridge-nf-call-iptables branch from 3cb0ced to fbf39fe Compare February 9, 2016 22:40
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 9, 2016
@k8s-github-robot
Copy link

Labelling this PR as size/M

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 9, 2016
@k8s-bot
Copy link

k8s-bot commented Feb 9, 2016

GCE e2e test build/test passed for commit fbf39fe10888767efede618f6704a2498a581059.

@k8s-github-robot
Copy link

The author of this PR is not in the whitelist for merge, can one of the admins add the 'ok-to-merge' label?

@dcbw
Copy link
Member Author

dcbw commented Feb 10, 2016

@thockin how about this approach?

@dcbw
Copy link
Member Author

dcbw commented Feb 19, 2016

@thockin @bprashanth PTAL, thanks!

// Load the module. It's OK if this fails (e.g. the module is not present)
// because we'll catch the error on the sysctl, which is what we actually
// care about.
exec.Command("modprobe", "br-netfilter").CombinedOutput()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you log something here in case someone already has a plugin that needs br-netfilter and they're getting it for free?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you log something here in case someone already has a plugin that needs br-netfilter and they're getting it for free?

Ok, but not quite sure what you're asking for... logging that the load failed? it's expected to fail in a bunch of systems because br-netfilter used to be built-in to the bridge module, but got split sometime in the late 3.1x timeframe. So new systems need this, older ones don't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just run modinfo and log something like: br-netfilter module [not] loaded?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll doing that.

@bprashanth
Copy link
Contributor

LGTM but for the logging nit

@dcbw dcbw force-pushed the allow-disabling-bridge-nf-call-iptables branch from fbf39fe to efe2ee9 Compare February 20, 2016 16:10
@dcbw
Copy link
Member Author

dcbw commented Feb 20, 2016

@bprashanth updated with more logging as requested, PTAL, thanks!

@@ -190,14 +189,6 @@ func NewProxier(ipt utiliptables.Interface, exec utilexec.Interface, syncPeriod
return nil, fmt.Errorf("can't set sysctl %s: %v", sysctlRouteLocalnet, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this sysctl call still appropriate for the proxier to set, rather than the network plugin?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move it to a plugin when we start using plugins by default. For now I'd like to keep things as they are unless absolutely necessary. the no-op plugin should really no-op.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worse, there was a separate PR, somewhere, that made the no-op plugin go away and be a nil pointer.

@k8s-bot
Copy link

k8s-bot commented Feb 20, 2016

GCE e2e test build/test passed for commit efe2ee902334ce59b1eb013439eab78bd10e9e09.

if _, err := utilexec.New().Command("modprobe", "br-netfilter").CombinedOutput(); err != nil {
glog.V(3).Infof("Module br-netfilter not loaded")
} else {
glog.V(3).Infof("Module br-netfilter loaded")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually not what I meant. I meant if someone has a CNI plugin that needs br-netfilter, and it currently works because kube-proxy is invoking the load, it will not work after this change. I was asking for the output of modinfo in kube-proxy as a clue that it isn't loaded. This logging statement won't get invoked in the case I'm talking about because we'll init CNI.

@bprashanth
Copy link
Contributor

I'm kind of sad that the no-op plugin has side effects now. But this appears to be the easiest work around for a potentially disruptive scenario, so LGTM modulo that logging comment (or you can answer with why you don't want to do it).

@dcbw dcbw force-pushed the allow-disabling-bridge-nf-call-iptables branch from efe2ee9 to 32f6f9d Compare February 22, 2016 22:18
@dcbw
Copy link
Member Author

dcbw commented Feb 22, 2016

@bprashanth @thockin is the latest more in line with what you were thinking?

@k8s-bot
Copy link

k8s-bot commented Feb 22, 2016

GCE e2e test build/test passed for commit 32f6f9db0faf362dfb12e73323129c492e889a99.

warnBrNetfilter = true
}
if warnBrNetfilter {
glog.V(3).Infof("missing br-netfilter module or unset br-nf-call-iptables; proxy may not work as intended")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this an infof? it should be a one time thing and no one runs V(3) in prod

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bprashanth done, PTAL

…ugins

bridge-nf-call-iptables appears to only be relevant when the containers are
attached to a Linux bridge, which is usually the case with default Kubernetes
setups, docker, and flannel.  That ensures that the container traffic is
actually subject to the iptables rules since it traverses a Linux bridge
and bridged traffic is only subject to iptables when bridge-nf-call-iptables=1.

But with other networking solutions (like openshift-sdn) that don't use Linux
bridges, bridge-nf-call-iptables may not be not relevant, because iptables is
invoked at other points not involving a Linux bridge.

The decision to set bridge-nf-call-iptables should be influenced by networking
plugins, so push the responsiblity out to them.  If no network plugin is
specified, fall back to the existing bridge-nf-call-iptables=1 behavior.
@dcbw dcbw force-pushed the allow-disabling-bridge-nf-call-iptables branch from 32f6f9d to 6248939 Compare February 23, 2016 15:35
@k8s-bot
Copy link

k8s-bot commented Feb 23, 2016

GCE e2e test build/test passed for commit 6248939.

@bprashanth
Copy link
Contributor

LGTM

@bprashanth bprashanth added lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-merge and removed needs-ok-to-merge labels Feb 23, 2016
@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Feb 25, 2016

GCE e2e test build/test passed for commit 6248939.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

k8s-github-robot pushed a commit that referenced this pull request Feb 25, 2016
@k8s-github-robot k8s-github-robot merged commit 33ef7a9 into kubernetes:master Feb 25, 2016
dcbw added a commit to dcbw/origin that referenced this pull request Feb 25, 2016
https://trello.com/c/vnvUCQPG/112-3-remove-the-bridge-nf-call-iptables-hack-sdn-techdebt

Effectively reverts d510a76
"Fix up net.bridge.bridge-nf-call-iptables after kubernetes breaks it"
now that upstream kube PR kubernetes/kubernetes#20647
got merged.  The hack is no longer necessary.
dcbw added a commit to dcbw/origin that referenced this pull request Mar 30, 2016
https://trello.com/c/vnvUCQPG/112-3-remove-the-bridge-nf-call-iptables-hack-sdn-techdebt

Effectively reverts d510a76
"Fix up net.bridge.bridge-nf-call-iptables after kubernetes breaks it"
now that upstream kube PR kubernetes/kubernetes#20647
got merged.  The hack is no longer necessary.
dcbw added a commit to dcbw/origin that referenced this pull request Apr 15, 2016
https://trello.com/c/vnvUCQPG/112-3-remove-the-bridge-nf-call-iptables-hack-sdn-techdebt

Effectively reverts d510a76
"Fix up net.bridge.bridge-nf-call-iptables after kubernetes breaks it"
now that upstream kube PR kubernetes/kubernetes#20647
got merged.  The hack is no longer necessary.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants