New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't create no-op iptables rules for services with no endpoints #57461

Merged
merged 1 commit into from Feb 23, 2018

Conversation

Projects
None yet
7 participants
@danwinship
Contributor

danwinship commented Dec 20, 2017

Currently for all services we create -t nat -A KUBE-SERVICES rules that match the destination IPs (ClusterIP, ExternalIP, NodePort IPs, etc) and then jump to the appropriate KUBE-SVC-XXXXXX chain. But if the service has no endpoints then the KUBE-SVC-XXXXXX chain will be empty and so nothing happens except that we wasted time (a) forcing iptables-restore to parse the match rules, and (b) forcing the kernel to test matches that aren't going to have any effect.

This PR gets rid of the match rules in this case. Which is to say, it changes things so that every incoming service packet is matched either by nat rules to rewrite it or by filter rules to ICMP reject it, but not both. (Actually, that's not quite true: there are no filter rules to reject Ingress-addressed packets, and I think that's a bug?)

I also got rid of some comments that seemed redundant.

The patch is mostly reindentation, so best viewed with diff -w.

Partial fix for #56842 / Related to #56164 (which it conflicts with but I'll fix that after one or the other merges).

Release note:

Removed some redundant rules created by the iptables proxier, to improve performance on systems with very many services.
@danwinship

This comment has been minimized.

Contributor

danwinship commented Jan 22, 2018

/retest

1 similar comment
@cmluciano

This comment has been minimized.

Member

cmluciano commented Jan 23, 2018

/retest

writeLine(proxier.natRules, append(args, "-j", string(svcXlbChain))...)
}
} else {
if len(proxier.endpointsMap[svcName]) == 0 {

This comment has been minimized.

@dcbw

dcbw Feb 6, 2018

Member

Is this len(proxier.endpointsMap[svcName]) not redundant with the "if hasEndpoints/else" here? hasEndpoints already evaluates to len(proxier.endpointsMap[svcName]) > 0...

This comment has been minimized.

@danwinship

danwinship Feb 6, 2018

Contributor

yup, fixed

}
writeLine(proxier.natRules, append(args, "-j", string(svcChain))...)
} else {
writeLine(proxier.filterRules,

This comment has been minimized.

@dcbw

dcbw Feb 6, 2018

Member

The 'args' here are mostly the same as the args just above, with the exception of -j REJECT and the comment. Any chance to just have a common args and have each one then append what they need in their own block?

This comment has been minimized.

@danwinship

danwinship Feb 6, 2018

Contributor

There is already a lot of redundancy with args anyway:

"-m", protocol, "-p", protocol,
"-d", utilproxy.ToCIDR(svcInfo.clusterIP),
"--dport", strconv.Itoa(svcInfo.port),

vs

"-m", protocol, "-p", protocol,
"-d", utilproxy.ToCIDR(net.ParseIP(externalIP)),
"--dport", strconv.Itoa(svcInfo.port),

vs

"-m", protocol, "-p", protocol,
"-d", utilproxy.ToCIDR(net.ParseIP(ingress.IP)),
"--dport", strconv.Itoa(svcInfo.port),

etc. I think this can wait for a future cleanup. (And the fact that we need to have other args both before and after the shared part makes it not very clean to try to share part of the string array anyway.)

This comment has been minimized.

@dcbw
@smarterclayton

This comment has been minimized.

Contributor

smarterclayton commented Feb 14, 2018

Are there no test cases that cover this area where we can prove this works?

@danwinship

This comment has been minimized.

Contributor

danwinship commented Feb 14, 2018

There are unit tests in pkg/proxy/iptables/proxier_test.go that check that rewrite rules are correctly written for services with endpoints, and reject rules are correctly written for services without endpoints, and those tests continue to pass without having needed any changes, because this patch only removes rules that we weren't checking for (because they're irrelevant).

There are plenty of e2e tests that make sure that services with endpoints work. There doesn't seem to be a comprehensive set of tests that services without endpoints fail immediately rather than slowly timing out, so I could add some more of those.

@dcbw

This comment has been minimized.

Member

dcbw commented Feb 21, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 21, 2018

@danwinship

This comment has been minimized.

Contributor

danwinship commented Feb 22, 2018

/assign @thockin

@thockin

This comment has been minimized.

Member

thockin commented Feb 23, 2018

Is this really a big enough issue to warrant changing anything?

@thockin

Ran out of time right now, this is top of my queue.

@@ -854,30 +856,38 @@ func (proxier *Proxier) syncProxyRules() {
writeLine(proxier.natChains, utiliptables.MakeChainLine(svcXlbChain))
}
activeNATChains[svcXlbChain] = true
} else if activeNATChains[svcXlbChain] {

This comment has been minimized.

@thockin

thockin Feb 23, 2018

Member

What happened to this?

This comment has been minimized.

@danwinship

danwinship Feb 23, 2018

Contributor

AFAICT, there's no way activeNATChains[svcXlbChain] could be already set at this point, since activeNATChains starts out empty, no other service would have the same svcXlbChain name, and no other part of the loop modifies activeNATChains[svcXlbChain]. It looks like cruft from an earlier version of the code.

@@ -913,33 +923,32 @@ func (proxier *Proxier) syncProxyRules() {
}
replacementPortsMap[lp] = socket
}
} // We're holding the port, so it's OK to install iptables rules.

This comment has been minimized.

@thockin

thockin Feb 23, 2018

Member

Reason to remove this comment?

This comment has been minimized.

@danwinship

danwinship Feb 23, 2018

Contributor

There's already a comment explaining the need to hold the port at the start of that block (https://github.com/danwinship/kubernetes/blob/07ead7d8/pkg/proxy/iptables/proxier.go#L894), so the comment at the end seemed redundant. I'll add it back if you want though.

@danwinship

This comment has been minimized.

Contributor

danwinship commented Feb 23, 2018

Is this really a big enough issue to warrant changing anything?

TBH, I'm not 100% sure. This is mostly supposed to be fixing #56842, as an alternative version of a patch they wrote that I didn't like (it required reserving another bit in the iptables mark), but then the filer of that bug wasn't able to test it.

At any rate, without this patch (which, as mentioned in the original comment, is mostly just reindentation), we are definitely adding thousands of no-op iptables rules on some clusters.

@danwinship

This comment has been minimized.

Contributor

danwinship commented Feb 23, 2018

#60306 should also probably fix #56842, and is more critical for OpenShift, so if you're dubious about this PR we could merge just that for now and revisit this later.

@thockin

This comment has been minimized.

Member

thockin commented Feb 23, 2018

/approve

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Feb 23, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, dcbw, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Feb 23, 2018

Automatic merge from submit-queue (batch tested with PRs 55637, 57461, 60268, 60290, 60210). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit e6c2a5d into kubernetes:master Feb 23, 2018

13 checks passed

Submit Queue Queued to run github e2e tests a second time.
Details
cla/linuxfoundation danwinship authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-unit Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details

openshift-merge-robot added a commit to openshift/origin that referenced this pull request Feb 27, 2018

Merge pull request #18754 from danwinship/upstream-iptables-fixes
Automatic merge from submit-queue (batch tested with PRs 18754, 18761).

kube-proxy iptables performance fixes

Pull in multiple upstream iptables fixes to improve performance in "very large clusters" (ie, Online).

Includes kubernetes/kubernetes#57336, kubernetes/kubernetes#56164, kubernetes/kubernetes#57461, and kubernetes/kubernetes#60306.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1514174

openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Feb 27, 2018

Merge pull request kubernetes#18754 from danwinship/upstream-iptables…
…-fixes

Automatic merge from submit-queue (batch tested with PRs 18754, 18761).

kube-proxy iptables performance fixes

Pull in multiple upstream iptables fixes to improve performance in "very large clusters" (ie, Online).

Includes kubernetes#57336, kubernetes#56164, kubernetes#57461, and kubernetes#60306.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1514174

Origin-commit: e2e14cb4fe6a6789936da736d627ae96ca822116

openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Mar 5, 2018

Merge pull request kubernetes#18754 from danwinship/upstream-iptables…
…-fixes

Automatic merge from submit-queue (batch tested with PRs 18754, 18761).

kube-proxy iptables performance fixes

Pull in multiple upstream iptables fixes to improve performance in "very large clusters" (ie, Online).

Includes kubernetes#57336, kubernetes#56164, kubernetes#57461, and kubernetes#60306.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1514174

Origin-commit: e2e14cb4fe6a6789936da736d627ae96ca822116

openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Mar 23, 2018

Merge pull request kubernetes#18754 from danwinship/upstream-iptables…
…-fixes

Automatic merge from submit-queue (batch tested with PRs 18754, 18761).

kube-proxy iptables performance fixes

Pull in multiple upstream iptables fixes to improve performance in "very large clusters" (ie, Online).

Includes kubernetes#57336, kubernetes#56164, kubernetes#57461, and kubernetes#60306.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1514174

Origin-commit: e2e14cb4fe6a6789936da736d627ae96ca822116

@danwinship danwinship deleted the danwinship:proxier-no-dummy-nat-rules branch Mar 26, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment