Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iptables feature detection improvements #80368

Merged
merged 3 commits into from Aug 22, 2019

Conversation

@danwinship
Copy link
Contributor

commented Jul 19, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
The iptables code was doing version detection on the iptables binary but feature detection on the iptables-restore binary, to try to support the version of iptables in RHEL 7, which claims to be 1.4.21 but has certain features from iptables 1.6.

The problem is that this particular set of versions and checks resulted in the code passing "-w" ("wait forever for the lock") to iptables, but "-w 5" ("wait at most 5 seconds for the lock") to iptables-restore. On systems with very very many iptables rules, this could result in the kubelet periodic resyncs (which use "iptables") blocking kube-proxy (which uses "iptables-restore") and causing it to time out.

We already have code to grab the lock file by hand when using a version of iptables-restore that doesn't support "-w", and it works fine. So just use that instead, and only pass "-w 5" to iptables-restore when iptables reports a version that actually supports it.

Does this PR introduce a user-facing change?:

Fixes a problem with the iptables proxy mode that could result in long delays
updating Service/Endpoints IPs in very large clusters on RHEL/CentOS 7.

/assign @dcbw

@dcbw

This comment has been minimized.

Copy link
Member

commented Jul 19, 2019

/priority important-soon

@squeed

This comment has been minimized.

Copy link
Contributor

commented Jul 19, 2019

Dan and I were discussing whether or not it's better to do -w or -w 5. In situations where iptables takes >5 seconds to apply (which, for the record, is pretty extreme), this could result in very long delays before progress is made. On the other hand, -w 5 protects us from some various possible bugs.

@danwinship danwinship force-pushed the danwinship:iptables-checks branch from 87f4c2e to 62f4cb0 Jul 19, 2019
@danwinship

This comment has been minimized.

Copy link
Contributor Author

commented Jul 19, 2019

Casey also pointed out that the whole "only use iptables proxier if iptables supports -C" thing doesn't make sense any more anyway, since the current version of the iptables proxier code never calls ipt.EnsureRule. So I dropped that. Congratulations iptables < 1.4.11 users! You can now use the iptables proxier. (Um, do we actually still care about any distros with iptables < 1.4.11?)

@danwinship danwinship force-pushed the danwinship:iptables-checks branch from 62f4cb0 to 1231a95 Jul 19, 2019
@danwinship danwinship changed the title Don't do feature detection on the iptables-restore binary iptables feature detection improvements Jul 19, 2019
Copy link
Member

left a comment

Few general comments but generally looks good

@@ -121,11 +119,13 @@ const NoFlushTables FlushFlag = false

// Versions of iptables less than this do not support the -C / --check flag
// (test whether a rule exists).
const MinCheckVersion = "1.4.11"
var MinCheckVersion = utilversion.MustParseGeneric("1.4.11")

This comment has been minimized.

Copy link
@cmluciano

cmluciano Jul 23, 2019

Member

Why switch this from a const?

This comment has been minimized.

Copy link
@danwinship

danwinship Jul 23, 2019

Author Contributor

Can't do a function call to set a const

}
version, err := utilversion.ParseGeneric(match[1])
if err != nil {
return nil, fmt.Errorf("iptables version %q is not a valid version string: %v", match[1], err)

This comment has been minimized.

Copy link
@cmluciano

cmluciano Jul 23, 2019

Member

It might be useful to spit out an expected version for comparison or crosslink what we're expecting with more text in the error message

This comment has been minimized.

Copy link
@danwinship

danwinship Jul 23, 2019

Author Contributor

It's a "can't happen" situation anyway; we only reach this point if we regex-matched a version number in the output already, so it has to parse. If it doesn't, there's a bug in the code and the only thing a user could do would be to file a bug.

// Checks if iptables-restore has a "wait" flag
func getIPTablesRestoreWaitFlag(version *utilversion.Version) []string {
switch {
case version.AtLeast(WaitRestoreMinVersion):

This comment has been minimized.

Copy link
@cmluciano

cmluciano Jul 23, 2019

Member

Can we use an if statement here since there's only one choice?

This comment has been minimized.

Copy link
@danwinship

danwinship Jul 23, 2019

Author Contributor

ah, yeah, I copied this from getIPTablesWaitFlag where there are two branches

@danwinship danwinship force-pushed the danwinship:iptables-checks branch from 1231a95 to 84a5f3f Jul 23, 2019
@dcbw

This comment has been minimized.

Copy link
Member

commented Jul 24, 2019

@danwinship looks like some unrelated change to pkg/scheduler/apis/config/v1alpha1/zz_generated.conversion.go snuck in, can you remove that hunk? That ought to make CI happier.

@danwinship danwinship force-pushed the danwinship:iptables-checks branch from 84a5f3f to 7ab487e Jul 24, 2019
@dcbw

This comment has been minimized.

Copy link
Member

commented Jul 24, 2019

@cmluciano does the PR look better now?

Copy link
Member

left a comment

Thanks for answering all of my questions

/lgtm

The iptables code was doing version detection on the iptables binary
but feature detection on the iptables-restore binary, to try to
support the version of iptables in RHEL 7, which claims to be 1.4.21
but has certain features from iptables 1.6.

The problem is that this particular set of versions and checks
resulted in the code passing "-w" ("wait forever for the lock") to
iptables, but "-w 5" ("wait at most 5 seconds for the lock") to
iptables-restore. On systems with very very many iptables rules, this
could result in the kubelet periodic resyncs (which use "iptables")
blocking kube-proxy (which uses "iptables-restore") and causing it to
time out.

We already have code to grab the lock file by hand when using a
version of iptables-restore that doesn't support "-w", and it works
fine. So just use that instead, and only pass "-w 5" to
iptables-restore when iptables reports a version that actually
supports it.
@danwinship danwinship force-pushed the danwinship:iptables-checks branch from 7ab487e to fea2b54 Aug 1, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm label Aug 1, 2019
danwinship added 2 commits Jul 16, 2019
Kube-proxy's iptables mode used to care whether utiliptables's
EnsureRule was able to use "iptables -C" or if it had to implement it
hackily using "iptables-save". But that became irrelevant when
kube-proxy was reimplemented using "iptables-restore", and no one ever
noticed. So remove that check.
@danwinship danwinship force-pushed the danwinship:iptables-checks branch from fea2b54 to 81cd27a Aug 1, 2019
@danwinship

This comment has been minimized.

Copy link
Contributor Author

commented Aug 1, 2019

/assign @dcbw
had to rebase around kube-proxy unit test changes but should be ready to go now

@danwinship

This comment has been minimized.

Copy link
Contributor Author

commented Aug 1, 2019

/retest

@danwinship

This comment has been minimized.

Copy link
Contributor Author

commented Aug 12, 2019

@dcbw PTAL

@dcbw

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

@danwinship patch description for kube-proxy: drop iptables version check doesn't make a lot of sense to me. Isn't it more that pkg/util/iptables got updated to internally figure out whether "-C" was present or not, and work around that fact if it wasn't? And so things above that, that just call EnsureRule() don't need to care anymore?

@dcbw

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

Otherwise everything else looks good to me.

@danwinship

This comment has been minimized.

Copy link
Contributor Author

commented Aug 21, 2019

Isn't it more that pkg/util/iptables got updated to internally figure out whether "-C" was present or not, and work around that fact if it wasn't?

No, that code was always there. The issue is that EnsureRule is incredibly inefficient if utiliptables is using the workaround, so the iptables proxier refused to run in that case. But nowadays the iptables proxier never calls EnsureRule, so it doesn't matter whether it's efficient or not.

Hm... ok, actually that's not true. It still uses EnsureRule to create -j rules in INPUT/PREROUTING/etc... But that's only 8 EnsureRule calls per syncProxyRules, as opposed to potentially thousands with the original code. And no one is still running iptables 1.4.11 any more anyway...

@dcbw

This comment has been minimized.

Copy link
Member

commented Aug 22, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Aug 22, 2019
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Aug 22, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, dcbw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dcbw

This comment has been minimized.

Copy link
Member

commented Aug 22, 2019

/test pull-kubernetes-bazel-test

@k8s-ci-robot k8s-ci-robot merged commit 37651f1 into kubernetes:master Aug 22, 2019
23 checks passed
23 checks passed
cla/linuxfoundation danwinship authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details
@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Aug 22, 2019
@danwinship danwinship deleted the danwinship:iptables-checks branch Sep 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.