Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more iptables proxy cleanups #106269

Merged
merged 6 commits into from
Nov 10, 2021

Conversation

danwinship
Copy link
Contributor

What type of PR is this?

/kind cleanup
/sig network
/priority important-soon

What this PR does / why we need it:

Further iptables proxy cleanups; we may want to merge only a subset of these commits.

In particular, "proxy/iptables: Abstract out code for writing service-chain-to-endpoint-chain rules" gets rid of duplicated code, but it moves some important logic out of syncProxyRules into a helper, which may be considered a net comprehensibility loss. (But FWIW this is the change that made me notice #106030.) (I guess another possibility would be to make writeServiceToEndpointRules be an inline function inside syncProxyRules, so the actual code would be there right before the first call to it.)

(OTOH, "proxy/iptables: Abstract out shared OpenLocalPort code" seems definitely like a win, since it's doing something separate from the iptables rule generation.)

"proxy/iptables: simplify comment elision code" also may or may not be considered a good idea, and I don't like how we end up with an extra space in the output there, but it would be a little ugly to fix, especially if you want the code to even handle cases like "foo", []string{""}, "bar""foo bar"

Does this PR introduce a user-facing change?

NONE

/assign @thockin @aojea

The iptables and ipvs proxiers both had a check that none of the
elements of svcInfo.LoadBalancerIPStrings() were "", but that was
already guaranteed by the svcInfo code. Drop the unnecessary checks
and remove a level of indentation.
If GetNodeAddresses() fails (eg, because you passed the wrong CIDR to
`--nodeport-addresses`), then any NodePort services would end up with
only half a set of iptables rules. Fix it to just not output the
NodePort-specific parts in that case (in addition to logging an error
about the GetNodeAddresses() failure).
If you pass just an IP address to "-s" or "-d", the iptables command
will fill in the correct mask automatically.

Originally, the proxier was just hardcoding "/32" for all of these,
which was unnecessary but simple. But when IPv6 support was added, the
code was made more complicated to deal with the fact that the "/32"
needed to be "/128" in the IPv6 case, so it would parse the IPs to
figure out which family they were, which in turn involved adding some
checks in case the parsing fails (even though that "can't happen" and
the old code didn't check for invalid IPs, even though that would
break the iptables-restore if there had been any).

Anyway, all of that is unnecessary because we can just pass the IP
strings to iptables directly rather than parsing and unparsing them
first.

(The diff to proxier_test.go is just deleting "/32" everywhere.)
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. sig/network Categorizes an issue or PR as relevant to SIG Network. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 9, 2021
@k8s-ci-robot
Copy link
Contributor

@danwinship: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Nov 9, 2021
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM commit "Remove an unnecessary check"

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit "fix a bug in node address error handling"

Something I have had on my mind for a while. What if we included errors IN the iptables rules? Use the comment and a no-op rule, e.g.

-A KUBE-NODEPORTS -m comment --comment "ns-name/svc-name ERROR: failed to install NodePort rules because no node IPs" -j KUBE-NOOP
-A KUBE-NOOP return

I'm not sure it's a GREAT idea, but it's somewhat more visible...

Maybe we also need better metrics about errors?

pkg/proxy/iptables/proxier.go Show resolved Hide resolved
@@ -1668,3 +1610,38 @@ func (proxier *Proxier) openPort(lp netutils.LocalPort, replacementPortsMap map[
klog.V(2).InfoS("Opened local port", "port", lp)
replacementPortsMap[lp] = socket
}

func (proxier *Proxier) writeServiceToEndpointRules(svcNameString string, svcInfo proxy.ServicePort, svcChain utiliptables.Chain, endpointChains []utiliptables.Chain, args []string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any possibility these rules will diverge in the future with the new features?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm... not with any of the pending new features, but I guess it's possible they could diverge with future new features

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM commit "Remove unnecessary /32 and /128 in iptables rules"

ISTR this was added to make the iptables-save that we emit "more canonical" and therefore easier to diff against running state. But then I did some spelunking and this goes all the way back to 2015 and the very first commits.

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit "Abstract out shared OpenLocalPort code" - just one nit

pkg/proxy/iptables/proxier.go Show resolved Hide resolved
@danwinship
Copy link
Contributor Author

What if we included errors IN the iptables rules?

I'm not sure anyone would ever see them...

Maybe we also need better metrics about errors?

or Events?

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit "Abstract out code for writing service-chain-to-endpoint" LGTM

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit "add a unit test for the comment elision code" LGTM

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit "simplify comment elision code" one question

// Update client-affinity lists.
if svcInfo.SessionAffinityType() == v1.ServiceAffinityClientIP {
args = append(args, "-m", "recent", "--name", string(endpointChain), "--set")
}
// DNAT to final destination.
args = append(args, "-m", protocol, "-p", protocol, "-j", "DNAT", "--to-destination", epInfo.Endpoint)
proxier.natRules.Write(args)
proxier.natRules.Write(
"-A", string(endpointChain),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried in some other places not to repeat the "-A <chain" when not needed, for fear of making a mistake. It's not SUCH a big deal I guess. This could also be:

args = append(args[:0], "-A", string(endpointChain))		
args = append(proxier.commentIfBelowThreshold(svcNameString)...)

I'm fine doing it your way if you think the win is worthwhile. the point of the comment was to make iptables-save | grep useful. In an earlier comment I suggested maybe comment-only rules. That could help maybe?

-A KUBE-NOOP -m comment --comment "this chain does nothing" -j RETURN

-A KUBE-SEP-5QBLSAUHS7VAMKHU -m comment --comment "kube-system/metrics-server" -j KUBE-NOOP
-A KUBE-SEP-5QBLSAUHS7VAMKHU -s 10.64.3.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-5QBLSAUHS7VAMKHU -p tcp -m comment -m tcp -j DNAT ...
-A KUBE-SEP-5QBLSAUHS7VAMKHU -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-NOOP
-A KUBE-SEP-6IIMKUPVCRULMU5Y -s 10.64.1.9/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-6IIMKUPVCRULMU5Y -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-6QKF7KDE3IMNRERG -m comment --comment "default/hostnames" -j KUBE-NOOP
-A KUBE-SEP-6QKF7KDE3IMNRERG -s 10.64.2.7/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-6QKF7KDE3IMNRERG -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-7X3ZD6IAFJB7UYBN -m comment --comment "kube-system/kube-dns:dns" -j KUBE-NOOP
-A KUBE-SEP-7X3ZD6IAFJB7UYBN -s 10.64.2.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-7X3ZD6IAFJB7UYBN -p udp -m udp -j DNAT ...
-A KUBE-SEP-CAIWW7T7PX3SGADF -m comment --comment "default/kubernetes:https" -j KUBE-NOOP
-A KUBE-SEP-CAIWW7T7PX3SGADF -s 104.155.184.204/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-CAIWW7T7PX3SGADF -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-CPTDVYQW6QHZEJTA -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-NOOP
-A KUBE-SEP-CPTDVYQW6QHZEJTA -s 10.64.2.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-CPTDVYQW6QHZEJTA -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-DERWY4MTP2RTOITK -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-NOOP
-A KUBE-SEP-DERWY4MTP2RTOITK -s 10.64.1.9/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-DERWY4MTP2RTOITK -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-HIAAPJKAI3E74A5J -m comment --comment "kube-system/kube-dns:dns" -j KUBE-NOOP
-A KUBE-SEP-HIAAPJKAI3E74A5J -s 10.64.1.9/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-HIAAPJKAI3E74A5J -p udp -m udp -j DNAT ...
-A KUBE-SEP-J2STD6VR5QCDF6IS -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-NOOP
-A KUBE-SEP-J2STD6VR5QCDF6IS -s 10.64.2.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-J2STD6VR5QCDF6IS -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-SIPEKJQGVNBU5CE6 -m comment --comment "default/hostnames" -j KUBE-NOOP
-A KUBE-SEP-SIPEKJQGVNBU5CE6 -s 10.64.3.9/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-SIPEKJQGVNBU5CE6 -p tcp -m tcp -j DNAT ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an earlier comment I suggested maybe comment-only rules. That could help maybe?

That would mean more total rules, which is definitely a bad thing... sometimes... ("More rules" means both slightly slower save/restore and probably also slower "runtime"; I'm pretty sure the -j KUBE-NOOP won't get optimized out and the kernel will actually waste time figuring out it has to do nothing each time it hits one.) But then, having only one copy of each comment instead of multiple would probably make save/restore slightly faster... we'd have to do perf testing to figure out if it ended up being a net win or loss.

I had also been thinking about changing it to only comment on one rule in each chain. eg:

-A KUBE-SEP-5QBLSAUHS7VAMKHU -s 10.64.3.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-5QBLSAUHS7VAMKHU -m comment --comment "kube-system/metrics-server" -p tcp -m comment -m tcp -j DNAT ...
-A KUBE-SEP-5QBLSAUHS7VAMKHU -s 10.64.1.9/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-6IIMKUPVCRULMU5Y -m comment --comment "kube-system/kube-dns:metrics" -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-6QKF7KDE3IMNRERG -s 10.64.2.7/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-6QKF7KDE3IMNRERG -m comment --comment "default/hostnames" -p tcp -m tcp -j DNAT ...

Both of these approaches are much less iptables-save | grep-friendly than the current system...

@thockin
Copy link
Member

thockin commented Nov 9, 2021

What if we included errors IN the iptables rules?

I'm not sure anyone would ever see them...

We frequently send people to iptables-save for debug. Seeing an error in there might help?

Maybe we also need better metrics about errors?

or Events?

Imagine 5000 nodes all sending an event for the same error

@thockin
Copy link
Member

thockin commented Nov 9, 2021

I am all the way thru this, will watch for push and comments

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM "fix a bug in node address error handling"

Also, in the NodePort code, fix it to properly take advantage of the
fact that GetNodeAddresses() guarantees that if it returns a
"match-all" CIDR, then it doesn't return anything else. That also
makes it unnecessary to loop over the node addresses twice.
…nt-chain rules

The same code appeared twice, once for the SVC chain and once for the
XLB chain, with the only difference being that the XLB version had
more verbose comments.
@danwinship
Copy link
Contributor Author

I removed the comment elision change since we both had some concerns... we can think about it more later. (Left the unit test in anyway though.)

So the only change was adding a comment about nodeAddresses

@thockin
Copy link
Member

thockin commented Nov 10, 2021

Thanks!

/lgtm
/approve

Bring back comment elision commit as a new PR :)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 10, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aojea
Copy link
Member

aojea commented Nov 10, 2021

The job failure is legit,.some gofmt issue

@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

1 similar comment
@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 10, 2021
args = append(args[:0],
"-A", string(svcXlbChain),
"-m", "comment", "--comment",
fmt.Sprintf(`"Balancing rule %d for %s"`, i, svcNameString),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danwinship do we care that we're losing a bit of context in the rule comment here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no; that was only for the rules from the XLB chain (ie, external endpoints of externalTrafficPolicy: Local services). We didn't have a comment like that in the much more common case of rules from the SVC chain.

(Or maybe we used to have a comment like this in the SVC case, and it got copied to the XLB case, and then later got removed from the SVC case but accidentally not the XLB case as part of the reducing-comment-memory-usage change?)

@dcbw
Copy link
Member

dcbw commented Nov 10, 2021

Just one small nit/comment that I don't think really matters, so:
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 10, 2021
@k8s-ci-robot k8s-ci-robot merged commit d1f8463 into kubernetes:master Nov 10, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Nov 10, 2021
@danwinship danwinship deleted the iptables-proxy-cleanup branch November 19, 2021 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/ipvs cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/network Categorizes an issue or PR as relevant to SIG Network. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants