-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
more iptables proxy cleanups #106269
more iptables proxy cleanups #106269
Conversation
The iptables and ipvs proxiers both had a check that none of the elements of svcInfo.LoadBalancerIPStrings() were "", but that was already guaranteed by the svcInfo code. Drop the unnecessary checks and remove a level of indentation.
If GetNodeAddresses() fails (eg, because you passed the wrong CIDR to `--nodeport-addresses`), then any NodePort services would end up with only half a set of iptables rules. Fix it to just not output the NodePort-specific parts in that case (in addition to logging an error about the GetNodeAddresses() failure).
If you pass just an IP address to "-s" or "-d", the iptables command will fill in the correct mask automatically. Originally, the proxier was just hardcoding "/32" for all of these, which was unnecessary but simple. But when IPv6 support was added, the code was made more complicated to deal with the fact that the "/32" needed to be "/128" in the IPv6 case, so it would parse the IPs to figure out which family they were, which in turn involved adding some checks in case the parsing fails (even though that "can't happen" and the old code didn't check for invalid IPs, even though that would break the iptables-restore if there had been any). Anyway, all of that is unnecessary because we can just pass the IP strings to iptables directly rather than parsing and unparsing them first. (The diff to proxier_test.go is just deleting "/32" everywhere.)
@danwinship: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM commit "Remove an unnecessary check"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit "fix a bug in node address error handling"
Something I have had on my mind for a while. What if we included errors IN the iptables rules? Use the comment and a no-op rule, e.g.
-A KUBE-NODEPORTS -m comment --comment "ns-name/svc-name ERROR: failed to install NodePort rules because no node IPs" -j KUBE-NOOP
-A KUBE-NOOP return
I'm not sure it's a GREAT idea, but it's somewhat more visible...
Maybe we also need better metrics about errors?
@@ -1668,3 +1610,38 @@ func (proxier *Proxier) openPort(lp netutils.LocalPort, replacementPortsMap map[ | |||
klog.V(2).InfoS("Opened local port", "port", lp) | |||
replacementPortsMap[lp] = socket | |||
} | |||
|
|||
func (proxier *Proxier) writeServiceToEndpointRules(svcNameString string, svcInfo proxy.ServicePort, svcChain utiliptables.Chain, endpointChains []utiliptables.Chain, args []string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any possibility these rules will diverge in the future with the new features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm... not with any of the pending new features, but I guess it's possible they could diverge with future new features
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM commit "Remove unnecessary /32 and /128 in iptables rules"
ISTR this was added to make the iptables-save that we emit "more canonical" and therefore easier to diff against running state. But then I did some spelunking and this goes all the way back to 2015 and the very first commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit "Abstract out shared OpenLocalPort code" - just one nit
I'm not sure anyone would ever see them...
or Events? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit "Abstract out code for writing service-chain-to-endpoint" LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit "add a unit test for the comment elision code" LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit "simplify comment elision code" one question
pkg/proxy/iptables/proxier.go
Outdated
// Update client-affinity lists. | ||
if svcInfo.SessionAffinityType() == v1.ServiceAffinityClientIP { | ||
args = append(args, "-m", "recent", "--name", string(endpointChain), "--set") | ||
} | ||
// DNAT to final destination. | ||
args = append(args, "-m", protocol, "-p", protocol, "-j", "DNAT", "--to-destination", epInfo.Endpoint) | ||
proxier.natRules.Write(args) | ||
proxier.natRules.Write( | ||
"-A", string(endpointChain), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried in some other places not to repeat the "-A <chain" when not needed, for fear of making a mistake. It's not SUCH a big deal I guess. This could also be:
args = append(args[:0], "-A", string(endpointChain))
args = append(proxier.commentIfBelowThreshold(svcNameString)...)
I'm fine doing it your way if you think the win is worthwhile. the point of the comment was to make iptables-save | grep
useful. In an earlier comment I suggested maybe comment-only rules. That could help maybe?
-A KUBE-NOOP -m comment --comment "this chain does nothing" -j RETURN
-A KUBE-SEP-5QBLSAUHS7VAMKHU -m comment --comment "kube-system/metrics-server" -j KUBE-NOOP
-A KUBE-SEP-5QBLSAUHS7VAMKHU -s 10.64.3.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-5QBLSAUHS7VAMKHU -p tcp -m comment -m tcp -j DNAT ...
-A KUBE-SEP-5QBLSAUHS7VAMKHU -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-NOOP
-A KUBE-SEP-6IIMKUPVCRULMU5Y -s 10.64.1.9/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-6IIMKUPVCRULMU5Y -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-6QKF7KDE3IMNRERG -m comment --comment "default/hostnames" -j KUBE-NOOP
-A KUBE-SEP-6QKF7KDE3IMNRERG -s 10.64.2.7/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-6QKF7KDE3IMNRERG -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-7X3ZD6IAFJB7UYBN -m comment --comment "kube-system/kube-dns:dns" -j KUBE-NOOP
-A KUBE-SEP-7X3ZD6IAFJB7UYBN -s 10.64.2.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-7X3ZD6IAFJB7UYBN -p udp -m udp -j DNAT ...
-A KUBE-SEP-CAIWW7T7PX3SGADF -m comment --comment "default/kubernetes:https" -j KUBE-NOOP
-A KUBE-SEP-CAIWW7T7PX3SGADF -s 104.155.184.204/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-CAIWW7T7PX3SGADF -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-CPTDVYQW6QHZEJTA -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-NOOP
-A KUBE-SEP-CPTDVYQW6QHZEJTA -s 10.64.2.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-CPTDVYQW6QHZEJTA -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-DERWY4MTP2RTOITK -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-NOOP
-A KUBE-SEP-DERWY4MTP2RTOITK -s 10.64.1.9/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-DERWY4MTP2RTOITK -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-HIAAPJKAI3E74A5J -m comment --comment "kube-system/kube-dns:dns" -j KUBE-NOOP
-A KUBE-SEP-HIAAPJKAI3E74A5J -s 10.64.1.9/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-HIAAPJKAI3E74A5J -p udp -m udp -j DNAT ...
-A KUBE-SEP-J2STD6VR5QCDF6IS -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-NOOP
-A KUBE-SEP-J2STD6VR5QCDF6IS -s 10.64.2.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-J2STD6VR5QCDF6IS -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-SIPEKJQGVNBU5CE6 -m comment --comment "default/hostnames" -j KUBE-NOOP
-A KUBE-SEP-SIPEKJQGVNBU5CE6 -s 10.64.3.9/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-SIPEKJQGVNBU5CE6 -p tcp -m tcp -j DNAT ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an earlier comment I suggested maybe comment-only rules. That could help maybe?
That would mean more total rules, which is definitely a bad thing... sometimes... ("More rules" means both slightly slower save/restore and probably also slower "runtime"; I'm pretty sure the -j KUBE-NOOP
won't get optimized out and the kernel will actually waste time figuring out it has to do nothing each time it hits one.) But then, having only one copy of each comment instead of multiple would probably make save/restore slightly faster... we'd have to do perf testing to figure out if it ended up being a net win or loss.
I had also been thinking about changing it to only comment on one rule in each chain. eg:
-A KUBE-SEP-5QBLSAUHS7VAMKHU -s 10.64.3.3/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-5QBLSAUHS7VAMKHU -m comment --comment "kube-system/metrics-server" -p tcp -m comment -m tcp -j DNAT ...
-A KUBE-SEP-5QBLSAUHS7VAMKHU -s 10.64.1.9/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-6IIMKUPVCRULMU5Y -m comment --comment "kube-system/kube-dns:metrics" -p tcp -m tcp -j DNAT ...
-A KUBE-SEP-6QKF7KDE3IMNRERG -s 10.64.2.7/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-6QKF7KDE3IMNRERG -m comment --comment "default/hostnames" -p tcp -m tcp -j DNAT ...
Both of these approaches are much less iptables-save | grep
-friendly than the current system...
We frequently send people to
Imagine 5000 nodes all sending an event for the same error |
I am all the way thru this, will watch for push and comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM "fix a bug in node address error handling"
Also, in the NodePort code, fix it to properly take advantage of the fact that GetNodeAddresses() guarantees that if it returns a "match-all" CIDR, then it doesn't return anything else. That also makes it unnecessary to loop over the node addresses twice.
…nt-chain rules The same code appeared twice, once for the SVC chain and once for the XLB chain, with the only difference being that the XLB version had more verbose comments.
494b5e0
to
f0287b4
Compare
I removed the comment elision change since we both had some concerns... we can think about it more later. (Left the unit test in anyway though.) So the only change was adding a comment about nodeAddresses |
Thanks! /lgtm Bring back comment elision commit as a new PR :) |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danwinship, thockin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The job failure is legit,.some gofmt issue |
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
1 similar comment
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
f0287b4
to
a4e6d2f
Compare
args = append(args[:0], | ||
"-A", string(svcXlbChain), | ||
"-m", "comment", "--comment", | ||
fmt.Sprintf(`"Balancing rule %d for %s"`, i, svcNameString), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danwinship do we care that we're losing a bit of context in the rule comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no; that was only for the rules from the XLB chain (ie, external endpoints of externalTrafficPolicy: Local
services). We didn't have a comment like that in the much more common case of rules from the SVC chain.
(Or maybe we used to have a comment like this in the SVC case, and it got copied to the XLB case, and then later got removed from the SVC case but accidentally not the XLB case as part of the reducing-comment-memory-usage change?)
Just one small nit/comment that I don't think really matters, so: |
What type of PR is this?
/kind cleanup
/sig network
/priority important-soon
What this PR does / why we need it:
Further iptables proxy cleanups; we may want to merge only a subset of these commits.
In particular, "
proxy/iptables: Abstract out code for writing service-chain-to-endpoint-chain rules
" gets rid of duplicated code, but it moves some important logic out ofsyncProxyRules
into a helper, which may be considered a net comprehensibility loss. (But FWIW this is the change that made me notice #106030.) (I guess another possibility would be to makewriteServiceToEndpointRules
be an inline function insidesyncProxyRules
, so the actual code would be there right before the first call to it.)(OTOH, "
proxy/iptables: Abstract out shared OpenLocalPort code
" seems definitely like a win, since it's doing something separate from the iptables rule generation.)"
proxy/iptables: simplify comment elision code
" also may or may not be considered a good idea, and I don't like how we end up with an extra space in the output there, but it would be a little ugly to fix, especially if you want the code to even handle cases like"foo", []string{""}, "bar"
→"foo bar"
Does this PR introduce a user-facing change?
/assign @thockin @aojea