Minimizing iptables-restore input size #3454
Conversation
Force-pushed from 651039c to a9796f6 (compare)
This should be a lock for 1.26, let's just commit to metrics and so forth.
that would exist in larger clusters (ie, the rule sets that would
benefit the most from the partial restore feature).

One possible approach for dealing with that would be to run the …
I actually really like this idea. Could we generate both the dump and the patch, apply the patch, then read it back and ensure that it matches the full dump? If it fails, set a metric and fall back on full-dump for the future. It would be heavyweight but we only need to do it when the gate is on.
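For concreteness, here is a rough sketch of what such a check could look like at the shell level; the file names are hypothetical and the fallback handling is hand-waved, so this is an illustration of the idea rather than a proposed implementation:

```
# Hypothetical verification pass; file names are invented for illustration.
# full-dump.txt - the complete ruleset a full sync would have generated
# partial.txt   - the minimized input actually fed to iptables-restore
iptables-restore --noflush < partial.txt
iptables-save -t nat > live.txt
# A literal diff is naive: ordering, counters, and formatting differ, so a
# real check would have to normalize both sides before comparing them.
if ! diff -u full-dump.txt live.txt >/dev/null; then
    echo "partial restore diverged from the full dump; fall back to full syncs"
fi
```

The normalization step is where most of the effort would go, which is essentially the complexity concern raised later in this thread.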
@danwinship Can we pin down the plan and convert words from "possible" to "will"?
Having tried to figure out how I would implement this, I'm now pretty convinced this is a bad idea. The "checking" code would be much, much more complicated than the code that it is checking, meaning that if it ever triggered, it would most likely indicate a bug in the checking code, not an actual bug in the syncing code. (And recent events (kubernetes/kubernetes#112477) show that iptables upstream does not consider `iptables-save` output to be "stable-API-like", so there's yet another source of flakes if we're trying to parse that.)
Also, we have no reason to believe that this failure mode will actually occur. I was just trying to come up with all of the ways that the code could fail, and this is one of them. But it's not an especially plausible one. Much more likely is that if there was a bug, it would involve one of the existing code branches in `syncProxyRules` (eg, we don't sync correctly if only the firewall rules change), and it would be caught very quickly.
and if we find that it is happening (in e2e tests or real clusters),
we can then debug or revert the bad code.

#### Subtle Synchronization Delays
Any interest in changing the settings for bounded frequency runner? Or updating it to account for the fact that many syncs are expected to be much cheaper?
## Drawbacks

Assuming the code is not buggy, there are no drawbacks. The new code
would be strictly superior to the old code.
I don't think this should block the KEP at all, but if someone is touching these rules outside of kube-proxy we're more likely to clobber and correct them frequently without this change.
@danwinship have you talked with Phil Sutter about this?
Force-pushed from a9796f6 to 2df8697 (compare)
It does not do any diff. It fetches all of the existing chains, merges the provided data with the existing data, and then uploads all of the updated data.
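To make the `--noflush` behavior concrete, here is a minimal sketch of a partial restore input (the chain names, addresses, and ports are invented; this is not kube-proxy's actual output). Chains declared in the input are flushed and rewritten, while chains that are not mentioned are left alone, even though, as noted above, the restore still operates on the whole table internally:

```
# Sketch of a partial update under --noflush (illustrative names only):
# only the chains declared below are replaced; other KUBE-* chains and any
# non-kube chains in the nat table are left untouched.
iptables-restore --noflush <<'EOF'
*nat
:KUBE-SVC-EXAMPLE - [0:0]
:KUBE-SEP-EXAMPLE - [0:0]
-A KUBE-SVC-EXAMPLE -m comment --comment "example service" -j KUBE-SEP-EXAMPLE
-A KUBE-SEP-EXAMPLE -p tcp -j DNAT --to-destination 10.0.0.1:8080
COMMIT
EOF
```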
LGTM, great reading
@danwinship Can you close out the last comment threads so we can merge?
Force-pushed from 2df8697 to 0c621e0 (compare)
`iptables-restore`. (The `KUBE-SERVICES`, `KUBE-EXTERNAL-SERVICES`,
and `KUBE-NODEPORTS` chains are written in their entirety on every
sync.)
is there a situation where one chain update will put the system in an unpredictable state (even for a very short while)? e.g., chain `KUBE-SVC-*` written out to iptables but the rest of the chains are yet to be written?

if so (and I fully understand how much effort it entails, I just want to make sure that we have covered all options) wouldn't it be safer to rearrange the rules in a way that one service update can - only - update a service-specific chain per table? That way we don't rely on how clever the code is but rather on the structure of the rules themselves.
I don't think we can do that without multiple writes (or some nasty pre-allocation scheme). At the root of the tree is the `KUBE-SERVICES` chain, which has a list of conditions that mean "this packet is service X". For every service add/remove we need to add to that chain.
> e.g., chain `KUBE-SVC-*` written out to iptables but the rest of the chains are yet to be written?

Each `iptables-restore` is applied atomically. So the only way things would get out of sync would be if we specifically wrote out a set of out-of-sync rules.

> wouldn't it be safer to rearrange the rules in a way that one service update can - only - update a service-specific chain per table? that way we don't rely on how clever the code is but rather on the structure of the rules themselves

Using `iptables-restore` to load the rules essentially requires that there be a chain like `KUBE-SERVICES` that contains all the services. (And not using `iptables-restore` would absolutely destroy our performance.)
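For illustration, a hedged sketch of the structure being described (the cluster IPs, ports, endpoint address, and chain suffixes are made up; real `KUBE-SVC-*`/`KUBE-SEP-*` names are hashed): the top-level `KUBE-SERVICES` chain holds one dispatch rule per service, so it has to be rewritten whenever any service is added or removed, while each per-service chain only needs to appear in the restore input when that service actually changes.

```
# Illustrative iptables-save-style fragment of the nat table (invented names):
# KUBE-SERVICES dispatches each cluster IP to its per-service chain, which in
# turn jumps to a per-endpoint chain that does the DNAT.
*nat
:KUBE-SERVICES - [0:0]
:KUBE-SVC-AAAA - [0:0]
:KUBE-SEP-AAAA - [0:0]
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp --dport 80 -m comment --comment "ns/svc-a cluster IP" -j KUBE-SVC-AAAA
-A KUBE-SVC-AAAA -m comment --comment "ns/svc-a -> 10.244.1.5:8080" -j KUBE-SEP-AAAA
-A KUBE-SEP-AAAA -p tcp -j DNAT --to-destination 10.244.1.5:8080
COMMIT
```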
Thanks! /lgtm

Maybe - I just added it - item number 42: @thockin - FYI

Hello @wojtek-t @danwinship, The enhancement tracking board is for the enhancement tracking issues rather than the PRs. In this case, we'll want to get #3453 properly opted in by adding the …

Thanks! /lgtm

Needs approval from @wojtek-t I guess

@danwinship - can you please take a look at the two remaining comments by me here?
Force-pushed from 05eee79 to 80206a2 (compare)
Still
Force-pushed from 80206a2 to 228aac0 (compare)
Repushed with an UNRESOLVED section about the "partial sync is different from full sync" check, and an associated update to the Beta graduation criteria.
/lgtm
/approve PRR
I'm approving it because I want to make it ready for feature freeze and I don't want to block alpha on our discussions.
But I would like to continue this discussion somewhere.
@danwinship
I'm holding it for now, but feel free to redirect this discussion and me elsewhere and unhold this PR.
Additionally, kube-proxy will always do a full resync when there are
topology-related changes to Node labels, and it will always do a full
resync at least once every `iptablesSyncPeriod`.
What if we said that we implement such comparison code but keep it disabled by default (to avoid confusing users), and enable it in our tests?
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, thockin, wojtek-t
/sig network
/assign @thockin @aojea