
Lean iptables updates #324

Merged · 11 commits · merged Mar 28, 2023
Conversation

@dajudge (Contributor) commented Jul 26, 2022

Why

We were repeatedly experiencing connectivity issues when scaling larger clusters up and down, and we believe this is related to the way Kilo applies changes to iptables when nodes are added or removed: in many cases it resulted in many rules (or even entire chains) being dropped and recreated during scaling.

Idea

After taking a closer look at the iptables rules created by Kilo, we identified two different classes of rules:

  1. some rules must be applied strictly in order and always at the end of a chain (e.g. the FORWARD jump to KILO-IPIP and the DROP rule in the INPUT chain, or the MASQUERADE rule at the end of the KILO-NAT chain)
  2. other rules have no strict ordering requirement (apart from having to come before the rules that must sit at the end of a chain, where applicable)

How

So we prepared this patch, which puts the different rules in separate slices and

  1. first applies the order-sensitive/append-only rules using the existing mechanics,
  2. then inserts each rule of the second class at the beginning of its respective chain if it is not already present, regardless of order,
  3. finally removes any rules of the second class that are no longer required.

Chain creation rules are treated as order-sensitive/append-only, which matters little beyond their being applied in the first wave of changes.

Results

That way we were able to accomplish the following:

  1. the first class of rules is only applied once, because those rules don't change after the first reconciliation (it's only the chains and the static rules at the end of some chains)
  2. for updates to the second class of rules (which tend to be per IP/subnet), already existing rules are never dropped unless they actually need to be removed

We were able to confirm that this reduces the number of mutating iptables operations after the initial iptables configuration to the absolute minimum required.

@squat (Owner) commented Jul 26, 2022

In general I think this is an important priority for Kilo. Another way I've considered solving this in the past is by using ipsets and atomically swapping out the IP addresses in the sets so that Kilo does not have to change rules so often. Whenever new Nodes or Peers joined a cluster, this would instead require an "instantaneous" ipset operation rather than changing iptables rules. It would bring other performance benefits with it, as evaluating a firewall rule containing an ipset is faster than evaluating many independent iptables rules.

@clive-jevons (Contributor) commented:

> In general I think this is an important priority for Kilo. Another way I've considered solving this in the past is by using ipsets and atomically swapping out the IP addresses in the sets so that Kilo does not have to change rules so often. This means that whenever new Nodes or Peers join a cluster, this would instead require an "instantaneous" ipset operation rather than changing iptables rules. This would bring other performance benefits with it as the evaluation of a firewall rule containing ipset is faster than evaluating many independent iptables rules.

I do really like the sound of that approach. However, how would you feel about us starting with the implementation in this PR, so that our immediate problem can be solved (we verified that it does indeed solve our problems on one of our test clusters 😁 ), and then have a stab at doing a PR using ipsets at some point later? 😬

@dajudge marked this pull request as ready for review Jul 26, 2022 15:32
```diff
@@ -105,7 +127,14 @@ func NewIPv6Rule(table, chain string, spec ...string) Rule {
 	return &rule{table, chain, spec, ProtocolIPv6}
 }
 
-func (r *rule) Add(client Client) error {
+func (r *rule) Prepend(client Client) error {
 	if err := client.Insert(r.table, r.chain, 1, r.spec...); err != nil {
```
@dajudge (author) commented Jul 27, 2022:

@clive-jevons should we maybe implement & use an InsertUnique here?

Contributor:

I think the intention of the Client interface was only to shadow the existing API of the iptables client? As such, the InsertUnique semantics should probably be part of the API here? Could be wrong though - some guidance by @squat or @leonnicolas would be good here, I think 😬

@dajudge (author):

If you look a couple of lines down, the Append() function is implemented via AppendUnique() - hence the question.

I think this is relevant if Kilo is terminated abruptly without having a chance to clean up and then spins up again - without the check for Exists(), the Insert() would probably cause an error?

Contributor:

Very fair point 😉 I'd still be interested in the feedback / guidance from the maintainers on this 😬
But absolutely agree with the necessity of the functionality - where it ends up being implemented 👍

@dajudge (author) commented Jul 27, 2022:

You're right regarding the interface - it's implemented in an external dependency, which doesn't mirror AppendUnique() for Insert(). So my suggestion would be to implement this here and, at the same time, make a PR in the dependency to implement InsertUnique(), then switch to it once it's available. 🤔

@squat (Owner):

❤️

@dajudge (author):

Progress here: it's been merged to main. Once there is a new release we can make use of the new function.

@squat (Owner):

Great work!

@squat (Owner):

Hmm we can probably just pin to the latest commit on main rather than wait for a release, no?

@dajudge (author) commented Sep 15, 2022:

Have done that and replaced iptables.Insert() with iptables.InsertUnique().

@dajudge (Contributor) commented Jul 27, 2022

The idea of using ipsets sounds great! I think this PR might even be a nice step in that direction by introducing the distinction between the two classes of iptables rules:

  • the append rules seem to be the catch-all kind of rules that have to be in place statically - I don't think ipsets make much sense here?
  • the prepend rules seem to be the IP-/subnet-specific ones that would benefit from a move to ipsets

The PR at hand already introduces a carrier data structure, RuleSet, to carry more semantic information than a []Rule slice - which sounds like a great starting point for later adding more semantics for ipsets - while in this iteration still relying on the proven iptables approach.

So all in all I think this PR could even be considered as a nice evolutionary step towards an ipsets-based implementation.

WDYT?

@dajudge (Contributor) commented Jul 28, 2022

We just ran a small test against a cluster running this PR (plus the metrics PR for observability), and this is what it looks like in Grafana 😍

[screenshot: Grafana dashboard showing Kilo iptables operation rates during the test]

The two iptables-related charts display the following metrics:

  • `sum(rate(kilo_iptables_operations_total[5m])) by (operation)`
  • `sum(rate(kilo_iptables_operations_total{operation!~"(List|Exists|ListChains)"}[5m])) by (chain)`

We also didn't observe any disconnects during autoscaling as far as we can tell.

@squat (Owner) commented Oct 20, 2022

@dajudge what's the status of this PR? Is it ready for a final review so we can merge and maybe include in a Kilo 0.6.0? 😻

@dajudge (Contributor) commented Oct 20, 2022

> @dajudge what's the status of this PR? Is it ready for a final review so we can merge and maybe include in a Kilo 0.6.0? 😻

Yes. We've been running a patched version of Kilo with this PR included since July.

We're currently running it on over half a dozen Kubernetes clusters with dozens of nodes in total. We have not encountered any issues, and it has even been playing nicely with kube-router ever since (we had severe problems with that before).

From our perspective this PR is ready to go! 😍 🚀 👍

@clive-jevons (Contributor) commented:

@squat <- any chance of getting this merged and into a release? 😬 🙏 Anything we can do to help move this forward?

@squat (Owner) commented Mar 28, 2023

@clive-jevons thanks for your patience!
If you can help resolve the merge conflict on the PR then I would be very happy to merge this bad boy!

@dajudge (Contributor) commented Mar 28, 2023

Rebased / merge conflicts resolved.

Running it on a test cluster now.

Looks good at first glance. 👍

@squat (Owner) commented Mar 28, 2023

thanks @dajudge

@squat (Owner) left a review comment:

Thanks for being patient @dajudge @clive-jevons <3

@squat merged commit 12ad275 into squat:main on Mar 28, 2023
@dajudge deleted the lean-iptables-updates branch on Mar 28, 2023 16:43