Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add iptables NOTRACK rules for excluding GENEVE traffic from nf_conntrack #67

Merged
merged 1 commit into from
Sep 9, 2022

Conversation

mustafakemalgilor
Copy link
Contributor

On some busy cloud deployments, it has been reported that the nodes
hosting the ovn-chassis are getting their nf-conntrack tables full
and starting to drop connections. The reason is that GENEVE uses
source port randomization, and the number of "unique" flows is high
from the nf-conntrack's perspective. This is no surprise since flows
are usually identified with their five-tuple (srcip/[srcport]/dstip/dstport/tproto)
by network elements, and GENEVE is leveraging this fact to have an
even distribution in load-balancing systems in between (*). The
randomization causes the nf_conntrack table to be flooded with many
GENEVE flows, eventually leading to connection drops in a busy
environment. As there is no particular reason and benefit to track
these flows at the moment, the solution is to exclude GENEVE traffic
from nf-conntrack tracking. This can be done by putting rules with
-j NOTRACK jump into relevant iptables chains, which many people
already use as a solution to this problem.

This change incorporates the relevant rules to the charm code, so the
rules become present by default.

Closes-bug: #1978806

@mustafakemalgilor
Copy link
Contributor Author

Copy link
Contributor

@fnordahl fnordahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this, Mustafa!

I like the simplicity of this approach.

There is the remaining issue of keeping the charm out of the loop for bringing up the system to the desired state at boot though. However thinking about it, if we were to fiddle with things such as the iptables-persistent package to make the system do this itself, we may also interfere with whatever method the end user of the system has to manage their firewall rules, if any exist. So with that in mind we may want to accept your approach to this. There should be a comment about this though so we know it has been discussed.

What do you think?

Other than that I added a couple of nits in in-line comments, and it would also be great if you added a unit test so that we avoid breaking your change in the future.


[1]:(https://datatracker.ietf.org/doc/html/rfc8926#section-3.3)
"""
ch_core.hookenv.log("Configuring iptables rules")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be explicit about which log level is used for all calls to log.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted, done.

try:
self.run(*args)
ch_core.hookenv.log(
"Rule `{}` append to `{}` chain skipped (already exists)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, log level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@mustafakemalgilor
Copy link
Contributor Author

mustafakemalgilor commented Sep 6, 2022

However thinking about it, if we were to fiddle with things such as the iptables-persistent package to make the system do this itself, we may also interfere with whatever method the end user of the system has to manage their firewall rules, if any exist. So with that in mind we may want to accept your approach to this. There should be a comment about this though so we know it has been discussed.

I agree with you. I believe it would be inconvenient for users to be obliged to use some specific way, and that may interfere with their (existing or future) intentions and workflows. Using iptables-persistent might complicate things further; assume that the user has added temporary iptables rules to the system, and the charm installed iptables-persistent and saved the rules. That would cause effects beyond the end user's intent. I believe using iptables-persistent or equivalent mechanisms would be a bit overkill for this specific task and might have unintended side effects.

Also, if we were removing this for whatever reason in the future, this approach is simpler to remove too. No strings attached.

On some busy cloud deployments, it has been reported that the nodes
hosting the ovn-chassis are getting their nf-conntrack tables full
and starting to drop connections. The reason is that GENEVE uses
source port randomization, and the number of "unique" flows is high
from the nf-conntrack's perspective. This is no surprise since flows
are usually identified with their five-tuple (srcip/[srcport]/dstip/dstport/tproto)
by network elements, and GENEVE is leveraging this fact to have an
even distribution in load-balancing systems in between [1]. The
randomization causes the nf_conntrack table to be filled with many
GENEVE flows, eventually leading to connection drops in a busy
environment. As there is no particular reason and benefit to track
these flows at the moment, the solution is to exclude GENEVE traffic
from nf-conntrack tracking. This can be done by putting rules with
`-j NOTRACK` jump into relevant iptables chains, which many people
already use as a solution to this problem.

This change incorporates the relevant rules to the charm code, so the
rules become present by default.

Closes-bug: #1978806

[1]: https://datatracker.ietf.org/doc/html/rfc8926#section-3.3
@mustafakemalgilor
Copy link
Contributor Author

Added unit tests.

@lathiat
Copy link

lathiat commented Sep 7, 2022

I agree with you. I believe it would be inconvenient for users to be obliged to use some specific way, and that may interfere with their (existing or future) intentions and workflows. Using iptables-persistent might complicate things further; assume that the user has added temporary iptables rules to the system, and the charm installed iptables-persistent and saved the rules. That would cause effects beyond the end user's intent. I believe using iptables-persistent or equivalent mechanisms would be a bit overkill for this specific task and might have unintended side effects.

Also, if we were removing this for whatever reason in the future, this approach is simpler to remove too. No strings attached.

This is a legitimate concern, however, in general the charms are currently designed to assume they are (more or less) the only service setting up the system with the possible execption of other charms in our ecosystem.

Most services are the only major charm on the system but nova-compute/ceph-osd/ovn-chassis/neutron-openvswitch are one of the few exceptions as we tend to colocate those together onto the base system. They also often have LXD on the base system.

In general I wouldn't worry too much about anything the user is doing however other daemons that might be on the system may matter - LXD is one (which sometimes installs some DHCP checksum mangle table rules but I think may not when no default dhcp network exists.. have to check that.. at worst we may duplicate those rules) and the other tends to be third-party SDN charms but I am not sure they will apply when OVN is running.

Having said that there is nothing particularly special about netfilter-persistent it is fairly simple wrapper around iptables-save/iptables-restore ultimately (see https://salsa.debian.org/debian/iptables-persistent). If you used netfilter-persistent you can tell it not to automatically save new rules (so remembers only yours) and also not to purge other rules when it starts (/etc/default/netfilter-persistent).

However a custom systemd service would not be all that different ultimately written to explicitly only insert/replace the relevant matching rules only. Though we tend to try not to re-invent the wheel in that area.

There is also some charm history for using 'ufw' (e.g. the ovn-central charm and swift charms)

I tried to have a look to see if we could could do some other twiddle to stop conntrack from functioning in the root namespace and I couldn't find anything.. there was a patch that stops conntrack from automatically processing packets inside network namespaces but I can't see an equivalent for conntrack other than preventing the conntrack modules from loading (and sosreport for example goes to extensive lengths to prevent accidentally loading the iptables or conntrack modules because of these side effects).

It does seem like an ultimately better solution may be some kind of upstream patch to allow conntrack not to be used in the root namespace/iptables when its used by something else (e.g. another namespace, or, in this case OVS except where OVS is explicitly learning entries into it).. or.. to have OVN use a separate set of netfilter conntrack buckets so that the root bucket being full wouldn't actually effect OVN - or even run all of nova/ovn inside it's own network namespace. But a patch here would still be needed for existing systems likely.

Also I am not up on the state of the art of people wanting to use the nftables interface versus the iptables compatability interface, but there is an nft equivalent rule:
nft add rule ip raw prerouting tcp dport { 80, 443 } notrack

Docs:
https://wiki.nftables.org/wiki-nftables/index.php/Setting_packet_connection_tracking_metainformation

@fnordahl
Copy link
Contributor

fnordahl commented Sep 7, 2022

Thank you for providing your insights @lathiat.

Having said that there is nothing particularly special about netfilter-persistent it is fairly simple wrapper around iptables-save/iptables-restore ultimately (see https://salsa.debian.org/debian/iptables-persistent). If you used netfilter-persistent you can tell it not to automatically save new rules (so remembers only yours) and also not to purge other rules when it starts (/etc/default/netfilter-persistent).

Even if the persistent package was configured to not automatically save rules, the iptables-save command would save the entire state at the point in time the charm calls it, which would include any rules not managed by the charm.

However a custom systemd service would not be all that different ultimately written to explicitly only insert/replace the relevant matching rules only. Though we tend to try not to re-invent the wheel in that area.

Totally agree there, if something needs a systemd service it should probably be in a deb or snap.

There is also some charm history for using 'ufw' (e.g. the ovn-central charm and swift charms)

ufw would indeed be a good candidate, it also already has charm library support. It does unfortunately not appear to have support for managing the notrack attribute. Another complication is that the ovn-chassis charm usually runs uncontained on bare metal, and unconditionally enabling ufw may cause undesired side effects.

[ snip ]

Also I am not up on the state of the art of people wanting to use the nftables interface versus the iptables compatability interface, but there is an nft equivalent rule: nft add rule ip raw prerouting tcp dport { 80, 443 } notrack

nftables does indeed appear to be the future, there are however warnings about using the iptables compatibilty interface and nft simultaneously, as that would load both code paths into the kernel. I also did not find any .d directory type infrastructure for nft in Ubuntu, and managing /etc/nftables.conf directly by the charm does not appear scalable.

In conclusion, since this change is to work around a standing issue for the side effects of interactions between the nf_conntrack and geneve tunneling services in the kernel as used by OVN, and there is no clear and simple path to persist the workaround in the system, I'm leaning towards accepting the proposed solution as-is.

@fnordahl fnordahl merged commit f3d1ee5 into openstack-charmers:master Sep 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants