-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add iptables NOTRACK rules for excluding GENEVE traffic from nf_conntrack #67
Add iptables NOTRACK rules for excluding GENEVE traffic from nf_conntrack #67
Conversation
Previous work: https://review.opendev.org/c/x/charm-ovn-chassis/+/854607 |
4ecdf4d
to
1e8a673
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this, Mustafa!
I like the simplicity of this approach.
There is the remaining issue of keeping the charm out of the loop for bringing up the system to the desired state at boot though. However thinking about it, if we were to fiddle with things such as the iptables-persistent
package to make the system do this itself, we may also interfere with whatever method the end user of the system has to manage their firewall rules, if any exist. So with that in mind we may want to accept your approach to this. There should be a comment about this though so we know it has been discussed.
What do you think?
Other than that I added a couple of nits in in-line comments, and it would also be great if you added a unit test so that we avoid breaking your change in the future.
lib/charms/ovn_charm.py
Outdated
|
||
[1]:(https://datatracker.ietf.org/doc/html/rfc8926#section-3.3) | ||
""" | ||
ch_core.hookenv.log("Configuring iptables rules") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be explicit about which log level is used for all calls to log
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted, done.
try: | ||
self.run(*args) | ||
ch_core.hookenv.log( | ||
"Rule `{}` append to `{}` chain skipped (already exists)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, log level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
I agree with you. I believe it would be inconvenient for users to be obliged to use some specific way, and that may interfere with their (existing or future) intentions and workflows. Using Also, if we were removing this for whatever reason in the future, this approach is simpler to remove too. No strings attached. |
1e8a673
to
bda0bfe
Compare
On some busy cloud deployments, it has been reported that the nodes hosting the ovn-chassis are getting their nf-conntrack tables full and starting to drop connections. The reason is that GENEVE uses source port randomization, and the number of "unique" flows is high from the nf-conntrack's perspective. This is no surprise since flows are usually identified with their five-tuple (srcip/[srcport]/dstip/dstport/tproto) by network elements, and GENEVE is leveraging this fact to have an even distribution in load-balancing systems in between [1]. The randomization causes the nf_conntrack table to be filled with many GENEVE flows, eventually leading to connection drops in a busy environment. As there is no particular reason and benefit to track these flows at the moment, the solution is to exclude GENEVE traffic from nf-conntrack tracking. This can be done by putting rules with `-j NOTRACK` jump into relevant iptables chains, which many people already use as a solution to this problem. This change incorporates the relevant rules to the charm code, so the rules become present by default. Closes-bug: #1978806 [1]: https://datatracker.ietf.org/doc/html/rfc8926#section-3.3
bda0bfe
to
09a5d3c
Compare
Added unit tests. |
This is a legitimate concern, however, in general the charms are currently designed to assume they are (more or less) the only service setting up the system with the possible execption of other charms in our ecosystem. Most services are the only major charm on the system but nova-compute/ceph-osd/ovn-chassis/neutron-openvswitch are one of the few exceptions as we tend to colocate those together onto the base system. They also often have LXD on the base system. In general I wouldn't worry too much about anything the user is doing however other daemons that might be on the system may matter - LXD is one (which sometimes installs some DHCP checksum mangle table rules but I think may not when no default dhcp network exists.. have to check that.. at worst we may duplicate those rules) and the other tends to be third-party SDN charms but I am not sure they will apply when OVN is running. Having said that there is nothing particularly special about netfilter-persistent it is fairly simple wrapper around iptables-save/iptables-restore ultimately (see https://salsa.debian.org/debian/iptables-persistent). If you used netfilter-persistent you can tell it not to automatically save new rules (so remembers only yours) and also not to purge other rules when it starts (/etc/default/netfilter-persistent). However a custom systemd service would not be all that different ultimately written to explicitly only insert/replace the relevant matching rules only. Though we tend to try not to re-invent the wheel in that area. There is also some charm history for using 'ufw' (e.g. the ovn-central charm and swift charms) I tried to have a look to see if we could could do some other twiddle to stop conntrack from functioning in the root namespace and I couldn't find anything.. there was a patch that stops conntrack from automatically processing packets inside network namespaces but I can't see an equivalent for conntrack other than preventing the conntrack modules from loading (and sosreport for example goes to extensive lengths to prevent accidentally loading the iptables or conntrack modules because of these side effects). It does seem like an ultimately better solution may be some kind of upstream patch to allow conntrack not to be used in the root namespace/iptables when its used by something else (e.g. another namespace, or, in this case OVS except where OVS is explicitly learning entries into it).. or.. to have OVN use a separate set of netfilter conntrack buckets so that the root bucket being full wouldn't actually effect OVN - or even run all of nova/ovn inside it's own network namespace. But a patch here would still be needed for existing systems likely. Also I am not up on the state of the art of people wanting to use the nftables interface versus the iptables compatability interface, but there is an nft equivalent rule: Docs: |
Thank you for providing your insights @lathiat.
Even if the persistent package was configured to not automatically save rules, the iptables-save command would save the entire state at the point in time the charm calls it, which would include any rules not managed by the charm.
Totally agree there, if something needs a systemd service it should probably be in a deb or snap.
[ snip ]
nftables does indeed appear to be the future, there are however warnings about using the iptables compatibilty interface and nft simultaneously, as that would load both code paths into the kernel. I also did not find any .d directory type infrastructure for nft in Ubuntu, and managing /etc/nftables.conf directly by the charm does not appear scalable. In conclusion, since this change is to work around a standing issue for the side effects of interactions between the nf_conntrack and geneve tunneling services in the kernel as used by OVN, and there is no clear and simple path to persist the workaround in the system, I'm leaning towards accepting the proposed solution as-is. |
On some busy cloud deployments, it has been reported that the nodes
hosting the ovn-chassis are getting their nf-conntrack tables full
and starting to drop connections. The reason is that GENEVE uses
source port randomization, and the number of "unique" flows is high
from the nf-conntrack's perspective. This is no surprise since flows
are usually identified with their five-tuple (srcip/[srcport]/dstip/dstport/tproto)
by network elements, and GENEVE is leveraging this fact to have an
even distribution in load-balancing systems in between (*). The
randomization causes the nf_conntrack table to be flooded with many
GENEVE flows, eventually leading to connection drops in a busy
environment. As there is no particular reason and benefit to track
these flows at the moment, the solution is to exclude GENEVE traffic
from nf-conntrack tracking. This can be done by putting rules with
-j NOTRACK
jump into relevant iptables chains, which many peoplealready use as a solution to this problem.
This change incorporates the relevant rules to the charm code, so the
rules become present by default.
Closes-bug: #1978806