iptables-nft-save v1.8.7 (nf_tables): Could not fetch rule set generation id: Invalid argument #9053

Closed
dwilliams782 opened this issue Aug 1, 2022 · 4 comments · Fixed by #9097
Labels
bug priority/P0 Release Blocker

@dwilliams782

What is the issue?

I've just upgraded to the latest edge (edge-22.7.3) and have the following iptables error: Could not fetch rule set generation id.

I worked around this by switching proxy-init to legacy mode:

# proxy-init configuration
proxyInit:
  # -- Variant of iptables that will be used to configure routing. Currently,
  # proxy-init can be run either in 'nft' or in 'legacy' mode. The mode will
  # control which utility binary will be called. The host must support
  # whichever mode will be used
  iptablesMode: "legacy"

How can it be reproduced?

Upgrade to the latest version of edge on a GKE cluster with COS nodes.

Logs, error output, etc

 lastState:
      terminated:
        containerID: docker://0be32ff22b90b0f1efd6900c568bbad320462709b6636ba9cda8e4f8ee201a96
        exitCode: 1
        finishedAt: "2022-08-01T09:36:47Z"
        message: |2+
           msg="iptables-nft-save v1.8.7 (nf_tables): Could not fetch rule set generation id: Invalid argument\n\n"
          Error: exit status 4
          time="2022-08-01T09:36:47Z" level=error msg="aborting firewall configuration"
          Usage:
            proxy-init [flags]

output of linkerd check -o short

Status check results are √

Environment

  • Kubernetes 1.22.8-gke.202
  • GKE nodes running Container-optimised OS with Docker (cos)
  • Linkerd version edge-22.7.3

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

No response

@olix0r olix0r added this to the stable-2.12.0 milestone Aug 1, 2022
@olix0r olix0r added the priority/P0 Release Blocker label Aug 1, 2022
@kleimkuhler
Contributor

Could possibly be fixed by #8859.

@mateiidavid
Member

I was thinking at first #8859 might fix it too; I think this might be an issue with Google's Container Optimized OS (COS) not supporting nft and its relevant kernel modules, though.

@dwilliams782 was kind enough to ssh onto his GCOS host and run through a few commands to check the state of the modules. First, here's what I expected to see (all output comes from my Debian host):

# Run: lsmod | grep 'nft'
nft_chain_route_ipv4    16384  4
nft_chain_route_ipv6    16384  4
nft_chain_nat_ipv6     16384  16
nf_nat_ipv6            16384  2 ip6t_MASQUERADE,nft_chain_nat_ipv6
nft_limit              16384  45
nft_counter            16384  1496
nft_compat             20480  2336
nft_chain_nat_ipv4     16384  84
nf_nat_ipv4            16384  3 ipt_MASQUERADE,nft_chain_nat_ipv4,iptable_nat
nf_tables             143360  3732 nft_chain_route_ipv4,nft_compat,nft_chain_nat_ipv6,nft_chain_nat_ipv4,nft_counter,nft_limit,nft_chain_route_ipv6
nfnetlink              16384  6 nft_compat,nf_conntrack_netlink,nf_tables,ip_set,nfnetlink_log
x_tables               45056  23 xt_conntrack,xt_statistic,iptable_filter,nft_compat,xt_multiport,xt_NFLOG,ip6t_MASQUERADE,xt_tcpudp,ipt_MASQUERADE,xt_addrtype,xt_physdev,xt_nat,xt_comment,xt_owner,xt_set,ipt_REJECT,ipt_rpfilter,iptable_raw,ip_tables,xt_limit,iptable_mangle,xt_REDIRECT,xt_mark

These are some of the modules that I expected to see. Most of the nft-prefixed modules are used by nf_tables, which can be confirmed by running lsmod | grep nf_tables:

nf_tables             143360  3732 nft_chain_route_ipv4,nft_compat,nft_chain_nat_ipv6,nft_chain_nat_ipv4,nft_counter,nft_limit,nft_chain_route_ipv6

Using systemctl, we can also check if the nft service is active (or at least, installed):

:; sudo systemctl status nftables
● nftables.service - nftables
   Loaded: loaded (/lib/systemd/system/nftables.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:nft(8)
           http://wiki.nftables.org

# 
# Note: my Debian version does not use nftables by default
# 

In contrast, here's the output from GCOS:

$ lsmod | grep 'nf_tables'
$ systemctl status nftables
Unit nftables.service could not be found.
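
The lsmod check above can be wrapped in a small helper for repeated use across nodes. This is a minimal sketch, not something from the original investigation: `has_nf_tables` is a hypothetical name, and it reads lsmod-style text on stdin so it can also be run against saved output. Note that a kernel with nf_tables built in (rather than loaded as a module) would not show up in lsmod, so a negative result here is a hint and not proof.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Linkerd): reads `lsmod`-style text on stdin
# and reports whether nf_tables appears as a loaded module.
has_nf_tables() {
  # Compare only the module-name column (first field), so that nf_tables
  # showing up in another module's "Used by" list does not count.
  if awk '$1 == "nf_tables" { found = 1 } END { exit !found }'; then
    echo "nf_tables loaded"
  else
    echo "nf_tables not loaded"
  fi
}

# Usage on a live host:
#   lsmod | has_nf_tables
```

Matching on the first field is deliberate: a plain `grep nf_tables` would also match lines where nf_tables only appears as a dependent.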

Full lsmod output from GCOS:

$ lsmod
Module                  Size  Used by
xt_CT                  16384  16
xt_REDIRECT            16384  4
xt_multiport           16384  18
ipt_rpfilter           16384  1
iptable_raw            16384  1
ip_set_hash_ip         32768  1
ip_set_hash_net        40960  2
ip_set                 40960  2 ip_set_hash_ip,ip_set_hash_net
veth                   24576  0
wireguard              86016  0
ip6_udp_tunnel         16384  1 wireguard
udp_tunnel             20480  1 wireguard
libchacha20poly1305    20480  1 wireguard
poly1305_x86_64        28672  1 libchacha20poly1305
chacha_x86_64          28672  1 libchacha20poly1305
libchacha              16384  1 chacha_x86_64
curve25519_x86_64      94208  1 wireguard
libcurve25519_generic    53248  2 curve25519_x86_64,wireguard
libblake2s             16384  1 wireguard
blake2s_x86_64         20480  1 libblake2s
libblake2s_generic     20480  1 blake2s_x86_64
dummy                  16384  0
xt_recent              20480  24
xt_statistic           16384  335
ip_vs_sh               16384  0
ip_vs_wrr              16384  0
ip_vs_rr               16384  0
ip_vs                 147456  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
ip6table_nat           16384  1
xt_mark                16384  146
xt_nat                 16384  658
xt_MASQUERADE          16384  3
xt_addrtype            16384  28
iptable_nat            16384  3
nf_nat                 61440  5 ip6table_nat,xt_nat,iptable_nat,xt_MASQUERADE,xt_REDIRECT
br_netfilter           24576  0
xt_state               16384  0
aesni_intel           368640  0
glue_helper            20480  1 aesni_intel
crypto_simd            16384  1 aesni_intel
cryptd                 24576  1 crypto_simd
fuse                  139264  1
configfs               40960  1

Next, I wanted to see whether any of the modules could be found on disk. On Debian, I used find /lib/modules | grep tables:

:; find /lib/modules | grep tables
/lib/modules/4.19.0-20-amd64/kernel/net/ipv6/netfilter/ip6_tables.ko
/lib/modules/4.19.0-20-amd64/kernel/net/netfilter/x_tables.ko
/lib/modules/4.19.0-20-amd64/kernel/net/netfilter/nf_tables.ko
/lib/modules/4.19.0-20-amd64/kernel/net/netfilter/nf_tables_set.ko
/lib/modules/4.19.0-20-amd64/kernel/net/ipv4/netfilter/ip_tables.ko
/lib/modules/4.19.0-20-amd64/kernel/net/ipv4/netfilter/arp_tables.ko
/lib/modules/4.19.0-20-amd64/kernel/net/bridge/netfilter/ebtables.ko
/lib/modules/4.19.0-12-amd64/kernel/net/ipv6/netfilter/ip6_tables.ko
/lib/modules/4.19.0-12-amd64/kernel/net/netfilter/x_tables.ko
/lib/modules/4.19.0-12-amd64/kernel/net/netfilter/nf_tables.ko
/lib/modules/4.19.0-12-amd64/kernel/net/netfilter/nf_tables_set.ko
/lib/modules/4.19.0-12-amd64/kernel/net/ipv4/netfilter/ip_tables.ko
/lib/modules/4.19.0-12-amd64/kernel/net/ipv4/netfilter/arp_tables.ko
/lib/modules/4.19.0-12-amd64/kernel/net/bridge/netfilter/ebtables.ko
/lib/modules/4.19.0-21-amd64/kernel/net/ipv6/netfilter/ip6_tables.ko
/lib/modules/4.19.0-21-amd64/kernel/net/netfilter/x_tables.ko
/lib/modules/4.19.0-21-amd64/kernel/net/netfilter/nf_tables.ko
/lib/modules/4.19.0-21-amd64/kernel/net/netfilter/nf_tables_set.ko
/lib/modules/4.19.0-21-amd64/kernel/net/ipv4/netfilter/ip_tables.ko
/lib/modules/4.19.0-21-amd64/kernel/net/ipv4/netfilter/arp_tables.ko
/lib/modules/4.19.0-21-amd64/kernel/net/bridge/netfilter/ebtables.ko

On GCOS:

$ find /lib/modules | grep tables
/lib/modules/5.10.90+/kernel/net/bridge/netfilter/ebtables.ko
/lib/modules/5.10.90+/kernel/net/ipv4/netfilter/arp_tables.ko
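
The on-disk search can be narrowed to the one module that matters here. A rough sketch, assuming the modules live under /lib/modules as on both hosts above; `find_nf_tables_module` is a hypothetical name, and the optional argument exists only so the function can be pointed at a different directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: looks for an nf_tables module file under a modules
# directory (default: /lib/modules). Prints matching paths and returns
# non-zero if none are found.
find_nf_tables_module() {
  local root="${1:-/lib/modules}"
  local matches
  # Matches nf_tables.ko and compressed variants such as nf_tables.ko.xz,
  # but not related files like nf_tables_set.ko.
  matches=$(find "$root" -type f -name 'nf_tables.ko*' 2>/dev/null || true)
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
    return 0
  fi
  return 1
}

# Usage:
#   find_nf_tables_module || echo "no nf_tables module on disk"
```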

I found this odd and wondered whether iptables-nft-save was failing to load a target or something similar, perhaps because of the iptables version used or how the binary was built. I asked @dwilliams782 to try a manifest where our proxy-init ran as root and privileged; my thinking was that this would let it load any relevant targets that weren't originally included:

 State:      Terminated
      Reason:   Error
      Message:   msg="iptables-nft-save v1.8.7 (nf_tables): Could not fetch rule set generation id: Invalid argument\n\n"
time="2022-08-03T08:35:30Z" level=error msg="aborting firewall configuration"

That clearly didn't work. My feeling is that GCOS doesn't include the nft kernel modules. The error itself is pretty cryptic: we're not using any flags for iptables-nft-save other than -t nat (that is, operating on the nat table). As a reminder, all of the iptables-nft binaries are symlinked to the same xtables-nft-multi binary:

# Exec into Alpine container that we use for proxy-init
#
/sbin # ls -l iptables-nft iptables-nft-save iptables-nft-restore
lrwxrwxrwx    1 root     root            17 Aug  3 09:24 iptables-nft -> xtables-nft-multi
lrwxrwxrwx    1 root     root            17 Aug  3 09:24 iptables-nft-restore -> xtables-nft-multi
lrwxrwxrwx    1 root     root            17 Aug  3 09:24 iptables-nft-save -> xtables-nft-multi
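
The same symlink check can be scripted. This is only a sketch: `show_nft_links` is a hypothetical name, and the default directory of /sbin is taken from the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the symlink target of each iptables-nft variant
# in a directory (default: /sbin), one "name -> target" line per binary.
show_nft_links() {
  local dir="${1:-/sbin}"
  local name
  for name in iptables-nft iptables-nft-save iptables-nft-restore; do
    if [ -L "$dir/$name" ]; then
      printf '%s -> %s\n' "$name" "$(readlink "$dir/$name")"
    else
      printf '%s -> (not a symlink)\n' "$name"
    fi
  done
}

# Usage inside the proxy-init container:
#   show_nft_links /sbin
```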

I did some digging and found this GitLab issue: https://gitlab.com/kalilinux/tools/kali-ci-autopkgtest-lxc/-/issues/1. In the issue, they see a similar error (albeit when using iptables-nft directly, together with --dport). My assumption that GCOS is missing modules seems to be confirmed here: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5184#note_637329692. When switching from CoreOS to GCOS, people ran into the same issues (and went through roughly the same steps as @dwilliams782 and I did).

I think this is just a compatibility issue (I didn't expect to run into one so soon) and there's not much we can do in Linkerd other than document the error. We could perhaps keep a compatibility table that we add to as we encounter these. Based on the steps we went through and on the GitLab issues, I think it's safe to assume Google's Container Optimized OS doesn't include nft modules. Their documentation doesn't mention it, though.

TL;DR: seems Google Container Optimized OS doesn't support nft.

@kleimkuhler
Contributor

Awesome, thanks for the investigation into this!

@olix0r
Member

olix0r commented Aug 4, 2022

I think this is probably good motivation to revert the default to legacy, since GCOS is a fairly popular OS.

kleimkuhler pushed a commit that referenced this issue Aug 5, 2022
Some hosts may not have 'nft' modules available. Currently, proxy-init
defaults to using 'iptables-nft'; if the host does not have support for
nft modules, the init container will crash, blocking all injected
workloads from starting up.

This change defaults the 'iptablesMode' value to 'legacy'.

* Update linkerd-control-plane/values file default
* Update proxy-init partial to default to 'legacy' when no mode is
  specified
* Change expected values in 'pkg/charts/linkerd2/values_test.go' and in
  'cli/cmd/install_test'
* Update golden files

Fixes #9053

Signed-off-by: Matei David <matei@buoyant.io>
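
The defaulting behaviour described in the commit message can be illustrated with a short shell sketch. It is not the actual change, which lives in the chart values and the proxy-init template: `iptables_save_binary` is a hypothetical name, and `iptables-legacy-save` is assumed to be the binary used in legacy mode (the thread only names `iptables-nft-save`).

```shell
#!/usr/bin/env bash
# Illustrative only, not the proxy-init implementation: maps an iptables mode
# to a save binary, falling back to "legacy" when the mode is unset or empty.
iptables_save_binary() {
  local mode="${1:-legacy}"
  case "$mode" in
    legacy) echo "iptables-legacy-save" ;;  # assumed binary name for legacy mode
    nft)    echo "iptables-nft-save" ;;
    *)
      echo "unsupported iptablesMode: $mode" >&2
      return 1
      ;;
  esac
}

# Usage:
#   iptables_save_binary        # no mode given, falls back to legacy
#   iptables_save_binary nft
```
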
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 5, 2022