Kilo images bundle an outdated version of iptables causing reconciliation errors #388

Closed
TPXP opened this issue Jun 27, 2024 · 3 comments · Fixed by #389

Comments

@TPXP
Contributor

TPXP commented Jun 27, 2024

Hello,

We are running kilo on a cluster with two different node pools, one of which has the following system details:

  Kernel Version:             6.2.0-36-generic
  OS Image:                   Ubuntu 22.04.3 LTS 269b0b8ce7
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.13
  Kubelet Version:            v1.29.1
  Kube-Proxy Version:         v1.29.1

Our Grafana dashboard shows that these nodes consistently hit reconciliation errors.

Kilo logs clearly point to an error in an iptables call.

{"caller":"mesh.go:262","component":"kilo","error":"failed to reconcile rules: failed to check if rule exists: failed to populate chains for table \"filter\": running [/sbin/iptables -t filter -S --wait]: exit status 1: iptables v1.8.4 (nf_tables): table `filter' is incompatible, use 'nft' tool.\n\n","level":"error","ts":"2024-06-27T14:54:04.579160946Z"}
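For anyone triaging similar entries, here is a minimal Python sketch (not part of Kilo) that pulls the failing command, exit status, and iptables version out of a log line shaped like the one above. The function name `summarize_iptables_failure` is hypothetical, and the regular expressions assume only the error format visible in this report.

```python
import json
import re

# Log entry copied verbatim from the report above (raw string keeps the JSON escapes).
LOG_LINE = r'''{"caller":"mesh.go:262","component":"kilo","error":"failed to reconcile rules: failed to check if rule exists: failed to populate chains for table \"filter\": running [/sbin/iptables -t filter -S --wait]: exit status 1: iptables v1.8.4 (nf_tables): table `filter' is incompatible, use 'nft' tool.\n\n","level":"error","ts":"2024-06-27T14:54:04.579160946Z"}'''


def summarize_iptables_failure(line: str) -> dict:
    """Extract the iptables command, exit status, version, and backend.

    Hypothetical helper: fields that cannot be found are returned as None.
    """
    entry = json.loads(line)
    error = entry.get("error", "")

    command = re.search(r"running \[([^\]]+)\]", error)
    status = re.search(r"exit status (-?\d+)", error)
    version = re.search(r"iptables v(\d+(?:\.\d+)*) \(([^)]+)\)", error)

    return {
        "timestamp": entry.get("ts"),
        "command": command.group(1) if command else None,
        "exit_status": int(status.group(1)) if status else None,
        "version": version.group(1) if version else None,
        "backend": version.group(2) if version else None,
    }


if __name__ == "__main__":
    print(summarize_iptables_failure(LOG_LINE))
```

Grouping such summaries by node and by `exit_status` makes it easier to separate the "incompatible table" failures from the crashes that report no version string at all.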

We also see that our other nodes frequently have segmentation faults in the iptables binary, which correlate with times when kilo calls iptables:

  • dmesg
[26878597.899323] iptables[2537594]: segfault at 7fba32878dd0 ip 00007fba328a00ad sp 00007ffe13a7f4e8 error 4 in libnftnl.so.11.3.0[7fba3289c000+16000]
[26878597.899334] Code: 83 ec 18 0f b7 f6 b9 01 00 00 00 88 54 24 0c 48 8d 54 24 0c e8 b4 c5 ff ff 48 83 c4 18 c3 49 89 f8 48 83 c9 ff 48 89 d7 31 c0 <f2> ae 0f b7 f6 4c 89 c7 f7 d1 e9 94 c5 ff ff 0f b7 f6 31 c9 e9 8a
  • kilo
{"caller":"mesh.go:262","component":"kilo","error":"failed to reconcile rules: failed to check if rule exists: failed to populate chains for table \"nat\": running [/sbin/iptables -t nat -S --wait]: exit status -1: ","level":"error","ts":"2024-06-27T14:17:50.343571655Z"}
  • Node details
  Kernel Version:             5.15.0-79-generic
  OS Image:                   Ubuntu 22.04.4 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.33
  Kubelet Version:            v1.29.1
  Kube-Proxy Version:         v1.29.1
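
To show how the segfault line identifies the faulting library, here is a small Python sketch that parses the dmesg entry above and computes where the instruction pointer falls inside the reported mapping. The helper name `parse_segfault` is hypothetical, and the pattern assumes the x86 kernel message format shown in this report. The computed offset is relative to the mapping printed by the kernel, which is not necessarily the offset into the file on disk.

```python
import re

# dmesg entry copied from the report above.
DMESG_LINE = (
    "[26878597.899323] iptables[2537594]: segfault at 7fba32878dd0 "
    "ip 00007fba328a00ad sp 00007ffe13a7f4e8 error 4 "
    "in libnftnl.so.11.3.0[7fba3289c000+16000]"
)

SEGFAULT_RE = re.compile(
    r"(?P<proc>[^\s\[\]]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>\d+) "
    r"in (?P<object>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line: str):
    """Return details of a kernel segfault message, or None if it does not match."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    ip = int(match["ip"], 16)
    base = int(match["base"], 16)
    size = int(match["size"], 16)
    offset = ip - base  # position of the faulting instruction within the mapping
    return {
        "process": match["proc"],
        "pid": int(match["pid"]),
        "object": match["object"],
        "error_code": int(match["error"]),
        "offset": offset,
        "in_mapping": 0 <= offset < size,
    }


if __name__ == "__main__":
    details = parse_segfault(DMESG_LINE)
    print(details["object"], hex(details["offset"]))
```

For the line above, the faulting instruction sits at offset 0x40ad of the libnftnl.so.11.3.0 mapping, so the crash happens inside the library code rather than in the iptables binary itself.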

The libnftnl.so.11.3.0 file is not present on the host; it only exists in containers (find located it under /var/lib/containerd/ or /run/containerd). In fact, we found this file in the kilo container at /usr/lib/libnftnl.so.11.3.0.

Curiously, running /sbin/iptables -t nat -S --wait from a shell inside the kilo container (docker.io/squat/kilo:0.6.0) works without causing a segfault or an error 🤔

The Kilo container image ships with iptables 1.8.4, which is a little old. iptables has seen some recent updates addressing the "use 'nft' tool" error, and our other containers that touch networking (mostly kube-router) use iptables v1.8.9. Likewise, kube-router containers run ipset v7.17 while kilo provides ipset v7.6. Since everything seems to run smoothly with kube-router, I think upgrading the kilo image to ship these versions would help. Is there any reason to keep these old versions?
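
As a quick illustration of the version gap, here is a Python sketch that compares the tool versions quoted in this report. The version strings are the ones mentioned above; the helper names `parse_version` and `outdated_tools` are hypothetical. Comparing numeric tuples matters here because a plain string comparison would rank ipset 7.17 below 7.6.

```python
# Versions as quoted in this report.
BUNDLED = {
    "kilo": {"iptables": "v1.8.4", "ipset": "v7.6"},
    "kube-router": {"iptables": "v1.8.9", "ipset": "v7.17"},
}


def parse_version(text: str) -> tuple:
    """Turn 'v1.8.4' or '1.8.4' into (1, 8, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def outdated_tools(candidate: dict, reference: dict) -> list:
    """List tools whose version in `candidate` is older than in `reference`."""
    return sorted(
        tool
        for tool, version in candidate.items()
        if tool in reference
        and parse_version(version) < parse_version(reference[tool])
    )


if __name__ == "__main__":
    print(outdated_tools(BUNDLED["kilo"], BUNDLED["kube-router"]))
```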

I see the kilo image is based on Alpine. Maybe bumping it to Alpine v3.18 would help? (The latest versions seem to have a few annoying bugs; see cloudnativelabs/kube-router#1678.)

@TPXP TPXP changed the title Kilo containers bundle an outdated version of iptables causing reconciliation errors Kilo images bundle an outdated version of iptables causing reconciliation errors Jun 27, 2024
@squat
Owner

squat commented Jun 27, 2024

Thank you for the extremely detailed report. There is no reason to keep the old iptables packages; there simply haven't been any error reports about them. The package versions are old because we are using old Alpine base images. This should be easily addressed with a base image update.

@TPXP
Contributor Author

TPXP commented Jun 27, 2024

Thanks a lot for the very prompt reply. I agree that upgrading the base image should fix it 👍

@squat
Owner

squat commented Jun 27, 2024

@TPXP can you try the newest Kilo tag to see if that fixes your issue? The newest tag will be 0122dec8f16a61518dd02899501a8e8756387b76. Should be built soon!
