Calico incompatible ipset protocol version (again) #8372

Closed
dghubble opened this issue Dec 22, 2023 · 21 comments · Fixed by #8387 or poseidon/terraform-render-bootstrap#378

Comments

@dghubble
Contributor

Expected Behavior

Calico's use of ipset should be as broadly compatible as possible.

Current Behavior

Calico v3.26.3 crashloops on Fedora CoreOS hosts now:

felix/ipsets.go 569: Bad return code from 'ipset list'. error=exit status 1 family="inet" stderr="ipset v7.11: Kernel and userspace incompatible: settype hash:ip with revision 6 not supported by userspace."

This is similar to an issue that happened a few years ago. #5011

Possible Solution

Before, Calico wasn't shipping a new enough ipset, but here the versions do seem to match.

sudo podman run --rm --privileged -it --net=host calico/node:v3.26.3 ipset --version
ipset v7.11, protocol version: 7

kube-proxy:

kubectl exec -it kube-proxy-rjf56 -n kube-system -- ipset --version
ipset v7.17, protocol version: 7

So I'm not sure why Calico calls to ipset see incompatible versions.
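
Both outputs above report protocol version 7, which is why comparing them does not explain the failure by itself. A minimal sketch for parsing that output, assuming the `ipset vX.Y, protocol version: N` format quoted above; the helper name `parse_ipset_version` is hypothetical:

```python
import re

# Matches the `ipset --version` output format quoted in this issue.
VERSION_RE = re.compile(r"ipset v(\d+)\.(\d+), protocol version: (\d+)")


def parse_ipset_version(output: str) -> tuple[tuple[int, int], int]:
    """Return ((major, minor), protocol_version) parsed from `ipset --version`."""
    match = VERSION_RE.search(output)
    if match is None:
        raise ValueError(f"unrecognised ipset version output: {output!r}")
    major, minor, protocol = (int(group) for group in match.groups())
    return (major, minor), protocol


calico_node = parse_ipset_version("ipset v7.11, protocol version: 7")
kube_proxy = parse_ipset_version("ipset v7.17, protocol version: 7")

# Same protocol version, different tool releases.
print("protocol match:", calico_node[1] == kube_proxy[1])
print("calico/node ships an older release:", calico_node[0] < kube_proxy[0])
```

The protocol number only describes the netlink protocol between the tool and the kernel, so an equal value here does not rule out differences in the set types each release understands.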

Steps to Reproduce (for bugs)

Your Environment

  • Calico version: v3.26.3
  • Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes v1.29.0
  • Operating System and version: Fedora CoreOS 39.20231119.3.0
  • Kernel: 6.5.11
  • kube-proxy IPVS mode
  • Link to your project (optional): https://github.com/poseidon/typhoon

Notably, on a Flatcar Linux node (5.15 kernel, much older) I don't see this issue.

@msilcher

msilcher commented Dec 29, 2023

Same here; it started to fail after upgrading Kubernetes to 1.29. I tested Calico 3.27 and 3.26.4 and both fail (they work fine on k8s 1.28). I'm using Debian 12 (kernel 6.5) and cri-o 1.29, by the way.
Logs:
2023-12-29 12:24:38.408 [ERROR][2567] felix/ipsets.go 569: Bad return code from 'ipset list'. error=exit status 1 family="inet" stderr="ipset v7.11: Kernel and userspace incompatible: settype hash:ip,port with revision 7 not supported by userspace.\n"
2023-12-29 12:24:38.408 [WARNING][2567] felix/ipsets.go 319: Failed to resync with dataplane error=exit status 1 family="inet"
2023-12-29 12:24:38.415 [INFO][2567] felix/ipsets.go 309: Retrying after an ipsets update failure... family="inet6"
2023-12-29 12:24:38.415 [ERROR][2567] felix/ipsets.go 569: Bad return code from 'ipset list'. error=exit status 1 family="inet6" stderr="ipset v7.11: Kernel and userspace incompatible: settype hash:ip,port with revision 7 not supported by userspace.\n"
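
The stderr text in these log lines names the set type and the kernel-side revision that the bundled ipset binary rejects. A minimal sketch for extracting those fields when triaging logs, assuming the message wording shown above; `parse_incompatibility` is a hypothetical helper and not part of Felix:

```python
import re
from typing import NamedTuple, Optional


class Incompatibility(NamedTuple):
    ipset_version: str  # userspace tool version, e.g. "7.11"
    set_type: str       # e.g. "hash:ip,port"
    revision: int       # revision in use by the kernel that userspace lacks


# Pattern based on the error wording quoted in this issue.
ERROR_RE = re.compile(
    r"ipset v(?P<version>[\d.]+): Kernel and userspace incompatible: "
    r"settype (?P<set_type>\S+) with revision (?P<revision>\d+) "
    r"not supported by userspace"
)


def parse_incompatibility(line: str) -> Optional[Incompatibility]:
    """Return the parsed fields, or None if the line is a different message."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    return Incompatibility(
        match["version"], match["set_type"], int(match["revision"])
    )


sample = (
    'stderr="ipset v7.11: Kernel and userspace incompatible: settype '
    'hash:ip,port with revision 7 not supported by userspace.\\n"'
)
print(parse_incompatibility(sample))
```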

@fasaxc
Member

fasaxc commented Jan 2, 2024

I think ipset is just annoyingly loose with its protocol version. The version gets revved per ipset type, but the tool only reports the highest version it supports. We'll just need to upgrade ipset to match.
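
To illustrate the point above: compatibility is decided per set type, so two tools can report the same protocol version and still differ in which sets they can read. A minimal sketch, assuming a hypothetical table of the highest revision a userspace build understands for each type. The numbers are illustrative only, chosen to be consistent with the errors reported in this thread, and are not taken from the ipset source. It also simplifies by ignoring any minimum supported revision:

```python
# Hypothetical per-type maximum revisions understood by a userspace build.
# Illustrative values only: the thread shows just that v7.11 rejects
# hash:ip revision 6 and hash:ip,port revision 7.
USERSPACE_MAX_REVISION = {
    "hash:ip": 5,
    "hash:ip,port": 6,
}


def can_read(set_type: str, kernel_revision: int,
             userspace_max: dict[str, int]) -> bool:
    """True if userspace understands the revision the kernel used for a set."""
    supported = userspace_max.get(set_type)
    return supported is not None and kernel_revision <= supported


# Sets created at a newer revision cannot be listed by the older tool.
print(can_read("hash:ip", 6, USERSPACE_MAX_REVISION))       # False
print(can_read("hash:ip,port", 7, USERSPACE_MAX_REVISION))  # False
print(can_read("hash:ip", 5, USERSPACE_MAX_REVISION))       # True
```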

@msilcher

msilcher commented Jan 4, 2024

@mazdakn I wanted to test with the image you provided, but I get permission errors:
Failed to pull image "gcr.io/xxxxx/xxxxx/node:latest": reading manifest latest in gcr.io/xxxxx/xxxxx/node: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication

Please let me know if I should try any further. Thanks!

@mazdakn
Member

mazdakn commented Jan 4, 2024

@msilcher thanks for your willingness to help. I realised this PR won't fix the issue, so I am still working on a fix for it. I'll ping you once I have a proper fix.

@msilcher

msilcher commented Jan 4, 2024

> @msilcher thanks for your willingness to help. I realised this PR won't fix the issue, so I am still working on a fix for it. I'll ping you once I have a proper fix.

Great, thanks for letting me know.

@msilcher

Any news about this?

@mazdakn
Member

mazdakn commented Jan 12, 2024

@msilcher we have a fix that does not require bumping the ipset version. It is under final review and further testing. I can provide you with an image to test it as well.

@mazdakn mazdakn linked a pull request Jan 12, 2024 that will close this issue
@msilcher

> @msilcher we have a fix that does not require bumping the ipset version. It is under final review and further testing. I can provide you with an image to test it as well.

Sounds good! I'm willing to test if you want to share the image you mentioned.

@mazdakn
Member

mazdakn commented Jan 12, 2024

@msilcher Thanks for the help. The fix is in this image: mazdakrn/node:latest. Please try it and let us know the result. The image is based on the master branch, so it should be tested in a cluster based on master or v3.27 (since there should be little difference).

@msilcher

> @msilcher Thanks for the help. The fix is in this image: mazdakrn/node:latest. Please try it and let us know the result. The image is based on the master branch, so it should be tested in a cluster based on master or v3.27 (since there should be little difference).

So far so good, no errors seen on calico-node pod. Other containers that depend on calico services that were failing before are working fine now! I'll do some more testing later but it looks promising.

@mazdakn
Member

mazdakn commented Jan 13, 2024

@msilcher thanks, please let us know when you have performed more tests.

@fasaxc
Member

fasaxc commented Jan 15, 2024

Glad to hear it, @msilcher. @mazdakn has fixed this by only ever reading our own IP sets. This should be more robust in future, since we should never try to list kube-proxy's IP sets.
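
The approach described above can be sketched as filtering set names before listing them one by one, rather than running a bare `ipset list` over every set on the host. This is not the Felix implementation. It assumes Calico-owned sets start with `cali` (as in `cali40this-host`, which appears in logs later in this thread), and the `KUBE-` names are illustrative stand-ins for kube-proxy's IPVS sets:

```python
CALICO_PREFIX = "cali"  # assumption, based on names such as cali40this-host


def own_sets(all_names: list[str], prefix: str = CALICO_PREFIX) -> list[str]:
    """Keep only the set names that carry our prefix."""
    return [name for name in all_names if name.startswith(prefix)]


def list_commands(all_names: list[str]) -> list[list[str]]:
    """Build one `ipset list <name>` command per owned set."""
    return [["ipset", "list", name] for name in own_sets(all_names)]


# Names as `ipset list -n` might print them on a node running both components.
names = [
    "KUBE-CLUSTER-IP",
    "cali40this-host",
    "KUBE-LOOP-BACK",
    "cali40all-ipam-pools",
]
for command in list_commands(names):
    print(" ".join(command))
```

Listing by name means a set created by another component at a newer revision is never parsed, so it cannot trigger the incompatibility error.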

@msilcher

> @msilcher thanks, please let us know when you have performed more tests.

I plan to do a full upgrade from k8s 1.28.x to 1.29 using the test image to see if everything runs smoothly afterwards. I'll probably be able to do this tomorrow or the day after. I'll report back once I have.

@mazdakn
Member

mazdakn commented Jan 16, 2024

Closing this since the fix is merged to master and also backported to v3.27.
@msilcher please let us know once you have performed more tests.

@mazdakn mazdakn closed this as completed Jan 16, 2024
@msilcher

msilcher commented Jan 16, 2024

> Closing this since the fix is merged to master and also backported to v3.27. @msilcher please let us know once you have performed more tests.

I switched calico node images in k8s 1.28.5 and tested, everything worked fine. Then upgraded the cluster to k8s 1.29.0 and checked again. Everything is still working fine in my test environment (Ingress, cert-manager, MetalLB, Elasticsearch, Grafana, Cloudflare argo tunnel, Pihole).
No error on calico-node pod at all, just info messages :)
Note: Using dual stack (IPv4 & IPv6) BTW.
calico-node.txt

@mazdakn
Member

mazdakn commented Jan 16, 2024

Great, thanks @msilcher for testing:-)

@msilcher

> Great, thanks @msilcher for testing:-)

You're welcome! I expect to see this fix in 3.27.1 :)
any ETA for this version?

@mazdakn
Member

mazdakn commented Jan 16, 2024

@msilcher I don't have an exact date, but it should most likely be released in mid-February.

dghubble added a commit to poseidon/terraform-render-bootstrap that referenced this issue Feb 25, 2024
dghubble added a commit to poseidon/typhoon that referenced this issue Feb 25, 2024
* Update fixes Calico incompatibility with Fedora CoreOS

Rel: projectcalico/calico#8372
dghubble added a commit to poseidon/typhoon that referenced this issue Feb 25, 2024
* Update fixes Calico incompatibility with Fedora CoreOS

Rel: projectcalico/calico#8372
@arana198

arana198 commented Apr 30, 2024

Has this been fixed?

My System:

  • 4 * Raspberry PI 5B with Alpine OS
  • 1 * intel nuc running ubuntu server
  • Running k0s

I have a similar issue, with the following error on all nodes:

2024-04-30 12:40:40.998 [INFO][11817] felix/int_dataplane.go 1387: Linux interface state changed. ifIndex=924 ifaceName="calico_tmp_B" state=""
2024-04-30 12:40:40.998 [INFO][11817] felix/int_dataplane.go 1431: Linux interface addrs changed. addrs=<nil> ifaceName="calico_tmp_B"
2024-04-30 12:40:40.999 [ERROR][11817] felix/ipsets.go 656: Bad return code from 'ipset list cali40this-host'. error=exit status 1 family="inet" stderr="ipset v7.11: Kernel and userspace incompatible: settype hash:ip with revision 6 not supported by userspace.\n"
2024-04-30 12:40:41.000 [ERROR][11817] felix/ipsets.go 415: Failed to parse ipset cali40this-host error=exit status 1 family="inet"
2024-04-30 12:40:41.000 [WARNING][11817] felix/ipsets.go 346: Failed to resync with dataplane error=exit status 1 family="inet"
2024-04-30 12:40:41.016 [INFO][11817] felix/ipsets.go 337: Retrying after an ipsets update failure... family="inet"
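
When checking whether a calico-node log contains anything beyond info messages, the lines can be split into their fields first. A minimal sketch, assuming the `timestamp [LEVEL][pid] source line: message` layout seen in the excerpt above; `count_levels` is a hypothetical helper:

```python
import re
from collections import Counter

# Layout inferred from the Felix log lines quoted in this issue.
LOG_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<level>[A-Z]+)\]\[(?P<pid>\d+)\] "
    r"(?P<source>\S+) (?P<line>\d+): (?P<message>.*)$"
)


def count_levels(log_text: str) -> Counter:
    """Count log lines per level, ignoring lines that do not match."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = LOG_RE.match(raw)
        if match:
            counts[match["level"]] += 1
    return counts


SAMPLE = """2024-04-30 12:40:40.998 [INFO][11817] felix/int_dataplane.go 1431: Linux interface addrs changed. addrs=<nil> ifaceName="calico_tmp_B"
2024-04-30 12:40:41.000 [ERROR][11817] felix/ipsets.go 415: Failed to parse ipset cali40this-host error=exit status 1 family="inet"
2024-04-30 12:40:41.000 [WARNING][11817] felix/ipsets.go 346: Failed to resync with dataplane error=exit status 1 family="inet"
"""

for level, count in sorted(count_levels(SAMPLE).items()):
    print(level, count)
```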

The OS runs

ipset --version
ipset v7.19, protocol version: 7

When I run the system check on Alpine OS:

./calicoctl node checksystem
Checking kernel version...
		6.6.14-0-rpi        					OK
Checking kernel modules...
WARNING: Unable to detect the xt_rpfilter module as Loaded/Builtin module or lsmod
		xt_rpfilter         					FAIL
		xt_addrtype         					OK
		xt_multiport        					OK
		xt_u32              					OK
		xt_bpf              					OK
		ipt_rpfilter        					OK
WARNING: Unable to detect the xt_icmp module as Loaded/Builtin module or lsmod
		xt_icmp             					FAIL
		ip_set              					OK
		ip_tables           					OK
		nf_conntrack_netlink					OK
		xt_conntrack        					OK
		xt_set              					OK
		ip6_tables          					OK
WARNING: Unable to detect the ipt_set module as Loaded/Builtin module or lsmod
		ipt_set             					FAIL
		ipt_REJECT          					OK
WARNING: Unable to detect the xt_icmp6 module as Loaded/Builtin module or lsmod
		xt_icmp6            					FAIL
		xt_mark             					OK
WARNING: Unable to detect the ipt_ipvs module as Loaded/Builtin module or lsmod
		ipt_ipvs            					FAIL
WARNING: Unable to detect the vfio-pci module as Loaded/Builtin module or lsmod
		vfio-pci            					FAIL
System doesn't meet one or more minimum systems requirements to run Calico
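
For long outputs like the one above, the failing modules can be pulled out programmatically. A minimal sketch, assuming each result row consists of a module name followed by `OK` or `FAIL`, as in the output shown; `failed_modules` is a hypothetical helper and not a calicoctl feature:

```python
def failed_modules(output: str) -> list[str]:
    """Return module names marked FAIL in `calicoctl node checksystem` output."""
    failed = []
    in_modules = False
    for line in output.splitlines():
        if line.startswith("Checking kernel modules"):
            in_modules = True
            continue
        fields = line.split()
        # Result rows have exactly two fields: the module name and OK/FAIL.
        if in_modules and len(fields) == 2 and fields[1] == "FAIL":
            failed.append(fields[0])
    return failed


sample = """Checking kernel version...
\t\t6.6.14-0-rpi        \t\t\t\t\tOK
Checking kernel modules...
WARNING: Unable to detect the xt_rpfilter module as Loaded/Builtin module or lsmod
\t\txt_rpfilter         \t\t\t\t\tFAIL
\t\txt_addrtype         \t\t\t\t\tOK
\t\tipt_set             \t\t\t\t\tFAIL
"""
print(failed_modules(sample))
```

Whether these module detection failures are related to the ipset error is not established in this thread; the Ubuntu node passes the same check and still reports the error.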

The Ubuntu server passes the check but shows the same error:

sudo ./calicoctl node checksystem
Checking kernel version...
		6.5.0-28-generic    					OK
Checking kernel modules...
		ip6_tables          					OK
		ipt_ipvs            					OK
		vfio-pci            					OK
		xt_bpf              					OK
		ipt_REJECT          					OK
		ipt_set             					OK
		xt_conntrack        					OK
		xt_u32              					OK
		ip_tables           					OK
		xt_mark             					OK
		xt_rpfilter         					OK
		nf_conntrack_netlink					OK
		xt_icmp             					OK
		xt_multiport        					OK
		xt_set              					OK
		ip_set              					OK
		ipt_rpfilter        					OK
		xt_addrtype         					OK
		xt_icmp6            					OK
System meets minimum system requirements to run Calico!

Any help or direction would be appreciated

@fasaxc
Member

fasaxc commented Apr 30, 2024

@arana198 Yes, it was fixed. If you're seeing it again, please upgrade to the latest Calico version and try again. Then, if you still see it, open a new issue.

@arana198

I am using the latest version v3.27.3 - I'll raise another issue

Thanks
