
Iptables not properly set when using dual stack with ipv6 #7211

Closed
HOSTED-POWER opened this issue Apr 4, 2023 · 20 comments

HOSTED-POWER commented Apr 4, 2023

Environmental Info:
K3s Version: v1.25.8+k3s1

Node(s) CPU architecture, OS, and Version: 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux

Cluster Configuration: 1 server

Describe the bug:
When using ConfigServer Firewall (csf) with predefined iptables rules, we never had any issues; K3s properly creates all firewall rules (using IPv4 only). Now with IPv6 activated we get timeouts.

For example: [WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout

But this never recovers...

Steps To Reproduce:
Install k3s with these arguments:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="{v1.25.8+k3s1}" K3S_KUBECONFIG_MODE="644" INSTALL_K3S_EXEC="server --disable=traefik --cluster-cidr=10.42.0.0/16,fc00:a0::/64 --service-cidr=10.43.0.0/16,2001:cafe:42:1::/112 --flannel-ipv6-masq" sh -

And you get the non-working situation when csf (ConfigServer Firewall) is running.

Install like this, and there are 0 issues in the same environment:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="{v1.25.8+k3s1}" K3S_KUBECONFIG_MODE="644" INSTALL_K3S_EXEC="server --disable=traefik" sh -

Another solution:

iptables -I INPUT -d 10.43.0.0/16 -j ACCEPT
iptables -I OUTPUT -d 10.43.0.0/16 -j ACCEPT
iptables -I INPUT -d 10.42.0.0/16 -j ACCEPT
iptables -I OUTPUT -d 10.42.0.0/16 -j ACCEPT

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="{v1.25.8+k3s1}" K3S_KUBECONFIG_MODE="644" INSTALL_K3S_EXEC="server --disable=traefik --cluster-cidr=10.42.0.0/16,fc00:a0::/64 --service-cidr=10.43.0.0/16,2001:cafe:42:1::/112 --flannel-ipv6-masq" sh -

Also leads to a working installation ...
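
The rules above only open the IPv4 CIDRs; if IPv6 service traffic were blocked as well, the equivalent ip6tables rules for the IPv6 CIDRs from the install command would presumably be needed too. An untested sketch:

ip6tables -I INPUT -d 2001:cafe:42:1::/112 -j ACCEPT
ip6tables -I OUTPUT -d 2001:cafe:42:1::/112 -j ACCEPT
ip6tables -I INPUT -d fc00:a0::/64 -j ACCEPT
ip6tables -I OUTPUT -d fc00:a0::/64 -j ACCEPT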

Expected behavior:
A properly working k3s, also with ipv6

Actual behavior:
Not working: coredns and the metrics-server service keep crashing and go into CrashLoopBackOff. There also seems to be a chain that is normally empty but is now being filled:

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
REJECT     tcp  --  anywhere             10.43.0.10           /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:domain reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             10.43.0.10           /* kube-system/kube-dns:metrics has no endpoints */ tcp dpt:9153 reject-with icmp-port-unreachable
REJECT     udp  --  anywhere             10.43.0.10           /* kube-system/kube-dns:dns has no endpoints */ udp dpt:domain reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             10.43.92.92          /* kube-system/metrics-server:https has no endpoints */ tcp dpt:https reject-with icmp-port-unreachable

Additional context / logs:

[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout
time="2023-04-04T20:16:21Z" level=fatal msg="Error starting daemon: Cannot start Provisioner: failed to get Kubernetes server version: Get "https://10.43.0.1:443/version?timeout=32s\": dial tcp 10.43.0.1:443: i/o timeout"
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout

Warning Unhealthy 3m14s (x2 over 3m17s) kubelet Readiness probe failed: Get "https://10.42.0.3:10250/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 2m49s (x16 over 3m16s) kubelet Readiness probe failed: Get "https://10.42.0.3:10250/readyz": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 2m48s kubelet Readiness probe failed: Get "https://10.42.0.3:10250/readyz": dial tcp 10.42.0.3:10250: connect: connection refused

Some logs:

daemon.log
not-working-kube.txt
working-kube.txt

brandond (Member) commented Apr 4, 2023

When using ConfigServer Firewall (csf) with predefined iptables rules, we never had any issues; K3s properly creates all firewall rules (using IPv4 only). Now with IPv6 activated we get timeouts.

Please take a look at the thread over at #7203 (comment) - can you confirm whether or not you have a default-drop or default-deny rule at the end of your INPUT chain?

This workstation, because it does have some firewalling enabled (corporate policies), has the INPUT chain configured with a default policy of DROP. It has rules to accept local traffic from the "normal" interfaces, but cni0 is not part of those rules.
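
A quick way to check (assuming standard iptables tooling) is to look at the chain policy and the last few rules of INPUT:

iptables -S INPUT | head -n 1                      # prints the chain policy, e.g. "-P INPUT DROP"
iptables -L INPUT -n --line-numbers | tail -n 5    # shows the final rules, where a blanket DROP/REJECT would sit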

brandond (Member) commented Apr 4, 2023

cc @rbrtbnfgl since I believe this is related to the kube-router change we're discussing in that other thread.

rbrtbnfgl (Contributor) commented Apr 4, 2023

Yes, but I don't know whether we should change back to ACCEPT again, only for the INPUT chain, when a node has a firewall that drops all traffic on it. That was upstream's reason for keeping ACCEPT.

brandond (Member) commented Apr 4, 2023

It does feel like the kube-router default ACCEPT rule has been covering up problems for a lot of folks that SHOULD have opened up their firewall rules for K3s. Now that we just RETURN they are running into problems because they didn't properly configure their iptables rules for K3s.

On the one hand I want to say it's working as designed (users need to properly configure their host iptables rules if they are blocking traffic), on the other it is a breaking change for users who are upgrading and suddenly configurations that previously worked now do not.

I wonder if there is a way to fix the timeout issue in #6691 without also breaking clusters when users upgrade on a node that doesn't have properly configured user-managed iptables rules.
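
For context, the difference roughly looks like this in iptables terms; the chain name is a placeholder and these are illustrative rules only, not the exact ones kube-router installs:

# kube-router jumps from INPUT into its own chain (shown here as KUBE-ROUTER-EXAMPLE)
iptables -N KUBE-ROUTER-EXAMPLE
iptables -I INPUT -j KUBE-ROUTER-EXAMPLE
# old behavior: ACCEPT ends filtering, so a later default-deny rule in INPUT never sees the packet
iptables -A KUBE-ROUTER-EXAMPLE -d 10.43.0.0/16 -j ACCEPT
# new behavior: RETURN hands the packet back to INPUT, where a host firewall's DROP can still discard it
# iptables -A KUBE-ROUTER-EXAMPLE -d 10.43.0.0/16 -j RETURN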

HOSTED-POWER (Author) commented:

I lost more than a day trying to fix this. What kind of "preparation" of the iptables rules is needed? I can add them if I know which ones :)

HOSTED-POWER (Author) commented:

It's just bad luck that we started with this IPv6 implementation at the same time that 1.25.8 replaced 1.25.7; indeed, I now installed v1.25.7 and have 0 issues. It's curious that this happens with a minor version update; I never expected this.

This is really a very breaking change as far as I can see :(

rbrtbnfgl (Contributor) commented:

If you are using a firewall on the node, it is documented in the docs: https://docs.k3s.io/advanced#additional-os-preparations
It says to add the pod and service IPs to the trusted zone.
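
For a firewalld-based node, the commands on that page look roughly like this; with dual stack, the IPv6 CIDRs passed to --cluster-cidr/--service-cidr would need the same treatment, and csf users would need the equivalent allow entries instead:

firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16   # pod CIDR
firewall-cmd --permanent --zone=trusted --add-source=10.43.0.0/16   # service CIDR
firewall-cmd --reload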

brandond (Member) commented Apr 5, 2023

While I appreciate that the current behavior is probably correct from a security perspective, I am very concerned that it is also a breaking change for many users who were relying on the old behavior for proper functioning of their cluster.

@rbrtbnfgl would it be possible to put the allow/return behavior behind a CLI flag that defaults to the old ALLOW behavior?

cc @cwayne18 @caroline-suse-rancher

rbrtbnfgl (Contributor) commented Apr 5, 2023

It does feel like the kube-router default ACCEPT rule has been covering up problems for a lot of folks that SHOULD have opened up their firewall rules for K3s. Now that we just RETURN they are running into problems because they didn't properly configure their iptables rules for K3s.

On the one hand I want to say it's working as designed (users need to properly configure their host iptables rules if they are blocking traffic), on the other it is a breaking change for users who are upgrading and suddenly configurations that previously worked now do not.

I wonder if there is a way to fix the timeout issue in #6691 without also breaking clusters when users upgrade on a node that doesn't have properly configured user-managed iptables rules.

The issue is related to the iptables rules added by kube-router at the beginning of the chain. Packets are correctly marked if they need to be accepted, but the ACCEPT rule is executed before the other rules in the chain. If the ACCEPT rule that matches the mark is instead appended at the end of the chain, the packets are still accepted by the kube-router rule, but only after they have been checked against all the other rules in that chain.
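
Roughly, the ordering in question looks like this; the mark value is a placeholder and these are not the exact rules kube-router generates:

# packets that kube-router wants to allow get marked
iptables -I INPUT 1 -d 10.43.0.0/16 -j MARK --set-xmark 0x10000/0x10000
# before: the mark-match ACCEPT also sat at the top of the chain, so user-managed rules never ran
# iptables -I INPUT 2 -m mark --mark 0x10000/0x10000 -j ACCEPT
# proposed: append it instead, so user rules are evaluated first but marked traffic
# is still accepted before the chain's default DROP policy applies
iptables -A INPUT -m mark --mark 0x10000/0x10000 -j ACCEPT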

HOSTED-POWER (Author) commented:

While I appreciate that the current behavior is probably correct from a security perspective, I am very concerned that it is also a breaking change for many users who were relying on the old behavior for proper functioning of their cluster.

@rbrtbnfgl would it be possible to put the allow/return behavior behind a CLI flag that defaults to the old ALLOW behavior?

cc @cwayne18 @caroline-suse-rancher

To be honest, we were very happy with the current implementation; it made things very easy and everything always worked. I don't see much advantage in changing this, since we need to open the firewall anyway. Why not rely on k3s to do this for us?

rbrtbnfgl (Contributor) commented:

I can easily change kube-router to add the ACCEPT rules at the end of the chain, maintaining the previous behaviour while keeping the fix for #6691.

brandond (Member) commented Apr 5, 2023

@rbrtbnfgl do you think you can get that in for the next release? If so I believe that would probably save us a lot of additional issues.

HOSTED-POWER (Author) commented:

Wow, I can't wait to test the fix. Is there an easy way to do this, or will it be released soon?

PS: Do we really need to uninstall k3s completely if we want to enable IPv6 on an already installed k3s? That seems like a lot of hassle, but it seems to be what the documentation says?

rbrtbnfgl (Contributor) commented:

It's enough to run k3s-killall.sh and then start K3s.
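
On a default systemd-based install that sequence is roughly:

/usr/local/bin/k3s-killall.sh    # stops the k3s processes and flushes the k3s-related iptables chains
systemctl start k3s              # starts the (upgraded) server again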

brandond (Member) commented Apr 7, 2023

Wow, I can't wait to test the fix. Is there an easy way to do this, or will it be released soon?

See #7203 (comment)
As @rbrtbnfgl said, you will need to run k3s-killall.sh to clear the iptables rules before starting the new version.

PS: Do we really need to uninstall k3s completely if we want to enable IPv6 on an already installed k3s? That seems like a lot of hassle, but it seems to be what the documentation says?

If you want to have a dual-stack cluster, yes you should configure the dual-stack CIDRs when starting the server for the first time.
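
For reference, the same dual-stack settings can also be placed in a config file before the first start, instead of passing them via INSTALL_K3S_EXEC, e.g. with the CIDRs from the reproduction steps above:

cat > /etc/rancher/k3s/config.yaml <<'EOF'
disable: traefik
cluster-cidr: 10.42.0.0/16,fc00:a0::/64
service-cidr: 10.43.0.0/16,2001:cafe:42:1::/112
flannel-ipv6-masq: true
EOF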

HOSTED-POWER (Author) commented:

OK, that's confusing @brandond. Can we get away with just the killall? What about single-node installs?

The fix doesn't seem to work for our use case, but I commented in the other thread.

HOSTED-POWER (Author) commented:

Hi Brandond, we have a bunch of servers to upgrade with IPv6 support. Is the killall approach usable or not? :)

Is there any way/command to show the current k3s_install_options? This could come in handy when running upgrades as well.

For the rest, I tested the rc1 of 1.25 and it seems resolved so far!

brandond (Member) commented:

Closing as duplicate of #7203

proligde commented:

@rbrtbnfgl I stumbled across this issue after my (and my colleagues') local k3s-backed dev environments stopped working after upgrading to 1.26.4.

My scenario is that we use k3s as a local development environment and route all our local domains to 127.0.0.1. This has worked for years, but all of a sudden ports 80 and 443 running through the svclb stopped responding on 127.0.0.1 while still working on LAN addresses like 192.168....

After pinning the problem down to the k3s version used, I could confirm that it still worked on k3s v1.25.6, but no longer on 1.25.9, 1.26.4, or 1.27.1, which led me to this very ticket and the PR mentioned below.

To be honest, I don't understand how PR #7218 could produce that behavior. On the other hand, my iptables and containerd routing knowledge is very limited. So I'm wondering: is this just a red herring, or what did I misconfigure here?

thanks so much in advance - Max

rbrtbnfgl (Contributor) commented:

Hi @proligde, could you open an issue with your setup config?
