[firewalld] kind doesn't work on Fedora 32 #1547

Closed
christianh814 opened this issue May 2, 2020 · 20 comments
Assignees
Labels
good first issue help wanted kind/bug kind/documentation kind/external priority/important-soon

Comments

@christianh814

christianh814 commented May 2, 2020

What happened:

After upgrading to Fedora 32, I can no longer create a kind cluster.

What you expected to happen:

My kind cluster to get created

How to reproduce it (as minimally and precisely as possible):

kind create cluster --config=config.yaml

Where config.yaml is...

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
networking:
  disableDefaultCNI: True
  podSubnet: "10.254.0.0/16"
  serviceSubnet: "172.30.0.0/16"
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    listenAddress: 0.0.0.0
  - containerPort: 443
    hostPort: 443
    listenAddress: 0.0.0.0
- role: worker
- role: worker

Anything else we need to know?:

Output/trace of running with -v 10 https://gist.github.com/christianh814/abbf1964b9224c8940864d02b9236128

I figured maybe something was stale and ran docker network rm kind and re-ran the command. This time I looked at the logs on my laptop and saw...

May 01 16:51:17 laptop audit[98494]: SERVICE_STOP pid=98494 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:spc_t:s0 msg='unit=kubelet comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
May 01 16:51:17 laptop audit[98423]: SERVICE_STOP pid=98423 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:spc_t:s0 msg='unit=kubelet comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

Okay...so I docker exec into one of the workers and saw...

May 01 23:44:19 kind-worker systemd[1]: kubelet.service: Main process exited, code=exited, status=255/EXCEPTION
May 01 23:44:19 kind-worker systemd[1]: kubelet.service: Failed with result 'exit-code'.
May 01 23:44:20 kind-worker systemd[1]: kubelet.service: Service RestartSec=1s expired, scheduling restart.
May 01 23:44:20 kind-worker systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 217.
May 01 23:44:20 kind-worker systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
May 01 23:44:20 kind-worker systemd[1]: Started kubelet: The Kubernetes Node Agent.
May 01 23:44:20 kind-worker kubelet[3469]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
May 01 23:44:20 kind-worker kubelet[3469]: F0501 23:44:20.360424    3469 server.go:199] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory

And indeed it's not there

root@kind-worker:/var/lib# ls -1 /var/lib/kubelet/config.yaml
ls: cannot access '/var/lib/kubelet/config.yaml': No such file or directory

Strangely, a plain kind create cluster (without the config file) DOES work fine.

Environment:

  • kind version: (use kind version):
$ kind version
kind v0.8.0 go1.14.2 linux/amd64
  • Kubernetes version: (use kubectl version):
$ kubectl version --client
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info):
$ docker version
Client:
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.14rc1
 Git commit:        afacb8b
 Built:             Mon Mar 16 15:45:37 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.14rc1
  Git commit:       afacb8b
  Built:            Mon Mar 16 00:00:00 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.3
  GitCommit:        
 runc:
  Version:          1.0.0-rc10+dev
  GitCommit:        fbdbaf85ecbc0e077f336c03062710435607dbf1
 docker-init:
  Version:          0.18.0
  GitCommit:        
  • OS (e.g. from /etc/os-release):
$ cat /etc/fedora-release 
Fedora release 32 (Thirty Two)
$  uname -a
Linux laptop 5.6.7-300.fc32.x86_64 #1 SMP Thu Apr 23 14:13:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
@christianh814 christianh814 added the kind/bug label May 2, 2020
@BenTheElder
Member

BenTheElder commented May 2, 2020

/var/lib/kubelet/config.yaml does not exist initially, this is normal. I wish kubeadm would make this clearer :/

during kubeadm's bootstrapping the kubelet config does not exist initially and kubelet is crashlooping until the config is populated.
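
A rough way to tell this normal bootstrapping crashloop apart from a node that is really stuck is to check whether the file has shown up yet. A minimal sketch, assuming a node container named kind-worker as in the logs above; kubelet_config_state is a made-up helper name, not part of kind:

```shell
# Hypothetical helper: report whether kubeadm has written the kubelet
# config inside a kind node container yet.
kubelet_config_state() {
  # "$1" is the node container name, e.g. kind-worker.
  # `docker exec` passes through the exit status of `test -f`.
  # Note: a missing container also ends up reported as "missing".
  if docker exec "$1" test -f /var/lib/kubelet/config.yaml; then
    echo "present"
  else
    echo "missing"
  fi
}

# Usage (assumes a node container named kind-worker):
#   kubelet_config_state kind-worker
```

If this still says "missing" long after kubeadm should have joined the node, the crashloop is probably a symptom of something else (here, networking) rather than the cause.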

@BenTheElder
Member

BenTheElder commented May 2, 2020

can you share the full kind export logs in an archive? there's not a lot to go on here short of getting my hands on an identical host...
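
For anyone unsure how to produce that archive, something along these lines should do. It is only a sketch: export_kind_logs and the ./kind-logs default directory are made-up names, and it assumes the cluster uses the default name.

```shell
# Hypothetical helper: export kind logs into a directory and pack them up.
export_kind_logs() {
  out_dir="${1:-./kind-logs}"
  # `kind export logs [output-dir]` collects node and container logs there.
  # Its own output is sent to stderr so only the archive path is printed.
  kind export logs "$out_dir" >&2 || return 1
  tar -czf "${out_dir%/}.tar.gz" \
    -C "$(dirname "$out_dir")" "$(basename "$out_dir")" || return 1
  echo "${out_dir%/}.tar.gz"
}

# Usage:
#   export_kind_logs ./kind-logs   # then attach ./kind-logs.tar.gz
```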

@BenTheElder
Member

BenTheElder commented May 2, 2020

this config works on ubuntu 20.04 and kind v0.8.1 w/ ipv6 disabled. will have to reboot to sanity check the more common ipv6 enabled.

@christianh814
Author

christianh814 commented May 2, 2020

So it failed again with the following config...

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
networking:
  disableDefaultCNI: True
  podSubnet: "10.254.0.0/16"
  serviceSubnet: "172.30.0.0/16"
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker

So I tried a simpler config...

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: worker

So it's network related. I'll try 0.8.1 to see.

I'll also upload the logs

@BenTheElder BenTheElder self-assigned this May 2, 2020
@christianh814
Author

christianh814 commented May 2, 2020

v0.8.1 gave me the same result. I believe it's network related

@BenTheElder
Member

BenTheElder commented May 2, 2020

[we're debugging in slack, tentatively an issue with firewalld]

xref:
https://www.reddit.com/r/Fedora/comments/fl4wkl/fedora_32_no_external_dns_in_docker_containers/
#1283

@BenTheElder
Member

BenTheElder commented May 2, 2020

for anyone following along, discussion in this thread: https://kubernetes.slack.com/archives/CEKK1KTN2/p1588378366006900

@BenTheElder
Member

BenTheElder commented May 2, 2020

looking around it sounds like firewalld and docker do not work well together firewalld/firewalld#461

@BenTheElder
Member

BenTheElder commented May 2, 2020

apparently disabling firewalld worked

@BenTheElder BenTheElder changed the title from "Kind config doesn't work after upgrading to Fedora 32" to "[firewalld] kind doesn't work on Fedora 32" May 2, 2020
@BenTheElder
Member

BenTheElder commented May 2, 2020

i'm not sure what we can do here. based on the logs in slack, it seems that firewalld breaks containers being able to reach each other over a docker network, which is standard docker functionality (e.g. compose uses this)
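
To check this without kind in the picture, a throwaway test like the following may help. It is a sketch only: the fwtest-* names and the check_docker_network helper are made up, and it uses the busybox image. Pinging by container name exercises both docker's embedded DNS and container-to-container traffic on a user-defined network.

```shell
# Hypothetical smoke test: can two containers on the same user-defined
# docker network reach each other by name?
check_docker_network() {
  net="fwtest-net"          # made-up names, used only for this test
  server="fwtest-server"
  docker network create "$net" >/dev/null || return 2
  docker run -d --rm --name "$server" --network "$net" \
    busybox sleep 60 >/dev/null
  if docker run --rm --network "$net" \
      busybox ping -c 1 -W 2 "$server" >/dev/null 2>&1; then
    result="ok"
  else
    result="blocked"
  fi
  # Clean up regardless of the outcome.
  docker rm -f "$server" >/dev/null 2>&1
  docker network rm "$net" >/dev/null 2>&1
  echo "$result"
}

# Usage:
#   check_docker_network    # prints "ok" or "blocked"
```

Running it once with firewalld active and once with it stopped should show whether firewalld is the variable.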

@BenTheElder
Member

BenTheElder commented May 2, 2020

@BenTheElder BenTheElder added the kind/external label May 2, 2020
@dfarrell07

dfarrell07 commented May 4, 2020

I think short of fully disabling firewalld, you can do:

firewall-cmd --permanent --zone=trusted --add-interface=docker0
firewall-cmd --get-zone-of-interface=<your eth interface>
firewall-cmd --zone=<zone from above> --add-masquerade --permanent
firewall-cmd --reload
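
The two placeholders can be filled in automatically by feeding the zone lookup into the masquerade step. A possible wrapper, run as root; allow_docker_through_firewalld is a made-up name, and the interface is whatever your outbound NIC is called:

```shell
# Hypothetical wrapper around the four commands above; needs root.
allow_docker_through_firewalld() {
  iface="$1"   # your external interface name, passed by the caller
  firewall-cmd --permanent --zone=trusted --add-interface=docker0 || return 1
  # Look up which zone the external interface currently belongs to.
  zone="$(firewall-cmd --get-zone-of-interface="$iface")" || return 1
  [ -n "$zone" ] || return 1
  firewall-cmd --zone="$zone" --add-masquerade --permanent || return 1
  firewall-cmd --reload
}

# Usage (eth0 is only an example interface name):
#   allow_docker_through_firewalld eth0
```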

@dfarrell07

dfarrell07 commented May 4, 2020

firewall-cmd --permanent --zone=trusted --add-interface=docker0
firewall-cmd --get-zone-of-interface=<your eth interface>
firewall-cmd --zone=<zone from above> --add-masquerade --permanent
firewall-cmd --reload

(btw this was from docker/for-linux#955 (comment))

Digging more, this seems to get all the Docker-relevant networking working for our CI with Fedora 32 except the KIND bits 😅. I've only gotten KIND working by disabling firewalld and enabling iptables:

sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo dnf install -y iptables-services
sudo touch /etc/sysconfig/iptables
sudo touch /etc/sysconfig/ip6tables
sudo systemctl start iptables
sudo systemctl start ip6tables
sudo systemctl enable iptables
sudo systemctl enable ip6tables
sudo iptables -t filter -F
sudo iptables -t filter -X
sudo systemctl restart docker

@christianh814
Author

christianh814 commented May 4, 2020

Update. So on F32, I got it working with Firewalld by changing the FirewallBackend in the /etc/firewalld/firewalld.conf file from nftables to iptables and restarting docker.

# grep 'FirewallBackend=iptables' /etc/firewalld/firewalld.conf 
FirewallBackend=iptables

After I did that, my kind deployments started working "as normal".
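
For reference, the edit can be scripted roughly like this. A sketch only: set_firewalld_backend_iptables is a made-up helper name, and the path is a parameter so it can be tried on a copy of the file first. Restarting firewalld as well as docker is an assumption on top of the steps described above.

```shell
# Hypothetical helper: set FirewallBackend=iptables in a firewalld.conf.
set_firewalld_backend_iptables() {
  conf="${1:-/etc/firewalld/firewalld.conf}"
  cp "$conf" "$conf.bak" || return 1     # keep a backup of the original
  # Rewrite only the FirewallBackend line; everything else is copied as-is.
  sed 's/^FirewallBackend=.*/FirewallBackend=iptables/' "$conf.bak" > "$conf"
  # Print the resulting setting so the change can be confirmed.
  grep '^FirewallBackend=' "$conf"
}

# Usage (as root), then restart the services so the change takes effect:
#   set_firewalld_backend_iptables
#   systemctl restart firewalld
#   systemctl restart docker
```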

@BenTheElder
Member

BenTheElder commented May 5, 2020

Seems like somewhere between the upstream projects there's a bug to be fixed here, but this also seems worthy of at least a known-issues entry in our docs with workaround(s).

@dlakatos847

dlakatos847 commented May 7, 2020

I think short of fully disabling firewalld, you can do:

firewall-cmd --permanent --zone=trusted --add-interface=docker0
firewall-cmd --get-zone-of-interface=<your eth interface>
firewall-cmd --zone=<zone from above> --add-masquerade --permanent
firewall-cmd --reload

Worked for CentOS 8 too

@BenTheElder
Member

BenTheElder commented May 15, 2020

I'm not a fedora or firewalld user, but if someone wants to form an opinion about which fix to take, we should document it on this page: https://kind.sigs.k8s.io/docs/user/known-issues/
https://github.com/kubernetes-sigs/kind/blob/master/site/content/docs/user/known-issues.md

@BenTheElder
Member

BenTheElder commented May 31, 2020

possibly let's document #1547 (comment)

@BenTheElder BenTheElder added good first issue help wanted kind/documentation priority/important-soon labels May 31, 2020
@BenTheElder
Member

BenTheElder commented Jun 17, 2020

workaround and known issue are now documented
#1672

@natafesenko

natafesenko commented May 16, 2022

Update. So on F32, I got it working with Firewalld by changing the FirewallBackend in the /etc/firewalld/firewalld.conf file from nftables to iptables and restarting docker.

# grep 'FirewallBackend=iptables' /etc/firewalld/firewalld.conf 
FirewallBackend=iptables

After I did that, my kind deployments started working "as normal".

Thanks for this note @christianh814
that's still relevant for Fedora 35/k3s!
for others - switch FirewallBackend to iptables, restart firewalld and k3d/k3s.
DNS issue's gone!
