Kine is not working on RHEL based OS (RPM based installs and SELinux is enabled) #5924

Open
mdrahman-suse opened this issue May 17, 2024 · 2 comments
Labels
kind/bug Something isn't working

Comments

@mdrahman-suse
Contributor

Environmental Info:
RKE2 Version:

rke2 version v1.30.0+rke2r1 (60e06c4dbccff996f717af8f4c532971f57264b4)
go version go1.22.2 X:boringcrypto

Also reproduced with v1.29.4+rke2r1.

Node(s) CPU architecture, OS, and Version:

cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.7 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.7"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.7 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.7
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.7"
[ec2-user@ip-172-31-3-155 ~]$ uname -a
Linux  4.18.0-425.3.1.el8.x86_64 #1 SMP Fri Sep 30 11:45:06 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

1 server, 1 external db

Describe the bug:

The rke2-server service fails to start when a datastore-endpoint is added to its configuration on a RHEL-based OS installed with the default (RPM) installation method.

  • config.yaml
write-kubeconfig-mode: "0644"
tls-san:
  - fake.fqdn.value
node-name: server1
kine-tls: true
node-external-ip: <public-ip>
node-ip: <private-ip>
datastore-endpoint: mysql://<user>:<password>@tcp(<db-instance-url>:3306)/<dbname>

NOTE: The same server setup and config work fine with the tar installation method.

Steps To Reproduce:

  • Set up the database in Amazon RDS
  • Install RKE2 with the config above (on RHEL, the install defaults to the RPM method)
  • Enable and start the rke2-server service

Expected behavior:

The cluster comes up successfully.

Actual behavior:

rke2-server fails to start and reports errors in its logs (see below).

Additional context / logs:

May 17 21:35:36 server1 rke2[18663]: time="2024-05-17T21:35:36Z" level=error msg="Error encountered while importing /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt: failed to pull images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt: image \"index.docker.io/rancher/rke2-cloud-provider:v1.29.3-build20240412\": not found"
May 17 21:35:51 server1 rke2[18663]: time="2024-05-17T21:35:51Z" level=error msg="Error encountered while importing /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt: failed to pull images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt: image \"index.docker.io/rancher/hardened-kubernetes:v1.30.0-rke2r1-build20240506\": not found"
May 17 21:35:58 server1 rke2[18663]: time="2024-05-17T21:35:58Z" level=error msg="Error encountered while importing /var/lib/rancher/rke2/agent/images/runtime-image.txt: failed to pull images from /var/lib/rancher/rke2/agent/images/runtime-image.txt: image \"index.docker.io/rancher/rke2-runtime:v1.30.0-rke2r1\": not found"
May 17 21:50:17 server1 rke2[18663]: time="2024-05-17T21:50:17Z" level=error msg="Failed to save TLS secret after controller init: timed out waiting for the condition"
May 17 21:50:24 server1 rke2[19581]: time="2024-05-17T21:50:24Z" level=info msg="Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /run/k3s/containerd/containerd.sock: connect: connection refused\""
May 17 22:05:31 server1 rke2[20423]: time="2024-05-17T22:05:31Z" level=info msg="Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /run/k3s/containerd/containerd.sock: connect: connection refused\""
@mdrahman-suse added the kind/bug (Something isn't working) label on May 17, 2024
@vitorsavian self-assigned this on May 20, 2024
@mdrahman-suse
Contributor Author

mdrahman-suse commented May 28, 2024

It looks like the SELinux policies are not compatible with Kine; thanks @vitorsavian for identifying the root cause. Once I disabled SELinux on the RHEL OS, I was able to create a cluster:

$ sestatus
SELinux status:                 disabled

$ rke2 -v
rke2 version v1.30.0+rke2r1 (60e06c4dbccff996f717af8f4c532971f57264b4)
go version go1.22.2 X:boringcrypto

$ kga
NAME              STATUS   ROLES                  AGE     VERSION          INTERNAL-IP   EXTERNAL-IP    OS-IMAGE                               KERNEL-VERSION              CONTAINER-RUNTIME
node/server1      Ready    control-plane,master   7m31s   v1.3x.0+rke2r1   xxx.xx.x.15   x.xxx.x.190    Red Hat Enterprise Linux 8.7 (Ootpa)   4.18.0-425.3.1.el8.x86_64   containerd://1.7.11-k3s2

NAMESPACE     NAME                                                                     READY   STATUS      RESTARTS   AGE     IP            NODE      NOMINATED NODE   READINESS GATES
kube-system   pod/kube-scheduler-server1                                               1/1     Running     0          7m29s   xxx.xx.x.15   server1   <none>           <none>
kube-system   pod/kube-apiserver-server1                                               1/1     Running     0          7m27s   xxx.xx.x.15   server1   <none>           <none>
kube-system   pod/kube-controller-manager-server1                                      1/1     Running     0          7m29s   xxx.xx.x.15   server1   <none>           <none>
kube-system   pod/cloud-controller-manager-server1                                     1/1     Running     0          7m27s   xxx.xx.x.15   server1   <none>           <none>
kube-system   pod/kube-proxy-server1                                                   1/1     Running     0          7m22s   xxx.xx.x.15   server1   <none>           <none>
kube-system   pod/helm-install-rke2-coredns-q5rrx                                      0/1     Completed   0          7m12s   xxx.xx.x.15   server1   <none>           <none>
kube-system   pod/helm-install-rke2-canal-mglvd                                        0/1     Completed   0          7m12s   xxx.xx.x.15   server1   <none>           <none>
kube-system   pod/rke2-canal-knvld                                                     2/2     Running     0          6m54s   xxx.xx.x.15   server1   <none>           <none>
kube-system   pod/helm-install-rke2-snapshot-controller-crd-msml2                      0/1     Completed   0          7m12s   xx.xx.x.2     server1   <none>           <none>
kube-system   pod/helm-install-rke2-metrics-server-84566                               0/1     Completed   0          7m12s   xx.xx.x.3     server1   <none>           <none>
kube-system   pod/rke2-coredns-rke2-coredns-autoscaler-5749cd7b8b-r58f9                1/1     Running     0          6m55s   xx.xx.x.5     server1   <none>           <none>
kube-system   pod/helm-install-rke2-snapshot-controller-mfncj                          0/1     Completed   0          7m12s   xx.xx.x.7     server1   <none>           <none>
kube-system   pod/rke2-snapshot-controller-7dcf5d5b46-992x7                            1/1     Running     0          5m58s   xx.xx.x.10    server1   <none>           <none>
kube-system   pod/helm-install-rke2-snapshot-validation-webhook-ngpcq                  0/1     Completed   0          7m12s   xx.xx.x.8     server1   <none>           <none>
kube-system   pod/rke2-snapshot-validation-webhook-bf7bbd6fc-k52zs                     1/1     Running     0          5m55s   xx.xx.x.11    server1   <none>           <none>
kube-system   pod/rke2-metrics-server-868fc8795f-49h2q                                 1/1     Running     0          6m4s    xx.xx.x.9     server1   <none>           <none>
kube-system   pod/rke2-coredns-rke2-coredns-64dcf4f58b-v9m7d                           1/1     Running     0          6m55s   xx.xx.x.4     server1   <none>           <none>
kube-system   pod/helm-install-rke2-ingress-nginx-f6fkx                                0/1     Completed   0          7m12s   xx.xx.x.6     server1   <none>           <none>
kube-system   pod/rke2-ingress-nginx-controller-pd6g6                                  1/1     Running     0          5m38s   xx.xx.x.13    server1   <none>           <none>

I still think this is an issue, since RHEL-based OSes have SELinux enabled by default.
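Because RHEL enables SELinux by default, a small preflight check could flag nodes that match the failing setup before rke2-server is started. The sketch below is a hypothetical helper (`kine_selinux_risk` is not part of RKE2). It assumes the RKE2 config is a YAML file such as `/etc/rancher/rke2/config.yaml` and that the SELinux mode comes from `getenforce`, and it only checks the two conditions reported in this issue: a `datastore-endpoint` is set and SELinux is not disabled.

```shell
#!/bin/sh
# Hypothetical preflight helper (not part of RKE2).
# Usage: kine_selinux_risk <config.yaml> <selinux-mode>
#   <selinux-mode> is the output of `getenforce`: Enforcing, Permissive or Disabled.
# Returns 0 if the node matches the failing setup from this issue, 1 otherwise.
kine_selinux_risk() {
    config="$1"
    mode="$2"

    # No datastore-endpoint line: the embedded etcd datastore is used, so kine is not involved.
    # A missing config file also ends up here (grep fails, the negation succeeds).
    if ! grep -Eq '^[[:space:]]*datastore-endpoint:' "$config"; then
        return 1
    fi

    # SELinux disabled: matches the workaround reported in this thread.
    if [ "$mode" = "Disabled" ]; then
        return 1
    fi

    # External datastore (kine) with SELinux enabled: the reported failing combination.
    return 0
}
```

For example: `kine_selinux_risk /etc/rancher/rke2/config.yaml "$(getenforce)" && echo "node matches the conditions in #5924"`. Permissive mode is deliberately treated as a match, since this thread only confirms that disabling SELinux avoids the failure.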

@brandond
Member

Sounds like we'll need changes to rke2-selinux?

@mdrahman-suse changed the title from "Kine is not working on RHEL based OS (RPM based installs)" to "Kine is not working on RHEL based OS (RPM based installs and SELinux is enabled)" on Jul 15, 2024
Development

No branches or pull requests

4 participants