etcd stuck in crashloopbackoff: permission denied to read config #1494

dajester2013 · 2021-07-30T13:09:59Z

Environmental Info:
RKE2 Version:
rke2 version v1.21.3+rke2r1 (2ed0b0d)
go version go1.16.6b7

Node(s) CPU architecture, OS, and Version:
I do not have access at the moment, but it is a VM running RHEL 7.9, FIPS mode.

Cluster Configuration:
3 servers, but this error is happening on the first server I'm trying to deploy to; I have not attempted the other servers.

Describe the bug:
etcd will not start with selinux: true and profile: cis-1.6 - it gets stuck in a crash loop stating permission denied.

Steps To Reproduce:

Installed RKE2: I used the quick-start script curl -sfL https://get.rke2.io | sh -

Verified installed: rke2-selinux, rke2-common, rke2-server

# yum list installed | grep rke2
rke2-common.x86_64 ...
rke2-selinux.noarch ...
rke2-server.x86_64 ...

Copied sysctl config

# sudo cp -f /usr/share/rke2/rke2-cis-sysctl.conf /etc/sysctl.d/60-rke2-cis.conf
# sudo systemctl restart systemd-sysctl

Created etcd user+group

# useradd -r -c "etcd user" -s /sbin/nologin -M etcd -U

Configuration for rke2:

selinux: true
profile: cis-1.6
kube-apiserver-arg: tls-min-version=VersionTLS12
kube-scheduler-arg: tls-min-version=VersionTLS12
kubelet-arg: feature-gates=DynamicKubeletConfig=false
disable: rke2-ingress-nginx
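The reproduction steps above copy a sysctl file and create an etcd user before enabling profile: cis-1.6. A quick way to confirm both host-side pieces are in place before starting rke2 might look like the sketch below. The cis_prereqs helper and its lookup parameter are hypothetical names made up for illustration; only the file path and the user name come from the steps above, and this is not how RKE2 itself performs its checks.

```python
import os
import pwd


def cis_prereqs(sysctl_conf="/etc/sysctl.d/60-rke2-cis.conf",
                user="etcd",
                lookup=pwd.getpwnam):
    """Return a list of missing host prerequisites (empty list means OK).

    `lookup` is any callable that raises KeyError for an unknown user;
    it defaults to the real passwd database lookup.
    """
    problems = []
    # The sysctl drop-in copied from /usr/share/rke2/rke2-cis-sysctl.conf
    if not os.path.isfile(sysctl_conf):
        problems.append(f"missing {sysctl_conf}")
    # The etcd system user created with useradd
    try:
        lookup(user)
    except KeyError:
        problems.append(f"no such user: {user}")
    return problems


# On the affected host this would be expected to print an empty list.
print(cis_prereqs())
```

An empty list only says the two files/accounts exist; it says nothing about SELinux labels or the sysctl values actually being loaded.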

Start rke2 server

# rke2 server
... many log statements until it gets stuck in a loop waiting for etcd to start ...

Expected behavior:
etcd and related containers start normally

Actual behavior:
etcd gets stuck in an error loop

Additional context / logs:
etcd container logs

# crictl logs <etcd container id>

Verify etcd uid/gid:

# id etcd
uid=976(etcd) gid=976(etcd) groups=976(etcd)

etcd container security settings

# crictl inspect <etcd container id>

audit log search:
This one is interesting, as it repeatedly shows these sync_file_range SYSCALL records with success=no.

# ausearch -x etcd
...
----
time->Thu Jul 29 14:50:32 2021
type=PROCTITLE msg=audit(1627584632.141:637479): proctitle=72756E6300696E6974
type=PATH msg=audit(1627584632.141:637479): item=2 name="/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" objtype=UNKNOWN cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
type=PATH msg=audit(1627584632.141:637479): item=1 name="/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" objtype=UNKNOWN cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
type=PATH msg=audit(1627584632.141:637479): item=0 name="/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" objtype=UNKNOWN cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
type=CWD msg=audit(1627584632.141:637479):  cwd="/"
type=SYSCALL msg=audit(1627584632.141:637479): arch=c000003e syscall=257 success=no exit=-13 a0=ffffff9c a1=198d9e0 a2=0 a3=0 items=3 ppid=31482 pid=2327 auid=1016 uid=976 gid=976 euid=976 suid=976 fsuid=976 egid=976 sgid=976 fsgid=976 tty=(none) ses=262 comm="etcd" exe="/usr/local/bin/etcd" subj=system_u:system_r:rke2_service_db_t:s0:c369,c904 key="access"
----
time->Thu Jul 29 14:50:32 2021
type=PROCTITLE msg=audit(1627584632.162:637480): proctitle=72756E6300696E6974
type=PATH msg=audit(1627584632.162:637480): item=2 name="/var/lib/rancher/rke2/server/db/etcd/config" objtype=UNKNOWN cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
type=PATH msg=audit(1627584632.162:637480): item=1 name="/var/lib/rancher/rke2/server/db/etcd/config" objtype=UNKNOWN cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
type=PATH msg=audit(1627584632.162:637480): item=0 name="/var/lib/rancher/rke2/server/db/etcd/config" objtype=UNKNOWN cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
type=CWD msg=audit(1627584632.162:637480):  cwd="/"
type=SYSCALL msg=audit(1627584632.162:637480): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c0000aede0 a2=80000 a3=0 items=3 ppid=31482 pid=2327 auid=1016 uid=976 gid=976 euid=976 suid=976 fsuid=976 egid=976 sgid=976 fsgid=976 tty=(none) ses=262 comm="etcd" exe="/usr/local/bin/etcd" subj=system_u:system_r:rke2_service_db_t:s0:c369,c904 key="access"
----
time->Thu Jul 29 14:50:32 2021
type=PROCTITLE msg=audit(1627584632.162:637481): proctitle=72756E6300696E6974
type=PATH msg=audit(1627584632.162:637481): item=2 name="/etc/localtime" objtype=UNKNOWN cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
type=PATH msg=audit(1627584632.162:637481): item=1 name="/etc/localtime" objtype=UNKNOWN cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
type=PATH msg=audit(1627584632.162:637481): item=0 name="/etc/localtime" objtype=UNKNOWN cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
type=CWD msg=audit(1627584632.162:637481):  cwd="/"
type=SYSCALL msg=audit(1627584632.162:637481): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=c000043760 a2=0 a3=0 items=3 ppid=31482 pid=2327 auid=1016 uid=976 gid=976 euid=976 suid=976 fsuid=976 egid=976 sgid=976 fsgid=976 tty=(none) ses=262 comm="etcd" exe="/usr/local/bin/etcd" subj=system_u:system_r:rke2_service_db_t:s0:c369,c904 key="access"
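
For anyone reading along, the records above can be condensed to "which paths were denied, for which process and SELinux subject" with a small script. This is only a sketch: parse_events is a made-up helper, not part of ausearch or RKE2, the sample is an abbreviated copy of the second record above, and the parsing assumes the plain key=value layout shown in this excerpt.

```python
import re

# Abbreviated copy of one event from the ausearch output above.
SAMPLE = '''\
----
time->Thu Jul 29 14:50:32 2021
type=PROCTITLE msg=audit(1627584632.162:637480): proctitle=72756E6300696E6974
type=PATH msg=audit(1627584632.162:637480): item=0 name="/var/lib/rancher/rke2/server/db/etcd/config" objtype=UNKNOWN
type=CWD msg=audit(1627584632.162:637480):  cwd="/"
type=SYSCALL msg=audit(1627584632.162:637480): arch=c000003e syscall=257 success=no exit=-13 uid=976 gid=976 comm="etcd" exe="/usr/local/bin/etcd" subj=system_u:system_r:rke2_service_db_t:s0:c369,c904 key="access"
'''


def parse_events(text):
    """Split raw ausearch output on '----' and summarize each event."""
    events = []
    for chunk in text.split("----"):
        paths, fields, proctitle = [], {}, None
        for line in chunk.splitlines():
            line = line.strip()
            if line.startswith("type=PATH"):
                m = re.search(r'name="([^"]*)"', line)
                if m and m.group(1) not in paths:
                    paths.append(m.group(1))  # de-duplicate item=0..2
            elif line.startswith("type=SYSCALL"):
                fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
            elif line.startswith("type=PROCTITLE"):
                m = re.search(r"proctitle=([0-9A-Fa-f]+)", line)
                if m:
                    # Hex-encoded argv; NUL bytes separate the arguments.
                    raw = bytes.fromhex(m.group(1))
                    proctitle = raw.replace(b"\0", b" ").decode()
        if fields:  # skip chunks without a SYSCALL record
            events.append({
                "paths": paths,
                "syscall": int(fields["syscall"]),
                "success": fields["success"],
                "exit": int(fields["exit"]),
                "comm": fields["comm"].strip('"'),
                "subj": fields["subj"],
                "proctitle": proctitle,
            })
    return events


events = parse_events(SAMPLE)
for ev in events:
    print(ev["comm"], ev["subj"], ev["exit"], ev["paths"], ev["proctitle"])
```

Run against the full output, this would list the three denied paths from the excerpt (hpage_pmd_size, the etcd config, and /etc/localtime), all under the same rke2_service_db_t subject. The hex proctitle decodes to "runc init".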


dajester2013 · 2021-08-02T12:45:42Z

Additional note: if selinux is false AND profile is null, rke2 starts with no issue. If either selinux is true OR profile is set (1.5 or 1.6), it gets stuck in this error loop.

cjellick · 2021-08-13T17:25:42Z

@briandowns to reproduce

briandowns · 2021-08-17T15:43:01Z

@dajester2013 I tried to reproduce what you've reported and am having some difficulty in finding similar behavior. Did you have RKE2 installed on the system previously? How did you enable FIPS mode (at install or after)? When was SELinux enabled?

dajester2013 · 2021-08-17T20:06:51Z

@briandowns So, the VMs were provisioned by our customer's IT organization. They were FIPS-enabled / SELinux-enabled from the point of provisioning. I installed RKE2 on these freshly-provisioned systems. I do know they are also running McAfee on these systems, and there are other OS hardening guides they have applied.

briandowns · 2021-08-17T20:08:27Z

Would it be possible to get those additional hardening steps?

briandowns · 2021-08-27T20:59:18Z

@dajester2013 I'm closing this as I can't reproduce it in any form. Please feel free to reopen if you can acquire the additional hardening steps that have been applied to the nodes.

dajester2013 · 2021-09-24T15:49:17Z

@briandowns

Sorry for the delay, I was assigned other work, but now am back on this. I updated to 1.21.4, but it is still not working.

I do not know specifically what hardening steps have been taken. I have followed the installation instructions exactly as documented, but I am still stuck with etcd in a crash loop. I have tried everything I know how to do, including checking SELinux contexts and file permissions. Everything seems to match my CentOS deployment (which works). The only way I can get it to work is to disable the selinux and profile options in config.yaml. It is really odd to me that it only works when it runs without these security settings.

I can explain further if you want to take this offline, even see if it is possible for you to see what we are seeing.

briandowns · 2021-09-24T16:08:29Z

I think we need to know what the additional hardening steps are that the customer is taking so we can possibly determine that gap.

dajester2013 · 2021-09-24T16:16:48Z

All I know is that it's a STIGed RHEL 7.9 image.


dajester2013 · 2021-09-24T16:54:17Z

So apparently it is an SELinux issue. Placing the system into permissive mode allows everything to start, even with the selinux and profile options enabled. I will raise the issue over in the selinux project.
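
When comparing an enforcing run with a permissive one, it helps to record which mode the host was actually in. Here is a minimal sketch that reads the kernel's SELinux enforce flag. The selinux_mode helper is a made-up name, and treating a missing file as "disabled" is an assumption (it could also mean selinuxfs is simply not mounted); getenforce remains the authoritative tool.

```python
from pathlib import Path


def selinux_mode(enforce_file="/sys/fs/selinux/enforce"):
    """Report the SELinux mode based on the selinuxfs enforce flag."""
    p = Path(enforce_file)
    if not p.exists():
        # Assumption: no selinuxfs means SELinux is disabled (or not mounted).
        return "disabled"
    # The file holds "1" when enforcing and "0" when permissive.
    return "enforcing" if p.read_text().strip() == "1" else "permissive"


print(selinux_mode())
```

If etcd starts only when this reports permissive, the denials logged during that run are the ones the policy would otherwise have blocked.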

briandowns · 2021-09-24T17:03:35Z

Can you link here to the new issue you raise?

brandond · 2021-09-24T17:30:47Z

FWIW, based on the audit logs, the denied syscall is openat, which shouldn't be blocked by default on systems with SELinux enforcing. I am guessing that part of the "STIG" hardening process adds additional syscalls to the restricted list.
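
The reading of the audit records above can be reproduced by hand: arch=c000003e is x86_64, where syscall 257 is openat, and exit=-13 is a negated errno. The sketch below does that lookup. The describe helper and the two-entry table are illustrative assumptions, not a real API; on a host with the audit tools installed, ausyscall or ausearch -i would do the same translation from a full table.

```python
import errno

# Assumption: a tiny excerpt of the x86_64 syscall table, just enough
# for the records in this issue. A real tool would use a complete table.
X86_64_SYSCALLS = {2: "open", 257: "openat"}


def describe(syscall, exit_code):
    """Turn the syscall= and exit= fields of an audit record into text."""
    name = X86_64_SYSCALLS.get(syscall, f"syscall#{syscall}")
    if exit_code < 0:
        # Audit logs failures as the negated errno value.
        err = errno.errorcode.get(-exit_code, "unknown errno")
        return f"{name} failed with {err}"
    return f"{name} returned {exit_code}"


print(describe(257, -13))  # prints "openat failed with EACCES"
```

EACCES is the "permission denied" from the issue title, which is consistent with etcd being unable to read its config file.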

dweomer · 2021-09-27T15:54:03Z

Possibly related to containers/container-selinux#147

dajester2013 · 2023-06-02T18:27:06Z

I opened issue #4313, as I encountered this again on a freshly installed Rocky Linux 9 with the DoD STIG profile applied.

brandond added this to the v1.21.4+rke2r1 milestone Jul 30, 2021

cjellick assigned briandowns Aug 13, 2021

fapatel1 modified the milestones: v1.21.4+rke2r1, v1.21.3+rke2r2 Aug 13, 2021

briandowns closed this as completed Aug 27, 2021

briandowns reopened this Sep 24, 2021

briandowns closed this as completed Sep 24, 2021

dajester2013 mentioned this issue Sep 24, 2021

etcd stuck in crash loop on selinux-enabled rhel 7.9 rancher/rke2-selinux#20

Open

anmazzotti mentioned this issue May 1, 2024

[SELinux] RKE2 provisioning rancher/elemental#1362

Open
