RHCOS 8.6 failing on `ext.config.rebuild-selinux-policy` #1036

jlebon · 2022-10-25T19:28:07Z

Latest 8.6 composes are failing on:

=== RUN   ext.config.rebuild-selinux-policy
systemctl status kola-runext.service:
● kola-runext.service
   Loaded: loaded (/etc/systemd/system/kola-runext.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2022-10-25 16:01:14 UTC; 1s ago
  Process: 2219 ExecStart=/usr/local/bin/kola-runext-test.sh (code=exited, status=1/FAILURE)
 Main PID: 2219 (code=exited, status=1/FAILURE)

Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ ID_LIKE='rhel fedora'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ VERSION=412.86.202210251535-0
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ VERSION_ID=4.12
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ PLATFORM_ID=platform:el8
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ PRETTY_NAME='Red Hat Enterprise Linux CoreOS 412.86.202210251535-0 (Ootpa)'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ ANSI_COLOR='0;31'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ CPE_NAME=cpe:/o:redhat:enterprise_linux:8::coreos
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ HOME_URL=https://www.redhat.com/
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ DOCUMENTATION_URL=https://docs.openshift.com/container-platform/4.12/
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ BUG_REPORT_URL=https://access.redhat.com/labs/rhir/
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ REDHAT_BUGZILLA_PRODUCT='OpenShift Container Platform'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ REDHAT_BUGZILLA_PRODUCT_VERSION=4.12
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ REDHAT_SUPPORT_PRODUCT='OpenShift Container Platform'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ REDHAT_SUPPORT_PRODUCT_VERSION=4.12
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ OPENSHIFT_VERSION=4.12
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ RHEL_VERSION=8.6
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: +++ OSTREE_VERSION=412.86.202210251535-0
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2223]: ++ echo 8.6
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + RHEL_VERSION=8.6
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + echo RHEL_VERSION=8.6
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: RHEL_VERSION=8.6
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + service_should_start=0
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + case "${RHEL_VERSION:-}" in
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + service_should_start=1
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + case "${AUTOPKGTEST_REBOOT_MARK:-}" in
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + grep -qFe 'Recompiling policy' logs.txt
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + cat logs.txt
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: -- Logs begin at Tue 2022-10-25 16:00:32 UTC, end at Tue 2022-10-25 16:01:14 UTC. --
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: Oct 25 16:00:44 localhost systemd[1]: Starting RHEL CoreOS Rebuild SELinux Policy If Necessary...
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: Oct 25 16:00:44 localhost rhcos-rebuild-selinux-policy[1481]: RHEL_VERSION=8.6Checking for policy recompilation
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: Oct 25 16:00:44 localhost rhcos-rebuild-selinux-policy[1486]: -rw-r--r--. 1 root root 8912471 Oct 25 15:43 /etc/selinux/targeted/policy/policy.31
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: Oct 25 16:00:44 localhost rhcos-rebuild-selinux-policy[1486]: -rw-r--r--. 2 root root 8912471 Jan  1  1970 /usr/etc/selinux/targeted/policy/policy.31
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: Oct 25 16:00:44 localhost rhcos-rebuild-selinux-policy[1481]: Recompiling policy due to local modifications as workaround for https://bugzilla.redhat.com/2057497
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2225]: Oct 25 16:00:59 localhost systemd[1]: Started RHEL CoreOS Rebuild SELinux Policy If Necessary.
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + fatal 'Recompiled policy on first boot'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + echo 'Recompiled policy on first boot'
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: Recompiled policy on first boot
Oct 25 16:01:14 qemu0 kola-runext-test.sh[2219]: + exit 1
Oct 25 16:01:14 qemu0 systemd[1]: kola-runext.service: Main process exited, code=exited, status=1/FAILURE
Oct 25 16:01:14 qemu0 systemd[1]: kola-runext.service: Failed with result 'exit-code'.
--- FAIL: ext.config.rebuild-selinux-policy (53.85s)
        cluster.go:162: Error: Unit kola-runext.service exited with code 1
        cluster.go:162: 2022-10-25T16:01:15Z cli: Unit kola-runext.service exited with code 1
        harness.go:1093: kolet failed: : kolet run-test-unit failed: Process exited with status 1

I.e. it seems like we're recompiling the policy on first boot.
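The check that trips here can be sketched roughly as below. This is a minimal sketch, not the actual test script: `policy_was_recompiled` is a hypothetical helper name, and only the `grep -qFe 'Recompiling policy'` call and the `logs.txt` file name come from the trace above.

```shell
# Hypothetical helper: succeeds (exit 0) when the saved journal output in "$1"
# shows that the rebuild service recompiled the policy.
policy_was_recompiled() {
    # -q: no output, -F: fixed-string match, -e: the pattern (flags as in the trace)
    grep -qFe 'Recompiling policy' "$1"
}

# Illustrative use, mirroring the failure path seen in the trace:
#   if policy_was_recompiled logs.txt; then
#       echo 'Recompiled policy on first boot'
#       exit 1
#   fi
```

On a first boot with no local modifications, the expectation is that the message never shows up, so any match is treated as a failure.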


jlebon · 2022-10-25T19:31:12Z

Digging into this, I think it's caused by libsemanage-2.9-9.el8_6, which has this patch: https://pkgs.devel.redhat.com/cgit/rpms/libsemanage/commit/?h=rhel-8.6.0&id=7b7f71ce7cdd6187b33b738bb6866a00f2149772.

Current theory is that ostree admin deploy at create_disk.sh time is recompiling the policy (via ostreedev/ostree#2569). Need to investigate why semodule -N --rebuild-if-modules-changed thinks this isn't a no-op.

jlebon · 2022-10-25T19:33:33Z

FYI @WOnder93

With -9.el8, `ext.config.rebuild-selinux-policy` fails: openshift#1036 We need to debug this, but for now let's unblock CI and dev pipelines.

WOnder93 · 2022-10-26T09:03:06Z

Hm... can you point me to the code behind kola-runext-test.sh?

There is a known quirk that after the linked libsemanage patch, the "no-op" path (i.e. when there are no changes in the modules and only the rest of the content is refreshed) produces a different binary policy than the full rebuild "from scratch". The policies are semantically equal, but some things get ordered differently and the resulting policies don't match byte-to-byte. I suppose this might be confusing the test.

I know this is not ideal, but it would be technically very difficult to make both paths produce an equal result :/

lucab · 2022-10-26T09:08:09Z

@WOnder93 this is the test: https://github.com/openshift/os/blob/master/tests/kola/rebuild-selinux-policy/test.sh
The underlying service logic is at https://github.com/openshift/os/blob/master/overlay.d/05rhcos/usr/libexec/rhcos-rebuild-selinux-policy.

Overall, the "recompile on boot" logic is gated by a `cmp --quiet /{usr/,}etc/selinux/targeted/policy/policy.31`.
The two files seem to have the exact same size, but their contents don't match byte-to-byte.
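To illustrate why matching sizes are not enough for that gate, here is a small sketch. `policies_identical` is a hypothetical wrapper, not the code in `rhcos-rebuild-selinux-policy`; it uses `cmp -s`, the short spelling of `--quiet` in GNU cmp.

```shell
# Hypothetical wrapper: succeeds only when the two files are byte-identical.
# cmp -s prints nothing and reports the result through its exit status
# (0 = identical, 1 = different, 2 = trouble such as a missing file).
policies_identical() {
    cmp -s "$1" "$2"
}

# Illustrative use with the two paths from the brace expansion above:
#   policies_identical /usr/etc/selinux/targeted/policy/policy.31 \
#                      /etc/selinux/targeted/policy/policy.31 \
#       || echo 'policy differs from the shipped default'
```

Two policies that are semantically equal but ordered differently will have the same size and still fail this comparison, which matches what the `ls -l` lines in the log show.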

lucab · 2022-10-26T09:11:25Z

For reference, all of this comes from #962 as a workaround for https://issues.redhat.com/browse/OCPBUGS-595.

cgwalters · 2022-10-26T12:49:15Z

OK, we just need to patch ostree to turn off this logic on the initial deployment.

cgwalters · 2022-10-26T12:49:57Z

> I know this is not ideal, but it would be technically very difficult to make both paths produce an equal result :/

I understand. But longer term, driving binary-level reproducibility into everything we do is important for reproducible builds, binary verification etc.

jlebon · 2022-10-26T13:29:29Z

> OK, we just need to patch ostree to turn off this logic on the initial deployment.

That will fix the first boot issue, but I think what we want is to make sure we don't regenerate at all even on the next new deployment, no? So then, maybe a better fix is to run semodule -N --rebuild-if-modules-changed right after we do a full policy build.
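A rough sketch of that idea, under stated assumptions: `build_policy_stable` and the overridable `SEMODULE` variable are hypothetical names for illustration, and the real build tooling may pass additional options (for example, an alternate root). Only `-B`, `-N` and `--rebuild-if-modules-changed` are taken from this thread.

```shell
# SEMODULE can be overridden so the sequence can be dry-run without touching
# a real policy store (assumption made for this sketch).
SEMODULE=${SEMODULE:-semodule}

# Hypothetical helper: do a full build, then take the "no-op" path once.
build_policy_stable() {
    # Full rebuild from scratch; -N skips reloading the policy into the kernel.
    "$SEMODULE" -N -B || return 1
    # Immediately rerun the conditional rebuild so that whatever it rewrites
    # is already part of the shipped policy, and a later invocation has
    # nothing left to change.
    "$SEMODULE" -N --rebuild-if-modules-changed
}
```

The point of the second call is that the shipped policy is then the output of the same code path that later deployments will run, so the byte-level comparison should hold.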

jlebon · 2022-10-26T13:31:54Z

> There is a known quirk that after the linked libsemanage patch, the "no-op" path (i.e. when there are no changes in the modules and only the rest of the content is refreshed) produces a different binary policy than the full rebuild "from scratch". The policies are semantically equal, but some things get ordered differently and the resulting policies don't match byte-to-byte. I suppose this might be confusing the test.
>
> I know this is not ideal, but it would be technically very difficult to make both paths produce an equal result :/

Is there a ticket somewhere tracking this? Then we could reference it in our code and that way also be able to know when we don't need to work around this issue anymore.

Cherry-pick of openshift@247e64a: With -9.el8, `ext.config.rebuild-selinux-policy` fails: openshift#1036 We need to debug this, but for now let's unblock CI and dev pipelines.

Cherry-pick of openshift@247e64a: [jlebon] With -9.el8, `ext.config.rebuild-selinux-policy` fails: openshift#1036 We need to debug this, but for now let's unblock CI and dev pipelines.

WOnder93 · 2022-10-27T12:54:16Z

> The underlying service logic is at https://github.com/openshift/os/blob/master/overlay.d/05rhcos/usr/libexec/rhcos-rebuild-selinux-policy.

Looking at that logic, shouldn't the RHEL version match pattern be 8.[0-5] instead of 8.[0-6]? I thought the plan was to get the patched ostree & libsemanage & policycoreutils backported/tagged to RHEL-8.6 - if that has been achieved, then the above workaround shouldn't need to be activated on RHEL-8.6. Or am I misunderstanding something?
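For illustration, the narrower gate could look something like the sketch below. `workaround_applies` is a hypothetical function, not the code in `rhcos-rebuild-selinux-policy`; the trace earlier in this issue only shows that the script uses a `case "${RHEL_VERSION:-}" in` statement to set `service_should_start`.

```shell
# Hypothetical gate: returns 0 when the recompile workaround should be active
# for the given RHEL version string, 1 otherwise.
workaround_applies() {
    case "${1:-}" in
        8.[0-5]) return 0 ;;  # RHEL 8.0 through 8.5: workaround active
        *)       return 1 ;;  # 8.6 and anything else: rely on the backported fixes
    esac
}

# Illustrative use:
#   if workaround_applies "$RHEL_VERSION"; then service_should_start=1; fi
```

With the `8.[0-6]` pattern, `8.6` would fall into the first branch instead, which is consistent with `service_should_start=1` in the trace.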

travier · 2022-10-28T08:39:17Z

From memory, we did not fully have things ready in 8.6 at the time we made the workaround. This might have changed. We would have to take another look.

travier · 2022-10-28T08:44:07Z

See https://issues.redhat.com/browse/OCPBUGS-595?focusedCommentId=20863310&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-20863310. With https://bugzilla.redhat.com/show_bug.cgi?id=2049186 & https://bugzilla.redhat.com/show_bug.cgi?id=2049189 done and https://bugzilla.redhat.com/show_bug.cgi?id=2057497 (ostree | 2022.2 | 5.el8) tagged in for RHCOS, we should be good.

travier · 2022-10-28T08:45:22Z

If someone else agrees with my assessment, then we can try a revert of this workaround. It's slightly late in 4.12 to do that now but should be good.

cgwalters · 2022-11-01T21:08:04Z

Ugh wait so there's a corollary to this - it looks like for quite some time now we've actually been rebuilding the policy by default on newer systems (current FCOS e.g.). On a stock FCOS I see:

[root@cosa-devsh ~]# rpm-ostree status
State: idle
AutomaticUpdatesDriver: Zincati
  DriverState: active; periodically polling for updates (last checked Tue 2022-11-01 21:02:30 UTC)
Deployments:
● fedora:fedora/x86_64/coreos/next
                  Version: 37.20221021.1.0 (2022-10-24T18:12:48Z)
                   Commit: 5d50e945e2a3aa5bedadb998bfc3d611cfc628412a52346218575d5733c0407a
             GPGSignature: Valid signature by ACB5EE4E831C74BB7C168D27F55AD3FB5323552A

  fedora:fedora/x86_64/coreos/next
                  Version: 37.20220918.1.1 (2022-09-21T21:05:43Z)
                   Commit: 9f38af9a6fc0d38acfbd496199b495482266ba0cad3012410b539916001d36e6
             GPGSignature: Valid signature by ACB5EE4E831C74BB7C168D27F55AD3FB5323552A
[root@cosa-devsh ~]# ostree admin config-diff|grep -i selin
M    selinux/targeted/policy/policy.33
M    selinux/targeted/active/commit_num
M    selinux/targeted/active/policy.kern
A    selinux/targeted/semanage.read.LOCK
A    selinux/targeted/semanage.trans.LOCK
[root@cosa-devsh ~]#

That's quite unfortunate.

Hmm, we have a kola test that verifies the set of files in ostree admin config-diff doesn't grow unexpectedly.
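A check along those lines might filter the `ostree admin config-diff` output for the compiled policy file. This is only a sketch: `policy_modified_in_config_diff` is a hypothetical helper, and the pattern is inferred from the output format pasted above (a status letter, whitespace, then a path relative to `/etc`).

```shell
# Hypothetical helper: reads `ostree admin config-diff` output on stdin and
# succeeds when a compiled SELinux policy file is listed as modified (M).
policy_modified_in_config_diff() {
    grep -qE '^M[[:space:]]+selinux/[^/]+/policy/policy\.[0-9]+$'
}

# Illustrative use on a booted system:
#   if ostree admin config-diff | policy_modified_in_config_diff; then
#       echo 'compiled policy differs from the shipped default'
#   fi
```

Matching only `policy/policy.N` keeps the check focused on the binary policy and ignores entries such as `active/commit_num` or the lock files.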

Why is it that we're getting this behavior? We're compiling policy at build time via semodule -B, do we need to also invoke semodule --refresh right after that? If so that'd at least avoid having "pointless" policy modifications for new systems going forward.

jlebon · 2022-11-02T16:12:45Z

> Ugh wait so there's a corollary to this - it looks like for quite some time now we've actually been rebuilding the policy by default on newer systems (current FCOS e.g.).

Ouch. 😢

I mean, at least we have policy recompilation now, so users won't be missing out on policy updates. But nodes by default not using the canonical policy is very unfortunate indeed.

It'd be nice if we could get those machines back on the canonical policy. I think that's possible and would require implementing some of the follow-up bits we discussed in coreos/fedora-coreos-tracker#701.

> Why is it that we're getting this behavior?

I think it's the same issue hitting RHCOS (see #1036 (comment)). Locally inspecting the vanilla qcow2 using guestmount, we can see the policy is already different.

> We're compiling policy at build time via semodule -B, do we need to also invoke semodule --refresh right after that? If so that'd at least avoid having "pointless" policy modifications for new systems going forward.

Yeah, I suggested this higher up too. I'll try it out and see if it fixes it, but would be good to have @WOnder93 confirm that's a sane strategy.

OK more information on this. Fedora 36 is not affected, only Fedora 37 (i.e. only next currently). So that leads me to believe the issue was introduced in policycoreutils v3.4 (f36 is on v3.3). So we have a chance to fix this before this hits testing in two weeks when we GA, and stable two weeks after that.

Currently testing the rpm-ostree --refresh workaround.

There is a bug in the latest semanage code which causes an invocation of `semodule --rebuild-if-modules-changed` to still write a policy even though nothing changed since a full policy build. On FCOS and RHCOS, this bug is triggered as early as `ostree admin deploy` in cosa when creating the disk images. This results in shipping images with a policy diff baked in. Hack around this by immediately rerunning `semodule --rebuild-if-modules-changed` after building the policy. Fixes: openshift/os#1036

This is a test for openshift/os#1036. It also exists in the rpm-ostree CI, but let's have it here too since other packages can break this.

jlebon · 2022-11-02T17:10:01Z

> Currently testing the rpm-ostree --refresh workaround.

OK yup, that does work: coreos/rpm-ostree#4122. Also added an f-c-c test in coreos/fedora-coreos-config#2056.

jlebon · 2022-11-02T17:29:05Z

> Hmm, we have a kola test that verifies the set of files in ostree admin config-diff doesn't grow unexpectedly.

I was looking for that and couldn't find it. I filed coreos/fedora-coreos-tracker#1335.

There is a bug in the latest semanage code which causes an invocation of `semodule --rebuild-if-modules-changed` to still write a policy even though nothing changed since a full policy build. On FCOS and RHCOS, this bug is triggered as early as `ostree admin deploy` in cosa when creating the disk images. This results in shipping images with a policy diff baked in. Hack around this by immediately rerunning `semodule --rebuild-if-modules-changed` after building the policy. Fixes: openshift/os#1036 (cherry picked from commit 479050e)

dustymabe mentioned this issue Oct 25, 2022

Revert "mantle: use os.ReadDir for lightweight directory reading" coreos/coreos-assembler#3137

Merged

jlebon added a commit to jlebon/os that referenced this issue Oct 25, 2022

Pin to libsemanage-2.9-8.el8

247e64a

With -9.el8, `ext.config.rebuild-selinux-policy` fails: openshift#1036 We need to debug this, but for now let's unblock CI and dev pipelines.

jlebon mentioned this issue Oct 25, 2022

Pin to libsemanage-2.9-8.el8 #1037

Merged

jlebon mentioned this issue Nov 2, 2022

libpriv/postprocess: work around semanage bug coreos/rpm-ostree#4122

Merged

jlebon mentioned this issue Nov 2, 2022

tests/kola/selinux: add test that policy isn't recompiled coreos/fedora-coreos-config#2056

Merged

jlebon mentioned this issue Nov 2, 2022

Add a kola test that sanity-checks the output of ostree admin config-diff coreos/fedora-coreos-tracker#1335

Open

jlebon closed this as completed in coreos/rpm-ostree#4122 Nov 2, 2022

jlebon mentioned this issue Nov 2, 2022

[rhel8] libpriv/postprocess: work around semanage bug coreos/rpm-ostree#4124

Merged
