Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: fix ephemeral mount table in left over by prepare #7370

Merged
merged 8 commits into from
Sep 2, 2019

Conversation

zyga
Copy link
Collaborator

@zyga zyga commented Aug 28, 2019

After fighting leaky tests for several weeks and coming up with, in my
eyes, a reliable detector of state change from test to test I was quite
surprised by the failure of tests/main/mount-ns on master, just a day
after landing the pull request re-enabling that test.

What was even more surprising was that while the failure was clearly
related to LXD, indicated by the clone_children and nsdelegate changes
(see below), the history of the machine had none of the LXD tests in it.
So what could it be?

2019-08-28 07:45:24 Error executing google:ubuntu-18.04-64:tests/main/mount-ns :
-----
+ diff -u google.ubuntu-18.04-64/HOST.expected.txt HOST.deterministic.txt
--- google.ubuntu-18.04-64/HOST.expected.txt	2019-08-28 07:23:20.000000000 +0000
+++ HOST.deterministic.txt	2019-08-28 07:45:21.314648841 +0000
@@ -22,7 +22,7 @@
 22 21 1:14 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:22 - tmpfs tmpfs ro,mode=755
 23 22 1:15 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:23 - cgroup cgroup rw,blkio
 24 22 1:16 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:24 - cgroup cgroup rw,cpu,cpuacct
-25 22 1:17 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:25 - cgroup cgroup rw,cpuset,clone_children
+25 22 1:17 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:25 - cgroup cgroup rw,cpuset
 26 22 1:18 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:26 - cgroup cgroup rw,devices
 27 22 1:19 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:27 - cgroup cgroup rw,freezer
 28 22 1:20 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:28 - cgroup cgroup rw,hugetlb
@@ -32,7 +32,7 @@
 32 22 1:24 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:32 - cgroup cgroup rw,pids
 33 22 1:25 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:33 - cgroup cgroup rw,rdma
 34 22 1:26 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:34 - cgroup cgroup rw,xattr,name=systemd
-35 22 1:27 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:35 - cgroup2 cgroup rw
+35 22 1:27 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:35 - cgroup2 cgroup rw,nsdelegate
 36 21 1:28 / /sys/fs/fuse/connections rw,relatime shared:36 - fusectl fusectl rw
 37 21 1:29 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:37 - pstore pstore rw
 38 21 1:30 / /sys/kernel/config rw,relatime shared:38 - configfs configfs rw

Whatever was causing this, must be related to LXD but not to the LXD
test itself. A quick inspection of the dozen-or-so tests involved in the
failed run has made me realize something. The test machine had rebooted
as a part of another test.

A quick patch, like the one seen below was enough to reproduce the
failure each time. This meant that the test machine was broken even
before any test had executed. How is that possible?

diff --git a/tests/main/mount-ns/task.yaml b/tests/main/mount-ns/task.yaml
index 14bfb30b74..1b0b9c0d5d 100644
--- a/tests/main/mount-ns/task.yaml
+++ b/tests/main/mount-ns/task.yaml
@@ -53,6 +53,9 @@ details: |
     state leaked by another test that ran on the same machine earlier in the
     spread  execution chain.
 prepare: |
+    if [ "$SPREAD_REBOOT" -eq 0 ]; then
+        REBOOT 1
+    fi
     # The --renumber and --rename options renumber and rename various
     # non-deterministic elements of the mount table. The --ref-x1000 option
     # sets a multiple of 1000 as the base value for allocated renumbered

So what else touches LXD? To my surprise, spread.yaml does. This is an
excerpt from the project-wide prepare stanza:

if [[ "$SPREAD_SYSTEM" == ubuntu-* ]] && [[ "$SPREAD_SYSTEM" != ubuntu-core-* ]]; then
    apt-get remove --purge -y lxd lxcfs || true
    apt-get autoremove --purge -y
fi

So on startup, we remove / purge LXD. This means LXD was installed, did
its thing to change the mount table and we removed it, keeping the only
trace of that fact in the running kernel. As soon as we reboot the
system is in a pristine state.

This explains why the tests/main/mount-ns test, which was written by
observing the mount table of a spread system on the initial boot,
immediately fails if the test machine reboots before performing
measurements.

To verify this I created a small spread project which captures
measurements three times. This is the essential part of the test:

    if [ "$SPREAD_REBOOT" -eq 0 ]; then
        cat /proc/self/mountinfo > /var/tmp/mountinfo.initial
        case "$SPREAD_SYSTEM" in
            ubuntu-*)
                apt-get remove --purge -y lxd lxcfs || true
                apt-get autoremove --purge -y
                ;;
        esac
        cat /proc/self/mountinfo > /var/tmp/mountinfo.post-lxd-removal
        REBOOT
    else
        cat /proc/self/mountinfo > /var/tmp/mountinfo.post-reboot
    fi

This test showed the following things:

  • removing LXD immediately unmounts /var/lib/lxcfs
  • removing LXD means that the next boot will have vanilla cgroups from
    systemd, since LXD won't change them on startup

In practice the change to cgroups that LXD is doing is:
/sys/fs/cgroup/unified is mounted without nsdelegate
/sys/fs/cgroup/cpuset has clone_children disabled.
Note that clone_children is only applied if LXD is actually used to
create a container.

You can see the diff of the mountinfo tables below. Note that each table
was re-written with mountinfo-tool so that it is deterministic and
comparable.

google:ubuntu-18.04-64 ...# diff -u det.initial det.post-lxd-removal
--- det.initial	2019-08-28 13:26:37.073255599 +0000
+++ det.post-lxd-removal	2019-08-28 13:26:29.793015207 +0000
@@ -36,4 +36,3 @@
 36 19 1:30 / /sys/kernel/config rw,relatime shared:36 - configfs configfs rw
 37 19 1:31 / /sys/kernel/debug rw,relatime shared:37 - debugfs debugfs rw
 38 19 1:32 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:38 - securityfs securityfs rw
-39 1 1:33 / /var/lib/lxcfs rw,nosuid,nodev,relatime shared:39 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other

google:ubuntu-18.04-64 ...# diff -u det.post-lxd-removal det.post-reboot
--- det.post-lxd-removal	2019-08-28 13:26:29.793015207 +0000
+++ det.post-reboot	2019-08-28 13:26:21.120728849 +0000
@@ -30,7 +30,7 @@
 30 20 1:24 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:30 - cgroup cgroup rw,pids
 31 20 1:25 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:31 - cgroup cgroup rw,rdma
 32 20 1:26 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:32 - cgroup cgroup rw,xattr,name=systemd
-33 20 1:27 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:33 - cgroup2 cgroup rw
+33 20 1:27 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:33 - cgroup2 cgroup rw,nsdelegate
 34 19 1:28 / /sys/fs/fuse/connections rw,relatime shared:34 - fusectl fusectl rw
 35 19 1:29 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:35 - pstore pstore rw
 36 19 1:30 / /sys/kernel/config rw,relatime shared:36 - configfs configfs rw

So what can we do now? The primary goal is to make the mount namespace
test stable. It must not depend the number of reboots done by tests that
execute before it does.

We can achieve that in one of two ways:

  • make the device under test behave as if LXD was installed
  • make the device under test behave as if LXD was never installed

I chose option number two, simply because we have multiple distributions
that don't have LXD pre-installed so it's easier to compare their
initial state this way. As mentioned above we want to not have
/var/lib/lxcfs and to have /sys/fs/cgroup/unified mounted with the
nsdelegate mount option. Fortunately both are easy to achieve. In
practice we only have to handle the unified cgroup because the removal
of LXD handles lxcfs for us.

This patch performs the following changes:

  1. Restore cpuset and unified cgroup hierarchy state after removing lxd
    in project-wide prepare, as well as in the per-test restore, to the
    values the system ships with by default. This handles both the initial
    state of the system as well as the state after execution of a particular
    test using LXD.

  2. Adjust the mount-ns test dataset to use the now-pristine
    configuration. This is done to inactive datasets as well, namely to
    core.

  3. Introduce a new variant of the mount-ns test that first reboots the
    machine and then performs the same set of measurements. This ensures
    that claim 2) is correct. There's some sour grapes here though. Due to
    the bug in bash 4.3 we cannot freely reboot in 16.04 system, so there
    the fix is not fully tested yet.

Signed-off-by: Zygmunt Krynicki me@zygoon.pl

@zyga zyga changed the title tests: fix ephemeral mount table in prepare-image tests: fix ephemeral mount table in left over by prepare Aug 28, 2019
Copy link
Collaborator Author

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that /sys/fs/cgroup/unified doesn't have to be a mount point (?) so remount may fail. Let's be more robust against that.

tests/lib/prepare-restore.sh Outdated Show resolved Hide resolved
tests/lib/prepare-restore.sh Outdated Show resolved Hide resolved
@zyga
Copy link
Collaborator Author

zyga commented Aug 29, 2019

openSUSE 15 uses a kernel that doesn't support the nsdelegate option. I'll address this shortly.

@sergiocazzolato
Copy link
Collaborator

@zyga, Thanks for the fix:
I see 2 combinations which currently are failing and I think this PR could fix:
selinux-lxd -> interfaces-udisk2
snap-system-env -> mount-ns

Copy link
Member

@anonymouse64 anonymouse64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks for this, one small naming suggestion

tests/lib/prepare-restore.sh Outdated Show resolved Hide resolved
tests/lib/prepare-restore.sh Outdated Show resolved Hide resolved
tests/lib/prepare-restore.sh Outdated Show resolved Hide resolved
@sergiocazzolato
Copy link
Collaborator

mount-ns fixed:
https://paste.ubuntu.com/p/y6cmMvyX5B/

interfaces-udisks2 still failing:
https://paste.ubuntu.com/p/fYFVG6MsgB/

@zyga
Copy link
Collaborator Author

zyga commented Aug 30, 2019

I've split off version-tool as a dedicated PR #7379

@zyga
Copy link
Collaborator Author

zyga commented Sep 2, 2019

Rebased on the state of the version-tool pull request. All history kept intact.

After fighting leaky tests for several weeks and coming up with, in my
eyes, a reliable detector of state change from test to test I was quite
surprised by the failure of tests/main/mount-ns on master, just a day
after landing the pull request re-enabling that test.

What was even more surprising was that while the failure was clearly
related to LXD, indicated by the clone_children and nsdelegate changes
(see below), the history of the machine had none of the LXD tests in it.
So what could it be?

    2019-08-28 07:45:24 Error executing google:ubuntu-18.04-64:tests/main/mount-ns :
    -----
    + diff -u google.ubuntu-18.04-64/HOST.expected.txt HOST.deterministic.txt
    --- google.ubuntu-18.04-64/HOST.expected.txt	2019-08-28 07:23:20.000000000 +0000
    +++ HOST.deterministic.txt	2019-08-28 07:45:21.314648841 +0000
    @@ -22,7 +22,7 @@
     22 21 1:14 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:22 - tmpfs tmpfs ro,mode=755
     23 22 1:15 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:23 - cgroup cgroup rw,blkio
     24 22 1:16 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:24 - cgroup cgroup rw,cpu,cpuacct
    -25 22 1:17 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:25 - cgroup cgroup rw,cpuset,clone_children
    +25 22 1:17 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:25 - cgroup cgroup rw,cpuset
     26 22 1:18 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:26 - cgroup cgroup rw,devices
     27 22 1:19 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:27 - cgroup cgroup rw,freezer
     28 22 1:20 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:28 - cgroup cgroup rw,hugetlb
    @@ -32,7 +32,7 @@
     32 22 1:24 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:32 - cgroup cgroup rw,pids
     33 22 1:25 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:33 - cgroup cgroup rw,rdma
     34 22 1:26 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:34 - cgroup cgroup rw,xattr,name=systemd
    -35 22 1:27 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:35 - cgroup2 cgroup rw
    +35 22 1:27 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:35 - cgroup2 cgroup rw,nsdelegate
     36 21 1:28 / /sys/fs/fuse/connections rw,relatime shared:36 - fusectl fusectl rw
     37 21 1:29 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:37 - pstore pstore rw
     38 21 1:30 / /sys/kernel/config rw,relatime shared:38 - configfs configfs rw

Whatever was causing this, must be related to LXD but not to the LXD
test itself. A quick inspection of the dozen-or-so tests involved in the
failed run has made me realize something. The test machine had rebooted
as a part of another test.

A quick patch, like the one seen below was enough to reproduce the
failure each time. This meant that the test machine was broken even
before *any* test had executed. How is that possible?

    diff --git a/tests/main/mount-ns/task.yaml b/tests/main/mount-ns/task.yaml
    index 14bfb30b74..1b0b9c0d5d 100644
    --- a/tests/main/mount-ns/task.yaml
    +++ b/tests/main/mount-ns/task.yaml
    @@ -53,6 +53,9 @@ details: |
         state leaked by another test that ran on the same machine earlier in the
         spread  execution chain.
     prepare: |
    +    if [ "$SPREAD_REBOOT" -eq 0 ]; then
    +        REBOOT 1
    +    fi
         # The --renumber and --rename options renumber and rename various
         # non-deterministic elements of the mount table. The --ref-x1000 option
         # sets a multiple of 1000 as the base value for allocated renumbered

So what else touches LXD? To my surprise, `spread.yaml` does. This is an
excerpt from the project-wide prepare stanza:

    if [[ "$SPREAD_SYSTEM" == ubuntu-* ]] && [[ "$SPREAD_SYSTEM" != ubuntu-core-* ]]; then
        apt-get remove --purge -y lxd lxcfs || true
        apt-get autoremove --purge -y
    fi

So on startup, we remove / purge LXD. This means LXD was installed, did
its thing to change the mount table and we removed it, keeping the only
trace of that fact in the running kernel. As soon as we reboot the
system is in a pristine state.

This explains why the `tests/main/mount-ns` test, which was written by
observing the mount table of a spread system on the initial boot,
immediately fails if the test machine reboots before performing
measurements.

To verify this I created a small spread project which captures
measurements three times. This is the essential part of the test:

        if [ "$SPREAD_REBOOT" -eq 0 ]; then
            cat /proc/self/mountinfo > /var/tmp/mountinfo.initial
            case "$SPREAD_SYSTEM" in
                ubuntu-*)
                    apt-get remove --purge -y lxd lxcfs || true
                    apt-get autoremove --purge -y
                    ;;
            esac
            cat /proc/self/mountinfo > /var/tmp/mountinfo.post-lxd-removal
            REBOOT
        else
            cat /proc/self/mountinfo > /var/tmp/mountinfo.post-reboot
        fi

This test showed the following things:
 - removing LXD immediately unmounts /var/lib/lxcfs
 - removing LXD means that the next boot will have vanilla cgroups from
   systemd, since LXD won't change them on startup

In practice the change to cgroups that LXD is doing is:
   /sys/fs/cgroup/unified is mounted *without* nsdelegate
   /sys/fs/cgroup/cpuset has clone_children *disabled*.
Note that clone_children is only applied if LXD is actually used to
create a container.

You can see the diff of the mountinfo tables below. Note that each table
was re-written with mountinfo-tool so that it is deterministic and
comparable.

    google:ubuntu-18.04-64 ...# diff -u det.initial det.post-lxd-removal
    --- det.initial	2019-08-28 13:26:37.073255599 +0000
    +++ det.post-lxd-removal	2019-08-28 13:26:29.793015207 +0000
    @@ -36,4 +36,3 @@
     36 19 1:30 / /sys/kernel/config rw,relatime shared:36 - configfs configfs rw
     37 19 1:31 / /sys/kernel/debug rw,relatime shared:37 - debugfs debugfs rw
     38 19 1:32 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:38 - securityfs securityfs rw
    -39 1 1:33 / /var/lib/lxcfs rw,nosuid,nodev,relatime shared:39 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other

    google:ubuntu-18.04-64 ...# diff -u det.post-lxd-removal det.post-reboot
    --- det.post-lxd-removal	2019-08-28 13:26:29.793015207 +0000
    +++ det.post-reboot	2019-08-28 13:26:21.120728849 +0000
    @@ -30,7 +30,7 @@
     30 20 1:24 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:30 - cgroup cgroup rw,pids
     31 20 1:25 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:31 - cgroup cgroup rw,rdma
     32 20 1:26 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:32 - cgroup cgroup rw,xattr,name=systemd
    -33 20 1:27 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:33 - cgroup2 cgroup rw
    +33 20 1:27 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:33 - cgroup2 cgroup rw,nsdelegate
     34 19 1:28 / /sys/fs/fuse/connections rw,relatime shared:34 - fusectl fusectl rw
     35 19 1:29 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:35 - pstore pstore rw
     36 19 1:30 / /sys/kernel/config rw,relatime shared:36 - configfs configfs rw

So what can we do now? The primary goal is to make the mount namespace
test stable. It must not depend the number of reboots done by tests that
execute before it does.

We can achieve that in one of two ways:
 - make the device under test behave as if LXD was installed
 - make the device under test behave as if LXD was never installed

I chose option number two, simply because we have multiple distributions
that don't have LXD pre-installed so it's easier to compare their
initial state this way. As mentioned above we want to *not* have
/var/lib/lxcfs and to have /sys/fs/cgroup/unified mounted with the
nsdelegate mount option. Fortunately both are easy to achieve. In
practice we only have to handle the unified cgroup because the removal
of LXD handles lxcfs for us.

This patch performs the following changes:

1) Restore cpuset and unified cgroup hierarchy state after removing lxd
in project-wide prepare, as well as in the per-test restore, to the
values the system ships with by default. This handles both the initial
state of the system as well as the state after execution of a particular
test using LXD.

2) Adjust the mount-ns test dataset to use the now-pristine
configuration. This is done to inactive datasets as well, namely to
core.

3) Introduce a new variant of the mount-ns test that first reboots the
machine and then performs the same set of measurements. This ensures
that claim 2) is correct. There's some sour grapes here though. Due to
the bug in bash 4.3 we cannot freely reboot in 16.04 system, so there
the fix is not fully tested yet.

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
The nsdelegate mount option for the cgroupv2 filesystem is only
available since kernel 4.13. The test system for openSUSE 15 uses an
older version (4.12.14) so the logic failed there.

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Now that we have it we might as well use it.

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Also use BASH_VERSION instead of the more crude version extraction.

Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
Signed-off-by: Zygmunt Krynicki <me@zygoon.pl>
@sergiocazzolato
Copy link
Collaborator

tests passing now!!!
https://paste.ubuntu.com/p/ZVHH4Y2dwt/

Copy link
Collaborator

@sergiocazzolato sergiocazzolato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests what were failing now work!
Thanks for this fix

@sergiocazzolato sergiocazzolato merged commit 2199adc into snapcore:master Sep 2, 2019
@zyga zyga deleted the fix/leaky-test-xxii branch September 2, 2019 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants