Fix cgroupv2 checkpoint/restore #2335

kolyshkin · 2020-04-21T01:06:25Z

Fixes: #2328

kolyshkin · 2020-04-21T08:39:10Z

@adrianreber PTAL

kolyshkin · 2020-04-21T09:21:23Z

Rebased

adrianreber · 2020-04-21T10:26:32Z

I looked at the last three commits and this looks correct.

Good idea to rely on the CRIU tmpfs handling for cgroup1 non-cgroupns mode.

I tried to start a container with Podman using --cgroupns=host on a cgroup1 system, and in the output of the mount command the mounted cgroup looks the same as when I omit --cgroupns=host. I always see a tmpfs as the base for the cgroup1 mount. Should I see something different, or am I testing it wrong with and without cgroupns?

kolyshkin · 2020-04-21T18:43:03Z

If I understand it correctly (and I am still not sure I do):

- cgroup v1 without cgroupns is a set of bind mounts from the host to the container rootfs
- cgroup v1 with cgroupns is a set of real mounts with fstype=cgroup

It is not obvious how to distinguish between the two cases from inside the container, i.e. the mounts look the same (except maybe for the source field).
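One way to compare the two cases is to look at `/proc/self/mountinfo` rather than the output of `mount`, since mountinfo also exposes the root of each mount within its filesystem. The following is a minimal sketch, not runc code: `parseMountinfoLine` and `looksLikeHostBind` are hypothetical helpers, the sample lines are made up for illustration, and the "root is not `/`" check is only a heuristic that assumes a bind mount of a host cgroup subdirectory shows that subdirectory in the root field.

```go
package main

import (
	"fmt"
	"strings"
)

// mountEntry holds the mountinfo fields relevant to this discussion.
type mountEntry struct {
	Root       string // root of the mount within its filesystem (field 4)
	MountPoint string // mount point relative to the process root (field 5)
	FSType     string // filesystem type (first field after the "-" separator)
	Source     string // mount source (second field after the separator)
}

// parseMountinfoLine parses one line in /proc/[pid]/mountinfo format.
func parseMountinfoLine(line string) (mountEntry, error) {
	// The variable-length optional fields end at a lone "-" separator.
	parts := strings.SplitN(line, " - ", 2)
	if len(parts) != 2 {
		return mountEntry{}, fmt.Errorf("no separator in %q", line)
	}
	pre := strings.Fields(parts[0])
	post := strings.Fields(parts[1])
	if len(pre) < 5 || len(post) < 2 {
		return mountEntry{}, fmt.Errorf("too few fields in %q", line)
	}
	return mountEntry{
		Root:       pre[3],
		MountPoint: pre[4],
		FSType:     post[0],
		Source:     post[1],
	}, nil
}

// looksLikeHostBind is a heuristic (an assumption, not a guarantee):
// a cgroup v1 mount whose root is not "/" was probably bind-mounted
// from a subdirectory of the host hierarchy.
func looksLikeHostBind(e mountEntry) bool {
	return e.FSType == "cgroup" && e.Root != "/"
}

func main() {
	// Illustrative sample lines, not captured from a real system.
	samples := []string{
		"100 90 0:30 /docker/abc /sys/fs/cgroup/memory ro,nosuid - cgroup cgroup rw,memory",
		"101 90 0:30 / /sys/fs/cgroup/memory ro,nosuid - cgroup cgroup rw,memory",
		"102 90 0:31 / /sys/fs/cgroup ro,nosuid - cgroup2 cgroup2 rw",
	}
	for _, l := range samples {
		e, err := parseMountinfoLine(l)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("%-24s fstype=%s root=%s hostBind=%v\n",
			e.MountPoint, e.FSType, e.Root, looksLikeHostBind(e))
	}
}
```

To inspect a live container, the same parser could be fed the lines of `/proc/self/mountinfo` read from inside it.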

kolyshkin · 2020-04-21T18:47:32Z

Also, for some reason, cgroupv1+cgroupns checkpoint/restore works even without these patches, and I admit I don't quite understand why.

AkihiroSuda · 2020-04-21T20:19:01Z

Vagrantfile

+        && curl -sSL https://github.com/checkpoint-restore/criu/commit/378337a496ca759848180bc5411e4446298c5e4e.patch | patch -p1 \
+        && make install-criu \
+        && cd - \
+        && rm -rf /usr/src/criu


AkihiroSuda · 2020-04-21

Why not in Dockerfile?

kolyshkin · 2020-04-21

Yes, we have the very same code in Dockerfile (this is mostly a copy-paste).

We are running these tests on the Vagrant host itself (not inside a container, i.e. `make localintegration`, not `make integration`), and some of those tests use criu, so criu has to be installed on the Vagrant host, not inside a container.

Or do you mean we can copy the criu binary out of the container we just built and use it? This is also possible (I tried to do that with the runc binary earlier, and you suggested rebuilding it on the host itself, so I assumed you have the same point of view about the criu binary).

As noted in the comment, this will go away as soon as criu 3.14 is released and packaged for F31.

AkihiroSuda · 2020-04-21

If we want to run everything on Vagrant directly, can we eliminate Podman from Vagrantfile? Can be another PR though.

kolyshkin · 2020-04-21

Yes, we can follow up on that. This PR just fixes the cgroupv2 checkpoint and covers it with tests.

See #2342. Note the difference: we used to run unit tests on Debian (container), and now it's Fedora (host).

AkihiroSuda · 2020-04-21T21:57:52Z

LGTM

libcontainer/container_linux.go

Same test as the first one, just with cgroupns enabled. Since in case of cgroupv2 `runc spec` enables cgroupns, this case was already tested by the first checkpoint test, so skip it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

In case of cgroupv2 unified hierarchy, the /sys/fs/cgroup mount is a real mount with fstype of cgroup2 (rather than a set of external bind mounts like for cgroupv1). So, we should not add it to the list of "external bind mounts" on both checkpoint and restore. Without this fix, checkpoint integration tests fail on cgroup v2. The same is true for cgroup v1 + cgroupns. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
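The rule stated in this commit message can be sketched as a small predicate. This is an illustration only, not the actual code in libcontainer/container_linux.go: the function name `cgroupMountIsExternal` and its two boolean parameters are assumptions made for the example.

```go
package main

import "fmt"

// cgroupMountIsExternal reports whether the container's cgroup mount
// should be passed to CRIU as a set of external bind mounts.
// Following the commit message: with the cgroup v2 unified hierarchy,
// or with cgroup v1 plus cgroupns, the mounts are real ones, so they
// must not be listed as external. Only plain cgroup v1 (no cgroupns)
// uses bind mounts from the host.
// The name and signature are hypothetical.
func cgroupMountIsExternal(unified, cgroupns bool) bool {
	return !unified && !cgroupns
}

func main() {
	cases := []struct {
		name              string
		unified, cgroupns bool
	}{
		{"cgroup v2 unified", true, true},
		{"cgroup v1 + cgroupns", false, true},
		{"cgroup v1, no cgroupns", false, false},
	}
	for _, c := range cases {
		fmt.Printf("%-24s external=%v\n", c.name, cgroupMountIsExternal(c.unified, c.cgroupns))
	}
}
```

The same predicate would apply on both checkpoint and restore, since the commit message says the mount should be skipped in both directions.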

With the fix in the previous commit and criu patched with support for cgroupv2, these tests should now pass. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Vagrantfile

kolyshkin · 2020-04-23T00:45:13Z

Got a very weird SIGSEGV in CI and filed checkpoint-restore/criu#1035 just in case. It looks more like a bug in the Go runtime or the protobuf package.

Restarted CI to check if the above is repeatable, or was a one-time thing.

AkihiroSuda · 2020-04-23T00:48:06Z

LGTM

Since commit 9280e35 it is no longer needed to have the `cgroup2` mount. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

To make a bind mount read-only, it needs to be remounted. This is what the code removed does, but it is not needed here.

We have to deal with three cases here:

1. cgroup v2 unified mode. In this case the mount is real mount with fstype=cgroup2, and there is no need to have a bind mount on top, as we pass readonly flag to the mount as is.

2. cgroup v1 + cgroupns (enableCgroupns == true). In this case the "mount" is in fact a set of real mounts with fstype=cgroup, and they are all performed in mountCgroupV1, with readonly flag added if needed.

3. cgroup v1 as is (enableCgroupns == false). In this case mountCgroupV1() calls mountToRootfs() again with an argument from the list obtained from getCgroupMounts(), i.e. a bind mount with the same flags as the original mount has (plus unix.MS_BIND | unix.MS_REC), and mountToRootfs() does remounting (under the case "bind":).

So, the code which this patch is removing is not needed -- it essentially does nothing in case 3 above (since the bind mount is already remounted readonly), and in cases 1 and 2 it creates an unneeded extra bind mount on top of a real one (or set of real ones).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
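The remount requirement mentioned at the start of this commit message can be illustrated by computing the flags for each mount(2) call. This is a rough sketch under stated assumptions, not the mountToRootfs() implementation: `bindMountSteps` is a hypothetical helper, it only computes flag sets and never calls mount, and the constants are local copies of the Linux flag values that golang.org/x/sys/unix exports as MS_RDONLY, MS_REMOUNT, MS_BIND and MS_REC.

```go
package main

import "fmt"

// Local copies of the Linux mount(2) flag values, defined here so the
// sketch has no dependencies outside the standard library.
const (
	msRdonly  = 0x1
	msRemount = 0x20
	msBind    = 0x1000
	msRec     = 0x4000
)

// bindMountSteps returns the flag set for each mount(2) call needed to
// set up a recursive bind mount with the requested flags.
// A read-only request needs a second call with MS_REMOUNT, because the
// read-only flag does not take effect on the initial bind.
// Hypothetical helper for illustration only.
func bindMountSteps(flags uintptr) []uintptr {
	first := flags | msBind | msRec
	if flags&msRdonly == 0 {
		// Nothing to enforce after the initial bind.
		return []uintptr{first}
	}
	return []uintptr{first, first | msRemount}
}

func main() {
	for i, f := range bindMountSteps(msRdonly) {
		fmt.Printf("mount call %d: flags=%#x\n", i+1, f)
	}
}
```

In case 3 this remount already happens for each bind mount, which is why a further bind mount plus remount on top of /sys/fs/cgroup adds nothing.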

kolyshkin · 2020-04-24T00:12:12Z

Added a couple of patches on top, @adrianreber PTAL (only the last 2 patches)

adrianreber · 2020-04-24T06:55:18Z

@kolyshkin Last two patches sound reasonable.

AkihiroSuda · 2020-04-24T07:16:05Z

LGTM

mrunalp · 2020-04-24T15:47:30Z

LGTM

kolyshkin added kind/bug area/cgroupv2 labels Apr 21, 2020

This was referenced Apr 21, 2020

cgroupv2 support meta issue #2315

Closed

cgroupv2 checkpoint/restore is not working #2328

Closed

kolyshkin force-pushed the cgroupv2-cpt branch from 1fba43b to 6b34331 Compare April 21, 2020 02:38

kolyshkin changed the title ~~[WIP] Fix cgroupv2 checkpoint/restore~~ Fix cgroupv2 checkpoint/restore Apr 21, 2020

kolyshkin force-pushed the cgroupv2-cpt branch from 6b34331 to 040a91c Compare April 21, 2020 09:20

kolyshkin mentioned this pull request Apr 21, 2020

runc checkpoint: fix --status-fd to accept fd #2341

Merged

AkihiroSuda reviewed Apr 21, 2020

kolyshkin mentioned this pull request Apr 21, 2020

travis: run vagrant tests on the host #2342

Merged

adrianreber reviewed Apr 22, 2020

libcontainer/container_linux.go (outdated, resolved)

tests/checkpoint: add simple c/r test for cgroupns

00a2844

Same test as the first one, just with cgroupns enabled. Since in case of cgroupv2 `runc spec` enables cgroupns, this case was already tested by the first checkpoint test, so skip it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

kolyshkin force-pushed the cgroupv2-cpt branch from 040a91c to db9709e Compare April 22, 2020 18:06

kolyshkin marked this pull request as ready for review April 22, 2020 18:18

kolyshkin force-pushed the cgroupv2-cpt branch from db9709e to 9280e35 Compare April 22, 2020 18:29

tests/checkpoint: enable for Fedora 31 / cgroup v2

32d52a0

With the fix in the previous commit and criu patched with support for cgroupv2, these tests should now pass. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

h-vetinari reviewed Apr 22, 2020

Vagrantfile (resolved)

kolyshkin added 2 commits April 23, 2020 15:22

libcontainer/integration/checkpoint_test: simplify

20959b1

Since commit 9280e35 it is no longer needed to have the `cgroup2` mount. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

mrunalp merged commit 634e51b into opencontainers:master Apr 24, 2020

This was referenced Apr 25, 2020

sigsegv on runc restore checkpoint-restore/criu#1035

Closed

SEGV in Vagrant CI #2353

Closed
