Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cgroupv2 checkpoint/restore #2335

Merged
merged 5 commits into from
Apr 24, 2020
Merged

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Apr 21, 2020

Fixes: #2328

@kolyshkin kolyshkin changed the title [WIP] Fix cgroupv2 checkpoint/restore Fix cgroupv2 checkpoint/restore Apr 21, 2020
@kolyshkin
Copy link
Contributor Author

@adrianreber PTAL

@kolyshkin
Copy link
Contributor Author

Rebased

@adrianreber
Copy link
Contributor

I looked at the last three commits and this looks correct.

Good idea to rely on the CRIU tmpfs handling for cgroup1 non-cgroupns mode.

I tried to start a container with Podman using --cgroupns=host on a cgroup1 system and the mounted cgroup looks the same using the mount command as when I leave the parameter --cgroupns=host away. I always see a tmpfs as the base for the cgroup1 mount. Should I see something different or am I testing it wrong with and without cgroupns?

@kolyshkin
Copy link
Contributor Author

If I understand it right (I still doubt that),

  • cgroup v1 without cgroupns is a set of bind mounts from the host to container rootfs
  • cgroup v1 with cgroupns is a set of real mounts with fstype=cgroup

and it's not obvious how to distinguish between the two cases from inside the container, i.e. mounts look the same (except for maybe the source field)

@kolyshkin
Copy link
Contributor Author

Also, for some reason, cgroupv1+cgroupns checkpoint/restore works even without these patches, and I admit I don't quite understand why.

&& curl -sSL https://github.com/checkpoint-restore/criu/commit/378337a496ca759848180bc5411e4446298c5e4e.patch | patch -p1 \
&& make install-criu \
&& cd - \
&& rm -rf /usr/src/criu
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not in Dockerfile?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have the very same code in Dockerfile (this is mostly a copy-paste).

We are running these tests on the Vagrant host itself (not inside a container, i.e. make localintegration not make integration) and some of those tests use criu, so we have to have criu on the vagrant host, not inside a container.

Or do you mean we can copy the criu binary out of container we just built and use it? This is also possible (and I tried to do it with runc binary earlier, and you suggested to rebuild it on the host itself, so I assumed you have the same PoV about criu binary).

As noted in the comment, this will go away as soon as criu 3.14 is released and packaged for F31.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to run everything on Vagrant directly, can we eliminate Podman from Vagrantfile? Can be another PR though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can followup on that. This just fixes the cgroupv2 checkpoint and covers it with tests.

See #2342. Note the difference is we used to run unit tests on Debian (container) and now it's Fedora (host).

@AkihiroSuda
Copy link
Member

AkihiroSuda commented Apr 21, 2020

LGTM

Approved with PullApprove

Same test as the first one, just with cgroupns enabled.

Since in case of cgroupv2 `runc spec` enables cgroupns,
this case was already tested by the first checkpoint test,
so skip it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin marked this pull request as ready for review April 22, 2020 18:18
In case of cgroupv2 unified hierarchy, the /sys/fs/cgroup mount
is the real mount with fstype of cgroup2 (rather than a set of
external bind mounts like for cgroupv1).

So, we should not add it to the list of "external bind mounts"
on both checkpoint and restore.

Without this fix, checkpoint integration tests fail on cgroup v2.

Also, same is true for cgroup v1 + cgroupns.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
With the fix in the previous commit and criu patched with support for
cgroupv2, these tests should now pass.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

Got a very weird SIGSEGV in CI, filed checkpoint-restore/criu#1035 just in case. Seems more like a bug in golang runtime or protobuf package.

Restarted CI to check if the above is repeatable, or was a one-time thing.

@AkihiroSuda
Copy link
Member

AkihiroSuda commented Apr 23, 2020

LGTM

Approved with PullApprove

Since commit 9280e35 it is not longer needed to have `cgroup2'
mount.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
To make a bind mount read-only, it needs to be remounted. This is what
the code removed does, but it is not needed here.

We have to deal with three cases here:

1. cgroup v2 unified mode. In this case the mount is real mount with
   fstype=cgroup2, and there is no need to have a bind mount on top,
   as we pass readonly flag to the mount as is.

2. cgroup v1 + cgroupns (enableCgroupns == true). In this case the
   "mount" is in fact a set of real mounts with fstype=cgroup, and
   they are all performed in mountCgroupV1, with readonly flag
   added if needed.

3. cgroup v1 as is (enableCgroupns == false). In this case
   mountCgroupV1() calls mountToRootfs() again with an argument
   from the list obtained from getCgroupMounts(), i.e. a bind
   mount with the same flags as the original mount has (plus
   unix.MS_BIND | unix.MS_REC), and mountToRootfs() does remounting
   (under the case "bind":).

So, the code which this patch is removing is not needed -- it
essentially does nothing in case 3 above (since the bind mount
is already remounted readonly), and in cases 1 and 2 it
creates an unneeded extra bind mount on top of a real one (or set of
real ones).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

Added a couple of patches on top, @adrianreber PTAL (only the last 2 patches)

@adrianreber
Copy link
Contributor

@kolyshkin Last two patches sound reasonable.

@AkihiroSuda
Copy link
Member

AkihiroSuda commented Apr 24, 2020

LGTM

Approved with PullApprove

1 similar comment
@mrunalp
Copy link
Contributor

mrunalp commented Apr 24, 2020

LGTM

Approved with PullApprove

@mrunalp mrunalp merged commit 634e51b into opencontainers:master Apr 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cgroupv2 checkpoint/restore is not working
5 participants