-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cgroupv2 checkpoint/restore #2335
Conversation
@adrianreber PTAL |
Rebased |
I looked at the last three commits and this looks correct. Good idea to rely on the CRIU tmpfs handling for cgroup1 non-cgroupns mode. I tried to start a container with Podman using |
If I understand it right (I still doubt that),
and it's not obvious how to distinguish between the two cases from inside the container, i.e. mounts look the same (except for maybe the source field) |
Also, for some reason, cgroupv1+cgroupns checkpoint/restore works even without these patches, and I admit I don't quite understand why. |
&& curl -sSL https://github.com/checkpoint-restore/criu/commit/378337a496ca759848180bc5411e4446298c5e4e.patch | patch -p1 \ | ||
&& make install-criu \ | ||
&& cd - \ | ||
&& rm -rf /usr/src/criu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not in Dockerfile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we have the very same code in Dockerfile (this is mostly a copy-paste).
We are running these tests on the Vagrant host itself (not inside a container, i.e. make localintegration
not make integration
) and some of those tests use criu, so we have to have criu on the vagrant host, not inside a container.
Or do you mean we can copy the criu binary out of container we just built and use it? This is also possible (and I tried to do it with runc binary earlier, and you suggested to rebuild it on the host itself, so I assumed you have the same PoV about criu binary).
As noted in the comment, this will go away as soon as criu 3.14 is released and packaged for F31.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to run everything on Vagrant directly, can we eliminate Podman from Vagrantfile? Can be another PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can followup on that. This just fixes the cgroupv2 checkpoint and covers it with tests.
See #2342. Note the difference is we used to run unit tests on Debian (container) and now it's Fedora (host).
Same test as the first one, just with cgroupns enabled. Since in case of cgroupv2 `runc spec` enables cgroupns, this case was already tested by the first checkpoint test, so skip it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case of cgroupv2 unified hierarchy, the /sys/fs/cgroup mount is the real mount with fstype of cgroup2 (rather than a set of external bind mounts like for cgroupv1). So, we should not add it to the list of "external bind mounts" on both checkpoint and restore. Without this fix, checkpoint integration tests fail on cgroup v2. Also, same is true for cgroup v1 + cgroupns. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
With the fix in the previous commit and criu patched with support for cgroupv2, these tests should now pass. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Got a very weird SIGSEGV in CI, filed checkpoint-restore/criu#1035 just in case. Seems more like a bug in golang runtime or protobuf package. Restarted CI to check if the above is repeatable, or was a one-time thing. |
Since commit 9280e35 it is not longer needed to have `cgroup2' mount. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
To make a bind mount read-only, it needs to be remounted. This is what the code removed does, but it is not needed here. We have to deal with three cases here: 1. cgroup v2 unified mode. In this case the mount is real mount with fstype=cgroup2, and there is no need to have a bind mount on top, as we pass readonly flag to the mount as is. 2. cgroup v1 + cgroupns (enableCgroupns == true). In this case the "mount" is in fact a set of real mounts with fstype=cgroup, and they are all performed in mountCgroupV1, with readonly flag added if needed. 3. cgroup v1 as is (enableCgroupns == false). In this case mountCgroupV1() calls mountToRootfs() again with an argument from the list obtained from getCgroupMounts(), i.e. a bind mount with the same flags as the original mount has (plus unix.MS_BIND | unix.MS_REC), and mountToRootfs() does remounting (under the case "bind":). So, the code which this patch is removing is not needed -- it essentially does nothing in case 3 above (since the bind mount is already remounted readonly), and in cases 1 and 2 it creates an unneeded extra bind mount on top of a real one (or set of real ones). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Added a couple of patches on top, @adrianreber PTAL (only the last 2 patches) |
@kolyshkin Last two patches sound reasonable. |
Fixes: #2328