Restoring a checkpointed container into an existing namespace is not possible #1786
Comments
We have an engine which says which resources are external and don't need to be dumped. I think we can add a new type of external resource, called ns.
@avagin Your proposal sounds good. I will work on adding that to CRIU and runc. Not totally clear yet how it should work during dump. I would expect that CRIU either dumps the process with the existing namespace information or it ignores the namespace information. So something like:
During restore we already support this via Or we could combine the functionality of --join --empty-ns all into external. From my point of view The option --empty-ns could be --external ns[pid]:none (or --external ns[pid]:empty). And this discussion does not really belong in the runc bug tracker... But as I opened it here, I am continuing it here, as it was triggered by runc and also needs runc integration.
--join-ns means something slightly different: it means that ALL tasks should be restored in the specified ns. An external ns is a namespace which should not be dumped and restored, and we can set more than one namespace as external. For example, task A lives in netns 1 and task B lives in netns 2. On dump and restore, we can set both namespaces as external.
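To make the "more than one external namespace" idea concrete, here is a minimal sketch that builds one key per namespace. The key format `net[<inode>]:<label>` is an assumption modeled on the `--external ns[pid]:none` syntax proposed in this thread and on how CRIU writes other external resources; the function names and the label names are hypothetical and not part of CRIU or runc.

```python
import os


def external_ns_key(ns_type: str, ns_path: str, label: str) -> str:
    """Build a key of the assumed form <type>[<inode>]:<label>.

    The inode identifies the namespace at dump time; the label is the
    name used to refer to the same namespace again at restore time.
    """
    inode = os.stat(ns_path).st_ino
    return f"{ns_type}[{inode}]:{label}"


def external_keys(namespaces: dict, ns_type: str = "net") -> list:
    """Return one key per external namespace from a label -> path mapping."""
    return [
        external_ns_key(ns_type, path, label)
        for label, path in sorted(namespaces.items())
    ]
```

For the example with tasks A and B, a call could look like `external_keys({"netns1": "/proc/<PID of A>/ns/net", "netns2": "/proc/<PID of B>/ns/net"})`, where the labels are only identifiers that must match between dump and restore.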
So, in your example, 'netns1' is just a CRIU-internal label which we use to identify that namespace during restore. Yes, sounds like it should work.
Using CRIU to checkpoint and restore a container into an existing network namespace is not possible. If the network namespace is defined like

    { "type": "network", "path": "/run/netns/test" }

there is the expectation that the restored container is again running in the network namespace specified with 'path'.

This adds the new CRIU 'external namespace' feature to runc, where during checkpointing that specific namespace is referenced and during restore CRIU tries to restore the container in exactly that namespace.

This breaks/fixes current runc behavior. If, without this patch, runc restores a container with such a network namespace definition, it is ignored and CRIU recreates a network namespace without a name.

With this patch runc uses the network namespace path (if available) to checkpoint and restore the container in just that network namespace. Restore will now fail if a container was checkpointed with a network namespace path set and if that network namespace path does not exist during restore.

runc still falls back to the old behavior if CRIU older than 3.11 is installed.

Fixes opencontainers#1786

Related to containers/podman#469

Thanks to Andrei Vagin for all the help in getting the interface between CRIU and runc right!

Signed-off-by: Adrian Reber <areber@redhat.com>
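The behavior described in the commit message can be summarized as a small decision function. This is only a sketch in Python, not runc's actual Go code: the function names, the `(major, minor)` version tuple, and the return values "external" and "recreate" are assumptions made for illustration. The `linux.namespaces` layout follows the config fragment quoted above.

```python
import os
from typing import Callable, Optional, Tuple


def network_namespace_path(config: dict) -> Optional[str]:
    """Return the 'path' of the network namespace entry, if one is set."""
    for ns in config.get("linux", {}).get("namespaces", []):
        if ns.get("type") == "network":
            return ns.get("path") or None
    return None


def restore_plan(
    config: dict,
    criu_version: Tuple[int, int],
    path_exists: Callable[[str], bool] = os.path.exists,
) -> str:
    """Decide how the network namespace is handled on restore.

    "recreate": old behavior, CRIU creates a new, unnamed namespace.
    "external": restore into the namespace given by 'path'.
    """
    path = network_namespace_path(config)
    # No path configured, or CRIU older than 3.11: fall back to old behavior.
    if path is None or criu_version < (3, 11):
        return "recreate"
    # A path was configured but is gone: restore must fail.
    if not path_exists(path):
        raise FileNotFoundError(f"network namespace {path} does not exist")
    return "external"
```

The `path_exists` parameter exists only so the sketch can be exercised without a real `/run/netns/test` on the machine.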
I am currently trying to fix (or rather, implement) the missing functionality to restore a container into an existing network namespace. I already had a short discussion with @avagin on IRC, but wanted to use this place to hopefully arrive at a correct solution.
I am starting a container and I want it to join an existing network namespace:
This works just like it should. The container is running and uses the network namespace 'test' as specified above. My next step is to checkpoint the container, which works, and then to restore the container, which seems to work. Upon closer inspection I see that the restored container is running in a network namespace, but not the one I specified ('test'); instead it runs in a network namespace created by CRIU during restore. I would probably not have detected the problem if it had been a PID namespace, but as I have set up a veth pair in the 'test' network namespace with an IP address, it became clear that the CRIU-restored container is running in another network namespace.
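One way to check which network namespace a restored container ended up in, without relying on a veth pair, is to compare namespace files by device and inode number. The sketch below assumes a Linux `/proc` layout and the `/run/netns/test` path from this report; the function names are hypothetical.

```python
import os


def same_namespace(path_a: str, path_b: str) -> bool:
    """Two namespace files refer to the same namespace if both their
    device and inode numbers match."""
    a = os.stat(path_a)
    b = os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def container_in_netns(pid: int, netns_path: str) -> bool:
    """Check whether process `pid` lives in the namespace at `netns_path`."""
    return same_namespace(f"/proc/{pid}/ns/net", netns_path)
```

Calling `container_in_netns(<PID>, "/run/netns/test")` before checkpoint and after restore would show the mismatch described here: it should return True for the original container and False for the restored one.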
With older versions of CRIU the veths were restored correctly, but the latest CRIU version (using the criu-dev branch) seems to have a problem. But this is not the problem I am trying to understand right now.
To me it seems wrong that runc is told to use the network namespace 'test' and I configured it correctly, but CRIU just uses another namespace.
Luckily CRIU is prepared for cases like this. It has the possibility to join an existing namespace:
This is also exported via CRIU's RPC interface so it can easily be used with the following patch:
The problem with this approach is that it only works if I do the checkpoint from within the network namespace (nsenter -n -t <PID> runc checkpoint). So during checkpointing it is important that CRIU does not checkpoint the information about the network namespace, else CRIU will create a new network namespace even with the join option. I tried a few things already, but I would like to come to a conclusion on the right way to do it.
I had a look at LXC, and it seems LXC does not touch namespaces at all during checkpoint and restore and leaves it all to CRIU. This does not seem to be an option for runc, as it offers the possibility via 'path' to specify which network namespace to use.
Right now I am seeing two approaches:
Right now I think option 2 is the right one, but I wanted to get feedback from @avagin before continuing.
I also think this is probably not only a network namespace problem, but a problem for all namespaces, as the ones specified in the configuration file are not joined after restore.