runtime: Reduce the mount points with namespace isolation #8760
Thanks! @fadecoder
/test
c31cc76 to b148e2b
LGTM!
The |
Allowing untrusted users to hog the host CPU and add measurable latency in a critical component like the host systemd is a security issue. Maybe not critical but severe enough to waste precious cycles and $$ on a big system.
Adding the missing isolation not only helps with the k8s deployment density, it is also true to kata's "provide the workload isolation and security advantages of VMs" motto, which all users legitimately expect.
Thanks @fadecoder!
/test
@fadecoder, bear with me here in case I'm wrong, but I think I understand why CRI-O tests are failing. Creating a new mount namespace requires that the process creating it has kata-containers/tests/common.bash Lines 590 to 601 in 0dc00ae
That will make the CI pass, most likely. Whether or not giving this set of permissions is a good thing, is not exactly up for discussion here (mainly as we're already allowing on |
b148e2b to 091cf1b
/test
This PR needs to be rebased to main in order to get rid of the bogus pending build checks in CI. |
This patch can reduce load on systemd process, and increase the k8s deployment density when using go runtime.

Fixes: kata-containers#8758

Signed-off-by: Zhigang Wang <wangzhigang17@huawei.com>
Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
091cf1b to 9317e23
/test
CI has passed, I will get it merged. Thanks! @fadecoder
PR kata-containers#8760 tentatively tried to have the shim run in its own mount namespace for the sake of improving isolation between the sandbox and the host. This was inspired by the runtime-rs implementation.

The monitor checks in CI have been failing since then with CRI-O. The precise path of code that got broken by kata-containers#8760 isn't known at this time but in any case, only the shim daemon, which will process the actual mount operations, needs to live in a separate namespace.

The runtime-rs shim does exactly that by checking its command line arguments. If it doesn't find an action, e.g. "start" or "delete", as typically passed by the container runtime, it assumes it is running as the shim daemon and unshares its mount namespace.

Since the shim command line parsing bits are hidden deep in containerd vendor code, it isn't practical to do exactly the same in the go runtime. Go for a variant: introduce a function that heuristically validates that the current process was spawned by the NewCommand() function. A C-style assert is added in NewCommand() to ensure any code change that would break the assumptions of the validating function is detected by CI. The panic won't fire in production.

This is a bit hacky and could certainly be handled more cleanly with a change in the containerd shim package. The go shim is expected to sunset soon though. It thus doesn't seem worth the pain.

Signed-off-by: Greg Kurz <groug@kaod.org>
PR kata-containers#8760 tentatively tried to have the shim run in its own mount namespace for the sake of improving isolation between the sandbox and the host. This was inspired by the runtime-rs implementation.

Unfortunately, the kata-monitor checks in CI have been failing since then with CRI-O. The precise path of code that got broken by kata-containers#8760 isn't known at this time.

Looking closer at runtime-rs, it appears that only the shim daemon unshares its mount namespace. Do the same with the go runtime. Similarly to what runtime-rs does, look at the command line arguments to detect that the current process is the daemon. Since the go shim command line parsing belongs to containerd vendored code, it isn't really practical to do exactly the same though. Go for a variant: introduce a function that heuristically validates that the current process was spawned by the NewCommand() function. A C-style assert is added in NewCommand() to ensure any code change that would break the assumptions of the validating function is detected by CI. The panic *cannot* happen in production.

This is a bit hacky and could certainly be handled more cleanly with a change in the containerd shim package. The go shim is expected to sunset soon though. It thus doesn't seem worth the pain.

Fixes kata-containers#9028

Signed-off-by: Greg Kurz <groug@kaod.org>
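For readers unfamiliar with the argument check attributed to runtime-rs above, here is a minimal Go sketch of that idea: treat the process as the shim daemon when no known action appears on the command line. It is not the code from either runtime, and it is not the NewCommand()-based validation this commit introduces. The names looksLikeDaemon and shimActions are hypothetical, and the action set only contains the two examples mentioned in the commit message.

```go
package main

import (
	"fmt"
	"os"
)

// shimActions lists sub-commands a container runtime may pass to the shim.
// Only "start" and "delete" are taken from the commit message; the set is an
// assumption and is probably incomplete.
var shimActions = map[string]bool{
	"start":  true,
	"delete": true,
}

// looksLikeDaemon reports whether args (program name excluded) contain no
// known action. In that case the process is assumed to be the long-lived
// shim daemon. This is a heuristic: a flag value that happens to equal an
// action name would be misread as an action.
func looksLikeDaemon(args []string) bool {
	for _, a := range args {
		if shimActions[a] {
			return false
		}
	}
	return true
}

func main() {
	if looksLikeDaemon(os.Args[1:]) {
		fmt.Println("no action found: assuming shim daemon, the mount namespace would be unshared here")
		return
	}
	fmt.Println("action found: short-lived invocation, keeping the caller's mount namespace")
}
```

Running the binary with a trailing "start" or "delete" argument takes the second branch; running it with flags only takes the first.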
PR kata-containers#8760 tentatively tried to have the shim run in its own mount namespace for the sake of improving isolation between the sandbox and the host. Thus CRI-O storage drivers shouldn't create a PRIVATE bind mount on their home directory. Otherwise, the container's rootfs mount wouldn't be propagated to the kata runtime's mount namespace, and the kata runtime couldn't access the container's rootfs files. So, when kata works with CRI-O, CRI-O should set skip_mount_home=true for its storage overlay.

Fixes: kata-containers#9028

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
We notice a 100% CPU load from the systemd process on the host when deploying more than 400 Kata Containers Pods (using the go runtime) through k8s on one node, as described in issue #8758.

Inspired by the rust runtime, we add mount namespace isolation to the go runtime. It prevents the systemd process from seeing more mounts, thereby reducing the burden on systemd and making it faster to deploy more Pods.

Considering the runtime-rs VMMs are under heavy development, adding such isolation is still useful for many use cases.
Fixes: #8758
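
As a rough illustration of the mount namespace isolation described above, here is a minimal Go sketch that is not taken from this PR. It assumes Linux and sufficient privileges (typically CAP_SYS_ADMIN). The function names unshareMountNamespace and countMounts are hypothetical, and remounting / as a recursive slave is an assumption about how mounts are kept from propagating back to the host. In Go, unshare(2) only affects the calling thread, so the sketch pins the goroutine to its thread; a real shim has to deal with the multi-threaded Go runtime, which this sketch does not attempt.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"syscall"
)

// unshareMountNamespace moves the calling OS thread into a new mount
// namespace, then marks every mount as a recursive slave so that mounts made
// here are not propagated back to the host namespace, while host mounts can
// still propagate in. The slave remount is an assumption of this sketch.
func unshareMountNamespace() error {
	if err := syscall.Unshare(syscall.CLONE_NEWNS); err != nil {
		return fmt.Errorf("unshare(CLONE_NEWNS): %w", err)
	}
	if err := syscall.Mount("", "/", "", syscall.MS_SLAVE|syscall.MS_REC, ""); err != nil {
		return fmt.Errorf("remount / as recursive slave: %w", err)
	}
	return nil
}

// countMounts returns the number of entries in mountinfo-formatted data,
// which holds one newline-terminated line per mount.
func countMounts(mountinfo []byte) int {
	return bytes.Count(mountinfo, []byte("\n"))
}

func main() {
	// unshare(2) applies to the calling thread only, so stay on it.
	runtime.LockOSThread()

	if err := unshareMountNamespace(); err != nil {
		fmt.Fprintln(os.Stderr, "mount namespace isolation failed:", err)
		os.Exit(1)
	}

	// Read the mount table as seen by this thread, not by the process leader.
	data, err := os.ReadFile("/proc/thread-self/mountinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading mountinfo failed:", err)
		os.Exit(1)
	}
	fmt.Println("mounts visible in the new namespace:", countMounts(data))
}
```

Mounts created by this thread after the call stay out of the host's mount table, which is the effect the description relies on to keep systemd from processing them.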