CI: "run-monitor (qemu, crio)" job failing on CreateContainer #9028
Comments
Hi @littlejawa! Is this a known issue? I searched the issue list but found nothing related. Any idea what might be causing the failure?
Not a known issue, and I really don't know what's wrong :-/
Hi, it seems that something has gone wrong since the 1st of Feb. For The nightly test for s390x was able to run a container (at least) on that day: https://github.com/kata-containers/kata-containers/actions/runs/7736480131/job/21093903151, but no container has run successfully since then. (Sorry, the job is wrongly named as A pod is stuck in a
I will dig into this more and share something informative soon. Thanks.
Update: I think I've spotted a commit that causes the issue. The test passed when I reverted the following: @fadecoder Could you have a look at this?
Hi @BbolroC! Good catch! Did you This comment in the corresponding PR #8760 clearly asks to add This got addressed by @fadecoder with 091cf1b, but then the PR had to be rebased and the change disappeared in the final 9317e23 commit. This is an example of how forced rebases caused by renamed CI checks can be harmful (Cc @fidencio @stevenhorsman). I don't think this is the cause of the error though, as explained below. Another interesting fact is that the kata-containers-ci-on-push / run-kata-monitor-tests / run-monitor (qemu, crio) check of #8760 failed with the exact same All runs of this check failed according to https://github.com/kata-containers/kata-containers/actions/workflows/ci-on-push.yaml?query=actor%3Afadecoder+. Even the run for 091cf1b. This seems to indicate that the lacking
I could run the monitor test successfully with an experimental hack in #9063.
@BbolroC ++
A number of checks are currently non-required. I've been monitoring them since last week to see which ones could be made required, which is why I opened this issue :) It seems like a topic for the AC meeting: how and when to switch the currently stable checks to required.
PR kata-containers#8760 tentatively tried to have the shim run in its own mount namespace to improve isolation between the sandbox and the host. This was inspired by the runtime-rs implementation. Unfortunately, the kata-monitor checks in CI have been failing with CRI-O since then. The precise code path that got broken by kata-containers#8760 isn't known at this time.

Looking closer at runtime-rs, it appears that only the shim daemon unshares its mount namespace. Do the same with the go runtime. Similarly to what runtime-rs does, look at the command line arguments to detect that the current process is the daemon. Since the go shim command line parsing belongs to containerd vendored code, it isn't really practical to do exactly the same though. Go for a variant: introduce a function that heuristically validates that the current process was spawned by the NewCommand() function. A C-style assert is added in NewCommand() to ensure that any code change that would break the assumptions of the validating function is detected by CI. The panic *cannot* happen in production.

This is a bit hacky and could certainly be handled more cleanly with a change in the containerd shim package. The go shim is expected to sunset soon though. It thus doesn't seem worth the pain.

Fixes kata-containers#9028
Signed-off-by: Greg Kurz <groug@kaod.org>
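The daemon detection described in this commit message could look roughly like the sketch below. It is not the code from the actual fix: the function name `looksLikeShimDaemon`, the set of short-lived actions, and the exact flags checked are assumptions made for illustration, based on the usual containerd shim v2 convention that short-lived invocations carry an action such as `start` or `delete` while the re-executed daemon does not.

```go
package main

import (
	"fmt"
	"os"
)

// shimActions lists sub-commands that mark a short-lived shim invocation
// rather than the long-running daemon (an assumption for this sketch).
var shimActions = map[string]bool{"start": true, "delete": true}

// looksLikeShimDaemon heuristically reports whether args (argv without
// argv[0]) describe the long-running shim daemon: an id flag is present and
// no short-lived action appears anywhere in the arguments.
// Limitation: a flag value that happens to equal "start" or "delete" is
// treated as an action, so such a process is not classified as the daemon.
func looksLikeShimDaemon(args []string) bool {
	hasID := false
	for _, a := range args {
		if shimActions[a] {
			return false
		}
		if a == "-id" || a == "--id" {
			hasID = true
		}
	}
	return hasID
}

func main() {
	if looksLikeShimDaemon(os.Args[1:]) {
		// A real shim would unshare its mount namespace at this point, for
		// example with unix.Unshare(unix.CLONE_NEWNS) from golang.org/x/sys/unix.
		fmt.Println("daemon: would unshare the mount namespace")
		return
	}
	fmt.Println("not the daemon: keeping the caller's mount namespace")
}
```

Keeping the check as a pure function over the argument list makes it easy to exercise in unit tests, which is in the spirit of the assert mentioned above: if the spawning code changes its arguments, a test on the heuristic fails before the change reaches production.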
PR kata-containers#8760 tentatively tried to have the shim run in its own mount namespace to improve isolation between the sandbox and the host. As a result, CRI-O storage drivers shouldn't create a PRIVATE bind mount on their home directory. Otherwise, the container's rootfs mount isn't propagated to the kata runtime's mount namespace, and the kata runtime can't access the container's rootfs files. So, when kata is used with CRI-O, CRI-O should set skip_mount_home=true for its overlay storage. Fixes: kata-containers#9028 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
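As a rough illustration of the setting mentioned in this commit message, the sketch below checks whether a storage configuration enables `skip_mount_home` for the overlay driver. It assumes the option lives in the `[storage.options.overlay]` table of a containers `storage.conf` file; the function name `skipMountHomeEnabled` and the sample text are hypothetical, and the code is a simple line scanner, not a full TOML parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// skipMountHomeEnabled scans storage.conf-style text and reports whether
// skip_mount_home is set to true inside the [storage.options.overlay] table.
// Assumption: one key per line, no inline comments after the value.
func skipMountHomeEnabled(conf string) bool {
	inOverlay := false
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		if strings.HasPrefix(line, "[") {
			// Track which table the following keys belong to.
			inOverlay = line == "[storage.options.overlay]"
			continue
		}
		if !inOverlay {
			continue
		}
		key, value, found := strings.Cut(line, "=")
		if !found || strings.TrimSpace(key) != "skip_mount_home" {
			continue
		}
		// Accept both the quoted ("true") and the bare (true) form.
		value = strings.Trim(strings.TrimSpace(value), `"`)
		return value == "true"
	}
	return false
}

func main() {
	// Hypothetical sample configuration, for illustration only.
	sample := `
[storage]
driver = "overlay"

[storage.options.overlay]
skip_mount_home = "true"
`
	fmt.Println("skip_mount_home enabled:", skipMountHomeEnabled(sample))
}
```

A check like this could run as a CI pre-flight step before the kata-monitor tests, so that a missing setting is reported directly instead of surfacing later as a CreateContainer failure.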
The kata-monitor tests for the QEMU hypervisor and CRI-O have been failing for a while now.
For example, on https://github.com/kata-containers/kata-containers/actions/runs/7754650968/job/21149196848?pr=9003, it failed on the CreateContainer operation:
The sandbox/pod is apparently created fine. The failure occurs when it tries to create the container with the following yaml:
I was able to reproduce the error locally. I tried to solve it by passing the full path to top (/bin/top) in the above yaml, but it failed with the same "file not found" error.
A snippet of journalctl -t crio: