gha: kata-deploy: Revert containerd config break #8679
@Amulyam24 - FYI the ppc64le checks are failing with:
I'm guessing that we need to log in to Docker Hub on the self-hosted node, as the anonymous rate limit is pretty low?
LGTM, thanks @stevenhorsman. My only question is whether this is indeed fixing #8678, as described in the commit message. Unless I'm missing something, this is just adding a test, not fixing the actual problem, right?
Correct - this is just the first stage of the PR. Assuming it works (and the test fails), we'll have a way to test the issue, and I plan to revert part of the tomlq PR and check that this fixes things. I'll add
/test
71a6d02 to dddc747
OK, I think this containerd test approach works for bare-metal systems, but not on other clusters, as we can't necessarily get in and access their containerd. I can try using kata-debug for this process though...
d575ff7 to aa8a902
I didn't get kata-debug working reliably, so I'm just using kubectl with custom columns to extract status and containerruntimeversion. I've also temporarily dropped the fix, so I can check that this test is correctly failing.
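To make the check described above concrete, here is a minimal sketch of evaluating a `kubectl get nodes -o custom-columns=...` table. The column names, the JSONPath expressions, and both function names are assumptions for illustration; the PR's actual test script is not reproduced in this conversation.

```python
import subprocess
from typing import List

# Hypothetical column layout; the real test may query different fields.
CUSTOM_COLUMNS = (
    "NAME:.metadata.name,"
    'READY:.status.conditions[?(@.type=="Ready")].status,'
    "RUNTIME:.status.nodeInfo.containerRuntimeVersion"
)


def find_unhealthy_nodes(table: str) -> List[str]:
    """Return nodes that are not Ready or report containerd://Unknown."""
    unhealthy = []
    # The first line is the header row printed by kubectl.
    for line in table.strip().splitlines()[1:]:
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        ready = fields[1] if len(fields) > 1 else ""
        runtime = fields[2] if len(fields) > 2 else ""
        if ready != "True" or runtime == "containerd://Unknown":
            unhealthy.append(name)
    return unhealthy


def get_node_table() -> str:
    """Ask kubectl for the node table (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", f"custom-columns={CUSTOM_COLUMNS}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    sample = (
        "NAME     READY   RUNTIME\n"
        "node-a   True    containerd://1.7.2\n"
        "node-b   True    containerd://Unknown\n"
    )
    print(find_unhealthy_nodes(sample))
```

Keeping the parsing separate from the kubectl call means the pass/fail logic can be exercised against canned output, without needing a cluster.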
361289e to 6b941d0
In the previous run, I got some of the deploy tests to fail as expected, but more of them were still passing, which doesn't really make sense, so I've upped the sleep to see if that gives enough time for the containerd config changes to take effect.
80f08c1 to c6b9f86
/test
As I shared in confidential-containers/operator#305 (comment) - I'm not super happy with this change and think the testing could do with some wider input, but the revert of using tomlq resolves the problem of kata-deploy corrupting containerd's config, so I think it's good enough to merge and unblock people until January when more people are back to discuss this.
7ed9e5c to 861ee1e
/test
861ee1e to 8c772d5
4d1f863 to 2991a96
/test
2991a96 to 46570e6
/test
After kata-deploy has installed, check that the worker nodes are still in Ready state and don't have a containerd://Unknown container runtime version, indicating that containerd isn't working, to ensure that we didn't corrupt the containerd config during kata-deploy's edits

Fixes: kata-containers#8678

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This reverts commit dd9f5b0. Signed-off-by: stevenhorsman <steven@uk.ibm.com>
4418eb8 to ee5fa08
/test
LGTM. Thanks @stevenhorsman !
Folks, I don't like the commit being reverted, but I must say it is what it is at this point. As I'm working today, I decided to poke Steve and get to the root cause of the issue, and here's the full explanation.

    elif input_format == "toml":
        import tomlkit
        for input_stream in input_streams:
            json.dump(tomlkit.load(input_stream), jq.stdin, cls=JSONDateTimeEncoder)  # type: ignore
            jq.stdin.write("\n")  # type: ignore

The interesting part to understand here is that jq is an invocation of the

    jq = subprocess.Popen(
        ["jq"] + list(jq_args),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE if converting_output else None,
        close_fds=False,
        universal_newlines=True,
    )

Okay, with this in mind, we can simplify the test case a whole lot.

    / # jq --version
    jq-1.6
    / # echo '{"foo": 1.0}' | jq .foo
    1

Bingo, we can reproduce the issue we've faced, and we know that the version of jq being used is 1.6. When taking a look at

    ⋊> Downloads ./jq-linux-amd64 --version
    jq-1.7
    ⋊> Downloads echo '{"foo": 1.0}' | jq .foo
    1.0

And, hey, it works as expected. So, the actual solution for the issue should be:
I will open a PR with this soon enough, and this will allow us to move on and get the snapshotter work merged, unblocking others depending on this one.
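For readers who want to see why the jq 1.6 output in the comment above matters, here is a small sketch of the type change using only Python's `json` module. The helper `emulate_jq16_number` is hypothetical: it only mimics the `1.0` to `1` rendering shown in the jq 1.6 session and is not jq's actual implementation.

```python
import json


def emulate_jq16_number(value):
    """Render a number as the jq 1.6 session shows: 1.0 becomes 1.

    Hypothetical emulation for illustration only, not jq's real code.
    """
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return json.dumps(value)


# The document as the TOML-to-JSON step would hand it to jq.
original = json.loads('{"foo": 1.0}')

# The same value after passing through the emulated jq 1.6 rendering.
rendered = emulate_jq16_number(original["foo"])
through_jq16 = json.loads('{"foo": %s}' % rendered)

print(type(original["foo"]).__name__)
print(type(through_jq16["foo"]).__name__)
```

TOML treats integers and floats as distinct types, so once the `.0` is lost in the JSON step, the value converted back to TOML is an integer rather than the float the original file contained.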
After kata-deploy has installed, check containerd is still in active state, to ensure that we didn't corrupt the containerd config during kata-deploy's edits
Fixes: #8678