kube: fix cluster-init stuck loop after basek3s conversion #5869
rene merged 1 commit into lf-edge:master
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           master    #5869      +/-   ##
==========================================
- Coverage   19.52%   17.11%    -2.42%
==========================================
  Files          19      474      +455
  Lines        3021    85661    +82640
==========================================
+ Hits          590    14663    +14073
- Misses       2310    69483    +67173
- Partials      121     1515     +1394
Force-pushed from 752880a to e2f0259.
@naiming-zededa could you please rebase? I want to test the Eden fix for the smoke tests. I tried to do it myself, but the PR does not allow maintainer access. When you open a PR you can enable the option that allows edits by maintainers; then I will be able to rebase your PR.
When a node converts from single-node k3s to basek3s cluster mode, in some error cases the main loop calls external_boot_image_import() indefinitely and cannot reach the rest of the tasks in the loop.

- Set install_kubevirt=0 when /var/lib/base-k3s-mode exists. KubeVirt has been removed; there is no boot image to import.
- Replace `ctr info` (containerd management API) with `crictl info`. Also redirect the check output to $INSTALL_LOG instead of /dev/null so failures are diagnosable.
- On external_boot_image_import failure, log and fall through instead of calling continue.

Signed-off-by: naiming-zededa <naiming@zededa.com>
Force-pushed from e2f0259 to 0086b2f.
@rene rebased. I checked that option and it is already ticked, hmm...

Description
When a node converts from single-node k3s to basek3s cluster mode, in some error cases the main loop calls external_boot_image_import() indefinitely and cannot reach the rest of the tasks in the loop.

- Set install_kubevirt=0 when /var/lib/base-k3s-mode exists. KubeVirt has been removed; there is no boot image to import.
- Replace `ctr info` (containerd management API) with `crictl info`. Also redirect the check output to $INSTALL_LOG instead of /dev/null so failures are diagnosable.
- On external_boot_image_import failure, log and fall through instead of calling continue.
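The three changes can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `BASE_K3S_MARKER`, `logmsg`, `kubevirt_wanted`, `runtime_ready`, `run_iteration` and `rest_of_tasks_ran` are hypothetical names, and the body of `external_boot_image_import` here is only a stand-in that is assumed to return non-zero on failure.

```shell
#!/bin/sh
# Sketch only: every helper name below except external_boot_image_import,
# install_kubevirt and INSTALL_LOG is hypothetical.

INSTALL_LOG="${INSTALL_LOG:-/tmp/cluster-init-sketch.log}"
BASE_K3S_MARKER="${BASE_K3S_MARKER:-/var/lib/base-k3s-mode}"

# Append a timestamped message to the install log.
logmsg() {
    echo "$(date) $*" >> "$INSTALL_LOG"
}

# Change 1: in basek3s mode KubeVirt is removed, so no boot image is needed.
kubevirt_wanted() {
    if [ -e "$BASE_K3S_MARKER" ]; then
        echo 0
    else
        echo 1
    fi
}

# Change 2: probe the runtime with crictl and keep the output for diagnosis
# instead of discarding it to /dev/null.
runtime_ready() {
    crictl info >> "$INSTALL_LOG" 2>&1
}

# Stand-in for the real import function (assumed: non-zero means failure).
external_boot_image_import() {
    runtime_ready
}

# Change 3: one pass of the main loop. A failed import is logged and the
# pass carries on, rather than jumping back to the top with `continue`.
run_iteration() {
    install_kubevirt=$(kubevirt_wanted)
    if [ "$install_kubevirt" -eq 1 ]; then
        if ! external_boot_image_import; then
            logmsg "external_boot_image_import failed, running remaining tasks anyway"
        fi
    fi
    # The remaining per-iteration tasks would run here.
    rest_of_tasks_ran=1
}
```

The point of the last function is that a persistent import failure no longer starves the rest of the loop; whether the real script retries the import on the next pass is not shown here.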
How to test and validate this PR
This is a corner case that occurred when converting from single-node mode
to basek3s cluster mode. Even though k3s was running, the loop got stuck
at the container image check and emitted a large volume of log messages.
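One rough way to spot the symptom is to count how often a single message repeats in the tail of a log. The helper below is a hypothetical sketch: the function name, the log path and the message text passed to it are assumptions, not values taken from this PR.

```shell
#!/bin/sh
# Hypothetical helper: count how many of the last N lines of a log file
# contain a fixed string. A very high count suggests a stuck loop.
count_recent_matches() {
    log_file=$1
    pattern=$2
    window=${3:-200}
    # -F treats the pattern as a literal string; -c prints the match count.
    tail -n "$window" "$log_file" | grep -c -F -- "$pattern"
}
```

Calling it before and after the conversion, with the log file and message observed on the affected node, would show whether the repeated output has stopped.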