kube: fix cluster-init stuck loop after basek3s conversion #5869

Merged
rene merged 1 commit into lf-edge:master from naiming-zededa:naiming-container-img
Apr 30, 2026

Conversation

@naiming-zededa
Contributor

Description

  • When a node converts from single-node k3s to basek3s cluster mode, in some error cases the main loop calls external_boot_image_import() indefinitely and cannot reach the rest of the tasks in the loop.

  • Set install_kubevirt=0 when /var/lib/base-k3s-mode exists. KubeVirt has been removed; there is no boot image to import.

  • Replace ctr info (containerd management API) with crictl info. Also redirect the check output to $INSTALL_LOG instead of /dev/null so failures are diagnosable.

  • On external_boot_image_import failure, log and fall through instead of calling continue.
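
The three changes listed above can be sketched roughly as follows. This is a minimal illustration, not the actual cluster-init code: the helper names `decide_install_kubevirt`, `check_container_runtime`, `run_loop_iteration`, and `remaining_tasks`, the overridable `BASE_K3S_MODE_FILE` and `RUNTIME_CHECK_CMD` variables, the default log path, and the stub bodies are all assumptions made for the example. Only `install_kubevirt`, `/var/lib/base-k3s-mode`, `crictl info`, `$INSTALL_LOG`, and `external_boot_image_import` come from the description.

```shell
#!/bin/sh
# Sketch only: helper names, stubs, and overridable variables are hypothetical.

# Marker file named in the PR description; overridable here for testing.
BASE_K3S_MODE_FILE="${BASE_K3S_MODE_FILE:-/var/lib/base-k3s-mode}"
# Log destination; the real path is not given in the PR, so this default is assumed.
INSTALL_LOG="${INSTALL_LOG:-/tmp/install.log}"
# Runtime check command; the PR switches the check from `ctr info` to `crictl info`.
RUNTIME_CHECK_CMD="${RUNTIME_CHECK_CMD:-crictl info}"

# Change 1: in basek3s mode there is no KubeVirt, so no boot image to import.
decide_install_kubevirt() {
    if [ -e "$BASE_K3S_MODE_FILE" ]; then
        install_kubevirt=0
    fi
}

# Change 2: keep the check's output in the log instead of discarding it.
check_container_runtime() {
    # Intentionally unquoted so the command and its arguments are word-split.
    $RUNTIME_CHECK_CMD >>"$INSTALL_LOG" 2>&1
}

# Placeholder stubs so the sketch runs on its own (assumed behavior).
external_boot_image_import() { check_container_runtime; }
remaining_tasks() { tasks_ran=1; }

# Change 3: one pass of the loop body. An import failure is logged and
# execution falls through, so the remaining tasks still run.
run_loop_iteration() {
    if [ "${install_kubevirt:-1}" -eq 1 ]; then
        if ! external_boot_image_import; then
            echo "external_boot_image_import failed, continuing" >>"$INSTALL_LOG"
        fi
    fi
    remaining_tasks
}
```

In this sketch a failing runtime check leaves a trace in `$INSTALL_LOG` and `remaining_tasks` still runs, which is the behavior the last bullet describes.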

PR dependencies

How to test and validate this PR

This is a corner case that occurred when converting from single-node mode
into basek3s cluster mode. Even though k3s was running, the loop got
stuck at the container image check and emitted a flood of messages.
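
The symptom can be reproduced in miniature. The sketch below is a hypothetical stand-in, not the actual script: `simulate_loop`, `failing_import`, and the pass limit are made up for illustration. It contrasts a loop that calls `continue` on failure with one that falls through.

```shell
#!/bin/sh
# Hypothetical simulation of the stuck-loop symptom; all names are illustrative.

# Stand-in for an import step that keeps failing.
failing_import() { return 1; }

# simulate_loop MODE PASSES
# MODE is "continue" (old behavior) or "fallthrough" (new behavior).
# Sets `messages` to the number of failure messages printed and
# `tasks_run` to the number of times the rest of the loop body ran.
simulate_loop() {
    mode=$1
    passes=$2
    messages=0
    tasks_run=0
    i=0
    while [ "$i" -lt "$passes" ]; do
        i=$((i + 1))
        if ! failing_import; then
            echo "pass $i: image import failed"
            messages=$((messages + 1))
            if [ "$mode" = "continue" ]; then
                continue    # skips everything below on every pass
            fi
        fi
        tasks_run=$((tasks_run + 1))    # stands for the rest of the loop's tasks
    done
}

simulate_loop continue 3
echo "continue:    messages=$messages tasks_run=$tasks_run"
simulate_loop fallthrough 3
echo "fallthrough: messages=$messages tasks_run=$tasks_run"
```

With `continue`, the failure message repeats on every pass and the later tasks never run. With fall-through, the later tasks run on every pass even while the import keeps failing.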

PR Backports

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

For backport PRs (remove it if it's not a backport):

  • I've added a reference link to the original PR
  • PR's title follows the template

And the last but not least:

  • I've checked the boxes above, or I've provided a good reason why I didn't
    check them.

Please, check the boxes above after submitting the PR in interactive mode.

@codecov

codecov Bot commented Apr 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 17.11%. Comparing base (2281599) to head (0086b2f).
⚠️ Report is 633 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5869      +/-   ##
==========================================
- Coverage   19.52%   17.11%   -2.42%     
==========================================
  Files          19      474     +455     
  Lines        3021    85661   +82640     
==========================================
+ Hits          590    14663   +14073     
- Misses       2310    69483   +67173     
- Partials      121     1515    +1394     


@naiming-zededa naiming-zededa force-pushed the naiming-container-img branch 2 times, most recently from 752880a to e2f0259 Compare April 29, 2026 18:57

@zedi-pramodh zedi-pramodh left a comment

LGTM

@rene
Contributor

rene commented Apr 30, 2026

@naiming-zededa could you please rebase? I want to test the Eden fix for smoke tests.

I tried to do it myself, but you didn't allow maintainer access. When you open a PR you can mark this option (then I will be able to rebase your PR):

image

- When a node converts from single-node k3s to basek3s cluster mode, in
  some error cases the main loop calls external_boot_image_import()
  indefinitely and cannot reach the rest of the tasks in the loop.

- Set install_kubevirt=0 when /var/lib/base-k3s-mode exists. KubeVirt
  has been removed; there is no boot image to import.
- Replace `ctr info` (containerd management API) with `crictl info`.
  Also redirect the check output to $INSTALL_LOG instead of /dev/null so
  failures are diagnosable.
- On external_boot_image_import failure, log and fall through instead of
  calling continue.

Signed-off-by: naiming-zededa <naiming@zededa.com>
@naiming-zededa naiming-zededa force-pushed the naiming-container-img branch from e2f0259 to 0086b2f Compare April 30, 2026 17:04
@github-actions github-actions Bot requested a review from zedi-pramodh April 30, 2026 17:04
@naiming-zededa
Contributor Author

> @naiming-zededa could you please rebase? I want to test the Eden fix for smoke tests.
>
> I tried to do it myself, but you didn't allow maintainer access. When you open a PR you can mark this option (then I will be able to rebase your PR):

@rene rebased. I checked this option and it is already checked, hmm...

@rene rene merged commit a6fc872 into lf-edge:master Apr 30, 2026
32 of 35 checks passed
