[FEATURE] Optimize for Frequent Power-off/Power-On operating procedures #3261
Pre Ready-For-Testing Checklist
Automation e2e test issue: harvester/tests#664
New updated scenario:
The optimizations and bug fixes planned for v1.3.0 are done, and continuous enhancement is tracked in the new epic #5007 for Harvester v1.4.0. Closing this issue now.
Context
Running Harvester in edge or remote environments with either intermittent power or with devices needing to be turned off and moved to a new location frequently (sometimes daily).
Is your feature request related to a problem? Please describe.
Operators who turn the cluster off and on are not deeply familiar with Kubernetes, so they can't be expected to troubleshoot stuck containers that fail to come back online after startup.
update 20230824:
This issue has been converted into an EPIC. It touches k8s, CSI, VM, Linux and more, so one HEP is required.
A Harvester cluster is deployed in the rough sequence of:
OS -> rancherd -> rke2 -> k8s -> fleet -> harvester -> longhorn | monitoring | logging..., -> virtual-machine ...
To shut down the cluster safely, following roughly the reverse sequence seems reasonable.
A set of sub-stories has been created to work on this.
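As a rough illustration of the "reverse sequence" idea, the sketch below lists the layers in the bring-up order quoted above and derives a shutdown order by reversing it. The names `STARTUP_ORDER` and `shutdown_order` are hypothetical and not part of Harvester; the parallel add-ons (longhorn, monitoring, logging) are collapsed into one stage for simplicity, and real shutdown steps would need per-layer handling that this sketch does not attempt.

```python
# Layers in the rough bring-up order quoted in this issue.
# Assumption: the parallel add-ons "longhorn | monitoring | logging..."
# are treated as a single stage here.
STARTUP_ORDER = [
    "OS",
    "rancherd",
    "rke2",
    "k8s",
    "fleet",
    "harvester",
    "longhorn | monitoring | logging",
    "virtual-machine",
]


def shutdown_order(startup):
    """Return the layers in roughly reverse order for a safe shutdown."""
    return list(reversed(startup))


if __name__ == "__main__":
    # Print a numbered shutdown plan, starting with the topmost layer.
    for step, layer in enumerate(shutdown_order(STARTUP_ORDER), start=1):
        print(f"{step}. stop {layer}")
```

Under this ordering, virtual machines are stopped first and the OS is powered off last.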
Sub-stories in v1.3.0:
Bug fix and enhancement in v1.3.0:
A couple of bugs show different symptoms but share the same root cause: after a node reboots, kubelet fails to UnmountVolume
Stopping state and can't launch #4033
Others:
VM start button is not visible #4659
HEP PR:
Continuous enhancement in v1.4.0+:
EPIC: #5007 [ENHANCEMENT] Continuous enhancement on system robustness and resilience
These issues/PRs could be checked and aligned as well:
#3902
#3263
harvester/node-manager#15