[FEATURE] Optimize for Frequent Power-off/Power-On operating procedures #3261

Closed
9 tasks done
rebeccazzzz opened this issue Dec 8, 2022 · 4 comments
Labels
Epic highlight Highlight issues/features kind/enhancement Issues that improve or augment existing functionality priority/1 Highly recommended to fix in this release require/doc Improvements or additions to documentation require/HEP Require Harvester Enhancement Proposal PR

Comments

@rebeccazzzz

rebeccazzzz commented Dec 8, 2022

Context
Harvester is run in edge or remote environments that either have intermittent power or use devices that need to be turned off and moved to a new location frequently (sometimes daily).

Is your feature request related to a problem? Please describe.
Operators who turn the cluster off and on are not Kubernetes experts, so they can't be expected to troubleshoot stuck containers that don't come back online after startup.

Update 20230824:
This issue has been converted into an EPIC. It touches k8s, CSI, VMs, Linux, and more, so one HEP is required.

A Harvester cluster is deployed in the rough sequence of:

OS -> rancherd -> rke2 -> k8s -> fleet -> harvester -> longhorn | monitoring | logging..., -> virtual-machine ...

To shut down the cluster safely, following roughly the reverse sequence seems reasonable.
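
As a rough illustration of the reverse-order idea, the sketch below models the deployment sequence listed above as a list of stages and derives a candidate shutdown order from it. The stage names are taken from the sequence in this issue; the function names (`shutdown_order`, `format_sequence`) are hypothetical, and this is not the actual Harvester shutdown procedure, only a way to visualize the ordering.

```python
# Startup stages as listed in this issue. Components that come up at the
# same level (longhorn | monitoring | logging) share one stage.
# The trailing "..." in the original sequence is intentionally not expanded.
STARTUP_STAGES = [
    ["OS"],
    ["rancherd"],
    ["rke2"],
    ["k8s"],
    ["fleet"],
    ["harvester"],
    ["longhorn", "monitoring", "logging"],
    ["virtual-machine"],
]


def shutdown_order(startup_stages):
    """Return the stages in reverse order (hypothetical helper).

    Components within a stage keep their relative order, since the
    issue does not state any ordering among them.
    """
    return [list(stage) for stage in reversed(startup_stages)]


def format_sequence(stages):
    """Render stages in the same 'a -> b | c -> d' notation used above."""
    return " -> ".join(" | ".join(stage) for stage in stages)


if __name__ == "__main__":
    print(format_sequence(shutdown_order(STARTUP_STAGES)))
```

Under these assumptions, virtual machines are stopped first and the OS last, which matches the "roughly reverse" intent; any real procedure would still need per-component checks (for example, waiting for volumes to detach) that this sketch does not model.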

A set of sub-stories has been created to cover this work.

Sub-stories in v1.3.0:

Bug fix and enhancement in v1.3.0:

A couple of bugs show different symptoms but share the same root cause: after a node reboots, kubelet fails to UnmountVolume

Others:

HEP PR:

Continuous enhancement in v1.4.0+:

EPIC: #5007 [ENHANCEMENT] Continuous enhancement on system robustness and resilience

These issues/PRs could be checked and aligned as well:
#3902
#3263
harvester/node-manager#15

@rebeccazzzz rebeccazzzz added kind/enhancement Issues that improve or augment existing functionality priority/0 Must be fixed in this release labels Dec 8, 2022
@rebeccazzzz rebeccazzzz added this to the v1.2.0 milestone Dec 8, 2022
@rebeccazzzz rebeccazzzz added priority/1 Highly recommended to fix in this release and removed priority/0 Must be fixed in this release labels Dec 8, 2022
@guangbochen guangbochen added the require/doc Improvements or additions to documentation label Dec 9, 2022
@harvesterhci-io-github-bot

Pre Ready-For-Testing Checklist

  • If labeled: require/HEP Has the Harvester Enhancement Proposal PR submitted?
    The HEP PR is at:

  • Where are the reproduction steps/test steps documented?
    The reproduction steps/test steps are at:

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Has the backend code been merged (harvester, harvester-installer, etc) (including backport-needed/*)?
    The PR is at:

    • Does the PR include the explanation for the fix or the feature?

    • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
      The PR for the YAML change is at:
      The PR for the chart change is at:

  • If labeled: area/ui Has the UI issue filed or ready to be merged?
    The UI issue/PR is at:

  • If labeled: require/doc, require/knowledge-base Has the necessary document PR submitted or merged?
    The documentation/KB PR is at:

  • If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?

    • The automation skeleton PR is at:
    • The automation test case PR is at:
  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at:

@harvesterhci-io-github-bot

Automation e2e test issue: harvester/tests#664

@w13915984028
Member

The planned optimizations and bug fixes for v1.3.0 are done, and continuous enhancement is tracked in the new epic #5007 for Harvester v1.4.0. Closing this issue now.


6 participants