[BUG] longhorn-manager cannot start while upgrading if the configmap contains volume sensitive settings #4160

Closed
derekbit opened this issue Jun 21, 2022 · 2 comments
Labels
backport/1.3.1 component/longhorn-manager Longhorn manager (control plane) kind/bug priority/0 Must be fixed in this release (managed by PO) require/manual-test-plan Require adding/updating manual test cases if they can't be automated severity/2 Function working but has a major issue w/o workaround (a major incident with significant impact)
@derekbit
Member

Describe the bug

longhorn-manager fails to start during an upgrade if the configmap contains volume-sensitive settings.

To Reproduce

Steps to reproduce the behavior:

  1. Perform a fresh install of v1.2.x with one of the following settings:
    • taint-toleration: ...
    • default-longhorn-static-storage-class: ...
    • system-managed-components-node-selector: ...
  2. Create a volume and attach it to one node.
  3. Upgrade to v1.3.0 with one of the following settings:
    • taint-toleration: ...
    • default-longhorn-static-storage-class: ...
    • system-managed-components-node-selector: ...
  4. longhorn-manager fails to start and reports an error similar to:
    fail to set the setting taint-toleration ....
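Before upgrading, one way to tell whether a cluster is exposed to this bug is to look for the settings listed above in the default-setting.yaml key of the configmap. This is a minimal sketch, assuming the default longhorn-system namespace and the longhorn-default-setting configmap named in the workaround below; the function names are hypothetical, and only the three keys named in this issue are checked.

```shell
#!/bin/sh
# Keys listed in this issue as triggering the failure when a volume is attached.
VOLUME_SENSITIVE_KEYS='taint-toleration|default-longhorn-static-storage-class|system-managed-components-node-selector'

# Hypothetical helper: reads default-setting.yaml content on stdin and prints every
# line that sets one of the keys above. Exit status is 0 if any such line exists.
find_volume_sensitive_settings() {
  grep -E "^[[:space:]]*(${VOLUME_SENSITIVE_KEYS}):"
}

# Hypothetical pre-upgrade check against a live cluster (assumes the default
# longhorn-system namespace and the longhorn-default-setting configmap).
# Returns 1 if any listed key is present, 2 if the configmap cannot be read.
check_before_upgrade() {
  body="$(kubectl -n longhorn-system get configmap longhorn-default-setting \
    -o jsonpath='{.data.default-setting\.yaml}')" || return 2
  if printf '%s\n' "$body" | find_volume_sensitive_settings; then
    echo "volume-sensitive settings found in longhorn-default-setting" >&2
    return 1
  fi
  return 0
}
```

Running check_before_upgrade before the upgrade prints any matching lines; a non-zero exit status means the workaround below applies.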

Expected behavior

longhorn-manager starts successfully.

Log or Support bundle

If applicable, add the Longhorn managers' log or support bundle when the issue happens.
You can generate a Support Bundle using the link at the footer of the Longhorn UI.

Environment

  • Longhorn version:
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
    • Number of management nodes in the cluster:
    • Number of worker nodes in the cluster:
  • Node config
    • OS type and version:
    • CPU per node:
    • Memory per node:
    • Disk type (e.g. SSD/NVMe):
    • Network bandwidth between the nodes:
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
  • Number of Longhorn volumes in the cluster:

Additional context

Workaround

Remove the volume-sensitive settings from the configmap:

  1. Edit the configmap longhorn-default-setting:
    kubectl -n longhorn-system edit configmap longhorn-default-setting
    For example, the configmap may contain:
...
data:
  default-setting.yaml: |-
    default-replica-count: 5
    taint-toleration: xyz.io/drain:NoSchedule
...
  2. Remove the line taint-toleration: xyz.io/drain:NoSchedule so that the data becomes:
...
data:
  default-setting.yaml: |-
    default-replica-count: 5
...
  3. Save the configmap and wait; longhorn-manager should then start successfully.
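The edit above can also be done non-interactively. This is a minimal sketch, assuming kubectl and jq are installed, the default longhorn-system namespace, and that only the three keys named in this issue need removing; the function names are hypothetical. It merge-patches only the default-setting.yaml key and refuses to patch if the read comes back empty, so a failed read cannot wipe the remaining settings.

```shell
#!/bin/sh
# Keys named in this issue; add others if the configmap holds more volume-sensitive settings.
VOLUME_SENSITIVE_KEYS='taint-toleration|default-longhorn-static-storage-class|system-managed-components-node-selector'

# Hypothetical helper: copies default-setting.yaml content from stdin to stdout,
# dropping every line that sets one of the keys above.
strip_volume_sensitive_settings() {
  # grep -v exits 1 when it outputs nothing (every line removed); treat that as success.
  grep -v -E "^[[:space:]]*(${VOLUME_SENSITIVE_KEYS}):" || true
}

# Hypothetical non-interactive version of the workaround steps (needs kubectl and jq).
apply_workaround() {
  ns=longhorn-system
  cm=longhorn-default-setting
  old_body="$(kubectl -n "$ns" get configmap "$cm" \
    -o jsonpath='{.data.default-setting\.yaml}')" || return 1
  # Stop on an empty read so the patch below cannot erase every setting.
  [ -n "$old_body" ] || { echo "default-setting.yaml is empty or missing" >&2; return 1; }
  new_body="$(printf '%s\n' "$old_body" | strip_volume_sensitive_settings)"
  # Merge-patch only the default-setting.yaml key; jq builds properly escaped JSON.
  kubectl -n "$ns" patch configmap "$cm" --type merge \
    -p "$(jq -n --arg v "$new_body" '{data: {"default-setting.yaml": $v}}')"
}
```

Calling apply_workaround once replaces the manual edit; longhorn-manager should then recover as described in the last step.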
@derekbit derekbit self-assigned this Jun 21, 2022
@innobead innobead added component/longhorn-manager Longhorn manager (control plane) area/upgrade priority/0 Must be fixed in this release (managed by PO) severity/2 Function working but has a major issue w/o workaround (a major incident with significant impact) backport/1.3.1 labels Jun 21, 2022
@longhorn-io-github-bot

longhorn-io-github-bot commented Jun 21, 2022

Pre Ready-For-Testing Checklist

  • Where are the reproduce steps/test steps documented?
    The reproduce steps/test steps are at: [BUG] longhorn-manager cannot start while upgrading if the configmap contains volume sensitive settings #4160 (comment)

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at: [BUG] longhorn-manager cannot start while upgrading if the configmap contains volume sensitive settings #4160 (comment)

  • Does the PR include the explanation for the fix or the feature?
    [BUG] longhorn-manager cannot start while upgrading if the configmap contains volume sensitive settings #4160 (comment)

  • [ ] Does the PR include deployment changes (YAML/Chart)? If so, where are the PRs for both the YAML file and the Chart?
    The PR for the YAML change is at:
    The PR for the chart change is at:

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at Avoid validation failure while syncing configmap to setting CRs longhorn-manager#1392

  • Which areas/issues might this PR potentially impact?
    Area: upgrade and default settings
    Issues

  • [ ] If labeled: require/LEP Has the Longhorn Enhancement Proposal PR been submitted?
    The LEP PR is at

  • [ ] If labeled: area/ui Has the UI issue been filed, or is it ready to be merged (including backport-needed/*)?
    The UI issue/PR is at

  • [ ] If labeled: require/doc Has the necessary documentation PR been submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

  • [ ] If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If there is only a test case skeleton without implementation, have you created an implementation issue (including backport-needed/*)?
    The automation skeleton PR is at
    The automation test case PR is at
    The issue of automation test case implementation is at (please create by the template)

  • [ ] If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at

  • [ ] If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at

  • [ ] If the fix introduces code for backward compatibility, has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

@chriscchien
Contributor

Verified on longhorn master head 1e8dd33
Result Pass
Test steps

  1. Reproduced the issue on v1.3.0.
  2. Longhorn upgraded successfully from v1.2.x to master-head with taint-toleration set in longhorn-default-setting; no CrashLoopBackOff was observed for longhorn-manager.
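The last verification check can be scripted. This is a minimal sketch, assuming the longhorn-manager pods carry the app=longhorn-manager label used by the default Longhorn manifests and run in the longhorn-system namespace; the function name is hypothetical. It is a point-in-time snapshot: a pod between restarts may briefly show Error instead of CrashLoopBackOff, so running it a few times after the upgrade settles is more reliable.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when longhorn-manager pods exist and none is
# reported in CrashLoopBackOff (assumes the app=longhorn-manager pod label).
check_manager_not_crashlooping() {
  pods="$(kubectl -n longhorn-system get pods -l app=longhorn-manager --no-headers)" || return 1
  # kubectl prints "No resources found" on stderr, so empty output means no manager pods.
  [ -n "$pods" ] || { echo "no longhorn-manager pods found" >&2; return 1; }
  if printf '%s\n' "$pods" | grep -q CrashLoopBackOff; then
    echo "longhorn-manager is in CrashLoopBackOff:" >&2
    printf '%s\n' "$pods" | grep CrashLoopBackOff >&2
    return 1
  fi
  return 0
}
```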


4 participants