
[BUG] Continuously rebuild when auto-balance==least-effort and existing node becomes unschedulable #4502

Closed · docbobo opened this issue Aug 30, 2022 · 8 comments

Labels: area/volume-replica-scheduling, backport/1.2.6, backport/1.3.2, component/longhorn-manager, kind/bug, priority/0, require/auto-e2e-test
Assignees: c3y1huang
Milestone: v1.4.0

docbobo commented Aug 30, 2022

Describe the bug

I was just noticing some weird interactions between auto-balance "least-effort" and unschedulable replicas when I had to cordon a few of my nodes. Here's a quick description:

I have a volume with 3 replicas, with auto-balance configured to "ignored" so that it falls back to my system default of least-effort. Each of the replicas is assigned to a different zone. When I had to cordon a few of the nodes, all of the nodes in one of the zones became unschedulable, leaving only two schedulable zones. However, the replica on the unschedulable node was still running. So far, so good.

Longhorn then started to build a fourth replica in one of the two zones already in use. When it was done, it deleted one of the previously existing replicas. And then it did that again. And again. And again. It never stopped building new replicas and deleting the old ones.

I've seen that behavior a few times already. What helped in that situation was setting auto-balance to "disabled"; in that case, it finishes the cycle it is currently in and then stops.

To Reproduce

See above.

Expected behavior

Even though one of the replicas is on an unschedulable node, I'd expect Longhorn to realize that it has already achieved the best possible balance with respect to fault tolerance. I would definitely not expect it to keep recreating and deleting replicas forever.
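
For illustration only: a minimal sketch, in Go, of the check being described here — "is there any schedulable zone that has no replica at all?" — where a replica that is still running on a cordoned node keeps counting for its zone. The type and function names are invented for this example and are not Longhorn's implementation.

```go
package main

import "fmt"

// replica is a simplified stand-in for a Longhorn replica: the zone it lives
// in and whether its node is still schedulable (false once cordoned).
type replica struct {
	zone        string
	schedulable bool
}

// rebalanceNeeded reports whether spreading across zones could still be
// improved: true only if some schedulable zone holds no replica at all.
// A replica running on a cordoned node still covers its own zone.
func rebalanceNeeded(replicas []replica, schedulableZones []string) bool {
	covered := map[string]bool{}
	for _, r := range replicas {
		covered[r.zone] = true
	}
	for _, zone := range schedulableZones {
		if !covered[zone] {
			return true // an empty schedulable zone exists; adding a replica there helps
		}
	}
	return false
}

func main() {
	// Reported scenario: 3 replicas in zones a, b, c; zone c is fully cordoned.
	replicas := []replica{{"a", true}, {"b", true}, {"c", false}}
	fmt.Println(rebalanceNeeded(replicas, []string{"a", "b"})) // false: nothing left to improve
}
```

Under that model the reported topology yields "no rebuild needed", which is the behavior expected above.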

Log or Support bundle

n/a

Environment

  • Longhorn version: 1.3.1
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3s v1.24.4
    • Number of management nodes in the cluster: 3
    • Number of worker nodes in the cluster: 11
  • Node config
    • OS type and version: openSUSE MicroOS
    • CPU per node: 4 cores
    • Memory per node: 4-12 GB
    • Disk type (e.g. SSD/NVMe): SSD
    • Network bandwidth between the nodes: 8x10GbE, 3x1GbE
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): KVM & Baremetal
  • Number of Longhorn volumes in the cluster: 37

Additional context

n/a

innobead (Member) commented:

cc @c3y1huang

c3y1huang (Contributor) commented:

Thanks for reporting. We will look into this.

@c3y1huang added the component/longhorn-manager and require/auto-e2e-test labels on Aug 30, 2022
@c3y1huang self-assigned this on Aug 30, 2022
@innobead added this to the v1.4.0 milestone on Aug 30, 2022
@innobead added the backport/1.3.2, priority/0, and area/volume-replica-scheduling labels on Aug 30, 2022
withinboredom commented Aug 31, 2022

Looks like the same thing (or maybe the opposite?) happens when auto-balance is set to best-effort, the engine image is being auto-upgraded, and one of the nodes goes away during the upgrade.

Replicas are created and destroyed in an infinite loop.

c3y1huang (Contributor) commented:

Auto-balance best-effort first goes through the logic to achieve balance for least-effort, so the infinite loop applies to both cases. We need to fix this bug so that the setting recognizes replicas that are already on an unschedulable node.
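
A rough sketch, in Go, of the ordering described above: best-effort runs the least-effort pass first, and replicas on unschedulable nodes still count toward their zone. The function names and the simplified per-zone count model are assumptions for illustration only, not longhorn-manager code.

```go
package main

import "fmt"

// leastEffortMissing returns how many replicas the least-effort pass would
// still ask for: one per schedulable zone that currently has no replica.
// Replicas that keep running on cordoned nodes are counted for their zone,
// which is the recognition the fix is about (illustrative model only).
func leastEffortMissing(replicasPerZone map[string]int, schedulableZones []string) int {
	missing := 0
	for _, zone := range schedulableZones {
		if replicasPerZone[zone] == 0 {
			missing++
		}
	}
	return missing
}

// bestEffortMissing runs the least-effort pass first; only when that pass is
// satisfied would a stricter even-spread pass follow (omitted here). A bug in
// the least-effort pass therefore shows up under both settings.
func bestEffortMissing(replicasPerZone map[string]int, schedulableZones []string) int {
	if n := leastEffortMissing(replicasPerZone, schedulableZones); n > 0 {
		return n
	}
	// ... best-effort's even-spread adjustments would go here ...
	return 0
}

func main() {
	// Zones a and b are schedulable; the zone-c replica sits on a cordoned node.
	counts := map[string]int{"a": 1, "b": 1, "c": 1}
	fmt.Println(bestEffortMissing(counts, []string{"a", "b"})) // 0: no rebuild requested
}
```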

However, I was not expecting it to auto-upgrade the image; can you give some more info about that behavior?

withinboredom commented:

I had the "auto-upgrade engine" setting turned on, and during the upgrade from 1.3.0 to 1.3.1, one of the nodes turned off while some engines were upgrading. The volume then went into an infinite loop of creating and deleting replicas until the node came back online some time later.

I came here to report the issue and saw this here.

There were only three nodes, and each volume had three replicas (then two nodes and three replicas). It is hard to tell; it may not be the same issue, but it is the same behavior.

docbobo commented Sep 2, 2022

I am seeing something similar when just using a nodeSelector. When the nodeSelector only matches nodes in two different zones, but the strategy is set to least-effort (and maybe also best-effort), Longhorn will continuously rebuild.

longhorn-io-github-bot commented Sep 28, 2022

Pre Ready-For-Testing Checklist

yangchiu (Member) commented Sep 30, 2022

Verified passed on master-head (longhorn-manager aa79220) by executing the test_replica_auto_balance_when_replica_on_unschedulable_node automated test case and by manually running the test steps following #4502 (comment); the unexpected loop of replica deletion and recreation is no longer observed.
