[Bug] Degraded volume generate failed replica make volume unschedulable #3220

Closed
Tracked by #3219
chriscchien opened this issue Nov 4, 2021 · 8 comments
Labels
backport/1.2.6 kind/bug priority/1 Highly recommended to fix in this release (managed by PO) reproduce/often 80 - 50% reproducible severity/2 Function working but has a major issue w/o workaround (a major incident with significant impact)

Comments

@chriscchien
Contributor

chriscchien commented Nov 4, 2021

Describe the bug
This was found by the automation test case test_basic.py::test_allow_volume_creation_with_degraded_availability and can also be reproduced by hand (reproduction rate about 20%).

After writing data to a degraded volume, detaching and re-attaching it, and then re-enabling scheduling on the node that made the volume degraded, an extra failed replica sometimes remains and causes the volume to fail scheduling.

(Screenshot attached: Screenshot_20211104_171849)

To Reproduce
Steps to reproduce the behavior:

  1. Set up Longhorn with 3 nodes
  2. Set 'Allow Volume Creation with Degraded Availability' to True
  3. Set 'Replica Node Level Soft Anti-Affinity' to False
  4. Disable scheduling on node 3
  5. Create a volume with 3 replicas and attach it to node 3
  6. Write data to the volume
  7. Detach the volume and re-attach it to node 3
  8. Enable scheduling on node 3

Expected behavior
The volume should become schedulable, but sometimes an extra failed replica remains and leaves the volume unschedulable.

Log
longhorn-support-bundle_8f6ba9b9-f232-4b14-a2e0-794fc8caa96f_2021-11-04T09-09-49Z.zip

Log related to failed replica name from support-bundle:

logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:19.870964937+08:00 time="2021-11-04T09:08:19Z" level=debug msg="Schedule replica bak-r-4dc6dc9f to node controlplane, disk 1029d7d4-23de-4ae4-979b-af635d42fb84, diskPath /var/lib/longhorn/, dataDirectoryName bak-ea9cb833"
logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:19.871172716+08:00 time="2021-11-04T09:08:19Z" level=debug msg="Replica controller picked up" controller=longhorn-replica controllerID=worker1 dataPath= node=worker1 nodeID= ownerID= replica=bak-r-4dc6dc9f
logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:19.871186477+08:00 time="2021-11-04T09:08:19Z" level=debug msg="Instance handler updated instance bak-r-4dc6dc9f state, old state , new state stopped"
logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:20.148771896+08:00 time="2021-11-04T09:08:20Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:20.148785425+08:00 time="2021-11-04T09:08:20Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker1 owner=worker1 replica=bak-r-4dc6dc9f state= volume=bak
logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:21.249170465+08:00 time="2021-11-04T09:08:21Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:21.249186895+08:00 time="2021-11-04T09:08:21Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker1 owner=worker1 replica=bak-r-4dc6dc9f state=detached volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.063075157+08:00 time="2021-11-04T09:08:27Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.063089392+08:00 time="2021-11-04T09:08:27Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=detached volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.287826382+08:00 time="2021-11-04T09:08:27Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.287874556+08:00 time="2021-11-04T09:08:27Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attaching volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.481191416+08:00 time="2021-11-04T09:08:27Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.481211134+08:00 time="2021-11-04T09:08:27Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attaching volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.753147497+08:00 time="2021-11-04T09:08:27Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.753169509+08:00 time="2021-11-04T09:08:27Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attaching volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.857031317+08:00 time="2021-11-04T09:08:27Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.857048612+08:00 time="2021-11-04T09:08:27Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attaching volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:28.083774813+08:00 time="2021-11-04T09:08:28Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:28.083793484+08:00 time="2021-11-04T09:08:28Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attaching volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:30.083639585+08:00 time="2021-11-04T09:08:30Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:30.083662590+08:00 time="2021-11-04T09:08:30Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attaching volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:09:51.013158071+08:00 time="2021-11-04T09:09:51Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attached volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:09:51.814029354+08:00 time="2021-11-04T09:09:51Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:09:51.814111844+08:00 time="2021-11-04T09:09:51Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attached volume=bak
yamls/longhorn/replicas.yaml:    name: bak-r-4dc6dc9f

Environment:
Longhorn master-head

@chriscchien chriscchien changed the title Fix it if it's automation issue or issue an bug to developer [Bug] Degraded volume generate failed replica make volume unschedulable Nov 4, 2021
@chriscchien chriscchien added kind/bug reproduce/often 80 - 50% reproducible priority/2 Nice to fix in this release (managed by PO) kind/regression Regression which has worked before severity/3 Function working but has a major issue w/ workaround and removed priority/2 Nice to fix in this release (managed by PO) labels Nov 4, 2021
@innobead innobead added this to the v1.3.0 milestone Nov 4, 2021
@innobead
Member

innobead commented Nov 4, 2021

cc @longhorn/qa

@chriscchien
Contributor Author

Update: This issue is still present in recent builds and makes the test case flaky.

@innobead innobead added priority/1 Highly recommended to fix in this release (managed by PO) priority/0 Must be fixed in this release (managed by PO) backport-needed/TBD severity/2 Function working but has a major issue w/o workaround (a major incident with significant impact) and removed priority/1 Highly recommended to fix in this release (managed by PO) severity/3 Function working but has a major issue w/ workaround labels Jun 7, 2022
@innobead innobead assigned derekbit and unassigned PhanLe1010 Jun 7, 2022
@innobead innobead added priority/1 Highly recommended to fix in this release (managed by PO) and removed priority/0 Must be fixed in this release (managed by PO) labels Jun 7, 2022
@derekbit
Member

derekbit commented Jun 8, 2022

cleanupCorruptedOrStaleReplicas is responsible for deleting stale or failed replicas.
https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L780
The deletion is not triggered because r.Spec.RebuildRetryCount is 0.
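
A minimal Go sketch of the gating described above (not the actual longhorn-manager implementation; the maxRebuildRetry constant, the helper name, and the struct layout are illustrative assumptions) to show why a failed replica whose RebuildRetryCount is still 0 never gets deleted:

```go
package main

import "fmt"

// maxRebuildRetry is an assumed retry limit; per the comment below, a replica
// replenished during rebuilding starts with RebuildRetryCount 5, while a
// normal replica starts at 0.
const maxRebuildRetry = 5

type Replica struct {
	Name              string
	FailedAt          string // non-empty means the replica has failed
	RebuildRetryCount int
}

// cleanupStaleReplicas is an illustrative stand-in for
// cleanupCorruptedOrStaleReplicas: a failed replica is only removed once its
// rebuild retries are exhausted, so a failed replica with RebuildRetryCount 0,
// like the one in this issue, is kept and the volume stays unschedulable.
func cleanupStaleReplicas(replicas []*Replica) []*Replica {
	kept := make([]*Replica, 0, len(replicas))
	for _, r := range replicas {
		if r.FailedAt != "" && r.RebuildRetryCount >= maxRebuildRetry {
			fmt.Printf("deleting failed replica %s (rebuild retries exhausted)\n", r.Name)
			continue
		}
		kept = append(kept, r)
	}
	return kept
}

func main() {
	replicas := []*Replica{
		{Name: "bak-r-4dc6dc9f", FailedAt: "2021-11-04T09:08:20Z", RebuildRetryCount: 0},
	}
	fmt.Println("replicas left after cleanup:", len(cleanupStaleReplicas(replicas)))
}
```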

@derekbit
Member

derekbit commented Jun 8, 2022

A replica's Spec.RebuildRetryCount defaults to 0, but a replica that is replenished during rebuilding gets the value 5.

In this test case, there is a failed replica, and a new replica is then created to replenish it during rebuilding. The scheduler periodically tries to schedule the replicas to nodes/disks.

If the failed replica is scheduled first and succeeds, the newly created one can be cleaned up by the existing logic.

If the new replica is scheduled first, the failed one can be neither scheduled nor cleaned up, and the volume becomes unschedulable.

So, the flaky result comes from the scheduling order of the replicas in
https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L1059

I have confirmed the root cause by sorting the replicas by their creationTimestamp, so that the newly created replica is not scheduled first and can be cleaned up.
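
For reference, a minimal sketch of the ordering experiment mentioned above, assuming the controller iterates a map of replicas and using illustrative types rather than the real longhorn-manager structs; sorting by creationTimestamp lets the older failed replica reach the scheduler first, so the newly replenished one stays unscheduled and can be cleaned up:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// replicaInfo is an illustrative stand-in for the replica objects the volume
// controller iterates over; only the fields needed for ordering are shown.
type replicaInfo struct {
	Name              string
	CreationTimestamp time.Time
}

// sortByCreation returns the replicas ordered oldest first, so the original
// failed replica is handed to the scheduler before the newly replenished one.
// Go map iteration order is not deterministic, which would explain why the
// scheduling order, and therefore the test result, is flaky.
func sortByCreation(replicas map[string]*replicaInfo) []*replicaInfo {
	ordered := make([]*replicaInfo, 0, len(replicas))
	for _, r := range replicas {
		ordered = append(ordered, r)
	}
	sort.Slice(ordered, func(i, j int) bool {
		return ordered[i].CreationTimestamp.Before(ordered[j].CreationTimestamp)
	})
	return ordered
}

func main() {
	now := time.Now()
	replicas := map[string]*replicaInfo{
		"bak-r-new":      {Name: "bak-r-new", CreationTimestamp: now},
		"bak-r-4dc6dc9f": {Name: "bak-r-4dc6dc9f", CreationTimestamp: now.Add(-time.Minute)},
	}
	for _, r := range sortByCreation(replicas) {
		fmt.Println(r.Name) // prints the older (failed) replica first
	}
}
```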

@innobead
Member

innobead commented Jun 8, 2022

@derekbit is this a regression, or an existing issue that has been around for a long while (since 1.2.x)?

@derekbit
Member

derekbit commented Jun 8, 2022

After checking the logic in v1.2.x, it's an existing issue there as well.
The v1.2.x e2e tests also hit this issue occasionally.
https://ci.longhorn.io/job/public/job/v1.2.x/job/v1.2.x-longhorn-tests-sles-amd64/96/

@innobead innobead added backport/1.2.5 and removed kind/regression Regression which has worked before labels Jun 8, 2022
derekbit added a commit to derekbit/longhorn-manager that referenced this issue Jun 8, 2022
In the test case longhorn/longhorn#3220 (comment),
a replica cannot be scheduled to a node, but its spec.failedAt is set in
https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L1317.
The strict constraints of cleanupFailedToScheduledReplicas() result in the
failed replica not being cleaned up.

Longhorn 3220

Signed-off-by: Derek Su <derek.su@suse.com>
@longhorn-io-github-bot

longhorn-io-github-bot commented Jun 9, 2022

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

  • [ ] Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Does the PR include the explanation for the fix or the feature?

  • ~~[ ] Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?~~
    The PR for the YAML change is at:
    The PR for the chart change is at:

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at

longhorn/longhorn-manager#1371

  • Which areas/issues this PR might have potential impacts on?
    Area: replica scheduling
    Issues

  • [ ] If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at

  • [ ] If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    The UI issue/PR is at

  • [ ] If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

  • [ ] If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at
    The issue of automation test case implementation is at (please create by the template)

  • [ ] If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at

  • [ ] If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at

  • [ ] If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

innobead pushed a commit to longhorn/longhorn-manager that referenced this issue Jun 9, 2022
In the test case longhorn/longhorn#3220 (comment),
a replica cannot be scheduled to a node, but its spec.failedAt is set in
https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L1317.
The strict constraints of cleanupFailedToScheduledReplicas() result in the
failed replica not being cleaned up.

Longhorn 3220

Signed-off-by: Derek Su <derek.su@suse.com>
@chriscchien
Contributor Author

Closing this ticket because the issue no longer happens in recent builds after the fix was merged.

derekbit added a commit to derekbit/longhorn-manager that referenced this issue Oct 21, 2022
In the test case longhorn/longhorn#3220 (comment),
a replica cannot be scheduled to a node, but its spec.failedAt is set in
https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L1317.
The strict constraints of cleanupFailedToScheduledReplicas() result in the
failed replica not being cleaned up.

Longhorn 3220

Signed-off-by: Derek Su <derek.su@suse.com>
(cherry picked from commit fedb7eb)
innobead pushed a commit to longhorn/longhorn-manager that referenced this issue Oct 21, 2022
In the test case longhorn/longhorn#3220 (comment),
a replica cannot be scheduled to a node, but its spec.failedAt is set in
https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L1317.
The strict constraints of cleanupFailedToScheduledReplicas() result in the
failed replica not being cleaned up.

Longhorn 3220

Signed-off-by: Derek Su <derek.su@suse.com>
(cherry picked from commit fedb7eb)