[Bug] Degraded volume generate failed replica make volume unschedulable #3220

Closed
Tracked by #3219
chriscchien opened this issue Nov 4, 2021 · 8 comments
Labels
backport/1.2.6 kind/bug priority/1 Highly recommended to fix in this release (managed by PO) reproduce/often 80 - 50% reproducible severity/2 Function working but has a major issue w/o workaround (a major incident with significant impact)

Comments

@chriscchien
Contributor

chriscchien commented Nov 4, 2021

Describe the bug
This was found by the automation test case test_basic.py::test_allow_volume_creation_with_degraded_availability and can also be reproduced by hand (reproduction rate about 20%).

After writing data to a degraded volume, detaching and re-attaching it, and then re-enabling scheduling on the node that made the volume degraded, an extra failed replica sometimes remains and causes the volume to fail scheduling.

(Screenshot attached: Screenshot_20211104_171849)

To Reproduce
Steps to reproduce the behavior:

  1. Set up Longhorn with 3 nodes
  2. Set 'Allow Volume Creation with Degraded Availability' to True
  3. Set 'Replica Node Level Soft Anti-Affinity' to False
  4. Disable scheduling on node 3
  5. Create a volume with 3 replicas and attach it to node 3
  6. Write data to the volume
  7. Detach the volume and re-attach it to node 3
  8. Enable scheduling on node 3

Expected behavior
The volume should become schedulable, but sometimes an extra failed replica remains and leaves the volume unschedulable.

Log
longhorn-support-bundle_8f6ba9b9-f232-4b14-a2e0-794fc8caa96f_2021-11-04T09-09-49Z.zip

Log related to failed replica name from support-bundle:

logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:19.870964937+08:00 time="2021-11-04T09:08:19Z" level=debug msg="Schedule replica bak-r-4dc6dc9f to node controlplane, disk 1029d7d4-23de-4ae4-979b-af635d42fb84, diskPath /var/lib/longhorn/, dataDirectoryName bak-ea9cb833"
logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:19.871172716+08:00 time="2021-11-04T09:08:19Z" level=debug msg="Replica controller picked up" controller=longhorn-replica controllerID=worker1 dataPath= node=worker1 nodeID= ownerID= replica=bak-r-4dc6dc9f
logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:19.871186477+08:00 time="2021-11-04T09:08:19Z" level=debug msg="Instance handler updated instance bak-r-4dc6dc9f state, old state , new state stopped"
logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:20.148771896+08:00 time="2021-11-04T09:08:20Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:20.148785425+08:00 time="2021-11-04T09:08:20Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker1 owner=worker1 replica=bak-r-4dc6dc9f state= volume=bak
logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:21.249170465+08:00 time="2021-11-04T09:08:21Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-vmb6c/longhorn-manager.log:2021-11-04T17:08:21.249186895+08:00 time="2021-11-04T09:08:21Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker1 owner=worker1 replica=bak-r-4dc6dc9f state=detached volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.063075157+08:00 time="2021-11-04T09:08:27Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.063089392+08:00 time="2021-11-04T09:08:27Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=detached volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.287826382+08:00 time="2021-11-04T09:08:27Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.287874556+08:00 time="2021-11-04T09:08:27Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attaching volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.481191416+08:00 time="2021-11-04T09:08:27Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.481211134+08:00 time="2021-11-04T09:08:27Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attaching volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.753147497+08:00 time="2021-11-04T09:08:27Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.753169509+08:00 time="2021-11-04T09:08:27Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attaching volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.857031317+08:00 time="2021-11-04T09:08:27Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:27.857048612+08:00 time="2021-11-04T09:08:27Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attaching volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:28.083774813+08:00 time="2021-11-04T09:08:28Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:28.083793484+08:00 time="2021-11-04T09:08:28Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attaching volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:30.083639585+08:00 time="2021-11-04T09:08:30Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:08:30.083662590+08:00 time="2021-11-04T09:08:30Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attaching volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:09:51.013158071+08:00 time="2021-11-04T09:09:51Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attached volume=bak
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:09:51.814029354+08:00 time="2021-11-04T09:09:51Z" level=error msg="There's no available disk for replica bak-r-4dc6dc9f, size 5368709120"
logs/longhorn-manager-l7zl4/longhorn-manager.log:2021-11-04T17:09:51.814111844+08:00 time="2021-11-04T09:09:51Z" level=error msg="unable to schedule replica" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=worker2 owner=worker2 replica=bak-r-4dc6dc9f state=attached volume=bak
yamls/longhorn/replicas.yaml:    name: bak-r-4dc6dc9f

Environment:
Longhorn master-head

@chriscchien chriscchien changed the title Fix it if it's automation issue or issue an bug to developer [Bug] Degraded volume generate failed replica make volume unschedulable Nov 4, 2021
@chriscchien chriscchien added kind/bug reproduce/often 80 - 50% reproducible priority/2 Nice to fix in this release (managed by PO) kind/regression Regression which has worked before severity/3 Function working but has a major issue w/ workaround and removed priority/2 Nice to fix in this release (managed by PO) labels Nov 4, 2021
@innobead innobead added this to the v1.3.0 milestone Nov 4, 2021
@innobead
Member

innobead commented Nov 4, 2021

cc @longhorn/qa

@chriscchien
Contributor Author

Update: This issue is still present in recent builds and makes the test case flaky.

@innobead innobead added priority/1 Highly recommended to fix in this release (managed by PO) priority/0 Must be fixed in this release (managed by PO) backport-needed/TBD severity/2 Function working but has a major issue w/o workaround (a major incident with significant impact) and removed priority/1 Highly recommended to fix in this release (managed by PO) severity/3 Function working but has a major issue w/ workaround labels Jun 7, 2022
@innobead innobead assigned derekbit and unassigned PhanLe1010 Jun 7, 2022
@innobead innobead added priority/1 Highly recommended to fix in this release (managed by PO) and removed priority/0 Must be fixed in this release (managed by PO) labels Jun 7, 2022
@derekbit
Member

derekbit commented Jun 8, 2022

cleanupCorruptedOrStaleReplicas is responsible for deleting stale or failed replicas.
https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L780
The deletion is not triggered because r.Spec.RebuildRetryCount is 0.
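
A minimal Go sketch of the gating described above (not the actual longhorn-manager implementation; the maxRebuildRetry constant, the helper name, and the struct layout are illustrative assumptions) to show why a failed replica whose RebuildRetryCount is still 0 never gets deleted:

```go
package main

import "fmt"

// maxRebuildRetry is an assumed retry limit; per the comment below, a replica
// replenished during rebuilding starts with RebuildRetryCount 5, while a
// normal replica starts at 0.
const maxRebuildRetry = 5

type Replica struct {
	Name              string
	FailedAt          string // non-empty means the replica has failed
	RebuildRetryCount int
}

// cleanupStaleReplicas is an illustrative stand-in for
// cleanupCorruptedOrStaleReplicas: a failed replica is only removed once its
// rebuild retries are exhausted, so a failed replica with RebuildRetryCount 0,
// like the one in this issue, is kept and the volume stays unschedulable.
func cleanupStaleReplicas(replicas []*Replica) []*Replica {
	kept := make([]*Replica, 0, len(replicas))
	for _, r := range replicas {
		if r.FailedAt != "" && r.RebuildRetryCount >= maxRebuildRetry {
			fmt.Printf("deleting failed replica %s (rebuild retries exhausted)\n", r.Name)
			continue
		}
		kept = append(kept, r)
	}
	return kept
}

func main() {
	replicas := []*Replica{
		{Name: "bak-r-4dc6dc9f", FailedAt: "2021-11-04T09:08:20Z", RebuildRetryCount: 0},
	}
	fmt.Println("replicas left after cleanup:", len(cleanupStaleReplicas(replicas)))
}
```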

@derekbit
Member

derekbit commented Jun 8, 2022

A replica's Spec.RebuildRetryCount defaults to 0, but a replica that is replenished during rebuilding gets the value 5.

In this test case, there is a failed replica, and a new replica is then created to replenish it during rebuilding. The scheduler periodically tries to schedule the replicas to nodes/disks.

If the failed replica is scheduled first and succeeds, the newly created one can be cleaned up by the existing logic.

If the new replica is scheduled first, the failed one can be neither scheduled nor cleaned up, and the volume becomes unschedulable.

So, the flaky result comes from the scheduling order of the replicas in
https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L1059

I have confirmed the root cause by sorting the replicas by their creationTimestamp, so that the newly created replica is not scheduled first and can be cleaned up.
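
For reference, a minimal sketch of the ordering experiment mentioned above, assuming the controller iterates a map of replicas and using illustrative types rather than the real longhorn-manager structs; sorting by creationTimestamp lets the older failed replica reach the scheduler first, so the newly replenished one stays unscheduled and can be cleaned up:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// replicaInfo is an illustrative stand-in for the replica objects the volume
// controller iterates over; only the fields needed for ordering are shown.
type replicaInfo struct {
	Name              string
	CreationTimestamp time.Time
}

// sortByCreation returns the replicas ordered oldest first, so the original
// failed replica is handed to the scheduler before the newly replenished one.
// Go map iteration order is not deterministic, which would explain why the
// scheduling order, and therefore the test result, is flaky.
func sortByCreation(replicas map[string]*replicaInfo) []*replicaInfo {
	ordered := make([]*replicaInfo, 0, len(replicas))
	for _, r := range replicas {
		ordered = append(ordered, r)
	}
	sort.Slice(ordered, func(i, j int) bool {
		return ordered[i].CreationTimestamp.Before(ordered[j].CreationTimestamp)
	})
	return ordered
}

func main() {
	now := time.Now()
	replicas := map[string]*replicaInfo{
		"bak-r-new":      {Name: "bak-r-new", CreationTimestamp: now},
		"bak-r-4dc6dc9f": {Name: "bak-r-4dc6dc9f", CreationTimestamp: now.Add(-time.Minute)},
	}
	for _, r := range sortByCreation(replicas) {
		fmt.Println(r.Name) // prints the older (failed) replica first
	}
}
```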

@innobead
Member

innobead commented Jun 8, 2022

@derekbit is this a regression, or an existing issue that has been around for a long while (since 1.2.x)?

@derekbit
Member

derekbit commented Jun 8, 2022

After checking the logic in v1.2.x, it's an existing issue there as well.
The v1.2.x e2e tests also hit this issue occasionally.
https://ci.longhorn.io/job/public/job/v1.2.x/job/v1.2.x-longhorn-tests-sles-amd64/96/

@innobead innobead added backport/1.2.5 and removed kind/regression Regression which has worked before labels Jun 8, 2022
derekbit added a commit to derekbit/longhorn-manager that referenced this issue Jun 8, 2022
In the test case longhorn/longhorn#3220 (comment),
a replica cannot be scheduled to a node, but its spec.failedAt is set in
https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L1317.
The strict constraints of cleanupFailedToScheduledReplicas() result in the
failed replica not being cleaned up.

Longhorn 3220

Signed-off-by: Derek Su <derek.su@suse.com>
@longhorn-io-github-bot

longhorn-io-github-bot commented Jun 9, 2022

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

  • [ ] Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Does the PR include the explanation for the fix or the feature?

  • ~~[ ] Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?~~
    The PR for the YAML change is at:
    The PR for the chart change is at:

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at

longhorn/longhorn-manager#1371

  • Which areas/issues this PR might have potential impacts on?
    Area: replica scheduling
    Issues

  • [ ] If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at

  • [ ] If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    The UI issue/PR is at

  • [ ] If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

  • [ ] If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at
    The issue of automation test case implementation is at (please create by the template)

  • [ ] If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at

  • [ ] If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at

  • [ ] If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

innobead pushed a commit to longhorn/longhorn-manager that referenced this issue Jun 9, 2022
In the test case longhorn/longhorn#3220 (comment),
a replica cannot be scheduled to a node, but its spec.failedAt is set in
https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L1317.
The strict constraints of cleanupFailedToScheduledReplicas() result in the
failed replica not being cleaned up.

Longhorn 3220

Signed-off-by: Derek Su <derek.su@suse.com>
@chriscchien
Contributor Author

Closing this ticket because the issue no longer happens in recent builds after the fix was merged.

derekbit added a commit to derekbit/longhorn-manager that referenced this issue Oct 21, 2022
In the test case longhorn/longhorn#3220 (comment),
a replica cannot be scheduled to a node, but its spec.failedAt is set in
https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L1317.
The strict constraints of cleanupFailedToScheduledReplicas() result in the
failed replica not being cleaned up.

Longhorn 3220

Signed-off-by: Derek Su <derek.su@suse.com>
(cherry picked from commit fedb7eb)
innobead pushed a commit to longhorn/longhorn-manager that referenced this issue Oct 21, 2022
In the test case longhorn/longhorn#3220 (comment),
a replica cannot be scheduled to a node, but its spec.failedAt is set in
https://github.com/longhorn/longhorn-manager/blob/master/controller/volume_controller.go#L1317.
The strict constraints of cleanupFailedToScheduledReplicas() result in the
failed replica not being cleaned up.

Longhorn 3220

Signed-off-by: Derek Su <derek.su@suse.com>
(cherry picked from commit fedb7eb)