[BUG] After adding taint to a node, volume cannot be attached to any other node #2475

PhanLe1010 · 2021-04-12T22:58:26Z

Describe the bug
The volume gets stuck in the attaching state after a taint is set on the node hosting one of its replicas.

To Reproduce

Create a cluster of 3 Longhorn nodes.
Create a volume with 3 replicas.
Set the taint f:b=NoExecute on one of the nodes.
Wait for Kubernetes to evict the workloads from the tainted node.
Attach the volume. Observe that the volume is stuck in the attaching state.
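Why the taint matters: a NoExecute taint evicts every pod on the node that does not tolerate it, which presumably includes the Longhorn instance manager pod here and leaves that instance manager in the error state discussed below. The sketch uses simplified, hypothetical stand-ins for Kubernetes' Taint and Toleration types and mirrors only the basic matching rules (tolerationSeconds is ignored).

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for Kubernetes' Taint and Toleration.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates: an empty Effect matches any effect, an empty Key matches any key,
// Exists ignores the value, and Equal (or an empty operator) requires equal values.
func tolerates(tol Toleration, t Taint) bool {
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != t.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == t.Value
	default:
		return false
	}
}

// shouldEvict reports whether a pod with these tolerations would be evicted
// by the NoExecute taints on its node.
func shouldEvict(tols []Toleration, taints []Taint) bool {
	for _, t := range taints {
		if t.Effect != "NoExecute" {
			continue
		}
		tolerated := false
		for _, tol := range tols {
			if tolerates(tol, t) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return true
		}
	}
	return false
}

func main() {
	taints := []Taint{{Key: "f", Value: "b", Effect: "NoExecute"}}
	fmt.Println(shouldEvict(nil, taints)) // true: no toleration, pod is evicted
	tol := Toleration{Key: "f", Operator: "Equal", Value: "b", Effect: "NoExecute"}
	fmt.Println(shouldEvict([]Toleration{tol}, taints)) // false: pod stays
}
```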

Expected behavior
Longhorn should skip the replica on the tainted node and finish attaching the volume.

Environment:

Longhorn version: Longhorn master version - 04/12/21

Additional context
The replica controller always sets the replica state to stopped (status.CurrentState = types.InstanceStateStopped) when the corresponding instance manager is in the error state. Meanwhile, the volume controller waits until all replicas are in the running state before it finishes attaching the volume, so the attach never completes.
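The deadlock can be reproduced with a toy model of the two controllers. This is a hypothetical simplification, not Longhorn's code: the types and functions are invented for illustration and only the stopped-on-error rule and the wait-for-all-running rule come from the description above.

```go
package main

import "fmt"

type InstanceState string

const (
	InstanceStateRunning InstanceState = "running"
	InstanceStateStopped InstanceState = "stopped"
	InstanceStateError   InstanceState = "error"
)

// Replica is a hypothetical, minimal view of a replica.
type Replica struct {
	Name         string
	IMState      InstanceState // state of the instance manager hosting the replica
	CurrentState InstanceState
}

// syncReplica mimics the replica controller: an errored instance manager
// always forces the replica to stopped.
func syncReplica(r *Replica) {
	if r.IMState == InstanceStateError {
		r.CurrentState = InstanceStateStopped
		return
	}
	r.CurrentState = InstanceStateRunning
}

// attachFinished mimics the volume controller: attaching completes only
// when every replica is running.
func attachFinished(rs []*Replica) bool {
	for _, r := range rs {
		if r.CurrentState != InstanceStateRunning {
			return false
		}
	}
	return true
}

func main() {
	replicas := []*Replica{
		{Name: "r1", IMState: InstanceStateRunning},
		{Name: "r2", IMState: InstanceStateRunning},
		{Name: "r3", IMState: InstanceStateError}, // on the tainted node
	}
	// No number of reconcile loops changes the outcome.
	for i := 0; i < 3; i++ {
		for _, r := range replicas {
			syncReplica(r)
		}
		fmt.Println("attached:", attachFinished(replicas)) // prints "attached: false" every time
	}
}
```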

The text was updated successfully, but these errors were encountered:

PhanLe1010 · 2021-04-12T23:02:17Z

Proposal:

We need to differentiate 2 cases when the Instance Manager gets into the error state:

If the volume is detached and isn't attaching, the replica should stay in the stopped state.
If the volume is attaching, the replica should be set to a different state so that the volume controller can ignore it.
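A minimal sketch of the two cases in the proposal, assuming a hypothetical distinct state (called StateError here) that the volume controller skips. The names are invented for illustration, and treating a replica on a healthy instance manager as simply running is a simplification.

```go
package main

import "fmt"

type InstanceState string

const (
	StateRunning InstanceState = "running"
	StateStopped InstanceState = "stopped"
	// StateError is a hypothetical distinct state the volume controller ignores.
	StateError InstanceState = "error"
)

// nextReplicaState applies the proposed rule for a replica whose instance
// manager may be in the error state.
func nextReplicaState(imErrored, volumeAttaching bool) InstanceState {
	switch {
	case !imErrored:
		return StateRunning // simplification: a healthy IM runs the replica
	case volumeAttaching:
		return StateError // let the volume controller skip this replica
	default:
		return StateStopped // volume detached and not attaching: stay stopped
	}
}

// attachFinished waits only for replicas that are not in StateError and
// requires at least one usable replica.
func attachFinished(states []InstanceState) bool {
	usable := 0
	for _, s := range states {
		if s == StateError {
			continue
		}
		if s != StateRunning {
			return false
		}
		usable++
	}
	return usable > 0
}

func main() {
	states := []InstanceState{
		nextReplicaState(false, true),
		nextReplicaState(false, true),
		nextReplicaState(true, true), // replica on the tainted node
	}
	fmt.Println(states, attachFinished(states)) // prints: [running running error] true
}
```

Requiring at least one usable replica keeps the sketch from "finishing" an attach in which every replica was skipped.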

innobead · 2021-04-13T09:21:00Z

This is not a regression and it's a day 1 issue.

@shuo-wu "There is no similar issue with the node down case. We can add the similar logic for this case:"
https://github.com/longhorn/longhorn-manager/blob/9da8dee282c552504b4e4aed8dd398bfdaf2993c/controller/volume_controller.go#L1263-L1271
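The node-down handling at the linked lines is not reproduced here. As a rough, hypothetical sketch of "adding similar logic", the volume controller could treat a replica whose instance manager is in error while its node is still up the same way it treats a replica on a down node: skip it when deciding whether the attach can finish. All field and function names below are invented for illustration.

```go
package main

import "fmt"

// replicaView is a hypothetical, flattened view of what the volume
// controller knows about one replica.
type replicaView struct {
	name    string
	nodeUp  bool // is the replica's node up?
	imError bool // is the replica's instance manager in the error state?
	running bool // is the replica instance running?
}

// skippable treats an errored instance manager on a live node like a down node.
func skippable(r replicaView) bool {
	return !r.nodeUp || r.imError
}

// readyToAttach requires every non-skipped replica to be running and at
// least one usable replica to remain.
func readyToAttach(rs []replicaView) bool {
	usable := 0
	for _, r := range rs {
		if skippable(r) {
			continue
		}
		if !r.running {
			return false
		}
		usable++
	}
	return usable > 0
}

func main() {
	rs := []replicaView{
		{name: "r1", nodeUp: true, running: true},
		{name: "r2", nodeUp: true, running: true},
		{name: "r3", nodeUp: true, imError: true}, // on the tainted node
	}
	fmt.Println(readyToAttach(rs)) // true
}
```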

longhorn-io-github-bot · 2021-04-14T01:58:58Z

Pre Ready-For-Testing Checklist

meldafrawi · 2021-04-15T15:13:28Z

Validation: PASSED

PhanLe1010 added the kind/bug label Apr 12, 2021

PhanLe1010 changed the title from "[BUG] After setting taint for a node, volume cannot be attached to any node" to "[BUG] After adding taint to a node, volume cannot be attached to any other node" Apr 12, 2021

innobead added the severity/1 Function broken (a critical incident with very high impact (ex: data corruption, failed upgrade) label Apr 13, 2021

khushboo-rancher added the kind/regression Regression which has worked before label Apr 13, 2021

innobead assigned PhanLe1010 Apr 13, 2021

innobead added this to the v1.1.1 milestone Apr 13, 2021

innobead removed the kind/regression Regression which has worked before label Apr 13, 2021

innobead added the priority/2 Nice to implement or fix in this release (managed by PO) label Apr 13, 2021

PhanLe1010 mentioned this issue Apr 14, 2021

Fix volume cannot be attached when one of the replicas has error IM and the node of IM is still running longhorn/longhorn-manager#873

Merged

innobead added the component/longhorn-manager Longhorn manager (control plane) label Apr 14, 2021

cclhsu mentioned this issue Apr 14, 2021

[QUESTION] Deploying longhorn instance-manager in rke cluster failed. #2488

Closed

meldafrawi closed this as completed Apr 15, 2021

joshimoo mentioned this issue Aug 26, 2021

[BUG] Node deletion leads volume to get stuck in attaching state #2848

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] After adding taint to a node, volume cannot be attached to any other node #2475

[BUG] After adding taint to a node, volume cannot be attached to any other node #2475

PhanLe1010 commented Apr 12, 2021 •

edited

Loading

PhanLe1010 commented Apr 12, 2021 •

edited

Loading

innobead commented Apr 13, 2021 •

edited

Loading

longhorn-io-github-bot commented Apr 14, 2021 •

edited by PhanLe1010

Loading

meldafrawi commented Apr 15, 2021

[BUG] After adding taint to a node, volume cannot be attached to any other node #2475

[BUG] After adding taint to a node, volume cannot be attached to any other node #2475

Comments

PhanLe1010 commented Apr 12, 2021 • edited Loading

PhanLe1010 commented Apr 12, 2021 • edited Loading

innobead commented Apr 13, 2021 • edited Loading

longhorn-io-github-bot commented Apr 14, 2021 • edited by PhanLe1010 Loading

Pre Ready-For-Testing Checklist

meldafrawi commented Apr 15, 2021

PhanLe1010 commented Apr 12, 2021 •

edited

Loading

PhanLe1010 commented Apr 12, 2021 •

edited

Loading

innobead commented Apr 13, 2021 •

edited

Loading

longhorn-io-github-bot commented Apr 14, 2021 •

edited by PhanLe1010

Loading