iSCSI fails to setup a volume when one target is down #74305

Closed
jsafrane opened this issue Feb 20, 2019 · 1 comment · Fixed by #74306
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@jsafrane
Member

If one target of a multipath volume goes down and a pod that uses the volume is deleted and quickly recreated (e.g. by a Deployment), the new pod never starts.

How to reproduce it (as minimally and precisely as possible):

  1. Run a Deployment with a single replica and one multipath iSCSI volume.

  2. Destroy one target (e.g. switch the machine off). Multipath will mark the corresponding path as failed:

$ multipath -ll
mpatha (36001405ecb7e7041b044ba9805a1628b) dm-1 LIO-ORG ,TCMU device     
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 14:0:0:0 sda 8:0  failed faulty running                 <----
|-+- policy='round-robin 0' prio=10 status=active
| `- 15:0:0:0 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 16:0:0:0 sdc 8:32 active ready running
  3. Delete the deployment pod.

  4. Kubernetes will quickly start a new pod for the Deployment.

  5. This new pod will be ContainerCreating (or Error) forever.

What happened:
Kubelet tries to WaitForAttach / SetUp the volume for the new pod. The volume is still attached. Kubelet goes through all targets and checks whether each one is logged in. Eventually it tries to read /sys/class/iscsi_host/host415/device/session383/connection383:0/iscsi_connection/connection383:0/address of the failed path, and that read returns an error:

addrPath := connectionPath + "/address"
addr, err := io.ReadFile(addrPath)
if err != nil {
	return nil, err
}

What you expected to happen:
Kubelet should treat the path as failed / not logged in and try to recover it. If recovery fails, it should continue with the multipath device, using the 2 available paths and 1 failed one.

IMO, a simple continue instead of return will do the job just fine.

/assign
/sig storage
cc @bswartz @j-griffith @redbaron for iSCSI expertise

@jsafrane jsafrane added the kind/bug Categorizes issue or PR as related to a bug. label Feb 20, 2019
@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Feb 20, 2019
@jsafrane
Member Author

Fix: #74306
