iSCSI fails to setup a volume when one target is down #74305
Labels
kind/bug
Categorizes issue or PR as related to a bug.
sig/storage
Categorizes an issue or PR as relevant to SIG Storage.
If one target of a multipath volume goes down and a pod that uses the volume is deleted and quickly recreated (e.g. by a Deployment), the new pod never starts.
How to reproduce it (as minimally and precisely as possible):
Run a Deployment with a single replica and one multipath iSCSI volume.
Destroy one target (e.g. switch the machine off). Multipath will mark the appropriate path as failed:
Delete the deployment pod.
Kubernetes will quickly start a new pod for the Deployment.
This new pod will be ContainerCreating (or Error) forever.
What happened:
Kubelet tries to WaitForAttach / SetUp the volume for the new pod. The volume is still attached. Kubelet goes through all targets and tries to find out whether it is logged in to each of them. Eventually it tries to read
/sys/class/iscsi_host/host415/device/session383/connection383:0/iscsi_connection/connection383:0/address
of the failed path, and that read returns an error:
kubernetes/pkg/volume/util/device_util_linux.go
Lines 164 to 168 in 2f52e91
What you expected to happen:
Kubelet assumes that the path is failed / not logged in and tries to recover it. If recovery fails, it continues with multipath using the 2 available devices and 1 failed one.
IMO, a simple continue instead of return will do the job just fine.
/assign
/sig storage
cc @bswartz @j-griffith @redbaron for iSCSI expertise