For a complete list of issues, please check our GitHub issues page. If you notice an issue that is not listed there, please file an issue on the GitHub repository.
Please refer to the release notes to learn about the known issues in each release.
The following is a list of issues observed in Kubernetes.
Multi-Attach error for RWO (Block) volume when Node VM is shutdown before Pods are evicted and Volumes are detached from Node VM
Note: This issue is present in all Kubernetes releases.
- Impact: After a Node is shut down, the Pod running on that Node does not come up on a new Node. Events on the Pod will show a "FailedAttachVolume" warning with the error message: `Multi-Attach error for volume "pvc-uuid" Volume is already exclusively attached to one node and can't be attached to another`.
- Upstream Issue: Kubernetes is being enhanced to fix this issue. Refer to the Kubernetes Enhancement Proposal (KEP) PR kubernetes/enhancements#1116 for more detail.
- Workaround:
- Pods stuck in this state can be recovered with the following steps.
- Find the Node VM in the vCenter inventory. Make sure the correct VM associated with the Node is used for the following steps.
- Detach all Persistent Volume disks attached to this Node VM. Note: Do not detach the primary disks used by the guest OS.
- Right-click the virtual machine in the inventory and select Edit Settings.
- Under Virtual Hardware, find all Hard Disks for Persistent Volumes and remove them. (Do not select "Delete files from datastore".)
- Click OK to reconfigure the VM and detach all Persistent Volume disks from the shut-down/powered-off Node VM.
- Execute `kubectl get volumeattachments` and find all VolumeAttachment objects associated with the shut-down Node VM.
- Edit each VolumeAttachment object with `kubectl edit volumeattachments <volumeattachments-object-name>` and remove the finalizers.
- Check whether the VolumeAttachment object has been deleted by Kubernetes. If the object remains on the system, you can safely delete it with `kubectl delete volumeattachments <volumeattachments-object-name>`.
- Wait for some time for the Pod to come up on a new Node.
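The kubectl portion of the steps above can be sketched as follows. This is a non-authoritative sketch: `k8s-node-01` is a placeholder for the shut-down Node's name, and `kubectl patch` is used as a non-interactive alternative to `kubectl edit` for removing finalizers.

```shell
# List VolumeAttachment objects with the node and PV each one references.
kubectl get volumeattachments \
  -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName,PV:.spec.source.persistentVolumeName'

# For every VolumeAttachment still bound to the shut-down node
# (k8s-node-01 is a placeholder), clear the finalizers so Kubernetes
# can garbage-collect the object.
for va in $(kubectl get volumeattachments \
    -o jsonpath='{range .items[?(@.spec.nodeName=="k8s-node-01")]}{.metadata.name}{"\n"}{end}'); do
  kubectl patch volumeattachment "$va" --type=merge -p '{"metadata":{"finalizers":null}}'
done

# If any object still lingers afterwards, delete it explicitly.
kubectl get volumeattachments
```

Once the stale VolumeAttachment objects are gone, the attach/detach controller can attach the volume to the new Node.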
- Impact: Low throughput of attach and detach operations, especially at scale.
- Upstream issue is tracked at: kubernetes/kubernetes#84169
- Workaround:
- Upgrade Kubernetes to minor version 1.17.8 or above, or 1.18.5 or above. These versions contain the upstream fix for this issue.
- If upgrading the Kubernetes version is not possible, a workaround can be applied on your Kubernetes cluster. On each primary node, perform the following steps:
- Open the kube-controller-manager manifest, located at `/etc/kubernetes/manifests/kube-controller-manager.yaml`.
- Add `--disable-attach-detach-reconcile-sync` to `spec.containers.command`.
- Since kube-controller-manager is a static Pod, the kubelet will restart it whenever a new flag is added. Make sure the kube-controller-manager Pod is up and running.
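After the edit, the relevant portion of the static Pod manifest might look like the following. This is an abridged sketch; the surrounding fields and the other flags vary by cluster and Kubernetes version.

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --disable-attach-detach-reconcile-sync   # flag added by this workaround
    # ...other existing flags remain unchanged...
```

The kubelet watches the manifests directory and restarts the static Pod automatically when the file changes; verify with `kubectl get pods -n kube-system`.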
- Issue: Filesystem resize is skipped if the original PVC is deleted while the FilesystemResizePending condition is still on the PVC, but the PV and its associated volume on the storage system are not deleted due to the Retain policy. Refer to kubernetes/kubernetes#88683 for more details.
- Impact: A user may create a new PVC to statically bind to the undeleted PV. In this case, the volume on the storage system is resized but the filesystem is not resized accordingly. The user may then try to write to a volume whose filesystem is out of capacity.
- Workaround: The user can log into the container and manually resize the filesystem.
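As a sketch, the manual filesystem resize could look like the following for an ext4 volume. The Pod name, namespace, mount path, and device path below are hypothetical placeholders; the actual values depend on your workload, and XFS volumes would use `xfs_growfs` on the mount path instead of `resize2fs` on the device.

```shell
# Identify the device backing the volume inside the Pod's container
# (my-pod, default, and /data are placeholders).
kubectl exec -n default my-pod -- df /data

# Grow the ext4 filesystem to fill the already-expanded block device
# (/dev/sdb is a placeholder taken from the df output above).
kubectl exec -n default my-pod -- resize2fs /dev/sdb
```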
- Issue: kubernetes/enhancements#661
- Impact: Users cannot resize volumes in a StatefulSet.
- Workaround:
- Upstream solution is being tracked at kubernetes/enhancements#2842.
- If the StatefulSet is not managed by an operator, there is a slightly risky workaround that users can apply at their own discretion, depending on their use case. Refer to https://serverfault.com/a/989665 for more details. Note that VMware does not support this option.
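The workaround referenced above boils down to recreating the StatefulSet object without deleting its Pods, after patching the PVC sizes. A hedged sketch follows; all names and sizes are placeholders, this should be rehearsed in a non-production cluster first, and (as noted above) VMware does not support it. Older kubectl versions use `--cascade=false` instead of `--cascade=orphan`.

```shell
# 1. Save the current StatefulSet definition.
kubectl get statefulset my-sts -o yaml > my-sts.yaml

# 2. Delete only the StatefulSet object, leaving its Pods running.
kubectl delete statefulset my-sts --cascade=orphan

# 3. Expand each PVC that belongs to the StatefulSet.
kubectl patch pvc data-my-sts-0 \
  -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'

# 4. Edit my-sts.yaml so volumeClaimTemplates matches the new size,
#    then recreate the StatefulSet; it re-adopts the existing Pods.
kubectl apply -f my-sts.yaml
```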
- Impact: If a user tries to expand a PVC to a size that is not supported by the underlying storage system, volume expansion keeps failing and there is no way to recover.
- Issue: kubernetes/enhancements#1790
- The workaround is being tracked at https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1790-recover-resize-failure/README.md
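Until that enhancement lands, a stuck expansion is at least observable. A minimal sketch, assuming a PVC named `my-pvc` (a placeholder):

```shell
# Events on the PVC show the repeated resize failure messages.
kubectl describe pvc my-pvc

# The PVC's conditions (e.g. Resizing, FilesystemResizePending)
# indicate where the expansion is stuck.
kubectl get pvc my-pvc -o jsonpath='{.status.conditions}'
```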