rbd mount fencing #10563
This aims to fix #10462.
If multiple pods mount the same rbd volume, only one can succeed. This avoids multiple pods writing to the rbd volume and corrupting data.
rbd mount fencing uses `rbd lock add|remove|list`. The kubelet mounter checks whether the rbd image has a lock; if a lock exists and is held by another node, the mount is refused and kubelet fails the pod. Read-only mounts still proceed without checking the lock.
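Roughly, the mount-time fencing flow could look like the sketch below. This is not the actual plugin code; the image spec and lock ID are illustrative, and the lock-list parsing is deliberately simplified. It shells out to the same `rbd` commands listed above:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// fenceAndMap maps an RBD image read-write only when no other client holds
// an advisory lock on it. imageSpec is "pool/image", e.g. "rbd/pv1".
func fenceAndMap(imageSpec, lockID string) error {
	// List the advisory locks currently held on the image.
	out, err := exec.Command("rbd", "lock", "list", imageSpec).CombinedOutput()
	if err != nil {
		return fmt.Errorf("rbd lock list failed: %v: %s", err, out)
	}
	locks := strings.TrimSpace(string(out))
	switch {
	case locks == "":
		// No lock yet: take our own before mapping.
		if out, err := exec.Command("rbd", "lock", "add", imageSpec, lockID).CombinedOutput(); err != nil {
			return fmt.Errorf("rbd lock add failed: %v: %s", err, out)
		}
	case !strings.Contains(locks, lockID):
		// A lock exists and it is not ours: fence, refuse to map.
		return fmt.Errorf("image %s is locked by another node", imageSpec)
	}
	// Either we just took the lock or we already held it: map the block device.
	if out, err := exec.Command("rbd", "map", imageSpec).CombinedOutput(); err != nil {
		return fmt.Errorf("rbd map failed: %v: %s", err, out)
	}
	return nil
}

func main() {
	// Illustrative values: image "pv1" in pool "rbd", a per-node lock ID.
	if err := fenceAndMap("rbd/pv1", "kubelet_lock_nodeA"); err != nil {
		fmt.Println(err)
	}
}
```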
Still running tests to rule out regressions. Big appreciation to the volunteers helping me test this.
NEW: If multiple Pods use the same RBD volume in read-write mode, it is possible that data on the RBD volume could get corrupted. This problem has been found in environments where both apiserver and etcd rebooted and Pods were redistributed.
A workaround is to ensure that no other Ceph client is using the RBD volume before mapping the RBD image in read-write mode. For example:
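One possible pre-flight check, under the assumption that the advisory locks added by this PR are the coordination mechanism between writers (the image name below is illustrative), is to list the image's locks and only map read-write when none are held:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

func main() {
	// Illustrative image spec; replace with the real pool/image.
	imageSpec := "rbd/pv1"

	// A read-write mapping is only safe if no other Ceph client holds a lock on the image.
	out, err := exec.Command("rbd", "lock", "list", imageSpec).CombinedOutput()
	if err != nil {
		fmt.Fprintf(os.Stderr, "rbd lock list failed: %v: %s\n", err, out)
		os.Exit(1)
	}
	if strings.TrimSpace(string(out)) != "" {
		fmt.Fprintf(os.Stderr, "refusing read-write map of %s: image is locked\n%s", imageSpec, out)
		os.Exit(2)
	}
	fmt.Printf("%s has no locks; safe to map read-write\n", imageSpec)
}
```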
Added a commit to this pull request on Jul 24, 2015.
@rootfs Who is supposed to unlock the volume once the node/minion dies?
My pod was scheduled to node A and was running. I have shut down node A to simulate a crash and the pod got rescheduled to node B. Node B complains that the volume is locked and won't start my pod: "Error syncing pod, skipping: rbd: image pv1 is locked by other nodes".
Having to manually unlock all ceph volumes once a node dies would not be funny.
@maklemenz the node that locks the rbd volume should unlock it after the pod is deleted. In your situation the node went away and is thus unable to unlock the rbd. This is not ideal, but it at least prevents concurrent access to the same rbd and potential data corruption.
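Until then, a stale lock left behind by a dead node has to be cleared by hand. A rough sketch, with illustrative values for the lock ID and locker (both appear in the `rbd lock list` output):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Illustrative values; read the ID and Locker columns from `rbd lock list rbd/pv1`.
	imageSpec := "rbd/pv1"
	lockID := "kubelet_lock_nodeA" // ID column
	locker := "client.4235"        // Locker column, the dead node's Ceph client

	// Release the stale advisory lock so another node can mount the image read-write.
	out, err := exec.Command("rbd", "lock", "remove", imageSpec, lockID, locker).CombinedOutput()
	if err != nil {
		fmt.Fprintf(os.Stderr, "rbd lock remove failed: %v: %s\n", err, out)
		os.Exit(1)
	}
	fmt.Printf("removed lock %s held by %s on %s\n", lockID, locker, imageSpec)
}
```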
Discussions at #6084 may yield a blueprint for a generic solution.