Nested data volumes (e.g. /secrets & /secrets/more) cause unexpected behavior #57421
Labels: kind/bug, sig/node, sig/storage
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
Data volumes backed by the API, such as secret, configmap, downwardAPI, and projected volumes, appear to do strange things when nested. Since each mounted volume is intended to correspond 1:1 with the API object data, any change to the data causes the kubelet to load a new version of the data and to replace all of the old volume data with the newly loaded data.
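The replace-on-change behavior works by writing each new version into a hidden timestamped directory and flipping a symlink to it, then deleting the previous timestamped directory. A simplified Python sketch of that scheme (an illustration only, not the actual kubelet code, which lives in `pkg/volume/util/atomic_writer.go`; the function and link names here are hypothetical):

```python
import os
import shutil
from datetime import datetime

def sync_volume(volume_dir, payload):
    """Write payload into a fresh timestamped dir and flip the ..data symlink.

    Simplified illustration of the kubelet's atomic-writer scheme.
    """
    # New versions land in hidden ..<timestamp> directories, like the
    # /secrets/..YYYY_MM_DD_HH_MM_SS.XXXXXXXXX paths seen in this issue.
    ts_dir = os.path.join(
        volume_dir, ".." + datetime.now().strftime("%Y_%m_%d_%H_%M_%S.%f"))
    os.mkdir(ts_dir)
    for name, data in payload.items():
        with open(os.path.join(ts_dir, name), "w") as f:
            f.write(data)
    # Flip the ..data symlink atomically via rename.
    data_link = os.path.join(volume_dir, "..data")
    tmp_link = os.path.join(volume_dir, "..data_tmp")
    old_target = os.readlink(data_link) if os.path.islink(data_link) else None
    os.symlink(os.path.basename(ts_dir), tmp_link)
    os.rename(tmp_link, data_link)
    # Expose each data item at the top level through a relative symlink.
    for name in payload:
        user_link = os.path.join(volume_dir, name)
        if not os.path.islink(user_link):
            os.symlink(os.path.join("..data", name), user_link)
    # Removing the previous timestamped directory is the step that fails
    # with EBUSY when a nested volume's mount point sits inside it.
    if old_target:
        shutil.rmtree(os.path.join(volume_dir, old_target))
```

After two updates, only the newest timestamped directory should remain; in the failure described below, the old ones pile up instead.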
1. When the kubelet asks Docker to mount the nested volume into the base volume, Docker creates a mount point directory in the base volume upon which it can mount the nested volume.
2. The kubelet views the existence of the mount point directory in the base directory as a change that puts the volume out of sync with the API data.
3. The kubelet attempts to remove the old versions of the secrets and re-symlink the data.
   a. On an old-enough kernel (e.g. 3.10.0-514.21.1.el7.x86_64), the remove attempt fails because the directory is an in-use mount point, giving a "Device or resource busy" error.
   b. On a newer kernel (e.g. 3.10.0-693.5.2.el7.x86_64), the remove succeeds despite the directory being an active mount point, and the nested mount is no longer visible from within the container, despite still being mounted.
On the old kernel, the kubelet continuously repeats step 3, because every re-sync iteration makes it think it needs a refresh due to the constant presence of the mount point directory. Additionally, every iteration leaves behind an old copy of the data, since the error occurs before the old copy is removed. For example, after a few hours you might end up with 100 directories that look like `/secrets/..YYYY_MM_DD_HH_MM_SS.XXXXXXXXX`.

What you expected to happen:
As long as the name of a nested volume doesn't interfere with the name of a data item from a parent volume, I would expect the nested volume to work.
How to reproduce it (as minimally and precisely as possible):
A yaml file with 2 secrets and a pod that makes volumes for the 2 secrets, with one of them mounted within the other: `pod.yaml`
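The original `pod.yaml` attachment was not preserved here; a minimal sketch of what such a manifest could look like (the secret names, data values, and pod/container names below are all hypothetical):

```yaml
# Hypothetical reproduction manifest; the original attachment was not preserved.
apiVersion: v1
kind: Secret
metadata:
  name: sec-base
stringData:
  secret: base-value
---
apiVersion: v1
kind: Secret
metadata:
  name: sec-nested
stringData:
  secret: nested-value
---
apiVersion: v1
kind: Pod
metadata:
  name: nested-secrets
spec:
  containers:
  - name: c
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: base
      mountPath: /secrets        # base secret volume
    - name: nested
      mountPath: /secrets/more   # nested inside the base volume
  volumes:
  - name: base
    secret:
      secretName: sec-base
  - name: nested
    secret:
      secretName: sec-nested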
If you have a new kernel, `/secrets/more/secret` won't exist. If you have an old kernel, it will exist, but you should see several timestamp directories in `/secrets` (one for each sync iteration) that aren't being cleaned up.

Environment:
- Kubernetes version (use `kubectl version`):