Orphaned pod found - but volume paths are still present on disk #60987
Comments
/sig storage

I have the same issue, except that I can't even remove that directory.

@iliketosneeze When we've run into that issue the only recourse, unfortunately, is to reboot the host. Once it comes back up, things seem to be in a clean state.

I also experience this issue with Kubernetes v1.10.1. Manually deleting the directory solves the problem, but yes, kubelet should deal with orphaned pods intelligently. @iliketosneeze maybe you should try unmounting the directory from tmpfs.
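A minimal sketch of that "unmount it from tmpfs" suggestion, assuming the default /var/lib/kubelet layout; the pod UID below is a placeholder taken from your own kubelet "Orphaned pod" log line, and you should review what is actually mounted before removing anything:

```bash
# Placeholder: UID quoted in the kubelet "Orphaned pod" error message.
POD_UID=<orphaned-pod-uid>
POD_DIR=/var/lib/kubelet/pods/$POD_UID

# Secrets and configmaps are tmpfs mounts; list anything still mounted under the pod dir.
grep " $POD_DIR" /proc/mounts

# Unmount each leftover mount point, then remove the now-unmounted directory tree.
grep " $POD_DIR" /proc/mounts | awk '{print $2}' | xargs -r -n1 umount
rm -rf "$POD_DIR"
```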
Same here with minikube:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.1", GitCommit:"d4ab47518836c750f9949b9e0d387f20fb92260b", GitTreeState:"clean", BuildDate:"2018-04-12T14:26:04Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:44:10Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

$ journalctl -f
Apr 23 12:37:26 minikube kubelet[2886]: E0423 12:37:26.781919 2886 kubelet_volumes.go:140] Orphaned pod "a08c2261-3eec-11e8-83b3-a0ea30334065" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Apr 23 12:37:28 minikube kubelet[2886]: E0423 12:37:28.789802 2886 kubelet_volumes.go:140] Orphaned pod "a08c2261-3eec-11e8-83b3-a0ea30334065" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.

# find /var/lib/kubelet/pods/a08c2261-3eec-11e8-83b3-a0ea30334065/containers/kube-proxy/4ac98a82
/var/lib/kubelet/pods/a08c2261-3eec-11e8-83b3-a0ea30334065/containers/kube-proxy/4ac98a82
# ls /var/lib/kubelet/pods/a08c2261-3eec-11e8-83b3-a0ea30334065/volumes/
kubernetes.io~configmap  kubernetes.io~secret

That looks like an internal pod. I guess there is nothing to remove.
We are seeing a similar issue with our own custom flexvolume in a Kubernetes 1.8.9 cluster. Is there any way to resolve this without restarting the host until there is an actual solution?

@lukmanulhakimd That did help with removing them, but then new volume mounts failed as the host was stuck in uninterruptible I/O. I had to cold-cycle the nodes in the end.
I'm having the same issue with Kubernetes 1.8.5, Rancher 1.6.13, and docker-ce 17.03.02.

We're also having this issue with Kubernetes 1.9.6, Docker 17.03.1-ce, and the vSphere Cloud Provider for persistent storage.

Having the same issue with Kubernetes 1.10.2, Docker 18.06.0-ce.

Having the same issue with Kubernetes 1.11, Docker 18.06.0-ce, and Ceph 13.2.1.

Having the same issue with Kubernetes v1.9.0, Docker 1.12.6, and rook master.
Those of you who are affected: do you see anything interesting when you run kubelet with increased verbosity?

I ran with --v=10.

I also ran with version 11 and the vSphere dynamic storage provider.
Many cloud providers call a generic util helper to unmount and delete the directory, which would explain why multiple providers show the same symptoms. I assume the disks are unmounted, but a directory is just left behind, right? The directory deletion code is here: kubernetes/pkg/volume/util/util.go Lines 183 to 191 in 217a3d8
and it has got to be producing some error or info v(4) message immediately after. The only way it can decide not to delete the directory is if IsLikelyNotMountPoint thinks it is still mounted. Maybe it is a containerized kubelet which confuses it, or some
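To check from the node whether a leftover volume path would still look mounted to IsLikelyNotMountPoint, a rough shell equivalent of its heuristic (comparing the directory's device number with its parent's) is sketched below; the path is a placeholder, and bind mounts on the same device can slip past this check just as they can in kubelet:

```bash
# Placeholder path for one of the leftover volume directories.
VOL_DIR=/var/lib/kubelet/pods/<pod-uid>/volumes/<plugin>/<volume-name>

# A directory whose device number differs from its parent's is (likely) a mount point.
if [ "$(stat -c %d "$VOL_DIR")" != "$(stat -c %d "$VOL_DIR/..")" ]; then
  echo "still looks mounted - kubelet will refuse to delete it"
else
  echo "not a mount point - the cleanup should be able to remove it"
fi
```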
I tried to reproduce it by creating/deleting a StatefulSet with a single PVC and 50 pods in it, all allocated to a single node, but no luck. Kubelet 1.11.2, vSphere cloud provider. Pretty much the same setup, except for the Kubernetes version, was regularly reporting orphaned pods with kubelet 1.10.5.
k8s 1.6.4, docker 1.12.6:
E0816 16:22:24.166061 415225 kubelet_volumes.go:114] Orphaned pod "4c4f8eb9-a12d-11e8-b849-c0bfc0a0d6e2" found, but volume paths are still present on disk.

Confirmed here also; I had to clean the files manually, and a reboot did not solve the issue.
I observe a lot of these errors logged by kubelet in my clusters. This bug seems to be related to pods that were using a custom bash flexvolume plugin (which mounts CIFS volumes). Anyway, it is very annoying: about 90% of the kubelet log consists of lines like

E0823 10:31:01.847946 1303 kubelet_volumes.go:140] Orphaned pod "19a4e3e6-a562-11e8-9a25-309c23027882" found, but volume paths are still present on disk : There were a total of 2 errors similar to this. Turn up verbosity to see them.
E0823 10:31:03.840552 1303 kubelet_volumes.go:140] Orphaned pod "19a4e3e6-a562-11e8-9a25-309c23027882" found, but volume paths are still present on disk : There were a total of 2 errors similar to this. Turn up verbosity to see them.

printed every two seconds. Fixing it requires a manual operation (or automating a risky rm -Rf based on a log-line parser), but that is a poor man's workaround, while kubelet could/should/must handle this problem itself.
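A sketch of that log-parsing workaround, assuming kubelet runs as a systemd unit named kubelet and pod state lives under the default /var/lib/kubelet/pods; it only removes directories with no remaining mounts, and it is still the poor man's version rather than a real fix:

```bash
#!/usr/bin/env bash
# Collect pod UIDs from recent "Orphaned pod" messages and remove their leftover
# directories, but only when nothing under them is still mounted.
PODS_DIR=/var/lib/kubelet/pods

journalctl -u kubelet --since "1 hour ago" --no-pager \
  | grep -o 'Orphaned pod "[0-9a-f-]\{36\}"' \
  | grep -o '[0-9a-f-]\{36\}' \
  | sort -u \
  | while read -r uid; do
      dir="$PODS_DIR/$uid"
      [ -d "$dir" ] || continue
      # Anything still mounted under the pod directory means it is not safe to delete.
      if grep -qs " $dir/" /proc/mounts; then
        echo "skipping $uid: still has active mounts" >&2
        continue
      fi
      echo "removing leftover directory for orphaned pod $uid"
      rm -rf -- "$dir"
    done
```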
Having the same issue.

At least version 1.11.2 doesn't have this issue; we stopped seeing this error after an upgrade.
The mount point is not unmounted when the orphaned pod is removed, so RemoveAllOneFilesystem never gets executed. Why is the mount point not unmounted? Maybe the volume manager hit an error while unmounting, maybe a user process was still using the mount point's directory during the unmount, or maybe some other reason. I think we can add an unmount step when a leftover mount point is detected. PR: #68616
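If the "something is still using the mount point" hypothesis applies to your node, one way to check (a sketch; the path is a placeholder) is to ask the kernel which processes still hold files on that filesystem:

```bash
# Placeholder path for a volume that refuses to unmount.
MOUNT_POINT=/var/lib/kubelet/pods/<pod-uid>/volumes/<plugin>/<volume-name>

# Show processes with open files, working directories, or mappings on that filesystem.
fuser -vm "$MOUNT_POINT"

# Alternatively, with lsof:
lsof +f -- "$MOUNT_POINT"
```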
Faced the same problem with a Datera array and K8s. Rebooting the nodes cleared up the hung pods.

Same issue.

Same issue. Any progress on this issue?

Same issue.

For the past two years... steps to reproduce: run k8s for a while, then ungracefully reboot the server. It will come back up with one to a dozen of these, spewing out every second or two: enough to dominate all entries sent from a production cluster to a centralized syslog.

Same issue.

Same issue in Kubernetes v1.20.2.
Why is this issue closed?

/reopen

@andyzhangx: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@patrickstjohn: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

thanks! close it.

@andyzhangx: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/close
For those of you (like me) still using an older version and facing this problem: you tried

A warning to anyone with those older versions still running: avoid invoking
FYI, the fix in #95301 only removes empty directories. This is to address the cases where the node rebooted and the mounts are gone but the directories remain. It doesn't fix the other scenarios mentioned here where the directory is still mounted; in those cases, you will need to work with the affected volume plugin owners to figure out what is preventing the volume from getting unmounted.
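To tell which of those two cases a given node is in, a quick survey of what is still mounted under the kubelet pods directory can help; this is a sketch that assumes the default path and that findmnt is available:

```bash
# List every mount that still lives under the kubelet pods directory.
findmnt -rn -o TARGET,FSTYPE,SOURCE | grep '^/var/lib/kubelet/pods/'

# Pod directories that appear here are the "still mounted" case and need the volume
# plugin investigated; orphaned directories with no entries are just empty leftovers,
# which the fix in #95301 cleans up.
```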
We had the error below (a big pile of them, I must say) in our kubelet logs just today. Our k8s v1.18.10 seems a little outdated, but we are stuck at that version for now.

Just leaving this here for future use... maybe ;)

The same with:

Since we cannot solve this problem once and for all, why aren't we reporting it as an event? Let the user choose their own way to fix this problem in their environment.
Is this a BUG REPORT or FEATURE REQUEST?:
BUG
What happened:
Kubelet is periodically going into an error state and causing errors with our storage layer (Ceph, shared filesystem). Upon cleaning out the orphaned pod directories, things eventually right themselves:
rmdir /var/lib/kubelet/pods/*/volumes/*rook/*
What you expected to happen:
Kubelet should intelligently deal with orphaned pods. Cleaning a stale directory manually should not be required.
How to reproduce it (as minimally and precisely as possible):
Using rook-0.7.0 (this isn't a rook problem as far as I can tell, but this is how we're reproducing):
kubectl create -f rook-operator.yaml
kubectl create -f rook-cluster.yaml
kubectl create -f rook-filesystem.yaml
Mount/write to the shared filesystem and monitor /var/log/messages for the following:
kubelet: E0309 16:46:30.429770 3112 kubelet_volumes.go:128] Orphaned pod "2815f27a-219b-11e8-8a2a-ec0d9a3a445a" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Anything else we need to know?:
This looks identical to the following: #45464, but for a different plugin.
Environment:
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T12:22:21Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration:
Bare-metal private cloud
OS (e.g. from /etc/os-release):
Red Hat Enterprise Linux Server release 7.4 (Maipo)
Kernel (e.g. uname -a):
Linux 4.4.115-1.el7.elrepo.x86_64 #1 SMP Sat Feb 3 20:11:41 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
Install tools:
kubeadm