Number of "loaded inactive dead" systemd transient mount units continues to grow #57345
Comments
/sig storage
CC @kubernetes/sig-storage-bugs
Mitigation: I can verify that rebooting the machine ("sudo shutdown -r") cleans up the transient mounts.
I ran some tests locally and systemd considers these commands as mount units, hence they show up in "systemctl list-units".
@saad-ali was this cluster started via hack/local-up-cluster.sh?
I have opened a PR to change the default for local clusters - #57355
Is kubelet started by systemd too? What's the resource limit in the kubelet unit file?
@gnufied No, I was able to repro against a standard GKE/GCE cluster. Once the secret is unmounted, the unit appears to stick around as "loaded inactive dead".
@saad-ali do you still have that cluster around? Can you check if those secrets are still mounted or they just show up in "systemctl list-units"?
They don't appear to be mounted, they just show up in "systemctl list-units --all". Example entry:

And no associated entry in the mount table:
$ mount | grep -i "4cb56507-e42f-11e7-a1b6-42010a80022d"

The number of entries in the mount table remains static:
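A quick hedged way to quantify the leak is to compare systemd's count of dead mount units against the size of the kernel mount table; a growing gap indicates leaked transient units (the grep pattern assumes the state string shown above):
$ systemctl list-units --all --type=mount | grep -c "loaded inactive dead"
$ mount | wc -l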
Also, I'm seeing the systemd transient units growing uncontrollably in Kubernetes 1.6 as well. Over the course of an hour:

So my hypothesis is that this has been happening for a while, but the reason it's becoming an issue now is that in k8s 1.8+ (with PR #49640), all k8s mount operations are executed as scoped transient units, and once the max number of units is hit, all subsequent Kubernetes-triggered mounts fail.
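For context, this is roughly the shape of such a scoped mount (the pod path below is made up for illustration; it is not the kubelet's literal command):
$ sudo mkdir -p /var/lib/kubelet/pods/example/volumes/kubernetes.io~secret/token
$ sudo systemd-run --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/example/volumes/kubernetes.io~secret/token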
It might be a platform issue - @mtanino tested it with 16K transient mounts on Fedora 25 without hitting this issue:
$ mkdir -p /mnt/test; for i in $(seq 1 16384); do echo $i; mkdir /mnt/test/$i; systemd-run --scope -- mount -t tmpfs tmpfs /mnt/test/$i; done
It would be better to test on different platforms with the full Kubernetes stack, i.e. the cron job example. It could be something in Kubernetes or Docker that is causing the leak.
I think this is indeed a GKE/GCE problem. I can confirm that recursively bind mounting a directory with the shared option in another place and then mounting tmpfs inside the directory causes the tmpfs to propagate to the bind mount directory as well (and you will have two systemd units per mount). But on umount, both systemd units are removed from the systemd unit listing. So the bug isn't entirely because of rootfs being mounted in multiple places, though it does exacerbate the problem somewhat, because for each mount two systemd units are created.

The GCE image also appears to be using overlay2 and has a weird bind mount of /var/lib/docker on itself. Things to investigate next: in isolation, check if somehow this is related to overlay2. The mount error that @saad-ali posted above is because of the way layers are mounted in overlay2; overlay2 uses symlinks to reduce the number of arguments supplied to the mount command.
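A hedged, standalone sketch of the propagation scenario described above (all paths invented for illustration):
$ sudo mkdir -p /mnt/orig /mnt/copy
$ sudo mount --bind /mnt/orig /mnt/orig      # make the directory a mount point
$ sudo mount --make-rshared /mnt/orig        # enable shared propagation
$ sudo mount --rbind /mnt/orig /mnt/copy     # recursively bind it elsewhere
$ sudo mkdir /mnt/orig/inner
$ sudo mount -t tmpfs tmpfs /mnt/orig/inner  # also appears under /mnt/copy/inner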
@msau42 I tested full stack locally on Ubuntu and then full stack on EBS and I can't reproduce the problem.
The limit, based on what @wonderfly dug up, is 131072 transient units (https://github.com/systemd/systemd/blob/v232/src/core/unit.c#L229), so you won't hit the issue with 16k units. That said, it does look like the issue can be reproduced through the containerized mounter:
$ sudo mkdir -p /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/testmnt/alpha; sudo systemd-run --scope -- /home/kubernetes/containerized_mounter/mounter mount -t tmpfs tmpfs /var/lib/kubelet/testmnt/alpha
Running scope as unit: run-r5bde6edc9a5d4529bae2a560d81c8025.scope
$ systemctl list-units --all | grep -i "testmnt"
home-kubernetes-containerized_mounter-rootfs-var-lib-kubelet-testmnt-alpha.mount loaded inactive dead /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/testmnt/alpha
var-lib-kubelet-testmnt-alpha.mount loaded inactive dead /var/lib/kubelet/testmnt/alpha
$ mount | grep -i "testmnt"
tmpfs on /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/testmnt/alpha type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/testmnt/alpha type tmpfs (rw,relatime)
tmpfs on /var/lib/kubelet/testmnt/alpha type tmpfs (rw,relatime)
tmpfs on /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/testmnt/alpha type tmpfs (rw,relatime)
$ sudo umount /var/lib/kubelet/testmnt/alpha/
$ mount | grep -i "testmnt"
$ systemctl list-units --all | grep -i "testmnt"
home-kubernetes-containerized_mounter-rootfs-var-lib-kubelet-testmnt-alpha.mount loaded inactive dead /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/testmnt/alpha
var-lib-kubelet-testmnt-alpha.mount loaded inactive dead /var/lib/kubelet/testmnt/alpha

Mounts created directly with the host mount utility do not appear to have the same issue.
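For contrast, a hedged version of the same test outside the containerized mounter's rootfs (the path is hypothetical):
$ sudo mkdir -p /tmp/testmnt
$ sudo systemd-run --scope -- mount -t tmpfs tmpfs /tmp/testmnt
$ sudo umount /tmp/testmnt
$ systemctl list-units --all | grep -i "testmnt"   # expect no leftover unit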
@gnufied pointed out offline that this is a bit misleading:
So to be clear, the problem seems to be with the way these mount directories are set up, not with the mounter binary itself.
To expand on this, the mount does not need to be created via the containerized mounter.
Any mounts created in these directories exhibit the issue:
Mounts created outside those directories do not appear to have this issue:
These directories are both set up with shared mount propagation:
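One hedged way to inspect the propagation flags (findmnt is part of util-linux; the paths are those from the test above):
$ findmnt -o TARGET,PROPAGATION /var/lib/kubelet
$ findmnt -o TARGET,PROPAGATION /home/kubernetes/containerized_mounter/rootfs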
Will follow up with COS team.
BTW an easier mitigation might be running "systemctl daemon-reload" periodically.
Ya, that's the mitigation we are using at the moment.
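For reference, a hedged sketch of that mitigation as a crontab entry (assumptions: root's crontab, an hourly cadence, and systemctl living at /bin/systemctl; all three may differ on your distro):
# run from root's crontab; adjust interval and path as needed
0 * * * * /bin/systemctl daemon-reload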
Is your expected command line like this?
@mtanino actually, just the /mnt/test directory should be bind mounted shared. The underlying mounts are normal mounts.
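In other words, something like this hedged sketch (the second directory and the loop count are illustrative):
$ sudo mkdir -p /mnt/test /mnt/test2
$ sudo mount --bind /mnt/test /mnt/test      # make /mnt/test a mount point
$ sudo mount --make-shared /mnt/test         # shared propagation on /mnt/test only
$ sudo mount --rbind /mnt/test /mnt/test2    # bind it elsewhere
$ for i in $(seq 1 1000); do sudo mkdir -p /mnt/test/$i; sudo systemd-run --scope -- mount -t tmpfs tmpfs /mnt/test/$i; done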
Hi, good catch, I'm seeing this in my cluster too:
The problem is that this is starving systemd resources, causing errors like
It seems that waiting a bit allows the daemon-reload to work though:
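A hedged retry loop based on that observation (the 30-second pause is a guess):
$ until sudo systemctl daemon-reload; do sleep 30; done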
Another consequence is that socket-activated services, like sshd in CoreOS Container Linux, can't start anymore.

[edit] Some references that I think are in the same error class:
Sorry for my late response.
Here are my test results. I tried the same steps as @saad-ali but didn't see "loaded inactive dead" units.
I also tried a shared bind mount with 32767 mount points but didn't see the problem either.
Just to confirm this is still happening on GKE 1.8.5 nodes. |
At this point, we believe this to be a systemd issue tracked by systemd/systemd#7798:
The suggested mitigation is to run "systemctl daemon-reload" periodically. Once it is fixed in systemd, the changes must be picked up in the OS you are using.
For the record, the systemd issue has been addressed with systemd/systemd#7811 and included in the tag v237. |
Does anyone know when systemd will be updated on GKE? We hit this issue every second month :(
If I'm not mistaken, update has been done in GKE, at least since version |
Thank you @honnix, but we have v1.9.2-gke.1 and just got an issue today :( |
Interesting, then I'm not sure. Regression? Or maybe ported to later 1.9.x than 1.9.2? |
Yeah, would try to migrate to the latest one today and see how it behaves, thank you! |
@honnix this is fixed in gke 1.9.3+ |
oops and @artemyarulin ^ |
@msau42 This is awesome, thank you! Doing migration now :) |
I had the issue on Azure with Kubernetes 1.9.9. I restarted the node and the issue disappeared. It might appear again after a few days.
@martinnaughton You need to upgrade your systemd version to v237+ or patch your current systemd with the change from systemd PR 7811.
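For anyone checking their nodes, a hedged one-liner to test whether the running systemd already carries the fix (it parses the "systemd NNN" line printed by systemctl --version):
$ systemctl --version | awk 'NR==1 {exit ($2 >= 237) ? 0 : 1}' && echo "has fix" || echo "needs v237+ or the patch"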
I have tested the command "systemctl daemon-reload"; it doesn't work.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
The number of systemd transient mount units continues to grow unchecked on nodes.
I see a massive number of "loaded inactive dead" secret transient mount units, for example:

We suspect (but have not yet verified) that once there are too many transient mount units, subsequent mounts will fail with:
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
I left my one-node cluster running over the weekend with a single cron job (based on the example cronjob in https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/).
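To observe the growth over time, a hedged sampling loop (the hourly cadence is arbitrary):
$ while true; do date; systemctl list-units --all --type=mount | grep -c "loaded inactive dead"; sleep 3600; done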
On my test machine I saw the following on Friday evening:
And the following on Monday morning:
Anything else we need to know?:
Suspect PR #49640, which forces mounts to run in their own systemd scope. It went into k8s 1.8, so all versions 1.8+ running systemd may be affected.
CC @jsafrane @derekwaynecarr
Environment:
- Kubernetes version (use kubectl version):
- Kernel (e.g. uname -a):