
podman with bind mount leaving cgroup debris and prevents container restart #730

Closed
aalba6675 opened this issue May 6, 2018 · 56 comments

@aalba6675

aalba6675 commented May 6, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

Description

When podman stops a systemd container with bind mounts, it leaves behind a lot of cgroup debris.
This prevents the container from starting a third time.

Steps to reproduce the issue:

  1. Workaround: currently, to get working bind mounts I have to run mount --make-private /tmp. Otherwise oci-systemd-hook cannot move the mount to the overlay. This is on Fedora 28. Cannot move mount from /tmp/ocitmp.XXXX to .../merged/run: projectatomic/oci-systemd-hook#92

  2. Create a systemd-based fedora:28 container:

podman create --name bobby_silver -v /srv/docker/volumes/podman/home:/home:z --env container=podman --entrypoint=/sbin/init --stop-signal=RTMIN+3 fedora:28

  3. Start and stop the container 3 times:
podman start bobby_silver
podman stop bobby_silver
podman start bobby_silver
podman stop bobby_silver
podman start bobby_silver
podman stop bobby_silver

Describe the results you received:
After the first start/stop cycle there is cgroup debris:

cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0 type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0 type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0 type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
## third time unlucky
unable to start container "bobby_silver": container create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:58: mounting \\\"cgroup\\\" to rootfs \\\"/var/lib/containers/storage/overlay/52f7959a1a8a171b2c8aee587ea81c964e84130681444f0ff03b3202804a91cb/merged\\\" at \\\"/sys/fs/cgroup\\\" caused \\\"stat /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0: no such file or directory\\\"\""

Journal:

May 06 10:37:05 podman.localdomain kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
May 06 10:37:05 podman.localdomain audit: ANOM_PROMISCUOUS dev=vethd814e8bb prom=256 old_prom=0 auid=1050 uid=0 gid=0 ses=3
May 06 10:37:05 podman.localdomain kernel: IPv6: ADDRCONF(NETDEV_UP): vethd814e8bb: link is not ready
May 06 10:37:05 podman.localdomain kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethd814e8bb: link becomes ready
May 06 10:37:05 podman.localdomain kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
May 06 10:37:05 podman.localdomain kernel: cni0: port 2(vethd814e8bb) entered blocking state
May 06 10:37:05 podman.localdomain kernel: cni0: port 2(vethd814e8bb) entered disabled state
May 06 10:37:05 podman.localdomain kernel: device vethd814e8bb entered promiscuous mode
May 06 10:37:05 podman.localdomain kernel: cni0: port 2(vethd814e8bb) entered blocking state
May 06 10:37:05 podman.localdomain kernel: cni0: port 2(vethd814e8bb) entered forwarding state
May 06 10:37:05 podman.localdomain NetworkManager[1159]: <info>  [1525574225.9736] device (vethd814e8bb): carrier: link connected
May 06 10:37:05 podman.localdomain NetworkManager[1159]: <info>  [1525574225.9747] manager: (vethd814e8bb): new Veth device (/org/freedesktop/NetworkManager/Devices/12)
May 06 10:37:05 podman.localdomain systemd-udevd[18311]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 06 10:37:05 podman.localdomain systemd-udevd[18311]: Could not generate persistent MAC address for vethd814e8bb: No such file or directory
May 06 10:37:05 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=89
May 06 10:37:05 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=91
May 06 10:37:05 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=92
May 06 10:37:05 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=93
May 06 10:37:05 podman.localdomain audit: NETFILTER_CFG table=filter family=2 entries=155
May 06 10:37:06 podman.localdomain conmon[18363]: conmon cd8be22a52efaed7e279 <ninfo>: about to waitpid: 18364
May 06 10:37:06 podman.localdomain kernel: SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
May 06 10:37:06 podman.localdomain oci-systemd-hook[18389]: systemdhook <error>: cd8be22a52ef: pid not found in state: Success
May 06 10:37:06 podman.localdomain conmon[18363]: conmon cd8be22a52efaed7e279 <error>: Failed to create container: exit status 1
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=filter family=2 entries=156
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=94
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=96
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=94
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=96
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=10 entries=78
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=10 entries=80
May 06 10:37:06 podman.localdomain kernel: cni0: port 2(vethd814e8bb) entered disabled state
May 06 10:37:06 podman.localdomain audit: ANOM_PROMISCUOUS dev=vethd814e8bb prom=0 old_prom=256 auid=1050 uid=0 gid=0 ses=3
May 06 10:37:06 podman.localdomain kernel: device vethd814e8bb left promiscuous mode
May 06 10:37:06 podman.localdomain kernel: cni0: port 2(vethd814e8bb) entered disabled state
May 06 10:37:06 podman.localdomain NetworkManager[1159]: <info>  [1525574226.1560] device (vethd814e8bb): released from master device cni0
May 06 10:37:06 podman.localdomain gnome-shell[3393]: Removing a network device that was not added
May 06 10:37:06 podman.localdomain gnome-shell[2067]: Removing a network device that was not added
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=94
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=93
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=91

Describe the results you expected:
Start/stop without any issue.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

# podman version
Version:       0.5.2-dev
Go Version:    go1.10.1
OS/Arch:       linux/amd64


Output of podman info:

host:
  MemFree: 17688948736
  MemTotal: 33667493888
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 8
  hostname: podman.localdomain
  kernel: 4.16.5-300.fc28.x86_64
  os: linux
  uptime: 10h 3m 7.91s (Approximately 0.42 days)
insecure registries:
  registries: []
registries:
  registries:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
store:
  ContainerStore:
    number: 4
  GraphDriverName: overlay
  GraphOptions:
  - overlay.override_kernel_check=true
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
  ImageStore:
    number: 2
  RunRoot: /var/run/containers/storage
Additional environment details (AWS, VirtualBox, physical, etc.):

  • physical
  • Fedora 28
@aalba6675
Author

aalba6675 commented May 6, 2018

After getting into this situation, I can't even start non-bind-mounted containers.
alice_gold is a fedora:28 container without any bind mounts:

unable to start container "alice_gold": container create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:58: mounting \\\"cgroup\\\" to rootfs \\\"/var/lib/containers/storage/overlay/bc6d5e48c36118d012e5996158e5f7de7c0bd386b63cda3be40686a90203c019/merged\\\" at \\\"/sys/fs/cgroup\\\" caused \\\"stat /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5/2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5: no such file or directory\\\"\""
: internal libpod error

@aalba6675
Author

If I don't use bind mounts at all and leave /tmp as mount --make-shared, then all non-bind-mount containers work: they can be started and stopped without any cgroup debris.

@aalba6675 aalba6675 changed the title podman with bind mount leaving cgroup debris podman with bind mount leaving cgroup debris and prevents container restart May 6, 2018
@rhatdan
Member

rhatdan commented May 6, 2018

@mrunalp I think this might be runc not running oci-systemd-hook in the poststop hook.

@aalba6675 Could you see if the journal reports that oci-systemd-hook ran in the poststop hook?

@aalba6675
Author

aalba6675 commented May 6, 2018

@rhatdan The poststop hook doesn't seem to be run. This is the journal when the container is stopped. It leaves behind a lot of cgroup mounts on impossibly long paths like:

/sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0 type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)

Journal:

May 06 20:58:29 podman.localdomain audit[5949]: SERVICE_STOP pid=5949 uid=0 auid=1050 ses=3 subj=system_u:system_r:container_t:s0:c124,c228 msg='unit=dbus comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 06 20:58:29 podman.localdomain audit[5949]: SERVICE_STOP pid=5949 uid=0 auid=1050 ses=3 subj=system_u:system_r:container_t:s0:c124,c228 msg='unit=systemd-user-sessions comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 06 20:58:29 podman.localdomain audit[6133]: SYSTEM_SHUTDOWN pid=6133 uid=0 auid=1050 ses=3 subj=system_u:system_r:container_t:s0:c124,c228 msg=' comm="systemd-update-utmp" exe="/usr/lib/systemd/systemd-update-utmp" hostname=? addr=? terminal=? res=success'
May 06 20:58:29 podman.localdomain audit[5949]: SERVICE_STOP pid=5949 uid=0 auid=1050 ses=3 subj=system_u:system_r:container_t:s0:c124,c228 msg='unit=systemd-update-utmp comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 06 20:58:29 podman.localdomain audit[5949]: SERVICE_STOP pid=5949 uid=0 auid=1050 ses=3 subj=system_u:system_r:container_t:s0:c124,c228 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 06 20:58:29 podman.localdomain conmon[5938]: conmon cd8be22a52efaed7e279 <ninfo>: container 5949 exited with status 0

Old school (after the container is stopped):

lscgroup | grep pod
cpu,cpuacct:/libpod_parent
cpu,cpuacct:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
cpu,cpuacct:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
pids:/libpod_parent
pids:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
pids:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
cpuset:/libpod_parent
cpuset:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
cpuset:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
hugetlb:/libpod_parent
hugetlb:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
hugetlb:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
memory:/libpod_parent
memory:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
memory:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
blkio:/libpod_parent
blkio:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
blkio:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
freezer:/libpod_parent
freezer:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
freezer:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
net_cls,net_prio:/libpod_parent
net_cls,net_prio:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
net_cls,net_prio:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
devices:/libpod_parent
devices:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
devices:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
perf_event:/libpod_parent
perf_event:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
perf_event:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0

@aalba6675
Author

Is there a way to manually clear these cgroups/mounts? Once I get into this state I can't start simple containers (i.e. those without bind mounts).

@rhatdan
Member

rhatdan commented May 6, 2018

Can't you just umount them?

@aalba6675
Author

Trying:

umount -t cgroup $(mount | grep '^cgroup.*libpod' | gawk '{print $3}')

I get either

umount: /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0: not mounted.

or

umount: /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0: no mount point specified.
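
(These errors are typical when nested mount entries are unmounted in an arbitrary order. A minimal sketch of a deepest-first cleanup, assuming the leaked mount points contain no whitespace:)

# unmount leaked libpod cgroup mounts, deepest paths first, so children go before their parents
mount -t cgroup | awk '/libpod_parent\/libpod-conmon-/ {print $3}' \
  | awk '{print gsub("/", "/"), $0}' | sort -rn | cut -d' ' -f2- \
  | while read -r path; do umount "$path" || echo "failed: $path" >&2; done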

@aalba6675
Author

Another observation: notify_on_release is 0 in the /sys/fs/cgroup/systemd/libpod_parent hierarchy.
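
(For reference, the flag can be inspected per cgroup; on cgroup v1 the kernel also needs a non-empty release_agent at the root of the hierarchy before it will auto-remove empty cgroups, so a 0 here simply means no automatic cleanup:)

cat /sys/fs/cgroup/systemd/libpod_parent/notify_on_release
cat /sys/fs/cgroup/systemd/release_agent   # empty output means no auto-removal even if the flag were 1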

@mheon
Member

mheon commented May 6, 2018

The CGroups issue could be related to #496.
#507 may fix it, but it has serious issues with systemd-managed CGroups that still need to be solved.

@aalba6675
Author

@mheon - any idea why this would be triggered by a kludgy bind mount (mount --make-private /tmp)?

Non-bind-mount containers are functioning without leaving cgroup debris.

Once this situation is triggered (lots of cgroups mounted on /sys/fs/cgroup/systemd/libpod_parent/<long non-existent path with the uuid repeated>), even non-bind-mount containers can no longer be started.

@mheon
Member

mheon commented May 7, 2018

@aalba6675 The mounts situation makes it sound like it could be oci-umount-hook firing, and less like our other CGroup issues (though it's weird you're not seeing CGroups left over even in cases where mounts aren't involved; we should still be leaking one or two). I'm not familiar enough with that hook to know for sure what the cause might be, though.

@aalba6675
Author

aalba6675 commented May 7, 2018

@mheon you are right, I spoke too soon. The cgroups are only visible with the legacy libcgroup-tools.

lscgroup | grep libpod_parent
cpu,cpuacct:/libpod_parent
cpu,cpuacct:/libpod_parent/libpod-conmon-9a3f523c5c1055be951b2635c9371372f0de0ead65129da8bbab9c21a63f82e6
cpu,cpuacct:/libpod_parent/libpod-conmon-85b047f7598912928dd50ae435e932d8f6d36340bd29d94378cd76c04c1b9d3a
cpu,cpuacct:/libpod_parent/libpod-conmon-85b047f7598912928dd50ae435e932d8f6d36340bd29d94378cd76c04c1b9d3a/85b047f7598912928dd50ae435e932d8f6d36340bd29d94378cd76c04c1b9d3a
cpu,cpuacct:/libpod_parent/libpod-conmon-f31ddb42076b6f92fa4e3528aa69e3835657bac718a15fa1f21eded28d33df6d
cpu,cpuacct:/libpod_parent/libpod-conmon-9616fc8f2a42ad9276f4b65bf20aa48f1d921a3b56433b7b0c17799c7d48f5df
cpu,cpuacct:/libpod_parent/libpod-conmon-9616fc8f2a42ad9276f4b65bf20aa48f1d921a3b56433b7b0c17799c7d48f5df/9616fc8f2a42ad9276f4b65bf20aa48f1d921a3b56433b7b0c17799c7d48f5df
devices:/libpod_parent
devices:/libpod_parent/libpod-conmon-9a3f523c5c1055be951b2635c9371372f0de0ead65129da8bbab9c21a63f82e6
devices:/libpod_parent/libpod-conmon-85b047f7598912928dd50ae435e932d8f6d36340bd29d94378cd76c04c1b9d3a

There is leaking in /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-* as well. However, there are no mounts of type cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-*, so this doesn't prevent these stateless containers from being started and stopped multiple times.

In the case of bind mounts, there seems to be a recursive loop: the directory libpod-conmon-<uuid>/<uuid> gets repeated as libpod-conmon-<uuid>/<uuid>/libpod-conmon-<uuid>/<uuid>, etc., until the container won't start.

@mheon
Member

mheon commented May 7, 2018

The recursive CGroup path seems like a separate bug - our current CGroup issues are mainly a lack of cleanup, whereas this seems to actually be duplicating the Conmon scope repeatedly. Interesting - I'll look at this more tomorrow.

@aalba6675
Author

Thanks! Let me summarize a reproducer for Fedora 28:

## prolog
mount --make-private /tmp
mkdir -p /volumes/podman/home
semanage fcontext -a -e /var/lib/containers /volumes
restorecon -R /volumes

podman create --name bobby_silver --env container=podman --entrypoint /sbin/init --stop-signal=RTMIN+3  \
    -v /volumes/podman/home:/home:z fedora:28

## now start stop 3 times
podman start bobby_silver; podman stop bobby_silver
podman start bobby_silver; podman stop bobby_silver
podman start bobby_silver; podman stop bobby_silver
# the third time will fail; there will be lots of mount type cgroup leakage, i.e., the cgroup will be mounted on paths like
# /sys/fs/cgroup/systemd/libpod_parent/(libpod-conmon-<uuid>/<uuid>)*
# these leaks cannot be unmounted or cleaned up
# trying to start a different stateless container should fail now
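# (an illustrative check, not part of the original reproducer: list the leaked cgroup mounts after the third cycle)
mount -t cgroup | grep 'libpod_parent/libpod-'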

@mheon
Member

mheon commented May 8, 2018

To provide an update here, I'm working on a more general overhaul of our CGroup handling now. Hopefully, once it's ready, it will address this and our other CGroup issues.

@mheon
Member

mheon commented May 9, 2018

My original thought was that this was the oci-umount hook, given that it seems to only occur with mounts. However, oci-umount does nothing with CGroups, so that doesn't seem likely. oci-systemd-hook does use both mounts and CGroups, so that seems to be a likely candidate.

@rhatdan
Member

rhatdan commented May 10, 2018

Yes, this is definitely oci-systemd-hook, but the problem, I believe, is that runc is not firing oci-systemd-hook in poststop.

@wking
Contributor

wking commented May 11, 2018

... the problem, I believe, is runc is not firing the oci-systemd-hook in poststop.

This is probably a reference to opencontainers/runc#1797.

wking added a commit to wking/libpod that referenced this issue May 11, 2018
We aren't consuming this yet, but these pkg/hooks changes lay the
groundwork for future libpod changes to support post-exit hooks [1,2].

[1]: containers#730
[2]: opencontainers/runc#1797

Signed-off-by: W. Trevor King <wking@tremily.us>
@wking
Contributor

wking commented May 11, 2018

In some discussion on #podman with @baude and @mheon, the solution to this may be defining a postexit extension stage (groundwork in #758). We wouldn't set those hooks in the OCI config, because the OCI spec contains no post-exit hooks (see also the discussion about adding new hooks to the spec in opencontainers/runtime-spec#926), but we could pass them through to conmon via the --exit-command it grew in cri-o/cri-o#1366 (and which we don't actually use yet, despite the option having been added to support podman use cases).

@aalba6675
Author

aalba6675 commented May 14, 2018

Hi, I have an observation not related to the exit/poststop hook: when the container is merely started, there is already a doubled path. We can't blame the runc poststop hook for this, so is it oci-systemd-hook that creates the doubled path mount at start?

Container is running (not tried to exit at all):

cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)

When I stop the container, the doubled path remains mounted, but I can now manually unmount both the single and doubled paths from the host.

@mheon
Member

mheon commented May 14, 2018

@aalba6675 Is this after a restart (container has already started and stopped once, then started again)?

@aalba6675
Author

@mheon - no - this is a clean start from boot with the mount --make-private /tmp just to get the bind mounts working.

@aalba6675
Author

aalba6675 commented May 14, 2018

It seems that if I take the trouble to run umount <double_path>; umount <single_path>; from the host, the container can be started and stopped without any issues. BTW, I am using podman from master.

rh-atomic-bot pushed a commit that referenced this issue May 31, 2018
This allows callers to avoid delegating to OCI runtimes for cases
where they feel that the runtime hook handling is unreliable [1].

[1]: #730 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>

Closes: #855
Approved by: rhatdan
wking added a commit to wking/libpod that referenced this issue May 31, 2018
Instead of delegating to the runtime, since some runtimes do not seem
to handle these reliably [1].

[1]: containers#730 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
rh-atomic-bot pushed a commit that referenced this issue Jun 4, 2018
Instead of delegating to the runtime, since some runtimes do not seem
to handle these reliably [1].

[1]: #730 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>

Closes: #864
Approved by: rhatdan
@mheon
Member

mheon commented Jun 14, 2018

@aalba6675 Can you try this with Podman 0.6.2 to see if it's fixed? We execute postrun hooks ourselves now, instead of calling out to runc, so cleanup of the mounts should be happening now.

@thoraxe

thoraxe commented Jun 17, 2018

I was able to cause this problem with 0.6.2 but I'm not sure how to reproduce it or unmount the cgroups...

@mheon
Member

mheon commented Jun 18, 2018

Verified here. We're further than we were before - oci-systemd-hook is actually running. Now it's throwing errors. This may actually be an oci-systemd-hook bug.

@aalba6675
Author

Sorry for being MIA - just reproduced this on
Version: 0.6.4-dev
Go Version: go1.10.2
OS/Arch: linux/amd64

Things are looking better. @mheon, I saw the error message:

failed to stop container 6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0: cgroups: unable to remove paths /sys/fs/cgroup/systemd/libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0

The failure is due to the doubled child path: /sys/fs/cgroup/systemd/libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0/ctr/libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0/ctr - so the parent cannot be removed.

@thoraxe - after the pod is stopped you should be able to manually umount the paths on the host; to clean up I do the following:

# podman stop <the_container>
# umount the doubled path
sudo umount /sys/fs/cgroup/systemd/libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0/ctr/libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0/ctr

# now umount the single path
sudo umount /sys/fs/cgroup/systemd/libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0/ctr

@mheon
Member

mheon commented Jun 19, 2018

I spent some more time debugging this yesterday. New conclusions:

  • oci-systemd-hook is being correctly called and is running, but it incorrectly depends on the PID part of the state and will refuse to run if it is not set in the poststop hook. This is nonsensical, as poststop hooks run after the container has definitely stopped and no longer has a PID. The fix for this will need to be in oci-systemd-hook.
  • However, oci-systemd-hook is not responsible for cleaning up the CGroup mounts we see leaking. The code assumes that they will be cleaned up when the container's mount namespace closes. This assumption does not seem to hold true for cases where the container has volume mounts.

@rhatdan
Member

rhatdan commented Jun 19, 2018

Matt, did you attempt to remove the PID failure and it still failed to find the directory, or was this just on a non-systemd container?

@mheon
Member

mheon commented Jun 19, 2018

@rhatdan I do believe we had another error; let me see what that was.

@mheon
Member

mheon commented Jun 19, 2018

@rhatdan
Jun 19 09:23:46 devel.lldp.net oci-systemd-hook[4989]: systemdhook <error>: 585a70c1e938: Failed to open config file: /var/lib/containers/storage/overlay-containers/585a70c1e938ba2aa7a4613f4af212b1f6bb82d9e80dea43d8445d99ea1b00cd/userdata/config.json: No such file or directory

It's trying to hit c/storage config when c/storage has already deleted the container.

@rhatdan
Member

rhatdan commented Jun 19, 2018

So in this case everything was cleaned up correctly?

@mheon
Member

mheon commented Jun 19, 2018

It looks like everything is being cleaned up with no volume mounts present, but I'm not 100% sure, given that oci-systemd-hook is still throwing errors. With volume mounts, I can't get systemd to do anything except exit instantly, so I think I must be doing something wrong.

@mheon
Member

mheon commented Jun 19, 2018

I can confirm that, while the container is exiting instantly if mounts are present, it is leaving mounts lying around in /tmp:

tmpfs on /tmp/ocitmp.Rvr8xj type tmpfs (rw,nosuid,nodev,relatime,context="system_u:object_r:container_file_t:s0:c172,c562",size=65536k,mode=755)
tmpfs on /tmp/ocitmp.Rvr8xj/.containerenv type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /tmp/ocitmp.Rvr8xj/secrets type tmpfs (rw,nosuid,nodev,seclabel,mode=755)

@aalba6675
Author

aalba6675 commented Jun 20, 2018

@mheon I think you are seeing cascading effects of bugs:

The /tmp stuff is due to the default shared propagation (mount --make-shared) on /tmp; the container startup bails out, leaking the ocitmp mount. Reported here: projectatomic/oci-systemd-hook#92. Are you using a host (F28, Silverblue) where /tmp is make-shared by default? Again, this seems to happen only with volume mounts.

#893 may also be a duplicate of this bug; if you don't clean up the cgroup mounts manually on the host before your ExecReload, the cgroup will leak and you will see the "explosion".

@mheon
Member

mheon commented Jun 20, 2018

@aalba6675 I think #893 is probably separate, as the CGroups mounts aren't being created there - oci-systemd-hook will only run on containers with init set as command, which is not true for the containers there. Therefore, we shouldn't have any CGroup mounts present at all there. I think that might be a result of our CGroup handling conflicting with systemd's.

On /tmp - my development VM is still on F27, but /tmp is definitely shared propagation. So I think I am indeed hitting that bug.

@aalba6675
Author

aalba6675 commented Jun 20, 2018

A workaround is to patch the template to /tmp/oci/ocitmp.XXXX and mount /tmp/oci with --make-private.

@rhatdan
Member

rhatdan commented Jun 20, 2018

First off, we should probably not use /tmp at all, except when we are doing this as a non-privileged user. We should be doing this under /run/libpod.

@aalba6675
Author

On Fedora 28, /run has shared propagation, so this would at least need a tmpfs at /run/libpod/tmp with private propagation, and then a template like /run/libpod/tmp/ocitmp.XXXXXX.
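
(Roughly, as an illustrative sketch only; the /run/libpod/tmp path and the ocitmp.XXXXXX template come from the suggestion above, not from the hook's current code:)

# dedicated tmpfs with private propagation for the hook's scratch mounts
mkdir -p /run/libpod/tmp
mount -t tmpfs -o mode=0700 tmpfs /run/libpod/tmp
mount --make-private /run/libpod/tmp
# the hook would then create its scratch directory there, e.g.:
mktemp -d /run/libpod/tmp/ocitmp.XXXXXX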

@rhatdan
Member

rhatdan commented Jun 20, 2018

How about projectatomic/oci-systemd-hook#98 to fix this?

@rhatdan
Member

rhatdan commented Jun 20, 2018

Could you try it out?

@rhatdan
Member

rhatdan commented Jul 12, 2018

@aalba6675 Could you check out podman-0.7.1, which added podman container cleanup and could fix some of these issues?

@rhatdan rhatdan closed this as completed Jul 17, 2018
@github-actions github-actions bot added the "locked - please file new issue/PR" label Sep 24, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023