
podman with bind mount leaving cgroup debris and prevents container restart #730

Closed
aalba6675 opened this issue May 6, 2018 · 56 comments

@aalba6675

aalba6675 commented May 6, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

Description

When podman stops a systemd container with bind mounts, it leaves behind a lot of cgroup debris.
This prevents the container from starting a third time.

Steps to reproduce the issue:

  1. Workaround: currently, to get working bind mounts I have to run mount --make-private /tmp. Otherwise oci-systemd-hook cannot move the mount to the overlay. This is on Fedora 28. Cannot move mount from /tmp/ocitmp.XXXX to .../merged/run: projectatomic/oci-systemd-hook#92

  2. Create a systemd-based fedora:28 container:

podman create --name bobby_silver -v /srv/docker/volumes/podman/home:/home:z --env container=podman --entrypoint=/sbin/init --stop-signal=RTMIN+3 fedora:28

  3. Start and stop the container 3 times:
podman start bobby_silver
podman stop bobby_silver
podman start bobby_silver
podman stop bobby_silver
podman start bobby_silver
podman stop bobby_silver

Describe the results you received:
After the first start/stop cycle there is cgroup debris:

cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0 type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0 type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0 type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
## third time unlucky
unable to start container "bobby_silver": container create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:58: mounting \\\"cgroup\\\" to rootfs \\\"/var/lib/containers/storage/overlay/52f7959a1a8a171b2c8aee587ea81c964e84130681444f0ff03b3202804a91cb/merged\\\" at \\\"/sys/fs/cgroup\\\" caused \\\"stat /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0: no such file or directory\\\"\""

Journal:

May 06 10:37:05 podman.localdomain kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
May 06 10:37:05 podman.localdomain audit: ANOM_PROMISCUOUS dev=vethd814e8bb prom=256 old_prom=0 auid=1050 uid=0 gid=0 ses=3
May 06 10:37:05 podman.localdomain kernel: IPv6: ADDRCONF(NETDEV_UP): vethd814e8bb: link is not ready
May 06 10:37:05 podman.localdomain kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethd814e8bb: link becomes ready
May 06 10:37:05 podman.localdomain kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
May 06 10:37:05 podman.localdomain kernel: cni0: port 2(vethd814e8bb) entered blocking state
May 06 10:37:05 podman.localdomain kernel: cni0: port 2(vethd814e8bb) entered disabled state
May 06 10:37:05 podman.localdomain kernel: device vethd814e8bb entered promiscuous mode
May 06 10:37:05 podman.localdomain kernel: cni0: port 2(vethd814e8bb) entered blocking state
May 06 10:37:05 podman.localdomain kernel: cni0: port 2(vethd814e8bb) entered forwarding state
May 06 10:37:05 podman.localdomain NetworkManager[1159]: <info>  [1525574225.9736] device (vethd814e8bb): carrier: link connected
May 06 10:37:05 podman.localdomain NetworkManager[1159]: <info>  [1525574225.9747] manager: (vethd814e8bb): new Veth device (/org/freedesktop/NetworkManager/Devices/12)
May 06 10:37:05 podman.localdomain systemd-udevd[18311]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 06 10:37:05 podman.localdomain systemd-udevd[18311]: Could not generate persistent MAC address for vethd814e8bb: No such file or directory
May 06 10:37:05 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=89
May 06 10:37:05 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=91
May 06 10:37:05 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=92
May 06 10:37:05 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=93
May 06 10:37:05 podman.localdomain audit: NETFILTER_CFG table=filter family=2 entries=155
May 06 10:37:06 podman.localdomain conmon[18363]: conmon cd8be22a52efaed7e279 <ninfo>: about to waitpid: 18364
May 06 10:37:06 podman.localdomain kernel: SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
May 06 10:37:06 podman.localdomain oci-systemd-hook[18389]: systemdhook <error>: cd8be22a52ef: pid not found in state: Success
May 06 10:37:06 podman.localdomain conmon[18363]: conmon cd8be22a52efaed7e279 <error>: Failed to create container: exit status 1
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=filter family=2 entries=156
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=94
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=96
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=94
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=96
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=10 entries=78
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=10 entries=80
May 06 10:37:06 podman.localdomain kernel: cni0: port 2(vethd814e8bb) entered disabled state
May 06 10:37:06 podman.localdomain audit: ANOM_PROMISCUOUS dev=vethd814e8bb prom=0 old_prom=256 auid=1050 uid=0 gid=0 ses=3
May 06 10:37:06 podman.localdomain kernel: device vethd814e8bb left promiscuous mode
May 06 10:37:06 podman.localdomain kernel: cni0: port 2(vethd814e8bb) entered disabled state
May 06 10:37:06 podman.localdomain NetworkManager[1159]: <info>  [1525574226.1560] device (vethd814e8bb): released from master device cni0
May 06 10:37:06 podman.localdomain gnome-shell[3393]: Removing a network device that was not added
May 06 10:37:06 podman.localdomain gnome-shell[2067]: Removing a network device that was not added
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=94
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=93
May 06 10:37:06 podman.localdomain audit: NETFILTER_CFG table=nat family=2 entries=91

Describe the results you expected:
Start/stop without any issue.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

# podman version
Version:       0.5.2-dev
Go Version:    go1.10.1
OS/Arch:       linux/amd64


Output of podman info:

host:
  MemFree: 17688948736
  MemTotal: 33667493888
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 8
  hostname: podman.localdomain
  kernel: 4.16.5-300.fc28.x86_64
  os: linux
  uptime: 10h 3m 7.91s (Approximately 0.42 days)
insecure registries:
  registries: []
registries:
  registries:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
store:
  ContainerStore:
    number: 4
  GraphDriverName: overlay
  GraphOptions:
  - overlay.override_kernel_check=true
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
  ImageStore:
    number: 2
  RunRoot: /var/run/containers/storage
Additional environment details (AWS, VirtualBox, physical, etc.):

  • physical
  • Fedora 28
@aalba6675
Author

aalba6675 commented May 6, 2018

After getting into this situation, I can't even start non-bind-mounted containers.
alice_gold is a fedora:28 container without any bind mounts:

unable to start container "alice_gold": container create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:58: mounting \\\"cgroup\\\" to rootfs \\\"/var/lib/containers/storage/overlay/bc6d5e48c36118d012e5996158e5f7de7c0bd386b63cda3be40686a90203c019/merged\\\" at \\\"/sys/fs/cgroup\\\" caused \\\"stat /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5/2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5: no such file or directory\\\"\""
: internal libpod error

@aalba6675
Author

If I don't use bind mounts at all and leave /tmp as mount --make-shared, then all non-bind-mount containers work: they can be started and stopped without any cgroup debris.

@aalba6675 aalba6675 changed the title podman with bind mount leaving cgroup debris podman with bind mount leaving cgroup debris and prevents container restart May 6, 2018
@rhatdan
Member

rhatdan commented May 6, 2018

@mrunalp I think this might be runc not running oci-systemd-hook in the poststop hook.

@aalba6675 Could you see if the journal reports that oci-systemd-hook ran in the poststop hook?

@aalba6675
Author

aalba6675 commented May 6, 2018

@rhatdan The poststop hook doesn't seem to be run. This is the journal when the container is stopped. It leaves behind a lot of cgroup mounts on impossibly long paths like:

/sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0 type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)

Journal:

May 06 20:58:29 podman.localdomain audit[5949]: SERVICE_STOP pid=5949 uid=0 auid=1050 ses=3 subj=system_u:system_r:container_t:s0:c124,c228 msg='unit=dbus comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 06 20:58:29 podman.localdomain audit[5949]: SERVICE_STOP pid=5949 uid=0 auid=1050 ses=3 subj=system_u:system_r:container_t:s0:c124,c228 msg='unit=systemd-user-sessions comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 06 20:58:29 podman.localdomain audit[6133]: SYSTEM_SHUTDOWN pid=6133 uid=0 auid=1050 ses=3 subj=system_u:system_r:container_t:s0:c124,c228 msg=' comm="systemd-update-utmp" exe="/usr/lib/systemd/systemd-update-utmp" hostname=? addr=? terminal=? res=success'
May 06 20:58:29 podman.localdomain audit[5949]: SERVICE_STOP pid=5949 uid=0 auid=1050 ses=3 subj=system_u:system_r:container_t:s0:c124,c228 msg='unit=systemd-update-utmp comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 06 20:58:29 podman.localdomain audit[5949]: SERVICE_STOP pid=5949 uid=0 auid=1050 ses=3 subj=system_u:system_r:container_t:s0:c124,c228 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 06 20:58:29 podman.localdomain conmon[5938]: conmon cd8be22a52efaed7e279 <ninfo>: container 5949 exited with status 0

Old school (after the container is stopped):

lscgroup | grep pod
cpu,cpuacct:/libpod_parent
cpu,cpuacct:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
cpu,cpuacct:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
pids:/libpod_parent
pids:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
pids:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
cpuset:/libpod_parent
cpuset:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
cpuset:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
hugetlb:/libpod_parent
hugetlb:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
hugetlb:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
memory:/libpod_parent
memory:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
memory:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
blkio:/libpod_parent
blkio:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
blkio:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
freezer:/libpod_parent
freezer:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
freezer:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
net_cls,net_prio:/libpod_parent
net_cls,net_prio:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
net_cls,net_prio:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
devices:/libpod_parent
devices:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
devices:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0
perf_event:/libpod_parent
perf_event:/libpod_parent/libpod-conmon-2a5aee03fd0d0c99666543c00c5943a507aee7a6c6cbf932a714e3d090a85ea5
perf_event:/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0

@aalba6675
Author

Is there a way to manually clear these cgroups/mounts? Once I get into this state I can't start simple containers (i.e. those without bind mounts).

@rhatdan
Member

rhatdan commented May 6, 2018

Can't you just umount them?

@aalba6675
Author

Trying:

umount -t cgroup $(mount | grep '^cgroup.*libpod' | gawk '{print $3}')

I get either

umount: /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0: not mounted.

or

umount: /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/libpod_parent/libpod-conmon-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0: no mount point specified.
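
(These errors are typical when nested mount entries are unmounted in an arbitrary order. A minimal sketch of a deepest-first cleanup, assuming the leaked mount points contain no whitespace:)

# unmount leaked libpod cgroup mounts, deepest paths first, so children go before their parents
mount -t cgroup | awk '/libpod_parent\/libpod-conmon-/ {print $3}' \
  | awk '{print gsub("/", "/"), $0}' | sort -rn | cut -d' ' -f2- \
  | while read -r path; do umount "$path" || echo "failed: $path" >&2; done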

@aalba6675
Author

Another observation: notify_on_release is 0 in the /sys/fs/cgroup/systemd/libpod_parent hierarchy.
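
(For reference, the flag can be inspected per cgroup; on cgroup v1 the kernel also needs a non-empty release_agent at the root of the hierarchy before it will auto-remove empty cgroups, so a 0 here simply means no automatic cleanup:)

cat /sys/fs/cgroup/systemd/libpod_parent/notify_on_release
cat /sys/fs/cgroup/systemd/release_agent   # empty output means no auto-removal even if the flag were 1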

@mheon
Member

mheon commented May 6, 2018

The CGroups issue could be related to #496.
#507 may fix it, but it has serious issues with systemd-managed CGroups that still need to be solved.

@aalba6675
Author

@mheon - any idea why this would be triggered by a kludgy bind mount (mount --make-private /tmp)?

Non-bind-mount containers are functioning without leaving cgroup debris.

Once this situation is triggered (lots of cgroups mounted on /sys/fs/cgroup/systemd/libpod_parent/<long non-existent path with the uuid repeated>), even non-bind-mount containers can no longer be started.

@mheon
Member

mheon commented May 7, 2018

@aalba6675 The mounts situation makes it sound like it could be oci-umount-hook firing, and less like our other CGroup issues (though it's weird you're not seeing CGroups left over even in cases where mounts aren't involved; we should still be leaking one or two). I'm not familiar enough with that hook to know for sure what the cause might be, though.

@aalba6675
Author

aalba6675 commented May 7, 2018

@mheon you are right, I spoke too soon. The cgroups are only visible with the legacy libcgroup-tools.

lscgroup | grep libpod_parent
cpu,cpuacct:/libpod_parent
cpu,cpuacct:/libpod_parent/libpod-conmon-9a3f523c5c1055be951b2635c9371372f0de0ead65129da8bbab9c21a63f82e6
cpu,cpuacct:/libpod_parent/libpod-conmon-85b047f7598912928dd50ae435e932d8f6d36340bd29d94378cd76c04c1b9d3a
cpu,cpuacct:/libpod_parent/libpod-conmon-85b047f7598912928dd50ae435e932d8f6d36340bd29d94378cd76c04c1b9d3a/85b047f7598912928dd50ae435e932d8f6d36340bd29d94378cd76c04c1b9d3a
cpu,cpuacct:/libpod_parent/libpod-conmon-f31ddb42076b6f92fa4e3528aa69e3835657bac718a15fa1f21eded28d33df6d
cpu,cpuacct:/libpod_parent/libpod-conmon-9616fc8f2a42ad9276f4b65bf20aa48f1d921a3b56433b7b0c17799c7d48f5df
cpu,cpuacct:/libpod_parent/libpod-conmon-9616fc8f2a42ad9276f4b65bf20aa48f1d921a3b56433b7b0c17799c7d48f5df/9616fc8f2a42ad9276f4b65bf20aa48f1d921a3b56433b7b0c17799c7d48f5df
devices:/libpod_parent
devices:/libpod_parent/libpod-conmon-9a3f523c5c1055be951b2635c9371372f0de0ead65129da8bbab9c21a63f82e6
devices:/libpod_parent/libpod-conmon-85b047f7598912928dd50ae435e932d8f6d36340bd29d94378cd76c04c1b9d3a

There is leaking in /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-* as well. However, there are no mounts of type cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-conmon-*, so this doesn't prevent these stateless containers from being started and stopped multiple times.

In the case of bind mounts, there seems to be a recursive loop: the directory libpod-conmon-<uuid>/<uuid> gets repeated as libpod-conmon-<uuid>/<uuid>/libpod-conmon-<uuid>/<uuid>, etc., until the container won't start.

@mheon
Member

mheon commented May 7, 2018

The recursive CGroup path seems like a separate bug - our current CGroup issues are mainly a lack of cleanup, whereas this seems to actually be duplicating the Conmon scope repeatedly. Interesting - I'll look at this more tomorrow.

@aalba6675
Author

Thanks! Let me summarize a reproducer for Fedora 28:

## prolog
mount --make-private /tmp
mkdir -p /volumes/podman/home
semanage fcontext -a -e /var/lib/containers /volumes
restorecon -R /volumes

podman create --name bobby_silver --env container=podman --entrypoint /sbin/init --stop-signal=RTMIN+3  \
    -v /volumes/podman/home:/home:z fedora:28

## now start stop 3 times
podman start bobby_silver; podman stop bobby_silver
podman start bobby_silver; podman stop bobby_silver
podman start bobby_silver; podman stop bobby_silver
# the third time will fail; there will be lots of mount type cgroup leakage, i.e., the cgroup will be mounted on paths like
# /sys/fs/cgroup/systemd/libpod_parent/(libpod-conmon-<uuid>/<uuid>)*
# these leaks cannot be unmounted or cleaned up
# trying to start a different stateless container should fail now
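# (an illustrative check, not part of the original reproducer: list the leaked cgroup mounts after the third cycle)
mount -t cgroup | grep 'libpod_parent/libpod-'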

@mheon
Member

mheon commented May 8, 2018

To provide an update here, I'm working on a more general overhaul of our CGroup handling now. Hopefully, once it's ready, it will address this and our other CGroup issues.

@mheon
Member

mheon commented May 9, 2018

My original thought was that this was the oci-umount hook, given that it seems to only occur with mounts. However, oci-umount does nothing with CGroups, so that doesn't seem likely. oci-systemd-hook does use both mounts and CGroups, so that seems to be a likely candidate.

@rhatdan
Member

rhatdan commented May 10, 2018

Yes, this is definitely oci-systemd-hook, but the problem, I believe, is that runc is not firing oci-systemd-hook in poststop.

@wking
Contributor

wking commented May 11, 2018

... the problem, I believe, is runc is not firing the oci-systemd-hook in poststop.

This is probably a reference to opencontainers/runc#1797.

wking added a commit to wking/libpod that referenced this issue May 11, 2018
We aren't consuming this yet, but these pkg/hooks changes lay the
groundwork for future libpod changes to support post-exit hooks [1,2].

[1]: containers#730
[2]: opencontainers/runc#1797

Signed-off-by: W. Trevor King <wking@tremily.us>
@wking
Contributor

wking commented May 11, 2018

In some discussion on #podman with @baude and @mheon, the solution to this may be defining a postexit extension stage (groundwork in #758). We wouldn't set those hooks in the OCI config, because the OCI spec contains no post-exit hooks (see also the discussion about adding new hooks to the spec in opencontainers/runtime-spec#926), but we could pass them through to conmon via the --exit-command it grew in cri-o/cri-o#1366 (and which we don't actually use yet, despite the option having been added to support podman use cases).

@aalba6675
Author

aalba6675 commented May 14, 2018

Hi, I have an observation not related to the exit/poststop hook: when the container is merely started, there is already a doubled path. We can't blame the runc poststop hook for this, so is it oci-systemd-hook that creates the doubled path mount at start?

Container is running (not tried to exit at all):

cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)
cgroup on /sys/fs/cgroup/systemd/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr/libpod_parent/libpod-cd8be22a52efaed7e2790d2eb3421c00542c3eb9763bfe715c3ad23647c419e0/ctr type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,name=systemd)

When I stop the container, the doubled path remains mounted, but I can now manually unmount both the single and doubled paths from the host.

@mheon
Member

mheon commented May 14, 2018

@aalba6675 Is this after a restart (container has already started and stopped once, then started again)?

@aalba6675
Author

@mheon - no - this is a clean start from boot with the mount --make-private /tmp just to get the bind mounts working.

@aalba6675
Author

aalba6675 commented May 14, 2018

It seems that if I take the trouble to run umount <double_path>; umount <single_path>; from the host, the container can be started and stopped without any issues. BTW, I am using podman from master.

rh-atomic-bot pushed a commit that referenced this issue May 31, 2018
This allows callers to avoid delegating to OCI runtimes for cases
where they feel that the runtime hook handling is unreliable [1].

[1]: #730 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>

Closes: #855
Approved by: rhatdan
wking added a commit to wking/libpod that referenced this issue May 31, 2018
Instead of delegating to the runtime, since some runtimes do not seem
to handle these reliably [1].

[1]: containers#730 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
rh-atomic-bot pushed a commit that referenced this issue Jun 4, 2018
Instead of delegating to the runtime, since some runtimes do not seem
to handle these reliably [1].

[1]: #730 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>

Closes: #864
Approved by: rhatdan
@mheon
Member

mheon commented Jun 14, 2018

@aalba6675 Can you try this with Podman 0.6.2 to see if it's fixed? We execute postrun hooks ourselves now, instead of calling out to runc, so cleanup of the mounts should be happening now.

@thoraxe

thoraxe commented Jun 17, 2018

I was able to cause this problem with 0.6.2 but I'm not sure how to reproduce it or unmount the cgroups...

@mheon
Member

mheon commented Jun 18, 2018

Verified here. We're further than we were before - oci-systemd-hook is actually running. Now it's throwing errors. This may actually be an oci-systemd-hook bug.

@aalba6675
Author

Sorry for being MIA - just reproduced this on
Version: 0.6.4-dev
Go Version: go1.10.2
OS/Arch: linux/amd64

Things are looking better. @mheon, I saw the error message:

failed to stop container 6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0: cgroups: unable to remove paths /sys/fs/cgroup/systemd/libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0

The failure is due to the doubled child path: /sys/fs/cgroup/systemd/libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0/ctr/libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0/ctr - so the parent cannot be removed.

@thoraxe - after the pod is stopped you should be able to manually umount the paths on the host; to clean up I do the following:

# podman stop <the_container>
# umount the doubled path
sudo umount /sys/fs/cgroup/systemd/libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0/ctr/libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0/ctr

# now umount the single path
sudo umount /sys/fs/cgroup/systemd/libpod_parent/libpod-6cab0d8c9dc817e69fd0c02a7657e9a83edab3903f25f50d758c1511283bbbf0/ctr

@mheon
Member

mheon commented Jun 19, 2018

I spent some more time debugging this yesterday. New conclusions:

  • oci-systemd-hook is being correctly called and is running, but it incorrectly depends on the PID part of the state and will refuse to run if it is not set in the poststop hook. This is nonsensical, as poststop hooks run after the container has definitely stopped and no longer has a PID. The fix for this will need to be in oci-systemd-hook.
  • However, oci-systemd-hook is not responsible for cleaning up the CGroup mounts we see leaking. The code assumes that they will be cleaned up when the container's mount namespace closes. This assumption does not seem to hold true for cases where the container has volume mounts.

@rhatdan
Member

rhatdan commented Jun 19, 2018

Matt, did you attempt to remove the PID failure and it still failed to find the directory, or was this just on a non-systemd container?

@mheon
Member

mheon commented Jun 19, 2018

@rhatdan I do believe we had another error; let me see what that was.

@mheon
Member

mheon commented Jun 19, 2018

@rhatdan
Jun 19 09:23:46 devel.lldp.net oci-systemd-hook[4989]: systemdhook <error>: 585a70c1e938: Failed to open config file: /var/lib/containers/storage/overlay-containers/585a70c1e938ba2aa7a4613f4af212b1f6bb82d9e80dea43d8445d99ea1b00cd/userdata/config.json: No such file or directory

It's trying to hit c/storage config when c/storage has already deleted the container.

@rhatdan
Member

rhatdan commented Jun 19, 2018

So in this case everything was cleaned up correctly?

@mheon
Member

mheon commented Jun 19, 2018

It looks like everything is being cleaned up with no volume mounts present, but I'm not 100% sure, given that oci-systemd-hook is still throwing errors. With volume mounts, I can't get systemd to do anything except exit instantly, so I think I must be doing something wrong.

@mheon
Member

mheon commented Jun 19, 2018

I can confirm that, while the container is exiting instantly if mounts are present, it is leaving mounts lying around in /tmp:

tmpfs on /tmp/ocitmp.Rvr8xj type tmpfs (rw,nosuid,nodev,relatime,context="system_u:object_r:container_file_t:s0:c172,c562",size=65536k,mode=755)
tmpfs on /tmp/ocitmp.Rvr8xj/.containerenv type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /tmp/ocitmp.Rvr8xj/secrets type tmpfs (rw,nosuid,nodev,seclabel,mode=755)

@aalba6675
Author

aalba6675 commented Jun 20, 2018

@mheon I think you are seeing cascading effects of bugs:

The /tmp stuff is due to the default shared propagation (mount --make-shared) on /tmp; the container startup bails out, leaking the ocitmp mount. Reported here: projectatomic/oci-systemd-hook#92. Are you using a host (F28, Silverblue) where /tmp is make-shared by default? Again, this seems to happen only with volume mounts.

#893 may also be a duplicate of this bug; if you don't clean up the cgroup mounts manually on the host before your ExecReload, the cgroup will leak and you will see the "explosion".

@mheon
Member

mheon commented Jun 20, 2018

@aalba6675 I think #893 is probably separate, as the CGroups mounts aren't being created there - oci-systemd-hook will only run on containers with init set as command, which is not true for the containers there. Therefore, we shouldn't have any CGroup mounts present at all there. I think that might be a result of our CGroup handling conflicting with systemd's.

On /tmp - my development VM is still on F27, but /tmp is definitely shared propagation. So I think I am indeed hitting that bug.

@aalba6675
Author

aalba6675 commented Jun 20, 2018

A workaround is to patch the template to /tmp/oci/ocitmp.XXXX and mount /tmp/oci with --make-private.

@rhatdan
Member

rhatdan commented Jun 20, 2018

First off, we should probably not use /tmp at all, except when we are doing this as a non-privileged user. We should be doing this under /run/libpod.

@aalba6675
Author

On Fedora 28, /run has shared propagation, so this would at least need a tmpfs at /run/libpod/tmp with private propagation, and then a template like /run/libpod/tmp/ocitmp.XXXXXX.
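
(Roughly, as an illustrative sketch only; the /run/libpod/tmp path and the ocitmp.XXXXXX template come from the suggestion above, not from the hook's current code:)

# dedicated tmpfs with private propagation for the hook's scratch mounts
mkdir -p /run/libpod/tmp
mount -t tmpfs -o mode=0700 tmpfs /run/libpod/tmp
mount --make-private /run/libpod/tmp
# the hook would then create its scratch directory there, e.g.:
mktemp -d /run/libpod/tmp/ocitmp.XXXXXX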

@rhatdan
Member

rhatdan commented Jun 20, 2018

How about projectatomic/oci-systemd-hook#98 to fix this?

@rhatdan
Member

rhatdan commented Jun 20, 2018

Could you try it out?

@rhatdan
Member

rhatdan commented Jul 12, 2018

@aalba6675 Could you check out podman-0.7.1, which added podman container cleanup and could fix some of these issues?

@rhatdan rhatdan closed this as completed Jul 17, 2018
@github-actions github-actions bot added the "locked - please file new issue/PR" label Sep 24, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023