This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

prepare-app: perf issue with mounting /sys (benchmarks) #2351

Closed
alban opened this issue Mar 31, 2016 · 3 comments · Fixed by #2386

Comments

@alban
Member

alban commented Mar 31, 2016

Issue found via #2324: starting a pod with 100 apps is really slow. The problem goes away when testing with this patch:

--- a/stage1/prepare-app/prepare-app.c
+++ b/stage1/prepare-app/prepare-app.c
@@ -152,7 +152,7 @@ int main(int argc, char *argv[])
        };
        static const mount_point dirs_mount_table[] = {
                { "/proc", "/proc", "bind", NULL, MS_BIND|MS_REC },
-               { "/sys", "/sys", "bind", NULL, MS_BIND|MS_REC },
+               { "/sys", "/sys", "bind", NULL, MS_BIND },
                { "/dev/shm", "/dev/shm", "bind", NULL, MS_BIND },
                { "/dev/pts", "/dev/pts", "bind", NULL, MS_BIND },
                { "/run/systemd/journal", "/run/systemd/journal", "bind", NULL, MS_BIND },

prepare-app recursively bind-mounts /sys from stage1 into stage2 (the apps' rootfs). "Recursive" means the bind mount includes all the cgroup mounts, because they live under /sys/fs/cgroup. In addition, rkt bind-mounts some cgroup knob files inside the cgroup filesystem to enable the memory and CPU isolators.
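To illustrate the difference the patch makes, here is a minimal sketch of a bind-mount helper built on mount(2). The names bind_flags and bind_mount are hypothetical and are not taken from prepare-app.c; only the flags MS_BIND and MS_REC come from the diff above. Calling mount(2) requires CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>

/* Hypothetical helper: flags for a bind mount. Adding MS_REC also
 * replicates every submount below the source (for /sys, that includes
 * everything mounted under /sys/fs/cgroup). */
unsigned long bind_flags(int recursive)
{
    return recursive ? (MS_BIND | MS_REC) : MS_BIND;
}

/* Hypothetical helper: bind-mount src onto dst.
 * Returns 0 on success, -1 on failure (errno is set by mount(2)). */
int bind_mount(const char *src, const char *dst, int recursive)
{
    assert(src != NULL && dst != NULL);

    /* The filesystem type argument is ignored for bind mounts;
     * "bind" is passed only to mirror the mount table in the diff. */
    if (mount(src, dst, "bind", bind_flags(recursive), NULL) == -1) {
        perror("mount");
        return -1;
    }
    return 0;
}
```

With recursive set to 0, only the sysfs instance itself is bound, so the cgroup submounts are not copied into each app's rootfs.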

The number of cgroup bind mounts in stage1 grows linearly with the number of apps: O(n)

The number of cgroup bind mounts in stage2 grows quadratically with the number of apps, because each of the n apps gets its own recursive copy of the O(n) mounts in stage1: O(n^2)

With one app, I have 17 bind mounts related to cgroups. With 100 apps, that is 17 * 100 * 100 = 170,000 bind mounts.
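One rough way to reproduce a count like this is to scan a mountinfo-formatted file for entries that mention cgroups. The sketch below is an assumption about how such a measurement could be done, not the method used in this issue; count_cgroup_mounts is a hypothetical name, and matching on the substring "cgroup" is a deliberately loose heuristic.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: count the lines of a mountinfo-formatted stream
 * (for example /proc/self/mountinfo) that contain the string "cgroup". */
long count_cgroup_mounts(FILE *mountinfo)
{
    char *line = NULL;   /* buffer managed by getline() */
    size_t cap = 0;
    long count = 0;

    assert(mountinfo != NULL);

    /* getline() grows the buffer as needed, so long entries are
     * never split and counted twice. */
    while (getline(&line, &cap, mountinfo) != -1) {
        if (strstr(line, "cgroup") != NULL)
            count++;
    }
    free(line);
    return count;
}
```

A caller would open /proc/self/mountinfo with fopen, pass the stream to count_cgroup_mounts, and close it afterwards. Running this inside an app's mount namespace before and after the patch should show the difference.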

For each change in the mount table, systemd is notified via inotify on /proc/self/mountinfo and it checks the configuration of that mount in /etc/systemd/system, /run/systemd/system, /usr/local/lib/systemd/system and /usr/lib64/systemd/system.

systemd makes about 30 syscalls for each new mount it is notified about via /proc/self/mountinfo. That adds up to 17 * 100 * 100 * 30 = 5,100,000 syscalls for mounting cgroups in a 100-app pod.
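The estimate can be written down as a small calculation. This is only a sketch of the arithmetic in this comment; the function names are hypothetical, and the inputs (17 mounts per app, about 30 syscalls per mount) are the figures measured above, not constants of rkt or systemd.

```c
#include <assert.h>
#include <limits.h>

/* Cgroup bind mounts in stage1: linear in the number of apps. */
unsigned long long stage1_cgroup_mounts(unsigned long long per_app,
                                        unsigned long long apps)
{
    return per_app * apps;
}

/* Cgroup bind mounts in stage2: each app recursively copies all of
 * stage1's cgroup mounts, so the total is per_app * apps * apps. */
unsigned long long stage2_cgroup_mounts(unsigned long long per_app,
                                        unsigned long long apps)
{
    /* Guard against overflow for absurdly large inputs. */
    assert(apps == 0 || per_app <= ULLONG_MAX / apps / apps);
    return stage1_cgroup_mounts(per_app, apps) * apps;
}

/* Syscalls systemd makes in response, at a fixed cost per mount. */
unsigned long long systemd_syscalls(unsigned long long mounts,
                                    unsigned long long per_mount)
{
    return mounts * per_mount;
}
```

For example, stage2_cgroup_mounts(17, 100) gives 170,000 and systemd_syscalls(170000, 30) gives 5,100,000, matching the figures in this comment.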

@alban
Member Author

alban commented Mar 31, 2016

appc/spec says /sys should be made available in stage2's rootfs:
https://github.com/appc/spec/blob/master/spec/OS-SPEC.md#devices-and-file-systems

But it does not require making cgroups available in /sys/fs/cgroup, nor does it require the bind mount to be recursive.

@alban
Member Author

alban commented May 4, 2016

The fix was reverted by #2542, so I am reopening this issue.

@alban
Member Author

alban commented May 24, 2016

I'll try to mount a new sysfs in the stage2 rootfs instead of bind-mounting /sys from stage1.
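As a rough sketch of that idea (an assumption about the approach, not the code that was eventually merged), mounting a fresh sysfs instance with mount(2) could look like this. The name mount_fresh_sysfs and the choice of mount flags are hypothetical. The call requires CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>

/* Hypothetical helper: mount a new sysfs instance at target instead of
 * bind-mounting an existing /sys. A fresh sysfs mount does not carry
 * over submounts such as the cgroup hierarchies under /sys/fs/cgroup.
 * Returns 0 on success, -1 on failure (errno is set by mount(2)). */
int mount_fresh_sysfs(const char *target)
{
    assert(target != NULL);

    /* The flags below are an assumption, chosen as common hardening
     * options for sysfs; the issue does not specify any. */
    if (mount("sysfs", target, "sysfs",
              MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL) == -1) {
        perror("mount sysfs");
        return -1;
    }
    return 0;
}
```

Because no cgroup mounts are copied, the number of mounts per app stays constant regardless of how many apps are in the pod.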
