mounting host /var with rkt fly causes O(2^(number of containers)) growth in number of bind mounts #2553

Open
mikedanese opened this Issue Apr 29, 2016 · 18 comments

7 participants

mikedanese commented Apr 29, 2016

Run this a couple of times to reproduce:

rkt run \
  --volume host-var,kind=host,source=/var \
  --mount volume=host-var,target=/host_var \
  --insecure-options=image \
  --stage1-name=coreos.com/rkt/stage1-fly:1.4.0 \
  docker://busybox --exec=/bin/sh -- -c "echo hi"
mount | wc -l

mikedanese commented Apr 29, 2016

mikedanese@k-0-master:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04 LTS
Release:        16.04
Codename:       xenial

rkt tarball pulled down from the GitHub releases page.

pquerna commented Apr 29, 2016

Also ran into this; did two things to work around it:

  1. only mounting the specific path we needed in /var, not all of /var.

  2. When running under a systemd unit file, use the UUID file to clean up exited rkt containers eagerly instead of waiting for rkt gc.

ExecStartPre=-/usr/bin/mkdir -p /run/rkt-uuids
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/run/rkt-uuids/my-service

ExecStart=/usr/bin/rkt run \
    --uuid-file-save=/run/rkt-uuids/my-service \
    --stage1-name=coreos.com/rkt/stage1-fly:1.4.0 \
        .... more options ....

Contributor

steveeJ commented Apr 29, 2016

@mikedanese

I just tried to reproduce this on my machine, but as you can see, the growth is linear:

$ mount | wc -l; for i in `seq 1 10`; do sudo rkt run   --volume host-var,kind=host,source=/var  --mount volume=host-var,target=/host_var   --insecure-options=image   --stage1-name=coreos.com/rkt/stage1-fly:1.4.0  docker://busybox --exec=/bin/sh -- -c "echo -n more mounts.."  2>/dev/null ; mount | wc -l; done
31
more mounts..55
more mounts..79
more mounts..103
more mounts..127
more mounts..151
more mounts..175
more mounts..199
more mounts..223
more mounts..247
more mounts..271

The stage1-fly code does not use MS_REC for volume mounts, so I can't really explain the behavior you're seeing. Could you upload the output of mount after the first 2 executions?

/cc @alban

mikedanese commented Apr 29, 2016

@steveeJ is your rkt data dir /var/lib/rkt?

root@k-0-minion-c3s3:/home/mikedanese# mount | wc -l
69
root@k-0-minion-c3s3:/home/mikedanese# rkt run   --volume host-var,kind=host,source=/var   --mount volume=host-var,target=/host_var   --insecure-options=image   --stage1-name=coreos.com/rkt/stage1-fly:1.4.0 docker://busybox --exec=/bin/true 2>/dev/null; mount | wc -l
125
root@k-0-minion-c3s3:/home/mikedanese# rkt run   --volume host-var,kind=host,source=/var   --mount volume=host-var,target=/host_var   --insecure-options=image   --stage1-name=coreos.com/rkt/stage1-fly:1.4.0 docker://busybox --exec=/bin/true 2>/dev/null; mount | wc -l
237
root@k-0-minion-c3s3:/home/mikedanese# rkt run   --volume host-var,kind=host,source=/var   --mount volume=host-var,target=/host_var   --insecure-options=image   --stage1-name=coreos.com/rkt/stage1-fly:1.4.0 docker://busybox --exec=/bin/true 2>/dev/null; mount | wc -l
461
root@k-0-minion-c3s3:/home/mikedanese# rkt run   --volume host-var,kind=host,source=/var   --mount volume=host-var,target=/host_var   --insecure-options=image   --stage1-name=coreos.com/rkt/stage1-fly:1.4.0 docker://busybox --exec=/bin/true 2>/dev/null; mount | wc -l
909
root@k-0-minion-c3s3:/home/mikedanese# rkt run   --volume host-var,kind=host,source=/var   --mount volume=host-var,target=/host_var   --insecure-options=image   --stage1-name=coreos.com/rkt/stage1-fly:1.4.0 docker://busybox --exec=/bin/true 2>/dev/null; mount | wc -l
1805
root@k-0-minion-c3s3:/home/mikedanese# rkt run   --volume host-var,kind=host,source=/var   --mount volume=host-var,target=/host_var   --insecure-options=image   --stage1-name=coreos.com/rkt/stage1-fly:1.4.0 docker://busybox --exec=/bin/true 2>/dev/null; mount | wc -l
3597
root@k-0-minion-c3s3:/home/mikedanese# rkt run   --volume host-var,kind=host,source=/var   --mount volume=host-var,target=/host_var   --insecure-options=image   --stage1-name=coreos.com/rkt/stage1-fly:1.4.0 docker://busybox --exec=/bin/true 2>/dev/null; mount | wc -l
7181
root@k-0-minion-c3s3:/home/mikedanese# rkt run   --volume host-var,kind=host,source=/var   --mount volume=host-var,target=/host_var   --insecure-options=image   --stage1-name=coreos.com/rkt/stage1-fly:1.4.0 docker://busybox --exec=/bin/true 2>/dev/null; mount | wc -l
14349

output of mounts here: https://clbin.com/7Z1UA

mikedanese commented Apr 29, 2016

after 0 executions: https://clbin.com/A4N4y

after 1 execution: https://clbin.com/u0aOA

after 2 executions: https://clbin.com/fC0OY

mikedanese commented Apr 29, 2016

mikedanese@k-0-master:~$ uname -a
Linux k-0-master 4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

mikedanese commented Apr 29, 2016

@pquerna thanks for the workarounds.

Contributor

steveeJ commented Apr 30, 2016

@mikedanese could you also provide the contents of /proc/self/mountinfo after 0, 1, and 2 executions?

mikedanese commented May 2, 2016

/proc/self/mountinfo on the host:

0 invocations https://clbin.com/lmuYI

1 invocation https://clbin.com/TLZJO

2 invocations https://clbin.com/XcXN6

3 invocations https://clbin.com/bOdLH

alban added the component/fly label May 4, 2016

Member

alban commented May 4, 2016

I can reproduce the bug.

/sys is shared on all "fly" containers (in /var/lib/rkt):
https://github.com/coreos/rkt/blob/master/stage1_fly/run/main.go#L266

This is enough to make the number of mounts grow O(n^2). When starting only 2 fly containers, these 3 lines are enough to explain the problem:

cgroup on /var/lib/rkt/pods/run/1a.../stage1/rootfs/opt/stage2/busybox/rootfs/sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /var/lib/rkt/pods/run/67.../stage1/rootfs/opt/stage2/busybox/rootfs/sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /var/lib/rkt/pods/run/1a.../stage1/rootfs/opt/stage2/busybox/rootfs/host_var/lib/rkt/pods/run/67.../stage1/rootfs/opt/stage2/busybox/rootfs/sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
  • Line 1: the cgroup is mounted in the first fly container (i.e. in /var)
  • Line 2: the cgroup is mounted in the second fly container (i.e. in /var)
  • Line 3: since /var is mounted with MS_SHARED, the cgroup mount from line 2 is propagated to the first fly container.

alban added this to the v1.7.0 milestone May 4, 2016

alban added the kind/bug label May 4, 2016

Member

alban commented May 4, 2016

I don't know how we could fix this. At a minimum, we should recommend against making a volume of any host directory that is a parent directory of /var/lib/rkt.

Member

alban commented May 4, 2016

@steveeJ Do you have /var/lib/rkt mounted separately? That would explain why you don't reproduce the bug.

Member

iaguis commented May 4, 2016

> @steveeJ Do you have /var/lib/rkt mounted separately? That would explain why you don't reproduce the bug.

I think he was using a tmpfs on /var/lib/rkt to work around #1498.

Member

alban commented May 25, 2016

No progress on this. Moving milestone.

lucab modified the milestones: v1.9.0, v1.8.0 Jun 9, 2016

Contributor

steveeJ commented Jun 14, 2016

@alban do you think you could write a test for this to demonstrate the issue?

tmrts modified the milestones: v1.10.0, v1.9.0 Jun 23, 2016

Contributor

tmrts commented Jun 23, 2016

@steveeJ @alban, I missed your message; assigning it to him for the next release.

alban added a commit to kinvolk/rkt that referenced this issue Jul 5, 2016

tests: new test TestFlyVolumeVar
To reproduce the explosive number of bind mounts manually, you can run
the following command several times:

> rkt --insecure-options=image run \
>     --volume host-var,kind=host,source=/var \
>     --mount volume=host-var,target=/host_var \
>     image.aci

It uses the volume /var because rkt stores the pods in a subdirectory of
/var by default.

The new test TestFlyVolumeVar implemented in this patch reproduces the
issue automatically. Instead of using the volume /var, it uses the
volume ctx.DataDir() because the test framework stores the pods in that
directory.

Reproduces rkt#2553

Member

alban commented Jul 5, 2016

@steveeJ test implemented in #2871

alban modified the milestones: v1.11.0, v1.10.0 Jul 6, 2016

Member

iaguis commented Jul 20, 2016

@steveeJ could you reproduce this issue with @alban's "test"?

iaguis modified the milestones: v1+, v1.11.0 Jul 20, 2016

jonboulle assigned lucab and unassigned alban and steveeJ Oct 14, 2016
