
kvm: too low thread limit #3382

Closed
lucab opened this issue Nov 16, 2016 · 7 comments

Comments

@lucab
Member

lucab commented Nov 16, 2016

I haven't fully debugged where the value is coming from, but (as of rkt 1.19.0) the current thread/proc limit on kvm is too low:

$ sudo ./rkt run --insecure-options=all  docker://debian --exec /bin/bash -- -c "cat /proc/sys/kernel/threads-max"
[    0.784177] debian[135]: 1926

This results in a very low thread/proc limit on systemd-pid1 itself:

$ sudo ./rkt run --insecure-options=all  docker://debian --exec /bin/bash -- -c "cat /proc/1/limits | grep proc"
[    0.960046] debian[135]: Max processes             963                  963                  processes

This in turn results in occasional fork/clone failures, e.g. during a daemon-reload. This was spotted while writing the CRI tests at #3371.
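For anyone trying to reproduce the failure mode, here is a minimal sketch (illustrative only, not the actual CRI test) that forks children inside the pod until clone starts failing with EAGAIN against the low nproc limit:

    // repro.go - keep forking children until the nproc limit is hit.
    // Run inside the pod (e.g. via --exec); /bin/sleep is just a placeholder.
    package main

    import (
        "fmt"
        "os/exec"
    )

    func main() {
        var kids []*exec.Cmd
        for i := 0; ; i++ {
            cmd := exec.Command("/bin/sleep", "30")
            if err := cmd.Start(); err != nil {
                // With threads-max at 1926 this triggers after roughly 960 children.
                fmt.Printf("fork/clone started failing after %d children: %v\n", i, err)
                break
            }
            kids = append(kids, cmd)
        }
        for _, c := range kids {
            c.Wait() // reap whatever we managed to start
        }
    }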

@lucab
Member Author

lucab commented Nov 16, 2016

/cc @jjlakis @lukasredynk @grahamwhaley @sameo

I'm currently disabling the CRI smoke test on kvm. The threads-max value should be correlated with the RAM value, but changing the lkvm invocation to add some memory doesn't seem to influence it here.

@lucab
Member Author

lucab commented Nov 22, 2016

For reference, this affects both lkvm and qemu. I also have a strong feeling that the non-deterministic "stop" test failures in #3091 are due to this.

@grahamwhaley

Hi @lucab (I'm now back at my desk...) Thanks for the info - I'll go track down where those values come from and report back here.

@lucab
Member Author

lucab commented Nov 22, 2016

Thanks @grahamwhaley. I did some code reading the other day, here are the references I collected while at it:

However, I'm missing how we end up with such a low value, when it looks like it should default to a much higher MAX_THREADS on boot.
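My best guess so far (from skimming kernel/fork.c, so treat this as an assumption rather than a verified answer): MAX_THREADS is only an upper cap, and at boot the kernel sizes max_threads from the available RAM so that thread structures can take at most 1/8th of memory, which works out to roughly one thread per 128 KB; pid1's nproc limit is then initialized to max_threads/2. A rough sketch of that arithmetic:

    // Back-of-the-envelope version of the kernel's boot-time sizing
    // (my reading of kernel/fork.c, not authoritative).
    package main

    import "fmt"

    func main() {
        const threadSize = 16 << 10   // assumed THREAD_SIZE on x86_64 (16 KiB)
        ramBytes := uint64(246) << 20 // ~246 MiB usable, close to the default kvm pod

        maxThreads := ramBytes / (8 * threadSize) // thread structs capped at ~1/8th of RAM
        pid1Nproc := maxThreads / 2               // fork_init halves it for init's RLIMIT_NPROC

        fmt.Println("threads-max ≈", maxThreads) // ≈ 1968, close to the observed 1926
        fmt.Println("pid1 nproc  ≈", pid1Nproc)  // ≈ 984, close to the observed 963
    }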

@grahamwhaley

Digging into this some more, changing the RAM values in stage1/init/kvm/resources.go does affect the value you get at runtime - but - due to the, err, quirks of the make system, if you just make a modification and rebuild, those changes do not end up in the relevant ACI afaict :-(

Just for example:

Clean build, then in the build dir run md5sum `find . -type f -name init`:

f8b18db2c6c618ee7d59be39b5006016  ./target/tools/init
f8b18db2c6c618ee7d59be39b5006016  ./aci-for-kvm-flavor/rootfs/init
f8b18db2c6c618ee7d59be39b5006016  ./aci-for-kvm-flavor/kvm-lkvm/rootfs/init

Then touch stage1/init/kvm/resources.go and make again:

5bc5cb96da60cf9fa1d23c29bd18f2d8  ./target/tools/init
5bc5cb96da60cf9fa1d23c29bd18f2d8  ./aci-for-kvm-flavor/rootfs/init
f8b18db2c6c618ee7d59be39b5006016  ./aci-for-kvm-flavor/kvm-lkvm/rootfs/init

and we can see the kvm-lkvm init does not get updated (I was going to check inside the ACI itself, but the symptoms here match what I was seeing).

Anyway, it does look like there is a direct ratio of RAM size to threads-max - the ratio always coming out around '128', even on my host machine.
Thus, I bumped the resources.go settings to 512 (from 128), and free now shows 1 GByte and threads-max is 7965 (under lkvm).

This leaves us with a couple of things then:

  1. Is there an easy way to fix the build system? :-)
  2. Contemplate the current RAM settings - do we just bump them up for now, or is there a better longer-term solution?

For (2), I'll go have a conversation and contemplate with the other Clear Containers architects to start with.

@grahamwhaley

Just to add more info, extending the memory on the command line also works for me. This:

sudo ./rkt --debug --stage1-path=./stage1-kvm-lkvm.aci run --insecure-options=all docker://debian --memory=1G --exec /bin/bash --interactive

Gives me (still with the 512 in resources.go) ~1.4G in free and 11417 in threads-max, for both lkvm and qemu runs.
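Assuming that ratio of ~128 really means roughly 128 KB of RAM per allowed thread (my assumption from the numbers, I haven't chased it through the kernel), the two measurements line up reasonably well:

    // Quick arithmetic cross-check of the values quoted above.
    package main

    import "fmt"

    func expectedThreadsMax(ramMiB int) int {
        return ramMiB * 1024 / 128 // one thread per ~128 KiB of RAM
    }

    func main() {
        fmt.Println(expectedThreadsMax(1024)) // 8192  vs the observed 7965 (assuming the 512+512 pod got ~1 GiB)
        fmt.Println(expectedThreadsMax(1433)) // 11464 vs the observed 11417 (~1.4 GiB from the --memory=1G run)
    }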

@lucab
Member Author

lucab commented Nov 25, 2016

> changing the RAM values does affect the value you get at runtime - but - due to the, err, quirks of the make system

Ah, I overlooked this. I'd guess that some stamp files are not properly set up for the kvm init.

> Anyway, it does look like there is a direct ratio of RAM size to threads-max - the ratio always coming out around '128', even on my host machine.

With the stop and CRI tests we then hit pathological cases: the pod starts with either one app or none, hence 128(+128) MB of RAM, hence a very low threads-max setting and an even lower nproc limit on pid1.

> To contemplate the current RAM settings - do we just bump them up for now, and is there a longer term better solution.

For the short term I think we should bump systemMemOverhead and ensure that VMs are always spawned with systemMem+<n>*appMem memory (where n may be 1 or 2); a rough sketch of that sizing follows after the list below. For the longer term, it would be nice to:

  • double-check those default memory values (I don't see any reference regarding how they were initially estimated)
  • at the kernel level, check whether threads-max can be decoupled from the RAM value (if it makes sense, perhaps via a boot cmdline option), or at least not halved for the pid1 limit
  • teach systemd to re-raise its soft/hard nproc limit (I didn't check in depth, but it should be possible at least for pid1). It may be useful anyway for memory hotplugging.
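To make the short-term part concrete, here is a rough sketch of the sizing I have in mind; the constant and function names are illustrative, not the actual code in stage1/init/kvm/resources.go:

    // Hypothetical sizing helper - names and values are illustrative only.
    package kvm

    const (
        systemMemOverheadMB = 512 // memory reserved for stage1/systemd, bumped from 128
        defaultAppMemMB     = 128 // memory granted per app with no explicit request
    )

    // podMemoryMB returns the memory to hand to lkvm/qemu. It always accounts
    // for at least one app slot, so that zero-app pods (the CRI case) still end
    // up with a sane threads-max and pid1 nproc limit.
    func podMemoryMB(numApps int) int {
        if numApps < 1 {
            numApps = 1
        }
        return systemMemOverheadMB + numApps*defaultAppMemMB
    }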
