resource-control doesn't work for user instance #3744

Closed
tchernomax opened this Issue Jul 17, 2016 · 4 comments

tchernomax commented Jul 17, 2016

Submission type

  • Bug report

systemd version the issue has been seen with

systemd 230
+PAM -AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN

Used distribution

Arch Linux (updated 17/07/2016)

Expected behaviour you didn't see

$ systemd-run --user -p CPUQuota=20% stress -c 1
$ top
…
…  20  …  stress 
…

Unexpected behaviour you saw

$ systemd-run --user -p CPUQuota=10% stress -c 1
$ top
…
…  100  …  stress 
…

Steps to reproduce the problem

With cgroup v1 (not unified hierarchy).

$ systemd-run --user -p CPUQuota=10% stress -c 1

Remarks

I don't know if this is expected behaviour or not.
On the one hand, I read that cgroups can't be created by an unprivileged process (the same test without the --user option works fine).
On the other hand, I found bug #3500, which makes me think it is possible, at least on the unified hierarchy. I don't use the unified hierarchy:

$ cat /proc/cmdline
root=LABEL=ROOTPART rootfstype=btrfs rootflags=subvol=rootfs rw add_efi_memmap initrd=EFI/arch/initramfs-linux.img

If it's not possible, I think it should be mentioned in the systemd.resource-control man page.

Do you need more information?
Thank you.

poettering commented Jul 19, 2016

On cgroupv1, delegation of cgroup controllers to unprivileged processes is not safe, hence we don't do it. This is fixed for cgroupv2.

But yeah, I figure we should document that.

LukeShu commented Sep 13, 2017

So am I understanding correctly that with cgroup v2 / unified hierarchy, the stress process should be limited to 20% CPU?

Because that's not what I'm seeing (systemd 234 / Linux 4.11.9 / Parabola (like Arch)). Should I open this as a regression of this bug?

tchernomax commented Sep 14, 2017

@LukeShu Currently you can't control CPU with cgroup v2:

https://www.kernel.org/doc/Documentation/cgroup-v2.txt

The interface for the cpu controller hasn't been merged yet

LukeShu commented Sep 15, 2017

@tchernomax Wow, I'm embarrassed.

However, I still think this has regressed. But let's test it using a v2 controller that actually exists: memory.

  1. Disable any swap space: it doesn't count toward MemoryMax, and MemorySwapMax can't be set with bus_append_unit_property_assignment(), so disabling swap makes this easier to monitor.
  2. Let ./malloc be a program that keeps malloc(3)ing more memory, 1M at a time, and prints to stderr how much it has. Also have it write to the memory, so that the kernel has to actually assign it pages (a minimal sketch of such a program follows this list).
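
For reference, here is a minimal sketch of such a program, assuming a plain C implementation. The 1 MiB step, the write-to-memory requirement and the stderr reporting come from the description above; the file name malloc.c and the exact message format are made up for illustration:

/* malloc.c -- illustrative sketch of the ./malloc test program described above */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const size_t chunk = 1024 * 1024;   /* grow by 1 MiB per iteration */
    size_t total = 0;

    for (;;) {
        char *p = malloc(chunk);
        if (!p) {
            fprintf(stderr, "malloc failed after %zu B\n", total);
            return 1;
        }
        /* touch every byte so the kernel really has to back the allocation */
        memset(p, 0xff, chunk);
        total += chunk;
        fprintf(stderr, "have %15zu B\n", total);
    }
}

Compile it with something like `cc -O0 -o malloc malloc.c`; -O0 is just to make sure the compiler doesn't optimize away the memset of an otherwise-unused buffer.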

First, let's verify that it works for the system instance:

  1. In one terminal:
$ journalctl --no-hostname --follow --unit=malloc.service
-- Logs begin at Fri 2015-05-08 22:24:57 EDT. --
  2. In a second terminal:
$ sudo  systemd-run --unit=malloc.service --property=MemoryMax=100M ./malloc
Running as unit: malloc.service
$ 
  3. In the first window, you'll see:
...
Sep 14 22:45:29 malloc[31295]: have        103,809,024 B
Sep 14 22:45:29 malloc[31295]: have        104,857,600 B
Sep 14 22:45:29 systemd[1]: malloc.service: Main process exited, code=killed, status=9/KILL
Sep 14 22:45:29 systemd[1]: malloc.service: Unit entered failed state.
Sep 14 22:45:29 systemd[1]: malloc.service: Failed with result 'signal'.

We can see that it got killed once it tried to allocate anything more than 100M = 100*1024*1024 = 104,857,600 B. I'm kind of disappointed it got SIGKILLed instead of having malloc fail, but that's neither here nor there.

Now, let's try the same thing, but with the user instance:

$ journalctl --no-hostname --follow --user-unit=malloc.service
-- Logs begin at Fri 2015-05-08 22:24:57 EDT. --

For fun, go ahead and run htop in another window. And, finally:

$ systemd-run --user --unit=malloc.service --property=MemoryMax=100M ./malloc
Running as unit: malloc.service
$ 

This time, you see it use up the entire system memory before the OOM killer fixes that:

...
Sep 14 23:18:18 malloc[24274]: have      2,343,567,360 B
Sep 14 23:18:18 malloc[24274]: have      2,344,615,936 B
Sep 14 23:18:18 malloc[24274]: have      2,345,664,512 B
Sep 14 23:18:18 malloc[24274]: have      2,346,713,088 B
Sep 14 23:18:31 systemd[388]: malloc.service: Main process exited, code=killed, status=9/KILL
Sep 14 23:18:31 systemd[388]: malloc.service: Unit entered failed state.
Sep 14 23:18:31 systemd[388]: malloc.service: Failed with result 'signal'.

(The journal actually stopped updating during the low-memory situation; htop tells me that it did fill the entire ~3.6G I had free, not just the 2.3G that it logged to the journal.)

Which makes sense, because the io, memory and pids controllers are available in the user manager's cgroup, but none of them is enabled for its children (cgroup.subtree_control is empty), so there is nothing enforcing MemoryMax on the user unit:

$ cat /sys/fs/cgroup/user.slice/user-$UID.slice/user@$UID.service/cgroup.controllers 
io memory pids
$ cat /sys/fs/cgroup/user.slice/user-$UID.slice/user@$UID.service/cgroup.subtree_control 
$ 
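
To make that check easy to repeat, here is a small, purely illustrative C helper (not part of the original report) that looks up whether a given controller, "memory" by default, appears in the cgroup.subtree_control file of the user manager's cgroup queried above:

/* delegated.c -- illustrative helper: is a cgroup v2 controller enabled for
 * the children of the current user's systemd user manager cgroup? */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int file_lists_controller(const char *path, const char *ctrl)
{
    char word[64];
    FILE *f = fopen(path, "r");
    int found = 0;

    if (!f) {
        perror(path);
        return -1;
    }
    /* the file is a single space-separated list of controller names */
    while (fscanf(f, "%63s", word) == 1)
        if (strcmp(word, ctrl) == 0)
            found = 1;
    fclose(f);
    return found;
}

int main(int argc, char **argv)
{
    const char *ctrl = argc > 1 ? argv[1] : "memory";
    char path[256];
    unsigned long uid = (unsigned long) getuid();

    snprintf(path, sizeof path,
             "/sys/fs/cgroup/user.slice/user-%lu.slice/user@%lu.service/cgroup.subtree_control",
             uid, uid);

    switch (file_lists_controller(path, ctrl)) {
    case 1:
        printf("'%s' is delegated below %s\n", ctrl, path);
        return 0;
    case 0:
        printf("'%s' is NOT delegated below %s\n", ctrl, path);
        return 1;
    default:
        return 2;
    }
}

With the output shown above (an empty cgroup.subtree_control), it would report that the memory controller is not delegated, which is consistent with MemoryMax having no effect on the user unit.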