
systemd-nspawn seems to ignore MemorySwapMax parameter #6074

Closed
vp1981 opened this issue Jun 2, 2017 · 5 comments


2 participants
@vp1981

commented Jun 2, 2017

Submission type

  • Bug report

systemd version the issue has been seen with

$ systemctl --version

systemd 233
+PAM -AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN default-hierarchy=hybrid

NOTE: Do not submit bug reports about anything but the two most recently released systemd versions upstream!

Used distribution

$ cat /etc/os-release

NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
ID_LIKE=archlinux
ANSI_COLOR="0;36"
HOME_URL="https://www.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"

Kernel:

$ uname -a

Linux smoon4.vl-lomov.ru 4.11.3-2-ck-sandybridge #1 SMP PREEMPT Fri May 26 15:51:22 EDT 2017 x86_64 GNU/Linux

(ck patches with stock kernel configuration).

The host has 8G RAM and 8G swap.

In case of bug report: Expected behaviour you didn't see

I created a systemd-nspawn container, node2-smoon4, and set resource limits on it:

[Service]
MemoryHigh=500M
MemoryMax=900M
MemorySwapMax=1M

in /etc/systemd/system/systemd-nspawn@node2-smoon4.service.d/memory.conf
and started the container by

$ sudo systemctl start systemd-nspawn@node2-smoon4
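As an aside, a drop-in like the memory.conf above can also be installed with systemctl edit, which creates the directory and reloads the daemon for you (a sketch; note that systemctl edit writes override.conf rather than memory.conf):

```shell
# Sketch: two equivalent ways to install the override for this unit.

# 1) Interactively; systemd creates the drop-in directory and reloads itself:
sudo systemctl edit systemd-nspawn@node2-smoon4

# 2) By hand, matching the memory.conf used in this report:
sudo mkdir -p /etc/systemd/system/systemd-nspawn@node2-smoon4.service.d
sudo tee /etc/systemd/system/systemd-nspawn@node2-smoon4.service.d/memory.conf <<'EOF'
[Service]
MemoryHigh=500M
MemoryMax=900M
MemorySwapMax=1M
EOF
sudo systemctl daemon-reload
```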

Then I ran a test program in the container:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>

#define SIZE 32000000
uint64_t arr1[SIZE];
uint64_t arr2[SIZE];
uint64_t arr3[SIZE];
uint64_t arr4[SIZE];
uint64_t arr5[SIZE];
uint64_t arr6[SIZE];
uint64_t arr7[SIZE];
uint64_t arr8[SIZE];

int main(void) {
  uint32_t i = 0;

  printf("Populating arrays...\n");
  sleep(10);
  for(i = 0; i < SIZE; i++)
  {
    arr1[i] = rand();
    arr2[i] = rand();
    arr3[i] = rand();
    arr4[i] = rand();
    arr5[i] = rand();
    arr6[i] = rand();
    arr7[i] = rand();
    arr8[i] = rand();
  }
  printf("  DONE\n");
  sleep(10);

  uint64_t res = 0;

  printf("Manipulating of arrays...\n");
  sleep(10);
  for(i = 0; i < SIZE; i++)
  {
    res += arr1[i] + arr2[i] + arr3[i] + arr4[i] + arr5[i]
         + arr6[i] + arr7[i] + arr8[i];
  }
  printf("  DONE\n");
  sleep(10);

  printf("Average: %f\n", (double)res/((double) SIZE*8));
  return 0;
}

I would expect the program to fail, because it requires more memory (about 2 GB) than the container is supposed to have.

In case of bug report: Unexpected behaviour you saw

But the program runs fine unless I turn off swap on the host. Moreover, it runs fine regardless of the MemorySwapMax= value; I tried 1, 10, and 10M. On the host I see that about 1 GB of swap is used, while the memory used by the program (according to top and systemd-cgtop) is about 900 MB.

Besides that, the value 0 is rejected as invalid, though I would expect it to mean that swap is not used at all.

In case of bug report: Steps to reproduce the problem

  1. Create a container; I used the steps described at https://wiki.archlinux.org/index.php/Systemd-nspawn;
  2. Stop it and create memory.conf (see above);
  3. Reload systemd daemon:
    $ sudo systemctl daemon-reload
  4. Start the container
    $ sudo systemctl start systemd-nspawn@container
    systemd assumes the container is located under /var/lib/machines, but beginning with version 233 it also allows symlinked containers; in my case node2-smoon4 is a symlink in /var/lib/machines to another local directory (say /mnt/storage/containers/node2-smoon4).
  5. Optionally check that it has limited resources:
    $ sudo systemctl status systemd-nspawn@container
    In my case I see line: Memory: ... (high: ... max: ...).
  6. Log in to the container, compile the program, and run it. Depending on whether there is enough swap space, the program will finish or fail.

P.S. It might be that I misunderstand systemd.resource-control(5): the MemoryHigh= entry says it "disables MemoryLimit=" (as does MemorySwapMax=), while MemoryMax= says it "replaces MemoryLimit=".

@evverx

Member

commented Jun 2, 2017

Thanks for the bug report.

It appears that you're not using the unified cgroup hierarchy. Could you attach the output of grep cgroup /proc/self/mountinfo? Also, could you run

systemd-analyze set-log-level debug
systemctl start <container-name>
journalctl --sync
journalctl -u <container-name>

? Do you see something like

cgroup-compat: Applying MemoryMax -1 as MemoryLimit

?

@vp1981

Author

commented Jun 2, 2017

Hello,

It appears that you're not using the unified cgroup hierarchy

Does this

grep cgroup /proc/self/mountinfo

24 17 0:21 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:4 - tmpfs tmpfs ro,mode=755                 
25 24 0:22 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:5 - cgroup2 cgroup rw      
26 24 0:23 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,xattr,name=systemd                                                                                           
28 24 0:25 / /sys/fs/cgroup/cpu rw,nosuid,nodev,noexec,relatime shared:7 - cgroup cgroup rw,cpu       
29 24 0:26 / /sys/fs/cgroup/net_cls rw,nosuid,nodev,noexec,relatime shared:8 - cgroup cgroup rw,net_cls                                                                                                      
30 24 0:27 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,blkio   
31 24 0:28 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,freezer                                                                                                     
32 24 0:29 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,perf_event                                                                                               
33 24 0:30 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,cpuset                                                                                                       
34 24 0:31 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,memory                                                                                                       
35 24 0:32 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,devices                                                                                                     
36 24 0:33 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,pids    

show that the host doesn't use the unified cgroup hierarchy?

Do you see something like

cgroup-compat: Applying MemoryMax -1 as MemoryLimit

No, this is copy-pasted from journalctl:

...
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: Trying to enqueue job systemd-nspawn@node2-smoon4.service/start/replace
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: Installed new job systemd-nspawn@node2-smoon4.service/start as 3935
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: Enqueued job systemd-nspawn@node2-smoon4.service/start as 3935
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: Failed to set cpu.shares: No such file or directory
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: Failed to set cpu.cfs_period_us: No such file or directory
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: Failed to set cpu.cfs_quota_us: No such file or directory
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: cgroup-compat: Applying MemoryMax 943718400 as MemoryLimit
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: Passing 0 fds to service
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: About to execute: /usr/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=try-guest --network-veth -U --setting
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: Forked /usr/bin/systemd-nspawn as 14188
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: Changed dead -> start
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: Starting Container node2-smoon4...
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[14188]: systemd-nspawn@node2-smoon4.service: Executing: /usr/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=try-guest --network-veth -U --settings=o
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: Got notification message from PID 14188 (STATUS=Container running., X_NSPAWN_LEADER_PID=14190)
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: Got notification message from PID 14188 (READY=1)
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: Changed start -> running
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: systemd-nspawn@node2-smoon4.service: Job systemd-nspawn@node2-smoon4.service/start finished, result=done
Jun 03 07:03:25 smoon4.vl-lomov.ru systemd[1]: Started Container node2-smoon4.
...

I see cgroup-compat: Applying MemoryMax 943718400 as MemoryLimit, so I assume that parameter works. But I don't see MemoryHigh=, though according to systemd.resource-control(5) it sets a "soft" limit, so it may not be reported as such. Also, there is no information about MemorySwapMax=.

Below is part of systemctl status systemd-nspawn@node2-smoon4:

● systemd-nspawn@node2-smoon4.service - Container node2-smoon4                                                                                                                                              
   Loaded: loaded (/usr/lib/systemd/system/systemd-nspawn@.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/systemd-nspawn@node2-smoon4.service.d
           └─memory.conf
   Active: active (running) since Sat 2017-06-03 07:03:25 +08; 5min ago
     Docs: man:systemd-nspawn(1)
 Main PID: 14188 (systemd-nspawn)
   Status: "Container running: Startup finished in 1.440s."
    Tasks: 8 (limit: 16384)
   Memory: 24.5M (high: 500.0M max: 900.0M swap max: 1.0M)
   CGroup: /machine.slice/systemd-nspawn@node2-smoon4.service
           ├─payload                                                                                                                                                                                        
           │ ├─init.scope                                                                                                                                                                                   
           │ │ └─14190 /usr/lib/systemd/systemd                                                                                                                                                             
           │ └─system.slice                                                                                                                                                                                 
           │   ├─console-getty.service                                                                                                                                                                      
           │   │ └─14232 /sbin/agetty --noclear --keep-baud console 115200,38400,9600 vt220                                                                                                                 
           │   ├─dbus.service                                                                                                                                                                               
           │   │ └─14221 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation                                                                                         
           │   ├─sshd.service                                                                                                                                                                               
           │   │ └─14231 /usr/bin/sshd -D                                                                                                                                                                   
           │   ├─systemd-journald.service                                                                                                                                                                   
           │   │ └─14209 /usr/lib/systemd/systemd-journald                                                                                                                                                  
           │   ├─systemd-logind.service                                                                                                                                                                     
           │   │ └─14220 /usr/lib/systemd/systemd-logind                                                                                                                                                    
           │   └─systemd-networkd.service                                                                                                                                                                   
           │     └─14222 /usr/lib/systemd/systemd-networkd                                                                                                                                                  
           └─supervisor                                                                                                                                                                                     
             └─14188 /usr/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=try-guest --network-veth -U --settings=override --machine=node2-smoon4   
...

P.S. Instead of systemctl start node2-smoon4 I ran systemctl start systemd-nspawn@node2-smoon4; I hope this doesn't make any difference.

@evverx

Member

commented Jun 3, 2017

Does this ... shows that that host don't use unified cgroup hierarchy?

This shows that the hybrid hierarchy is being used, so MemorySwapMax= and MemoryHigh= are not supported. systemd.unified_cgroup_hierarchy=yes should be passed to use the unified cgroup hierarchy. Could you try it?
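For anyone following along, a sketch of making that boot parameter persistent (assuming GRUB; adjust for your bootloader) and of confirming which hierarchy is active afterwards:

```shell
# Assumption: GRUB is the bootloader. Add the parameter to the kernel
# command line in /etc/default/grub, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=yes"
sudo grub-mkconfig -o /boot/grub/grub.cfg
sudo reboot

# After the reboot, /sys/fs/cgroup should be a single cgroup2 mount:
stat -fc %T /sys/fs/cgroup    # "cgroup2fs" on the unified hierarchy
```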

By the way, there was an attempt to add MemorySwapLimit= for the v1 memory controller, but it wasn't merged (#2171).

@vp1981

Author

commented Jun 3, 2017

Thank you, that works perfectly. I added systemd.unified_cgroup_hierarchy=yes to the kernel command line and rebooted. Now grep cgroup /proc/self/mountinfo shows only one line, and the container respects the memory settings.

Actually, I was a bit surprised that it now works differently: I used the same settings in memory.conf and ran the test program; its memory usage increased up to 500M and then it was killed. So it seems the container obeys MemoryHigh= as a "hard" memory limit. From earlier tests I had assumed that MemoryHigh= is in some sense a "soft" limit and that systemd would allow the container to increase memory usage until it reached MemoryMax=. Now I'm not sure about the logic and the use of both settings. I was confused by this sentence (MemoryHigh= in systemd.resource-control(5)):

Memory usage may go above the limit if unavoidable, but the processes are heavily slowed down and memory is taken away aggressively in such cases. This is the main mechanism to control memory usage of a unit.

After the tests I thought that "may go above the limit" meant usage would climb up to the MemoryMax= limit, which the manual describes as follows:

Specify the absolute limit on memory usage of the executed processes in this unit. If memory usage cannot be contained under the limit, out-of-memory killer is invoked inside the unit. It is recommended to use MemoryHigh= as the main control mechanism and use MemoryMax= as the last line of defense.

Now the test shows me that I was wrong.

Anyway, since it works as I want, I'm happy. As I understand it, the new unified cgroup hierarchy will become the default at some point, is that right?
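For reference, on the unified hierarchy these unit settings map to cgroup v2 attribute files that can be inspected directly (the path below is illustrative for the unit in this thread):

```shell
d=/sys/fs/cgroup/machine.slice/systemd-nspawn@node2-smoon4.service
cat "$d/memory.high"      # MemoryHigh=     reclaim/throttle threshold ("soft")
cat "$d/memory.max"       # MemoryMax=      hard limit; OOM killer inside the unit
cat "$d/memory.swap.max"  # MemorySwapMax=  swap usage limit
```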

@vp1981

Author

commented Jun 5, 2017

Concerning my interest in the difference between these types of settings, I guess I can find the answers in cgroup-v2.txt. Thanks again for your help.
