systemd-oomd: kills more processes than expected (and reported) #32304

Open
tchernomax opened this issue Apr 16, 2024 · 0 comments
Labels
bug 🐛 Programming errors, that need preferential fixing oomd

tchernomax commented Apr 16, 2024

systemd version the issue has been seen with

255

Used distribution

Archlinux

Linux kernel version used

6.8.5-arch1-1

CPU architectures issue was seen on

x86_64

Component

systemd-oomd

Expected behaviour you didn't see

avril 16 21:08:50 systemd-oomd[93234]: Considered 97 cgroups for killing, top candidates were:
avril 16 21:08:50 systemd-oomd[93234]:         Path: /user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome-slack-96235.scope
avril 16 21:08:50 systemd-oomd[93234]:                 Memory Pressure Limit: 0.00%
avril 16 21:08:50 systemd-oomd[93234]:                 Pressure: Avg10: 1.77 Avg60: 0.69 Avg300: 0.32 Total: 1s
avril 16 21:08:50 systemd-oomd[93234]:                 Current Memory Usage: 388.4M
avril 16 21:08:50 systemd-oomd[93234]:                 Memory Min: 0B
avril 16 21:08:50 systemd-oomd[93234]:                 Memory Low: 0B
avril 16 21:08:50 systemd-oomd[93234]:                 Pgscan: 3033791
avril 16 21:08:50 systemd-oomd[93234]:                 Last Pgscan: 2934276
avril 16 21:08:50 systemd-oomd[93234]:         Path: /user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome-firefox\x2dw-96379.scope
…
avril 16 21:08:50 systemd-oomd[93234]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/tmux-spawn-37cd26f4-2381-4d42-89b4-858274ab358c.scope due to memory pressure for /user.slice/user-1000.slice/user@1000.service being 11.05% > 10.00% for > 5s with reclaim activity
avril 16 21:08:50 systemd[1562]: tmux-spawn-37cd26f4-2381-4d42-89b4-858274ab358c.scope: systemd-oomd killed 3 process(es) in this unit.
avril 16 21:08:50 systemd[1562]: tmux-spawn-37cd26f4-2381-4d42-89b4-858274ab358c.scope: Failed with result 'oom-kill'.
avril 16 21:08:50 systemd[1562]: tmux-spawn-37cd26f4-2381-4d42-89b4-858274ab358c.scope: Consumed 3.880s CPU time.

That is, systemd-oomd kills only tmux-spawn-37cd26f4-2381-4d42-89b4-858274ab358c.scope.

Unexpected behaviour you saw

Exactly the same output as in "Expected behaviour you didn't see", but with these additional lines:

avril 16 21:08:50 systemd[1562]: amazon-music.service: systemd-oomd killed some process(es) in this unit.
avril 16 21:08:50 systemd[1562]: app-gnome-slack-96235.scope: systemd-oomd killed some process(es) in this unit.
avril 16 21:08:50 systemd[1562]: teams.service: systemd-oomd killed some process(es) in this unit.
avril 16 21:08:50 systemd[1562]: app-gnome-firefox\x2dw-96379.scope: systemd-oomd killed some process(es) in this unit.

→ systemd-oomd killed processes in 4 more cgroups than it reported/logged.
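
To make the mismatch easy to check on other machines, here is a minimal Python sketch that compares the cgroup named in systemd-oomd's own "Killed ..." line with the units for which the service manager logged "systemd-oomd killed ... process(es) in this unit". The function name `compare_kills` and the two regular expressions are my own assumptions, written against the log lines quoted above. They are not part of systemd and may need adjusting for other log formats.

```python
import re

# Line logged by systemd-oomd itself for the cgroup it selected (format taken from the log above).
OOMD_KILLED = re.compile(r"systemd-oomd\[\d+\]: Killed (\S+) due to")
# Per-unit notice logged by the service manager (format taken from the log above).
UNIT_NOTICE = re.compile(
    r"systemd\[\d+\]: (\S+): systemd-oomd killed \S+ process\(es\) in this unit"
)

def compare_kills(journal_text):
    """Return (reported, unreported) sets of unit names.

    reported:   last path component of every cgroup systemd-oomd said it killed
    unreported: units that got a kill notice but were never named by systemd-oomd
    """
    reported = set()
    affected = set()
    for line in journal_text.splitlines():
        # Both patterns are tried on every line, so two fused entries still parse.
        m = OOMD_KILLED.search(line)
        if m:
            reported.add(m.group(1).rsplit("/", 1)[-1])
        m = UNIT_NOTICE.search(line)
        if m:
            affected.add(m.group(1))
    return reported, affected - reported
```

Feeding it the journal excerpt for the relevant timestamp (for example, text saved from `journalctl`) should leave the second set empty if systemd-oomd only touched the cgroup it reported.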

Steps to reproduce the problem

Output of oomctl:

Dry Run: no
Swap Used Limit: 98.00%
Default Memory Pressure Limit: 60.00%
Default Memory Pressure Duration: 5s
System Context:
        Memory: Used: 0B Total: 0B
        Swap: Used: 0B Total: 0B
Swap Monitored CGroups:
Memory Pressure Monitored CGroups:
        Path: /system.slice
                Memory Pressure Limit: 80.00%
                Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 7s
                Current Memory Usage: 625.9M
                Memory Min: 0B
                Memory Low: 0B
                Pgscan: 18254138
                Last Pgscan: 18254138
        Path: /user.slice/user-1000.slice/user@1000.service
                Memory Pressure Limit: 10.00%
                Pressure: Avg10: 0.18 Avg60: 0.03 Avg300: 0.00 Total: 18s
                Current Memory Usage: 8.4G
                Memory Min: 0B
                Memory Low: 0B
                Pgscan: 26770970
                Last Pgscan: 26770970

Then run stress --vm 2 --vm-bytes 1G --vm-keep --vm-hang 10 to build memory pressure.
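
While stress is running, the pressure on the monitored cgroup can be watched directly. The sketch below parses the kernel's PSI text format as found in a cgroup v2 `memory.pressure` file. The helper name `parse_pressure` is hypothetical, and which of the two lines ("some" or "full") systemd-oomd compares against the limit is not confirmed anywhere in this report, so treat that part as an open assumption.

```python
def parse_pressure(text):
    """Parse PSI text such as
        some avg10=0.18 avg60=0.03 avg300=0.00 total=18000000
        full avg10=0.00 avg60=0.00 avg300=0.00 total=7000000
    into {"some": {"avg10": 0.18, ...}, "full": {...}}.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        values = {}
        for pair in pairs:
            key, _, raw = pair.partition("=")
            # "total" is a microsecond counter; the averages are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result

def read_cgroup_pressure(cgroup_path):
    """Read memory.pressure for a cgroup path given relative to /sys/fs/cgroup."""
    with open("/sys/fs/cgroup/" + cgroup_path.strip("/") + "/memory.pressure") as f:
        return parse_pressure(f.read())
```

For the setup in this report, the path to pass would be `user.slice/user-1000.slice/user@1000.service`.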

Additional program output to the terminal or log subsystem illustrating the issue

I ran strace on systemd-oomd and saw it killing the processes in all of those cgroups, e.g.:

openat(AT_FDCWD, "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/teams.service/cgroup.procs", O_RDONLY|O_CLOEXEC) = 11
fstat(11, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(11, "95157\n95167\n95169\n95173\n95174\n95189\n95251\n95253\n95263\n95282\n95496\n95508\n", 4096) = 72
pidfd_open(95157, 0)                    = 12
pidfd_send_signal(12, SIGKILL, NULL, 0) = 0
close(12)                               = 0
pidfd_open(95167, 0)                    = 12
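
To summarize a longer trace, a small Python sketch like the following can group the SIGKILLs by cgroup. It assumes, as in the excerpt above, that each `pidfd_send_signal` belongs to the most recently opened `cgroup.procs` file. That attribution is a heuristic of mine, and `kills_by_cgroup` is a hypothetical helper, not something strace or systemd provides.

```python
import re
from collections import defaultdict

OPENAT = re.compile(r'openat\(AT_FDCWD, "(/sys/fs/cgroup/[^"]*)/cgroup\.procs"')
PIDFD_OPEN = re.compile(r"pidfd_open\((\d+), \d+\)\s*=\s*(\d+)")
SEND_KILL = re.compile(r"pidfd_send_signal\((\d+), SIGKILL")

def kills_by_cgroup(strace_text):
    """Map each cgroup path to the PIDs that were sent SIGKILL after its
    cgroup.procs file was opened (heuristic attribution, see above)."""
    current = None          # cgroup whose cgroup.procs was opened last
    fd_to_pid = {}          # pidfd number -> PID, as returned by pidfd_open
    kills = defaultdict(list)
    for line in strace_text.splitlines():
        m = OPENAT.search(line)
        if m:
            current = m.group(1)
            continue
        m = PIDFD_OPEN.search(line)
        if m:
            fd_to_pid[int(m.group(2))] = int(m.group(1))
            continue
        m = SEND_KILL.search(line)
        if m and current is not None:
            pid = fd_to_pid.get(int(m.group(1)))
            if pid is not None:
                kills[current].append(pid)
    return dict(kills)
```

Running it over the full strace output would give one entry per cgroup that actually received SIGKILLs, which can then be compared with the single cgroup named in the systemd-oomd log.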

From the manual:
https://www.freedesktop.org/software/systemd/man/latest/systemd.resource-control.html#ManagedOOMSwap=auto%7Ckill

If the cgroup passes the limits set by oomd.conf(5) or the unit configuration, systemd-oomd will select a descendant cgroup and send SIGKILL to all of the processes under it.
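
For reference, this is how I read "all of the processes under it": every PID listed in the `cgroup.procs` files of the selected cgroup and its sub-cgroups. The read-only Python sketch below lists those PIDs for one cgroup directory. It is only an illustration of my reading of the manual, not systemd-oomd's implementation, and `pids_under` is a hypothetical helper name.

```python
import os

def pids_under(cgroup_dir):
    """Collect the PIDs found in every cgroup.procs file at or below cgroup_dir.

    Nothing is signalled; this only reads the files.
    """
    pids = []
    for root, _dirs, files in os.walk(cgroup_dir):
        if "cgroup.procs" in files:
            with open(os.path.join(root, "cgroup.procs")) as f:
                pids.extend(int(tok) for tok in f.read().split())
    return sorted(pids)
```

Under that reading, a single kill decision should only affect the PIDs returned for the one selected descendant, not those of its sibling cgroups.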

There is nothing mentioning that it will kill 5 cgroups at the same time.

Is there something I'm missing?

@tchernomax tchernomax added the bug 🐛 Programming errors, that need preferential fixing label Apr 16, 2024
@github-actions github-actions bot added the oomd label Apr 16, 2024