Corrective actions doesn't taken #61

Closed
hakavlad opened this issue Apr 14, 2019 · 24 comments

@hakavlad hakavlad commented Apr 14, 2019

I ran oomd with desktop.json, but under low memory oomd took no corrective action (memhog was killed by the kernel OOM killer instead).

Is this expected?

Member

@danobi danobi commented Apr 17, 2019

It looks like oomd isn't picking up any cgroups to monitor. Are you running systemd? If so, what does your cgroup hierarchy look like?

Author

@hakavlad hakavlad commented Apr 17, 2019

Are you running systemd?

Yes. It's Fedora 29 with default cgroups settings.

Author

@hakavlad hakavlad commented Apr 17, 2019

cgroup hierarchy look like

https://pastebin.com/WiJndMfa

Member

@danobi danobi commented Apr 17, 2019

It looks like you're on cgroup1. You'll need to be on cgroup2 for oomd to work.

Author

@hakavlad hakavlad commented Apr 18, 2019

OK, thanks. cgroup2 is mounted at /sys/fs/cgroup/unified on my distro (hierarchy: https://pastebin.com/ffeyhGF6). I ran oomd as:

# oomd_bin -f /sys/fs/cgroup/unified

The next problem is output like the following:

WARNING: cgroup memory controller not enabled on /sys/fs/cgroup/unified/...

I tried to enable memory controller (as recommended in https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git/tree/Documentation/admin-guide/cgroup-v2.rst):

# echo "+memory" > /sys/fs/cgroup/unified/cgroup.subtree_control

and got the following output:

-bash: echo: write error: No such file or directory

What am I doing wrong? How can I fix it and enable memory controller?

Author

@hakavlad hakavlad commented Apr 18, 2019

Maybe the README should state this:

Requirements for oomd to work:

  • cgroup2 must be enabled, and the path to the cgroup2 directory must be correctly specified via the -f option
  • the cgroup memory controller must be enabled (with instructions on how to enable it)

Author

@hakavlad hakavlad commented Apr 18, 2019

What am I doing wrong? How can I fix it and enable memory controller?

I added cgroup_no_v1=memory in boot cmdline to fix the problem:

A cgroup v2 controller is available only if it is not currently in
use via a mount against a cgroup v1 hierarchy. Or, to put things
another way, it is not possible to employ the same controller against
both a v1 hierarchy and the unified v2 hierarchy. This means that it
may be necessary first to unmount a v1 controller (as described
above) before that controller is available in v2. Since systemd(1)
makes heavy use of some v1 controllers by default, it can in some
cases be simpler to boot the system with selected v1 controllers
disabled. To do this, specify the cgroup_no_v1=list option on the
kernel boot command line; list is a comma-separated list of the names
of the controllers to disable, or the word all to disable all v1
controllers.

-- http://man7.org/linux/man-pages/man7/cgroups.7.html
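
The man page excerpt suggests a quick check before writing to cgroup.subtree_control: a controller can only be enabled there if it is listed in the same directory's cgroup.controllers file, and a controller still bound to a v1 hierarchy will not be listed. A minimal sketch, assuming a POSIX shell; controller_available is a hypothetical helper name, not part of oomd or systemd:

```shell
# controller_available DIR CONTROLLER   (hypothetical helper)
# Returns 0 if CONTROLLER is listed in DIR/cgroup.controllers,
# 1 if it is not listed, 2 if the file cannot be read at all.
controller_available() {
    [ -r "$1/cgroup.controllers" ] || return 2
    # cgroup.controllers holds a space-separated list of controller names.
    for c in $(cat "$1/cgroup.controllers"); do
        [ "$c" = "$2" ] && return 0
    done
    return 1
}
```

For example, `controller_available /sys/fs/cgroup/unified memory || echo "memory is not available on the v2 hierarchy"` would probably have reported the problem here before the write to cgroup.subtree_control failed.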

@bulbigood bulbigood commented Apr 19, 2019

What am I doing wrong? How can I fix it and enable memory controller?

I added cgroup_no_v1=memory in boot cmdline to fix the problem:

How did you make the memory controller stay on reboot? I did everything you wrote, but cgroup.subtree_control is reset.

Author

@hakavlad hakavlad commented Apr 19, 2019

How did you make the memory controller stay on reboot?

I didn't make it stay on reboot. You can edit /etc/default/grub, add cgroup_no_v1=memory to the GRUB_CMDLINE_LINUX_DEFAULT line, and run grub2-mkconfig -o /boot/grub2/grub.cfg if you are on Fedora.
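
The edit to /etc/default/grub described here could be scripted roughly as follows. This is only a sketch: add_kernel_param is a hypothetical helper, it assumes GNU sed (for -i) and a double-quoted VAR="..." line, and it treats a plain substring match as "already present".

```shell
# add_kernel_param FILE VAR PARAM   (hypothetical helper)
# Appends PARAM inside the quotes of the VAR="..." line in FILE.
add_kernel_param() {
    # Skip the edit if PARAM already appears on the VAR line.
    if grep -q "^$2=.*$3" "$1"; then
        return 0
    fi
    # Insert " PARAM" just before the closing double quote.
    sed -i "s|^\($2=\"[^\"]*\)\"|\1 $3\"|" "$1"
}
```

Usage would look like `add_kernel_param /etc/default/grub GRUB_CMDLINE_LINUX_DEFAULT cgroup_no_v1=memory`, followed by the grub2-mkconfig command from the comment and a reboot.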

Author

@hakavlad hakavlad commented Apr 19, 2019

Enabling the memory controller is not inherited by descendant cgroups (for example, I enabled the controller in system.slice, but it was not enabled in the system services). That's the next problem.

@bulbigood bulbigood commented Apr 19, 2019

I ran into the same problem. oomd prints many warnings. That's why I wanted to find a way to enable the controller when cgroup is mounted at startup.

Author

@hakavlad hakavlad commented Apr 19, 2019

oomd needs more documentation.

You'll need to be on cgroup2 for oomd to work.

That's not enough. cgroup2 was enabled, and oomd's behavior was still unexpected.

oomd should crash if /sys/fs/cgroup is not the cgroup2 root directory, to prevent unexpected behavior (and it should display a message telling the user to specify the correct path to cgroup2). Otherwise it appears to work fine but fails in a critical situation.
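
The kind of startup check being asked for can be approximated in a wrapper script. A minimal sketch, assuming that the presence of cgroup.controllers (an interface file that exists only on cgroup2) is a good enough signal; require_cgroup2 is a hypothetical helper and not something oomd provides:

```shell
# require_cgroup2 DIR   (hypothetical helper)
# Fails loudly unless DIR looks like a cgroup2 directory.
require_cgroup2() {
    if [ ! -f "$1/cgroup.controllers" ]; then
        echo "ERROR: $1 is not a cgroup2 directory; pass the correct path via -f" >&2
        return 1
    fi
}
```

A wrapper could then run `require_cgroup2 /sys/fs/cgroup/unified && oomd_bin -f /sys/fs/cgroup/unified`, so that a wrong path stops the launch instead of producing a daemon that monitors nothing.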

@bulbigood bulbigood commented Apr 19, 2019

I went further.
I added the following parameters:
cgroup_no_v1=all cgroup_disable=cpu,io cgroup_enable=memory swapaccount=1

cgroup_no_v1=all turned off all v1 controller bindings, so systemd mounted cgroup2 directly at /sys/fs/cgroup instead of /sys/fs/cgroup/unified.
Now the -f parameter is no longer required for oomd.

After reboot I got this:
$ cat cgroup.subtree_control
memory pids

But oomd again produces errors! Like this:
[../util/Fs.cpp:174] Unable to open /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.swap.max/memory.stat

memory.swap.max is not a directory...
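
The path in that error treats memory.swap.max, a regular interface file, as if it were a child cgroup. Purely as an illustration of the distinction (this is not oomd's code, and list_child_cgroups is a hypothetical helper), child cgroups can be enumerated by keeping only directories:

```shell
# list_child_cgroups DIR   (hypothetical helper)
# Prints the immediate child cgroups of DIR, one per line.
# Interface files such as memory.stat or memory.swap.max are skipped
# because only directories are cgroups.
list_child_cgroups() {
    for entry in "$1"/*/; do
        [ -d "$entry" ] && printf '%s\n' "${entry%/}"
    done
    return 0
}
```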

Member

@danobi danobi commented Apr 19, 2019

Thank you for the feedback. I'll make sure to include all the suggestions when I start drafting the production setup guide this afternoon.

Re: cgroup1 + cgroup2: there will probably be subtle issues if oomd is used on a mixed hierarchy, i.e. some controllers on cgroup1 and some on cgroup2. I suggest keeping the system cgroup2 only and letting the hierarchy be managed by systemd. I think there's a special setting you can use in systemd to propagate controllers to all descendants.

According to the cgroup2 documentation:

No controller is enabled by default. Controllers can be enabled and disabled by writing to the “cgroup.subtree_control” file:

It's probably going to be a pain to manually do everything or write a script.
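
For reference, the manual approach would look something like the sketch below: the controller has to be enabled in cgroup.subtree_control at every level from the root down to the parent of the target cgroup. enable_controller_along is a hypothetical helper; the sketch assumes a relative path without a leading slash, and it ignores that the kernel may refuse the write for a non-root cgroup that has processes attached directly, and that systemd may later rewrite these files.

```shell
# enable_controller_along ROOT RELPATH CONTROLLER   (hypothetical helper)
# Writes "+CONTROLLER" to cgroup.subtree_control in ROOT and in every
# ancestor of ROOT/RELPATH, top-down. The target cgroup itself is not
# touched: it receives the controller from its parent's subtree_control.
enable_controller_along() {
    dir=$1
    echo "+$3" > "$dir/cgroup.subtree_control" || return 1
    parent=$(dirname "$2")
    # A single-component RELPATH has no intermediate ancestors.
    [ "$parent" = "." ] && return 0
    old_ifs=$IFS
    IFS=/
    for part in $parent; do
        dir=$dir/$part
        echo "+$3" > "$dir/cgroup.subtree_control" || { IFS=$old_ifs; return 1; }
    done
    IFS=$old_ifs
}
```

Something like `enable_controller_along /sys/fs/cgroup system.slice/foo.service memory` (foo.service being a placeholder) would cover one service; repeating that for every unit is the pain point mentioned above.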

Member

@danobi danobi commented Apr 19, 2019

@bulbigood Can you share your config? I think I saw that issue before and maybe fixed it.

Another thing is that some distros set the kernel option CONFIG_MEMCG_SWAP=n (i.e. disable memory cgroup swap control), so that might cause some issues.

Author

@hakavlad hakavlad commented Apr 21, 2019

WARNING: cgroup memory controller not enabled on /sys/fs/cgroup/unified/...

This should be an ERROR if corrective actions cannot be performed without it.

oomd should crash if the memory controller is not enabled (it is an ERROR, not a WARNING) and corrective actions cannot be performed.

@bulbigood bulbigood commented Apr 21, 2019

@danobi
$ zgrep CONFIG_MEMCG /proc/config.gz
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_SWAP_ENABLED=y
CONFIG_MEMCG_KMEM=y

My boot options: https://pastebin.com/E1DCCLDt
oomd_bin log: https://pastebin.com/V6tDMM6C

Member

@danobi danobi commented Apr 23, 2019

@bulbigood can I ask which commit you built against? I think this issue was fixed in deb7c91.

I introduced this check: https://github.com/facebookincubator/oomd/blob/master/Oomd.cpp#L216-L219

@bulbigood bulbigood commented Apr 25, 2019

@danobi I compiled the latest version, and the problem is still there. Maybe the problem is in oomd's dependencies? I'll check which versions of the dependent packages I have.

Author

@hakavlad hakavlad commented Aug 11, 2019

systemd.unified_cgroup_hierarchy=1 swapaccount=1 in boot cmdline fixes the problem. Other boot options are overhead. Please update guides.

systemd.unified_cgroup_hierarchy

When specified without an argument or with a true argument, enables the usage of unified cgroup hierarchy (a.k.a. cgroups-v2). When specified with a false argument, fall back to hybrid or full legacy cgroup hierarchy.

If this option is not specified, the default behaviour is determined during compilation (the -Ddefault-hierarchy= meson option). If the kernel does not support unified cgroup hierarchy, the legacy hierarchy will be used even if this option is specified.

https://www.freedesktop.org/software/systemd/man/systemd.html#systemd.unified_cgroup_hierarchy

Now oomd works and kills memory hogs (and innocent processes too, seems like oomd kills all processes in memhog.scope, not only fattest process, it is not good for desktop), output: http://okturing.com/src/6737/body
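
Whether the two boot options from this comment actually took effect can be checked against /proc/cmdline. A minimal sketch; has_kernel_param is a hypothetical helper that does an exact word match on the command line text passed to it:

```shell
# has_kernel_param CMDLINE PARAM   (hypothetical helper)
# Returns 0 if PARAM appears as a whole word in CMDLINE.
has_kernel_param() {
    # Unquoted on purpose: split the command line on whitespace.
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}
```

For example, `has_kernel_param "$(cat /proc/cmdline)" systemd.unified_cgroup_hierarchy=1 || echo "unified hierarchy not requested"`. On a system that booted into the unified hierarchy, `stat -fc %T /sys/fs/cgroup` should also report cgroup2fs.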

@hakavlad hakavlad closed this Aug 11, 2019

Member

@danobi danobi commented Aug 13, 2019

systemd.unified_cgroup_hierarchy=1 swapaccount=1 in boot cmdline fixes the problem. Other boot options are overhead. Please update guides.

Will do.

seems like oomd kills all processes in memhog.scope, not only fattest process, it is not good for desktop

Yeah that's by design. The smallest granularity oomd will operate on is a cgroup. Doing per-process is kind of a mess, especially when multiple teams own different services on a system. It's much easier to delegate a cgroup tree than to try and manage individual processes everywhere.

@greatquux greatquux commented Sep 6, 2019

sigh. we really need something like oomd on the desktop! in that case you really do want just the hogging process to be killed. though i understand this also isn't facebook's priority. i know my biggest culprit is firefox so i guess i'll try running it in its own cgroup and modify the oom.json to account for this. good instructions for doing this here:
https://samthursfield.wordpress.com/2015/05/07/running-firefox-in-a-cgroup-using-systemd/
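
Since oomd acts on whole cgroups, giving the memory-hungry program its own scope keeps a kill from taking neighbours with it. A rough sketch using systemd-run; run_in_own_scope and the unit name firefox-hog are hypothetical, and the oomd ruleset would still need to be pointed at the resulting cgroup separately:

```shell
# run_in_own_scope UNIT COMMAND [ARGS...]   (hypothetical helper)
# Starts COMMAND in a transient user scope named UNIT.
# With DRY_RUN set, the systemd-run command line is printed instead of run.
run_in_own_scope() {
    unit=$1
    shift
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
    ${DRY_RUN:+echo} systemd-run --user --scope --unit="$unit" "$@"
}
```

Running `run_in_own_scope firefox-hog firefox` should place Firefox in a scope of its own (expected to show up as firefox-hog.scope under the user's slice), while `DRY_RUN=1 run_in_own_scope firefox-hog firefox` only prints the command.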

Author

@hakavlad hakavlad commented Sep 6, 2019

we really need something like oomd on the desktop!

@greatquux Look at https://github.com/hakavlad/nohang, it was originally designed for desktops, it also supports PSI and GUI notifications and is very flexible.

@greatquux greatquux commented Sep 6, 2019

thanks @hakavlad that looks awesome! it's probably just a matter of time until distributions start packaging something like this.
