Add systemd-oomd option to kill single, largest process instead of entire cgroup #25853
Comments
I assume this is meant for scope units only (since killing a pseudo random process of a service compromises the service anyway). @kevgrig Is your RFE covered by this change? |
It'd also be nice to have this not just for the user session, but also for individual apps. For instance: right now oomd will kill your whole browser, when killing the largest tab would have sufficed. Turning off oomd and letting the kernel OOM handler kill processes tends to lock up the system... I don't think that change you linked has anything to do with oomd, right? I don't think it addresses the RFE |
To be frank, I don't understand if #25376 applies as per the comment:
Whereas, in my case, I'm fine continuing to run However, in principle, yes, the ability to configure |
@AdrianVovk: the internal organization of browser processes (e.g. 1:1 mapping of processes to tab) is a different abstraction that IMO should stay out of oomd. For cooperative actions, you may be interested in #23606. @kevgrig (I missed the origin of the kill action in the report.) |
If I understand correctly, this wouldn't kill anything in my user's
This does seem like a nice future option, but I run an off-beat DE (Xfce) so I doubt that will come any time soon.
In my opinion, option 1 is not useful for my (very common) use case, and option 2 is currently inapplicable, and the OOM killer is something that already exists that would work well. |
oomd/systemd-oomd is cgroup v2 only by design, so that we can take advantage of the resource control mechanisms provided by cgroups. In the past we discussed process killing but decided to stay firm on this direction. As such, in an environment that doesn't group processes accordingly into cgroups, we recommend not using systemd-oomd. In your case, if you still want to, say, use systemd-oomd for system.slice and rely on the kernel OOM killer for everything else, it's a matter of overriding/tweaking the options for Related, Benjamin Berg worked on cgroupify, which some (all?) variants of Fedora use to separate browser tabs into different cgroups. Maybe this can be tweaked for your environment. |
I run Fedora and it looks like this is a known issue with explicit disregard for non-GNOME and non-KDE DEs:
Opened Xfce issue: https://gitlab.xfce.org/xfce/xfce4-session/-/issues/158 |
It feels like implementing The current whole-cgroup killing mode is basically unusable for my main desktop workflow, which entails testing the software I develop in a terminal (Konsole) shell session. Every time things don't go entirely to plan and the tested process OOMs, Right now, the only way to preserve my sanity is to disable If adding a single-process killing mode is absolutely not going to fly, what do you consider the best course of action? Shells and terminal emulators are likely here to stay, and patching bash (and all the other shells out there) to put every single pipeline into a separate cgroup is impractical. |
@tootea You can wrap your debug/testing runs into a scope:
to restrict the blast radius. |
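For readers following along: the exact command from the comment above is not preserved here. As a minimal sketch only, assuming the standard `systemd-run --user --scope` invocation and the `MemoryMax=` resource-control property, a wrapper could build the command line like this. The helper name `scoped_argv` and the example values are hypothetical.

```python
import shlex
from typing import List, Optional, Sequence


def scoped_argv(command: Sequence[str], memory_max: Optional[str] = None) -> List[str]:
    """Build an argv that runs `command` in its own transient user scope.

    Assumption: a plain `systemd-run --user --scope` call; this is not
    necessarily the command the commenter had in mind.
    """
    argv = ["systemd-run", "--user", "--scope"]
    if memory_max is not None:
        # Optional hard memory cap for the scope, e.g. "4G".
        argv += ["-p", f"MemoryMax={memory_max}"]
    # "--" ends option parsing so the wrapped command's flags pass through.
    return argv + ["--", *command]


if __name__ == "__main__":
    # Print the command line instead of executing it.
    print(shlex.join(scoped_argv(["make", "-j8"], memory_max="4G")))
```

Because the test run then lives in its own scope, a cgroup-level kill would be limited to that scope rather than the terminal's whole cgroup.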
@Werkov Yes, I have actually tried that in the past and it gets part of the job done, but I couldn't find a way to make it work well with GDB. Either I end up debugging systemd-run instead of my executable (requiring a cumbersome sequence of commands to end up on the right process every time the debugger restarts it), or the entire gdb session ends up in the cgroup, so we're back to square one in terms of losing context. |
The bottom line is, while killing an entire cgroup might make great sense for server deployments with containerized services, I'm having a hard time imagining a desktop application which requires or prefers having its entire cgroup killed instead of just one process. Sure, we can apply workarounds to dozens of different apps to make them treat a cgroup as the new process (atomic unit of lifetime management), but wouldn't it be much easier to just improve systemd-oomd a bit and perhaps eventually make |
A frequent issue that I experience is Android Studio (or perhaps a rogue process) eating up all RAM, which then leads to an entire system halt. The only solution is to restart the entire system, perhaps ruining my workflow in the process. This is an issue I experience especially during power work, where Android Studio can use up to 16GB or more of memory. Having this feature would allow Android Studio to be axed, especially if a memory leak occurs, and then let me resume my work. |
This should be also possible with DEs like GNOME or KDE that launch apps in dedicated scopes. |
I use GNOME. I am unaware of this concept of "implicit scoping" as I am an end user / developer. Android Studio is not installed via Flatpak, but via a tar archive containing the executable and other data, if that affects the situation in any way. |
Flatpak is not necessary. AS may ship its own desktop file or you can add your own. The app started via GNOME and this desktop entry should get its own .scope. Or if you run an executable manually, you may use the systemd-run wrapper above. |
Just adding my wish for this. The obvious example for me is:
As things stand the whole scope unit is taken out by systemd-oomd including my login. |
@traylenator Is (Nice idiom for memory allocation BTW.) |
My motivation for looking at systemd-oomd was a situation last week: Random user launched 1700 threads of clang compilation. System
Users slice
It's an extreme case and killing the whole user slice would have easily been justified but killing of the clangs in possibly |
I think there's a false premise here of "well, the user can just decide whether or not to use systemd-oomd". Various distributions are now shipping with For an advanced user, finding this thread and disabling For an average user just running a single-user DE, many won't even know where to look and they'll just connect Linux with, "It just randomly kills my entire session and logs me out." This is slightly made better by the major DEs integrating The underlying point of this issue is that becoming a default carries responsibilities. You could argue that Edit: A counter-argument to my point is that even if |
For DEs that don't use cgroups to isolate individual apps, yes 100% defaulting to oomd is a mistake on those systems. And to be frank it's up to whoever configured oomd to manage those DEs to configure things correctly
Not particularly... enabling oomd is just a few lines of configuration, disabling it is just as easy.
Ultimately DEs, apps (like browsers, IDEs), and other upstreams (like sshd) need to start organizing sub-components that the OS is allowed to manage separately into separate cgroups on their own, or distros should opt them out of oomd entirely (via something like IMO oomd performs well on server systems because server workloads know how to organize themselves into separate cgroups: the common microservice setup ends up w/ multiple containers each doing one small task, which lets oomd kill individual microservices instead of bringing down the whole stack. In contrast, desktop software that could benefit from letting the OS manage parts of the app independently (tabs in a web browser, compiler running in an IDE, etc) simply doesn't tell the OS about it.
Not quite sure I'm following what the situation is here. Is
If emacs were to be updated to put compilation jobs in dedicated cgroups, the system would know that it's safe (for the consistency of emacs) to kill just the compilation job rather than killing the whole editor (or session), and you as a sysadmin would gain the ability to impose targeted resource limits on just emacs compilation jobs. |
I can expand on the two examples, the clang one a typical session might be:
So now I have a single session scope
Taking out the scope unit destroys the clangs (great) but also emacs, bash and the sshd bits. Game over for systemd-oomd and PSI metrics is so good at recognising the culprit cgroup. I do get part of reluctance is Looking also at
|
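The per-cgroup PSI metrics mentioned above are exposed by cgroup v2 as a `memory.pressure` file in each cgroup directory, with one `some` line and one `full` line. As a rough sketch of how those numbers can be read (the function name `parse_pressure` and the sample values are made up for illustration, and this is not how systemd-oomd itself is implemented):

```python
from typing import Dict


def parse_pressure(text: str) -> Dict[str, Dict[str, float]]:
    """Parse the contents of a PSI file such as memory.pressure.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kind, pairs = fields[0], fields[1:]
        # Turn "avg10=1.50" into {"avg10": 1.5}
        result[kind] = {key: float(value) for key, value in (p.split("=", 1) for p in pairs)}
    return result


if __name__ == "__main__":
    # Made-up sample content, not measured data.
    sample = (
        "some avg10=1.50 avg60=0.75 avg300=0.10 total=12345\n"
        "full avg10=0.50 avg60=0.25 avg300=0.05 total=678\n"
    )
    print(parse_pressure(sample)["full"]["avg10"])
```

Comparing the `full` averages across sibling cgroups is one way to see which cgroup is stalling on memory, which is the "culprit cgroup" identification described above.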
Ah I get the issue now. Not much we can do automatically in this case. One possible solution is having users run long-running tasks via Another approach, which may be better for your situation, is opting
Well sshd already puts everything in a cgroup, courtesy of logind: But it might be useful for sshd to fork off a new cgroup anyway for the process that it runs, so that it can be managed independently. So:
Another reason we'd like to avoid shooting individual processes is because it'll leave services in an inconsistent state potentially. If some service/app/DE/whatever is running in one cgroup, then it's telling us that "I'm expecting that all these processes are managed together, as a group". Started/stopped/killed/resource limits applied/etc. So while it's technically possible to kill individual processes that's not what we're being asked to do by the services My understanding is that the kernel OOM killer's primary objective is to keep the kernel functional, no matter the consequences for userspace. So it can kill individual processes indiscriminately, potentially leaving services in inconsistent states. oomd wants to avoid this. Real-world example: If you have a browser w/ a bunch of tabs, and each tab is a collection of 3 processes, then it makes a lot more sense to kill all three processes that make up a tab at once as a group rather than killing the one process that's using the most RAM and letting the other two processes crash. If the browser doesn't tell us that it's safe to kill a tab's 3 processes without bringing down the rest of the browser, the best we can do is kill the whole browser |
Thanks a lot for the consideration - much appreciated.
We have both High and Max mem limits for sure; we'd be in even more of a mess without them. As far as I can tell the system recovered because their compilation finished.
Of course entirely sensible, killing a bit of sssd is clearly bad. User sessions are quite different I would say though. The bash shell and the thing running in it are hugely unrelated I'd say. The
Identifying the "bad" process is hard but that's not a reason not to try - the per cgroup PSI at least gives you 100% correct user&cgroup and then highest memory in that is probably going to be right or at least not too wrong. If it's the best you can do, it's still justified. User is destroying box - I'll keep killing what I think it is till I get it - bash and the login should survive since it's almost certainly not that. |
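To make the "highest memory in that cgroup" heuristic concrete, here is a minimal sketch, assuming cgroup v2's `cgroup.procs` file and the `VmRSS` field of `/proc/<pid>/status`. The function names are hypothetical, and this is not an existing systemd-oomd feature. It only selects a candidate and sends no signal.

```python
from pathlib import Path
from typing import Optional, Tuple


def rss_kib(pid: int, proc_root: Path = Path("/proc")) -> int:
    """Return the resident set size of `pid` in KiB, or 0 if unavailable."""
    try:
        text = (proc_root / str(pid) / "status").read_text()
    except OSError:
        return 0  # the process exited or is not readable
    for line in text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # "VmRSS:   1234 kB"
    return 0  # e.g. kernel threads have no VmRSS line


def largest_process(cgroup_dir: Path, proc_root: Path = Path("/proc")) -> Optional[Tuple[int, int]]:
    """Return (pid, rss_kib) of the largest process directly in `cgroup_dir`.

    Only PIDs listed in this cgroup's own cgroup.procs are considered;
    processes in child cgroups are not included.
    """
    try:
        pids = [int(tok) for tok in (cgroup_dir / "cgroup.procs").read_text().split()]
    except OSError:
        return None
    if not pids:
        return None
    best = max(pids, key=lambda pid: rss_kib(pid, proc_root))
    return best, rss_kib(best, proc_root)
```

A real implementation would also have to deal with the race between selecting a PID and signalling it, since the process may exit and its PID may be reused in between; RSS is also only one possible measure of "largest".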
@AdrianVovk Let me repeat my question above which is still unanswered: If I put in the effort and submit a PR implementing (Whether or not that setting should become a default on desktop systems is secondary. Let's first have the opt-in for those of us who can't stand the current default. But I personally think it would be a good default behaviour on desktops. I can see how always shooting the whole cgroup might make perfect sense in a containerized HA server environment, because having a whole service die and another instance take over is preferable to having a degraded instance limping on. But IMHO most desktop workloads have little to gain from that ideological purity; there's always the user around who can freely pull the plug on their entire session if it ends up in a half-broken state, so I see no justification for a preemptive strike. And as long as a process is still the atomic unit of lifetime management in most contexts, apps have to somehow handle single processes quitting anyway. Even your hypothetical browser with three PIDs per tab will inevitably sometimes see one of the processes segfault, at which point something has to happen with the remaining two without any involvement of cgroups.) |
Not up to me, I'm not someone who works on oomd. I'm just a distro dev w/ opinions :) I'm personally not very against such a mode, but I'm not particularly for it either. I'm slightly concerned that given the capability you propose apps will just use it to avoid the work of properly supporting cgroups 🤷
Just to be clear: I'm not for killing the whole session. When the whole session dies at the hands of oomd it's a symptom of oomd being misconfigured, or a lack of use of cgroups in the DE and/or apps
Sure, it's just that this "something" can often be a segfault, or hang, or whatever undefined behavior of its own. If the OS can avoid inducing situations like that it should. In other words: individual tab processes crashing (which may ultimately bring down the rest of the tab's processes one way or another) due to programmer error is OK in my books. Individual tab processes being killed because the browser happens to be the biggest user of RAM in that moment, causing the rest of the browser to fall into some undefined behavior (probably having more processes ultimately crash) is something we can and should avoid as OS developers |
I'm not sure this is possible to implement reliably in userspace. IIUC, oomd uses cgroup's |
oomd actually kills other harmless processes, instead of just the "offending process". Here is an example of real-world usage on Arch Linux, which has cgroups enabled. As you can see, it considers other harmless processes eligible to kill, which is wrong! The only offending process/program here is firefox, or should I say, "/user.slice/user-1000.slice/user@1000.service/app.slice/app-flatpak-org.mozilla.firefox-6051.scope"
|
Component
systemd-oomd
Is your feature request related to a problem? Please describe
systemd-oomd killed my entire user session when killing the single, largest process would have been sufficient.
I'm running systemd-oomd v249 and it's nice that v251 added more details of what is killed, but it would be nicer if I could configure systemd-oomd to try killing the largest memory user first so that I don't lose other work.
Describe the solution you'd like
Configuration option to kill the largest PID rather than the entire cgroup.
Describe alternatives you've considered
The systemd version you checked that didn't have the feature you are asking for
249