systemd-logind - memory leak on SSH connections #8015
Comments
Hmm, so the valgrind output was generated during normal runtime, when the process was abnormally terminated by SIGINT. It shows all memory allocated at that time, which is different from leaked memory...
I think I understand what you're saying, and I disagree. Correct me if I got it wrong. The reason I consider this clearly wrong behavior is simple: systemd-logind will consume all the memory of a machine if we keep making SSH connections during its lifetime. That shouldn't be acceptable, do you agree? If all applications behaved this way, we couldn't keep a machine running for 24 or 48 hours, because the applications would end up getting OOM-killed all the time. It is considered a leak if you allocate memory, use it, and don't free it within a reasonable time. What is the point of freeing all the memory at the end of a program, while letting it consume all the RAM of a machine during the lifetime of the process? Basically it's the same as saying "this program should be restarted on a regular basis or it'll break your system" heheh
Well, I am not saying there wasn't a leak somewhere, I am just saying that the tool you used (or specifically, the way you used it) is not useful for finding it... What does "loginctl" actually report when this happens? How many open sessions?
Thanks for your clarification, Lennart! I did the following experiment: I ran the "while true; ssh" loop for 1 minute, then captured the output of loginctl. I then waited another 9 minutes and re-captured the output of loginctl - I was hoping it might clear the sessions via some delayed mechanism (something along the lines of garbage collection), but the results were the same. Cheers, Guilherme
I had that behaviour once, which was due to an upgrade of logind without a reboot of the machine (just saying, in case it helps diagnose...)
Thanks boucman ...in my case it's consistent: you can start a machine, run the aforementioned ssh loop, and you'll see the continuous increase in RAM. BTW, I noticed that the sessions created by the ssh loop are kept in the "closing" state - what prevents them from being released? It seems to me that if a timeout were triggered after a session had been in the closing state for a while, and the session were then removed, we wouldn't see the memory issue.
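As an aside, sessions stuck in the "closing" state can be enumerated with a short sketch like the one below (illustrative only, not from the thread; it assumes a `loginctl list-sessions` that prints a STATE column, as recent systemd does - on older versions you would instead query each session with `loginctl show-session <ID> -p State`):

```shell
# Print the IDs of sessions whose STATE column reads "closing".
# Usage (assumes recent systemd):
#   loginctl list-sessions --no-legend | sh closing-sessions.sh
awk '{ for (i = 1; i <= NF; i++) if ($i == "closing") { print $1; break } }'
```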
Hmm... I cannot reproduce this (with a recent snapshot of systemd on Fedora 27 x86_64)...
@guilhermepiccoli it appears you are leaking full sessions. The question is of course why. If you look into those sessions with "loginctl session-status", what do you see? Is this in some container env or so? Or anything else weird? Do those sessions possibly leave processes around? If so, we won't close them.
yuwata, I was able to reproduce this using upstream systemd, built from source myself. Maybe the distro version is a bit different and somehow does not show the issue? Lennart: I've been testing in an LXD container, but the issue reproduces on a bare-metal system too, I just re-checked. I'm using the Ubuntu 18.04 candidate with upstream systemd. I proposed a pull request that fixed it for me: #8062 Guilherme
cgroup empty notifications are not reliable inside containers, hence the LXD and the bare-metal cases are actually very different. Before looking into the LXD case I'd hence focus on the bare-metal case.
Thanks for the hint! I'll focus on bare-metal then.
Any update on this? We're experiencing the same in a bionic LXD container, and it's rather annoying since we have munin monitoring which logs in/out every 10 minutes, so we're hitting the session limit in a week or so...
@fr33l Please provide the results of:
This one is for a regular user:
Just in case someone finds this via Google: we were able to solve this by enabling security nesting for the LXD container:
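For reference, that workaround boils down to two LXD CLI commands (shown as a configuration sketch; the container name `mycontainer` is a placeholder):

```shell
# Enable nesting for the container (name "mycontainer" is a placeholder),
# then restart it so logind sees reliable cgroup notifications.
lxc config set mycontainer security.nesting true
lxc restart mycontainer
```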
Thanks @fr33l, pretty useful!
I could reproduce this on Debian 9.5 as a VM on vSphere 5.5:
with 2 parallel executions
I had to ctrl-c and re-execute it once to see it (no lxc involved).
Is there a workaround or setting to limit memory usage? |
Hi @poettering, we're seeing this with CloudStack virtual routers based on Debian 9.6, and like @resmo we see the memory growth; the version is as follows:
Can you help or advise any workaround?
I also tried a newer version from Debian backports and am seeing the same issue:
On Debian I fixed the memory consumption by commenting out the following line in /etc/pam.d/common-session:
But with that, ssh sessions don't terminate correctly. The fix is already done:
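For context (the comment does not quote the line, so take this as an assumption about a typical Debian default rather than the poster's exact file): the entry in question is usually the pam_systemd one, along these lines:

```
# Typical Debian /etc/pam.d/common-session entry for logind session
# tracking; commenting it out stops logind from tracking SSH sessions
# (and also disables XDG_RUNTIME_DIR setup), so it is a stopgap only.
#session optional        pam_systemd.so
```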
- Add vm.min_free_kbytes to sysctl
- periodically clear disk cache (depending on memory size)
- only start guest services specific to hypervisor
- use systemvm code to determine hypervisor type (not systemd)
- start cloud service at end of post init rather than through systemd
- reduce initial threads started for httpd
- fix vmtools config file
- disable all required services (do not start on boot), start only required services during post init
- add '@include null' to /etc/pam.d/systemd-user as per systemd/systemd#8015 (comment)
- remove cloud agent service startup from VR
There is a known bug in Linux kernel cgroup handling until 5.3: pam_systemd creates a cgroup per session, which is not freed completely and slowly eats all memory.
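Those leftover per-session cgroups can be counted directly in the cgroup filesystem (a hedged sketch; the default path assumes cgroup v1 as mounted on the affected Debian/Ubuntu releases, and a different root can be passed as the first argument):

```shell
#!/bin/sh
# Count session-*.scope cgroups still present under user.slice; a count
# that keeps growing while all sessions are closed points at the leak
# discussed here. The default path assumes cgroup v1.
root="${1:-/sys/fs/cgroup/systemd/user.slice}"
find "$root" -maxdepth 2 -type d -name 'session-*.scope' | wc -l
```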
Let's close this. There was a kernel bug involved here, and it has long been fixed. If this is reproducible on current systemd systems, please file a new bug.
Submission type
systemd version the issue has been seen with
Upstream version (at commit 6cddc79).
But it's reproducible in much older versions, tested with 204 and 229 too.
Used distribution
Ubuntu 18.04 (Bionic) - I've built the upstream version of systemd on top of it.
Bug description
The systemd-logind tool presents a clear memory leak on SSH connections. Basically, on each SSH connection some memory is allocated, and that memory remains allocated even after the SSH session is disconnected - the code seems to lack a free somewhere.
I am aware of a case of OOM caused by the huge memory footprint of systemd-logind after some weeks of machine uptime.
Valgrind analysis showed many potential leaks in session creation routines:
valgrind_bionic.txt
Steps to reproduce the problem
To reproduce the issue and expose the memory leak, a simple ssh loop would be enough:
```shell
while true; do ssh <hostname> "whoami" 1>/dev/null; done
```
A 5min run led to these results:
The numbers of the graph:
upstreamAnon.txt
We measured the anonymous pages of systemd-logind based on /proc smaps of the process.
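That measurement can be reproduced with a sketch along these lines (not the exact script used; the pidof lookup assumes the process is named systemd-logind, and reading its smaps typically requires root):

```shell
#!/bin/sh
# Sum the Anonymous: entries in /proc/<pid>/smaps for systemd-logind.
# Run periodically to watch the anonymous footprint grow.
pid="${1:-$(pidof systemd-logind)}"
awk '/^Anonymous:/ { kb += $2 } END { printf "%d kB\n", kb + 0 }' "/proc/$pid/smaps"
```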