systemd-logind - memory leak on SSH connections #8015
Comments
yuwata added the bug 🐛 and login labels on Jan 27, 2018
poettering
Jan 27, 2018
Owner
Hmm, so valgrind output is generated during normal runtime when the process is abnormally terminated by SIGINT. It shows all memory allocated at that time, which is different from leaked memory...
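To make the distinction concrete, here is a minimal Python sketch that pulls the per-category byte counts out of a valgrind LEAK SUMMARY. The sample log is made up for illustration and is not taken from valgrind_bionic.txt; the function name is hypothetical. Memory that is still referenced when the process is interrupted shows up as "still reachable", while memory with no remaining pointer shows up as "definitely lost".

```python
import re

# Matches lines such as "==1234==    definitely lost: 1,024 bytes in 3 blocks".
_SUMMARY_LINE = re.compile(
    r"(definitely lost|indirectly lost|possibly lost|still reachable|suppressed):"
    r"\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def parse_leak_summary(valgrind_log: str) -> dict:
    """Return {category: bytes} from the last LEAK SUMMARY section of a log."""
    _, found, tail = valgrind_log.rpartition("LEAK SUMMARY:")
    if not found:
        # No summary at all (for example, every heap block was freed).
        return {}
    return {
        match.group(1): int(match.group(2).replace(",", ""))
        for match in _SUMMARY_LINE.finditer(tail)
    }


# Made-up sample output, for illustration only.
SAMPLE = """\
==1234== LEAK SUMMARY:
==1234==    definitely lost: 48 bytes in 3 blocks
==1234==    indirectly lost: 0 bytes in 0 blocks
==1234==      possibly lost: 0 bytes in 0 blocks
==1234==    still reachable: 1,024 bytes in 16 blocks
==1234==         suppressed: 0 bytes in 0 blocks
"""

summary = parse_leak_summary(SAMPLE)
print(summary["definitely lost"], summary["still reachable"])
```

Note that objects a daemon still tracks internally (such as session objects that were never released) would normally land in the "still reachable" bucket, so this split alone can neither confirm nor rule out the growth reported here.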
guilhermepiccoli
Jan 27, 2018
I think I understand what you're saying, and I disagree. Correct me if I've got it wrong:
you're saying that Valgrind is reporting all the memory allocated during the process lifetime, and since we are terminating the process in an abnormal way (SIGINT), that memory isn't freed. The way you put it suggests that systemd-logind would free all memory on a regular program termination. Is my understanding of your statement right?
Well, the reason I consider it clearly wrong behavior is simple: systemd-logind will consume all the memory of a machine if we keep making SSH connections during its lifetime. This behavior shouldn't be acceptable, do you agree? If all applications behaved this way, we couldn't keep a machine running for 24 or 48h, because the applications would end up getting OOM'ed all the time.
As the graph (and data) showed, the memory consumption of logind is continuously increasing...
It is considered a leak if you allocate memory, use it and don't free it in a reasonable time. What is the point of freeing all the memory at the end of a program, while letting it consume all the RAM of a machine during the lifetime of the process? Basically it's the same as saying "this program should be terminated on a regular basis or it'll break your system" heheh
poettering
Jan 28, 2018
Owner
Well, I am not saying there wasn't a leak somewhere, I am just saying that the tool you used (or specifically, the way you used it) is not useful for finding it...
What does "loginctl" actually report when this happens? How many open sessions?
guilhermepiccoli
Jan 29, 2018
Thanks for your clarification Lennart!
I did the following experiment: ran the "while true; ssh" loop for 1 minute, and after that captured the output of loginctl:
Then I waited another 9 minutes and re-captured the output of loginctl - I was hoping it had maybe cleared the sessions through a delayed mechanism (something along the lines of garbage collection), but the results were the same:
Cheers,
Guilherme
boucman
Jan 29, 2018
Contributor
I had that behaviour once, which was due to an upgrade of logind without reboot of the machine (just saying in case it helps diagnose...)
guilhermepiccoli
Jan 29, 2018
Thanks boucman ...in my case it's consistent, I mean you can start a machine, run the aforementioned ssh loop, and you'll see the continuous increase of RAM.
BTW, I noticed that the sessions created by the ssh loop are kept in the "closing" state - what prevents them from being released? It seems to me that if a timeout triggered after a session had been in the closing state for a while, and the session was then removed, we wouldn't see the memory issue.
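One way to quantify how many sessions are stuck in "closing" is to tally the State property of every session. Below is a rough Python sketch; the function names are hypothetical, and the sample strings in the last lines are made up rather than captured from the affected machine. It relies on `loginctl list-sessions --no-legend` and `loginctl show-session <id> -p State`, and assumes the session ID is the first column of the listing.

```python
import subprocess
from collections import Counter


def parse_state(show_session_output: str) -> str:
    """Extract the State= value from `loginctl show-session` output."""
    for line in show_session_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key == "State":
            return value.strip()
    return "unknown"


def count_states(outputs) -> Counter:
    """Tally session states from an iterable of show-session outputs."""
    return Counter(parse_state(output) for output in outputs)


def live_session_states() -> Counter:
    """Query logind on the local host; not called here because it needs systemd."""
    listing = subprocess.run(
        ["loginctl", "list-sessions", "--no-legend"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Assumption: the first whitespace-separated column is the session ID.
    session_ids = [line.split()[0] for line in listing.splitlines() if line.split()]
    outputs = [
        subprocess.run(
            ["loginctl", "show-session", session_id, "-p", "State"],
            capture_output=True, text=True, check=True,
        ).stdout
        for session_id in session_ids
    ]
    return count_states(outputs)


# Made-up outputs, for illustration only.
states = count_states(["State=closing\n", "State=active\n", "State=closing\n"])
print(states)
```

Running `live_session_states()` before and after the ssh loop would show whether the number of sessions in "closing" keeps growing.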
yuwata
Jan 31, 2018
Member
Hmm... I cannot reproduce this (with recent snapshot of systemd on Fedora 27 x86_64)...
poettering
Jan 31, 2018
Owner
@guilhermepiccoli it appears you are leaking full sessions. Question is of course why. If you look into those sessions with "loginctl session-status", what do you see? Is this in some container env or so? or anything else weird? do those sessions possibly leave processes around? if so, we won't close them.
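Whether a session left processes behind can be checked by looking at the PIDs listed in the `cgroup.procs` file of the session's cgroup. The Python sketch below is only an illustration: the helper names are hypothetical, and the cgroup path depends on the hierarchy layout of the host, so it has to be supplied by the caller.

```python
def parse_pids(cgroup_procs_text: str) -> list:
    """Parse the contents of a cgroup.procs file into a list of PIDs."""
    return [int(token) for token in cgroup_procs_text.split()]


def leftover_pids(cgroup_dir: str) -> list:
    """Read <cgroup_dir>/cgroup.procs; cgroup_dir is supplied by the caller.

    Only processes directly inside this cgroup are listed, not those in
    child cgroups.
    """
    with open(f"{cgroup_dir}/cgroup.procs") as procs_file:
        return parse_pids(procs_file.read())


# Made-up file contents, for illustration only.
print(parse_pids("1201\n1215\n"))
print(parse_pids(""))
```

An empty list for a session stuck in "closing" would suggest that leftover processes are not what keeps it alive.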
guilhermepiccoli
Jan 31, 2018
yuwata, I was able to reproduce it using upstream systemd, which I built myself. Maybe the distro version is a bit different and somehow does not show the issue?
Lennart: I've been testing using an LXD container, but the issue also reproduces on a bare-metal system, I just re-checked. I'm using the Ubuntu 18.04 candidate with upstream systemd.
I proposed a pull request that fixed it for me: #8062
I'm not sure how to link issues and pull requests in GitHub, feel free to do it your way.
Thanks,
Guilherme
poettering referenced this issue on Jan 31, 2018:
logind: effectively finalize/free a session after it was stopped #8062 (Open)
poettering
Jan 31, 2018
Owner
> Lennart: I've been testing using a LXD container, but the issue reproduces on bare-metal system, just re-checked.
cgroup empty notifications are not reliable inside containers, hence the LXD and the baremetal case are actually very different. Before looking into the LXD case I'd hence focus on the baremetal case.
guilhermepiccoli
Jan 31, 2018
Thanks for the hint! I'll focus on bare-metal then.
guilhermepiccoli commented Jan 26, 2018
Submission type
systemd version the issue has been seen with
Upstream version (at commit 6cddc79).
But it's reproducible in much older versions, tested with 204 and 229 too.
Used distribution
Ubuntu 18.04 (Bionic) - I've built the upstream version of systemd on top of it.
Bug description
The systemd-logind daemon shows a clear memory leak on SSH connections. On each SSH connection some memory is allocated, and it stays allocated even after the SSH session is disconnected - the code seems to lack a free somewhere.
I have become aware of an OOM case involving this daemon, caused by the huge memory footprint of systemd-logind after some weeks of machine uptime.
Valgrind analysis showed many potential leaks in session creation routines:
valgrind_bionic.txt
Steps to reproduce the problem
To reproduce the issue and expose the memory leak, a simple ssh loop would be enough:
while true; do ssh <hostname> "whoami" 1>/dev/null; done
A 5min run led to these results:
The numbers of the graph:
upstreamAnon.txt
We measured the anonymous pages of systemd-logind based on the process's /proc/<pid>/smaps file.
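For reference, a measurement of this kind can be sketched in Python as below. This is not the script used for the graph; the function names are hypothetical and the sample text is made up. It sums the `Anonymous:` field of every mapping in smaps, taking care not to match the separate `AnonHugePages:` field.

```python
def anonymous_kb(smaps_text: str) -> int:
    """Sum the Anonymous: fields (in kB) over all mappings in smaps text."""
    total = 0
    for line in smaps_text.splitlines():
        fields = line.split()
        # Exact match on the key, so "AnonHugePages:" is not counted.
        if len(fields) >= 2 and fields[0] == "Anonymous:":
            total += int(fields[1])
    return total


def anonymous_kb_for_pid(pid: int) -> int:
    """Read /proc/<pid>/smaps; usually needs root for another user's process."""
    with open(f"/proc/{pid}/smaps") as smaps_file:
        return anonymous_kb(smaps_file.read())


# Made-up excerpt with two mappings, for illustration only.
SAMPLE = """\
55d0c8a00000-55d0c8a21000 rw-p 00000000 00:00 0                          [heap]
Size:                132 kB
Rss:                 128 kB
Anonymous:           128 kB
AnonHugePages:         0 kB
7f1e4c000000-7f1e4c021000 rw-p 00000000 00:00 0
Size:                132 kB
Rss:                  12 kB
Anonymous:            12 kB
AnonHugePages:         0 kB
"""

print(anonymous_kb(SAMPLE))
```

Sampling `anonymous_kb_for_pid()` for the systemd-logind PID at fixed intervals while the ssh loop runs gives the kind of series plotted above.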