Large number of LookupDynamicUserByName calls by nss-systemd slows down pid 1 #9128
Comments
I should add that for the time being removing
Update: we're now seeing a similar issue on the same machine, this time with
Interestingly, this whole codepath seems limited to
Hmm, this smells as if something is leaked in PID 1 when these requests are made, and this causes O(n) behaviour or worse somewhere. Any chance you can get some profiling data about PID 1 when this happens? I.e. "perf" data or so that tells us what PID 1 is spending its time on?
@yuwata: no, on this host we're not using pam_systemd so logind isn't really in play. It looks like we may have originally confused the effect for the cause here. Upon further testing, we found that this happens even when both nss-systemd and nss-mymachines are disabled, and there's little to no traffic on dbus. I've filed #9138 with some more details, as we think that's a separate issue, and we're still not sure if/how it relates to this one.
Now that #9148 is merged, is there anything left on this issue?
No, I don't think so. Will reopen if we find evidence that this is still an issue.
systemd version the issue has been seen with
Used distribution
Expected behaviour you didn't see
Unexpected behaviour you saw
Steps to reproduce the problem
On a git master server, there are a large number of short-lived ssh connections from a variety of different users. On this host, `nss-systemd` is enabled in `/etc/nsswitch.conf` for `passwd` and `group`, but `pam_systemd` is not in use (meaning sessions are not tracked by `logind` and user processes are accounted under `sshd.service`).

Running `busctl monitor` shows that every time an incoming ssh connection is established, a `LookupDynamicUserByName` call for the ssh login user is issued over the bus, which is processed internally by running `dynamic_user_lookup_name` (defined in `src/core/dynamic-user.c`). Over a variable period of time (generally between 6 hours and a day from what we've observed), the processing of these calls becomes slower and slower, to the point that delays of several seconds are noticeable in the `busctl monitor` output. At the same time, commands like `systemctl` start lagging; eventually, pid 1 stops reaping children altogether, causing zombie processes to pile up by the thousand.

It's worth noting that this only happens consistently on this one host, which happens to have a very large and essentially random distribution of user logins (meaning hundreds of different users might be logging in via ssh at the same time, all the time); similarly configured hosts with comparable amounts of traffic, but less distributed (meaning most of the logins are concentrated into a few users), have not exhibited this issue so far. I'm still working on a synthetic repro for this issue, and will update this if I can come up with something useful. Also worth noting: as far as I can tell, this is a regression in 238 (though I can't prove this conclusively at the moment).
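For anyone who wants to poke at this, here is a rough Python sketch of what a synthetic load generator might look like. It is not a confirmed reproducer. It assumes that resolving many distinct, unknown login names through NSS (`pwd.getpwnam` calls `getpwnam(3)`) falls through to `nss-systemd` on a host configured as described above, and that no caching layer such as nscd answers first. The helper names `random_user_names` and `time_lookups` are made up for illustration.

```python
import pwd
import random
import string
import time


def random_user_names(count, length=8, seed=0):
    """Generate `count` distinct pseudo-random login names (hypothetical accounts)."""
    rng = random.Random(seed)
    names = set()
    while len(names) < count:
        suffix = "".join(rng.choices(string.ascii_lowercase, k=length))
        names.add("u" + suffix)
    return sorted(names)


def time_lookups(names):
    """Resolve each name through NSS and return the per-lookup latency in seconds."""
    latencies = []
    for name in names:
        start = time.monotonic()
        try:
            pwd.getpwnam(name)  # goes through the modules listed in nsswitch.conf
        except KeyError:
            pass  # unknown user: expected for these made-up names
        latencies.append(time.monotonic() - start)
    return latencies


if __name__ == "__main__":
    names = random_user_names(1000)
    latencies = time_lookups(names)
    mean = sum(latencies) / len(latencies)
    print(f"lookups: {len(latencies)}, max: {max(latencies):.6f}s, mean: {mean:.6f}s")
```

Running this in a loop while watching `busctl monitor` should show whether the lookups reach the bus at all on a given host, and whether the per-lookup latency grows over time. It does not exercise sshd or process reaping, so it only covers the lookup half of what is described above.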