session dbus-daemon crashed (SIGABRT) in libnss-systemd #15859
This might fix systemd#15859, a bug which I find very puzzling.
This is very puzzling. I prepped a possible fix in #16041, but I am not sure whether it actually fixes anything; it's just the only thing that remotely makes sense to me. We see EBADF on fclose() of an open_memstream() FILE*, and I am not sure how that could possibly ever happen... Does this happen regularly for you?
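For context, open_memstream() returns a FILE* that writes into a heap buffer rather than a file descriptor; fclose() flushes it and hands the final buffer and length back through the pointers passed in, after which the caller owns the buffer. The following is a minimal sketch of that lifecycle, with a hypothetical helper name; it is an illustration of the pattern under discussion, not systemd's code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format "hello <name>" into a malloc()ed buffer via a
 * memstream. Returns the buffer (caller must free()) or NULL on error. */
static char *format_with_memstream(const char *name, size_t *ret_size) {
        char *buf = NULL;
        size_t size = 0;
        FILE *f = open_memstream(&buf, &size);
        if (!f)
                return NULL;

        if (fprintf(f, "hello %s", name) < 0) {
                fclose(f);
                free(buf);
                return NULL;
        }

        /* fclose() flushes the stream and updates buf/size. There is no file
         * descriptor underneath a memstream. */
        if (fclose(f) != 0) {
                free(buf);
                return NULL;
        }

        if (ret_size)
                *ret_size = size;   /* length excludes the terminating NUL */
        return buf;                 /* ownership passes to the caller */
}
```

Called as format_with_memstream("world", &n), this returns a NUL-terminated "hello world" with n set to 11, which the caller later releases with free().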
It happens intermittently and I'm not sure how to trigger it.
If you would like, I can apply the patch locally and see if it fixes the
issue, but I'm not sure how to tell the difference between the patch
fixing the issue and the triggering conditions simply not occurring.
--
bye,
pabs
https://wiki.debian.org/PaulWise
Do you have any special NSS setup, btw? LDAP or so? Lots of users/groups? If the issue doesn't pop up anymore with the patch applied, we should probably close this, assume it is fixed, and reopen if it pops up again.
No special NSS setup, just a standalone desktop system. Two real users
and 73 system users for daemons etc.
I'll apply the patch tomorrow and report back at the end of the month
if there have been any dbus-daemon crashes or not.
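Counting users through the same NSS module chain that loads libnss-systemd can be done by enumerating passwd entries with getpwent(). A minimal sketch follows; the helper name is hypothetical, and the 1000..60000 "regular user" range used in the example is an assumption based on Debian's default UID_MIN/UID_MAX in /etc/login.defs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical diagnostic: enumerate every passwd entry returned by the NSS
 * modules configured in /etc/nsswitch.conf and split them into "regular"
 * users (uid_min..uid_max inclusive) and everything else (system users). */
static void count_users(uid_t uid_min, uid_t uid_max,
                        unsigned *ret_regular, unsigned *ret_system) {
        unsigned regular = 0, sys = 0;
        struct passwd *pw;

        setpwent();
        while ((pw = getpwent()) != NULL) {
                if (pw->pw_uid >= uid_min && pw->pw_uid <= uid_max)
                        regular++;
                else
                        sys++;
        }
        endpwent();

        *ret_regular = regular;
        *ret_system = sys;
}
```

For example, count_users(1000, 60000, &r, &s) followed by printing r and s gives a quick picture of how many entries NSS returns on a given machine.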
I applied the patch to my local system and will report any issues I see.
Unfortunately I just got another pair of crashes with Debian systemd
245.6-2 with the patch cherry-picked on top. The backtraces are attached:
https://github.com/systemd/systemd/files/4935409/crashes.txt
Does the version you tested include 75f6d5d?
I assume so, given the comment "...with the patch cherry-picked on top."
Did you reboot after patching/rebuilding/installing systemd? NSS modules remain pinned in running processes... isn't rebooting the only way to update them safely?
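One way to check whether a running process still has an old copy of an NSS module loaded is to look in /proc/<pid>/maps for mappings the kernel marks as "(deleted)", which is what happens when a package upgrade replaces the library file on disk. A minimal sketch, with a hypothetical helper that takes the maps file as a FILE* so it can also be run on sample input:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count lines of a /proc/<pid>/maps-style stream that
 * mention `needle` (e.g. "libnss_systemd") and are marked "(deleted)",
 * i.e. the mapped file was replaced or removed after it was loaded.
 * Lines longer than the buffer are split across reads; this sketch
 * ignores that edge case. */
static unsigned count_stale_mappings(FILE *maps, const char *needle) {
        char line[4096];
        unsigned n = 0;

        while (fgets(line, sizeof line, maps))
                if (strstr(line, needle) && strstr(line, "(deleted)"))
                        n++;
        return n;
}
```

For a live process one would pass the stream from fopen("/proc/<pid>/maps", "r") with a needle such as "libnss_systemd"; a non-zero count suggests the process started before the upgrade and is still running the old module until it is restarted (or the machine is rebooted).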
The patch was included in the systemd I was testing.
The upgrade to the patched version occurred 2020-07-08 14:58:06
The crashes occurred after a boot at 2020-07-17 09:34:50
The crash occurred at 2020-07-17 10:00:33
Looking at my systemd journal log, the crash appears to be associated
with one of my cron jobs. All of my cron jobs have special environment
variables set to be able to identify their processes. Looking at the
environment variables in the dbus-daemon core dumps, it appears to be
one that invokes `nm-online -q`. In addition to the special environment
variables, my cron jobs set DISPLAY=:0 which IIRC was required to make
evolution address-export and other things requiring dbus work in cron.
Since adding DISPLAY=:0 I have switched to Wayland, but I haven't yet added
WAYLAND_DISPLAY=wayland-0 to my cron jobs. So perhaps nm-online
failed to contact the session dbus-daemon (although it seems to work
most of the time) and started a new dbus-daemon, which didn't like the
environment it was in and passed incorrect things to libnss-systemd?
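To test the theory about the cron environment, a small diagnostic run from the same cron job could record whether a session bus is discoverable at all. The sketch below is hypothetical and only approximates how clients look for the session bus: it assumes either an explicit DBUS_SESSION_BUS_ADDRESS or a systemd-style user bus socket at $XDG_RUNTIME_DIR/bus, and it does not model any X11-based fallback.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical cron-side diagnostic: returns 1 if the environment points at
 * a session bus, 0 otherwise. Only two assumed sources are checked. */
static int session_bus_hint(void) {
        const char *addr = getenv("DBUS_SESSION_BUS_ADDRESS");
        const char *runtime_dir = getenv("XDG_RUNTIME_DIR");
        char path[4096];
        struct stat st;

        if (addr && *addr)
                return 1;   /* an address is set; reachability is not checked */

        if (runtime_dir && *runtime_dir) {
                int n = snprintf(path, sizeof path, "%s/bus", runtime_dir);
                /* Skip the check if the path was truncated. */
                if (n > 0 && (size_t) n < sizeof path &&
                    stat(path, &st) == 0 && S_ISSOCK(st.st_mode))
                        return 1;
        }
        return 0;
}
```

Logging this result next to DISPLAY and WAYLAND_DISPLAY from inside the nm-online cron job would show whether the runs that coincide with crashes lacked a discoverable session bus.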
It is possible that the crash is caused by memory corruption in some other part of the code. I looked at the code involved and don't see anything obvious either. I guess we'll need to wait and see if other people hit this.
https://bugzilla.redhat.com/show_bug.cgi?id=1823038 is another case.
@keszybz The storage backing the FILE* is allocated with malloc() by __open_memstream() and so is easily susceptible to buffer overflows from nearby chunks. In general it looks like you only use open_memstream_unlocked() from src/basic/fileio.c, so any failure by the callers to coordinate could result in corruption. I looked over the code in src/basic/fd-util.c and I don't see anything immediately wrong. These cases are hard to track down :-(
There is always exactly one caller: the memstream object is never passed outside of the originating function. (In the whole codebase there is one exception in the dbus introspection code, but that code path is not touched here.) So there is no question of coordination, afaict.
But can it return EBADF in that case? We only check that the errno we got is not EBADF.
No, you won't get EBADF in that case, and the allocation during
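The check mentioned above can be pictured as a wrapper that turns fclose() failures into negative errno values and treats -EBADF as a programming error. The following is a minimal sketch with hypothetical names, written in the spirit of the fd-util.c helpers discussed here but not copied from them; clearing errno before the call is a defensive assumption of this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fclose() wrapper: 0 on success, negative errno on failure. */
static int sketch_fclose_nointr(FILE *f) {
        /* Assumption: a FILE* without a file descriptor may report failure
         * without fclose() itself setting errno, so clear any stale value
         * (e.g. a leftover EBADF) before the call. */
        errno = 0;
        if (fclose(f) == 0)
                return 0;
        if (errno == EINTR)   /* the FILE* is gone either way; treat as closed */
                return 0;
        return errno > 0 ? -errno : -EIO;
}

/* Treat -EBADF as a bug (double close or use after close). With assertions
 * enabled (NDEBUG not defined), assert() calls abort(). */
static void sketch_safe_fclose(FILE *f) {
        if (f) {
                int r = sketch_fclose_nointr(f);
                assert(r != -EBADF);
                (void) r;
        }
}
```

With assertions enabled, a -EBADF result aborts the process, which is how a check of this kind would surface as a SIGABRT like the one in this report.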
FTR: I got another pair of crashes with libnss-systemd 245.7-1 from
Debian bullseye; AFAICT this version includes the patch from above.
I'm assuming that the backtrace isn't going to be interesting but if it
is please let me know before it is auto-deleted in a week's time.
I think we need to go over the glibc code with a fine-tooth comb and figure out in what circumstances it can return EBADF. Maybe EBADF is a legitimate return value for memstreams?
@keszybz I rather suspect this is the consequence of unrelated memory corruption (but I could be wrong).
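On the question of when fclose() can report EBADF: for a stream that wraps a file descriptor, it does so when that descriptor was already closed elsewhere (for example by a double close), because stdio's own close() of it then fails. A small demonstration with a hypothetical function name; a memstream has no underlying descriptor, so this particular route does not apply to it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical demo: close the descriptor behind a FILE*'s back, then
 * fclose() it. Returns the errno reported by fclose() (expected EBADF),
 * 0 if fclose() unexpectedly succeeded, or -1 if setup failed. */
static int fclose_after_fd_stolen(void) {
        int p[2];
        if (pipe(p) < 0)
                return -1;

        FILE *f = fdopen(p[1], "w");
        if (!f) {
                close(p[0]);
                close(p[1]);
                return -1;
        }

        close(p[1]);          /* simulated bug: fd closed under the FILE* */
        errno = 0;
        int r = fclose(f);    /* nothing buffered; stdio closes p[1] again */
        int saved = errno;

        close(p[0]);
        return r == EOF ? saved : 0;
}
```

In a single-threaded program nothing can reuse the descriptor number between the two closes, so the second close() fails and fclose() returns EOF with errno set to EBADF.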
FTR: I got another pair of crashes with libnss-systemd 246.6-1 from Debian bullseye. I'm assuming that the backtrace isn't going to be interesting but if it is please let me know before it is auto-deleted in a week's time.
This might fix systemd#15859, a bug which I find very puzzling. (cherry picked from commit 75f6d5d)
Is this still reproducible with current versions of systemd/glibc? If not, let's close this.
The dbus-daemon crash appears to have been fixed for some time now; I'm not seeing
it with systemd 252.6-1 and glibc 2.36-9 from Debian bookworm.
Thanks. Then, let's close this. |