
systemd-coredump with journal storage doesn't handle cores >2GiB #26748

Closed
mgreen89 opened this issue Mar 10, 2023 · 5 comments · Fixed by #28127
Labels
bug 🐛 Programming errors, that need preferential fixing coredump
Milestone

Comments

@mgreen89

mgreen89 commented Mar 10, 2023

systemd version the issue has been seen with

253

Used distribution

n/a (irrelevant - should occur across all distros)

Linux kernel version used

n/a (irrelevant - should occur on any still-supported kernel version)

CPU architectures issue was seen on

x86_64

Component

systemd-coredump

Expected behaviour you didn't see

systemd-coredump being able to store cores larger than 2GiB in size when using journal storage

Unexpected behaviour you saw

systemd-coredump doesn't pass the core to the journal for storage, and the following message shows up in journalctl:

Mar 10 10:48:19 ip-10-0-0-10.ec2.internal systemd-coredump[27985]: Core data too short.

Steps to reproduce the problem

Configure the kernel core pattern to use systemd-coredump:

sysctl -w 'kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %p %u %g %s %t %e'

(the exact pattern may vary between versions, but the issue is present in the latest version of systemd-coredump)

Configure the coredump settings to handle large cores and set journal storage.
In coredump.conf:

[Coredump]
Storage=journal
ProcessSizeMax=16G
JournalSizeMax=16G

OPTIONAL: configure journal settings. This is not required, since the core never gets as far as the journal:
In journald.conf

[Journal]
Compress=yes
SystemMaxUse=50%

Start a process that uses more than 2GB of memory, and then generate a core from it (e.g. with kill -11 <pid> to send a SIGSEGV).

The core will be dumped to a temporary file in /var/lib/systemd/coredump.
In the journal handling section of src/coredump/coredump.c, this file is read into memory, but a single read() call transfers at most 0x7ffff000 (2,147,479,552) bytes (see the NOTES section of read(2)). So if the core is larger than this, it will never be read in full, and systemd-coredump will always bail out with "Core data too short."

Additional program output to the terminal or log subsystem illustrating the issue

No response

@mgreen89 mgreen89 added the bug 🐛 Programming errors, that need preferential fixing label Mar 10, 2023
@poettering poettering added this to the v254 milestone Mar 13, 2023
@keszybz
Member

keszybz commented May 17, 2023

In systemd 252, the default format for journal files was changed to use 32-bit sizes and offsets. The maximum file size is 4 GiB. I'm not sure exactly what the largest possible object size is, but it obviously must be lower than 4 GiB.

Does setting SYSTEMD_JOURNAL_COMPACT=0 in the journald environment fix the issue?

/cc @DaanDeMeyer

@mrc0mmand
Member

Reproducer:

#include <stdio.h>
#include <stdlib.h>

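int main(void) {
	/* Allocate 3 GiB so the process, and thus its core, is larger than 2 GiB. */
	char *memhog = malloc(1024UL * 1024UL * 3072UL);

	/* Deliberately dereference NULL to raise SIGSEGV and dump core. */
	printf("%c\n", *(char*)NULL);

	return 0;
}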

Relevant settings from /etc/systemd/coredump.conf

[Coredump]
Storage=journal
ProcessSizeMax=32G
JournalSizeMax=32G

With default settings (i.e. compact journal):

# gcc -o main main.c
# ./main 
Segmentation fault (core dumped)
# journalctl -o short-monotonic --no-hostname -e -t systemd-coredump
[ 1106.687520] systemd-coredump[1362]: Core data too short.
[ 1106.959637] systemd-coredump[1362]: Failed to attach the core to the journal entry: Input/output error
[ 1106.959793] systemd-coredump[1362]: [🡕] Process 1360 (main) of user 0 dumped core.
                                       
                                       Stack trace of thread 1360:
                                       #0  0x0000000000401154 n/a (/root/main + 0x1154)
                                       #1  0x00007efdd1a2808a __libc_start_call_main (libc.so.6 + 0x2808a)
                                       #2  0x00007efdd1a2814b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2814b)
                                       #3  0x0000000000401075 n/a (/root/main + 0x1075)
                                       ELF object binary architecture: AMD x86-64

With SYSTEMD_JOURNAL_COMPACT=0:

# cat >/etc/systemd/system/systemd-journald.service.d/override.conf <<EOF
[Service]
Environment=SYSTEMD_JOURNAL_COMPACT=0
EOF
# systemctl restart systemd-journald
# journalctl --rotate
# journalctl --header | grep -E "^(File path|Incompatible)"
File path: /run/log/journal/9bf9b3f580594531b16df234623574a6/system.journal
Incompatible flags: COMPRESSED-ZSTD KEYED-HASH COMPACT
File path: /var/log/journal/71e85f810d2143719a6f34d2e38220e9/system.journal
Incompatible flags: COMPRESSED-ZSTD KEYED-HASH
File path: /var/log/journal/71e85f810d2143719a6f34d2e38220e9/system@fd65b9c89a704d1690367d5a28713f2d-0000000000000001-0005fcf9fdf11e68.journal
Incompatible flags: COMPRESSED-ZSTD KEYED-HASH COMPACT
# ./main 
Segmentation fault (core dumped)
# journalctl -o short-monotonic --no-hostname -e -t systemd-coredump
[ 1616.073968] systemd-coredump[1551]: Core data too short.
[ 1616.430179] systemd-coredump[1551]: Failed to attach the core to the journal entry: Input/output error
[ 1616.430633] systemd-coredump[1551]: [🡕] Process 1549 (main) of user 0 dumped core.
                                       
                                       Stack trace of thread 1549:
                                       #0  0x0000000000401154 n/a (/root/main + 0x1154)
                                       #1  0x00007f4162c2808a __libc_start_call_main (libc.so.6 + 0x2808a)
                                       #2  0x00007f4162c2814b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2814b)
                                       #3  0x0000000000401075 n/a (/root/main + 0x1075)
                                       ELF object binary architecture: AMD x86-64

So, as @mgreen89 already explained, the main issue here is that this read() call in allocate_journal_field():

n = read(fd, field + 9, size);

reads at most 0x7ffff000 (2,147,479,552) bytes. That's not our limitation but a limitation of the read() syscall (and similar ones) on both 32-bit and 64-bit platforms (see man 2 read, the NOTES section).

To mitigate this, we'd have to read the coredump in chunks - I guess using loop_read() from io-util.h might help us here?

@mrc0mmand
Member

mrc0mmand commented May 31, 2023

Also, while loop_read() does help and the short read is gone, there's still a hardcoded journal entry limit (ENTRY_SIZE_MAX) that we'll hit instead:

if (st.st_size > ENTRY_SIZE_MAX / (sealed ? 1 : 2)) {
        log_ratelimit_error(JOURNAL_LOG_RATELIMIT,
                            "File passed too large (%"PRIu64" bytes). Ignoring.",
                            (uint64_t) st.st_size);
        return;
}

I.e., with this patch:

diff --git a/src/coredump/coredump.c b/src/coredump/coredump.c
index a6b0d96488..64d3239037 100644
--- a/src/coredump/coredump.c
+++ b/src/coredump/coredump.c
@@ -614,7 +614,7 @@ static int allocate_journal_field(int fd, size_t size, char **ret, size_t *ret_s
 
         memcpy(field, "COREDUMP=", 9);
 
-        n = read(fd, field + 9, size);
+        n = loop_read(fd, field + 9, size, false);
         if (n < 0)
                 return log_error_errno((int) n, "Failed to read core data: %m");
         if ((size_t) n < size)

# ./main
Segmentation fault (core dumped)
# journalctl -e --no-hostname -o short-monotonic
...
[  964.670714] kernel: main[12721]: segfault at 0 ip 0000000000401154 sp 00007ffcf3210ee0 error 4 in main[401000+1000] likely on CPU 6 (core 0, socket 6)
[  964.670727] kernel: Code: 00 00 0f 1f 40 00 f3 0f 1e fa eb 8a 55 48 89 e5 48 83 ec 10 b8 00 00 00 c0 48 89 c7 e8 f5 fe ff ff 48 89 45 f8 b8 00 00 00 00 <>
[  964.677575] systemd[1]: Started systemd-coredump@5-12722-0.service - Process Core Dump (PID 12722/UID 0).
[  973.631314] systemd-journald[12700]: File passed too large (2147479552 bytes). Ignoring.
[  974.073500] systemd[1]: systemd-coredump@5-12722-0.service: Deactivated successfully.
[  974.081679] systemd[1]: systemd-coredump@5-12722-0.service: Consumed 9.218s CPU time.

@mgreen89
Author

Thanks for looking into this.

I was initially attracted to journal storage mostly because it seemed to have a slightly more resilient story for core rotation. I've since switched to just using the MaxUse and KeepFree options built into systemd-coredump.

I think there's probably a hole here, but if fixing it opens up further issues, it might just be a documentation issue for now.

poettering added a commit to poettering/systemd that referenced this issue Jun 22, 2023
The man page claimed the default was 10M, but that's not true, it's
767M.

Also mention there's no point in increasing it further.

See: systemd#26748
@poettering
Member

Fix in #28127

@poettering poettering linked a pull request Jun 22, 2023 that will close this issue
poettering added a commit to poettering/systemd that referenced this issue Jun 23, 2023
valentindavid pushed a commit to valentindavid/systemd that referenced this issue Aug 8, 2023
Fixes: systemd#26748
(cherry picked from commit a73c74d)
(cherry picked from commit fa0ef8e)
(cherry picked from commit 540a490)
nmeyerhans pushed a commit to nmeyerhans/systemd that referenced this issue Jan 21, 2024
Fixes: systemd#26748
(cherry picked from commit a73c74d)
(cherry picked from commit fa0ef8e)
yuwata pushed a commit to yuwata/systemd that referenced this issue Apr 26, 2024

4 participants