Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGABRT/SIGSEGV crash for 8.1903.0 possibly related to journald reloading #3537

Closed
onyxmaster opened this issue Mar 7, 2019 · 7 comments
Closed
Labels
Milestone

Comments

@onyxmaster
Copy link

onyxmaster commented Mar 7, 2019

Expected behavior

Should not crash when imjournal reloads.

Actual behavior

This is the output of journald logs:

Mar 07 16:04:59.551935 ara1 rsyslogd[45249]: imjournal: sd_journal_get_cursor() failed: Cannot assign requested address [v8.1903.0]
Mar 07 16:04:59.560719 ara1 rsyslogd[45249]: imjournal: journal reloaded... [v8.1903.0 try https://www.rsyslog.com/e/0 ]
Mar 07 16:04:59.564719 ara1 kernel: traps: rs:action-0-bui[45253] general protection ip:7fcbcd0fd828 sp:7fcbc48fe6e0 error:0 in libfastjson.so.4.2.0[7fcbcd0fb000+a000]
Mar 07 16:04:59.569003 ara1 systemd[1]: rsyslog.service: Main process exited, code=killed, status=11/SEGV
Mar 07 16:04:59.569505 ara1 systemd[1]: rsyslog.service: Unit entered failed state.
Mar 07 16:04:59.569594 ara1 systemd[1]: rsyslog.service: Failed with result 'signal'.
Mar 07 16:10:50.914214 ara1 rsyslogd[50863]: imjournal: sd_journal_get_cursor() failed: Cannot assign requested address [v8.1903.0]
Mar 07 16:10:50.921102 ara1 sshd[62308]: pam_unix(sshd:session): session closed for user hg
Mar 07 16:10:50.925060 ara1 systemd[1]: rsyslog.service: Main process exited, code=killed, status=6/ABRT
Mar 07 16:10:50.925440 ara1 systemd[1]: rsyslog.service: Unit entered failed state.
Mar 07 16:10:50.925479 ara1 systemd[1]: rsyslog.service: Failed with result 'signal'.

Every time the sd_journal_get_cursor() fails and imjournal reloads journal the rsyslog process dies with either SIGABRT or SIGSEGV.
The 8.1901.0 did not exhibit this issue:

Mar 06 00:18:33.990406 antares rsyslogd[21993]: imjournal: sd_journal_get_cursor() failed: Cannot assign requested address [v8.1901.
Mar 06 00:18:34.008503 antares rsyslogd[21993]: imjournal: journal reloaded... [v8.1901.0 try https://www.rsyslog.com/e/0 ]
... (no errors here)

Steps to reproduce the behavior

Install the 8.1903.0 and enable imjournal along with (probably, see trap above) enable JSON output (see configs below), then just wait till journald cursor "wraps" over.

The apt.log from the upgraded system:

Start-Date: 2019-03-06  09:00:14
Commandline: /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install rsyslog
Upgrade: rsyslog:amd64 (8.1901.0-0adiscon2xenial1, 8.1903.0-0adiscon1xenial1)
End-Date: 2019-03-06  09:00:18

Environment

  • rsyslog version: 8.1903.0 x86_64-pc-linux-gnu
  • platform: Ubuntu 16.04.6 LTS

These are the customized parts:
/etc/rsyslog.d/00-external.conf:

template(name="ForwardFormat" type="string" string="<%%PRI%>%TIMESTAMP:::date-rfc3339% %HOSTNAME% %jsonmesg%")
action(type="omfwd" template="ForwardFormat"
    target="..." port="12514" protocol="tcp" tcp_framing="octet-counted"
    keepalive="on" resendlastmsgonreconnect="on"
    queue.type="LinkedList" queue.filename="q_external" queue.size="10000" queue.maxdiskspace="100m" queue.saveonshutdown="on"
    action.resumeRetryCount="-1" action.reportSuspension="on" action.reportSuspensionContinuation="on" action.resumeInterval="10")

/etc/rsyslog.d/10-journald.conf:

module(load="imjournal" StateFile="imjournal.state" IgnorePreviousMessages="on" PersistStateInterval="100")

The Ubuntu enabled-by-default imklog module is disabled, and the $MaxMessageSize is set to 64k (but there are no records of such length in the journald journal).

@rgerhards
Copy link
Member

@jvymazal could you have a look? This is the only commit since 8.1901.0:

920c28f#diff-b1ea6478b8060f07cd30ecde78bfdc49

Of course, it could also be a side-effect of some other change - among others different jounal libs on the distro.

@jvymazal
Copy link
Contributor

jvymazal commented Mar 7, 2019

I will have a look, and try to verify once more the current free calls actually match allocations within journal (that is if the journal calls do the malloc()s per documented state, if they do not then I am afraid we are out of luck without help from someone from systemd people)

@onyxmaster
Copy link
Author

My Linux debugging skills are bit, well, flaky and I'm (yet) too lazy to build rsyslogd from source with debug info, but please take a look at this:

# valgrind rsyslogd -n
==110705== Memcheck, a memory error detector
==110705== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==110705== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==110705== Command: rsyslogd -n
==110705==
rsyslogd: Could not open output pipe '/dev/xconsole':: No such file or directory [v8.1903.0 try https://www.rsyslog.com/e/2039 ]
rsyslogd: imjournal: sd_journal_get_cursor() failed: Cannot assign requested address [v8.1903.0]
==110705== Thread 3 in:imjournal:
==110705== Invalid free() / delete / delete[] / realloc()
==110705==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==110705==    by 0x847FE1B: ??? (in /usr/lib/rsyslog/imjournal.so)
==110705==    by 0x84804A4: ??? (in /usr/lib/rsyslog/imjournal.so)
==110705==    by 0x45F795: ??? (in /usr/sbin/rsyslogd)
==110705==    by 0x505B6B9: start_thread (pthread_create.c:333)
==110705==    by 0x5D9841C: clone (clone.S:109)
==110705==  Address 0xf434fb0 is 0 bytes inside a block of size 127 free'd
==110705==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==110705==    by 0x847FE1B: ??? (in /usr/lib/rsyslog/imjournal.so)
==110705==    by 0x847FF52: ??? (in /usr/lib/rsyslog/imjournal.so)
==110705==    by 0x848053C: ??? (in /usr/lib/rsyslog/imjournal.so)
==110705==    by 0x45F795: ??? (in /usr/sbin/rsyslogd)
==110705==    by 0x505B6B9: start_thread (pthread_create.c:333)
==110705==    by 0x5D9841C: clone (clone.S:109)
==110705==  Block was alloc'd at
==110705==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==110705==    by 0x5DA9A17: __vasprintf_chk (vasprintf_chk.c:80)
==110705==    by 0x5DA98F1: __asprintf_chk (asprintf_chk.c:32)
==110705==    by 0x4051F85: sd_journal_get_cursor (in /lib/x86_64-linux-gnu/libsystemd.so.0.14.0)
==110705==    by 0x847FE2E: ??? (in /usr/lib/rsyslog/imjournal.so)
==110705==    by 0x84804A4: ??? (in /usr/lib/rsyslog/imjournal.so)
==110705==    by 0x45F795: ??? (in /usr/sbin/rsyslogd)
==110705==    by 0x505B6B9: start_thread (pthread_create.c:333)
==110705==    by 0x5D9841C: clone (clone.S:109)

P.S. Thanks for making rsyslogd. It's ability to easily ship everything everywhere without installing any extra software it's truly indispensable for us.

@onyxmaster
Copy link
Author

onyxmaster commented Mar 7, 2019

I took a look at the diff mentioned here, and line 474 looks particularly suspicious. The last_cursor is a global, which is free-d, then potentially assigned by sd_journal_get_cursor on the next line. It looks to me like if the sd_journal_get_cursor fails, the value of last_cursor is retained, so the next time the code gets here, it will be a double-free attempt since the local finalize_it block doesn't contain the last_cursor cleanup; not that I'm implying it should, I think resetting last_cursor to NULL should be done locally right before (or after, if logging the error is of utmost importance to you) calling the LogError.

I may be completely wrong =)

@jvymazal
Copy link
Contributor

You are definitely NOT completely wrong =) Actually you are very much correct, thank you for saving me some time with hunting this down, PR is already open so hopefully this will be resolved soon.

@rgerhards
Copy link
Member

closed via #3540

@rgerhards rgerhards added this to the v8.1904 milestone Mar 18, 2019
@lock
Copy link

lock bot commented Dec 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Dec 24, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants