logreader: stop processing if the connection is closed #3985

sparlane · 2022-04-14T01:26:55Z

If the connection is closed and log_reader_work_finished was running,
calling log_pipe_notify will result in the LogReader object being no longer
usable because it was cleaned up.

So to prevent SEGV, don't allow execution to continue into the next block
when the connection is marked as NC_CLOSE

It seems more likely for this to occur when the log reader is a socket
that was opened by "logger" from busybox.

We started experiencing this after updating from glibc 2.29 to 2.35
and gcc from 8.3 to 11.2.

Signed-off-by: Scott Parlane scott.parlane@alliedtelesis.co.nz

If the connection is closed and log_reader_work_finished was running, calling log_pipe_notify will result in the LogReader object being no longer usable because it was cleaned up. So to prevent SEGV, don't allow execution to continue into the next block when the connection is marked as NC_CLOSE It seems more likely for this to occur when the log reader is a socket that was opened by "logger" from busybox. We started experiencing this after updating from glibc 2.29 to 2.35 and gcc from 8.3 to 11.2. Signed-off-by: Scott Parlane <scott.parlane@alliedtelesis.co.nz>

kira-syslogng · 2022-04-14T01:26:58Z

This user does not have permission to start the build. Can one of the admins verify this patch and start the build?
(admin: you have the next options (make sure you checked the code):
"ok to test" to accept this pull request (and further changes) for testing
"test this please" for a one time test run
"add to whitelist" add author of a Pull Request to whitelist (globally, be careful, it means this user can trigger kira for any PR)
do nothing -> CI won't start)

kira-syslogng · 2022-04-14T01:26:58Z

This user does not have permission to start the build. Can one of the admins verify this patch and start the build?
(admin: you have the next options (make sure you checked the code):
"ok to test" to accept this pull request (and further changes) for testing
"test this please" for a one time test run
"add to whitelist" add author of a Pull Request to whitelist (globally, be careful, it means this user can trigger kira for any PR)
do nothing -> CI won't start)

bazsi · 2022-04-14T04:23:37Z

@kira-syslogng ok to test;

kira-syslogng · 2022-04-14T04:28:01Z

Build FAILURE

bazsi · 2022-04-14T04:29:58Z

Can you show a backtrace that this fixes? The "self" pointer at that point should still be referenced, so at the very least the pointer should still be valid.

The NC_CLOSE handler in afsocket-source.c will deinit the pipe (afsocket_sd_close_connection) prior to returning, which means that the condition right after the log_pipe_notify() call (in log_reader_work_finished) should not be entered. So I think your change should not actually change behaviour.

sparlane · 2022-04-20T01:00:47Z

From syslog-ng 3.21.1:

Thread 1 (Thread 0x77fa5020 (LWP 405)):
#0 0x0ff59618 in log_proto_server_reset_error (s=0x0) at ./lib/logproto/logproto-server.h:164
#1 log_reader_work_finished (s=0x10053800) at lib/logreader.c:426
#2 0x0fc75b98 in iv_fd_poll_and_run (st=st@entry=0x100270a0, abs=) at iv_fd.c:219
#3 0x0fc770b4 in iv_main () at iv_main_posix.c:112
#4 0x0ff5e060 in main_loop_run (self=0xfff629c <main_loop>) at lib/mainloop.c:657
#5 0x10001264 in main (argc=, argv=) at syslog-ng/main.c:316

I added debug print statements to the start of every function in logreader.c, and most of the branches
pending_close wasn't set, but notify_code was set to NC_CLOSE, and it called the close/free functions while it was processing the message.

Yes, I know 3.21.1 is old, but that is what we currently use, I did test with the latest release and I get an identical backtrace (with different line numbers)

There are some limitations on where we see this:

if init is upstart we see this, if it's systemd we don't (I suspect because of the way that systemd interferes with syslog)
we don't see this on our x86_64 target, but do see it on at least armv7 and ppc
I think it's triggered specifically by the use of logger, which starts opens the the socket to syslog, writes and message and then closes the socket.
I can get this to happen often if I replace our syslog-ng-ctl reload/reopen calls with restarting the syslog-ng job, I suspect this is because syslog-ng needs to be busy to have both the close and a message at the same time.

bazsi · 2022-04-20T06:08:09Z

Thanks for the dump. I am looking into this with v3.21 sources.

…

On Wed, Apr 20, 2022, 03:01 Scott Parlane ***@***.***> wrote: From syslog-ng 3.21.1: Thread 1 (Thread 0x77fa5020 (LWP 405)): #0 0x0ff59618 in log_proto_server_reset_error (s=0x0) at ./lib/logproto/logproto-server.h:164 #1 <#1> log_reader_work_finished (s=0x10053800) at lib/logreader.c:426 #2 <#2> 0x0fc75b98 in iv_fd_poll_and_run ***@***.***=0x100270a0, abs=) at iv_fd.c:219 #3 <#3> 0x0fc770b4 in iv_main () at iv_main_posix.c:112 #4 <#4> 0x0ff5e060 in main_loop_run (self=0xfff629c <main_loop>) at lib/mainloop.c:657 #5 <#5> 0x10001264 in main (argc=, argv=) at syslog-ng/main.c:316 I added debug print statements to the start of every function in logreader.c, and most of the branches pending_close wasn't set, but notify_code was set to NC_CLOSE, and it called the close/free functions while it was processing the message. Yes, I know 3.21.1 is old, but that is what we currently use, I did test with the latest release and I get an identical backtrace (with different line numbers) There are some limitations on where we see this: - if init is upstart we see this, if it's systemd we don't (I suspect because of the way that systemd interferes with syslog) - we don't see this on our x86_64 target, but do see it on at least armv7 and ppc - I think it's triggered specifically by the use of logger, which starts opens the the socket to syslog, writes and message and then closes the socket. - I can get this to happen often if I replace our syslog-ng-ctl reload/reopen calls with restarting the syslog-ng job, I suspect this is because syslog-ng needs to be busy to have both the close and a message at the same time. — Reply to this email directly, view it on GitHub <#3985 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAFOK5X3X5MOBCRHUA6TYITVF5JMVANCNFSM5TMKCQIQ> . You are receiving this because you commented.Message ID: ***@***.***>

bazsi · 2022-04-27T18:18:18Z

Sorry, this scrolled out of my attention span in my INBOX. I've glimpsed at the backtrace you sent and it's genuinely useful. Thanks for that. I need to get going again :( but will keep you posted as I look into it again. It might actually be related to CPU architectures, self->proto is manipulated in parallel threads and in that sense, ARM and ppc might be more sensitive than x86.

…

On Wed, Apr 20, 2022 at 3:01 AM Scott Parlane ***@***.***> wrote: From syslog-ng 3.21.1: Thread 1 (Thread 0x77fa5020 (LWP 405)): #0 0x0ff59618 in log_proto_server_reset_error (s=0x0) at ./lib/logproto/logproto-server.h:164 #1 <#1> log_reader_work_finished (s=0x10053800) at lib/logreader.c:426 #2 <#2> 0x0fc75b98 in iv_fd_poll_and_run ***@***.***=0x100270a0, abs=) at iv_fd.c:219 #3 <#3> 0x0fc770b4 in iv_main () at iv_main_posix.c:112 #4 <#4> 0x0ff5e060 in main_loop_run (self=0xfff629c <main_loop>) at lib/mainloop.c:657 #5 <#5> 0x10001264 in main (argc=, argv=) at syslog-ng/main.c:316 I added debug print statements to the start of every function in logreader.c, and most of the branches pending_close wasn't set, but notify_code was set to NC_CLOSE, and it called the close/free functions while it was processing the message. Yes, I know 3.21.1 is old, but that is what we currently use, I did test with the latest release and I get an identical backtrace (with different line numbers) There are some limitations on where we see this: - if init is upstart we see this, if it's systemd we don't (I suspect because of the way that systemd interferes with syslog) - we don't see this on our x86_64 target, but do see it on at least armv7 and ppc - I think it's triggered specifically by the use of logger, which starts opens the the socket to syslog, writes and message and then closes the socket. - I can get this to happen often if I replace our syslog-ng-ctl reload/reopen calls with restarting the syslog-ng job, I suspect this is because syslog-ng needs to be busy to have both the close and a message at the same time. — Reply to this email directly, view it on GitHub <#3985 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAFOK5X3X5MOBCRHUA6TYITVF5JMVANCNFSM5TMKCQIQ> . You are receiving this because you commented.Message ID: ***@***.***>

-- Bazsi

bazsi · 2022-04-28T20:43:24Z

I was delving into the sources and have so fair failed to find a scenario which leads to this crash. Certainly I am missing something. If you still have a core file on the crash, can you please run these gdb commands and record their results:

(gdb) bt full
(gdb) frame 1
(gdb) p *s

These are things I've checked:

the reference to the self pointer should be ensured by the reference taken by MainLoopIOJob. reader->io_job.engage takes this reference as an I/O event is submitted to the I/O work pool and is not released until log_reader_work_finished() returns. The crash happens within log_reader_work_finished() at this reference should still be intact.
if the self pointer is still valid, the question is how self->proto can be NULL at the point of the crash.
self->proto is initialized to NULL as a LogReader is created, but is set via log_reader_open() right after the constructor call. so there's no LogReader with self->proto == NULL unless we are in the process of closing it.
The process of closing happens when we read EOF on the socket. When we encounter an EOF, self->notify_code is set to NC_CLOSE and log_reader_work_finished() is invoked in the main thread (this function is the completion callback of the READ event the we finish handling).
As we enter log_reader_work_finished(), we check if self->pending_close is set. It should never be set in the afsocket case, as it is used by the file driver for cases where we close a file in the non-main thread.
The next thing in log_reader_work_finished() is to call log_pipe_notify() that calls the afsocket_sc_notify() virtual function to inform the socket connection that the associated connection is closing.
- log_pipe_notify() calls afsocket_sd_close_connection(), which calls log_reader_close_proto(). log_reader_close_proto() will set self->proto to NULL (e.g. the crash trigger), but also calls log_pipe_deinit() on the source connection.
- The deinit method for AFSocketSourceConnection (afsocket_sc_deinit) drops the owner reference AND calls log_pipe_deinit() on the LogReader.
- The deinit method of the LogReader (log_reader_deinit), will clear the PIF_INITIALIZED flag in LogReader->flags
After returning from log_pipe_notify(), we check the PIF_INITIALIZED flag and if it's cleared we don't do anything. If it's not we enter the crash location, and if self->proto is NULL, we indeed crash.

In summary the PIF_INITIALIZED flag in LogReader->flags prevents the crash, it is cleared in the same process that sets proto to NULL.

The only way this could happen, if log_reader_work_finished() would reenter a second time, maybe as an invocation from log_pipe_notify() but work_finished is only called as a completion callback (either by ivykis in threaded mode, or ourselves if threading is disabled). The backtrace shows, that work_finished is called by ivykis, so threading indeed seems to be enabled.

Also NOTE that the entire list above is executed in series by the main thread. The asynchronous closing that happens in wildcard-file() is not usedin the case of afsocket.

With the extra information that you can hopefully send me, could pinpoint where I am not right in the analysis above.

sparlane · 2022-04-29T00:51:59Z

Hi,

I had to regenerate it, my exact code is here:
https://github.com/sparlane/syslog-ng/tree/syslog_3_21_debug

Core was generated by `/usr/sbin/syslog-ng -F -f /etc/syslog-ng.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0ff598bc in log_proto_server_reset_error (s=0x0) at ./lib/logproto/logproto-server.h:164
164	./lib/logproto/logproto-server.h: No such file or directory.
(gdb) bt full
#0  0x0ff598bc in log_proto_server_reset_error (s=0x0) at ./lib/logproto/logproto-server.h:164
No locals.
#1  log_reader_work_finished (s=0x10055710) at lib/logreader.c:440
        self = 0x10055710
#2  0x0fc75b98 in iv_fd_poll_and_run (st=st@entry=0x10028250, abs=<optimized out>) at iv_fd.c:219
        fd = 0x10055898
        active = {next = 0x7fd07de8, prev = 0x7fd07de8}
        run_timers = <optimized out>
#3  0x0fc770b4 in iv_main () at iv_main_posix.c:112
        _abs = {tv_sec = 0, tv_nsec = 0}
        abs = <optimized out>
        st = 0x10028250
        run_timers = <optimized out>
#4  0x0ff5e338 in main_loop_run (self=0xfff62c4 <main_loop>) at lib/mainloop.c:657
No locals.
#5  0x10001308 in main (argc=<optimized out>, argv=<optimized out>) at syslog-ng/main.c:322
        rc = 0
        ctx = <optimized out>
        error = 0x0
        main_loop = 0xfff62c4 <main_loop>
        exit_before_main_loop_run = <optimized out>
(gdb) frame 1
#1  log_reader_work_finished (s=0x10055710) at lib/logreader.c:440
440	lib/logreader.c: No such file or directory.
(gdb) p *s
Attempt to dereference a generic pointer.
(gdb) p *self
$2 = {super = {super = {ref_cnt = {counter = 268771413}, flags = 1726283135, queue = 0xff5a814 <log_source_queue>, cfg = 0x100441f0, expr_node = 0x0, pipe_next = 0x10055690, discarded_messages = 0x0, persist_name = 0x0, 
      plugin_name = 0x0, pre_init = 0x0, init = 0xff596f0 <log_reader_init>, deinit = 0xff59010 <log_reader_deinit>, post_deinit = 0x0, generate_persist_name = 0x0, clone = 0x0, free_fn = 0xff58e3c <log_reader_free>, notify = 0x0}, 
    options = 0x100676ec, threaded = 0, pos_tracked = 0, stats_id = 0x10054cd0 "\020\004X\264f\344\375\177\017\365\236", <incomplete sequence \374>, stats_instance = 0x10055bb0 "\020\004V\245f\344\375\177eam,anonymous", window_size = {
      counter = {value = 2147483647}}, last_message_seen = 0x0, recvd_messages = 0x0, last_ack_count = 0, ack_count = 0, window_full_sleep_nsec = 0, last_ack_rate_time = {tv_sec = 0, tv_nsec = 0}, ack_tracker = 0x10056e00, 
    wakeup = 0xff58dc4 <log_reader_wakeup>, window_empty_cb = 0xff58dd8 <log_reader_window_empty>}, proto = 0x0, immediate_check = 0, control = 0x10055690, options = 0x100676e8, poll_events = 0x0, peer_addr = 0x1004b908, restart_task = {
    cookie = 0x10055710, handler = 0xff598e0 <log_reader_io_handle_in>, pad = {0x100557b4, 0x100557b4, 0x23, 0x0, 0x0, 0x0}}, schedule_wakeup = {cookie = 0x10055710, handler = 0xff597b0 <log_reader_wakeup_triggered>, owner = 0x10028250, 
    list = {next = 0x100557d8, prev = 0x100557d8}}, last_msg_sent_event = {cookie = 0x10055710, handler = 0xff58d0c <_last_msg_sent>, owner = 0x10028250, list = {next = 0x100557ec, prev = 0x100557ec}}, io_job = {
    engage = 0xff57710 <log_pipe_ref>, work = 0xff59344 <log_reader_work_perform>, completion = 0xff597d0 <log_reader_work_finished>, release = 0xff57790 <log_pipe_unref>, user_data = 0x10055710, working = 0, cond = 0, work_item = {
      cookie = 0x100557f4, work = 0xff5f6b0 <_work>, completion = 0xff5f64c <_complete>, list = {next = 0x0, prev = 0x0}}}, watches_running = 0, suspended = 0, notify_code = 0, pending_close = 0, pending_close_cond = 0x10069180, 
  pending_close_lock = {mutex = 0x0, unused = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 0, __nusers = 0, {__elision_data = {__espins = 0, __elision = 0}, __list = {__next = 0x0}}}, __size = '\000' <repeats 23 times>, 
      __align = 0}}, idle_timer = {expires = {tv_sec = 0, tv_nsec = 0}, cookie = 0x10055710, handler = 0xff58ed4 <log_reader_idle_timeout>, pad = {0x0, 0x0, 0xffffffff, 0x0}}}

The output from my additional debug for this logreader is:

Fri Apr 29 00:41:51 2022
 0x10055710: Created log reader
Fri Apr 29 00:41:51 2022
 0x10055710: log_reader_open
Fri Apr 29 00:41:51 2022
 0x10055710: log_reader_apply_proto_and_poll_events: start
Fri Apr 29 00:41:51 2022
 0x10055710: log_reader_apply_proto_and_poll_events: self->proto is set
Fri Apr 29 00:41:51 2022
 0x10055710: log_reader_init
Fri Apr 29 00:41:51 2022
 0x10055710: log_reader_work_finished
Fri Apr 29 00:41:51 2022
 0x10055710: notify_code is set
Fri Apr 29 00:41:51 2022
 (nil): afsocket_sd_close_connection
Fri Apr 29 00:41:51 2022
 (nil): UNIX
Fri Apr 29 00:41:51 2022
 0x10055710: log_reader_close_proto
Fri Apr 29 00:41:51 2022
 0x10055710: log_reader_close_proto_deferred
Fri Apr 29 00:41:51 2022
 0x10055710: log_reader_apply_proto_and_poll_events: start
Fri Apr 29 00:41:51 2022
 0x10055710: self->proto set to NULL @ log_reader_apply_proto_and_poll_events
Fri Apr 29 00:41:51 2022
 0x10055710: log_reader_deinit
Fri Apr 29 00:41:51 2022
 0x10055710: log_reader_free
Fri Apr 29 00:41:51 2022
 0x10055710: PIF_INITIALIZED

sparlane · 2022-04-29T01:13:54Z

For comparison, I used the official syslog-ng-3.36.1.tar.gz in place of our syslog-ng, and I see the same crash

Core was generated by `/usr/sbin/syslog-ng -F -f /etc/syslog-ng.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0ff5093c in log_proto_server_reset_error (s=0x0) at ./lib/logproto/logproto-server.h:164
164	./lib/logproto/logproto-server.h: No such file or directory.
[Current thread is 1 (Thread 0x77fdb020 (LWP 383))]
(gdb) bt full
#0  0x0ff5093c in log_proto_server_reset_error (s=0x0) at ./lib/logproto/logproto-server.h:164
No locals.
#1  log_reader_work_finished (s=0x100bde80) at lib/logreader.c:402
        self = 0x100bde80
#2  0x0fc65b98 in iv_fd_poll_and_run (st=st@entry=0x100270f0, abs=<optimized out>) at iv_fd.c:219
        fd = 0x100bdb9c
        active = {next = 0x7fd9b208, prev = 0x7fd9b208}
        run_timers = <optimized out>
#3  0x0fc670b4 in iv_main () at iv_main_posix.c:112
        _abs = {tv_sec = 0, tv_nsec = 0}
        abs = <optimized out>
        st = 0x100270f0
        run_timers = <optimized out>
#4  0x0ff560e0 in main_loop_run (self=0xfff69f8 <main_loop>) at lib/mainloop.c:668
No locals.
#5  0x10001270 in main (argc=<optimized out>, argv=<optimized out>) at syslog-ng/main.c:311
        rc = 0
        ctx = <optimized out>
        error = 0x0
        main_loop = 0xfff69f8 <main_loop>
        exit_before_main_loop_run = 0
(gdb) frame 1
#1  log_reader_work_finished (s=0x100bde80) at lib/logreader.c:402
402	lib/logreader.c: No such file or directory.
(gdb) p *s
Attempt to dereference a generic pointer.
(gdb) p *self
$1 = {super = {super = {ref_cnt = {counter = 65725}, flags = -1730795671, queue = 0xff522d4 <log_source_queue>, cfg = 0x10045170, expr_node = 0x0, pipe_next = 0x100bb480, discarded_messages = 0x0, persist_name = 0x0, plugin_name = 0x0, 
      signal_slot_connector = 0x100bb590, pre_init = 0x0, init = 0xff506bc <log_reader_init>, deinit = 0xff502d8 <log_reader_deinit>, post_deinit = 0x0, on_config_inited = 0x0, generate_persist_name = 0x0, arcs = 0xff4e098 <_arcs>, 
      clone = 0x0, free_fn = 0xff4fde8 <log_reader_free>, notify = 0x0, info = 0x0}, options = 0x10069ccc, threaded = 0, name = 0x100bdbe0 "\020\n\265ݘ\326'ieam,AF_UNIX(anonymous)", 
    stats_id = 0x100bb390 "\020\n\350;\230\326'i\020\006", <incomplete sequence \340>, stats_instance = 0x100bbea0 "\020\n뻘\326'ieam,anonymous", window_size = {counter = {value = 100}}, dynamic_window = {pool = 0x0, stat = {n = 0, 
        sum = 0}}, window_initialized = 1, initial_window_size = 100, full_window_size = 100, window_size_to_be_reclaimed = {value = 0}, pending_reclaimed = {value = 0}, stat_window_size = 0x0, stat_full_window = 0x0, 
    last_message_seen = 0x0, recvd_messages = 0x0, stat_window_size_cluster = 0x0, stat_full_window_cluster = 0x0, last_ack_count = 0, ack_count = 0, window_full_sleep_nsec = 0, last_ack_rate_time = {tv_sec = 0, tv_nsec = 0}, 
    ack_tracker_factory = 0x10060da0, ack_tracker = 0x0, wakeup = 0xff4fd48 <log_reader_wakeup>, schedule_dynamic_window_realloc = 0xff4fd60 <_schedule_dynamic_window_realloc>}, proto = 0x0, immediate_check = 0, control = 0x100bb480, 
  options = 0x10069cc8, poll_events = 0x0, peer_addr = 0x10062160, local_addr = 0x100621e0, max_message_size = 0x0, average_messages_size = 0x0, CPS = 0x0, restart_task = {cookie = 0x100bde80, 
    handler = 0xff5095c <log_reader_io_handle_in>, pad = {0x100bdf88, 0x100bdf88, 0x21, 0x0, 0x0, 0x0}}, schedule_wakeup = {cookie = 0x100bde80, handler = 0xff50860 <log_reader_wakeup_triggered>, owner = 0x100270f0, list = {
      next = 0x100bdfac, prev = 0x100bdfac}}, io_job = {engage = 0xff4e344 <log_pipe_ref>, work = 0xff50218 <log_reader_work_perform>, completion = 0xff5087c <log_reader_work_finished>, release = 0xff4e3c4 <log_pipe_unref>, 
    user_data = 0x100bde80, working = 0, cond = 0, work_item = {cookie = 0x100bdfb4, work = 0xff575ec <_work>, completion = 0xff57588 <_complete>, list = {next = 0x0, prev = 0x0}}}, watches_running = 0, suspended = 0, 
  realloc_window_after_fetch = 0, notify_code = 0, pending_close = 0, pending_close_cond = {p = 0x0, i = {0, 0}}, pending_close_lock = {p = 0x0, i = {0, 0}}, idle_timer = {expires = {tv_sec = 0, tv_nsec = 0}, cookie = 0x100bde80, 
    handler = 0xff509f8 <log_reader_idle_timeout>, pad = {0x0, 0x0, 0xffffffff, 0x0}}}

bazsi · 2022-04-29T04:40:45Z

The LogReader instance seems to have been freed, so this is a use-after-free. Thanks for the details. Going to look into it.

…

On Fri, Apr 29, 2022, 03:14 Scott Parlane ***@***.***> wrote: For comparison, I used the official syslog-ng-3.36.1.tar.gz in place of our syslog-ng, and I see the same crash Core was generated by `/usr/sbin/syslog-ng -F -f /etc/syslog-ng.conf'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0ff5093c in log_proto_server_reset_error (s=0x0) at ./lib/logproto/logproto-server.h:164 164 ./lib/logproto/logproto-server.h: No such file or directory. [Current thread is 1 (Thread 0x77fdb020 (LWP 383))] (gdb) bt full #0 0x0ff5093c in log_proto_server_reset_error (s=0x0) at ./lib/logproto/logproto-server.h:164 No locals. #1 log_reader_work_finished (s=0x100bde80) at lib/logreader.c:402 self = 0x100bde80 #2 0x0fc65b98 in iv_fd_poll_and_run ***@***.***=0x100270f0, abs=<optimized out>) at iv_fd.c:219 fd = 0x100bdb9c active = {next = 0x7fd9b208, prev = 0x7fd9b208} run_timers = <optimized out> #3 0x0fc670b4 in iv_main () at iv_main_posix.c:112 _abs = {tv_sec = 0, tv_nsec = 0} abs = <optimized out> st = 0x100270f0 run_timers = <optimized out> #4 0x0ff560e0 in main_loop_run (self=0xfff69f8 <main_loop>) at lib/mainloop.c:668 No locals. #5 0x10001270 in main (argc=<optimized out>, argv=<optimized out>) at syslog-ng/main.c:311 rc = 0 ctx = <optimized out> error = 0x0 main_loop = 0xfff69f8 <main_loop> exit_before_main_loop_run = 0 (gdb) frame 1 #1 log_reader_work_finished (s=0x100bde80) at lib/logreader.c:402 402 lib/logreader.c: No such file or directory. (gdb) p *s Attempt to dereference a generic pointer. (gdb) p *self $1 = {super = {super = {ref_cnt = {counter = 65725}, flags = -1730795671, queue = 0xff522d4 <log_source_queue>, cfg = 0x10045170, expr_node = 0x0, pipe_next = 0x100bb480, discarded_messages = 0x0, persist_name = 0x0, plugin_name = 0x0, signal_slot_connector = 0x100bb590, pre_init = 0x0, init = 0xff506bc <log_reader_init>, deinit = 0xff502d8 <log_reader_deinit>, post_deinit = 0x0, on_config_inited = 0x0, generate_persist_name = 0x0, arcs = 0xff4e098 <_arcs>, clone = 0x0, free_fn = 0xff4fde8 <log_reader_free>, notify = 0x0, info = 0x0}, options = 0x10069ccc, threaded = 0, name = 0x100bdbe0 "\020\n\265ݘ\326'ieam,AF_UNIX(anonymous)", stats_id = 0x100bb390 "\020\n\350;\230\326'i\020\006", <incomplete sequence \340>, stats_instance = 0x100bbea0 "\020\n뻘\326'ieam,anonymous", window_size = {counter = {value = 100}}, dynamic_window = {pool = 0x0, stat = {n = 0, sum = 0}}, window_initialized = 1, initial_window_size = 100, full_window_size = 100, window_size_to_be_reclaimed = {value = 0}, pending_reclaimed = {value = 0}, stat_window_size = 0x0, stat_full_window = 0x0, last_message_seen = 0x0, recvd_messages = 0x0, stat_window_size_cluster = 0x0, stat_full_window_cluster = 0x0, last_ack_count = 0, ack_count = 0, window_full_sleep_nsec = 0, last_ack_rate_time = {tv_sec = 0, tv_nsec = 0}, ack_tracker_factory = 0x10060da0, ack_tracker = 0x0, wakeup = 0xff4fd48 <log_reader_wakeup>, schedule_dynamic_window_realloc = 0xff4fd60 <_schedule_dynamic_window_realloc>}, proto = 0x0, immediate_check = 0, control = 0x100bb480, options = 0x10069cc8, poll_events = 0x0, peer_addr = 0x10062160, local_addr = 0x100621e0, max_message_size = 0x0, average_messages_size = 0x0, CPS = 0x0, restart_task = {cookie = 0x100bde80, handler = 0xff5095c <log_reader_io_handle_in>, pad = {0x100bdf88, 0x100bdf88, 0x21, 0x0, 0x0, 0x0}}, schedule_wakeup = {cookie = 0x100bde80, handler = 0xff50860 <log_reader_wakeup_triggered>, owner = 0x100270f0, list = { next = 0x100bdfac, prev = 0x100bdfac}}, io_job = {engage = 0xff4e344 <log_pipe_ref>, work = 0xff50218 <log_reader_work_perform>, completion = 0xff5087c <log_reader_work_finished>, release = 0xff4e3c4 <log_pipe_unref>, user_data = 0x100bde80, working = 0, cond = 0, work_item = {cookie = 0x100bdfb4, work = 0xff575ec <_work>, completion = 0xff57588 <_complete>, list = {next = 0x0, prev = 0x0}}}, watches_running = 0, suspended = 0, realloc_window_after_fetch = 0, notify_code = 0, pending_close = 0, pending_close_cond = {p = 0x0, i = {0, 0}}, pending_close_lock = {p = 0x0, i = {0, 0}}, idle_timer = {expires = {tv_sec = 0, tv_nsec = 0}, cookie = 0x100bde80, handler = 0xff509f8 <log_reader_idle_timeout>, pad = {0x0, 0x0, 0xffffffff, 0x0}}} — Reply to this email directly, view it on GitHub <#3985 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAFOK5VCLABTZR2CBVUAN3DVHMZVZANCNFSM5TMKCQIQ> . You are receiving this because you commented.Message ID: ***@***.***>

bazsi · 2022-04-29T05:37:27Z

Do you have threading enabled? E.g. threaded flag or threaded(no) in the global config.

If you have disabled threading then I have a theory.

Non threaded execution is performed by invoking the MainloopIOJob callbacks directly. In that case MainloooIOJob->engage and release does not take a reference on the LogReader, causing exactly this crash.

Add a log_pipe_ref/unref pair around the invocation of log_reader_work_perform/finished, in the function
log_reader_io_handle_in()

This should close this issue. I am writing this on my phone and am unable to submit a patch at the moment. But if you could confirm this theory that'd be great.

Thanks

bazsi · 2022-04-29T17:48:43Z

I have now reproduced this locally, not the crash but the use-after-free situation. Probably the crash does not happen on some of your platforms as either the flags member is not overwritten when LogReader is freed, OR the self->proto becomes non-NULL during the free.

In any case, both after use-after-free conditions that eventually could lead to a crash. This patch seems to fix it:

diff --git a/lib/logreader.c b/lib/logreader.c
index 135032ce7..3f1821843 100644
--- a/lib/logreader.c
+++ b/lib/logreader.c
@@ -572,8 +572,10 @@ log_reader_io_handle_in(gpointer s)
        */
       if (!main_loop_worker_job_quit())
         {
+          log_pipe_ref(&self->super.super);
           log_reader_work_perform(s, G_IO_IN);
           log_reader_work_finished(s);
+          log_pipe_unref(&self->super.super);
         }
     }
 }

This code is only executing in the non-threaded case. While threading is enabled, this is fixed by MainLoopIOJob->engage/release methods which take and drop these very refs.

As a workaround, you could re-enable threading, that should fix the crash as well. I am opening a PR from this patch in a moment.

sparlane · 2022-05-01T21:47:42Z

Hi,

There is no reference to threading anywhere in our config, so I think it is turned off.

Adding the log_ref/unref as described also resolves this crash.

sparlane · 2022-05-01T23:35:22Z

Hi @bazsi,

Looking at some of the things we have previously tried to fix, I think the same bug exists in log_writer_io_handler where it has the same code pattern.

From our bug history I see that they are related to having a debug output that is connected to the local console and then the user logging out without stopping the debug output.

bazsi · 2022-05-02T06:01:10Z

Hi, Good point on the LogWriter case. It's interesting that threading is not referenced in your config. It has been enabled by default since 3.5 I think. But from the gdb output I clearly saw it disabled and the code part that the fix changes is also the non-threaded case. I am adding a fix to LogWriter too. Thanks a lot for your fix and efforts.

…

On Mon, May 2, 2022, 01:35 Scott Parlane ***@***.***> wrote: Hi @bazsi <https://github.com/bazsi>, Looking at some of the things we have previously tried to fix, I think the same bug exists in log_writer_io_handler where it has the same code pattern. From our bug history I see that they are related to having a debug output that is connected to the local console and then the user logging out without stopping the debug output. — Reply to this email directly, view it on GitHub <#3985 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAFOK5RMLVGKUFW423FFCWLVH4IMLANCNFSM5TMKCQIQ> . You are receiving this because you were mentioned.Message ID: ***@***.***>

bazsi · 2022-05-03T07:27:28Z

Closing this in favour of #3997

MrAnno self-requested a review April 14, 2022 09:07

bazsi mentioned this pull request Apr 29, 2022

logreader: fix use-after-free bug in non-threaded mode #3997

Merged

bazsi closed this May 3, 2022

sparlane deleted the logreader_segv_fix branch May 3, 2022 09:26

MrAnno removed their request for review May 3, 2022 21:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

logreader: stop processing if the connection is closed #3985

logreader: stop processing if the connection is closed #3985

sparlane commented Apr 14, 2022

kira-syslogng commented Apr 14, 2022

kira-syslogng commented Apr 14, 2022

bazsi commented Apr 14, 2022

kira-syslogng commented Apr 14, 2022

bazsi commented Apr 14, 2022

sparlane commented Apr 20, 2022

bazsi commented Apr 20, 2022 via email

bazsi commented Apr 27, 2022 via email

bazsi commented Apr 28, 2022

sparlane commented Apr 29, 2022 •

edited

Loading

sparlane commented Apr 29, 2022

bazsi commented Apr 29, 2022 via email

bazsi commented Apr 29, 2022 •

edited

Loading

bazsi commented Apr 29, 2022

sparlane commented May 1, 2022

sparlane commented May 1, 2022

bazsi commented May 2, 2022 via email

bazsi commented May 3, 2022

logreader: stop processing if the connection is closed #3985

logreader: stop processing if the connection is closed #3985

Conversation

sparlane commented Apr 14, 2022

kira-syslogng commented Apr 14, 2022

kira-syslogng commented Apr 14, 2022

bazsi commented Apr 14, 2022

kira-syslogng commented Apr 14, 2022

bazsi commented Apr 14, 2022

sparlane commented Apr 20, 2022

bazsi commented Apr 20, 2022 via email

bazsi commented Apr 27, 2022 via email

bazsi commented Apr 28, 2022

sparlane commented Apr 29, 2022 • edited Loading

sparlane commented Apr 29, 2022

bazsi commented Apr 29, 2022 via email

bazsi commented Apr 29, 2022 • edited Loading

bazsi commented Apr 29, 2022

sparlane commented May 1, 2022

sparlane commented May 1, 2022

bazsi commented May 2, 2022 via email

bazsi commented May 3, 2022

sparlane commented Apr 29, 2022 •

edited

Loading

bazsi commented Apr 29, 2022 •

edited

Loading