logreader: stop processing if the connection is closed #3985
If the connection is closed and log_reader_work_finished was running, calling log_pipe_notify will result in the LogReader object being no longer usable because it was cleaned up.
So to prevent SEGV, don't allow execution to continue into the next block when the connection is marked as NC_CLOSE.
It seems more likely for this to occur when the log reader is a socket that was opened by "logger" from busybox.
We started experiencing this after updating from glibc 2.29 to 2.35 and gcc from 8.3 to 11.2.
Signed-off-by: Scott Parlane <scott.parlane@alliedtelesis.co.nz>
This user does not have permission to start the build. Can one of the admins verify this patch and start the build? |
@kira-syslogng ok to test; |
Build FAILURE |
Can you show a backtrace that this fixes? The "self" pointer at that point should still be referenced, so at the very least the pointer should still be valid. The NC_CLOSE handler in afsocket-source.c will deinit the pipe (afsocket_sd_close_connection) prior to returning, which means that the condition right after the log_pipe_notify() call (in log_reader_work_finished) should not be entered. So I think your change should not actually change behaviour. |
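From syslog-ng 3.21.1:
Thread 1 (Thread 0x77fa5020 (LWP 405)):
#0 0x0ff59618 in log_proto_server_reset_error (s=0x0) at ./lib/logproto/logproto-server.h:164
#1 log_reader_work_finished (s=0x10053800) at lib/logreader.c:426
#2 0x0fc75b98 in iv_fd_poll_and_run (st@entry=0x100270a0, abs=<optimized out>) at iv_fd.c:219
#3 0x0fc770b4 in iv_main () at iv_main_posix.c:112
#4 0x0ff5e060 in main_loop_run (self=0xfff629c <main_loop>) at lib/mainloop.c:657
#5 0x10001264 in main (argc=<optimized out>, argv=<optimized out>) at syslog-ng/main.c:316
I added debug print statements to the start of every function in logreader.c, and to most of the branches.
pending_close wasn't set, but notify_code was set to NC_CLOSE, and it called the close/free functions while it was processing the message.
Yes, I know 3.21.1 is old, but that is what we currently use. I did test with the latest release and I get an identical backtrace (with different line numbers).
There are some limitations on where we see this:
- if init is upstart we see this, if it's systemd we don't (I suspect because of the way that systemd interferes with syslog)
- we don't see this on our x86_64 target, but do see it on at least armv7 and ppc
- I think it's triggered specifically by the use of logger, which opens the socket to syslog, writes a message and then closes the socket.
- I can get this to happen often if I replace our syslog-ng-ctl reload/reopen calls with restarting the syslog-ng job. I suspect this is because syslog-ng needs to be busy to have both the close and a message at the same time.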
|
Thanks for the dump. I am looking into this with v3.21 sources.
On Wed, Apr 20, 2022, 03:01 Scott Parlane wrote:
From syslog-ng 3.21.1:
Thread 1 (Thread 0x77fa5020 (LWP 405)):
#0 0x0ff59618 in log_proto_server_reset_error (s=0x0) at ./lib/logproto/logproto-server.h:164
#1 log_reader_work_finished (s=0x10053800) at lib/logreader.c:426
#2 0x0fc75b98 in iv_fd_poll_and_run (st@entry=0x100270a0, abs=<optimized out>) at iv_fd.c:219
#3 0x0fc770b4 in iv_main () at iv_main_posix.c:112
#4 0x0ff5e060 in main_loop_run (self=0xfff629c <main_loop>) at lib/mainloop.c:657
#5 0x10001264 in main (argc=<optimized out>, argv=<optimized out>) at syslog-ng/main.c:316
I added debug print statements to the start of every function in
logreader.c, and to most of the branches.
pending_close wasn't set, but notify_code was set to NC_CLOSE, and it
called the close/free functions while it was processing the message.
Yes, I know 3.21.1 is old, but that is what we currently use. I did test
with the latest release and I get an identical backtrace (with different
line numbers).
There are some limitations on where we see this:
- if init is upstart we see this, if it's systemd we don't (I suspect
because of the way that systemd interferes with syslog)
- we don't see this on our x86_64 target, but do see it on at least
armv7 and ppc
- I think it's triggered specifically by the use of logger, which
opens the socket to syslog, writes a message and then closes
the socket.
- I can get this to happen often if I replace our syslog-ng-ctl
reload/reopen calls with restarting the syslog-ng job. I suspect this is
because syslog-ng needs to be busy to have both the close and a message at
the same time.
|
Sorry, this scrolled out of my attention span in my INBOX. I've glanced at
the backtrace you sent and it's genuinely useful. Thanks for that.
I need to get going again :( but will keep you posted as I look into it
again.
It might actually be related to CPU architectures: self->proto is
manipulated in parallel threads and, in that sense, ARM and ppc might be
more sensitive than x86.
--
Bazsi
|
I was delving into the sources and have so far failed to find a scenario which leads to this crash. Certainly I am missing something. If you still have a core file from the crash, can you please run these gdb commands and record their results:
These are things I've checked:
In summary, the PIF_INITIALIZED flag in LogReader->flags prevents the crash: it is cleared in the same process that sets proto to NULL.
The only way this could happen is if log_reader_work_finished() were re-entered a second time, perhaps as an invocation from log_pipe_notify(), but work_finished is only called as a completion callback (either by ivykis in threaded mode, or by ourselves if threading is disabled). The backtrace shows that work_finished is called by ivykis, so threading does seem to be enabled.
Also NOTE that the entire list above is executed in series by the main thread. The asynchronous closing that happens in wildcard-file() is not used in the case of afsocket.
With the extra information that you can hopefully send me, I could pinpoint where my analysis above goes wrong. |
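The guard described in the comment above can be modelled in a few lines of C. This is a minimal sketch, not syslog-ng code: SketchReader, SketchProto, SKETCH_INITIALIZED, sketch_deinit and sketch_work_finished are hypothetical names chosen for illustration. Only the idea comes from the thread: deinit clears the flag and the proto pointer together, and the completion step touches proto only while the flag is still set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag, in the spirit of PIF_INITIALIZED. */
#define SKETCH_INITIALIZED 0x0001u

/* Hypothetical stand-ins for the protocol object and the reader. */
typedef struct SketchProto
{
  int error;
} SketchProto;

typedef struct SketchReader
{
  unsigned flags;
  SketchProto *proto;
} SketchReader;

/* Deinit clears the flag and the proto pointer in the same step. */
static void
sketch_deinit(SketchReader *r)
{
  r->flags &= ~SKETCH_INITIALIZED;
  r->proto = NULL;
}

/* Completion step: a close notification may deinit the reader, so proto
 * is only dereferenced while the flag is still set.
 * Returns 1 if the proto error was reset, 0 if the block was skipped. */
int
sketch_work_finished(SketchReader *r, int close_notified)
{
  assert(r != NULL);

  if (close_notified)
    sketch_deinit(r);

  if (r->flags & SKETCH_INITIALIZED)
    {
      r->proto->error = 0;
      return 1;
    }
  return 0;
}
```

The guard only helps while the reader object itself is still alive; once its memory has been freed, reading the flags member is itself undefined behaviour.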
Hi, I had to regenerate it, my exact code is here:
The output from my additional debug for this logreader is:
|
For comparison, I used the official syslog-ng-3.36.1.tar.gz in place of our syslog-ng, and I see the same crash
|
The LogReader instance seems to have been freed, so this is a
use-after-free. Thanks for the details. Going to look into it.
On Fri, Apr 29, 2022, 03:14 Scott Parlane wrote:
For comparison, I used the official syslog-ng-3.36.1.tar.gz in place of
our syslog-ng, and I see the same crash
Core was generated by `/usr/sbin/syslog-ng -F -f /etc/syslog-ng.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0ff5093c in log_proto_server_reset_error (s=0x0) at ./lib/logproto/logproto-server.h:164
164 ./lib/logproto/logproto-server.h: No such file or directory.
[Current thread is 1 (Thread 0x77fdb020 (LWP 383))]
(gdb) bt full
#0 0x0ff5093c in log_proto_server_reset_error (s=0x0) at ./lib/logproto/logproto-server.h:164
No locals.
#1 log_reader_work_finished (s=0x100bde80) at lib/logreader.c:402
self = 0x100bde80
#2 0x0fc65b98 in iv_fd_poll_and_run (st@entry=0x100270f0, abs=<optimized out>) at iv_fd.c:219
fd = 0x100bdb9c
active = {next = 0x7fd9b208, prev = 0x7fd9b208}
run_timers = <optimized out>
#3 0x0fc670b4 in iv_main () at iv_main_posix.c:112
_abs = {tv_sec = 0, tv_nsec = 0}
abs = <optimized out>
st = 0x100270f0
run_timers = <optimized out>
#4 0x0ff560e0 in main_loop_run (self=0xfff69f8 <main_loop>) at lib/mainloop.c:668
No locals.
#5 0x10001270 in main (argc=<optimized out>, argv=<optimized out>) at syslog-ng/main.c:311
rc = 0
ctx = <optimized out>
error = 0x0
main_loop = 0xfff69f8 <main_loop>
exit_before_main_loop_run = 0
(gdb) frame 1
#1 log_reader_work_finished (s=0x100bde80) at lib/logreader.c:402
402 lib/logreader.c: No such file or directory.
(gdb) p *s
Attempt to dereference a generic pointer.
(gdb) p *self
$1 = {super = {super = {ref_cnt = {counter = 65725}, flags = -1730795671, queue = 0xff522d4 <log_source_queue>, cfg = 0x10045170, expr_node = 0x0, pipe_next = 0x100bb480, discarded_messages = 0x0, persist_name = 0x0, plugin_name = 0x0,
signal_slot_connector = 0x100bb590, pre_init = 0x0, init = 0xff506bc <log_reader_init>, deinit = 0xff502d8 <log_reader_deinit>, post_deinit = 0x0, on_config_inited = 0x0, generate_persist_name = 0x0, arcs = 0xff4e098 <_arcs>,
clone = 0x0, free_fn = 0xff4fde8 <log_reader_free>, notify = 0x0, info = 0x0}, options = 0x10069ccc, threaded = 0, name = 0x100bdbe0 "\020\n\265ݘ\326'ieam,AF_UNIX(anonymous)",
stats_id = 0x100bb390 "\020\n\350;\230\326'i\020\006", <incomplete sequence \340>, stats_instance = 0x100bbea0 "\020\n뻘\326'ieam,anonymous", window_size = {counter = {value = 100}}, dynamic_window = {pool = 0x0, stat = {n = 0,
sum = 0}}, window_initialized = 1, initial_window_size = 100, full_window_size = 100, window_size_to_be_reclaimed = {value = 0}, pending_reclaimed = {value = 0}, stat_window_size = 0x0, stat_full_window = 0x0,
last_message_seen = 0x0, recvd_messages = 0x0, stat_window_size_cluster = 0x0, stat_full_window_cluster = 0x0, last_ack_count = 0, ack_count = 0, window_full_sleep_nsec = 0, last_ack_rate_time = {tv_sec = 0, tv_nsec = 0},
ack_tracker_factory = 0x10060da0, ack_tracker = 0x0, wakeup = 0xff4fd48 <log_reader_wakeup>, schedule_dynamic_window_realloc = 0xff4fd60 <_schedule_dynamic_window_realloc>}, proto = 0x0, immediate_check = 0, control = 0x100bb480,
options = 0x10069cc8, poll_events = 0x0, peer_addr = 0x10062160, local_addr = 0x100621e0, max_message_size = 0x0, average_messages_size = 0x0, CPS = 0x0, restart_task = {cookie = 0x100bde80,
handler = 0xff5095c <log_reader_io_handle_in>, pad = {0x100bdf88, 0x100bdf88, 0x21, 0x0, 0x0, 0x0}}, schedule_wakeup = {cookie = 0x100bde80, handler = 0xff50860 <log_reader_wakeup_triggered>, owner = 0x100270f0, list = {
next = 0x100bdfac, prev = 0x100bdfac}}, io_job = {engage = 0xff4e344 <log_pipe_ref>, work = 0xff50218 <log_reader_work_perform>, completion = 0xff5087c <log_reader_work_finished>, release = 0xff4e3c4 <log_pipe_unref>,
user_data = 0x100bde80, working = 0, cond = 0, work_item = {cookie = 0x100bdfb4, work = 0xff575ec <_work>, completion = 0xff57588 <_complete>, list = {next = 0x0, prev = 0x0}}}, watches_running = 0, suspended = 0,
realloc_window_after_fetch = 0, notify_code = 0, pending_close = 0, pending_close_cond = {p = 0x0, i = {0, 0}}, pending_close_lock = {p = 0x0, i = {0, 0}}, idle_timer = {expires = {tv_sec = 0, tv_nsec = 0}, cookie = 0x100bde80,
handler = 0xff509f8 <log_reader_idle_timeout>, pad = {0x0, 0x0, 0xffffffff, 0x0}}}
|
Do you have threading enabled? E.g. the threaded flag, or threaded(no) in the global config.
If you have disabled threading then I have a theory. Non-threaded execution is performed by invoking the MainLoopIOJob callbacks directly. In that case MainLoopIOJob->engage and release do not take a reference on the LogReader, causing exactly this crash.
Add a log_pipe_ref/unref pair around the invocation of log_reader_work_perform/finished, in the function that invokes them. This should close this issue.
I am writing this on my phone and am unable to submit a patch at the moment. But if you could confirm this theory that'd be great. Thanks |
I have now reproduced this locally: not the crash, but the use-after-free situation. Probably the crash does not happen on some of your platforms because either the flags member is not overwritten when LogReader is freed, OR self->proto becomes non-NULL during the free. In any case, both are use-after-free conditions that could eventually lead to a crash. This patch seems to fix it:
This code only executes in the non-threaded case. While threading is enabled, this is fixed by the MainLoopIOJob->engage/release methods, which take and drop these very refs. As a workaround, you could re-enable threading; that should fix the crash as well. I am opening a PR from this patch in a moment. |
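The patch itself is not reproduced in this thread. As an illustration of the pattern being described (hold a reference across the directly invoked callbacks), here is a minimal sketch in C. It is a hypothetical model, not the syslog-ng patch: SketchPipe and all sketch_* names are made up for this example, and the ref/unref helpers merely stand in for log_pipe_ref/log_pipe_unref.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical ref-counted object standing in for a LogReader. */
typedef struct SketchPipe
{
  int ref_cnt;
  int initialized;      /* plays the role of PIF_INITIALIZED */
  int close_requested;  /* plays the role of notify_code == NC_CLOSE */
} SketchPipe;

int sketch_pipes_freed = 0;

SketchPipe *
sketch_pipe_new(void)
{
  SketchPipe *p = calloc(1, sizeof(*p));

  assert(p != NULL);
  p->ref_cnt = 1;        /* the owner's reference */
  p->initialized = 1;
  return p;
}

static void
sketch_pipe_ref(SketchPipe *p)
{
  p->ref_cnt++;
}

static void
sketch_pipe_unref(SketchPipe *p)
{
  if (--p->ref_cnt == 0)
    {
      free(p);
      sketch_pipes_freed++;
    }
}

/* The owner's close handler: deinit the pipe and drop the owner's ref. */
static void
sketch_owner_close(SketchPipe *p)
{
  p->initialized = 0;
  sketch_pipe_unref(p);
}

/* Work step: pretend the peer hung up while we were reading. */
static void
sketch_work_perform(SketchPipe *p)
{
  p->close_requested = 1;
}

/* Completion step: notifies the owner, then inspects the pipe again.
 * That second access is only valid if the caller still holds a ref. */
static int
sketch_work_finished(SketchPipe *p)
{
  if (p->close_requested)
    {
      p->close_requested = 0;
      sketch_owner_close(p);
    }
  return p->initialized;  /* 1 would mean "restart the watches" */
}

/* Non-threaded dispatch: the callbacks are invoked directly, so the
 * caller takes and drops the reference itself. */
int
sketch_io_handle_in(SketchPipe *p)
{
  int restarted;

  sketch_pipe_ref(p);
  sketch_work_perform(p);
  restarted = sketch_work_finished(p);
  sketch_pipe_unref(p);   /* the object may be freed here, not earlier */
  return restarted;
}
```

Without the ref/unref pair in sketch_io_handle_in, the owner's unref inside the completion step would free the object while sketch_work_finished is still reading it, which is the use-after-free discussed above.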
Hi, There is no reference to threading anywhere in our config, so I think it is turned off. Adding the log_ref/unref as described also resolves this crash. |
Hi @bazsi, Looking at some of the things we have previously tried to fix, I think the same bug exists in log_writer_io_handler where it has the same code pattern. From our bug history I see that they are related to having a debug output that is connected to the local console and then the user logging out without stopping the debug output. |
Hi,
Good point on the LogWriter case.
It's interesting that threading is not referenced in your config. It has
been enabled by default since 3.5 I think. But from the gdb output I
clearly saw it disabled and the code part that the fix changes is also the
non-threaded case.
I am adding a fix to LogWriter too.
Thanks a lot for your fix and efforts.
On Mon, May 2, 2022, 01:35 Scott Parlane wrote:
Hi @bazsi,
Looking at some of the things we have previously tried to fix, I think the
same bug exists in log_writer_io_handler where it has the same code pattern.
From our bug history I see that they are related to having a debug output
that is connected to the local console and then the user logging out
without stopping the debug output.
|
Closing this in favour of #3997 |