Segfault caused by retry #123

Closed
nokute78 opened this issue Oct 31, 2016 · 4 comments

@nokute78
Collaborator

nokute78 commented Oct 31, 2016

How to reproduce

  1. $ bin/fluent-bit -i cpu -o forward -f 1
  2. (wait a minute)

This issue also happens with in_http out_http.
The key condition is running "offline", i.e. Fluent Bit cannot connect to the fluentd/HTTP server.

gdb log

out_th is NULL when the segfault happens.

src/flb_task.c:81
81	    o_ins = out_th->o_ins;
$ gdb bin/fluent-bit 
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-90.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/taka/git/oss/pull_req/fluentbit_env/fluent-bit/build/bin/fluent-bit...done.
(gdb) run -i cpu -o http -vvv -f 1
Starting program: /home/taka/git/oss/pull_req/fluentbit_env/fluent-bit/build/bin/fluent-bit -i cpu -o http -vvv -f 1
[Thread debugging using libthread_db enabled]
[New Thread 0x2aaaaacc1700 (LWP 28315)]
Fluent-Bit v0.9.0
Copyright (C) Treasure Data

[2016/10/31 22:14:01] [ info] [engine] started
[2016/10/31 22:14:01] [error] [TLS] Invalid CA file: /etc/ssl/certs/ca-certificates.crt
[2016/10/31 22:14:01] [debug] [router] default match rule cpu.0:http.0
[2016/10/31 22:14:02] [trace] [in_cpu] CPU 1.00%
[2016/10/31 22:14:03] [debug] [task] created task=0x6ea280 OK
[2016/10/31 22:14:03] [trace] [engine dispatch] task #0 created 0x6ea280
[2016/10/31 22:14:03] [trace] [thread 0x6ea320] created (custom data at 0x6ea348, size=64
[2016/10/31 22:14:03] [trace] [upstream] connection in process
[2016/10/31 22:14:03] [trace] [in_cpu] CPU 5.00%
[2016/10/31 22:14:03] [trace] [engine] resuming thread=0x6ea320
[2016/10/31 22:14:03] [error] [io] TCP connection failed: 127.0.0.1:80
[2016/10/31 22:14:03] [error] [out_http] no upstream connections available
[2016/10/31 22:14:03] [trace] [engine] [task event] task_id=0 thread_id=0 return=RETRY
[2016/10/31 22:14:03] [trace] [thread] destroy thread=0x6ea320
[2016/10/31 22:14:03] [debug] [sched] retry=0x6f2620 0 in 6 seconds
[2016/10/31 22:14:04] [debug] [task] created task=0x6f0610 OK
[2016/10/31 22:14:04] [trace] [engine dispatch] task #1 created 0x6f0610
[2016/10/31 22:14:04] [trace] [thread 0x6f05a0] created (custom data at 0x6f05c8, size=64
[2016/10/31 22:14:04] [trace] [upstream] connection in process
[2016/10/31 22:14:04] [trace] [in_cpu] CPU 6.00%
[2016/10/31 22:14:04] [trace] [engine] resuming thread=0x6f05a0
[2016/10/31 22:14:04] [error] [io] TCP connection failed: 127.0.0.1:80
[2016/10/31 22:14:04] [error] [out_http] no upstream connections available
[2016/10/31 22:14:04] [trace] [engine] [task event] task_id=1 thread_id=0 return=RETRY
[2016/10/31 22:14:04] [trace] [thread] destroy thread=0x6f05a0
[2016/10/31 22:14:04] [debug] [sched] retry=0x6f0740 1 in 2 seconds
[2016/10/31 22:14:05] [debug] [task] created task=0x6f0800 OK
[2016/10/31 22:14:05] [trace] [engine dispatch] task #2 created 0x6f0800
[2016/10/31 22:14:05] [trace] [thread 0x6f06d0] created (custom data at 0x6f06f8, size=64
[2016/10/31 22:14:05] [trace] [upstream] connection in process
[2016/10/31 22:14:05] [trace] [in_cpu] CPU 3.00%
[2016/10/31 22:14:05] [trace] [engine] resuming thread=0x6f06d0
[2016/10/31 22:14:05] [error] [io] TCP connection failed: 127.0.0.1:80
[2016/10/31 22:14:05] [error] [out_http] no upstream connections available
[2016/10/31 22:14:05] [trace] [engine] [task event] task_id=2 thread_id=0 return=RETRY
[2016/10/31 22:14:05] [trace] [thread] destroy thread=0x6f06d0
[2016/10/31 22:14:05] [debug] [sched] retry=0x6f0950 2 in 5 seconds
[2016/10/31 22:14:06] [trace] [thread 0x6f08e0] created (custom data at 0x6f0908, size=64
[2016/10/31 22:14:06] [trace] [upstream] connection in process
[2016/10/31 22:14:06] [debug] [task] created task=0x6f0b50 OK
[2016/10/31 22:14:06] [trace] [engine dispatch] task #3 created 0x6f0b50
[2016/10/31 22:14:06] [trace] [thread 0x6f05a0] created (custom data at 0x6f05c8, size=64
[2016/10/31 22:14:06] [trace] [upstream] connection in process
[2016/10/31 22:14:06] [trace] [in_cpu] CPU 3.00%
[2016/10/31 22:14:06] [trace] [engine] resuming thread=0x6f08e0
[2016/10/31 22:14:06] [error] [io] TCP connection failed: 127.0.0.1:80
[2016/10/31 22:14:06] [error] [out_http] no upstream connections available
[2016/10/31 22:14:06] [trace] [engine] resuming thread=0x6f05a0
[2016/10/31 22:14:06] [error] [io] TCP connection failed: 127.0.0.1:80
[2016/10/31 22:14:06] [error] [out_http] no upstream connections available
[2016/10/31 22:14:06] [trace] [engine] [task event] task_id=1 thread_id=1 return=RETRY
[2016/10/31 22:14:06] [trace] [thread] destroy thread=0x6f08e0
[2016/10/31 22:14:06] [debug] [sched] retry=0x6f0740 1 in 20 seconds
[2016/10/31 22:14:06] [trace] [engine] [task event] task_id=3 thread_id=0 return=RETRY
[2016/10/31 22:14:06] [trace] [thread] destroy thread=0x6f05a0
[2016/10/31 22:14:06] [debug] [sched] retry=0x6f0a60 3 in 9 seconds

Program received signal SIGSEGV, Segmentation fault.
0x000000000041a615 in flb_task_retry_create (task=0x6ea280, data=0x0) at /home/taka/git/oss/pull_req/fluentbit_env/fluent-bit/src/flb_task.c:81
81	    o_ins = out_th->o_ins;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.192.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) 
@edsiper edsiper self-assigned this Oct 31, 2016
@edsiper
Member

edsiper commented Oct 31, 2016

Troubleshooting.

edsiper added a commit that referenced this issue Nov 1, 2016
When extracting the task_id in the engine, the macro FLB_TASK_ID(x)
produced overlapping bits that corrupted the final value, due to an
incorrect use of uint16_t. Changing it to uint32_t fixes the issue.

Signed-off-by: Eduardo Silva <eduardo@treasure-data.com>
@edsiper
Member

edsiper commented Nov 1, 2016

@nokute78 thanks for the report (it took me the whole day to isolate the problem!)

Would you please re-check on your side whether this change is the right fix?

@nokute78
Collaborator Author

nokute78 commented Nov 1, 2016

@edsiper Thanks! This issue is fixed.
However, this patch reveals a hidden leak issue, so I will open another issue.

Correction: I wrote "This issue also happens with in_http", but it was out_http.

@nokute78 nokute78 closed this as completed Nov 1, 2016
@edsiper
Member

edsiper commented Nov 1, 2016

Thanks for the confirmation, I will check that.

edsiper added a commit that referenced this issue Nov 7, 2016
fujimotos pushed a commit to fujimotos/fluent-bit that referenced this issue Jul 22, 2019
Signed-off-by: Brandon DuRette <brandon.durette@wpengine.com>