Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flood of connection timed out message and accordingly side effect #3192

Closed
ginobiliwang opened this issue Mar 9, 2021 · 8 comments
Closed

Comments

@ginobiliwang
Copy link
Contributor

Bug Report

Describe the bug
Similar things reported in the following issue entry. In my test, the failure of recycling these connections would have side effect on reconnecting to remote side.

#2950
#2800

Behavior is like, we could spot large amount of log information as below.

[2021/03/09 17:20:18] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #30 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #31 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:20:18] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds

To Reproduce

[SERVICE]
    flush     1
    log_level info

[INPUT]
    name      random

[OUTPUT]
    name      tcp
    match     *
    host      127.0.0.1
    port      9090
    format    json_lines
    # Networking Setup
    net.connect_timeout         5
    net.source_address          127.0.0.1
    net.keepalive               on
    net.keepalive_idle_timeout  10
nc -l 9090
  1. start all the accordingly part.
  2. teminate "nc -l 9090" process and wait for more than 5 seconds.
  3. After step 2), we could observe log information similar as below
[2021/03/09 17:20:18] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
  1. start up "nc -l 9090" again, log information from step 3) would not disappear, and sometimes tcp connection to newly started "nc -l 9090" would be closed by fluent-bit side.
[root@sdw1 build]# bin/fluent-bit   -c   /home/fwang/fluent-bit/conf/out_tcp.conf
Fluent Bit v1.7.2
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/03/09 17:44:52] [ info] [engine] started (pid=18006)
[2021/03/09 17:44:52] [ info] [storage] version=1.1.1, initializing...
[2021/03/09 17:44:52] [ info] [storage] in-memory
[2021/03/09 17:44:52] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/03/09 17:44:52] [ info] [sp] stream processor started
[2021/03/09 17:45:24] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:24] [ warn] [engine] failed to flush chunk '18006-1615283123.473993948.flb', retry in 9 seconds: task_id=0, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:25] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:25] [ warn] [engine] failed to flush chunk '18006-1615283124.473983475.flb', retry in 11 seconds: task_id=1, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:26] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:26] [ warn] [engine] failed to flush chunk '18006-1615283125.473986861.flb', retry in 7 seconds: task_id=2, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:27] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:27] [ warn] [engine] failed to flush chunk '18006-1615283126.473984404.flb', retry in 9 seconds: task_id=3, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:28] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:28] [ warn] [engine] failed to flush chunk '18006-1615283127.473986223.flb', retry in 11 seconds: task_id=4, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:29] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:29] [ warn] [engine] failed to flush chunk '18006-1615283128.473987995.flb', retry in 7 seconds: task_id=5, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:30] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:30] [ warn] [engine] failed to flush chunk '18006-1615283129.473983731.flb', retry in 9 seconds: task_id=6, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:31] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:31] [ warn] [engine] failed to flush chunk '18006-1615283130.473985313.flb', retry in 6 seconds: task_id=7, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:32] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:32] [ warn] [engine] failed to flush chunk '18006-1615283131.473999367.flb', retry in 11 seconds: task_id=8, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:33] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:33] [ warn] [engine] chunk '18006-1615283123.473993948.flb' cannot be retried: task_id=0, input=random.0 > output=tcp.0
[2021/03/09 17:45:34] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:34] [ warn] [engine] failed to flush chunk '18006-1615283133.474119852.flb', retry in 6 seconds: task_id=0, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:35] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:35] [ warn] [engine] failed to flush chunk '18006-1615283134.473970396.flb', retry in 8 seconds: task_id=10, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:36] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:36] [ warn] [engine] chunk '18006-1615283126.473984404.flb' cannot be retried: task_id=3, input=random.0 > output=tcp.0
[2021/03/09 17:45:37] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:37] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:37] [ warn] [engine] chunk '18006-1615283130.473985313.flb' cannot be retried: task_id=7, input=random.0 > output=tcp.0
[2021/03/09 17:45:37] [ warn] [engine] failed to flush chunk '18006-1615283136.474177656.flb', retry in 11 seconds: task_id=3, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:38] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:38] [ warn] [engine] failed to flush chunk '18006-1615283137.474038986.flb', retry in 6 seconds: task_id=7, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:38] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:38] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:39] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:39] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:39] [ warn] [engine] chunk '18006-1615283129.473983731.flb' cannot be retried: task_id=6, input=random.0 > output=tcp.0
[2021/03/09 17:45:39] [ warn] [engine] failed to flush chunk '18006-1615283138.473983275.flb', retry in 8 seconds: task_id=12, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:40] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:40] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:40] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:40] [ warn] [engine] chunk '18006-1615283133.474119852.flb' cannot be retried: task_id=0, input=random.0 > output=tcp.0
[2021/03/09 17:45:41] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:41] [ warn] [engine] failed to flush chunk '18006-1615283140.474075151.flb', retry in 7 seconds: task_id=0, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:41] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:41] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:41] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:41] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:41] [error] [upstream] connection #30 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:42] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:42] [ warn] [engine] failed to flush chunk '18006-1615283141.473995131.flb', retry in 9 seconds: task_id=13, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:43] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:43] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:43] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:43] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:43] [error] [upstream] connection #30 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:43] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:43] [ warn] [engine] chunk '18006-1615283134.473970396.flb' cannot be retried: task_id=10, input=random.0 > output=tcp.0
[2021/03/09 17:45:44] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:44] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:44] [ warn] [engine] chunk '18006-1615283137.474038986.flb' cannot be retried: task_id=7, input=random.0 > output=tcp.0
[2021/03/09 17:45:44] [ warn] [engine] failed to flush chunk '18006-1615283143.474170574.flb', retry in 6 seconds: task_id=10, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:44] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:44] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:44] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:44] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:44] [error] [upstream] connection #30 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:44] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:45] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:45] [ warn] [engine] failed to flush chunk '18006-1615283144.474032208.flb', retry in 9 seconds: task_id=7, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:46] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:46] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:46] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:46] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:46] [error] [upstream] connection #30 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:46] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:46] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:47] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:47] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:47] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:47] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:47] [error] [upstream] connection #30 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:47] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:47] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:48] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:48] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:48] [ warn] [engine] chunk '18006-1615283136.474177656.flb' cannot be retried: task_id=3, input=random.0 > output=tcp.0
[2021/03/09 17:45:48] [ warn] [engine] chunk '18006-1615283140.474075151.flb' cannot be retried: task_id=0, input=random.0 > output=tcp.0
[2021/03/09 17:45:49] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:49] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:49] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:49] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:49] [error] [upstream] connection #30 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:49] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:49] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:49] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:49] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:49] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:49] [ warn] [engine] failed to flush chunk '18006-1615283148.474102906.flb', retry in 9 seconds: task_id=0, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:50] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:50] [ warn] [engine] chunk '18006-1615283143.474170574.flb' cannot be retried: task_id=10, input=random.0 > output=tcp.0
[2021/03/09 17:45:50] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:50] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:50] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:50] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:50] [error] [upstream] connection #30 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:50] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:50] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:50] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:50] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:51] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:51] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:51] [ warn] [engine] chunk '18006-1615283141.473995131.flb' cannot be retried: task_id=13, input=random.0 > output=tcp.0
[2021/03/09 17:45:51] [ warn] [engine] failed to flush chunk '18006-1615283150.474029120.flb', retry in 7 seconds: task_id=10, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:52] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:52] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:52] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:52] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:52] [error] [upstream] connection #30 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:52] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:52] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:52] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:52] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:52] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:52] [ warn] [engine] failed to flush chunk '18006-1615283151.474048319.flb', retry in 8 seconds: task_id=13, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:53] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:53] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:53] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:53] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:53] [error] [upstream] connection #30 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:53] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:53] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:53] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:53] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:53] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:54] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:54] [ warn] [engine] chunk '18006-1615283144.474032208.flb' cannot be retried: task_id=7, input=random.0 > output=tcp.0
[2021/03/09 17:45:55] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:55] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:55] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:55] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:55] [error] [upstream] connection #30 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:55] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:55] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:55] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:55] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:55] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:55] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:55] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:55] [ warn] [engine] failed to flush chunk '18006-1615283154.474018800.flb', retry in 10 seconds: task_id=7, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:56] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:56] [ warn] [engine] failed to flush chunk '18006-1615283155.474118634.flb', retry in 7 seconds: task_id=16, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:56] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:56] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:56] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:56] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:56] [error] [upstream] connection #30 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:56] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:56] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:56] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:56] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:56] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:56] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:57] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:57] [ warn] [engine] failed to flush chunk '18006-1615283156.473984224.flb', retry in 8 seconds: task_id=17, input=random.0 > output=tcp.0 (out_id=0)
^C[2021/03/09 17:45:57] [engine] caught signal (SIGINT)
[2021/03/09 17:45:57] [ warn] [engine] service will stop in 5 seconds
[2021/03/09 17:45:57] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:57] [ warn] [engine] failed to flush chunk '18006-1615283157.473990416.flb', retry in 10 seconds: task_id=18, input=random.0 > output=tcp.0 (out_id=0)
[2021/03/09 17:45:58] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:58] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:58] [error] [upstream] connection #27 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:58] [error] [upstream] connection #28 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:58] [error] [upstream] connection #30 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:58] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:58] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:58] [error] [upstream] connection #26 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:58] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:58] [error] [upstream] connection #29 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:58] [error] [upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
[2021/03/09 17:45:58] [error] [plugins/out_tcp/tcp.c:96 errno=32] Broken pipe
[2021/03/09 17:45:58] [ warn] [engine] chunk '18006-1615283148.474102906.flb' cannot be retried: task_id=0, input=random.0 > output=tcp.0
^C[2021/03/09 17:45:58] [engine] caught signal (SIGINT)

server side console output

[root@sdw1 test]# nc -l 9090
{"date":1615283092.473937,"rand_value":1285084913116582723}
{"date":1615283093.474056,"rand_value":706660627927302812}
{"date":1615283094.473994,"rand_value":3704338600195479074}
{"date":1615283095.473986,"rand_value":17095855764903658792}
{"date":1615283096.47399,"rand_value":7842598343596208331}
{"date":1615283097.473993,"rand_value":4362390089072302447}
{"date":1615283098.473995,"rand_value":6652645624104986502}
{"date":1615283099.47399,"rand_value":3581011594381766063}
{"date":1615283100.473993,"rand_value":16459559446784195288}
{"date":1615283101.473995,"rand_value":16141356126191053967}
{"date":1615283102.473989,"rand_value":5989458118657268858}
{"date":1615283103.473993,"rand_value":4856406829741834798}
{"date":1615283104.473976,"rand_value":6940182786333866320}
{"date":1615283105.474022,"rand_value":4471548515391682141}
{"date":1615283106.473974,"rand_value":15433252269251954261}
{"date":1615283107.473976,"rand_value":12627735645341877346}
{"date":1615283108.474024,"rand_value":16465996599057260106}
{"date":1615283109.473985,"rand_value":11514791939982660984}
{"date":1615283110.473973,"rand_value":2151607519942081971}
{"date":1615283111.473977,"rand_value":7198831955256565843}
{"date":1615283112.473988,"rand_value":1423950559169635628}
{"date":1615283113.473998,"rand_value":17595765539719629032}
{"date":1615283114.473976,"rand_value":14992446626818240571}
{"date":1615283115.473994,"rand_value":11825150518800120865}
{"date":1615283116.473982,"rand_value":3603028392084371916}
{"date":1615283117.47399,"rand_value":1223756529858979982}
{"date":1615283118.47399,"rand_value":6571688381837859308}
{"date":1615283119.473993,"rand_value":14536352375531712450}
{"date":1615283120.47399,"rand_value":11812271360670705010}
{"date":1615283121.473981,"rand_value":15298463488351757564}
{"date":1615283122.473995,"rand_value":9046845770145351060}
^C
[root@sdw1 test]#
[root@sdw1 test]#
[root@sdw1 test]# nc -l 9090
{"date":1615283145.473995,"rand_value":17483685325212821202}
{"date":1615283138.473981,"rand_value":3053923529242874845}
{"date":1615283146.474005,"rand_value":458201242110091235}
[root@sdw1 test]# nc -l 9090
{"date":1615283152.474081,"rand_value":17647157287411215369}
[root@sdw1 test]#

Expected behavior

  1. When server side resumes, fluent-bit would resume data transfer normally.
  2. Failure connections would be recycled and error information like below should disappear.
[upstream] connection #25 to 127.0.0.1:9090 timed out after 5 seconds
  1. I have did some analysis for the issue and a PR will be submitted soon.
ginobiliwang added a commit to ginobiliwang/fluent-bit that referenced this issue Mar 10, 2021
Signed-off-by: Fenggang <ginobiliwang@gmail.com>
ginobiliwang added a commit to ginobiliwang/fluent-bit that referenced this issue Mar 10, 2021
Signed-off-by: Fenggang <ginobiliwang@gmail.com>
ginobiliwang added a commit to ginobiliwang/fluent-bit that referenced this issue Mar 10, 2021
Signed-off-by: Fenggang <ginobiliwang@gmail.com>
@ginobiliwang ginobiliwang mentioned this issue Mar 10, 2021
3 tasks
ginobiliwang added a commit to ginobiliwang/fluent-bit that referenced this issue Mar 13, 2021
Signed-off-by: Fenggang <ginobiliwang@gmail.com>
@pssa
Copy link

pssa commented Mar 16, 2021

Rolling back to v1.7.0 resolves issue for us in all clusters.

mreiger added a commit to metal-stack/audit-forwarder that referenced this issue Mar 18, 2021
@ginobiliwang
Copy link
Contributor Author

ginobiliwang commented Mar 19, 2021

Rolling back to v1.7.0 resolves issue for us in all clusters.

@pssa Rolling back to v1.7.0 is just a temporary walk-around.

@edsiper has pushed this commit for "lib: monkey: sync event epoll changes on closing timeout fd".

With the above commit, connection fd would be recycled right after the retry tasks ends. As a result, libevent library would trigger connection timeout after "net.connect_timeout", and that is where "[upstream] connection xxxxx to xxxxxxx timed out after xxxxx seconds" comes from.

Accordingly PR #3202 could just to resolve the issue.

@edsiper @nokute78 correct me if there is any misunderstanding.

@panaji
Copy link

panaji commented Mar 24, 2021

@ginobiliwang ... i'm facing the same problem ... any chance this will make it to 1.7.3?

@JeffLuoo
Copy link
Contributor

@panaji Have you tried with the #3202 ?

@owenthereal
Copy link

Just run into the issue by streaming logs to LogDNA. There are tons of such errors in the log:

[2021/03/25 20:35:42] [error] [upstream] connection #31 to logs.logdna.com:443 timed out after 10 seconds

Hope #3202 can make it to 1.7.3 so that I don't need to drop to 1.7.0.

@fabricev
Copy link

fabricev commented Mar 25, 2021

Hit by this one as well, rolled back to 1.7.0 as per the comments, and confirm we don't see tons of timeouts errors anymore after that.

@nokute78
Copy link
Collaborator

The patch 1938ec8 will be merged from v1.7.3 (not released yet.)

@nokute78 nokute78 mentioned this issue Mar 31, 2021
@ginobiliwang
Copy link
Contributor Author

close this issue, fix is by 1938ec8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants