Closed
Description
Bug Report
Describe the bug
When stopping or restarting the Fluent Bit service, it occasionally enters a flush loop and hangs. In our experiment it kept trying to flush log entries for 10+ minutes, even though we left the grace period at 5 seconds. For context, we are using file-based buffering, so `flush_at_shutdown` should default to false, which means Fluent Bit is not expected to try flushing log entries prior to shutdown.
Expected behavior
Fluent Bit stops within a reasonable time (i.e., close to the `grace` config value) after receiving a `service stop` signal.
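One way to check this expectation is to time how long a process takes to exit after it is asked to stop. The following is a minimal sketch, not part of the original report: `stop_and_time` and `slack_seconds` are hypothetical names, and the helper assumes a POSIX host where `Popen.terminate()` sends SIGTERM. On Windows, `terminate()` kills the process outright, so a service-manager stop would have to be used instead.

```python
import subprocess
import sys
import time


def stop_and_time(proc, grace_seconds, slack_seconds=5.0):
    """Ask `proc` to stop and measure how long it takes to exit.

    Returns (elapsed_seconds, exited_in_time). `slack_seconds` is an
    illustrative tolerance on top of the configured grace period.
    """
    deadline = grace_seconds + slack_seconds
    start = time.monotonic()
    proc.terminate()  # SIGTERM on POSIX; TerminateProcess on Windows
    try:
        proc.wait(timeout=deadline)
        exited_in_time = True
    except subprocess.TimeoutExpired:
        # The process outlived grace + slack: force it down so the
        # harness itself does not hang, and report the failure.
        proc.kill()
        proc.wait()
        exited_in_time = False
    return time.monotonic() - start, exited_in_time


if __name__ == "__main__":
    # Stand-in child process; replace with the real agent command line.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    elapsed, ok = stop_and_time(child, grace_seconds=5)
    print(f"exited_in_time={ok} elapsed={elapsed:.1f}s")
```

With the behavior described in this report, `exited_in_time` would be expected to come back `False` on the affected runs.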
Sample logs
[2021/11/09 02:17:13] [ warn] [engine] service will stop in 5 seconds
[2021/11/09 02:17:18] [ info] [engine] service stopped
[2021/11/09 02:17:18] [ warn] [engine] shutdown delayed, grace period has finished but some tasks are still running.
[2021/11/09 02:17:18] [ info] [task] winlog/winlog.0 has 0 pending task(s):
[2021/11/09 02:17:18] [ info] [task] tail/tail.1 has 1 pending task(s):
[2021/11/09 02:17:18] [ info] [task] task_id=0 still running on route(s): stackdriver/stackdriver.1
[2021/11/09 02:17:18] [ info] [task] storage_backlog/storage_backlog.2 has 0 pending task(s):
[2021/11/09 02:17:18] [ info] [task] emitter/emitter_for_rewrite_tag.6 has 2 pending task(s):
[2021/11/09 02:17:18] [ info] [task] task_id=1 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:18] [ info] [task] task_id=2 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:18] [ warn] [engine] service will stop in 5 seconds
[2021/11/09 02:17:23] [error] [upstream] connection #2096 to logging.googleapis.com:443 timed out after 10 seconds
[2021/11/09 02:17:23] [ info] [engine] service stopped
[2021/11/09 02:17:23] [ warn] [engine] shutdown delayed, grace period has finished but some tasks are still running.
[2021/11/09 02:17:23] [ info] [task] winlog/winlog.0 has 0 pending task(s):
[2021/11/09 02:17:23] [ info] [task] tail/tail.1 has 1 pending task(s):
[2021/11/09 02:17:23] [ info] [task] task_id=0 still running on route(s): stackdriver/stackdriver.1
[2021/11/09 02:17:23] [ info] [task] storage_backlog/storage_backlog.2 has 0 pending task(s):
[2021/11/09 02:17:23] [ info] [task] emitter/emitter_for_rewrite_tag.6 has 2 pending task(s):
[2021/11/09 02:17:23] [ info] [task] task_id=1 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:23] [ info] [task] task_id=2 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:23] [ warn] [engine] service will stop in 5 seconds
[2021/11/09 02:17:28] [ info] [engine] service stopped
[2021/11/09 02:17:28] [ warn] [engine] shutdown delayed, grace period has finished but some tasks are still running.
[2021/11/09 02:17:28] [ info] [task] winlog/winlog.0 has 0 pending task(s):
[2021/11/09 02:17:28] [ info] [task] tail/tail.1 has 1 pending task(s):
[2021/11/09 02:17:28] [ info] [task] task_id=0 still running on route(s): stackdriver/stackdriver.1
[2021/11/09 02:17:28] [ info] [task] storage_backlog/storage_backlog.2 has 0 pending task(s):
[2021/11/09 02:17:28] [ info] [task] emitter/emitter_for_rewrite_tag.6 has 2 pending task(s):
[2021/11/09 02:17:28] [ info] [task] task_id=1 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:28] [ info] [task] task_id=2 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:28] [ warn] [engine] service will stop in 5 seconds
[2021/11/09 02:17:33] [ info] [engine] service stopped
[2021/11/09 02:17:33] [ warn] [engine] shutdown delayed, grace period has finished but some tasks are still running.
[2021/11/09 02:17:33] [ info] [task] winlog/winlog.0 has 0 pending task(s):
[2021/11/09 02:17:33] [ info] [task] tail/tail.1 has 1 pending task(s):
[2021/11/09 02:17:33] [ info] [task] task_id=0 still running on route(s): stackdriver/stackdriver.1
[2021/11/09 02:17:33] [ info] [task] storage_backlog/storage_backlog.2 has 0 pending task(s):
[2021/11/09 02:17:33] [ info] [task] emitter/emitter_for_rewrite_tag.6 has 2 pending task(s):
[2021/11/09 02:17:33] [ info] [task] task_id=1 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:33] [ info] [task] task_id=2 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:33] [ warn] [engine] service will stop in 5 seconds
[2021/11/09 02:17:38] [ info] [engine] service stopped
[2021/11/09 02:17:38] [ warn] [engine] shutdown delayed, grace period has finished but some tasks are still running.
[2021/11/09 02:17:38] [ info] [task] winlog/winlog.0 has 0 pending task(s):
[2021/11/09 02:17:38] [ info] [task] tail/tail.1 has 1 pending task(s):
[2021/11/09 02:17:38] [ info] [task] task_id=0 still running on route(s): stackdriver/stackdriver.1
[2021/11/09 02:17:38] [ info] [task] storage_backlog/storage_backlog.2 has 0 pending task(s):
[2021/11/09 02:17:38] [ info] [task] emitter/emitter_for_rewrite_tag.6 has 2 pending task(s):
[2021/11/09 02:17:38] [ info] [task] task_id=1 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:38] [ info] [task] task_id=2 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:38] [ warn] [engine] service will stop in 5 seconds
[2021/11/09 02:17:43] [ info] [engine] service stopped
[2021/11/09 02:17:43] [ warn] [engine] shutdown delayed, grace period has finished but some tasks are still running.
[2021/11/09 02:17:43] [ info] [task] winlog/winlog.0 has 0 pending task(s):
[2021/11/09 02:17:43] [ info] [task] tail/tail.1 has 1 pending task(s):
[2021/11/09 02:17:43] [ info] [task] task_id=0 still running on route(s): stackdriver/stackdriver.1
[2021/11/09 02:17:43] [ info] [task] storage_backlog/storage_backlog.2 has 0 pending task(s):
[2021/11/09 02:17:43] [ info] [task] emitter/emitter_for_rewrite_tag.6 has 2 pending task(s):
[2021/11/09 02:17:43] [ info] [task] task_id=1 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:43] [ info] [task] task_id=2 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:43] [ warn] [engine] service will stop in 5 seconds
[2021/11/09 02:17:48] [ info] [engine] service stopped
[2021/11/09 02:17:48] [ warn] [engine] shutdown delayed, grace period has finished but some tasks are still running.
[2021/11/09 02:17:48] [ info] [task] winlog/winlog.0 has 0 pending task(s):
[2021/11/09 02:17:48] [ info] [task] tail/tail.1 has 1 pending task(s):
[2021/11/09 02:17:48] [ info] [task] task_id=0 still running on route(s): stackdriver/stackdriver.1
[2021/11/09 02:17:48] [ info] [task] storage_backlog/storage_backlog.2 has 0 pending task(s):
[2021/11/09 02:17:48] [ info] [task] emitter/emitter_for_rewrite_tag.6 has 2 pending task(s):
[2021/11/09 02:17:48] [ info] [task] task_id=1 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:48] [ info] [task] task_id=2 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:48] [ warn] [engine] service will stop in 5 seconds
[2021/11/09 02:17:53] [ info] [engine] service stopped
[2021/11/09 02:17:53] [ warn] [engine] shutdown delayed, grace period has finished but some tasks are still running.
[2021/11/09 02:17:53] [ info] [task] winlog/winlog.0 has 0 pending task(s):
[2021/11/09 02:17:53] [ info] [task] tail/tail.1 has 1 pending task(s):
[2021/11/09 02:17:53] [ info] [task] task_id=0 still running on route(s): stackdriver/stackdriver.1
[2021/11/09 02:17:53] [ info] [task] storage_backlog/storage_backlog.2 has 0 pending task(s):
[2021/11/09 02:17:53] [ info] [task] emitter/emitter_for_rewrite_tag.6 has 2 pending task(s):
[2021/11/09 02:17:53] [ info] [task] task_id=1 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:53] [ info] [task] task_id=2 still running on route(s): stackdriver/stackdriver.0
[2021/11/09 02:17:53] [ warn] [engine] service will stop in 5 seconds
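The loop is easier to see when the log is summarized. The sketch below is an illustration only, not a Fluent Bit tool: the function names are hypothetical, and the line format is assumed to match the excerpt above. It counts how many times the engine announces a stop, how long the loop has run, and which tasks remain stuck on which routes.

```python
import re
from datetime import datetime

# Assumed line shape: [YYYY/MM/DD HH:MM:SS] [level] [component] message
LINE_RE = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] \[\s*(\w+)\] \[(\w+)\] (.*)$"
)
STOP_RE = re.compile(r"service will stop in (\d+) seconds")
TASK_RE = re.compile(r"task_id=(\d+) still running on route\(s\): (\S+)")


def parse(log_text):
    """Yield (timestamp, level, component, message) for each matching line."""
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            ts, level, component, message = m.groups()
            yield datetime.strptime(ts, "%Y/%m/%d %H:%M:%S"), level, component, message


def summarize(log_text):
    """Summarize repeated shutdown attempts and the tasks blocking them."""
    attempts = []   # (timestamp, announced grace in seconds)
    stuck = set()   # (task_id, route) pairs reported as still running
    for ts, _level, component, message in parse(log_text):
        stop = STOP_RE.search(message)
        if component == "engine" and stop:
            attempts.append((ts, int(stop.group(1))))
        task = TASK_RE.search(message)
        if component == "task" and task:
            stuck.add((int(task.group(1)), task.group(2)))
    if not attempts:
        return None
    return {
        "attempts": len(attempts),
        "grace": attempts[0][1],
        "elapsed": (attempts[-1][0] - attempts[0][0]).total_seconds(),
        "stuck": sorted(stuck),
    }
```

Passing the excerpt above to `summarize` should show the stop announcement repeating every 5 seconds, with the same three tasks pending on the `stackdriver` routes in every cycle.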
Your Environment
- Version used: 1.8.4, 1.8.9, and a custom build from https://github.com/fluent/fluent-bit/tree/leonardo-1.8-issue-4262-fix
- Configuration: https://github.com/GoogleCloudPlatform/ops-agent/blob/master/confgenerator/testdata/valid/linux/all-built_in_config/golden_fluent_bit_main.conf
- Environment name and version (e.g. Kubernetes? What version?): GCE Linux and Windows VMs
- Operating System and version: Windows 2016 (most frequently seen)
Additional context