
Fluent Bit 1.6 ignores Grace Period #2715

Closed
PettitWesley opened this issue Oct 21, 2020 · 5 comments

@PettitWesley (Contributor)

Bug Report

Describe the bug

The grace period is entirely ignored and Fluent Bit quits instantly. This happens with both 1.6.0 and 1.6.1.

To Reproduce

[SERVICE]
    Log_Level debug
    Grace 30

[INPUT]
    Name dummy
    Tag  dummy

[OUTPUT]
    Name cloudwatch_logs
    Match *
    log_stream_prefix hi-hello-fluent-bitttt-
    log_group_name fluent
    region us-west-2

Send Fluent Bit a SIGTERM, and on shutdown you see:

[2020/10/21 22:50:00] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent events to hi-hello-fluent-bitttt-0da87724773c
[2020/10/21 22:50:00] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Sent 50 events to CloudWatch
[2020/10/21 22:50:00] [debug] [task] destroy task=0x7f5ff00cb2b0 (task_id=0)
[2020/10/21 22:50:03] [engine] caught signal (SIGTERM)
[2020/10/21 22:50:03] [ info] [input] pausing forward.0

It also stops immediately: there is no grace period and no wait, it just stops.

Expected behavior
Prior to 1.6, on SIGTERM there was a message and then Fluent Bit waited:

[2020/10/21 22:57:25] [debug] [http_client] server 169.254.169.254:80 will close connection #16
[2020/10/21 22:57:25] [debug] [router] match rule forward.0:cloudwatch_logs.0
[2020/10/21 22:57:25] [ info] [sp] stream processor started
[engine] caught signal (SIGTERM)
[2020/10/21 22:57:34] [ warn] [engine] service will stop in 30 seconds
[2020/10/21 22:58:04] [ info] [engine] service stopped

And it actually waits the full 30 seconds.

Your Environment

  • Version used: 1.6.0, 1.6.1, and the latest 1.6 branch

Additional context

aws/aws-for-fluent-bit#81

@edsiper (Member) commented Oct 24, 2020

I cannot reproduce the problem.

@PettitWesley (Contributor, Author)

@edsiper I can reproduce it with a simple config using the stdout output:

[SERVICE]
     Grace 30

[INPUT]
     Name dummy
     Tag dummy

[OUTPUT]
     Name stdout
     Match *

I built Fluent Bit from the latest commit on branch 1.6 on a fresh EC2 instance:

./bin/fluent-bit -c ~/fluent-bit.conf

In another window, send it a SIGTERM:

$ ps -e | grep flu
27799 pts/0    00:00:00 fluent-bit
$ kill -15 27799

Fluent Bit quits immediately; the 30-second grace period is ignored.

I built branch 1.5 on the same instance and went through the same process; the Grace period is respected when I send Fluent Bit a SIGTERM.

Output continues and you get a message saying that Fluent Bit will quit soon (in 1.6 this message no longer appears):

[0] dummy: [1603521497.377335622, {"message"=>"dummy"}]
[1] dummy: [1603521498.377261063, {"message"=>"dummy"}]
[2] dummy: [1603521499.377258684, {"message"=>"dummy"}]
[3] dummy: [1603521500.377250324, {"message"=>"dummy"}]
[4] dummy: [1603521501.377252626, {"message"=>"dummy"}]
[engine] caught signal (SIGTERM)
[0] dummy: [1603521502.377294435, {"message"=>"dummy"}]
[2020/10/24 06:38:23] [ warn] [engine] service will stop in 30 seconds
[0] dummy: [1603521503.377260033, {"message"=>"dummy"}]
[1] dummy: [1603521504.377255466, {"message"=>"dummy"}]
...
[3] dummy: [1603521525.377258357, {"message"=>"dummy"}]
[4] dummy: [1603521526.377261706, {"message"=>"dummy"}]
[2020/10/24 06:38:52] [ info] [engine] service stopped

I went through the commits on the 1.6 branch and checked whether each one has this issue; this commit introduced it: 70771be

I also verified that if you remove the call to _exit on 1.6, the grace period returns. However, the "double free" issue on shutdown comes back (which I originally alerted you to before the 1.6.0 release).
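
For context on why that one call changes the behavior, here is a minimal, hypothetical C sketch (not Fluent Bit's actual shutdown code): a SIGTERM handler that only records the request lets the main loop run the grace countdown, while a handler that calls _exit() ends the process on the spot, so the countdown never runs.

/* Minimal hypothetical sketch (not Fluent Bit's actual code) of why a
 * direct _exit() call defeats the grace period: the graceful path only
 * records the signal and lets the main loop count down, while _exit()
 * terminates the process immediately. */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t stop_requested = 0;

static void on_sigterm(int sig)
{
    (void) sig;
    stop_requested = 1;   /* graceful path: just record the request       */
    /* _exit(0); */       /* immediate path: process ends right here, so  */
                          /* the grace countdown below never executes     */
}

int main(void)
{
    const int grace = 30; /* analogous to Grace in the [SERVICE] section */

    signal(SIGTERM, on_sigterm);

    while (!stop_requested) {
        /* regular event-loop work: collect and flush records */
        sleep(1);
    }

    printf("service will stop in %d seconds\n", grace);
    sleep(grace);         /* keep flushing pending data during the grace window */
    printf("service stopped\n");
    return 0;
}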

We have a large number of AWS customers who have been unable to upgrade to 1.6.x because of this issue.

@edsiper (Member) commented Oct 29, 2020

@PettitWesley I am trying to reproduce the issue described with the 1.6 branch (v1.6.2), and this is the output:

$ ../../build/bin/fluent-bit -c conf 
Fluent Bit v1.6.2
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/10/28 21:01:40] [ info] [engine] started (pid=1719983)
[2020/10/28 21:01:40] [ info] [storage] version=1.0.6, initializing...
[2020/10/28 21:01:40] [ info] [storage] in-memory
[2020/10/28 21:01:40] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/10/28 21:01:40] [ info] [sp] stream processor started
[0] dummy: [1603940501.354723113, {"message"=>"dummy"}]
[1] dummy: [1603940502.354722239, {"message"=>"dummy"}]
[2] dummy: [1603940503.354688760, {"message"=>"dummy"}]
[3] dummy: [1603940504.354833207, {"message"=>"dummy"}]
[2020/10/28 21:01:48] [engine] caught signal (SIGTERM)
[0] dummy: [1603940505.354638619, {"message"=>"dummy"}]
[1] dummy: [1603940506.356208382, {"message"=>"dummy"}]
[2] dummy: [1603940507.354874306, {"message"=>"dummy"}]
[3] dummy: [1603940508.354687246, {"message"=>"dummy"}]
[2020/10/28 21:01:48] [ warn] [engine] service will stop in 30 seconds
[2020/10/28 21:02:18] [ info] [engine] service stopped

The message was triggered at 21:01:48 and the service stopped at 21:02:18, exactly 30 seconds later.

Which distro and compiler version are you testing this on?

@github-actions (Contributor)

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@giuliano-sider commented Apr 11, 2023

If I docker run fluent/fluent-bit:1.8.12 with the following config,

[SERVICE]
    Grace         30
    Log_Level  debug

[INPUT]
    Name dummy
    Tag dummy

[INPUT]
    Name             tail
    Tag              plain_old_text.txt
    Path             /testdata/plain_old_text.txt
    DB               /plain_old_text.db
    Skip_Long_Lines  On
    Refresh_Interval 5
    Read_from_Head   True

[OUTPUT]
    Name stdout
    Match *

in which /testdata/plain_old_text.txt is a file to which I write "hi\n" every second, I get the following after sending a termination signal:

^C[2023/04/11 20:38:31] [engine] caught signal (SIGINT)
[2023/04/11 20:38:31] [ info] [input] pausing tail.1
[2023/04/11 20:38:31] [debug] [task] created task=0x7fb146c37a00 id=0 OK
[0] dummy: [1681245511.154544603, {"message"=>"dummy"}]
[2023/04/11 20:38:31] [ warn] [engine] service will shutdown in max 30 seconds
[2023/04/11 20:38:31] [debug] [out coro] cb_destroy coro_id=8
[2023/04/11 20:38:31] [debug] [task] destroy task=0x7fb146c37a00 (task_id=0)
[2023/04/11 20:38:32] [ info] [engine] service has stopped (0 pending tasks)
[2023/04/11 20:38:32] [debug] [input:tail:tail.1] inode=8280729 removing file name /testdata/plain_old_text.txt
[2023/04/11 20:38:32] [ info] [input:tail:tail.1] inotify_fs_remove(): inode=8280729 watch_fd=1

It pauses the input plugin immediately after getting a signal, and it exits immediately after flushing whatever has been ingested (it doesn't wait for the 30s grace period, claiming that there are "0 pending tasks"). Is this expected behavior?

When I docker run fluent/fluent-bit:1.6.1, I get similar behavior: it pauses the input plugin. It does wait the 30s grace period, though rather uselessly, since it's not ingesting logs anymore. Is that expected behavior as well? Why is there a difference between the two?
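
To make the difference concrete, here is a hypothetical C sketch of the two shutdown policies as observed above (not the actual engine code): 1.6.1 always sleeps the full grace window, while 1.8.x returns as soon as the task queue is drained, with the Grace value acting only as an upper bound.

/* Hypothetical sketch of the two shutdown policies observed above,
 * not Fluent Bit's actual engine code. */
#include <stdio.h>
#include <unistd.h>

/* stand-in for however many flush tasks are still queued at shutdown */
static int pending_tasks(void)
{
    return 0;
}

/* 1.6.1-style: always sleep the full grace window, even with nothing queued */
static void shutdown_fixed_grace(int grace)
{
    printf("service will stop in %d seconds\n", grace);
    sleep(grace);
    printf("service stopped\n");
}

/* 1.8.x-style: return as soon as the task queue drains; grace is only a cap */
static void shutdown_until_drained(int grace)
{
    int waited = 0;

    printf("service will shutdown in max %d seconds\n", grace);
    while (pending_tasks() > 0 && waited < grace) {
        sleep(1);
        waited++;
    }
    printf("service has stopped (%d pending tasks)\n", pending_tasks());
}

int main(void)
{
    /* with 0 pending tasks, the first call still blocks for 30 seconds,
     * while the second returns immediately */
    shutdown_fixed_grace(30);
    shutdown_until_drained(30);
    return 0;
}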
