Lost event with fluentd logging driver #43690

Open
mdraganovic opened this issue Jun 3, 2022 · 1 comment
Labels
area/logging kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. version/20.10

Comments

@mdraganovic

Description
A log event is lost while using fluentd as the logging driver configured with "fluentd-async": "true".

Steps to reproduce the issue:
The test configuration includes docker configured with fluentd as the logging driver (async mode), fluent bit installed on the same host as a log aggregator, and a sample app that generates some logs.

For the sample app we use a simple nginx web container. This works perfectly since it simply generates a log event for every received HTTP request:

docker run -it --rm -d -p 8080:80 --name web nginx

We start by verifying that our app is alive and reachable. A couple of initial requests are visible in the container logs as well as in the fluent bit output:

# Some external client:
[root@localhost ~]# curl http://centos7:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

[root@centos7 ~]# docker logs -f web
...
192.168.56.2 - - [03/Jun/2022:13:53:21 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/7.79.1" "-"        <<<
192.168.56.2 - - [03/Jun/2022:13:53:22 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/7.79.1" "-"        <<<
192.168.56.2 - - [03/Jun/2022:13:53:24 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/7.79.1" "-"        <<<

[root@centos7 ~]# tail -f /var/log/fluent-bit/fluent-bit.log
...

{"container_id":"7f12d43f3485d591be30b1ce9e669fa442b4920941fda9042a9b97c1763ab957","container_name":"/web","source":"stdout","log":"192.168.56.2 - - [03/Jun/2022:13:53:21 +0000] \"GET / HTTP/1.1\" 200 615 \"-\" \"curl/7.79.1\" \"-\"\r"}      <<<
{"log":"192.168.56.2 - - [03/Jun/2022:13:53:22 +0000] \"GET / HTTP/1.1\" 200 615 \"-\" \"curl/7.79.1\" \"-\"\r","container_id":"7f12d43f3485d591be30b1ce9e669fa442b4920941fda9042a9b97c1763ab957","container_name":"/web","source":"stdout"}      <<<
{"container_id":"7f12d43f3485d591be30b1ce9e669fa442b4920941fda9042a9b97c1763ab957","container_name":"/web","source":"stdout","log":"192.168.56.2 - - [03/Jun/2022:13:53:24 +0000] \"GET / HTTP/1.1\" 200 615 \"-\" \"curl/7.79.1\" \"-\"\r"}      <<<

Next, we stop fluent bit and send only one request, which is, as expected, visible in the container logs but doesn't get propagated to fluent bit:

[root@centos7 ~]# systemctl stop fluent-bit

[root@localhost ~]# curl http://centos7:8080
...

[root@centos7 ~]# docker logs -f web
...

192.168.56.2 - - [03/Jun/2022:13:54:53 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/7.79.1" "-"        <<<

If we look at the tcpdump output, we can see a reset packet as the last step of the communication. This is, of course, expected since fluent bit is down:

[root@centos7 ~]# tcpdump -nn -i any port 24224
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
...
15:54:53.782412 IP 127.0.0.1.50798 > 127.0.0.1.24224: Flags [P.], seq 700:933, ack 2, win 342, options [nop,nop,TS val 418717 ecr 408104], length 233
15:54:53.782423 IP 127.0.0.1.24224 > 127.0.0.1.50798: Flags [R], seq 2881160003, win 0, length 0        <<<

At this point, if no other requests/log events are generated, everything stays as it is. Tcpdump shows no trace of any further traffic from docker, even if we restart fluent bit.

Let's continue by reactivating fluent bit. Again, nothing happens until we generate new request(s). After we generate another request, we can see that it is logged as usual, but the one event that was sent after fluent bit was stopped seems to be lost:

[root@centos7 ~]# docker logs -f web
...
192.168.56.2 - - [03/Jun/2022:13:54:53 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/7.79.1" "-"        # Doesn't reach fluent bit

192.168.56.2 - - [03/Jun/2022:13:57:42 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/7.79.1" "-"        <<< This one does

[root@centos7 ~]# tail -f /var/log/fluent-bit/fluent-bit.log 
...

{"container_id":"7f12d43f3485d591be30b1ce9e669fa442b4920941fda9042a9b97c1763ab957","container_name":"/web","source":"stdout","log":"192.168.56.2 - - [03/Jun/2022:13:57:42 +0000] \"GET / HTTP/1.1\" 200 615 \"-\" \"curl/7.79.1\" \"-\"\r"}        <<<

Detailed tcpdump output confirms that only one log message is sent from docker's end.

Furthermore, if we repeat the previous steps but send multiple requests while fluent bit is offline, the tcpdump output shows that docker starts to periodically probe our log destination, trying to resend the logs which were buffered, just as we assume it would:

[root@centos7 ~]# tcpdump -nn -i any port 24224
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
...
15:59:04.772410 IP 127.0.0.1.50802 > 127.0.0.1.24224: Flags [P.], seq 234:467, ack 2, win 342, options [nop,nop,TS val 669707 ecr 659995], length 233
15:59:04.772430 IP 127.0.0.1.24224 > 127.0.0.1.50802: Flags [R], seq 3160989152, win 0, length 0
15:59:16.894567 IP6 ::1.57082 > ::1.24224: Flags [S], seq 548473875, win 43690, options [mss 65476,sackOK,TS val 681829 ecr 0,nop,wscale 7], length 0
15:59:16.894588 IP6 ::1.24224 > ::1.57082: Flags [R.], seq 0, ack 548473876, win 0, length 0
15:59:16.894670 IP 127.0.0.1.50806 > 127.0.0.1.24224: Flags [S], seq 1148747117, win 43690, options [mss 65495,sackOK,TS val 681829 ecr 0,nop,wscale 7], length 0
15:59:16.894678 IP 127.0.0.1.24224 > 127.0.0.1.50806: Flags [R.], seq 0, ack 1148747118, win 0, length 0
15:59:16.894800 IP6 ::1.57086 > ::1.24224: Flags [S], seq 2186123725, win 43690, options [mss 65476,sackOK,TS val 681829 ecr 0,nop,wscale 7], length 0
15:59:16.894805 IP6 ::1.24224 > ::1.57086: Flags [R.], seq 0, ack 2186123726, win 0, length 0
15:59:16.894834 IP 127.0.0.1.50810 > 127.0.0.1.24224: Flags [S], seq 1548338296, win 43690, options [mss 65495,sackOK,TS val 681829 ecr 0,nop,wscale 7], length 0
15:59:17.895655 IP6 ::1.57090 > ::1.24224: Flags [S], seq 1446273335, win 43690, options [mss 65476,sackOK,TS val 682831 ecr 0,nop,wscale 7], length 0
15:59:17.895673 IP6 ::1.24224 > ::1.57090: Flags [R.], seq 0, ack 1446273336, win 0, length 0
15:59:17.895746 IP 127.0.0.1.50814 > 127.0.0.1.24224: Flags [S], seq 3221056790, win 43690, options [mss 65495,sackOK,TS val 682831 ecr 0,nop,wscale 7], length 0
15:59:17.895760 IP 127.0.0.1.24224 > 127.0.0.1.50814: Flags [R.], seq 0, ack 3221056791, win 0, length 0
15:59:18.896754 IP6 ::1.57094 > ::1.24224: Flags [S], seq 1103498856, win 43690, options [mss 65476,sackOK,TS val 683831 ecr 0,nop,wscale 7], length 0
15:59:18.896807 IP6 ::1.24224 > ::1.57094: Flags [R.], seq 0, ack 1103498857, win 0, length 0
15:59:18.896929 IP 127.0.0.1.50818 > 127.0.0.1.24224: Flags [S], seq 4128375179, win 43690, options [mss 65495,sackOK,TS val 683832 ecr 0,nop,wscale 7], length 0
15:59:18.896945 IP 127.0.0.1.24224 > 127.0.0.1.50818: Flags [R.], seq 0, ack 4128375180, win 0, length 0
...

After we activate fluent bit again, all buffered logs are resent and captured as expected, except the one that was generated immediately after fluent bit was stopped.

So, it seems that there is one log message that actually does get lost if our logging destination is unavailable for whatever reason. It may not look like much, but I'd assume that this is not expected behaviour.
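For illustration only: the observed behaviour is consistent with an async writer that dequeues a message from its buffer before the TCP write and does not re-queue it when the write fails. The toy Python model below (a sketch of that race, NOT Docker's actual fluentd driver code; all names are made up) shows how exactly one in-flight event is lost while later events survive:

```python
import queue

class AsyncLogger:
    """Toy async log forwarder illustrating the suspected race."""

    def __init__(self, requeue_on_error):
        self.buffer = queue.Queue()   # in-memory buffer, like fluentd-async
        self.sent = []                # what the "destination" received
        self.destination_up = True
        self.requeue_on_error = requeue_on_error

    def log(self, msg):
        self.buffer.put(msg)

    def _send(self, msg):
        # Stand-in for the TCP write; a collector that is down shows up
        # as a connection reset, like in the tcpdump capture above.
        if not self.destination_up:
            raise ConnectionResetError
        self.sent.append(msg)

    def flush_once(self):
        # Dequeue one message and try to deliver it.
        try:
            msg = self.buffer.get_nowait()
        except queue.Empty:
            return
        try:
            self._send(msg)
        except ConnectionResetError:
            if self.requeue_on_error:
                self.buffer.put(msg)  # preserved for a later retry
            # else: the already-dequeued message is silently lost

def run(requeue):
    lg = AsyncLogger(requeue_on_error=requeue)
    lg.log("event-1"); lg.flush_once()   # delivered while collector is up
    lg.destination_up = False            # "systemctl stop fluent-bit"
    lg.log("event-2"); lg.flush_once()   # write fails: the at-risk event
    lg.destination_up = True             # fluent bit restarted
    lg.log("event-3"); lg.flush_once()
    lg.flush_once()                      # drain anything that was re-queued
    return lg.sent
```

With `requeue_on_error=False`, `run` delivers only event-1 and event-3, matching the report above; with `requeue_on_error=True`, all three events arrive.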

Describe the results you received:
Lost event in case of log destination outage.

Describe the results you expected:
Docker buffers all log events and preserves them while the log destination is unavailable.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

[root@centos7 ~]# docker version
Client: Docker Engine - Community
 Version:           20.10.16
 API version:       1.41
 Go version:        go1.17.10
 Git commit:        aa7e414
 Built:             Thu May 12 09:19:45 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.16
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.10
  Git commit:       f756502
  Built:            Thu May 12 09:18:08 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.4
  GitCommit:        212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
 runc:
  Version:          1.1.1
  GitCommit:        v1.1.1-0-g52de29d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

[root@centos7 ~]# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
  compose: Docker Compose (Docker Inc., v2.5.0)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 20.10.16
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: fluentd
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
 runc version: v1.1.1-0-g52de29d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-1160.66.1.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 990.9MiB
 Name: centos7.local
 ID: B5VF:LLMI:7FBJ:XJJE:ZB6H:YFH2:CG7G:BAXV:MPJK:LHI3:LMNZ:SW3T
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http://192.168.56.2:3128
 HTTPS Proxy: http://192.168.56.2:3128
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Docker daemon settings:

[root@centos7 ~]# cat /etc/docker/daemon.json 
 {
   "log-driver": "fluentd",
   "log-opts": {
     "fluentd-address": "localhost:24224",
     "fluentd-async": "true",
     "fluentd-buffer-limit": "1000000"
   }
 }
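As a side note, the fluentd driver also exposes retry knobs (documented as fluentd-retry-wait and fluentd-max-retries); the values below are illustrative, not recommendations, and they did not change the behaviour described above:

```
 {
   "log-driver": "fluentd",
   "log-opts": {
     "fluentd-address": "localhost:24224",
     "fluentd-async": "true",
     "fluentd-buffer-limit": "1000000",
     "fluentd-retry-wait": "1s",
     "fluentd-max-retries": "10"
   }
 }
```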

Fluent bit configuration:

[root@centos7 ~]# tail -n25 /etc/fluent-bit/fluent-bit.conf 
    # not delivered and are still in the storage layer, these are called
    # backlog data. This option configure a hint of maximum value of memory
    # to use when processing these records.
    #
    # storage.backlog.mem_limit 5M

[INPUT]
    Name forward
    Listen 0.0.0.0
    port 24224

[FILTER]
    Name parser
    Match *
    Parser docker
    Key_Name log
    Reserve_Data On

[OUTPUT]
    Name file
    Match *
    Path /var/log/fluent-bit
    File fluent-bit.log
    Format plain
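Not a fix for the docker-side loss described here, but for completeness: fluent bit itself can be configured with filesystem buffering, so that chunks it has already received survive a restart. A sketch using fluent bit's storage options (paths are illustrative):

```
[SERVICE]
    storage.path              /var/lib/fluent-bit/storage
    storage.backlog.mem_limit 5M

[INPUT]
    Name          forward
    Listen        0.0.0.0
    Port          24224
    storage.type  filesystem
```

This only protects data fluent bit has already accepted; the event lost in this report never reaches fluent bit in the first place.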

Additional environment details (AWS, VirtualBox, physical, etc.):
VMware, VirtualBox

@thaJeztah thaJeztah added area/logging version/20.10 kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. labels Jun 30, 2022
@minamijoyo
Contributor

Hi @thaJeztah, this issue has been labelled version/20.10, but I can reproduce a similar problem even with the current latest Docker 23.0.1 + fluent-bit v2.0.10 with --log-opt fluentd-request-ack=true.

There was an issue with older versions of fluent-bit not handling SIGTERM properly, but this was fixed in fluent-bit v1.8.10 (see: fluent/fluent-bit#2610).
However, I can still see some records being lost on shutdown, so I think there is another problem on the side of Docker's fluentd log driver.
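To make the expected ack semantics concrete: with request-ack enabled, the sender should keep every chunk until the receiver echoes its id back, and retry anything unacked after a reconnect. The toy Python sender below sketches that at-least-once idea; it is NOT the actual msgpack-based forward protocol or Docker's driver code, and all class and function names are made up:

```python
import base64
import os

def new_chunk_id():
    # Random identifier, in the spirit of the forward protocol's "chunk" option.
    return base64.b64encode(os.urandom(8)).decode()

class Server:
    """Toy receiver: acks a chunk by echoing its id, unless it is down."""
    def __init__(self):
        self.up = True
        self.received = []

    def deliver(self, cid, record):
        if not self.up:
            return None          # connection reset: no ack comes back
        self.received.append(record)
        return cid               # echo the chunk id as the ack

class AckClient:
    """Toy at-least-once sender: drop a chunk only after its ack arrives."""
    def __init__(self, server):
        self.server = server
        self.pending = {}        # chunk_id -> record, awaiting ack

    def send(self, record):
        cid = new_chunk_id()
        self.pending[cid] = record
        ack = self.server.deliver(cid, record)
        if ack == cid:
            del self.pending[cid]   # safe to drop only once acked

    def retry_pending(self):
        # After a reconnect, retransmit every chunk that was never acked.
        for cid, record in list(self.pending.items()):
            ack = self.server.deliver(cid, record)
            if ack == cid:
                del self.pending[cid]
```

A record sent while the server is down stays in `pending` and is delivered by `retry_pending()` once the server is back; losing records on shutdown suggests the driver drops some chunks before their acks arrive.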

Steps to reproduce:

# docker version
Client: Docker Engine - Community
 Version:           23.0.1
 API version:       1.42
 Go version:        go1.19.5
 Git commit:        a5ee5b1
 Built:             Thu Feb  9 19:47:01 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.1
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.5
  Git commit:       bc3805a
  Built:            Thu Feb  9 19:47:01 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.19
  GitCommit:        1e1ea6e986c6c86565bc33d52e34b81b3e2bc71f
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Start a fluent-bit container:

# docker run -it --rm --name flb -p 24224:24224 fluent/fluent-bit:2.0.10 /fluent-bit/bin/fluent-bit -v -i forward -o stdout -f 1
Fluent Bit v2.0.10
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/03/28 05:45:26] [ info] Configuration:
[2023/03/28 05:45:26] [ info]  flush time     | 1.000000 seconds
[2023/03/28 05:45:26] [ info]  grace          | 5 seconds
[2023/03/28 05:45:26] [ info]  daemon         | 0
[2023/03/28 05:45:26] [ info] ___________
[2023/03/28 05:45:26] [ info]  inputs:
[2023/03/28 05:45:26] [ info]      forward
[2023/03/28 05:45:26] [ info] ___________
[2023/03/28 05:45:26] [ info]  filters:
[2023/03/28 05:45:26] [ info] ___________
[2023/03/28 05:45:26] [ info]  outputs:
[2023/03/28 05:45:26] [ info]      stdout.0
[2023/03/28 05:45:26] [ info] ___________
[2023/03/28 05:45:26] [ info]  collectors:
[2023/03/28 05:45:26] [ info] [fluent bit] version=2.0.10, commit=da2275435c, pid=1
[2023/03/28 05:45:26] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2023/03/28 05:45:26] [ info] [storage] ver=1.4.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/03/28 05:45:26] [ info] [cmetrics] version=0.5.8
[2023/03/28 05:45:26] [ info] [ctraces ] version=0.2.7
[2023/03/28 05:45:26] [ info] [input:forward:forward.0] initializing
[2023/03/28 05:45:26] [ info] [input:forward:forward.0] storage_strategy='memory' (memory only)
[2023/03/28 05:45:26] [debug] [forward:forward.0] created event channels: read=21 write=22
[2023/03/28 05:45:26] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2023/03/28 05:45:26] [debug] [downstream] listening on 0.0.0.0:24224
[2023/03/28 05:45:26] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2023/03/28 05:45:26] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2023/03/28 05:45:26] [ info] [sp] stream processor started
[2023/03/28 05:45:26] [ info] [output:stdout:stdout.0] worker #0 started
[2023/03/28 05:45:38] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:45:38] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:45:39] [debug] [task] created task=0x7f9160648c30 id=0 OK
[2023/03/28 05:45:39] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] a8f7bf62f610: [1679982338.000000000, {"container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e"}] "container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"Tue Mar 28 05:45:38 UTC 2023
[1] a8f7bf62f610: [1679982338.000000000, {"container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e"}] "container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2023/03/28 05:45:39] [debug] [out flush] cb_destroy coro_id=0
[2023/03/28 05:45:39] [debug] [task] destroy task=0x7f9160648c30 (task_id=0)
[2023/03/28 05:45:39] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:45:39] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:45:40] [debug] [task] created task=0x7f9160648c30 id=0 OK
[2023/03/28 05:45:40] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] a8f7bf62f610: [1679982339.000000000, {"container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"Tue Mar 28 05:45", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2"}]
[1] a8f7bf62f610: [1679982339.000000000, {"container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"flb (172.17.0.2:", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2"}]
[2023/03/28 05:45:40] [debug] [out flush] cb_destroy coro_id=1
[2023/03/28 05:45:40] [debug] [task] destroy task=0x7f9160648c30 (task_id=0)
[2023/03/28 05:45:40] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:45:40] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:45:41] [debug] [task] created task=0x7f9160648c30 id=0 OK
[2023/03/28 05:45:41] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch"}]
[1] a8f7bf62f610: [1679982340.000000000, {"container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e"}] "container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2023/03/28 05:45:41] [debug] [out flush] cb_destroy coro_id=2
[2023/03/28 05:45:41] [debug] [task] destroy task=0x7f9160648c30 (task_id=0)
[2023/03/28 05:45:41] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:45:41] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:45:42] [debug] [task] created task=0x7f9160648c30 id=0 OK
[2023/03/28 05:45:42] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch"}]
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch", "source"=>"stdout"}]
[2023/03/28 05:45:42] [debug] [out flush] cb_destroy coro_id=3
[2023/03/28 05:45:42] [debug] [task] destroy task=0x7f9160648c30 (task_id=0)
[2023/03/28 05:45:42] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:45:42] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:45:43] [debug] [task] created task=0x7f9160648c30 id=0 OK
[2023/03/28 05:45:43] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] a8f7bf62f610: [1679982342.000000000, {"container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"Tue Mar 28 05:45", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2"}]
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch"}]
[2023/03/28 05:45:43] [debug] [out flush] cb_destroy coro_id=4
[2023/03/28 05:45:43] [debug] [task] destroy task=0x7f9160648c30 (task_id=0)
[2023/03/28 05:45:43] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:45:43] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:45:44] [debug] [task] created task=0x7f9160648c30 id=0 OK
[2023/03/28 05:45:44] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch"}]
[1] a8f7bf62f610: [1679982343.000000000, {"container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e"}] "container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2023/03/28 05:45:44] [debug] [out flush] cb_destroy coro_id=5
[2023/03/28 05:45:44] [debug] [task] destroy task=0x7f9160648c30 (task_id=0)
[2023/03/28 05:45:44] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:45:44] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:45:45] [debug] [task] created task=0x7f9160648c30 id=0 OK
[2023/03/28 05:45:45] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] a8f7bf62f610: [1679982344.000000000, {"container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e"}] "container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"Tue Mar 28 05:45:44 UTC 2023
[1] a8f7bf62f610: [1679982344.000000000, {"container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"flb (172.17.0.2:", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2"}]
[2023/03/28 05:45:45] [debug] [out flush] cb_destroy coro_id=6
[2023/03/28 05:45:45] [debug] [task] destroy task=0x7f9160648c30 (task_id=0)
[2023/03/28 05:45:45] [engine] caught signal (SIGTERM)
[2023/03/28 05:45:45] [ warn] [engine] service will shutdown in max 5 seconds
[2023/03/28 05:45:45] [ info] [input] pausing forward.0
[2023/03/28 05:45:46] [ info] [engine] service has stopped (0 pending tasks)
[2023/03/28 05:45:46] [ info] [input] pausing forward.0
[2023/03/28 05:45:46] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2023/03/28 05:45:46] [ info] [output:stdout:stdout.0] thread worker #0 stopped

Start a nc container with --log-opt fluentd-request-ack=true:

# docker run -it --rm --log-driver=fluentd --log-opt fluentd-address=127.0.0.1:24224 --log-opt fluentd-request-ack=true --link flb busybox sh -c 'while true; do date; nc -vz flb 24224; sleep 1; done'
Tue Mar 28 05:45:38 UTC 2023
flb (172.17.0.2:24224) open
Tue Mar 28 05:45:39 UTC 2023
flb (172.17.0.2:24224) open
Tue Mar 28 05:45:40 UTC 2023
flb (172.17.0.2:24224) open
Tue Mar 28 05:45:41 UTC 2023
flb (172.17.0.2:24224) open
Tue Mar 28 05:45:42 UTC 2023
flb (172.17.0.2:24224) open
Tue Mar 28 05:45:43 UTC 2023
flb (172.17.0.2:24224) open
Tue Mar 28 05:45:44 UTC 2023
flb (172.17.0.2:24224) open
Tue Mar 28 05:45:45 UTC 2023
flb (172.17.0.2:24224) open
Tue Mar 28 05:45:46 UTC 2023
nc: flb (172.17.0.2:24224): No route to host
Tue Mar 28 05:46:55 UTC 2023
nc: flb (172.17.0.2:24224): No route to host
Tue Mar 28 05:46:59 UTC 2023
nc: flb (172.17.0.2:24224): No route to host
Tue Mar 28 05:47:02 UTC 2023
nc: flb (172.17.0.2:24224): No route to host
Tue Mar 28 05:47:05 UTC 2023
flb (172.17.0.2:24224) open
Tue Mar 28 05:47:07 UTC 2023
flb (172.17.0.2:24224) open

Stop the fluent-bit container:

# date -u; time docker stop -t 10 flb; date -u
Tue Mar 28 05:45:45 UTC 2023
flb

real    0m1.131s
user    0m0.035s
sys     0m0.040s
Tue Mar 28 05:45:46 UTC 2023

Start a new fluent-bit container:

# docker run -it --rm --name flb -p 24224:24224 fluent/fluent-bit:2.0.10 /fluent-bit/bin/fluent-bit -v -i forward -o stdout -f 1
Fluent Bit v2.0.10
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/03/28 05:47:05] [ info] Configuration:
[2023/03/28 05:47:05] [ info]  flush time     | 1.000000 seconds
[2023/03/28 05:47:05] [ info]  grace          | 5 seconds
[2023/03/28 05:47:05] [ info]  daemon         | 0
[2023/03/28 05:47:05] [ info] ___________
[2023/03/28 05:47:05] [ info]  inputs:
[2023/03/28 05:47:05] [ info]      forward
[2023/03/28 05:47:05] [ info] ___________
[2023/03/28 05:47:05] [ info]  filters:
[2023/03/28 05:47:05] [ info] ___________
[2023/03/28 05:47:05] [ info]  outputs:
[2023/03/28 05:47:05] [ info]      stdout.0
[2023/03/28 05:47:05] [ info] ___________
[2023/03/28 05:47:05] [ info]  collectors:
[2023/03/28 05:47:05] [ info] [fluent bit] version=2.0.10, commit=da2275435c, pid=1
[2023/03/28 05:47:05] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2023/03/28 05:47:05] [ info] [storage] ver=1.4.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/03/28 05:47:05] [ info] [cmetrics] version=0.5.8
[2023/03/28 05:47:05] [ info] [ctraces ] version=0.2.7
[2023/03/28 05:47:05] [ info] [input:forward:forward.0] initializing
[2023/03/28 05:47:05] [ info] [input:forward:forward.0] storage_strategy='memory' (memory only)
[2023/03/28 05:47:05] [debug] [forward:forward.0] created event channels: read=21 write=22
[2023/03/28 05:47:05] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2023/03/28 05:47:05] [debug] [downstream] listening on 0.0.0.0:24224
[2023/03/28 05:47:05] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2023/03/28 05:47:05] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2023/03/28 05:47:05] [ info] [sp] stream processor started
[2023/03/28 05:47:05] [ info] [output:stdout:stdout.0] worker #0 started
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=181
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=181
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=181
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=181
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:36] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:37] [debug] [input chunk] update output instances with new chunk size diff=164
[2023/03/28 05:47:37] [debug] [input chunk] update output instances with new chunk size diff=163
[2023/03/28 05:47:37] [debug] [task] created task=0x7f387a448c30 id=0 OK
[2023/03/28 05:47:37] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch"}]
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch", "source"=>"stdout"}]
[2] a8f7bf62f610: [1679982456.000000000, {"container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e"}] "container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"Tue Mar 28 05:45:46 UTC 2023
[3] a8f7bf62f610: [1679982456.000000000, {"container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e"}] "container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"nc: flb (172.17.0.2:24224): No route to host
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch"}]
[5] a8f7bf62f610: [1679982456.000000000, {"container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e"}] "container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"nc: flb (172.17.0.2:24224): No route to host
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch"}]
[7] a8f7bf62f610: [1679982456.000000000, {"container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e"}] "container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"nc: flb (172.17.0.2:24224): No route to host
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch", "source"=>"stdout"}]
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch", "source"=>"stdout"}]
[10] a8f7bf62f610: [1679982456.000000000, {"container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8"}], "container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"Tue Mar 28 05:47:05 UTC 2023
[11] a8f7bf62f610: [1679982456.000000000, {"container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8"}], "container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch"}]
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch", "source"=>"stdout"}]
[14] a8f7bf62f610: [1679982456.000000000, {"container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8"}], "container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"Tue Mar 28 05:47:08 UTC 2023
[15] a8f7bf62f610: [1679982456.000000000, {"container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8"}], "container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch", "source"=>"stdout"}]
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch", "source"=>"stdout"}]
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch", "source"=>"stdout"}]

According to the above output, Docker tried to stop the Fluent Bit container at 05:45:45. Fluent Bit caught the signal (SIGTERM) at 05:45:45 and stopped at 05:45:46.
The timestamp of the last record received before the stop was 05:45:44.

[0] a8f7bf62f610: [1679982344.000000000, {"log"=>"Tue Mar 28 05:45:44 UTC 2023
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch", "source"=>"stdout"}]
[1] a8f7bf62f610: [1679982344.000000000, {"container_name"=>"/awesome_kirch", "source"=>"stdout", "log"=>"flb (172.17.0.2:", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2"}]

And the timestamp of the earliest record received after the restart was 05:45:46.

[2] a8f7bf62f610: [1679982456.000000000, {"log"=>"Tue Mar 28 05:45:46 UTC 2023
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch", "source"=>"stdout"}]
[3] a8f7bf62f610: [1679982456.000000000, {"log"=>"nc: flb (172.17.0.2:24224): No route to host
", "container_id"=>"a8f7bf62f610d656cf85424c123c81eafea5b98ce0f6421f7b1bfc10f3a3c8e2", "container_name"=>"/awesome_kirch"}]

We can see that the record with timestamp 05:45:45 has been lost.
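The gap can be double-checked by converting the epoch timestamps Fluent Bit prints in each record. A minimal sketch, with the two epoch values copied from the output above (note that 1679982456 converts to 05:47:36 UTC, i.e. the flush time of the replayed records after the restart, while the 05:45:46 date quoted above comes from the log body itself):

```python
from datetime import datetime, timezone

# Epoch timestamps copied from the Fluent Bit records above.
last_before_stop = 1679982344     # record carrying "Tue Mar 28 05:45:44"
first_after_restart = 1679982456  # flush of the records replayed after restart

for label, ts in [("before stop", last_before_stop),
                  ("after restart", first_after_restart)]:
    utc = datetime.fromtimestamp(ts, tz=timezone.utc)
    print(f"{label}: {utc:%Y-%m-%d %H:%M:%S} UTC")

print(f"gap: {first_after_restart - last_before_stop} s")
```

This prints 2023-03-28 05:45:44 UTC and 2023-03-28 05:47:36 UTC, a 112-second window in which the 05:45:45 record disappeared.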
