
Add FIFO queue persistent buffering for fluent bit output plugin #2142

Merged · 3 commits · Jun 1, 2020

Conversation

@dojci (Contributor) commented May 27, 2020

What this PR does / why we need it:
Buffering refers to the ability to store the records somewhere, and while they are processed and delivered, still be able to store more. Loki output plugin in ceratain situation can be blocked by loki client because of design:

  • BatchSize is over the limit
  • The Loki server is unreachable (retries on 429s, 500s, and connection-level errors)

The blocking state is not acceptable with some of the input plugins because it can have undesirable side effects on the component that generates the logs. Fluent Bit implements a buffering mechanism based on parallel processing, so it cannot send logs in order.

This PR implements a buffering mechanism based on dque, which is compatible with the Loki server's strict time ordering and can be enabled via configuration flags:

[Output]
    Name loki
    Match *
    Url http://localhost:3100/loki/api/v1/push
    Buffer true
    BufferType dque
    DqueSegmentSize 8096
    DqueSync false
    DqueDir /tmp/flb-storage/buffer
    DqueName loki.0
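
For readers unfamiliar with dque, here is a minimal sketch of the idea (not this PR's exact code) built directly on github.com/joncrlsn/dque. The record type, its field names, and the label string are illustrative assumptions; only the queue calls (NewOrOpen, TurboOn, Enqueue, Dequeue) are the library's real API, and the arguments mirror the DqueName, DqueDir and DqueSegmentSize settings shown above:

package main

import (
	"fmt"
	"os"
	"time"

	"github.com/joncrlsn/dque"
)

// dqueRecord is a hypothetical, gob-encodable record type; the real plugin
// may name its fields differently.
type dqueRecord struct {
	Labels string
	Ts     time.Time
	Line   string
}

// recordBuilder tells dque how to allocate items when it reads queue
// segments back from disk.
func recordBuilder() interface{} {
	return &dqueRecord{}
}

func main() {
	dir := "/tmp/flb-storage/buffer"
	if err := os.MkdirAll(dir, 0o755); err != nil {
		panic(err)
	}

	// NewOrOpen reuses an existing queue directory, so records buffered
	// before a restart are replayed in FIFO order after the plugin comes back.
	q, err := dque.NewOrOpen("loki.0", dir, 8096, recordBuilder)
	if err != nil {
		panic(err)
	}
	defer q.Close()

	// DqueSync=false roughly corresponds to dque's turbo mode: writes are not
	// fsync'd after every enqueue.
	_ = q.TurboOn()

	// Producer side: instead of handing the record to the Loki client directly
	// (which may block on 429s/5xx), the plugin enqueues it to disk.
	_ = q.Enqueue(&dqueRecord{Labels: `{job="fluent-bit"}`, Ts: time.Now(), Line: "hello"})

	// Consumer side: a single loop drains the queue in order and forwards each
	// record to the Loki client (printing stands in for that here).
	for {
		item, err := q.Dequeue()
		if err == dque.ErrEmpty {
			break
		}
		if err != nil {
			panic(err)
		}
		rec := item.(*dqueRecord)
		fmt.Println(rec.Ts, rec.Labels, rec.Line)
	}
}

Because a single consumer drains the on-disk queue, ordering is preserved end to end, which is what makes this approach compatible with Loki's strict time ordering.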

Which issue(s) this PR fixes:
none

Special notes for your reviewer:
CPU and memory consumption was tested with high throughput/volume logging without observable overhead.

TBD: Tests (any ideas for an acceptable test?)

Checklist

  • Documentation added
  • Tests updated

@CLAassistant commented May 27, 2020

CLA assistant check
All committers have signed the CLA.

@codecov-commenter commented May 27, 2020

Codecov Report

Merging #2142 into master will decrease coverage by 0.40%.
The diff coverage is 13.54%.


@@            Coverage Diff             @@
##           master    #2142      +/-   ##
==========================================
- Coverage   61.25%   60.85%   -0.41%     
==========================================
  Files         146      149       +3     
  Lines       11196    11291      +95     
==========================================
+ Hits         6858     6871      +13     
- Misses       3793     3871      +78     
- Partials      545      549       +4     
Impacted Files Coverage Δ
cmd/fluent-bit/buffer.go 0.00% <0.00%> (ø)
cmd/fluent-bit/client.go 0.00% <0.00%> (ø)
cmd/fluent-bit/dque.go 0.00% <0.00%> (ø)
cmd/fluent-bit/loki.go 82.75% <0.00%> (ø)
cmd/fluent-bit/out_loki.go 5.40% <0.00%> (-0.40%) ⬇️
cmd/fluent-bit/config.go 78.76% <41.93%> (-13.93%) ⬇️

@cyriltovena self-assigned this Jun 1, 2020
@cyriltovena (Contributor) commented:

Can you remove the binary file located at cmd/fluent-bit/fluent-bit, please?

@cyriltovena (Contributor) left a comment

LGTM.

I've left some minor comments, if you take a look and fix them that would be great.

I'm wondering if we should also make those available in the helm chart located at /production/helm/fluent-bit/. I also feel like we might need to mount a host path if this feature is used, so that the queue restarts from where it left off.

WDYT?

@dojci (Contributor, Author) commented Jun 1, 2020

> LGTM.
>
> I've left some minor comments, if you take a look and fix them that would be great.

Done.

> I'm wondering if we should also make those available in the helm chart located at /production/helm/fluent-bit/. I also feel like we might need to mount a host path if this feature is used, so that the queue restarts from where it left off.
>
> WDYT?

I don't know if it makes sense to enable buffering in the default setup. In the production helm chart, fluent-bit uses tail as the input plugin:

[INPUT]
    Name tail
    Tag kube.*
    Path /var/log/containers/*.log
    Parser docker
    DB /run/fluent-bit/flb_kube.db
    Mem_Buf_Limit 5MB

The DB flag is used to keep track of monitored files and offsets, so when Loki is down or another unexpected condition occurs there is no need to use buffering at all (maybe in some edge cases?). Buffering is primarily intended for plugins such as stdin, where blocking is usually undesirable. Anyway, I can also adjust the helm chart configuration so that buffering can be used.

IMHO a host path is already set for FLB:

- name: run
  hostPath:
    path: /run/fluent-bit
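
If the buffer is to survive pod restarts as the reviewer suggests, something along these lines could be added to the chart's DaemonSet; the volume name and mount path below are illustrative and simply mirror the DqueDir setting above, not values that already exist in the chart:

volumeMounts:
  - name: buffer
    mountPath: /tmp/flb-storage/buffer
volumes:
  - name: buffer
    hostPath:
      path: /tmp/flb-storage/buffer
      type: DirectoryOrCreate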

@dojci requested a review from cyriltovena June 1, 2020 15:22
@cyriltovena (Contributor) left a comment

Really, thank you for this!
