
Add FIFO queue persistent buffering for fluent bit output plugin #2142

Merged · 3 commits · Jun 1, 2020

Conversation

@dojci (Contributor) commented May 27, 2020

What this PR does / why we need it:
Buffering refers to the ability to store the records somewhere, and while they are processed and delivered, still be able to store more. Loki output plugin in ceratain situation can be blocked by loki client because of design:

  • BatchSize is over the limit
  • The Loki server is unreachable (retries on 429s, 500s, and connection-level errors)

The blocking state is not acceptable with some of the input plugins because it can have undesirable side effects on the component that generates the logs. Fluent Bit implements a buffering mechanism based on parallel processing, so it cannot send logs in order.

This PR implements a buffering mechanism based on dque, which is compatible with the Loki server's strict time ordering and can be enabled via configuration flags:

[Output]
    Name loki
    Match *
    Url http://localhost:3100/loki/api/v1/push
    Buffer true
    BufferType dque
    DqueSegmentSize 8096
    DqueSync false
    DqueDir /tmp/flb-storage/buffer
    DqueName loki.0
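
For readers unfamiliar with dque, here is a minimal sketch of the idea (not this PR's exact code) built directly on github.com/joncrlsn/dque. The record type, its field names, and the label string are illustrative assumptions; only the queue calls (NewOrOpen, TurboOn, Enqueue, Dequeue) are the library's real API, and the arguments mirror the DqueName, DqueDir and DqueSegmentSize settings shown above:

package main

import (
	"fmt"
	"os"
	"time"

	"github.com/joncrlsn/dque"
)

// dqueRecord is a hypothetical, gob-encodable record type; the real plugin
// may name its fields differently.
type dqueRecord struct {
	Labels string
	Ts     time.Time
	Line   string
}

// recordBuilder tells dque how to allocate items when it reads queue
// segments back from disk.
func recordBuilder() interface{} {
	return &dqueRecord{}
}

func main() {
	dir := "/tmp/flb-storage/buffer"
	if err := os.MkdirAll(dir, 0o755); err != nil {
		panic(err)
	}

	// NewOrOpen reuses an existing queue directory, so records buffered
	// before a restart are replayed in FIFO order after the plugin comes back.
	q, err := dque.NewOrOpen("loki.0", dir, 8096, recordBuilder)
	if err != nil {
		panic(err)
	}
	defer q.Close()

	// DqueSync=false roughly corresponds to dque's turbo mode: writes are not
	// fsync'd after every enqueue.
	_ = q.TurboOn()

	// Producer side: instead of handing the record to the Loki client directly
	// (which may block on 429s/5xx), the plugin enqueues it to disk.
	_ = q.Enqueue(&dqueRecord{Labels: `{job="fluent-bit"}`, Ts: time.Now(), Line: "hello"})

	// Consumer side: a single loop drains the queue in order and forwards each
	// record to the Loki client (printing stands in for that here).
	for {
		item, err := q.Dequeue()
		if err == dque.ErrEmpty {
			break
		}
		if err != nil {
			panic(err)
		}
		rec := item.(*dqueRecord)
		fmt.Println(rec.Ts, rec.Labels, rec.Line)
	}
}

Because a single consumer drains the on-disk queue, ordering is preserved end to end, which is what makes this approach compatible with Loki's strict time ordering.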

Which issue(s) this PR fixes:
none

Special notes for your reviewer:
CPU and memory consumption was tested with high throughput/volume logging without observable overhead.

TBD: Tests (any ideas for an acceptable test?)

Checklist

  • Documentation added
  • Tests updated

@CLAassistant commented May 27, 2020

CLA assistant check
All committers have signed the CLA.

@codecov-commenter commented May 27, 2020

Codecov Report

Merging #2142 into master will decrease coverage by 0.40%.
The diff coverage is 13.54%.


@@            Coverage Diff             @@
##           master    #2142      +/-   ##
==========================================
- Coverage   61.25%   60.85%   -0.41%     
==========================================
  Files         146      149       +3     
  Lines       11196    11291      +95     
==========================================
+ Hits         6858     6871      +13     
- Misses       3793     3871      +78     
- Partials      545      549       +4     
Impacted Files Coverage Δ
cmd/fluent-bit/buffer.go 0.00% <0.00%> (ø)
cmd/fluent-bit/client.go 0.00% <0.00%> (ø)
cmd/fluent-bit/dque.go 0.00% <0.00%> (ø)
cmd/fluent-bit/loki.go 82.75% <0.00%> (ø)
cmd/fluent-bit/out_loki.go 5.40% <0.00%> (-0.40%) ⬇️
cmd/fluent-bit/config.go 78.76% <41.93%> (-13.93%) ⬇️

@cyriltovena self-assigned this Jun 1, 2020
@cyriltovena (Contributor) commented:

Can you remove the binary file located at cmd/fluent-bit/fluent-bit, please?

@cyriltovena (Contributor) left a comment

LGTM.

I've left some minor comments, if you take a look and fix them that would be great.

I'm wondering if we should also make those available in the helm chart located at /production/helm/fluent-bit/. I also feel like we might need to mount a host path if this feature is used, so that the queue restarts from where it left off.

WDYT?

@dojci (Contributor, Author) commented Jun 1, 2020

> LGTM.
>
> I've left some minor comments, if you take a look and fix them that would be great.

Done.

> I'm wondering if we should also make those available in the helm chart located at /production/helm/fluent-bit/. I also feel like we might need to mount a host path if this feature is used, so that the queue restarts from where it left off.
>
> WDYT?

I don't know if it makes sense to enable buffering in the default setup. In the production helm chart, fluent-bit uses tail as the input plugin:

[INPUT]
    Name tail
    Tag kube.*
    Path /var/log/containers/*.log
    Parser docker
    DB /run/fluent-bit/flb_kube.db
    Mem_Buf_Limit 5MB

The DB flag is used to keep track of monitored files and offsets, so when Loki is down or another unexpected condition occurs there is no need to use buffering at all (maybe in some edge cases?). Buffering is primarily intended for plugins such as stdin, where blocking is usually undesirable. Anyway, I can also adjust the helm chart configuration so that buffering can be used.

IMHO a host path is already set for FLB:

- name: run
  hostPath:
    path: /run/fluent-bit
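
If the buffer is to survive pod restarts as the reviewer suggests, something along these lines could be added to the chart's DaemonSet; the volume name and mount path below are illustrative and simply mirror the DqueDir setting above, not values that already exist in the chart:

volumeMounts:
  - name: buffer
    mountPath: /tmp/flb-storage/buffer
volumes:
  - name: buffer
    hostPath:
      path: /tmp/flb-storage/buffer
      type: DirectoryOrCreate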

@dojci requested a review from cyriltovena June 1, 2020 15:22
@cyriltovena (Contributor) left a comment

Really, thank you for this!
