
Tweak memory limits for out_kafka example #18

Merged · 4 commits · Jan 30, 2018

Conversation

@solsson (Contributor) commented Jan 22, 2018

WIP based on discussion in #16

@StevenACoffman commented Jan 24, 2018

I figured out one problem I was having with the tight memory and CPU limits:

From @leahnp on May 17, 2017 at 00:06:

Add a memory limit to the deployment yaml. Test the special case: in long-running clusters with lots of pre-existing logs, the initial workload after deploying Fluent Bit is very heavy before it evens out. If the pod hits the memory limit during this initial processing, it will be continually killed and re-created.

Copied from original issue: samsung-cnct/kraken-logging-fluent-bit-daemonset#5

Moved to samsung-cnct/chart-fluent-bit#9
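
(For context, the limit being discussed is a plain resources block on the Fluent Bit container of the daemonset. The sketch below is illustrative only; the image tag and all numbers are assumptions, not the values from this PR.)

containers:
- name: fluent-bit
  image: fluent/fluent-bit:0.12
  resources:
    requests:
      cpu: 5m
      memory: 10Mi
    limits:
      cpu: 50m
      # Too low a value here means the pod is OOMKilled and restarted
      # repeatedly while it works through the pre-existing log backlog.
      memory: 100Mi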

in particular at pod start on nodes with unprocessed logs
@solsson (Contributor, Author) commented Jan 26, 2018

@edsiper I think @StevenACoffman's observation above is interesting input on the lack of a "check" in out_kafka. Spikes in memory use at pod start are impractical. Can log processing be halted when the kafka buffers hit a size limit? Would it be possible to expose the output buffer size as a Prometheus metric?

In addition I've found a tentative explanation for the log messages Receive failed: Disconnected. See Yolean/kubernetes-kafka#132 (comment). They can be ignored, as they are not an issue with fluent-bit. But, as with request.required.acks, I would like to be able to set librdkafka properties.
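
(For readers landing here later, the kind of configuration asked for would look roughly like the sketch below, assuming a Fluent Bit version whose kafka output forwards rdkafka.-prefixed keys to librdkafka. The broker address, buffer limit and acks value are illustrative assumptions, not settings from this repo. Note that Mem_Buf_Limit only bounds the input side; it does not cap the kafka output buffer.)

[INPUT]
    Name           tail
    Path           /var/log/containers/*.log
    # Pauses the tail input when its in-memory buffer exceeds the limit,
    # which bounds the startup spike from pre-existing logs (input side only).
    Mem_Buf_Limit  5MB

[OUTPUT]
    Name     kafka
    Match    *
    Brokers  kafka-0.broker.kafka.svc.cluster.local:9092
    Topics   ops.kube-logs-fluentbit.stream.json.001
    # librdkafka property passed through verbatim, if this plugin version
    # supports the rdkafka. prefix
    rdkafka.request.required.acks  1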

@solsson (Contributor, Author) commented Mar 1, 2018

I had a case now with 0.9 and the current memory limit (60Mi) where one pod went crashlooping. Raising the limit to 100Mi didn't help, but 200Mi did. The crashes happened too quickly after start for me to get any meaningful metrics out of them. Now that everything is up and running again, Prometheus shows no memory use value above 50Mi.

At the info log level there was nothing out of the ordinary in the pod logs. At debug level, the last lines before the container exited were:

[2018/03/01 20:00:20] [debug] [out_kafka] enqueued message (1171 bytes) for topic 'ops.kube-logs-fluentbit.stream.json.001'
[2018/03/01 20:00:20] [debug] [out_kafka] message delivered (1133 bytes, partition 0)
[2018/03/01 20:00:20] [debug] [in_tail] file=/var/log/containers/integrations-59c6f5bd46-9n8pb_essity_integrations-32fdb2298f4d670dd1e1b5d0b2f0ae7745bd2a25456c7e07ab5e43be82424522.log event
[2018/03/01 20:00:20] [debug] [input tail.0] [mem buf] size = 2137648
[2018/03/01 20:00:20] [debug] [in_tail] file=/var/log/containers/logs-fluentbit-5d55d88694-d7fwf_test-kafka_testcase-d079b114d53aad4f0f894437c1b53a494afa752fc733c22d19838287d8b13c2d.log read=32693 lines=20

Unfortunately I think the log file got truncated before I managed to pull it out from the node, because I see no entries in it from the time of the crash.

I've restored the 60Mi limit. Let's see if this happens again. It's the only unexpected pod restart I've had since this PR was merged.
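
(One way to check what Prometheus actually recorded for the container, assuming cAdvisor metrics are scraped; the label name is an assumption, since older cAdvisor versions export container_name while newer ones use container.)

# Peak Fluent Bit container memory over the last hour, as recorded by Prometheus.
max_over_time(container_memory_usage_bytes{container_name="fluent-bit"}[1h])

Spikes that fall between scrape intervals won't show up in such a query, which fits the observation above that the crashes were too fast to catch.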
