Multiline filter crashes after buffer limit is reached causing high cpu and memory usage #4940
Same issue. I generate test logs at about 10 MB/s, and fluent-bit logs a bunch of errors. Is there any way to work around this? @edsiper |
Hi, I may have found that this issue was introduced by PR #4383. |
We are experiencing exactly the same issue, and the main problem is that we actually need the functionality of the previously mentioned PR for multiline logs. @PettitWesley could you have a look at whether there are any issues in your implementation, or whether there are any workarounds? |
I have added this to my personal backlog, which unfortunately is very long right now. My recommendation is to take a look at past issues involving the rewrite_tag filter; with the new code, the multiline filter with buffer is very similar to rewrite_tag, so this same issue should have arisen with that filter as well. |
So when I send multiline logs at a very high throughput, all I get is this:
Which can be fixed by increasing the mem_buf_limit for the emitter:
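For reference, a minimal sketch of what raising that limit looks like on the multiline filter (the value and parser name here are illustrative, not from the original comment):

[FILTER]
    # Buffered multiline mode re-emits records through an internal emitter
    # input, which has its own memory buffer limit (defaults to a small value).
    Name                   multiline
    Match                  kube.*
    multiline.key_content  log
    multiline.parser       java
    buffer                 On
    # Raise the emitter's buffer so high-throughput bursts are not rejected.
    emitter_mem_buf_limit  100M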
|
Please see #5235. If you have more than one multiline filter definition and they match the same records, it can cause all sorts of trouble. |
So I want to direct everyone to this issue again: #5235. It is not clear to me that there is a widespread use case that truly requires multiple filter definitions for the same logs. Please post in that issue if you do not understand, or if you think you have a use case that requires multiple filter definitions that match the same logs. |
It is not clear to me what you mean by multiple filter definitions. Do you mean more of these here: or multiple configs like the above in different sections:
[FILTER]
I am using the above config because if I try to add the line to the INPUT section like this:
[INPUT]
So this is why I am doing it in 2 stages:
|
@staniondaniel Multiple filters means what I show in the issue, like this:
Instead, the CORRECT way is what you show, to have multiple parsers in a comma list.
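A minimal sketch of that correct form (parser names illustrative): a single [FILTER] definition whose multiline.parser lists several parsers in a comma-separated list, instead of several [FILTER] blocks matching the same tag:

[FILTER]
    Name                   multiline
    Match                  kube.*
    multiline.key_content  log
    # One filter definition, multiple parsers in a comma list
    multiline.parser       go, java, python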
That's interesting. In this case what you are doing is fine; you only have one filter definition. It's fine to use both the filter and the tail input. |
I found there is a new
|
@erenming With buffer off, the filter will only work with a few inputs, mainly tail. As noted in the docs. |
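For reference, a minimal sketch of that buffer-off setup paired with a tail input (paths, tags, and parser choice are illustrative):

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    multiline.parser  docker, cri

[FILTER]
    Name                   multiline
    Match                  kube.*
    multiline.key_content  log
    multiline.parser       java
    # With buffer Off the filter concatenates records in-stream and does not
    # go through the internal emitter input; per the docs this mode only
    # works with a few inputs, mainly tail.
    buffer                 Off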
@PettitWesley Thanks for the reminder. Actually, we only use tail, so I think buffer off will be okay.
and sample log:
Is there any mistake in my configuration? |
@PettitWesley, as an update to the original problem: after updating to the latest Fluent Bit version I noticed that when using only "multiline.parser docker,cri" in the INPUT section, without any [FILTER] config, like below:
the pods no longer crash due to high memory usage. As soon as I add the [FILTER] config as below:
the problem reoccurs and the memory slowly rises until it hits the buffer limit, which causes the logs to be flooded with : From what I noticed, this is simply an issue of the multiline filter not being able to process the data fast enough, or not releasing the buffer once the data is processed, which causes high CPU usage and eventually leads to a crash. |
@PettitWesley We have been facing the same issue with aws-for-fluent-bit:2.25.0. Our custom Fluent Bit configuration ([SERVICE] section and FireLens logConfiguration) is below. The logs do not show any error; it just fails with a segmentation fault.
|
Sorry folks, I may not be able to take a look at this immediately. If someone is willing to try these techniques to get a stack trace for me that would help speed up the fix: https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#segfaults-and-crashes-sigsegv |
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the |
Not stale. |
unstale |
Having the same issue here. The container uses all of its requested CPU, causing high CPU load. service: |
[SERVICE]
Daemon Off
Flush 1
Log_Level info
Parsers_File parsers.conf
Parsers_File custom_parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
Health_Check On
inputs: |
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
DB /var/log/flb_kube.db
Mem_Buf_Limit 128MB
Skip_Long_Lines On
Refresh_Interval 10
Buffer_Chunk_Size 256k
Buffer_Max_Size 100MB
multiline.parser docker, cri
filters: |
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Kube_Tag_Prefix kube.var.log.containers.
Merge_Log Off
Merge_Log_Key log_processado
Merge_Log_Trim On
Keep_Log Off
K8S-Logging.Parser Off
K8S-Logging.Exclude Off
Labels On
Annotations Off
[FILTER]
Name multiline
Match kube.*
multiline.key_content log
multiline.parser go, java, python
buffer On
emitter_mem_buf_limit 64M
fluentbit 1.9.9 |
I'm facing the same issue when using the multiline parser together with the rewrite_tag filter. I'm using the tail input:
And using the rewrite_tag filter like this:
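(The original snippet is not reproduced here; a hypothetical rewrite_tag filter of this shape, with made-up tag, key, and regex names, would look like the following.)

[FILTER]
    Name                   rewrite_tag
    Match                  kube.*
    # If the 'log' field matches the regex, re-emit the record under a new
    # tag so it can be routed to a separate output; the trailing "false"
    # drops the original record. Names and regex are hypothetical.
    Rule                   $log ^.*special-marker.* special.$TAG false
    # rewrite_tag uses the same internal emitter mechanism as the buffered
    # multiline filter, so it has its own emitter buffer limit as well.
    Emitter_Name           re_emitted
    Emitter_Mem_Buf_Limit  20M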
I can't really use the exclude filter to drop these logs, as I need them, but I want them to go to a different output than the other logs. This is my complete Fluent Bit configuration:
|
I know it's ugly, but since I really need this working, I set up a CronJob to rollout-restart the fluent-bit pods every day at 03:00 AM. |
@PettitWesley Any idea if a similar solution is supported for AWS ECS Fargate, as suggested by @jujubetsz? |
I'm currently making some unrelated improvements that involve modifying the multiline core machinery, so if anyone is able to put together a somewhat reliable reproduction (it would be awesome if it were minimized, but understandable if that's not possible) with sample input data, I could look into this. If you need help putting together the reproduction case, just tag me here or talk to me in Slack. |
@rajeev-netomi I don't recommend restarting FLB at a certain time if it's crashing... I'm not sure of a good solution here, but I think you want to restart after it crashes. Or see if you can find an older version which doesn't crash. Sorry! |
Same issue: upgraded from 2.0.9 to 2.1.2 and immediately saw high CPU, pod restarts, and "error registering chunk with tag" messages. The issue is gone once downgrading back to 2.0.9. I have just one multiline [FILTER] with a custom [MULTILINE_PARSER] in the config. |
Hi terence. |
Tried Fluent Bit v2.1.9 - I don't see the "error registering chunk" errors anymore, but I still have the high CPU problem. |
Is there any update on the status of this issue? I would say this is pretty critical. |
Hello everyone. I still get an error on version 2.1.9. I request assistance in resolving this issue. I've tried various configurations, and the problem occurred after upgrading from 1.8.6 to 2.1.x. |
I must say it is strange that no one seems to be looking into this. |
I installed version 2.1.10 but I still have the problem of high CPU consumption. I downgraded to 1.9.10. |
Hi Luciano. From what I tested, the problem started after 2.0.9, so if you like you can try that version. |
Hi Ryan, I confirm that version 2.0.9 doesn't have the CPU issue. |
Same here: 2.0.9 works; tried 2.1.x without success; haven't tried 2.2.x yet. |
2.2.2 is still not working |
I'm absolutely shocked that no one is handling this issue. Quite worrying; we simply can't upgrade. |
Any workaround or update on the above issue? |
I took a look at this issue today. The root problem: the internal

This problem exists for both multiline and rewrite_tag, as their usage of the emitter (just adding records directly to it) is the same. I opened a PR, shown above, that cuts out the middleman and checks whether the emitter is paused before even trying to write to it. This will at least stop the spamming of the error log.

Technically this issue is about high CPU, and this PR does not solve that. The multiline processor is very CPU intensive in general, so when it runs as a filter on the main thread it is no wonder it takes a whole CPU. I'm not sure what the solution to that is.

As a user, the design of this plugin makes it basically necessary to take steps to make it resilient to higher throughputs. This is much the same as what you'd need to do for any input plugin; it's just a bit odd because you wouldn't think of needing those kinds of strategies on a filter plugin. However, to use this plugin effectively, it is essentially necessary to understand that under-the-hood detail of the

Here are the solutions I would recommend, which are not dissimilar to those for any input plugin:
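One of those strategies, mentioned later in this thread, is switching the filter's internal emitter from memory to filesystem buffering. A minimal sketch under that assumption (values illustrative, option names per the multiline filter docs, not from the original comment):

[SERVICE]
    # Filesystem buffering requires a storage path at the service level
    storage.path  /var/log/flb-storage/

[FILTER]
    Name                   multiline
    Match                  kube.*
    multiline.key_content  log
    multiline.parser       java
    buffer                 On
    # Spill the emitter's re-emitted records to disk instead of holding them
    # all in memory, so a burst does not exhaust the memory buffer.
    emitter_storage.type   filesystem
    emitter_mem_buf_limit  64M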
|
@braydonk: I switched the emitter to filesystem buffering but ran into another issue where the Kubernetes Fluent Bit did not recover after a Fluentd restart; chunks were stuck in storage.
I repeated the same steps above, with one addition of restarting the Fluent Bit service, and saw the chunks were processed in less than 10 minutes. |
@saurabhgcect This sounds unrelated to this issue. I'd recommend opening a separate issue with config and logs included. |
#8473 is the solution that should be used for this. I've closed my PR in favour of it. |
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the |
This issue was closed because it has been stalled for 5 days with no activity. |
I believe this should be fixed as of v3.0.3 which has #8473. I'll re-open and close this properly as the issue should be fixed, but if it is not then feel free to re-open with details. |
Bug Report
Describe the bug
Hello
The multiline filter crashes on pods that generate a large amount of logs after reaching Emitter_Mem_Buf_Limit. On pods with a normal/low volume of logs it works without problems.
To Reproduce
This is my configuration (I left only the relevant parts):
Your Environment
Additional context
The Fluent Bit container keeps crashing after it reaches the memory limit configured for that container. Also, a lot of logs like
[error] [input:emitter:emitter_for_multiline.0] error registering chunk with tag:
are flooding the Fluent Bit logs.