Fluentd have big Recv-Queue #3911

Closed
zvlb opened this issue Oct 6, 2022 · 1 comment

zvlb commented Oct 6, 2022

Describe the bug

I have a K8s cluster where I deploy:

  • Fluent Bit DaemonSet
  • Fluentd StatefulSet
    (I'm using Logging-operator to deploy it)

Fluent Bit sends all logs to Fluentd. Fluentd processes the logs and sends them all to Elastic.
In my installation, I have 50 Fluentd pods.
Periodically I see the following in the Fluent Bit logs:

[2022/10/01 06:02:46] [error] [upstream] connection #1158 to fluentd:24240 timed out after 10 seconds
[2022/10/01 06:02:46] [error] [output:forward:forward.0] no upstream connections available
[2022/10/01 06:02:46] [ warn] [engine] failed to flush chunk '1-1664603612.450502668.flb', retry in 8 seconds: task_id=658, input=tail.0 > output=forward.0 (out_id=0)

When I check FluentD I see a big Recv-Q:

$ netstat -ntpl
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:24444         0.0.0.0:*               LISTEN      7/ruby
tcp        0      0 0.0.0.0:24231           0.0.0.0:*               LISTEN      166916/ruby
tcp     1022      0 0.0.0.0:24240           0.0.0.0:*               LISTEN      166916/ruby
tcp        0      0 :::9533                 :::*                    LISTEN      -

and sometimes Fluentd stops listening on port 24240.
How can I fix it?

To Reproduce

Install Logging-Operator with many Flows (more than 3000).

Expected behavior

Everything works without a Recv-Q backlog and without errors in Fluent Bit.

Your Environment

- Fluentd version: 1.14.6
- Operating system: Alpine Linux v3.14
- Kernel version: 5.4.0-105-generic

Your Configuration

I have a very large fluentd.conf (more than 10 MB).

Your Error Log

[2022/10/01 06:02:46] [error] [upstream] connection #1158 to fluentd:24240 timed out after 10 seconds
[2022/10/01 06:02:46] [error] [output:forward:forward.0] no upstream connections available
[2022/10/01 06:02:46] [ warn] [engine] failed to flush chunk '1-1664603612.450502668.flb', retry in 8 seconds: task_id=658, input=tail.0 > output=forward.0 (out_id=0)


Additional context

No response

@fujimotos
Member

> When I check FluentD I see a big Recv-Q:

This is very likely a deployment issue. Fluentd is simply overloaded.

You need to distribute the load by adding more instances, or throttle the
incoming flow so that Fluentd can keep up with the data volume.
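
For context: on Linux, the Recv-Q column for a socket in the LISTEN state reports the backlog of pending connections rather than unread bytes. A value like 1022 on port 24240 therefore suggests that Fluentd is not accepting connections as fast as Fluent Bit opens them. To check whether the backlog shrinks after scaling out or throttling, something like the following could be used. This is a minimal sketch, not part of Fluentd. The function name `listen_backlogs` is hypothetical, and it assumes the `netstat -ntpl` column layout shown in the report.

```python
def listen_backlogs(netstat_output):
    """Return {local_address: recv_q} for LISTEN sockets whose Recv-Q is non-zero.

    Assumes the column layout of `netstat -ntpl`:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
    """
    backlogs = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a TCP socket line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "LISTEN":
            continue
        recv_q = int(fields[1])
        if recv_q > 0:
            backlogs[fields[3]] = recv_q
    return backlogs


# Sample input copied from the netstat output in the report.
sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:24444         0.0.0.0:*               LISTEN      7/ruby
tcp        0      0 0.0.0.0:24231           0.0.0.0:*               LISTEN      166916/ruby
tcp     1022      0 0.0.0.0:24240           0.0.0.0:*               LISTEN      166916/ruby
tcp        0      0 :::9533                 :::*                    LISTEN      -
"""

print(listen_backlogs(sample))  # {'0.0.0.0:24240': 1022}
```

Running this periodically against live `netstat -ntpl` output on each Fluentd pod would show which instances are falling behind. An empty result means no listening socket has pending connections at that moment.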
