Check the remote connection is available in the in_forward plugin #2352




@akihiro17 commented Mar 28, 2019

Which issue(s) this PR fixes:

What this PR does / why we need it:

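I'd like to detect half-open connections and stop them from accumulating, because each one holds a file descriptor open. Once enough of them pile up, the process hits its file descriptor limit and fails with a "No file descriptors available" (Errno::EMFILE) error.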

Example stack trace:

"Unexpected error raised. Stopping the timer. title=:child_process_execute error_class=Errno::EMFILE error=\"No file descriptors available - ruby\"",
"/usr/lib/ruby/2.4.0/open3.rb:199:in `spawn'",
"/usr/lib/ruby/2.4.0/open3.rb:199:in `popen_run'",
"/usr/lib/ruby/2.4.0/open3.rb:95:in `popen3'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/fluentd-1.2.2/lib/fluent/plugin_helper/child_process.rb:265:in `child_process_execute_once'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/fluentd-1.2.2/lib/fluent/plugin_helper/child_process.rb:96:in `block in child_process_execute'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/fluentd-1.2.2/lib/fluent/plugin_helper/child_process.rb:114:in `block in child_process_execute'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/fluentd-1.2.2/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/ `run_once'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/ `run'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/fluentd-1.2.2/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/fluentd-1.2.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'",
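One way to see whether a process is heading toward this error is to compare its open descriptor count with its soft `RLIMIT_NOFILE` limit. The following is a minimal diagnostic sketch, not part of this PR. It assumes Linux, where `/proc/self/fd` lists the process's open descriptors, and `fd_usage` is a hypothetical helper name.

```ruby
# Hypothetical diagnostic helper (not part of fluentd or this PR).
# Compares the number of open file descriptors with the soft
# RLIMIT_NOFILE limit. Once the count reaches that limit, calls that need a
# new descriptor (e.g. spawning a child process) fail with Errno::EMFILE.
def fd_usage
  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  # Assumption: Linux /proc layout. Listing the directory briefly opens one
  # extra descriptor, so the count can be off by one. nil on other systems.
  open_fds = Dir.exist?("/proc/self/fd") ? Dir.children("/proc/self/fd").size : nil
  { open: open_fds, soft_limit: soft, hard_limit: hard }
end

if __FILE__ == $PROGRAM_NAME
  usage = fd_usage
  puts "open: #{usage[:open].inspect}, soft limit: #{usage[:soft_limit]}"
end
```

Running this periodically (or reading the same data externally) shows whether the descriptor count keeps climbing toward the soft limit.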

In my case, half-open connections occur as follows:

forwarder => AWS Network Load Balancer (NLB) => aggregator
  1. The forwarder sends log data to the aggregator through the NLB.
  2. After the connection has been idle for more than 350 seconds (the NLB idle timeout), the forwarder sends log data to the aggregator again.
  3. The forwarder receives an RST packet from the NLB because of the idle timeout.
    • The NLB does not send an RST packet to the other side (the aggregator).
    • According to the AWS documentation (Connection Idle Timeout), this appears to be the expected behavior.
  4. The forwarder reconnects to the aggregator and sends the log data again.
  5. The aggregator now has two connections it considers active: the one established in step 1 and the one established in step 4.
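The accumulation in step 5 can be illustrated with a toy model. This is a hypothetical sketch, not fluentd or AWS code. It assumes that every idle gap longer than the 350-second NLB idle timeout leads to exactly one RST-and-reconnect cycle, and that the aggregator never notices the stale socket (the situation before this change).

```ruby
# Toy model (hypothetical, not fluentd code) of the scenario above:
# each time the forwarder stays idle longer than the NLB idle timeout,
# the NLB drops the flow, resets only the forwarder's side, the forwarder
# reconnects, and the aggregator is left holding one more stale socket.
NLB_IDLE_TIMEOUT = 350 # seconds

# send_times: ascending times (in seconds) at which the forwarder sends data.
# Returns how many sockets the aggregator ends up holding open.
def aggregator_open_sockets(send_times, idle_timeout: NLB_IDLE_TIMEOUT)
  return 0 if send_times.empty?

  open_sockets = 1 # step 1: the initial connection
  send_times.each_cons(2) do |prev, cur|
    # steps 2-4: idle gap exceeded, RST to the forwarder only, then reconnect
    open_sockets += 1 if cur - prev > idle_timeout
  end
  open_sockets
end

p aggregator_open_sockets([0, 100, 500, 1000]) # prints 3
```

Sends at 0, 100, 500 and 1000 seconds contain two gaps longer than 350 seconds, so the model reports 3 sockets on the aggregator while the forwarder holds only 1. Without a way to detect the stale ones, the count grows with every long idle period.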

To fix this, the commit introduces a send_keepalive_packet parameter. When it is enabled, in_forward checks whether remote connections are still alive by sending TCP keepalive packets, so that half-open connections can be detected.
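At the socket level, sending keepalive packets amounts to setting `SO_KEEPALIVE` on each accepted connection. The sketch below uses plain Ruby sockets. It assumes this is roughly what enabling `send_keepalive_packet` does, but it is not the code from this PR, and `enable_keepalive` and `keepalive_enabled?` are hypothetical helper names.

```ruby
require "socket"

# Hypothetical helper: ask the kernel to send TCP keepalive probes on this
# connection once it has been idle (timing comes from OS-level settings).
def enable_keepalive(sock)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
  sock
end

# Read the option back to confirm it is set.
def keepalive_enabled?(sock)
  sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE).bool
end

if __FILE__ == $PROGRAM_NAME
  server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
  client = TCPSocket.new("127.0.0.1", server.addr[1])
  conn = server.accept
  enable_keepalive(conn)
  puts keepalive_enabled?(conn) # prints true
  [conn, client, server].each(&:close)
end
```

The option is set on the server-side (aggregator) socket, since that is the side the NLB never resets.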

This commit relies on OS-level parameters to configure how TCP keepalive behaves.
On Linux, these are tcp_keepalive_time, tcp_keepalive_probes, and tcp_keepalive_intvl:

$ sysctl -A | grep "net.ipv4.tcp_keepalive"
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_time = 7200
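With keepalive enabled, Linux starts probing after tcp_keepalive_time seconds of idleness, sends up to tcp_keepalive_probes probes tcp_keepalive_intvl seconds apart, and drops the connection if none is answered. The sketch below estimates that worst-case detection time from the sysctls above. The helper names are hypothetical, and it assumes Linux `/proc/sys` paths, falling back to the values shown above elsewhere.

```ruby
# Hypothetical helpers for estimating how long Linux TCP keepalive takes to
# give up on a peer that never answers probes.
SYSCTL_DIR = "/proc/sys/net/ipv4"

# Read an integer sysctl; return the fallback if it is unavailable
# (e.g. when not running on Linux).
def read_sysctl(name, fallback)
  Integer(File.read(File.join(SYSCTL_DIR, name)).strip)
rescue SystemCallError, ArgumentError
  fallback
end

# Idle time before the first probe, plus one interval per unanswered probe.
def keepalive_detection_seconds(time:, intvl:, probes:)
  time + intvl * probes
end

if __FILE__ == $PROGRAM_NAME
  secs = keepalive_detection_seconds(
    time:   read_sysctl("tcp_keepalive_time", 7200),
    intvl:  read_sysctl("tcp_keepalive_intvl", 75),
    probes: read_sysctl("tcp_keepalive_probes", 5)
  )
  puts "worst-case detection: #{secs} s"
end
```

With the values shown above this gives 7200 + 5 × 75 = 7575 seconds (about 2 hours 6 minutes). That is an upper bound for a peer that stays silent. If a probe is instead answered with an RST, as the forwarder's data was in step 3 above, the kernel resets the connection as soon as that reply arrives.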


Docs Changes:

Needs to be updated.

Release Note:

Signed-off-by: akihiro17 <>
@akihiro17 force-pushed the check-remote-connection-is-available branch from 4a838d6 to a79c6d2 Mar 28, 2019
@repeatedly self-assigned this Mar 28, 2019
@repeatedly merged commit 6172d5c into fluent:master Mar 28, 2019
1 of 3 checks passed

@repeatedly commented Mar 28, 2019


