Check the remote connection is available in the in_forward plugin #2352

Merged

Conversation

@akihiro17
Contributor

commented Mar 28, 2019

Which issue(s) this PR fixes:

What this PR does / why we need it:

I'd like to detect half-open connections and prevent them from consuming too many file descriptors, because this can lead to a "No file descriptors available" error.

Example stack trace:

"Unexpected error raised. Stopping the timer. title=:child_process_execute error_class=Errno::EMFILE error=\"No file descriptors available - ruby\"",
"/usr/lib/ruby/2.4.0/open3.rb:199:in `spawn'",
"/usr/lib/ruby/2.4.0/open3.rb:199:in `popen_run'",
"/usr/lib/ruby/2.4.0/open3.rb:95:in `popen3'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/fluentd-1.2.2/lib/fluent/plugin_helper/child_process.rb:265:in `child_process_execute_once'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/fluentd-1.2.2/lib/fluent/plugin_helper/child_process.rb:96:in `block in child_process_execute'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/fluentd-1.2.2/lib/fluent/plugin_helper/child_process.rb:114:in `block in child_process_execute'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/fluentd-1.2.2/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/cool.io-1.5.3/lib/cool.io/loop.rb:88:in `run_once'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/cool.io-1.5.3/lib/cool.io/loop.rb:88:in `run'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/fluentd-1.2.2/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'",
"/fluentd/etc/vendor/bundle/ruby/2.4.0/gems/fluentd-1.2.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'",

In my case, half-open connections occur as follows:

forwarder => AWS Network Load Balancer (NLB) => aggregator
  1. The forwarder sends log data to the aggregator through the NLB.
  2. After 350 seconds (the NLB idle timeout) of inactivity, the forwarder sends log data to the aggregator again.
  3. The forwarder receives an RST packet from the NLB because of the idle timeout.
    • The NLB does not send an RST packet to the other side (the aggregator).
    • According to the documentation, this seems to be the expected behavior:
    • Connection Idle Timeout
  4. The forwarder reconnects to the aggregator and sends the log data again.
  5. The aggregator now holds two connections that it considers active, the ones established in steps 1 and 4, although the connection from step 1 is half-open.

To fix the issue described above, this commit introduces the send_keepalive_packet parameter, which checks that remote connections are still available by sending TCP keepalive packets.
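
As a rough sketch, enabling the new parameter on the aggregator might look like the following. The port and bind values are placeholder assumptions, not taken from this PR.

```
<source>
  @type forward
  port 24224
  bind 0.0.0.0
  # Parameter introduced by this PR; enables TCP keepalive on accepted connections
  send_keepalive_packet true
</source>
```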

This commit relies on OS-level parameters to configure how TCP keepalive behaves.
On Linux, for example, these are tcp_keepalive_time, tcp_keepalive_probes, and tcp_keepalive_intvl.

$ sysctl -A | grep "net.ipv4.tcp_keepalive"
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_time = 7200
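
To illustrate the mechanism, here is a minimal Ruby sketch of turning on TCP keepalive for an accepted connection using only the standard socket library. It is not the code from this PR: accept_with_keepalive is a hypothetical helper, and the loopback server exists only to make the example self-contained.

```ruby
require 'socket'

# Hypothetical helper (not fluentd's actual code): accept one connection
# and enable TCP keepalive on it, so the OS probes an idle peer.
def accept_with_keepalive(server)
  conn = server.accept
  conn.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
  conn
end

# Throwaway loopback server; port 0 lets the OS choose a free port.
server = TCPServer.new('127.0.0.1', 0)
client = TCPSocket.new('127.0.0.1', server.addr[1])
conn = accept_with_keepalive(server)

# Read the option back to confirm it was applied.
keepalive_on = conn.getsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE).bool
puts "SO_KEEPALIVE enabled: #{keepalive_on}"

[conn, client, server].each(&:close)
```

With the Linux values shown above, and assuming the peer never answers, an idle connection would be declared dead after roughly tcp_keepalive_time + tcp_keepalive_probes × tcp_keepalive_intvl = 7200 + 5 × 75 = 7575 seconds, so lowering these sysctl values may be needed for timely detection.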

ref

Docs Changes:

The in_forward documentation needs to be updated:

https://github.com/fluent/fluentd-docs/blob/master/docs/v1.0/in_forward.txt

Release Note:

Check the remote connection is available in the in_forward plugin
Signed-off-by: akihiro17 <coolwizard11@gmail.com>

@akihiro17 akihiro17 force-pushed the akihiro17:check-remote-connection-is-available branch from 4a838d6 to a79c6d2 Mar 28, 2019

@repeatedly repeatedly self-assigned this Mar 28, 2019

@repeatedly repeatedly merged commit 6172d5c into fluent:master Mar 28, 2019

1 of 3 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
continuous-integration/travis-ci/pr The Travis CI build is in progress
DCO DCO
@repeatedly

Member

commented Mar 28, 2019

Thx!
