Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fluentd fails to start if gelf tcp output cannot connect #28

Open
ivan-mjch opened this issue Jan 2, 2017 · 7 comments
Open

Fluentd fails to start if gelf tcp output cannot connect #28

ivan-mjch opened this issue Jan 2, 2017 · 7 comments

Comments

@ivan-mjch
Copy link

When fluentd is configured with a tcp gelf output, that cannot connect,
fluentd errors out, failing to start.

Centos 7 td-agent init scripts keep restarting fluentd, but it is not the expected behavior, as it halts all fluentd log processing and possibly results in message loss even in unrelated pipelines.

It may be related to #7, as fluentd core seems to be interacting improperly with the gelf library output connection. The problem may be in a difference between what fluentd expects and how gelf library behaves, that the plugin doesn't account for, but I don't know much about them.

2017-01-02 23:09:24 +0100 [error]: unexpected error error_class=Errno::ECONNREFUSED error="Connection refused - connect(2) for \"192.168.36.4\" port 12201"
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `initialize'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `new'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `connect'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `block (2 levels) in addresses='
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `map'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `block in addresses='
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:13:in `tap'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:13:in `addresses='
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:9:in `initialize'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:138:in `new'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:138:in `create_sender'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:36:in `initialize'
  2017-01-02 23:09:24 +0100 [error]: /etc/td-agent/plugin/out_gelf.rb:42:in `new'
  2017-01-02 23:09:24 +0100 [error]: /etc/td-agent/plugin/out_gelf.rb:42:in `start'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/compat/call_super_mixin.rb:42:in `start'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:138:in `block in start'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:117:in `block (2 levels) in lifecycle'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:117:in `block (2 levels) in lifecycle'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:116:in `each'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:116:in `block in lifecycle'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:109:in `each'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:109:in `lifecycle'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:116:in `block in lifecycle'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:113:in `each'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:113:in `lifecycle'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:137:in `start'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/engine.rb:211:in `start'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/engine.rb:175:in `run'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:697:in `run_engine'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:454:in `block in run_worker'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:626:in `call'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:626:in `main_process'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:449:in `run_worker'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/command/fluentd.rb:288:in `<top (required)>'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/bin/fluentd:5:in `<top (required)>'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/bin/fluentd:23:in `load'
  2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/bin/fluentd:23:in `<top (required)>'
  2017-01-02 23:09:24 +0100 [error]: /usr/sbin/td-agent:7:in `load'
  2017-01-02 23:09:24 +0100 [error]: /usr/sbin/td-agent:7:in `<main>'
2017-01-02 23:09:24 +0100 [error]: unexpected error error="Connection refused - connect(2) for \"192.168.36.4\" port 12201"
  2017-01-02 23:09:24 +0100 [error]: suppressed same stacktrace
@pguridi
Copy link

pguridi commented Feb 20, 2018

Hi @ivan-mjch , I also see this issue. I was trying shutting down Graylog to see if the retry works, and kind of worked:

2018-02-20 00:40:59 +0000 [warn]: #0 suppressed same stacktrace
2018-02-20 00:41:01 +0000 [warn]: #0 failed to flush the buffer. retry_time=2 next_retry_seconds=2018-02-20 00:41:01 +0000 chunk="5659a1017df2404f87465b9399491e6f" error_class=Errno::ECONNREFUSED error="Connection refused - connect(2) for \"graylog-staging.xxx.com\" port 12202"

But then I tried restarting Fluentd for a different reason, and started to get the errors you posted. Seems that as long as Fluentd is not restarted, Graylog connection errors will not break it.

@KevDBG
Copy link

KevDBG commented Jun 21, 2018

Hello !

The same problem here, we tried to secure our log. I try to explain, we use graylog to handle our logs so we use the gelf plugin.

We would like to send our logs to graylog in normal situation, and if graylog falls keep the logs in a buffer file during this downtime.

Currently our configuration seems to works:

<match **>
  @type copy
  <store>
    @type gelf
    host "IP_GRAYLOG"
    port XXXXX
    protocol tcp
    #flush_interval 5s
    flush_at_shutdown false
    <buffer>
      @type file
      path /var/log/td-agent/buffer/graylog_host
      flush_mode immediate
      retry_wait 1                       # The wait interval for the first retry.
      retry_exponential_backoff_base 1.1 # Inclease the wait time by a factor of N.
      retry_type exponential_backoff     # Set 'periodic' for constant intervals.
      retry_max_interval 5m              # Cap the wait interval. (see above)
      retry_randomize true               # Apply randomization. (see above)
      retry_timeout 72h                  # Maximum duration before giving up.
      # retry_max_times 17               # Maximum retry count before giving up.
      retry_forever false                # Set 'true' for infinite retry loops.
      retry_secondary_threshold 0.8      # See the "Secondary Output" section in
      total_limit_size 4GB               # The size limitation of this buffer plugin instance
#      compress gzip                      # The option to specify compression of each chunks, during events are buffered
    </buffer>
  </store>

</match>

But we have the same issue reported here. We stop td-agent/fluentd service, we stop graylog and we try to start again the td-agent/fluentd plugin.

When we do that, we have the same behavior (loop) :

2018-06-21 17:46:37 +0200 [error]: #0 unexpected error error_class=Errno::ECONNREFUSED error="Connection refused - connect(2) for \"IP_GRAYLOG\" port XXXXX"
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `initialize'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `new'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `connect'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `block (2 levels) in addresses='
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `map'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `block in addresses='
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:13:in `tap'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:13:in `addresses='
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:9:in `initialize'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:139:in `new'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:139:in `create_sender'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:37:in `initialize'
  2018-06-21 17:46:37 +0200 [error]: #0 /etc/td-agent/plugin/out_gelf.rb:35:in `new'
  2018-06-21 17:46:37 +0200 [error]: #0 /etc/td-agent/plugin/out_gelf.rb:35:in `start'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/compat/call_super_mixin.rb:42:in `start'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:165:in `block in start'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:154:in `block (2 levels) in lifecycle'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:153:in `each'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:153:in `block in lifecycle'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:140:in `each'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:140:in `lifecycle'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:164:in `start'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/engine.rb:274:in `start'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/engine.rb:219:in `run'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:774:in `run_engine'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:523:in `block in run_worker'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:699:in `main_process'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:518:in `run_worker'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/command/fluentd.rb:316:in `<top (required)>'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/bin/fluentd:8:in `<top (required)>'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/bin/fluentd:22:in `load'
  2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/bin/fluentd:22:in `<main>'

Any update on this issue ?

Many thanks,

@KevDBG
Copy link

KevDBG commented Jul 9, 2018

Hello,

This project is dead ?

Regards,

@emsearcy
Copy link
Owner

emsearcy commented Jul 9, 2018

I haven't had much time to work on this ... you may want to look at one of the forks for your immediate needs. I am planning to rename my "parent" repo into an organization so we can get more maintainers involved directly.

@hleroy
Copy link

hleroy commented Jan 4, 2019

Hi @emsearcy, did you have a chance to look at this issue? The fork fluent-plugin-gelf-hs has the same issue.

@shinebayar-g
Copy link

Sadly that fork disabled issues tab.. and this project isn't actively maintained :(

@emsearcy
Copy link
Owner

emsearcy commented Apr 4, 2019

yeah, I wish I had more time! I'm also not using Graylog any more ... I still love Graylog and Fluentd—both awesome tools—but it's hard to put in the time on a project I'm not running and have no live environments to test with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants