
undefined method 'key'? #1315

Closed
dtboctor opened this issue Nov 15, 2016 · 45 comments

Comments

@dtboctor

I'm having some issues with fluentd 0.14.6. It was running okay for a while, but a few days ago it started giving me the following error on startup:

2016-11-15_22:32:30.03238 2016-11-15 14:32:30 -0800 [debug]: plugin/output.rb:787:try_flush: trying flush for a chunk chunk="54135448d0359bdcbd2bad60a78b2525"
2016-11-15_22:32:30.03263 2016-11-15 14:32:30 -0800 [warn]: plugin/out_kafka_buffered.rb:239:rescue in write: Send exception occurred: undefined method `key' for #<Fluent::Plugin::Buffer::FileChunk:0x007fb1fafc4988>

This is on CentOS. I've found references to this issue in older versions, but I'm not sure why I am getting it now. Is there anything I can send you that might explain what is happening?

@dtboctor
Author

Here is the backtrace

2016-11-15_23:57:37.99929 2016-11-15 15:57:37 -0800 [warn]: plugin/out_kafka_buffered.rb:266:rescue in write: Exception Backtrace : /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluent-plugin-kafka-0.4.0/lib/fluent/plugin/out_kafka_buffered.rb:207:in `write'
2016-11-15_23:57:37.99929 /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/compat/output.rb:129:in `write'
2016-11-15_23:57:37.99929 /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin/output.rb:995:in `try_flush'
2016-11-15_23:57:37.99930 /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin/output.rb:1188:in `flush_thread_run'
2016-11-15_23:57:37.99930 /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin/output.rb:393:in `block (2 levels) in start'
2016-11-15_23:57:37.99930 /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin_helper/thread.rb:66:in `block in thread_create'

@repeatedly
Member

You said v0.14.6, but your backtrace shows you are using v0.14.8.
Does that mean you hit this error on both v0.14.6 and v0.14.8?

@dtboctor
Author

Yeah, I recently updated to try to fix the error, but no luck.

@repeatedly
Member

Could you paste the entire log?

@dtboctor
Author

2016-11-15_23:57:33.07173 2016-11-15 15:57:33 -0800 [info]: fluent/supervisor.rb:504:supervise: starting fluentd-0.14.8
2016-11-15_23:57:33.07182 2016-11-15 15:57:33 -0800 [info]: fluent/supervisor.rb:525:supervise: spawn command to main: /opt/rbenv/versions/2.1.9/bin/ruby -Eascii-8bit:ascii-8bit /opt/rbenv/versions/2.1.9/bin/fluentd -c /etc/fluentd.conf -v --under-supervisor
2016-11-15_23:57:33.34451 2016-11-15 15:57:33 -0800 [info]: fluent/supervisor.rb:656:read_config: reading config file path="/etc/fluentd.conf"
2016-11-15_23:57:33.34647 2016-11-15 15:57:33 -0800 [info]: fluent/supervisor.rb:447:run_worker: starting fluentd-0.14.8 without supervision
2016-11-15_23:57:33.34702 2016-11-15 15:57:33 -0800 [info]: fluent/engine.rb:116:block in configure: gem 'fluent-plugin-kafka' version '0.4.0'
2016-11-15_23:57:33.34702 2016-11-15 15:57:33 -0800 [info]: fluent/engine.rb:116:block in configure: gem 'fluent-plugin-kafka' version '0.3.1'
2016-11-15_23:57:33.34703 2016-11-15 15:57:33 -0800 [info]: fluent/engine.rb:116:block in configure: gem 'fluent-plugin-kafka' version '0.1.3'
2016-11-15_23:57:33.34704 2016-11-15 15:57:33 -0800 [info]: fluent/engine.rb:116:block in configure: gem 'fluentd' version '0.14.8'
2016-11-15_23:57:33.34704 2016-11-15 15:57:33 -0800 [info]: fluent/engine.rb:116:block in configure: gem 'fluentd' version '0.14.6'
2016-11-15_23:57:33.34704 2016-11-15 15:57:33 -0800 [info]: fluent/engine.rb:116:block in configure: gem 'fluentd' version '0.12.25'
2016-11-15_23:57:33.34705 2016-11-15 15:57:33 -0800 [warn]: config/element.rb:115:[]: 'type' is deprecated parameter name. use '@type' instead.
2016-11-15_23:57:33.34707 2016-11-15 15:57:33 -0800 [info]: fluent/agent.rb:123:add_match: adding match pattern="events" type="kafka_buffered"
2016-11-15_23:57:33.40041 2016-11-15 15:57:33 -0800 [info]: plugin/out_kafka_buffered.rb:119:configure: brokers has been set directly: ["analytics-kafka01:9092", "analytics-kafka02:9092", "analytics-kafka03:9092"]
2016-11-15_23:57:33.40056 2016-11-15 15:57:33 -0800 [warn]: config/element.rb:115:[]: 'type' is deprecated parameter name. use '@type' instead.
2016-11-15_23:57:33.40062 2016-11-15 15:57:33 -0800 [info]: fluent/agent.rb:123:add_match: adding match pattern="events.filtered" type="kafka_buffered"
2016-11-15_23:57:33.40203 2016-11-15 15:57:33 -0800 [info]: plugin/out_kafka_buffered.rb:119:configure: brokers has been set directly: ["analytics-kafka01:9092", "analytics-kafka02:9092", "analytics-kafka03:9092"]
2016-11-15_23:57:33.40216 2016-11-15 15:57:33 -0800 [warn]: config/element.rb:115:[]: 'type' is deprecated parameter name. use '@type' instead.
2016-11-15_23:57:33.40220 2016-11-15 15:57:33 -0800 [info]: fluent/root_agent.rb:230:add_source: adding source type="forward"
2016-11-15_23:57:33.40716 2016-11-15 15:57:33 -0800 [warn]: config/element.rb:115:[]: 'type' is deprecated parameter name. use '@type' instead.
2016-11-15_23:57:33.40720 2016-11-15 15:57:33 -0800 [info]: fluent/root_agent.rb:230:add_source: adding source type="http"
2016-11-15_23:57:33.41050 2016-11-15 15:57:33 -0800 [warn]: config/element.rb:115:[]: 'type' is deprecated parameter name. use '@type' instead.
2016-11-15_23:57:33.41056 2016-11-15 15:57:33 -0800 [info]: fluent/root_agent.rb:230:add_source: adding source type="monitor_agent"
2016-11-15_23:57:33.41766 2016-11-15 15:57:33 -0800 [warn]: config/element.rb:115:[]: 'type' is deprecated parameter name. use '@type' instead.
2016-11-15_23:57:33.41776 2016-11-15 15:57:33 -0800 [info]: fluent/root_agent.rb:230:add_source: adding source type="debug_agent"
2016-11-15_23:57:33.43489 2016-11-15 15:57:33 -0800 [info]: fluent/engine.rb:123:configure: using configuration file: <ROOT>
2016-11-15_23:57:33.43491   <source>
2016-11-15_23:57:33.43491     type forward
2016-11-15_23:57:33.43491     port 24224
2016-11-15_23:57:33.43491   </source>
2016-11-15_23:57:33.43491   <source>
2016-11-15_23:57:33.43492     type http
2016-11-15_23:57:33.43492     port 8888
2016-11-15_23:57:33.43492   </source>
2016-11-15_23:57:33.43492   <source>
2016-11-15_23:57:33.43493     type monitor_agent
2016-11-15_23:57:33.43493     port 24220
2016-11-15_23:57:33.43493   </source>
2016-11-15_23:57:33.43493   <source>
2016-11-15_23:57:33.43493     type debug_agent
2016-11-15_23:57:33.43494     bind "127.0.0.1"
2016-11-15_23:57:33.43494     port 24230
2016-11-15_23:57:33.43494   </source>
2016-11-15_23:57:33.43494   <match events>
2016-11-15_23:57:33.43494     type kafka_buffered
2016-11-15_23:57:33.43495     brokers "analytics-kafka01:9092,analytics-kafka02:9092,analytics-kafka03:9092"
2016-11-15_23:57:33.43495     default_topic "events"
2016-11-15_23:57:33.43495     buffer_type "file"
2016-11-15_23:57:33.43495     buffer_path /var/log/evente/fluentd/buffer/
2016-11-15_23:57:33.43496     required_acks 1
2016-11-15_23:57:33.43496     flush_interval 1
2016-11-15_23:57:33.43496     buffer_chunk_limit 64m
2016-11-15_23:57:33.43496     buffer_queue_limit 2048
2016-11-15_23:57:33.43496     <buffer>
2016-11-15_23:57:33.43497       flush_mode interval
2016-11-15_23:57:33.43497       retry_type exponential_backoff
2016-11-15_23:57:33.43497       @type file
2016-11-15_23:57:33.43499       path /var/log/evente/fluentd/buffer/
2016-11-15_23:57:33.43499       flush_interval 1
2016-11-15_23:57:33.43499       chunk_limit_size 64m
2016-11-15_23:57:33.43499       queue_length_limit 2048
2016-11-15_23:57:33.43499     </buffer>
2016-11-15_23:57:33.43500   </match>
2016-11-15_23:57:33.43500   <match events.filtered>
2016-11-15_23:57:33.43500     type kafka_buffered
2016-11-15_23:57:33.43500     brokers "analytics-kafka01:9092,analytics-kafka02:9092,analytics-kafka03:9092"
2016-11-15_23:57:33.43501     default_topic "events_filtered"
2016-11-15_23:57:33.43501     buffer_type "file"
2016-11-15_23:57:33.43502     buffer_path /var/log/evente/fluentd/buffer_filtered/
2016-11-15_23:57:33.43502     required_acks 1
2016-11-15_23:57:33.43502     <buffer>
2016-11-15_23:57:33.43502       flush_mode interval
2016-11-15_23:57:33.43503       retry_type exponential_backoff
2016-11-15_23:57:33.43503       @type file
2016-11-15_23:57:33.43503       path /var/log/evente/fluentd/buffer_filtered/
2016-11-15_23:57:33.43503     </buffer>
2016-11-15_23:57:33.43504   </match>
2016-11-15_23:57:33.43504 </ROOT>
2016-11-15_23:57:33.43597 2016-11-15 15:57:33 -0800 [debug]: plugin/buffer.rb:114:start: buffer started instance=69980472945500 stage_size=0 queue_size=0
2016-11-15_23:57:33.43815 2016-11-15 15:57:33 -0800 [info]: plugin/out_kafka_buffered.rb:99:refresh_client: initialized kafka producer: kafka
2016-11-15_23:57:37.57343 2016-11-15 15:57:37 -0800 [debug]: plugin/buffer.rb:114:start: buffer started instance=69980473147160 stage_size=63364 queue_size=3880621267
2016-11-15_23:57:37.64995 2016-11-15 15:57:37 -0800 [info]: plugin/out_kafka_buffered.rb:99:refresh_client: initialized kafka producer: kafka
2016-11-15_23:57:37.65034 2016-11-15 15:57:37 -0800 [info]: plugin/in_debug_agent.rb:53:start: listening dRuby uri="druby://127.0.0.1:24230" object="Fluent::Engine"
2016-11-15_23:57:37.65759 2016-11-15 15:57:37 -0800 [debug]: plugin/in_monitor_agent.rb:213:start: listening monitoring http server on http://0.0.0.0:24220/api/plugins
2016-11-15_23:57:37.65814 2016-11-15 15:57:37 -0800 [debug]: plugin/in_http.rb:103:start: listening http on 0.0.0.0:8888
2016-11-15_23:57:37.66664 2016-11-15 15:57:37 -0800 [info]: plugin/in_forward.rb:172:listen: listening fluent socket on 0.0.0.0:24224
2016-11-15_23:57:37.99851 2016-11-15 15:57:37 -0800 [debug]: plugin/output.rb:1179:flush_thread_run: flush_thread actually running
2016-11-15_23:57:37.99868 2016-11-15 15:57:37 -0800 [debug]: plugin/buffer.rb:368:dequeue_chunk: dequeueing a chunk instance=69980473147160
2016-11-15_23:57:37.99888 2016-11-15 15:57:37 -0800 [debug]: plugin/buffer.rb:377:block in dequeue_chunk: chunk dequeued instance=69980473147160 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil>
2016-11-15_23:57:37.99902 2016-11-15 15:57:37 -0800 [debug]: plugin/output.rb:967:try_flush: trying flush for a chunk chunk="54135448d0359bdcbd2bad60a78b2525"
2016-11-15_23:57:37.99927 2016-11-15 15:57:37 -0800 [warn]: plugin/out_kafka_buffered.rb:265:rescue in write: Send exception occurred: undefined method `key' for #<Fluent::Plugin::Buffer::FileChunk:0x007f4b3509bb18>
2016-11-15_23:57:37.99929 2016-11-15 15:57:37 -0800 [warn]: plugin/out_kafka_buffered.rb:266:rescue in write: Exception Backtrace : /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluent-plugin-kafka-0.4.0/lib/fluent/plugin/out_kafka_buffered.rb:207:in `write'
2016-11-15_23:57:37.99929 /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/compat/output.rb:129:in `write'
2016-11-15_23:57:37.99929 /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin/output.rb:995:in `try_flush'
2016-11-15_23:57:37.99930 /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin/output.rb:1188:in `flush_thread_run'
2016-11-15_23:57:37.99930 /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin/output.rb:393:in `block (2 levels) in start'
2016-11-15_23:57:37.99930 /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin_helper/thread.rb:66:in `block in thread_create'
2016-11-15_23:57:37.99946 2016-11-15 15:57:37 -0800 [info]: plugin/out_kafka_buffered.rb:99:refresh_client: initialized kafka producer: kafka
2016-11-15_23:57:37.99960 2016-11-15 15:57:37 -0800 [debug]: plugin/output.rb:1001:rescue in try_flush: taking back chunk for errors. plugin_id="object:3fa5992467b4" chunk="54135448d0359bdcbd2bad60a78b2525"

@repeatedly
Member

Thanks!

2016-11-15_23:57:37.99888 2016-11-15 15:57:37 -0800 [debug]: plugin/buffer.rb:377:block in dequeue_chunk: chunk dequeued instance=69980473147160 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil>

This line is weird: the metadata is created with nil values. I will check why this happens.

@repeatedly
Member

2016-11-15_23:57:37.99929 2016-11-15 15:57:37 -0800 [warn]: plugin/out_kafka_buffered.rb:266:rescue in write: Exception Backtrace : /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluent-plugin-kafka-0.4.0/lib/fluent/plugin/out_kafka_buffered.rb:207:in `write'
2016-11-15_23:57:37.99929 /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/compat/output.rb:129:in `write'

This error happens immediately after restart.
Do previous buffer files exist?

@dtboctor
Author

There are a large number of buffer files when I restart. Basically, this error blocks the upload of all of them, so we have a ton of buffers stuck there.

@dtboctor
Author

The odd thing is that everything was generated with version 0.14.6; we don't have any buffers left over from a previous or older version.

@dtboctor
Author

Is there a way to tell from the error which buffer file is causing the issue?

@repeatedly
Member

54135448d0359bdcbd2bad60a78b2525

This is the chunk id, so you can search for which chunk file has this problem.
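For example, a quick way to locate the buffer file for that chunk id is a glob like the sketch below (the directory is only an assumption based on the buffer_path in the configuration above; adjust it for your setup):

# Hedged sketch: list buffer files whose name embeds the chunk id reported in the log.
# The directory is an assumption taken from the buffer_path above.
chunk_id = "54135448d0359bdcbd2bad60a78b2525"
Dir.glob("/var/log/evente/fluentd/buffer/**/buffer.*#{chunk_id}*").each do |path|
  puts path
end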

BTW, did you update fluent-plugin-kafka together with fluentd?

@repeatedly
Member

I can reproduce this problem:

  1. use an older fluent-plugin-kafka which doesn't use the tag as the chunk key
  2. update fluent-plugin-kafka to the latest version; the latest version uses the tag as the chunk key
  3. restart fluentd with the old buffers

In this case, the same error happens.
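To make the incompatibility concrete, here is a simplified, hypothetical sketch of the two on-disk event layouts involved (this is not the actual plugin code; it only mirrors the 3-element vs 2-element difference described later in this thread):

require 'msgpack'

# Old kafka plugin: no chunk key, events buffered as [tag, time, record] triples.
old_event = MessagePack.pack(["events", Time.now.to_i, { "msg" => "hello" }])
# New kafka plugin: tag stored as the chunk key, events buffered as [time, record] pairs.
new_event = MessagePack.pack([Time.now.to_i, { "msg" => "hello" }])

puts MessagePack.unpack(old_event).size  # => 3
puts MessagePack.unpack(new_event).size  # => 2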

@tagomoris
Member

So it is a bug (or documentation issue) of fluent-plugin-kafka, right?

@repeatedly
Member

We can't judge that yet: if dtboctor was already using a recent fluent-plugin-kafka version before,
the cause seems to be a bug in the buffer / output.

@dtboctor
Author

When I encountered this problem I was using fluentd 0.14.6 and fluent-plugin-kafka 0.3.1. That combination had been working correctly for a while and is currently what is running on most of our servers without an issue. The problem began while running with that setup. I tried upgrading to fluentd 0.14.8 and 0.4.0 of the plugin together, and the problem was still occurring.

I also tried just removing this file, and the next file came up with the same error.

@repeatedly
Member

repeatedly commented Nov 17, 2016

I also tried just removing this file, and the next file came up with the same error.

I'm not sure. It depends on your buffer content.
If the next buffer also doesn't have key metadata, the same error happens.

@dtboctor
Author

Okay, I'm going to look into whether there was an issue where the plugin versions were not quite aligned with each other. Is there a way to parse a buffer file to see what format or version it is using?

@dtboctor
Author

I tried backing down to version 0.1.3 of the plugin, which I knew we were using at one time... I get:

2016-11-18_21:35:50.91856 2016-11-18 13:35:50 -0800 [warn]: plugin/output.rb:1031:block in try_flush: failed to flush the buffer. plugin_id="object:3f9f08897790" retry_time=0 next_retry=2016-11-18 13:35:51 -0800 chunk="54135448d0359bdcbd2bad60a78b2525" error_class=NoMethodError error="undefined method `[]' for nil:NilClass"
2016-11-18_21:35:50.91861   2016-11-18 13:35:50 -0800 [warn]: plugin/output.rb:980:try_flush: /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluent-plugin-kafka-0.1.3/lib/fluent/plugin/out_kafka_buffered.rb:151:in `block in write'
2016-11-18_21:35:50.91862   2016-11-18 13:35:50 -0800 [warn]: plugin/output.rb:980:try_flush: /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/event.rb:323:in `each'
2016-11-18_21:35:50.91863   2016-11-18 13:35:50 -0800 [warn]: plugin/output.rb:980:try_flush: /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/event.rb:323:in `block in each'
2016-11-18_21:35:50.91863   2016-11-18 13:35:50 -0800 [warn]: plugin/output.rb:980:try_flush: /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin/buffer/file_chunk.rb:143:in `open'
2016-11-18_21:35:50.91864   2016-11-18 13:35:50 -0800 [warn]: plugin/output.rb:980:try_flush: /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/event.rb:322:in `each'
2016-11-18_21:35:50.91864   2016-11-18 13:35:50 -0800 [warn]: plugin/output.rb:980:try_flush: /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluent-plugin-kafka-0.1.3/lib/fluent/plugin/out_kafka_buffered.rb:148:in `write'
2016-11-18_21:35:50.91864   2016-11-18 13:35:50 -0800 [warn]: plugin/output.rb:980:try_flush: /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/compat/output.rb:129:in `write'
2016-11-18_21:35:50.91865   2016-11-18 13:35:50 -0800 [warn]: plugin/output.rb:980:try_flush: /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin/output.rb:995:in `try_flush'
2016-11-18_21:35:50.91865   2016-11-18 13:35:50 -0800 [warn]: plugin/output.rb:980:try_flush: /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin/output.rb:1188:in `flush_thread_run'
2016-11-18_21:35:50.91866   2016-11-18 13:35:50 -0800 [warn]: plugin/output.rb:980:try_flush: /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin/output.rb:393:in `block (2 levels) in start'
2016-11-18_21:35:50.91866   2016-11-18 13:35:50 -0800 [warn]: plugin/output.rb:980:try_flush: /opt/rbenv/versions/2.1.9/lib/ruby/gems/2.1.0/gems/fluentd-0.14.8/lib/fluent/plugin_helper/thread.rb:66:in `block in thread_create'

@dtboctor
Author

Is there anything additional I can send over that would help in understanding what happened here?

@repeatedly
Member

You can check your buffer file status with the following script:

  • unpack_buffer.rb
require 'msgpack'
require 'fluent/msgpack_factory'

unpacker = Fluent::MessagePackFactory.unpacker

log_path = ARGV.first
meta_path = "#{log_path}.meta"
puts "buffer file: #{log_path}"

# The .meta file stores the chunk metadata (timekey / tag / variables) as msgpack.
unpacker.feed(File.read(meta_path)).each { |data|
  puts "meta keys: #{data["variables"]}"
}

# The .log file stores the buffered events themselves; print the size of the first one.
unpacker.feed(File.read(log_path)).each { |data|
  puts "one event length: #{data.size}"
  break
}
  • Old buffer structure (kafka plugin v0.1.3 can flush this buffer)

The meta keys field is empty and one event consists of 3 elements.

% ruby unpack_buffer.rb old_buffer/buffer.b54177241e14b3fe471422a83b070bd24.log
buffer file: old_buffer/buffer.b54177241e14b3fe471422a83b070bd24.log
meta keys:
one event length: 3
  • New buffer structure (latest kafka plugin can flush this buffer)

The meta keys field has a "key" entry and one event consists of 2 elements.

% ruby unpack_buffer.rb buffer/buffer.b541ac65e35ffd95ff373c322636fe387.log
buffer file: buffer/buffer.b541ac65e35ffd95ff373c322636fe387.log
meta keys: {"key"=>"kafka.test"}
one event length: 2

The latest kafka plugin can't flush old buffers. Check your buffer file structure.
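If you have many buffer files, a small variation of the script above can classify all of them at once (same assumptions about the msgpack layout; the directory is just an example, use your own buffer_path):

require 'msgpack'
require 'fluent/msgpack_factory'

# Classify each buffer file as "old" (no chunk key in the metadata) or "new"
# (metadata carries the key), using the same .meta layout as unpack_buffer.rb.
Dir.glob("/var/log/evente/fluentd/buffer/*.log").each do |log_path|
  meta_path = "#{log_path}.meta"
  next unless File.exist?(meta_path)

  variables = nil
  Fluent::MessagePackFactory.unpacker.feed(File.read(meta_path)).each do |data|
    variables = data["variables"]
  end

  label = (variables.nil? || variables.empty?) ? "old (no chunk key)" : "new #{variables.inspect}"
  puts "#{File.basename(log_path)}: #{label}"
end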

@dtboctor
Author

One other question: we had a very large event come through our system that broke a lot of other parts. Is there any case where a large message could break fluentd? For example, if it were very large in relation to the chunk size for the buffer?

@repeatedly
Member

is there any case where a large message could break fluentd?

What does "break" mean? Fluentd can handle records of any size.
For example, one user sends 1+ MB records to fluentd.

@dtboctor
Author

dtboctor commented Nov 21, 2016

I tried to run the code you posted and ran into an interesting problem:

[/var/log/evente/fluentd] == PRODUCTION lumoslabs@[lumosity-app-production-us-east-1b-i-eb73bc6d.aws.lumoslabs.com] ==
Mon Nov 21 12:11:02 [13:0]$ ls buffer_problem/
buffer.b54135448d0359bdcbd2bad60a78b2525.log.meta  buffer.q54135448d0359bdcbd2bad60a78b2525.log

[/var/log/evente/fluentd] == PRODUCTION lumoslabs@[lumosity-app-production-us-east-1b-i-eb73bc6d.aws.lumoslabs.com] ==
Mon Nov 21 12:11:08 [14:0]$ ruby unpack_buffer.rb buffer_problem/buffer.q54135448d0359bdcbd2bad60a78b2525.log
buffer file: buffer_problem/buffer.q54135448d0359bdcbd2bad60a78b2525.log
unpack_buffer.rb:10:in `read': No such file or directory @ rb_sysopen - buffer_problem/buffer.q54135448d0359bdcbd2bad60a78b2525.log.meta (Errno::ENOENT)
	from unpack_buffer.rb:10:in `<main>'

It seems like my files are named differently than what the script expects.

@dtboctor
Author

I renamed my file to make the script work:

Mon Nov 21 13:45:56 [16:0]$ ruby unpack_buffer.rb buffer_problem/buffer.q54135448d0359bdcbd2bad60a78b2525.log
buffer file: buffer_problem/buffer.q54135448d0359bdcbd2bad60a78b2525.log
meta keys: {"key"=>"events"}
one event length: 2

@dtboctor
Author

It seems like most of our buffer files are buffer.q*, but a few are buffer.b*, and a good number of the meta files are.... Why would that happen?

@repeatedly
Member

That's weird...
"b" and "q" should be updated at the same time.
Hmm... v0.14 buffer may have a race condition between buffer pop and queueing.
We will check.
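For reference, here is a rough sketch that lists chunks whose data file and .meta file disagree on the b/q prefix. It only assumes the buffer.<b|q><chunk_id>.log naming visible in the listing above, and the directory is an example:

# Report chunks whose .log and .log.meta files carry different staged ("b") /
# queued ("q") prefixes. Point buffer_dir at your buffer_path.
buffer_dir = "/var/log/evente/fluentd/buffer"

state = Hash.new { |h, k| h[k] = {} }
Dir.glob(File.join(buffer_dir, "buffer.*")).each do |path|
  base = File.basename(path)
  next unless base =~ /\Abuffer\.([bq])(\h+)\.log(\.meta)?\z/
  state[$2][$3 ? :meta : :data] = $1
end

state.each do |chunk_id, files|
  next if files[:data] == files[:meta]
  puts "chunk #{chunk_id}: data=#{files[:data] || 'missing'} meta=#{files[:meta] || 'missing'}"
end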

@repeatedly
Member

repeatedly commented Nov 22, 2016

It seems like most of our buffer files are buffer.q*, but a few are buffer.b*, and a good number of the meta files are....

So in your buffer directory, the number of "b" meta files and the number of "q" buffer files are mismatched, right?

@dtboctor
Author

Yes

@dtboctor
Author

And it seems that I have other mismatches in there, definitely more than one.

@dtboctor
Author

Before this occurred we had some errors related to Kafka rejecting our messages due to their size. We eventually corrected that, but could there be a case where such an error causes a chunk to be left half set? I'm not sure, but this seems to have happened a lot on this machine, even though we have many other machines running this exact setup that have had no issues like this whatsoever.

@dtboctor
Author

Is there anything I can do to help diagnose what happened? Also, would just renaming these files all to "q" fix the issue in the interim?

@tagomoris
Member

Via the investigation of #1319, I identified that "undefined method 'key'" for nil occurs in #write when compat plugins are configured to use the Zulu time format (timestamps formatted with the "Z" timezone).
It will be fixed by that pull request.
The chunk state mismatch is another, separate problem...

@dtboctor
Author

Zulu time format? Any idea why this would be the case, or a way for me to see if I match that description? I'm not sure why we would be using that.

@tagomoris
Member

Sorry for the late reply, and I'm also sorry that I misunderstood the problem in this issue. #1319 is another thing and is not related to this issue.

@dtboctor
Author

Hey guys, I found the root cause of this error, so I thought I'd share.

Basically, this happens if you send a message through fluentd that is bigger than the configured chunk size. It then permanently breaks with this error until you clear out the problematic files. This seems like something that should be handled; once such a message comes through, fluentd basically stops sending to Kafka and the buffer grows indefinitely.

@repeatedly
Member

Basically, this happens if you send a message through fluentd that is bigger than the configured chunk size.

Does that mean you can reproduce this problem? If so, could you share your configuration and reproduction steps?

@dtboctor
Author

Yeah, I just set my buffer chunk limit to something pretty small, like 10k, on our production server, and a >10k event came through; basically it started giving me this error. I'll see if I can get a more specific configuration to you.

@dtboctor
Author

<match events.production>
  type kafka_buffered
  brokers <%= node[:fluentd][:kafka][:brokers] %>

  default_topic 'events'
  buffer_type file
  buffer_path '/var/log/evente/fluentd/events_production/'

  required_acks 1
  flush_interval 1

  buffer_chunk_limit 16k
  buffer_queue_limit 2048
</match>

An event came through fluentd that was over a megabyte (perhaps close to 2 megabytes), and it crashed with the above error.
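For what it's worth, here is a rough reproduction sketch (not our production traffic): it assumes the in_http source on port 8888 from the full configuration pasted earlier, the "events" tag, and a buffer_chunk_limit much smaller than the record, e.g. 16k:

require 'net/http'
require 'json'
require 'uri'

# Build a record much larger than a small buffer_chunk_limit (e.g. 16k).
record = { "payload" => "x" * (1024 * 1024) }  # roughly 1 MB

# Post it to the in_http source; the URL path becomes the tag, so "events"
# matches the <match events> block from the configuration earlier in the thread.
uri = URI("http://localhost:8888/events")
response = Net::HTTP.post_form(uri, "json" => JSON.generate(record))
puts response.code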

@repeatedly
Member

Thanks for the configuration.
We will check it!

@aiwantaozi

aiwantaozi commented Jun 11, 2018

Hi, any update on this issue? I also have this problem.
Could you help?

fluentd (1.1.3)
fluent-plugin-kafka (0.7.3)
ruby-kafka (0.6.6)

stack:

 Send exception occurred: undefined method `key' for #<Fluent::Plugin::Buffer::FileChunk:0x007fb01a4a2870>
2018-06-11 09:25:00 +0000 [warn]: #0 Exception Backtrace : /var/lib/gems/2.3.0/gems/fluent-plugin-kafka-0.7.3/lib/fluent/plugin/out_kafka_buffered.rb:286:in `write'
/var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/compat/output.rb:131:in `write'
/var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/plugin/output.rb:1096:in `try_flush'
/var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/plugin/output.rb:1329:in `flush_thread_run'
/var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/plugin/output.rb:439:in `block (2 levels) in start'
/var/lib/gems/2.3.0/gems/fluentd-1.1.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2018-06-11 09:25:00 +0000 [info]: #0 initialized kafka producer: kafka
2018-06-11 09:25:00 +0000 [warn]: #0 failed to flush the buffer. retry_time=3 next_retry_seconds=2018-06-11 09:25:00 +0000 chunk="56e5a522d7b525c95dda43712c992bfb" error_class=NoMethodError error="undefined method `key' for #<Fluent::Plugin::Buffer::FileChunk:0x007fb01a4a2870>"
  2018-06-11 09:25:00 +0000 [warn]: #0 suppressed same stacktrace

config:

<match mylog.**>
    @type kafka_buffered

    brokers xxxx:9093
    default_topic test
    output_data_type "json"
    output_include_tag true
    output_include_time true
    max_send_retries 3
    ssl_ca_cert /fluentd/etc/ssl/ca.pem
    ssl_client_cert /fluentd/etc/ssl/client-cert.pem
    ssl_client_cert_key /fluentd/etc/ssl/client-key.pem

    <buffer>
      @type file
      path /fluentd/etc/buffer/project.buffer
      flush_interval 3s
    </buffer>

    disable_retry_limit
    num_threads 8
    slow_flush_log_threshold 40.0
</match>

@repeatedly
Member

@aiwantaozi Does this problem happen with out_kafka2?
In v1, out_kafka2 is the better choice.

@aiwantaozi

So for fluentd 1.x, I should use out_kafka2, not kafka_buffered? Let me try.

@fujimotos
Member

There are really two distinct problems here.

The second problem occurs because of the v0.12 compatibility layer.
Simply put, v0.12-era plugins can't handle a record larger than chunk_limit_size,
and this issue is also being investigated in #2084.

As to the first problem, the exact cause is still unknown. I have tried
several times, but never succeeded in reproducing the error.

@github-actions

This issue has been automatically marked as stale because it has been open for 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.

@github-actions github-actions bot added the stale label Jan 26, 2021
@github-actions

This issue was automatically closed because it remained stale for 30 days.
