Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in_tail throws error and crashes process #3327

Closed
davidbhoward opened this issue Apr 12, 2021 · 4 comments · Fixed by #3328
Closed

in_tail throws error and crashes process #3327

davidbhoward opened this issue Apr 12, 2021 · 4 comments · Fixed by #3328
Labels

Comments

@davidbhoward
Copy link

davidbhoward commented Apr 12, 2021

Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.

Describe the bug
We are seeing an exception being thrown while Fluentd is starting up, which is causing Fluentd process to crash. We suspect these are caused by short-lived, often run, K8s CronJobs where the Docker logs no longer exists, but the symlinks are still there and end up causing in_tail to pick it up and throw this error. We were trying to fix the in_tail bugs discussed in #3239, which is why we are specifically trying to deploy this version.

To Reproduce
Have several CronJobs that run every 1 min on a node and only live for a very short duration.

Expected behavior
Fluentd would continue to run if the in_tail file doesn't exist and this condition is hit that causes this error or bug is fixed.

Your Environment

  • Fluentd or td-agent version: v1.12.2.
    Note: We are building and installing from source following this guide. We have been doing so before this release without issues.
  • Ruby: 2.7.2
  • Operating system: CentOS 7
  • Kernel version: 4.15.0-70-generic

If you hit the problem with older fluentd version, try latest version first.
This happens on the latest version of Fluentd.

Your Configuration

    <source>
      @type tail
      @log_level debug
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      follow_inodes true
      <parse>
        @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

Your Error Log

2021-04-12 16:00:21 -0700 [warn]: /var/log/containers/obfuscated_container_xxx_1.log not found. Continuing without tailing it.
2021-04-12 16:00:21 -0700 [warn]: stat() for /var/log/containers/obfuscated_container_xxx_1.log failed with ENOENT. Drop tail watcher for now.
2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean?  each_slice"
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:428:in `stop_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:422:in `rescue in block in start_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:416:in `block in start_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `each_value'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `start_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:359:in `refresh_watchers'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:234:in `start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:200:in `block in start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:189:in `block (2 levels) in lifecycle'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:188:in `each'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:188:in `block in lifecycle'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:175:in `each'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:175:in `lifecycle'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:199:in `start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/engine.rb:248:in `start'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/engine.rb:147:in `run'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:700:in `block in run_worker'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:951:in `main_process'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:691:in `run_worker'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/command/fluentd.rb:365:in `<top (required)>'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:72:in `require'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:72:in `require'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/bin/fluentd:8:in `<top (required)>'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/bin/fluentd:23:in `load'
  2021-04-12 16:00:21 -0700 [error]: /usr/local/bin/fluentd:23:in `<main>'
2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean?  each_slice"
  2021-04-12 16:00:21 -0700 [error]: suppressed same stacktrace

Additional context

We have hundreds of Fluentd instances running and there is a single node that seems to be hitting this problem. While looking at the node, I suspect it is caused by many short-lived, often run (every 1 min), CronJobs where the log files don't exist because the containers are now gone but symlinks still exist. Please let me know if there is more information I can provide here to help troubleshoot this issue.

@davidbhoward davidbhoward changed the title v1.12.2 in_tail throws error and crashes process in_tail throws error and crashes process Apr 12, 2021
@ashie ashie added the bug label Apr 13, 2021
@davidbhoward
Copy link
Author

davidbhoward commented Apr 13, 2021

After looking at this more, this has been happening since earlier this morning, which is longer than we deployed v1.12.2, so I don't think this is unique to this latest version. I do believe though that many short-lived jobs will cause this error to happen. The mitigation here is to use the exclude_path for the in_tail plugin and just not input any short-lived, often run, CronJobs.

@ashie
Copy link
Member

ashie commented Apr 13, 2021

Thank you for you report!
This crash is definitely from v1.12.2, by #3275 (d066aa2)

@ashie
Copy link
Member

ashie commented Apr 13, 2021

After looking at this more, this has been happening since earlier this morning, which is longer than we deployed v1.12.2, so I don't think this is unique to this latest version.

If your previous version is 1.12.0 or 1.12.1, probably you saw a different bug #3274.

ashie added a commit that referenced this issue Apr 13, 2021
It may occur on catching a short-lived log (#3327).

Signed-off-by: Takuro Ashie <ashie@clear-code.com>
ashie added a commit that referenced this issue Apr 13, 2021
This commit fixes the following error reported at #3327, it will occur
when a file is removed after calling setup_watcher and before
registering the created watcher.

  2021-04-12 16:00:21 -0700 [warn]: stat() for /var/log/containers/obfuscated_container_xxx_1.log failed with ENOENT. Drop tail watcher for now.
  2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean?  each_slice"
    2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:428:in `stop_watchers'
    2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:422:in `rescue in block in start_watchers'
    2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:416:in `block in start_watchers'
    2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `each_value'
    2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `start_watchers'

Signed-off-by: Takuro Ashie <ashie@clear-code.com>
ashie added a commit that referenced this issue Apr 13, 2021
This commit fixes the following error reported at #3327, it will occur
when a file is removed after calling setup_watcher and before
registering the created watcher.

  2021-04-12 16:00:21 -0700 [warn]: stat() for /var/log/containers/obfuscated_container_xxx_1.log failed with ENOENT. Drop tail watcher for now.
  2021-04-12 16:00:21 -0700 [error]: unexpected error error_class=NoMethodError error="undefined method `each_value' for #<Fluent::Plugin::TailInput::TargetInfo:0x00005641d5026828>\nDid you mean?  each_slice"
    2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:428:in `stop_watchers'
    2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:422:in `rescue in block in start_watchers'
    2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:416:in `block in start_watchers'
    2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `each_value'
    2021-04-12 16:00:21 -0700 [error]: /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.12.2/lib/fluent/plugin/in_tail.rb:396:in `start_watchers'

Signed-off-by: Takuro Ashie <ashie@clear-code.com>
@davidbhoward
Copy link
Author

Awesome. Thank you, @ashie, for the quick turn-around here. Much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants