on XFS might not be able to read local storage pos file after crash #2373

Closed
jim-minter opened this issue Apr 7, 2019 · 2 comments · Fixed by #2409

@jim-minter

td-agent 2.3.6
rhel 7.6, XFS filesystem

    <source>
      @type systemd
      <storage>
        @type local
        path /var/log/journald.pos
      </storage>
      tag journald
    </source>

After a kernel panic and reboot, td-agent refused to start with:

2019-04-07 15:55:50 +0000 [error]: fluent/log.rb:362:call: config error file="/td-agent/config/fluent.conf" error_class=Fluent::ConfigError error="Unexpected error: failed to read data from plugin storage file: '/var/log/journald.pos/worker0/storage.json'"

/var/log/journald.pos/worker0/storage.json existed but was 0 bytes long.

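It is good practice to use fdatasync() or fsync(); on XFS it is essential, because otherwise a file written shortly before a crash can be left empty after recovery, as happened here.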

I recommend adding an fdatasync or fsync call around here:

    open(tmp_path, 'w:utf-8', @mode){ |io| io.write json_string }
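For illustration, here is a minimal sketch of what a synced write could look like. This is not fluentd's actual code: `write_storage_durably` and its parameters are hypothetical names, and the temp-file-then-rename layout is an assumption based on the `tmp_path` variable in the line above.

```ruby
require 'json'

# Hypothetical helper (not part of fluentd): write JSON to a temp file,
# sync it to disk, then rename it over the real path.
def write_storage_durably(path, data, mode = 0o644)
  tmp_path = "#{path}.tmp"
  json_string = JSON.generate(data)

  File.open(tmp_path, 'w:utf-8', mode) do |io|
    io.write(json_string)
    # IO#fsync flushes Ruby's own buffer and then calls fsync(2),
    # so the data is on the device before the rename happens.
    io.fsync
  end

  # rename(2) replaces the destination atomically on POSIX filesystems.
  File.rename(tmp_path, path)

  # Best effort: sync the parent directory so the rename itself is durable.
  # Opening a directory this way is not supported on every platform.
  begin
    File.open(File.dirname(path)) { |dir| dir.fsync }
  rescue SystemCallError, NotImplementedError
    # Ignore; the file contents were already synced above.
  end
end
```

IO#fdatasync would also work in place of IO#fsync where the platform supports it; the sketch uses fsync because it is available more widely.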

It may also be worth logging a warning when the storage file is 0 bytes long, instead of raising at

    raise Fluent::ConfigError, "Unexpected error: failed to read data from plugin storage file: '#{@path}'"

and then continuing startup as if the file did not exist.
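A rough sketch of that fallback, assuming the storage is a JSON object loaded at startup. `load_storage` and its `log` parameter are hypothetical names for illustration, not fluentd's API:

```ruby
require 'json'
require 'logger'

# Hypothetical loader (not fluentd's actual code): treat a missing or
# zero-length storage file as empty storage instead of failing startup.
def load_storage(path, log = Logger.new($stderr))
  return {} unless File.exist?(path)

  if File.zero?(path)
    # A 0-byte file is what a crash before the data was synced leaves behind.
    log.warn("plugin storage file '#{path}' is empty; starting with empty storage")
    return {}
  end

  data = JSON.parse(File.read(path, encoding: 'utf-8'))
  raise ArgumentError, "unexpected content in storage file: '#{path}'" unless data.is_a?(Hash)

  data
end
```

A file that is non-empty but not valid JSON still raises here, since that case points at real corruption rather than an unsynced write.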

@repeatedly
Member

repeatedly commented Apr 8, 2019

Thanks for the report.

Adding io.fsync in the block is fine with me.

In addition, we need to add an option like flush_mode:

  • shutdown: sync data at shutdown
  • each_update: sync data at each put/update operation
  • interval or number: sync data after a given amount of time or number of operations has passed.

Without sync, storage content may not be persisted if the machine crashes.

How about this?
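To make the proposal concrete, the modes could behave roughly like the sketch below. Everything here is hypothetical: `SyncPolicy`, its mode symbols, and the `every:` parameter are illustrative names, not an existing fluentd option, and the time-based interval variant is left out to keep it short.

```ruby
# Hypothetical sketch of the proposed flush_mode semantics.
class SyncPolicy
  MODES = %i[shutdown each_update number].freeze

  def initialize(mode, every: 1)
    raise ArgumentError, "unknown mode: #{mode.inspect}" unless MODES.include?(mode)

    @mode = mode
    @every = every   # only used by :number
    @pending = 0     # updates seen since the last sync
  end

  # Call after each put/update; true means "sync the storage now".
  def sync_after_update?
    case @mode
    when :each_update
      true
    when :number
      @pending += 1
      return false if @pending < @every

      @pending = 0
      true
    else
      false
    end
  end

  # Every mode syncs once more at shutdown.
  def sync_at_shutdown?
    true
  end
end

policy = SyncPolicy.new(:number, every: 3)
5.times.map { policy.sync_after_update? }
# → [false, false, true, false, false]
```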

@jim-minter
Author

makes sense to me :)

repeatedly added a commit that referenced this issue May 10, 2019
…ref #2373

Signed-off-by: Masahiro Nakagawa <repeatedly@gmail.com>
repeatedly added a commit that referenced this issue May 10, 2019
Signed-off-by: Masahiro Nakagawa <repeatedly@gmail.com>