Windows service restart hanging with windows_eventlog2 #59

Closed
nmaludy opened this issue Jun 19, 2020 · 5 comments

Comments

@nmaludy

nmaludy commented Jun 19, 2020

Hello, I recently deployed FluentD on Windows to all of our DEV machines and am noticing that when I try to restart the FluentD service, it fails to stop the underlying Ruby processes, causing the restart to hang and eventually fail.

Here is our config:

<ROOT>
  <source>
    channels application,system,security
    @id windows_eventlog2
    parse_description true
    read_existing_events false
    read_interval 2
    render_as_xml true
    tag "windows_eventlog"
    @type windows_eventlog2
    <parse>
      preserve_qualifiers true
      @type "winevt_xml"
    </parse>
    <storage>
      path "C:/opt/td-agent/windows_eventlog_pos"
      persistent true
      @type "local"
    </storage>
  </source>
  <filter windows_eventlog**>
    @type grep
    <exclude>
      key "EventID"
      pattern 4656
    </exclude>
  </filter>
  <filter windows_eventlog**>
    remove_keys Keywords
    @type record_transformer
    <record>
      short_message ${record["DescriptionTitle"]}
    </record>
  </filter>
  <filter **>
    enable_ruby true
    @type record_transformer
    <record>
      host #{Socket.gethostbyname(Socket.gethostname).first}
    </record>
  </filter>
  <match **>
    flush_interval 1s
    host logging.domain.tld
    port 12201
    protocol "udp"
    @type gelf
    use_record_host true
    <buffer>
      flush_mode interval
      retry_type exponential_backoff
      flush_interval 1s
    </buffer>
  </match>
</ROOT>

I don't see anything obvious in the logs other than the worker says it has stopped but the process keeps on running:

2020-06-18 15:27:27 -0400 [info]: starting fluentd-1.10.2 pid=2300 ruby="2.4.10"
2020-06-18 15:27:28 -0400 [info]: spawn command to main:  cmdline=["C:/opt/td-agent/embedded/bin/ruby.exe", "-Eascii-8bit:ascii-8bit", "C:/opt/td-agent/embedded/bin/fluentd", "-c", "C:/opt/td-agent/etc/td-agent/td-agent.conf", "-o", "C:/opt/td-agent/td-agent.log", "-x", "td-agent", "--under-supervisor"]
2020-06-18 15:27:29 -0400 [info]: #0 disable filter chain optimization because [Fluent::Plugin::RecordTransformerFilter, Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.
2020-06-18 15:27:30 -0400 [info]: adding filter pattern="windows_eventlog**" type="grep"
2020-06-18 15:27:30 -0400 [info]: adding filter pattern="windows_eventlog**" type="grep"
2020-06-18 15:27:30 -0400 [info]: adding filter pattern="windows_eventlog**" type="grep"
2020-06-18 15:27:30 -0400 [info]: adding filter pattern="windows_eventlog**" type="record_transformer"
2020-06-18 15:27:30 -0400 [info]: adding filter pattern="**" type="record_transformer"
2020-06-18 15:27:30 -0400 [info]: adding match pattern="**" type="gelf"
2020-06-18 15:27:31 -0400 [info]: adding source type="windows_eventlog2"
2020-06-18 15:27:31 -0400 [warn]: #0 define <match fluent.**> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead
2020-06-18 15:27:31 -0400 [info]: #0 starting fluentd worker pid=4484 ppid=2300 worker=0
2020-06-18 15:27:31 -0400 [warn]: #0 [windows_eventlog2] This stored bookmark is incomplete for using. Referring `read_existing_events` parameter to subscribe: <BookmarkList>
</BookmarkList>, channel: application
2020-06-18 15:27:31 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json" tmp="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp, C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json)"
2020-06-18 15:27:31 -0400 [warn]: #0 [windows_eventlog2] This stored bookmark is incomplete for using. Referring `read_existing_events` parameter to subscribe: <BookmarkList Direction='backward'>
</BookmarkList>, channel: system
2020-06-18 15:27:31 -0400 [info]: #0 fluentd worker is now running worker=0
2020-06-18 15:27:33 -0400 [info]: #0 disable filter chain optimization because [Fluent::Plugin::RecordTransformerFilter, Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.
2020-06-18 15:27:33 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json" tmp="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp, C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json)"
2020-06-18 15:27:33 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json" tmp="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp, C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json)"
2020-06-18 15:27:37 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json" tmp="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp, C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json)"
....
2020-06-18 18:17:55 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json" tmp="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp, C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json)"
2020-06-18 18:18:25 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json" tmp="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp, C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json)"
2020-06-18 18:18:31 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json" tmp="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp, C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json)"
2020-06-18 18:19:03 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json" tmp="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp, C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json)"
2020-06-18 18:19:43 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json" tmp="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp, C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json)"
2020-06-18 18:21:13 -0400 [info]: Received graceful stop
2020-06-18 18:21:13 -0400 [info]: Worker 0 finished with status 0
2020-06-18 18:21:17 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json" tmp="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp, C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json)"
2020-06-18 18:21:35 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json" tmp="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp, C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json)"
2020-06-18 18:21:55 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json" tmp="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp, C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json)"
...

Is there anything I can check or try to help debug and fix this?

@philipsabri
Contributor

2020-06-18 18:19:43 -0400 [error]: #0 [windows_eventlog2] failed to save data for plugin storage to file path="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json" tmp="C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp" error_class=Errno::EACCES error="Permission denied @ rb_file_s_rename - (C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json.tmp, C:/opt/td-agent/windows_eventlog_pos/worker0/storage.json)"

Isn't this the same as #57 with your AV? Tell the AV to exclude the file.

@cosmo0920
Contributor

cosmo0920 commented Jun 20, 2020

Yeah, this issue should have the same root cause.
Could you add the folder that stores storage.json to your antivirus solution's scan exclusions?
Windows antivirus software should let you exclude a folder.
If there is no way to do that, Fluentd has no way to tell whether it lacks permission or is simply being blocked by the antivirus solution,
because the antivirus solution runs with higher privileges than ordinary processes.
If there were a permission workaround that did not require excluding the folder, that would be a Windows security hole....
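
For anyone reading along, the error in the log (`rb_file_s_rename` from `storage.json.tmp` to `storage.json`) suggests the storage data is written to a temp file and then renamed over the target. Below is a minimal Ruby sketch of that pattern. It is not Fluentd's actual storage code; `save_with_rename` is a hypothetical helper written only for illustration, and the point is that the caller only ever sees `Errno::EACCES`, whatever the underlying reason.

```ruby
require "json"
require "tmpdir"

# Hypothetical helper (not Fluentd's real implementation): write the new
# contents to "<path>.tmp", then rename the temp file over the target.
def save_with_rename(path, data)
  tmp = "#{path}.tmp"
  File.write(tmp, JSON.generate(data))
  File.rename(tmp, path) # the step that raised Errno::EACCES in the log above
  true
rescue Errno::EACCES => e
  # Missing permissions and a file blocked by another process both surface
  # here as the same exception, so the two cases look identical to the caller.
  warn "failed to save #{path}: #{e.class}: #{e.message}"
  false
end

# Demo in a throwaway directory so nothing real is touched.
Dir.mktmpdir do |dir|
  path = File.join(dir, "storage.json")
  saved = save_with_rename(path, { "application" => "<BookmarkList/>" })
  puts "saved=#{saved} exists=#{File.exist?(path)}"
end
```

If a call like this keeps returning false on a folder the service account can write to, something outside the process (such as a scanner holding the file) is a plausible suspect, which is why excluding the folder is the first thing to try.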

@nmaludy
Author

nmaludy commented Jun 20, 2020

@flurreN and @cosmo0920 thanks for the help. However, I tried the exclusion like before, and even turned off the AV agent completely, and still had the issue.

It turned out my Config Management code (Puppet) was broken (not idempotent): on every run it was trying to delete the service definition, recreate it, and then restart it. When this happened the service would fail to stop, probably because the service definition had been deleted and recreated underneath it.

Fixing the idempotency in my Puppet code solved the issue, and Puppet can now successfully manage the service, restart it on config changes, etc.

Here is the PR with the fix: https://github.com/EncoreTechnologies/puppet-fluentd/pull/5/files
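
The PR linked above has the real change. As a rough illustration of what "idempotent" means for service management, here is a minimal Ruby sketch. It is not the Puppet code from the PR: `plan_service_actions`, the Hash-based service definition, and the action names are all hypothetical. The idea is to compare the current definition with the desired one and do nothing when they already match, instead of deleting and recreating the service on every run.

```ruby
# Hypothetical model: a service definition is a plain Hash and the result is
# a list of symbolic actions. Nothing here talks to Windows or Puppet.
def plan_service_actions(current, desired, config_changed: false)
  # No existing definition: create it once and start it.
  return [:create, :start] if current.nil?

  actions = []
  # Update in place only when the definition differs; never delete and recreate.
  actions << :update_definition unless current == desired
  # Restart only when something actually changed.
  actions << :restart if config_changed || !actions.empty?
  actions
end

desired = { name: "fluentd", start: "auto" } # assumed example values

p plan_service_actions(nil, desired)                          # first run
p plan_service_actions(desired, desired)                      # converged: no actions
p plan_service_actions(desired, desired, config_changed: true) # config edit only
```

Running the plan twice against an already-converged service yields an empty action list, which is the property the broken code lacked.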

@nmaludy nmaludy closed this as completed Jun 20, 2020
@nmaludy
Author

nmaludy commented Jun 22, 2020

@flurreN and @cosmo0920 as a side note... FluentD on Windows is working AMAZINGLY compared to NXLog! We've also been using the client-side parsing and filtering capabilities to take a TON of load off of our central logging servers (Graylog).

Thank you very much for your help, I'm really excited about FluentD!!!

@cosmo0920
Contributor

cosmo0920 commented Jun 23, 2020

Direct Hash object mapping is very helpful for reducing resource usage. This is Ruby's magic! 😄 ... And tons of hand-written code here: https://github.com/fluent-plugins-nursery/winevt_c/blob/master/ext/winevt/winevt_utils.cpp 💪
