
Windows EventLog support #1395

Closed
tdabasinskas opened this issue Dec 10, 2019 · 45 comments · Fixed by #3246
Labels
component/agent help wanted We would love help on these issues. Please come help us! keepalive An issue or PR that will be kept alive and never marked as stale. type/feature Something new we should do

Comments

@tdabasinskas

tdabasinskas commented Dec 10, 2019

Is your feature request related to a problem? Please describe.
Windows logs are stored in the Event Log (.evtx files), which is currently not possible to scrape via the available promtail methods.

Describe the solution you'd like
Since we do have systemd journal support for Linux, it would be nice to have support for Event Log on Windows in a similar manner.

Describe alternatives you've considered
A key part of the solution is actually being able to parse the logs. If I haven't missed anything, there are currently two Golang modules that can do that: github.com/0xrawsec/golang-evtx and github.com/elastic/beats/winlogbeat/eventlog.

@randomchance

This would instantly make Loki viable in my environment - fluentd requires ruby, which is no-go, but having a single go executable would be perfect.

Alternatively, being able to accept data from winlogbeat (or beats in general!) and run it through the pipeline would be amazing!

@cyriltovena cyriltovena added component/agent help wanted We would love help on these issues. Please come help us! keepalive An issue or PR that will be kept alive and never marked as stale. type/feature Something new we should do labels Dec 13, 2019
@PWSys

PWSys commented Jan 3, 2020

+1 I have many Windows systems in my environment.
randomchance's suggestion of leveraging winlogbeat would work as well!

@steenstra

This would be great!

I'm currently using InfluxDB/Telegraf as Syslog receiver with NXlog (https://nxlog.co/) to convert Windows Event logs to Syslog, using the im_msvistalog module.

I see that Promtail can be used as Syslog target (https://github.com/grafana/loki/blob/master/docs/clients/promtail/scraping.md#syslog-target), so maybe something like that would be a temporary solution until this is implemented?
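For reference, a promtail syslog target along those lines looks roughly like this (a minimal sketch based on the linked scraping docs; the port and labels are illustrative):

```yaml
scrape_configs:
  - job_name: syslog
    syslog:
      # NXLog (or another shipper) would point its syslog output here
      listen_address: 0.0.0.0:1514
      labels:
        job: syslog
    relabel_configs:
      - source_labels: ["__syslog_message_hostname"]
        target_label: "host"
```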

@cosmo0920
Contributor

cosmo0920 commented Jan 7, 2020

Another alternative is using Fluentd's Windows EventLog plugin.

The Fluentd ecosystem has fluent-plugin-windows-eventlog's in_windows_eventlog2 plugin, which can consume Windows EventLog in the .evtx format.
But this workaround requires Ruby, as randomchance noted.

@pomazanbohdan
Contributor

receiver with NXlog (https://nxlog.co/) to convert Windows Event logs to Syslog

If you convert Windows events to a log file, it can already be sent to Loki via promtail.

@randomchance

Just wanted to let interested parties know - winlogbeat (and all the elastic beats) can be configured to output to rolling files instead of logstash, so you can scrape them with promtail!

@cyriltovena
Contributor

We're going to add Logstash support soon; this is nice to know for Windows users.

@azawawi

azawawi commented Apr 25, 2020

I am currently working on this one (i.e. a Golang prototype to get Windows event logs directly into promtail). So far the executable size is ~2.5 MB. Your feedback is appreciated on the following:

  • Output:

    it is writing to win-event.log in the current directory.

  • Log Name:

    It supports the following log names:

    • Application
    • System
    • Security (requires admin privileges)
    • Setup (requires admin privileges)
    • "Forwarded Events". Please note that this is actually missing in the fluentd plugin implementation.
  • Position:

    Should we take Source + EventID + Timestamp to provide tailing? We can read all the logs and modify the log query to show events after the last read timestamp.

  • Output type / Size:

    On my development machine I have like 8790 log entries. It finishes writing them in less than 3 seconds.

    Windows Event Log renders its event as XML output and fluentd has an option to convert it to JSON. What is the most useful output (XML or JSON) to focus on? I have implemented both so far.

    For ~9K log entries on my windows machine:

    • XML (native): Size is a bit more but it is a bit faster since there is no JSON conversion
    • JSON: comes to ~392K lines (~7 MB, or ~9 MB when prettified)
  • Testing:
    Using the following PowerShell cmdlet to test:

    Write-EventLog -LogName Application -Source "Windows Error Reporting" -EntryType Information -EventId 1 -Message "This is a Test Message"

Kindly provide example production workload numbers. Your feedback is appreciated 😄

@azawawi

azawawi commented Apr 25, 2020

And here is the result so far (Promtail / Loki / Windows Event Log prototype on Windows 10):

Please note there is a usability bug in Grafana's explore / query. To get past the error "parse error at line 1, col 11: invalid char escape", you need to escape \ with \\.

Screenshot 2020-04-25 125415

@randomchance

That's awesome, thanks so much for working on this!

As for feedback:

  • It supports the following log names:

I think this will need to support arbitrary log names: just try all the configured ones and ignore the ones that fail. It's pretty standard practice for enterprise applications to create their own logs that need to be monitored, even ignoring all of the other Microsoft ones. Here are just a few of the ones I would need to monitor:

  • Veeam Agent
  • Windows PowerShell
  • Microsoft-Windows-DSC/Admin
  • Microsoft-Windows-DSC/Operational
  • Microsoft-Windows-DiskDiagnostic/Operational
  • Microsoft-Windows-Hyper-V-VmSwitch-Operational
  • Microsoft-Windows-Kerberos/Operational

You can see the list of ones available in powershell by running:

Get-WinEvent -ListLog *

My desktop has 510 logs registered!

Should we take Source + EventID + Timestamp to provide tailing?

As for tailing, I would think that using logname + source + timestamp, or even just logname + timestamp, would be ideal; Event IDs are often repeated or reused, especially by non-Microsoft sources.

it is writing to win-event.log

This would need to be rotated and cleaned up. My personal favorite method to rotate logs that are being consumed is to include the date in the file name, such as always writing to logname.{day}{month}{year}.log, with a configured limit on the number of files retained. This way you are not changing the names of files promtail should read, and you can easily say "keep 30 days of data". Also, if you don't want to handle cleaning up the old files, it's easy for me to have a script that just deletes files matching a pattern with a LastWriteTime older than X days.

in the current directory

This would need to be configurable. The executable will generally be in the Program Files directory, which requires admin access to write to, and for some of our servers I would want to move it to another drive for space considerations; the logs easily reach a couple of gigs a month, and we need to retain them for regulatory reasons.

Thanks again, and I'm happy to provide feedback!

@cyriltovena
Contributor

From what I understand, you're listening to events and writing them to a temporary file. This is nice, although later we could alternatively read directly from promtail with a new Windows target.

Thank you for contributing, this is awesome! @azawawi, join #loki-dev if you need anything.

@azawawi

azawawi commented Apr 28, 2020

@randomchance Thanks for the useful info and feedback. I really appreciate it.

@cyriltovena Yes that sums it up. So from what I understand, I need to add a new Windows-only build target (i.e. wineventtarget.go, wineventmanager.go and winevent_test.go) inside the targets folder to implement it. Anything else I missed?

@randomchance

@azawawi If it's possible, I agree with @cyriltovena that it would be better to send directly to promtail; technically you can already use Winlogbeat to write eventlogs directly to files.

Thanks again!

@randomchance

@azawawi You asked for example production workload numbers, so I got some for you!

I just checked one of our installations where we batch logs for retention.

| Interval | Entries | Size |
| --- | --- | --- |
| 1 Hour (Busy) | 140,966 | 69.1 MB |
| 1 Hour (Calm) | 9,236 | 6.5 MB |

... and I just realized this is an aggregate number so not super applicable, but maybe if you divide it by the 9 servers?

@cyriltovena
Contributor

@cyriltovena Yes that sums it up. So from I understand, I need to add a new windows-only build target (i.e. wineventtarget.go, wineventmanager.go and winevent_test.go) inside the targets folder to implement it. Anything else I missed?

No, that's the idea, you got it right. We can of course help you along the way.

@randomchance

Since the Event Log API supports XPath queries, I think that would be good low-hanging fruit for any solution.

@azawawi I did some digging into how DotNet handles persisting the last read location in the EventLog stream. DotNet has an EventBookmark class for this, but under the hood it is just storing the channel and RecordID.

This means that if you use an XPath query, you can filter it with something like:
Event[System[EventRecordID > 83005]]

Where 83005 is the RecordID of the last record stored.

This query gets records after record 83005 that are no older than 86400000 milliseconds (24 hours):
Event[System[EventRecordID > 83005 and TimeCreated[timediff(@SystemTime) <= 86400000]]]

Or if you want to get the specific event you left off on:
Event[System[EventRecordID = 83005]]

I think it would be smart to look for the single event first, and if it's no longer there you can assume the log has dropped it and just start at the beginning, or also store the timestamp and fall back to it.

I don't know if that's helpful, but I was looking into writing something similar in DotNet Core and this was stumping me for a while.

You can test the filters/queries in PowerShell pretty easily:

$query = "Event[System[EventRecordID > 83005  and TimeCreated[timediff(@SystemTime) <= 86400000]]]"
Get-WinEvent -FilterXPath $query -LogName System
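The bookmark-and-fallback logic above can be sketched in Go. This is only an illustration of the query strings and the resume decision; recordExists stands in for a real call against the Event Log API:

```go
package main

import "fmt"

// afterRecord returns the XPath filter for events newer than the
// bookmarked record ID, matching the query shown above.
func afterRecord(id uint64) string {
	return fmt.Sprintf("Event[System[EventRecordID > %d]]", id)
}

// exactRecord returns the filter for the single bookmarked event.
func exactRecord(id uint64) string {
	return fmt.Sprintf("Event[System[EventRecordID = %d]]", id)
}

// resumeQuery implements the fallback described above: if the bookmarked
// record still exists in the log, continue after it; otherwise assume the
// log has rolled over and start from the beginning.
func resumeQuery(lastID uint64, recordExists func(query string) bool) string {
	if lastID > 0 && recordExists(exactRecord(lastID)) {
		return afterRecord(lastID)
	}
	return "Event" // no usable bookmark: read everything
}

func main() {
	fmt.Println(afterRecord(83005))
}
```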

@Jacq

Jacq commented Jun 5, 2020

I hope to hear more about this. I am currently using winlogbeat with elasticsearch, but if using Loki for these event logs is possible, I could remove the elasticsearch instance and save CPU resources.
I could not find a similar solution with telegraf; it might be in the same situation as loki: influxdata/telegraf#4525
Cheers,
Jacq

@Jacq

Jacq commented Jun 25, 2020

This would instantly make Loki viable in my environment - fluentd requires ruby, which is no-go, but having a single go executable would be perfect.

Alternatively, being able to accept data from winlogbeat (or beats in general!) and run it through the pipline would be amazing!

Why not use the fluent-bit winlog input plugin (https://docs.fluentbit.io/manual/pipeline/inputs/windows-event-log)
and the Grafana Loki output plugin (https://github.com/grafana/loki/tree/master/cmd/fluent-bit)?
No Ruby is required for fluent-bit, which relies on C (and Go for the Loki plugin); both combined are below 30 MB.

I have both winlogbeat (which is amazing) and the above config with fluent-bit, and it seems to work; the only caveat is that I have yet to include eventlog level information (info, error, ...) in the Loki labels.

@cosmo0920
Contributor

cosmo0920 commented Jun 26, 2020

The fluent-bit winlog plugin does not support retrieving the eventlog description, which should be possible with the <winevt.h> API; the plugin does not use the new Windows EventLog API.

@simnv
Contributor

simnv commented Aug 19, 2020

I could not find a similar solution with telegraf, it might be in the same situation as loki: influxdata/telegraf#4525

@Jacq Check influxdata/telegraf#8000

@secustor

@Jacq Can you share how you have built the Loki FluentD-bit plugin?
I get errors in the syscall package when I'm trying to compile with 'windows/amd64' as target platform. Thanks!

@Ulfy

Ulfy commented Aug 26, 2020

I'm also in need of a Loki compliant log shipper for Windows Event Logs. I'm currently trying to get Fluentbit working, since it has a winlog plugin. I'm not sure if I can build a Loki plugin using /cmd/fluent-bit. It just creates a file for *nix usage, not a Windows DLL/library. Is there a way to target a different arch for the file output?

@JacoboDominguez

@Jacq Can you share how you have built the Loki FluentD-bit plugin?
I get errors in the syscall package when I'm trying to compile with 'windows/amd64' as target platform. Thanks!

I have to check; I think I tried to build it as well, but finally grabbed the binary from the online repo.

@randomchance

As far as I can tell, all of the fluent options only support the older style logs, not the newer "channels" such as Microsoft-Windows-DiskDiagnostic/Operational which ended up being a deal breaker for me.

Right now I'm using winlogbeat => logstash => loki and while I like winlogbeat, I really dislike running logstash on windows.

@cosmo0920
Contributor

As far as I can tell, all of the fluent options only support the older style logs, not the newer "channels" such as Microsoft-Windows-DiskDiagnostic/Operational which ended up being a deal breaker for me.

AFAIK, the newer Windows EventLog should be retrieved with <winevt.h> API.

@secustor

I have found the online repo @JacoboDominguez is referencing.
It is the repo that was used before the plugin was adopted by the Grafana team: https://github.com/cosmo0920/fluent-bit-go-loki

Furthermore, I have opened an issue (#2563) to supply the fluent-bit Loki plugin for Windows as a binary, or to add a make command for this.

@carwyn

carwyn commented Aug 27, 2020

@azawawi are you able to share the code for the work you are doing? We have many Windows servers I can potentially test this on.

@Ulfy

Ulfy commented Sep 1, 2020

Right now I'm using winlogbeat => logstash => loki and while I like winlogbeat, I really dislike running logstash on windows.

@randomchance what kind of mutations are you doing to add labels for loki and drop high cardinality ones? I'm configuring this rn, but the output from winlogbeat is massive...

@randomchance

randomchance commented Sep 14, 2020

@Ulfy sorry for the delay - I'm really only doing a couple of things, and taking advantage of the fact that top level fields with multiple values are dropped.

A quick summary:

  1. I only add three labels right now (but that may change): Instance, Level and Weight. Weight {low|audit|normal|high} is just a measure of the importance of the log or source the entry is from, and makes surfacing problems faster. I plan to have alerts triggered by high-weight error-level entries.
  2. I create a header of name=value pairs such as event_id=1753 to make filtering easier and add it as the first line of the message.
  3. Metadata about the entry that I want to save is serialized to JSON and added as the last line of the message.
  4. Other top level fields I rename to something like [loki][temp][field_name] so they are discarded when sent to Loki, but still show up if I send them to a file while debugging.
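For illustration, the name=value header from step 2 boils down to something like this (a Go sketch; the field names and ordering are just examples, not part of my actual pipeline):

```go
package main

import (
	"fmt"
	"strings"
)

// buildHeader renders selected fields as name=value pairs in a stable
// order, so the first line of every message is easy to match with a
// regex filter in Loki.
func buildHeader(order []string, fields map[string]string) string {
	parts := make([]string, 0, len(order))
	for _, k := range order {
		if v, ok := fields[k]; ok {
			parts = append(parts, k+"="+v)
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	fields := map[string]string{
		"level":    "error",
		"event_id": "1753",
		"provider": "Microsoft-Windows-DSC",
	}
	fmt.Println(buildHeader([]string{"level", "event_id", "provider"}, fields))
}
```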

The guidelines for loki stress how important it is to not add a ton of labels, and point out that the regex filtering is very powerful.
After creating a header to filter on, I can say that the search time and low storage cost are impressive; however, there are serious pain points.

  • While Grafana parses the header and the pairs show as "Parsed Fields", the built-in filter does nothing with them.
  • While you can get a count of errors with "provider=Microsoft-Windows-FailoverClustering-Clusport-Diagnostic", you cannot get a list of providers and their counts.
  • Basically, only "true" labels are first-class citizens if you are doing anything other than using the "Explore" viewer or searching for a pattern you already know.

It's possible that sending logs to promtail first and generating metrics there will help address some of the issues I've met, but I'm probably going to need to add some indication of the log channel as a label, though I'm not sure what that will look like yet.

@calebcoverdale

@randomchance would you be willing to share your config file for Logstash to Loki? I got winlogbeat to talk to Logstash, now I am a bit lost in actually formatting the data to get into Loki.

@randomchance

randomchance commented Sep 24, 2020

@calebcoverdale
I can't share my full config, but I have mocked up something similar.
One thing I can't stress enough is to double check your string quotation - the logstash config language is frankly horrible when it comes to rules on quoting identifiers.

Here is an excerpt from a lessons-learned KB I put together for my team:

See Accessing Event Data and Fields in the Configuration and Field References Deep Dive

Field names (object properties) MUST NOT be quoted in conditionals, but MUST in other situations - I'm explicitly not going to specify where because you should go re-read the documentation. Again. Every time something does not work as expected.

The string handling in configuration files alone will, eventually, go a long way towards either convincing you to develop test input that fully exercises any conditionals in your configuration, or convincing you to find another log processing solution, if you can.

Here is a full example config; I tried to add explanatory comments:


input{
    beats { 
        port => 5044 
        add_field => {       
            "[loki][header]"  => " "      
            "[weight]"  => "unknown" 
            "[loki][save][agent-version]" => "%{[agent][version]}"
        }
    }
}

filter{
# a lot of the channels have events that just say this :( 
    if [message] =~ /^For internal use/ {
        drop{}
    }

  if [agent][type] == "winlogbeat" {

      # create the top level field [level], which becomes a label.
      # create the field [loki] and the nested field [level] in case I want to use it later
    mutate {
      add_field => {
        "[loki][level]" => "%{[log][level]}"
        "[level]" => "%{[log][level]}"
        }
    }

    # If the event is from the "classic" event logs
    # I use the provider name as the "source" for the event and store it in [loki][identifier]
    if  [winlog][channel] =~ "(System|Application|Security|Setup)" {
      mutate {
        add_field => {
          "[loki][identifier]" => "%{[winlog][provider_name]}"
        }
        add_tag => ["classic_log","%{[winlog][provider_name]}"]
      }
    }else {
      mutate {
        add_field => {
        # The newer log channels are generally single-application/system, so I use the channel name for those
          "[loki][identifier]" => "%{[winlog][channel]}"
        }
        add_tag => ["channel","%{[winlog][channel]}"]
      }
    }
    
    #################################################################
    # now we can set the weight 
    #################################################################

    # this uses the translate filter to pull a weight value from an external json file
    # if none of the keys match, fall back to "normal"
    translate {
        field => "[loki][identifier]"
        destination => "[loki][weight]"
        regex => true
        dictionary_path => "D:/logstash/config/log-weights.json"
        fallback => "normal"    
        # I add tags everywhere so I can follow the event progression while debugging
        add_tag => [ "sys_translate_provider_weight" ] 
      }
    
    #################################################################
    # Now set the category - audits get special treatment first
    #################################################################

    # add an Audit line to security audits and the audit header

    if [winlog][provider_name] =~ "Security-Auditing" {
      mutate{
      replace => { 
        "[loki][header]" => " \n\t type=%{[event][type]} category=%{[event][category]} action=%{[event][action]} outcome=%{[event][outcome]} %{[loki][header]}"  
        "[loki][weight]" => "audit"
        "[loki][category]" => "%{[event][category]}-audit"
        }
        # these fields only exists on audit events
        add_field  => {
          "[loki][event_data][logon]" => "%{[winlog][logon]}"
          "[loki][event_data][keywords]" => "%{[winlog][keywords]}"
          }
          add_tag => [ "sec-audit" ]
      }
    }  else { 
      # having the same field and source means a replace is done, otherwise it only seems to work on new fields
      # correction, replace does not seem to work, contrary to the docs (at least on windows)
      translate {
        field => "[loki][identifier]"
        destination => "[loki][category]"
        dictionary_path => "D:/logstash/config/category-mapping.json"
        fallback => "windows"
        add_tag => [ "channel_translate" ]
      }
    }

    # clean up the channel and provider names
    mutate {
      # provider can have spaces, so replace them
            gsub => [
            # replace all spaces with minus
          "[winlog][provider_name]", "\s", "-",
          "[winlog][channel]", "\s", "-"
          ]
    }

# I'm mostly building a header string here

    # Add the channel and provider to the front of the [loki][header], creating it if it's not there.
    mutate {
      replace => {
        "[loki][header]" => "channel=%{[winlog][channel]} provider=%{[winlog][provider_name]} %{[loki][header]}"
        }
    }
      
    if [host][ip] {
        mutate {   
            rename => {"[host][ip]" => "[loki][event_data][host_ip]"}    
        }
    }

    if [host][mac] {
        mutate {   
            rename => {"[host][mac]" => "[loki][event_data][host_mac]"}    
        }
    }


    # We don't want a User label, so rename it, I'm storing it in a subfield of [loki] called [event_data]
    # I'm going to save [event_data] later
    if [user] {
        mutate {
            rename => {"[user]" => "[loki][event_data][user]"}
        }
    }

    # reboot events get flagged as critical
    if [winlog][event_id] {
        mutate {   
            replace => {
              #"message" => "event_id=%{[winlog][event_id]} %{[message]}"
              "[loki][header]" => "event_id=%{[winlog][event_id]} %{[loki][header]}"
              }
            add_field  => {"[loki][event_data][event_id]" => "%{[winlog][event_id]}"}    
        }

         if [winlog][provider_name] =~ "Kernel-(General|Power|Boot)" and [winlog][event_id] =~ /^(12|13|109)$/ {
          mutate {
            replace => {
              "[loki][category]" => "boot"
              "[loki][weight]" => "high"
            }
          }
         }
    }
  }  
  # end if-winlogbeat


# this is where I populate the [event_data] with any info I might want later

  if [host][ip] {
      mutate {   
          rename => {"[host][ip]" => "[loki][event_data][host_ip]"}    
      }
  }

  if [host][mac] {
      mutate {   
          rename => {"[host][mac]" => "[loki][event_data][host_mac]"}    
      }
  }
  if [agent][hostname] {
      mutate {   
          rename => {"[agent][hostname]" => "[loki][event_data][computer_name]"}    
      }
  }

# I use the [instance] (which is the computer name) as a label, 
# and it's nice to have consistent casing.
    if [instance] {
      mutate {   
            capitalize => ["[instance]"]  
      }
  }

# might as well keep the original event data

  if [winlog][event_data] {
      mutate {   
          rename => {"[winlog][event_data]" => "[loki][event_data][event_data]"}    
      }
  }

# Add the category to the header

 mutate {
    replace => {
      #"message" => "level=%{[log][level]} %{[message]}"
     "[loki][header]" => "category=%{[loki][category]} %{[loki][header]}"
      }
  }

# Add the level to the header
# this is last so it's first on the log line.

  if [log][level] {
      mutate {   
        #add_field  => {[loki][event_data][level] => "%{[log][level]}"}              
        replace => {
          #"message" => "level=%{[log][level]} %{[message]}"
          "[loki][header]" => "level=%{[log][level]} %{[loki][header]}"
          }    
      }
  }

  mutate {   
    # set the weight to the top level field so it will be a label
    # this replaces / creates [weight] with the content of [loki][weight]
    replace => {
      "weight" =>"%{[loki][weight]}"
    }
  }

  # this serializes the [loki][event_data] object and overwrites it with the json string
  json_encode {
    source => "[loki][event_data]"
  }

   mutate {   
    # add the built header to the message line and push the message to the next line
    # add the event data json string to the end on a new line
    replace => {
      "message" => "%{[loki][header]} \n%{[message]} \n\tEVENTDATA= %{[loki][event_data]}"
    }
    # move the tags array to a sub field so it won't be a label
    # it's not supposed to become one, but I've had some weird things happen
    rename => { "[tags]" => "[temp][tags]"  } 
  }

}
output {

    # I do this to check that the events look like I expect, then swap to the loki output
  file {
    path => "D:\logstash.log"
  }
  # loki {
  #   url => "http://sweet.loki.goodness/loki/api/v1/push"
  # }
}

@cyriltovena cyriltovena added this to To do in Loki Project Oct 27, 2020
@fifofonix

This thread was useful in getting a working fluentd -> loki setup going for Windows EventLog, using in_windows_eventlog2 as @cosmo0920 suggested. This supports the "new" channels.

This is a ruby-based solution but @randomchance this does allow you to specify channels like Microsoft-Windows-DiskDiagnostic/Operational, or even include all channels with a separate option.

When specifying channels in the fluentd config, the key is no quoting or escaping, which tripped me up initially:

<source>
  @type windows_eventlog2
  @id windows_eventlog2
  # Do not quote "" or escape \ characters in channel names...
  channels application,system,security,HardwareEvents,Windows PowerShell, Microsoft-Windows-Diagnosis-PCW/Operational 
</source>

@randomchance

@fifofonix That's awesome! The docs don't reflect that, so I opened an issue to get them updated.

For anyone curious, the documentation says/implies that the standard four logs are the entire set of possible options.

One or more of {'application', 'system', 'setup', 'security'}.

I'm already pretty invested in my current config, but I'll definitely try out a fluentd configuration if I get a chance. Having more options is definitely better.

Now that promtail supports syslog input, using a log shipper that outputs syslog is also an option.
If anyone tries that, be aware that promtail is geared towards getting logs into loki; it's not nearly as flexible and does not allow the crazy level of processing/editing that you can do in logstash, which is probably a good thing.

@danfoxley

...and Telegraf now has an input plugin for Windows Event Log

https://github.com/influxdata/telegraf/blob/v1.16.0/plugins/inputs/win_eventlog/README.md

@AMoghrabi

AMoghrabi commented Nov 3, 2020

Hey @randomchance, thanks a lot for your logstash example, I appreciate it. I'm currently setting up something similar for my company and I had a question about the [loki][header] portion -- what does that provide you exactly?

My understanding is Loki tags must be key, value pairs, and the values cannot be nested. When you add a new field such as "[loki][level]", I'm not sure where that gets shown or used in Loki. I'm testing it out right now and when viewing the logs in Loki, I can't see it as a label nor in the logs that are coming from logstash.

I'm fairly new to Logstash so I'm probably misinterpreting this entirely. I'd appreciate your guidance. Thanks!

Edit: Nevermind -- I had some time today to go through the entire config and I see it gets appended to the message.

@Ulfy

Ulfy commented Nov 3, 2020

...and Telegraf now has an input plugin for Windows Event Log

https://github.com/influxdata/telegraf/blob/v1.16.0/plugins/inputs/win_eventlog/README.md

@danfoxley Does Telegraf have a Loki plugin to output w/ labels? I'm still looking for something to ship windows event logs to Loki...

@chancez
Contributor

chancez commented Jan 12, 2021

Having the logs represented in JSON would potentially be better than XML, as Loki 2.0 can do parsing at query time for high-cardinality data using JSON and regex, but not XML.
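For example, assuming the shipper emits each event as a JSON line with fields like level and provider (illustrative names, not an existing output format), Loki 2.0 can extract them at query time:

```logql
{job="windows_events"} | json | level="Error"
```

and even aggregate on an extracted field without it being an indexed label:

```logql
sum by (provider) (count_over_time({job="windows_events"} | json [5m]))
```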

@cyriltovena cyriltovena moved this from To do to In progress in Loki Project Jan 27, 2021
@cyriltovena cyriltovena mentioned this issue Jan 27, 2021
@cyriltovena cyriltovena moved this from In progress to Review in progress in Loki Project Feb 1, 2021
Loki Project automation moved this from Review in progress to Done Feb 2, 2021
cyriltovena pushed a commit to cyriltovena/loki that referenced this issue Jun 11, 2021
@cosmo0920
Contributor

cosmo0920 commented Oct 20, 2021

🔔 Hear ye, hear ye! 🔔

Sorry for commenting on the graveyard.
We have planned and already implemented support for the new Windows EventLog subscription API in Fluent Bit, in this PR: fluent/fluent-bit#4179

This does not require a Ruby setup; it can be used by deploying only the Fluent Bit executable.
Also, our Fluent Bit implementation supports HashMap-style consumption.

@RamblingCookieMonster

Having the logs be represented in JSON would potentially be better than XML as Loki 2.0 has the ability to do parsing at query time for high cardinality data, using JSON and Regex, but not XML.

Hiyo!

I might have missed it, but, am I correct in that Promtail will not convert the XML to json, nor has an XML processor been added to Loki, meaning folks essentially just struggle by and parse the message field with regex when working with Windows event logs?

Cheers!

@danfoxley

https://grafana.com/docs/agent/latest/static/set-up/install/install-agent-on-windows/

The Grafana Agent has config for Windows events.

@RamblingCookieMonster

RamblingCookieMonster commented Sep 7, 2023

To clarify, I have no issues at all processing and sending Windows events. This question is about the format of the data. Promtail (which I am testing already, and which grafana agent embeds, afaik, so using it would not change this) does not process the actual event data, and simply sends a string of XML for the event_data. This is... not helpful. I will open another issue if this is the case, but looking to confirm I am not missing anything as I've only spent an afternoon looking at this.

An example of the data in Grafana Cloud:

{
  "source": "Service Control Manager",
  "channel": "System",
  "computer": "REDACTED.REDACTED",
  "event_id": 7036,
  "level": 4,
  "levelText": "Information",
  "keywords": "Classic",
  "timeCreated": "2023-09-07T09:22:18.282632000Z",
  "eventRecordID": 651039,
  "execution": {
    "processId": 812,
    "threadId": 4156,
    "processName": "services.exe"
  },
  "event_data": "<Data Name='param1'>Windows Update</Data><Data Name='param2'>running</Data><Binary>PROBABLY-NO-NEED-TO-REDACT</Binary>",
  "message": "The Windows Update service entered the running state."
}

So! Windows events derive significant value from the structured data, in this case, under event_data. For example, if these were Active Directory logs, that's where you would find who did what to what principal, among other essential data. That this data is (1) not processed into JSON, and (2) not processable via some XML processor in Grafana, is a significant gap in functionality. I could absolutely be mistaken, but it appears that folks are resorting to regex parsing the meant-for-human-eyes message field, rather than working with structured data, which... seems like going backwards, and may preclude some folks, including me, from considering promtail/loki for Windows.

With that said, before I open an issue that boils down to "this is a terrible experience and for windows event log users, please consider these alternatives," I want to make sure I'm not missing something obvious.

Cheers!

@danfoxley

@RamblingCookieMonster

RamblingCookieMonster commented Sep 11, 2023

That probably would help! To be honest, though, there's such a wide variety in what could be included in a field that it's probably not viable, outside of one-off solutions, to use a parser like that post-ingest, at least IMHO. Here's a synopsis of what I've found, keeping in mind this is mostly superficial-level time/effort, so take it with a grain of salt:

What I want: (1) structured windows event data, (2) with parsed event data field names, (3) in a format loki can process

  • Promtail (thus presumably the Grafana Agent, unless it does more than Promtail alone) sends event_data as an XML string.
  • Fluent Bit sends parsed event data... in an array that requires referencing a schema to know which array index maps to which field name (which can change over time and across event IDs - not viable).
  • Telegraf sends parsed event data with named fields... in a way that looks parsable by logfmt, but is not.

Ultimately, this bit of processing, at least superficially, gets Telegraf output into a format Loki / logfmt will be happy with. I haven't tested it much; there might be other characters/sequences that break logfmt, but so far so good:

  [[processors.strings]]
    # Duct taping OSS that isn't designed for Windows, so... escape something that
    # will later become an escape character and confuse logfmt (for Loki queries)
    [[processors.strings.replace]]
      field = "*"
      old = '\'
      new = '\\'
    # Duct taping OSS that isn't designed for Windows, so... handle the many cases
    # where a field will have a double quote (command line, script content, cron/task definition, etc.)
    # Telegraf sends key="value", and key="value with "quotes"" is not valid for logfmt (for Loki queries)
    [[processors.strings.replace]]
      field = "*"
      old = '"'
      new = '\"'
    # Duct taping OSS that isn't designed for Windows, so... handle the few event data field names
    # that will have spaces in them, as logfmt (for Loki queries) will be quite confused without this.
    [[processors.strings.replace]]
      field_key = "*"
      old = ' '
      new = '_'
    [processors.strings.tagpass]
      __name = ['win_eventlog']
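
For what it's worth, the effect of those three replacements can be sketched in Go (the helper names here are mine, purely illustrative of what the config above does):

```go
// Sketch of the escaping the telegraf processors above perform on one
// event-data field, to keep the resulting key="value" pair logfmt-safe.
package main

import (
	"fmt"
	"strings"
)

// escapeValue doubles backslashes and escapes double quotes so the value
// can safely be wrapped in quotes for logfmt.
func escapeValue(v string) string {
	v = strings.ReplaceAll(v, `\`, `\\`)
	v = strings.ReplaceAll(v, `"`, `\"`)
	return v
}

// escapeKey replaces spaces in field names, since logfmt keys cannot
// contain spaces.
func escapeKey(k string) string {
	return strings.ReplaceAll(k, " ", "_")
}

func main() {
	key := "Process Command Line"
	value := `C:\Windows\system32\cmd.exe /c "whoami"`
	fmt.Printf("%s=\"%s\"\n", escapeKey(key), escapeValue(value))
	// → Process_Command_Line="C:\\Windows\\system32\\cmd.exe /c \"whoami\""
}
```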

It's not as batteries-included as something like the Elastic or Splunk agents, but it appears that this will be viable. I do think it would be valuable for Promtail to be able to meet the needs I mentioned; IMHO it's absolute-bare-minimum functionality for logging in a Windows environment, but I can see why it's not a thing yet (if ever).

Cheers!

@danfoxley

@wardbekker please, for Windows Event Log where EventData comes in as XML

<EventData>
  <Data Name="errorCode">0x80073d02</Data>
  <Data Name="updateTitle">9NMPJ99VJBWV-Microsoft.YourPhone</Data>
  <Data Name="updateGuid">{aa7e4763-ca28-461c-a259-334fb85492b9}</Data>
  <Data Name="updateRevisionNumber">1</Data>
  <Data Name="serviceGuid">{855e8a7c-ecb4-4ca3-b045-1dfa50104289}</Data>
</EventData>

Is there any consideration of using Go's encoding/xml package to parse the XML?

Considering that parsing the ingested XML in Loki is not available today, what are your thoughts/comments? Use the pattern parser? Something else?

@danfoxley

danfoxley commented Sep 15, 2023

@RamblingCookieMonster
Not to drag this on, but:

Filebeat, I guess, can't go straight to Loki. How about using Filebeat to ?? (file, Logstash...) and then to Loki?
https://www.elastic.co/guide/en/beats/filebeat/current/decode-xml-wineventlog.html
