Create a "logstreamer" plugin #102

Closed
sparrc opened this issue Aug 9, 2015 · 27 comments · Fixed by #1320

@sparrc
Contributor

sparrc commented Aug 9, 2015

Inspired by issue #48, create a plugin for aggregating and pushing data from log files, allowing user-defined regex filters.

This would behave in a similar manner to heka's logstreamer plugin: https://hekad.readthedocs.org/en/v0.9.2/pluginconfig/logstreamer.html#logstreamerplugin

/cc @steverweber

@skynet

skynet commented Sep 26, 2015

👍

@steverweber

Perhaps something simpler: tail a file,
with some code like https://github.com/hpcloud/tail,
and add some processing options like:

  • count matches of a regex
  • send raw text that a regex matches

This could be used in many ways! Let's say you want to know how many 404s nginx is returning per second, or perhaps send raw error.log messages. The log string lines would be nice in Grafana when the table plugin is added.
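A minimal sketch of that idea, using the hpcloud/tail library linked above (the file path and the 404 pattern are just placeholders):

package main

import (
    "fmt"
    "regexp"

    "github.com/hpcloud/tail"
)

func main() {
    re := regexp.MustCompile(`" 404 `) // nginx access-log 404 responses
    t, err := tail.TailFile("/var/log/nginx/access.log", tail.Config{Follow: true})
    if err != nil {
        panic(err)
    }
    count := 0
    for line := range t.Lines {
        if re.MatchString(line.Text) {
            count++
            fmt.Println("404s so far:", count) // a real plugin would report this per interval
        }
    }
}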

@skynet

skynet commented Oct 11, 2015

Where do we start?

@sparrc
Contributor Author

sparrc commented Oct 11, 2015

The tail code looks interesting, but it may even be overkill for this situation. A Telegraf plugin being able to handle a constant stream of messages is something I've implemented in the statsd plugin, which has a PR open now (#237). So it's possible, but for this situation I think we might be able to just cache the position in the file and start reading from that position on the next call to Gather() (see the sketch after the config below).

There is also a plugin in a PR that does exactly what @steverweber described (counting status codes in a webserver log), but I probably won't be merging it because it's very specific to that use case and the author has not written unit tests for it, see #176.

More ideally, I think this plugin should cover the general use case, where a user can input any regex that will be counted when matched (or output as a string, as @steverweber suggested). I'm thinking configuration would look something like this:

[logstreamer]
    [[logfile]]
    measurement = "bazbars"
    file = "/var/log/foo.log"
    regex = ".*bar.*|.*baz.*"
    # Type of output. Can be "string" or "counter"
    type = "counter"

    [[logfile]]
    measurement = "webserver_404"
    regex = ".*404.*"
    [...]
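A rough sketch of that position-caching idea (the LogFile struct, its method, and the paths are hypothetical, not Telegraf's actual plugin API):

package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
    "regexp"
)

type LogFile struct {
    Path   string
    Regex  *regexp.Regexp
    offset int64 // where the previous Gather() stopped reading
}

// gather counts regex matches in lines appended since the last call.
func (l *LogFile) gather() (int64, error) {
    f, err := os.Open(l.Path)
    if err != nil {
        return 0, err
    }
    defer f.Close()

    // Resume from the cached offset.
    if _, err := f.Seek(l.offset, io.SeekStart); err != nil {
        return 0, err
    }

    var count int64
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        if l.Regex.MatchString(scanner.Text()) {
            count++
        }
    }
    // Cache the new position for the next interval.
    if pos, err := f.Seek(0, io.SeekCurrent); err == nil {
        l.offset = pos
    }
    return count, scanner.Err()
}

func main() {
    lf := &LogFile{Path: "/var/log/foo.log", Regex: regexp.MustCompile(`bar|baz`)}
    n, err := lf.gather() // each call only sees lines added since the last one
    fmt.Println(n, err)
}

Truncation or rotation would invalidate the cached offset, which is the main sharp edge of this approach (discussed further below).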

@timgriffiths

+1

@skynet

skynet commented Oct 11, 2015

👍

@steverweber

Keep in mind that the logstreamer should recover if a file is:

  • deleted and recreated
  • truncated
  • partway through a line write

Perhaps make it so multiple logstreamers are not needed for each metric; we only want to read each log file once (see the sketch after the config below).

[streamer]

    [[file]]
    name = "/var/log/nginx/accept.log"
    delimiter = '\n' # default: '\n'

        [[[measurement]]]
        name = "nginx_requests"
        type = "counter" # counter(default)

        [[[measurement]]]
        name = "nginx_404"
        regex = ".*404.*"


    [[file]]
    name = "/var/log/nginx/error.log"

        [[[measurement]]]
        name = "nginx_errors"

        [[[measurement]]]
        name = "nginx_error_msg"
        regex = "<ignore timestamp> (<msg>.*)"
        type = "string"

@steverweber

Perhaps file could even be a network stream... this could open up support for syslog:
file = "udp://127.0.0.1:4880"

@steverweber

Some of the code in heka might be helpful for UDP input:
https://github.com/mozilla-services/heka/blob/dev/plugins/udp/udp_input.go

FYI: I feel telegraf's objectives would be further along if it forked or contributed to https://github.com/mozilla-services/heka - http://hekad.readthedocs.org/en/v0.10.0b1/
Options are good though :)

@ekini
Contributor

ekini commented Oct 13, 2015

How about (sample config):

[logstreamer]
dirs = ["/tmp/logs"]
    [[logstreamer.group]]
    mask = "^.*log$"
    rules = ['\s\[(?P<date>\d{1,2}/\w*/\d+:\d+:\d+:\d+ [+-]?\d+)\]\s.*?"\s(?P<code>\d{3})\s(?P<size_value>\d+)']
    name = "nginx"
    date_format = "02/Jan/2006:15:04:05 -0700"

The plugin recursively walks the specified directories and looks for all files that match the "mask".
Then it starts tailing them.

There are rules to parse and extract data, using regex named groups.
The name "date" is special: it requires date_format (for golang's time.Parse) so it can be parsed and translated into the metric's timestamp.
Names that end with _value are metrics. The rest are tags.
So, for example, after parsing an nginx log with the rules above we get:

time                            code  dc         group  host      size
2015-10-10T08:22:09.169981459Z  200   us-east-1  nginx  c7.local  753832
2015-10-10T08:24:19.17656864Z   200   us-east-1  nginx  c7.local  753832
2015-10-10T08:28:59.828478721Z  200   us-east-1  nginx  c7.local  753832
2015-10-10T08:39:40.812079491Z  200   us-east-1  nginx  c7.local  753832
2015-10-10T08:42:14.991151971Z  200   us-east-1  nginx  c7.local  753832
2015-10-10T08:46:19.562880205Z  200   us-east-1  nginx  c7.local  753832
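A small sketch of how those rules could work in Go: named regex groups become tags, names ending in _value become fields, and "date" is parsed with date_format (the sample line and variable names are illustrative):

package main

import (
    "fmt"
    "regexp"
    "strings"
    "time"
)

func main() {
    rule := regexp.MustCompile(`\s\[(?P<date>[^\]]+)\]\s.*?"\s(?P<code>\d{3})\s(?P<size_value>\d+)`)
    line := `1.2.3.4 - - [10/Oct/2015:08:22:09 +0000] "GET / HTTP/1.1" 200 753832`

    match := rule.FindStringSubmatch(line)
    if match == nil {
        return
    }
    tags := map[string]string{}
    fields := map[string]string{}
    var ts time.Time

    for i, name := range rule.SubexpNames() {
        if i == 0 || name == "" {
            continue
        }
        switch {
        case name == "date": // special: parsed with date_format
            ts, _ = time.Parse("02/Jan/2006:15:04:05 -0700", match[i])
        case strings.HasSuffix(name, "_value"): // metrics (fields)
            fields[strings.TrimSuffix(name, "_value")] = match[i]
        default: // everything else becomes a tag
            tags[name] = match[i]
        }
    }
    fmt.Println(ts, tags, fields)
}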

@steverweber

I like the idea of reading the datetime from the log, however I think it should be optional. Keep in mind that some time offsetting should be included to maintain the order of the log messages if not using the actual timestamps from the log.

I also like the idea of including a tag or field name in the regex/rule.

@sparrc
Contributor Author

sparrc commented Oct 13, 2015

@ekini I'd like it if there were an option to specify a plain filename in addition to the "mask"

@sparrc
Contributor Author

sparrc commented Oct 13, 2015

Also, +1 to date parsing being optional; some people are only going to care about a count within an interval, not a point for every single instance of a regex match.

So you should support that as well, as in my original example above.

@ekini
Contributor

ekini commented Oct 13, 2015

Of course, date is optional, as is date_format; timestamps will then be time.Now().
And yes, maybe walking through directories is overkill.

There is one more concern: if you want to cache the position in a file and parse it to the end at each Gather, what happens if the file is big? Also, what happens if telegraf gets restarted?

My test code constantly reads the files and sends parsed content to a buffered channel; each call to Gather then gets as much as possible from the channel within a specified timeout interval.
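A rough sketch of that reader-plus-buffered-channel shape (Metric, the channel size, and Gather's signature here are illustrative, not the actual code):

package main

import (
    "fmt"
    "time"
)

type Metric struct {
    Name  string
    Value string
}

// A reader goroutine tails files, parses lines, and sends results here.
var parsed = make(chan Metric, 10000)

// Gather drains whatever has accumulated, giving up at the deadline.
func Gather(timeout time.Duration) []Metric {
    var out []Metric
    deadline := time.After(timeout)
    for {
        select {
        case m := <-parsed:
            out = append(out, m)
        case <-deadline:
            return out
        }
    }
}

func main() {
    parsed <- Metric{Name: "nginx_404", Value: "1"}
    fmt.Println(Gather(100 * time.Millisecond))
}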

@steverweber

what happens if file is big?

Tailing/seeking to the end of a file is often not a problem when it's big...
Perhaps you are referring to many writes within the timespan of a Gather().
There should be some limit... perhaps 1 MB for a string buffer. The tail code I linked above uses a "leaky bucket".

Also, what happens if telegraf gets restarted?

It gets restarted and jumps to the end of the file... We don't care if we lose some data in between. Keeping state data is kinda overkill.

@sparrc
Contributor Author

sparrc commented Oct 13, 2015

There is still a question of what to do if the file is truncated. One option would be to make a ServicePlugin that has the tail code that @steverweber linked running in the background.

This probably wouldn't be possible until I merge the statsd code

@steverweber

the https://github.com/hpcloud/tail code seems to handle this well.
https://github.com/hpcloud/tail/blob/master/cmd/gotail/gotail.go

t, err := tail.TailFile("/var/log/nginx.log", tail.Config{
    Follow: true, // keep reading as the file grows (like tail -f)
    ReOpen: true, // reopen after rotation/recreation (like tail -F)
    Poll:   true})
if err != nil {
    panic(err)
}
for line := range t.Lines {
    fmt.Println(line.Text)
}

Config.ReOpen is analogous to tail -F (capital F):

-F      The -F option implies the -f option, but tail will also check to see if the file being followed has been
         renamed or rotated.  The file is closed and reopened when tail detects that the filename being read from
         has a new inode number.  The -F option is ignored if reading from standard input rather than a file.

ref: http://stackoverflow.com/questions/10135738/reading-log-files-as-theyre-updated-in-go

@sparrc
Contributor Author

sparrc commented Oct 28, 2015

@ekini you mentioned you had some working code for this a couple weeks ago, do you happen to have anything I can take a look at? I'm interested in getting something working for this

@ekini
Contributor

ekini commented Oct 28, 2015

@sparrc yes, I've got something working at ekini/telegraf@04f4b72
It's based on the hpcloud/tail library mentioned above.
It works, but there are plenty of sharp edges.

@steverweber

A little trick I've been toying with:

cat > /cron_mon_log <<'EOFXX'
#!/bin/bash
# Follow syslog and POST each new line to InfluxDB as a string field.
tail -F -n0 /var/log/syslog | while read line; do
    curl -X POST 'http://mon-dev-1.private.xxxx.ca:8086/write?db=db' --data-binary "log_mon,hostname=$(hostname) value=\"$line\""
done
EOFXX
chmod +x /cron_mon_log

echo '@reboot  root  /cron_mon_log' >> /etc/crontab

Might need work, but I thought it worth sharing.

@tux-00

tux-00 commented Feb 1, 2016

Maybe simpler with rsyslog?

rsyslog.conf:
*.* @127.0.0.1:1514

And listen on port 1514, for example.

@ruudboon

ruudboon commented Feb 6, 2016

Would be great if this could make it into telegraf. 👍

@skynet

skynet commented Feb 7, 2016

👍

@sparrc
Contributor Author

sparrc commented Feb 19, 2016

This will most likely start as a telegraf tail plugin that will accept the currently-available data input formats.

I recently came across this log analyzer project, which looks like it has a pretty solid format for creating templates and parsing arbitrary logfile formats: https://github.com/trustpath/sequence

Right now it's discontinued, but influxdata could probably fork and take over that project if it turns out to be useful.

sparrc added commits that referenced this issue between Jun 7 and Jun 21, 2016
chebrolus pushed a commit to chebrolus/telegraf that referenced this issue Jun 24, 2016