Create a "logstreamer" plugin #102

Closed
sparrc opened this issue Aug 9, 2015 · 27 comments · Fixed by #1320

@sparrc
Contributor

sparrc commented Aug 9, 2015

Inspired by issue #48, create a plugin for aggregating and pushing data from log files, allowing user-defined regex filters.

This would behave in a similar manner to heka's logstreamer plugin: https://hekad.readthedocs.org/en/v0.9.2/pluginconfig/logstreamer.html#logstreamerplugin

/cc @steverweber

@skynet

skynet commented Sep 26, 2015

👍

@steverweber

Perhaps something simpler: tail a file,
with some code like https://github.com/hpcloud/tail,
and add some processing options like:

  • count matches of a regex
  • send raw text that a regex matches

This could be used in many ways! Let's say you want to know how many 404s nginx is returning per second, or perhaps send raw error.log messages. The log string lines would be nice in Grafana when the table plugin is added.
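A minimal sketch of that idea, using the hpcloud/tail library linked above (the file path and the 404 pattern are just placeholders):

package main

import (
    "fmt"
    "regexp"

    "github.com/hpcloud/tail"
)

func main() {
    re := regexp.MustCompile(`" 404 `) // nginx access-log 404 responses
    t, err := tail.TailFile("/var/log/nginx/access.log", tail.Config{Follow: true})
    if err != nil {
        panic(err)
    }
    count := 0
    for line := range t.Lines {
        if re.MatchString(line.Text) {
            count++
            fmt.Println("404s so far:", count) // a real plugin would report this per interval
        }
    }
}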

@skynet

skynet commented Oct 11, 2015

Where do we start?

@sparrc
Contributor Author

sparrc commented Oct 11, 2015

The tail code looks interesting, but it may even be overkill for this situation. A Telegraf plugin being able to handle a constant stream of messages is something I've implemented in the statsd plugin, which has a PR open now (#237). So it's possible, but for this situation I think we might be able to just cache the position in the file and start reading from that position on the next call to Gather() (see the sketch after the config below).

There is also a plugin in a PR that does exactly what @steverweber described (counting status codes in a webserver log), but I probably won't be merging it because it's very specific to that use case and the author has not written unit tests for it, see #176.

More ideally, I think this plugin should cover the general use case, where a user can input any regex that will be counted when matched (or output as a string, as @steverweber suggested). I'm thinking configuration would look something like this:

[logstreamer]
    [[logfile]]
    measurement = "bazbars"
    file = "/var/log/foo.log"
    regex = ".*bar.*|.*baz.*"
    # Type of output. Can be "string" or "counter"
    type = "counter"

    [[logfile]]
    measurement = "webserver_404"
    regex = ".*404.*"
    [...]
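A rough sketch of that position-caching idea (the LogFile struct, its method, and the paths are hypothetical, not Telegraf's actual plugin API):

package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
    "regexp"
)

type LogFile struct {
    Path   string
    Regex  *regexp.Regexp
    offset int64 // where the previous Gather() stopped reading
}

// gather counts regex matches in lines appended since the last call.
func (l *LogFile) gather() (int64, error) {
    f, err := os.Open(l.Path)
    if err != nil {
        return 0, err
    }
    defer f.Close()

    // Resume from the cached offset.
    if _, err := f.Seek(l.offset, io.SeekStart); err != nil {
        return 0, err
    }

    var count int64
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        if l.Regex.MatchString(scanner.Text()) {
            count++
        }
    }
    // Cache the new position for the next interval.
    if pos, err := f.Seek(0, io.SeekCurrent); err == nil {
        l.offset = pos
    }
    return count, scanner.Err()
}

func main() {
    lf := &LogFile{Path: "/var/log/foo.log", Regex: regexp.MustCompile(`bar|baz`)}
    n, err := lf.gather() // each call only sees lines added since the last one
    fmt.Println(n, err)
}

Truncation or rotation would invalidate the cached offset, which is the main sharp edge of this approach (discussed further below).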

@timgriffiths

+1

@skynet

skynet commented Oct 11, 2015

👍

@steverweber

Keep in mind that the logstreamer should recover if a file is:

  • deleted and recreated
  • truncated
  • partway through a line write

Perhaps make it so multiple logstreamers are not needed for each metric; we only want to read each log file once (see the sketch after the config below).

[streamer]

    [[file]]
    name = "/var/log/nginx/accept.log"
    delimiter = '\n' # default: '\n'

        [[[measurement]]]
        name = "nginx_requests"
        type = "counter" # counter(default)

        [[[measurement]]]
        name = "nginx_404"
        regex = ".*404.*"


    [[file]]
    name = "/var/log/nginx/error.log"

        [[[measurement]]]
        name = "nginx_errors"

        [[[measurement]]]
        name = "nginx_error_msg"
        regex = "<ignore timestamp> (<msg>.*)"
        type = "string"

@steverweber

Perhaps file could even be a network stream... this could open up support for syslog:
file = "udp://127.0.0.1:4880"

@steverweber

Some of the code in heka might be helpful for UDP input:
https://github.com/mozilla-services/heka/blob/dev/plugins/udp/udp_input.go

FYI: I feel telegraf's objectives would be further along if it forked or contributed to https://github.com/mozilla-services/heka - http://hekad.readthedocs.org/en/v0.10.0b1/
Options are good though :)

@ekini
Contributor

ekini commented Oct 13, 2015

How about (sample config):

[logstreamer]
dirs = ["/tmp/logs"]
    [[logstreamer.group]]
    mask = "^.*log$"
    rules = ['\s\[(?P<date>\d{1,2}/\w*/\d+:\d+:\d+:\d+ [+-]?\d+)\]\s.*?"\s(?P<code>\d{3})\s(?P<size_value>\d+)']
    name = "nginx"
    date_format = "02/Jan/2006:15:04:05 -0700"

The plugin recursively walks the specified directories and looks for all files that match the "mask".
Then it starts tailing them.

There are rules to parse and extract data, using regex named groups.
The name "date" is special: it requires date_format (for golang's time.Parse) so it can be parsed and translated into the metric's timestamp.
Names that end with _value are metrics. The rest are tags.
So, for example, after parsing an nginx log with the rules above we get:

time                            code  dc         group  host      size
2015-10-10T08:22:09.169981459Z  200   us-east-1  nginx  c7.local  753832
2015-10-10T08:24:19.17656864Z   200   us-east-1  nginx  c7.local  753832
2015-10-10T08:28:59.828478721Z  200   us-east-1  nginx  c7.local  753832
2015-10-10T08:39:40.812079491Z  200   us-east-1  nginx  c7.local  753832
2015-10-10T08:42:14.991151971Z  200   us-east-1  nginx  c7.local  753832
2015-10-10T08:46:19.562880205Z  200   us-east-1  nginx  c7.local  753832
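A small sketch of how those rules could work in Go: named regex groups become tags, names ending in _value become fields, and "date" is parsed with date_format (the sample line and variable names are illustrative):

package main

import (
    "fmt"
    "regexp"
    "strings"
    "time"
)

func main() {
    rule := regexp.MustCompile(`\s\[(?P<date>[^\]]+)\]\s.*?"\s(?P<code>\d{3})\s(?P<size_value>\d+)`)
    line := `1.2.3.4 - - [10/Oct/2015:08:22:09 +0000] "GET / HTTP/1.1" 200 753832`

    match := rule.FindStringSubmatch(line)
    if match == nil {
        return
    }
    tags := map[string]string{}
    fields := map[string]string{}
    var ts time.Time

    for i, name := range rule.SubexpNames() {
        if i == 0 || name == "" {
            continue
        }
        switch {
        case name == "date": // special: parsed with date_format
            ts, _ = time.Parse("02/Jan/2006:15:04:05 -0700", match[i])
        case strings.HasSuffix(name, "_value"): // metrics (fields)
            fields[strings.TrimSuffix(name, "_value")] = match[i]
        default: // everything else becomes a tag
            tags[name] = match[i]
        }
    }
    fmt.Println(ts, tags, fields)
}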

@steverweber

I like the idea of reading the datetime from the log, however I think it should be optional. Keep in mind that some time offsetting should be included to maintain the order of the log messages if not using the actual timestamps from the log.

I also like the idea of including a tag or field name in the regex/rule.

@sparrc
Contributor Author

sparrc commented Oct 13, 2015

@ekini I'd like it if there were an option to specify a plain filename in addition to the "mask"

@sparrc
Contributor Author

sparrc commented Oct 13, 2015

Also, +1 to date parsing being optional; some people are only going to care about a count within an interval, not a point for every single instance of a regex match.

So you should support that as well, as in my original example above.

@ekini
Contributor

ekini commented Oct 13, 2015

Of course, date is optional, as is date_format; timestamps will then be time.Now().
And yes, maybe walking through directories is overkill.

There is one more concern: if you want to cache the position in a file and parse it to the end at each Gather, what happens if the file is big? Also, what happens if telegraf gets restarted?

My test code constantly reads the files and sends parsed content to a buffered channel; each call to Gather then gets as much as possible from the channel within a specified timeout interval.
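A rough sketch of that reader-plus-buffered-channel shape (Metric, the channel size, and Gather's signature here are illustrative, not the actual code):

package main

import (
    "fmt"
    "time"
)

type Metric struct {
    Name  string
    Value string
}

// A reader goroutine tails files, parses lines, and sends results here.
var parsed = make(chan Metric, 10000)

// Gather drains whatever has accumulated, giving up at the deadline.
func Gather(timeout time.Duration) []Metric {
    var out []Metric
    deadline := time.After(timeout)
    for {
        select {
        case m := <-parsed:
            out = append(out, m)
        case <-deadline:
            return out
        }
    }
}

func main() {
    parsed <- Metric{Name: "nginx_404", Value: "1"}
    fmt.Println(Gather(100 * time.Millisecond))
}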

@steverweber

what happens if file is big?

Tailing/seeking to the end of a file is often not a problem when it's big...
Perhaps you are referring to many writes within the timespan of a Gather().
There should be some limit... perhaps 1 MB for a string buffer. The tail code I linked above uses a "leaky bucket".

Also, what happens if telegraf gets restarted?

It gets restarted and jumps to the end of the file... We don't care if we lose some data in between. Keeping state data is kinda overkill.

@sparrc
Contributor Author

sparrc commented Oct 13, 2015

There is still a question of what to do if the file is truncated. One option would be to make a ServicePlugin that has the tail code that @steverweber linked running in the background.

This probably wouldn't be possible until I merge the statsd code

@steverweber

the https://github.com/hpcloud/tail code seems to handle this well.
https://github.com/hpcloud/tail/blob/master/cmd/gotail/gotail.go

t, err := tail.TailFile("/var/log/nginx.log", tail.Config{
    Follow: true, // keep reading as the file grows (like tail -f)
    ReOpen: true, // reopen after rotation/recreation (like tail -F)
    Poll:   true})
if err != nil {
    panic(err)
}
for line := range t.Lines {
    fmt.Println(line.Text)
}

Config.ReOpen is analogous to tail -F (capital F):

-F      The -F option implies the -f option, but tail will also check to see if the file being followed has been
         renamed or rotated.  The file is closed and reopened when tail detects that the filename being read from
         has a new inode number.  The -F option is ignored if reading from standard input rather than a file.

ref: http://stackoverflow.com/questions/10135738/reading-log-files-as-theyre-updated-in-go

@sparrc
Contributor Author

sparrc commented Oct 28, 2015

@ekini you mentioned you had some working code for this a couple weeks ago, do you happen to have anything I can take a look at? I'm interested in getting something working for this

@ekini
Contributor

ekini commented Oct 28, 2015

@sparrc yes, I've got something working at ekini/telegraf@04f4b72
It's based on the hpcloud/tail library mentioned above.
It works, but there are plenty of sharp edges.

@steverweber

A little trick I've been toying with:

cat > /cron_mon_log <<'EOFXX'
#!/bin/bash
# Follow syslog and POST each new line to InfluxDB as a string field.
tail -F -n0 /var/log/syslog | while read line; do
    curl -X POST 'http://mon-dev-1.private.xxxx.ca:8086/write?db=db' --data-binary "log_mon,hostname=$(hostname) value=\"$line\""
done
EOFXX
chmod +x /cron_mon_log

echo '@reboot  root  /cron_mon_log' >> /etc/crontab

Might need work, but I thought it worth sharing.

@tux-00

tux-00 commented Feb 1, 2016

Maybe simpler with rsyslog?

rsyslog.conf:
*.* @127.0.0.1:1514

And listen on port 1514, for example.

@ruudboon

ruudboon commented Feb 6, 2016

Would be great if this could make it into telegraf. 👍

@skynet

skynet commented Feb 7, 2016

👍

@sparrc
Contributor Author

sparrc commented Feb 19, 2016

This will most likely start as a telegraf tail plugin that will accept the currently-available data input formats.

I recently came across this log analyzer project, which looks like it has a pretty solid format for creating templates and parsing arbitrary logfile formats: https://github.com/trustpath/sequence

Right now it's discontinued, but influxdata could probably fork and take over that project if it turns out to be useful.

sparrc added commits that referenced this issue between Jun 7 and Jun 21, 2016
chebrolus pushed a commit to chebrolus/telegraf that referenced this issue Jun 24, 2016