
Logparser plugin doesn't process new lines after telegraf configuration reload #3573

Closed
Isonami opened this issue Dec 12, 2017 · 3 comments · Fixed by #6074 or #6089
Labels: area/tail, bug
Milestone: 1.11.2


Isonami commented Dec 12, 2017

Bug report

A freshly started Telegraf works fine, but after a configuration reload via SIGHUP, new lines in the log are no longer parsed or sent to the output plugin. Other metrics are not affected.

Relevant telegraf.conf:

/etc/telegraf/telegraf.conf

[global_tags]

[agent]
  hostname = "hostname"
  omit_hostname = false
  interval = "30s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "30s"
  flush_jitter = "0s"
  logfile = ""
  debug = false
  quiet = false

#
# OUTPUTS:
#
[[outputs.influxdb]]
  database = "telegraf"
  insecure_skip_verify = true
  urls = ["https://influx.hostname"]

#
# INPUTS:
#
[[inputs.cpu]]
  percpu = true
  totalcpu = true
[[inputs.disk]]
[[inputs.diskio]]
[[inputs.mem]]
[[inputs.net]]
[[inputs.swap]]
[[inputs.system]]

/etc/telegraf/telegraf.d/logparser.conf

[[inputs.logparser]]
  files = ["/var/log/nginx/web-*-access.log"]
  from_beginning = false
[inputs.logparser.grok]
  measurement = "nginx_access_log"
  patterns = ["\\[[0-9a-z-]+\\] %{CLIENT:client_ip} - %{NOTSPACE} \\[%{HTTPDATE:timestamp:ts-httpd}\\] \"(?:%{WORD:verb:tag} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version:float})?|%{DATA})\" %{NUMBER:resp_code:tag} (?:%{NUMBER:resp_bytes:int}|-) \"%{NOTSPACE:referer}\" \"[^\"]+\" \"(?:%{CLIENT:forwarded_for}|-)\" \"(?:%{CLIENT:real_ip}|-)\" %{NOTSPACE} rt=%{NUMBER:request_time:float} uht=(?:%{NUMBER:upstream_header_time:float}|-) urt=(?:%{NUMBER:upstream_response_time:float}|-)"]
  timezone = ""

System info:

CentOS 7.2
Telegraf 1.4.3, 1.4.5

Steps to reproduce:

  1. Start Telegraf.
  2. Write a new line to the log file: echo '[-] 192.168.1.1 - - [12/Dec/2017:14:00:47 +0300] "GET /app HTTP/1.1" 501 178 "-" "axios/0.16.2" "-" "127.0.0.1" (.) rt=1.001 uht=1.001 urt=1.001' >> /var/log/nginx/web-api-access.log
  3. Wait 30 seconds and query the last value from InfluxDB with select last("request_time") from "nginx_access_log" limit 1; it should be 1.001.
  4. Reload Telegraf: kill -HUP $PID
  5. Write another line: echo '[-] 192.168.1.1 - - [12/Dec/2017:14:05:47 +0300] "GET /app HTTP/1.1" 502 178 "-" "axios/0.16.2" "-" "127.0.0.1" (.) rt=2.002 uht=2.002 urt=2.002' >> /var/log/nginx/web-api-access.log
  6. Wait 30 seconds and query again. The new value should be 2.002, but 1.001 is still returned.

Additional info:

Telegraf log output:

Dec 12 14:46:33 web2 systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
Dec 12 14:46:33 web2 systemd[1]: Starting The plugin-driven server agent for reporting metrics into InfluxDB...
Dec 12 14:46:33 web2 telegraf[24150]: 2017-12-12T11:46:33Z I! Starting Telegraf v1.4.5
Dec 12 14:46:33 web2 telegraf[24150]: 2017-12-12T11:46:33Z I! Loaded outputs: influxdb
Dec 12 14:46:33 web2 telegraf[24150]: 2017-12-12T11:46:33Z I! Loaded inputs: inputs.system inputs.cpu inputs.disk inputs.diskio inputs.mem inputs.net inputs.swap inputs.logparser
Dec 12 14:46:33 web2 telegraf[24150]: 2017-12-12T11:46:33Z I! Tags enabled: host=web2
Dec 12 14:46:33 web2 telegraf[24150]: 2017-12-12T11:46:33Z I! Agent Config: Interval:30s, Quiet:false, Hostname:"web2", Flush Interval:30s
Dec 12 14:51:41 web2 telegraf[24150]: 2017-12-12T11:51:41Z I! Reloading Telegraf config
Dec 12 14:51:41 web2 telegraf[24150]: 2017-12-12T11:51:41Z I! Hang on, flushing any cached metrics before shutdown
Dec 12 14:51:41 web2 systemd[1]: Reloaded The plugin-driven server agent for reporting metrics into InfluxDB.
Dec 12 14:51:41 web2 telegraf[24150]: 2017-12-12T11:51:41Z I! Starting Telegraf v1.4.5
Dec 12 14:51:41 web2 telegraf[24150]: 2017-12-12T11:51:41Z I! Loaded outputs: influxdb
Dec 12 14:51:41 web2 telegraf[24150]: 2017-12-12T11:51:41Z I! Loaded inputs: inputs.net inputs.swap inputs.system inputs.cpu inputs.disk inputs.diskio inputs.mem inputs.logparser
Dec 12 14:51:41 web2 telegraf[24150]: 2017-12-12T11:51:41Z I! Tags enabled: host=web2
Dec 12 14:51:41 web2 telegraf[24150]: 2017-12-12T11:51:41Z I! Agent Config: Interval:30s, Quiet:false, Hostname:"web2", Flush Interval:30s
danielnelson added the area/tail and bug labels on Dec 12, 2017

bolek2000 commented Apr 3, 2018

To work around bug #3906 I do a service telegraf reload in my preprocessing script (which itself works around bug #3613). This has the effect that my second logparser stops working.
I have one logparser that uses from_beginning = true and reads a preprocessed file. This one still works after the SIGHUP.
But the second logparser, which reads the live /var/log/nginx/access.log in conjunction with the basicstats aggregator, stops working after SIGHUP. Here I use from_beginning = false.
Both logparsers use inotify.

danielnelson (Contributor) commented

Closing based on #5347 (comment)

sgtsquiggs (Contributor) commented

This issue is still valid :(

I have found the cause, though. There is a problem in hpcloud/tail (influxdata/tail):

In the remove function in inotify_tracker.go, it is possible for shared.watchNums[fname] to go negative.

While debugging, I found that it decrements once because tomb.Tomb's Dying() triggers:

github.com/influxdata/telegraf/vendor/github.com/influxdata/tail/watch.remove at inotify_tracker.go:102
github.com/influxdata/telegraf/vendor/github.com/influxdata/tail/watch.RemoveWatch at inotify_tracker.go:87
github.com/influxdata/telegraf/vendor/github.com/influxdata/tail/watch.(*InotifyFileWatcher).ChangeEvents.func1 at inotify.go:94
runtime.goexit at asm_amd64.s:1337
 - Async stack trace
github.com/influxdata/telegraf/vendor/github.com/influxdata/tail/watch.(*InotifyFileWatcher).ChangeEvents at inotify.go:77

and again because tail.Cleanup() is called:

github.com/influxdata/telegraf/vendor/github.com/influxdata/tail/watch.remove at inotify_tracker.go:102
github.com/influxdata/telegraf/vendor/github.com/influxdata/tail/watch.RemoveWatch at inotify_tracker.go:87
github.com/influxdata/telegraf/vendor/github.com/influxdata/tail/watch.Cleanup at inotify_tracker.go:129
github.com/influxdata/telegraf/vendor/github.com/influxdata/tail.(*Tail).Cleanup at tail.go:453
github.com/influxdata/telegraf/plugins/inputs/tail.(*Tail).Stop at tail.go:219
github.com/influxdata/telegraf/agent.(*Agent).stopServiceInputs at agent.go:651
github.com/influxdata/telegraf/agent.(*Agent).Run.func1 at agent.go:75
runtime.goexit at asm_amd64.s:1337
 - Async stack trace
github.com/influxdata/telegraf/agent.(*Agent).Run at agent.go:66
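
To make the failure mode concrete, here is a minimal Go sketch of the reference-counting pattern described above. The type and helper names (tracker, add, remove) are invented for illustration and only mirror the shared.watchNums bookkeeping visible in the stack traces; this is not the actual influxdata/tail code.

package main

import (
	"fmt"
	"sync"
)

// tracker stands in for the package-level inotify tracker state:
// one counter per watched file path.
type tracker struct {
	mu        sync.Mutex
	watchNums map[string]int
}

// add registers interest in fname and (in the real library) creates the
// inotify watch the first time the counter goes from 0 to 1.
func (t *tracker) add(fname string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.watchNums[fname]++
	if t.watchNums[fname] == 1 {
		fmt.Println("inotify watch created for", fname)
	}
}

// remove decrements the counter and tears the watch down when it reaches 0.
// Nothing guards against being called more often than add, so the counter
// can go negative, which is the condition reported in this issue.
func (t *tracker) remove(fname string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.watchNums[fname]--
	if t.watchNums[fname] == 0 {
		fmt.Println("inotify watch removed for", fname)
	}
}

func main() {
	t := &tracker{watchNums: map[string]int{}}

	// Initial start: the logparser/tail input adds a watch.
	t.add("/var/log/nginx/web-api-access.log")

	// SIGHUP reload: the watcher goroutine exits (the Dying() path above) ...
	t.remove("/var/log/nginx/web-api-access.log")
	// ... and the plugin's Stop() also calls tail.Cleanup(), removing again.
	t.remove("/var/log/nginx/web-api-access.log")

	// After the reload the plugin adds the watch again, but the counter only
	// climbs back to 0, so the 0 -> 1 transition never fires and new lines
	// are never seen.
	t.add("/var/log/nginx/web-api-access.log")
	fmt.Println("counter after reload:", t.watchNums["/var/log/nginx/web-api-access.log"])
}

With two removals per single add, the counter ends at 0 instead of 1 after the reload, so the watch is never re-created. The extra Cleanup() call is what the fix referenced below removes.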

danielnelson reopened this on Jul 3, 2019
sgtsquiggs added a commit to sgtsquiggs/telegraf that referenced this issue Jul 5, 2019
* Fixes influxdata#3522 - Not able to read rotated log file without missing lines
* Fixes influxdata#3573 - logparser/tail plugin stop after reload (removes unnecessary tailer.Cleanup() calls)
danielnelson added this to the 1.11.2 milestone on Jul 9, 2019