
grok_exporter v 0.2.6 resident memory rising #55

Closed · raokrutarth opened this issue Mar 14, 2019 · 22 comments

raokrutarth commented Mar 14, 2019

I have a fairly frequently updated log file, and I am using grok_exporter to count specific log entries in it. I see that the exporter's resident memory usage is quite high and growing. This happens about 3/4 of the time in my environment.

Graph:
[screenshot]

The config is:

global:
  config_version: 2
grok:
  additional_patterns:
  - BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
  - DATA .*?
  - NUMBER (?:%{BASE10NUM})
  - DAEMON [a-zA-Z_\-]+
  - NAELOGLEVEL LOG_[\S+]
  - MESSAGE [\S\s\S]?*
  - YEAR (?>\d\d){1,2}
  - HOUR (?:2[0123]|[01]?[0-9])
  - MINUTE (?:[0-5][0-9])
  - SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
  - TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
  - MONTHNUM (?:0?[1-9]|1[0-2])
  - MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
  - ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
  - TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
  - LEVEL (LOG_)?([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|DBG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr|ERR?(?:OR)?|[Cc](rit)+(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
input:
  path: /var/log/messages
  poll_interval_seconds: 5
  type: file
metrics:
- help: all log entries separated by daemon
  labels:
    daemon: '{{.daemon}}'
  match: '%{TIMESTAMP_ISO8601:time} %{DAEMON:daemon}.*%{LEVEL:level}.*'
  name: benchmark_all_daemon_log_entry_counts
  type: counter
- help: count of panic entries in the log by go daemons
  labels:
    daemon: '{{.daemon}}'
  match: '%{TIMESTAMP_ISO8601:time} %{DAEMON:daemon}.*panic.*'
  name: benchmark_go_daemon_log_panic_counts
  type: counter

[other confidential counters]
fstab (Owner) commented Mar 19, 2019

This is a bug. Will investigate where this comes from. Thanks for reporting.

fstab added a commit that referenced this issue Mar 25, 2019
fstab (Owner) commented Mar 25, 2019

Found a memory leak in gsub. Pushed a fix and will make a bugfix release soon. Thanks for reporting!
Are you actually using gsub (in your [other confidential counters] config) or is there another memory leak I am not aware of?
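
For context, gsub here is grok_exporter's label template function. A typical use in a metric's labels section looks roughly like the sketch below; the field name and pattern are made up, and the argument order (value, regex, replacement) is assumed from memory of the docs, so treat it only as an illustration of "using gsub":

labels:
  # hypothetical example: normalise dashes in the captured daemon name
  daemon: '{{gsub .daemon "-" "_"}}'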

raokrutarth (Author) commented:

I'm using summary and gauge metrics. Do those call gsub in any way?

fstab (Owner) commented Apr 8, 2019

I released bugfix version 0.2.7. Could you check if this solves the issue?

raokrutarth (Author) commented:

I used the 0.2.7 release binary in my environment, but I am still seeing the leak. Currently, a bash script restarts the exporter every 1.5 hours.

[screenshot]

fstab (Owner) commented Apr 11, 2019

There's a buffer in grok_exporter where new log lines are cached before they are processed by the matchers. The load of this buffer can be monitored with the grok_exporter_line_buffer_peak_load metric. Could you monitor this metric and check if this is constantly growing? This would help me figure out where the memory leak happens.
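
For example, one way to keep an eye on that metric in Prometheus could be queries along these lines (a sketch only; if your grok_exporter version exposes the metric as a summary rather than a plain gauge, add a quantile selector such as {quantile="0.9"}):

# highest buffer load seen over the last hour
max_over_time(grok_exporter_line_buffer_peak_load[1h])

# a persistently positive slope suggests the buffer keeps growing
deriv(grok_exporter_line_buffer_peak_load[30m]) > 0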

raokrutarth (Author) commented:

[screenshot]

[screenshot]

raokrutarth (Author) commented:

I also noticed a different trend over a longer run (~2.5 days): the peaks are not as large later on.

[screenshot]

fstab (Owner) commented Apr 16, 2019

Thanks for the data. If I see the colors correctly, grok_exporter_line_buffer_peak_load is constantly increasing. That means log lines are written faster than grok_exporter processes them. The buffer of unprocessed log lines keeps growing, and grok_exporter eventually runs out of memory.

One thing you could do is figure out why processing takes so long. Sometimes a single regular expression takes most of the processing time. The metric grok_exporter_lines_processing_time_microseconds_total gives you the total processing time in microseconds per metric. With a Prometheus query like rate(grok_exporter_lines_processing_time_microseconds_total[5m]) you can evaluate that over the last 5 minutes.

Could you run that query to see if there is any metric that takes significantly longer than the others? Maybe we can optimize the regular expression for that metric to make processing faster.
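
For example, a query along these lines should surface the most expensive matchers (a sketch; topk simply ranks the per-metric series, so no particular label name is assumed):

# top 3 metrics by per-second processing time over the last 5 minutes
topk(3, rate(grok_exporter_lines_processing_time_microseconds_total[5m]))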

raokrutarth (Author) commented Apr 16, 2019

[screenshot]

I can take out all_daemon_log_entry_counts, but benchmark_ovsdb_logs is needed.

The metric is:

- help: ovsdb-server log entries
  labels:
    level: '{{.level}}'
  match: '%{TIMESTAMP_ISO8601:time} ovsdb-server(\[%{NUMBER}\])?: .*%{LEVEL:level}'
  name: benchmark_ovsdb_logs
  type: counter

The frequency of this log is quite high; any chance this can be optimized?

fstab (Owner) commented Apr 17, 2019

It doesn't look too bad. Some things you can try (a combined sketch follows the list):

  • As far as I can see you only use level as a label, so you could replace %{TIMESTAMP_ISO8601:time} with %{TIMESTAMP_ISO8601}, or even leave it out entirely.
  • (\[%{NUMBER}\])?: is optional and you don't use it, so you could remove it.
  • It might be faster to replace .* with .*?, depending on whether the level tends to appear closer to the beginning or the end of the line.
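
Combining those suggestions, the metric could look roughly like this (a sketch only, not a drop-in replacement; keep %{TIMESTAMP_ISO8601} at the front if you want the match tied to a timestamped line):

- help: ovsdb-server log entries
  labels:
    level: '{{.level}}'
  match: 'ovsdb-server.*?%{LEVEL:level}'
  name: benchmark_ovsdb_logs
  type: counter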

raokrutarth (Author) commented:

An optional flag like --max-rss=1G for the exporter could help. Once the cap is reached, it would drop older buffered lines to make room for new ones. Just a suggestion.

fstab added a commit that referenced this issue Apr 18, 2019
fstab (Owner) commented Apr 18, 2019

I didn't add a buffer limit so far because dropping log lines makes grok_exporter unreliable: you might miss fatal log messages.

On the other hand, if you have to restart grok_exporter you also might miss messages. So an optional flag is probably the lesser evil. I will look into it.

In the meantime I added a log message: if log lines are written faster than they are processed, grok_exporter will print a warning to the console.

raokrutarth (Author) commented Apr 19, 2019

Thanks for taking a look. I am not really watching grok_exporter's own log messages, because my environment can't take any action regardless of what they say. I also noticed that the memory spikes are growing faster. I can't verify whether the number of log messages has changed, but I thought the data might be useful.

[screenshot]

fstab (Owner) commented Apr 28, 2019

I started implementing a limit on the number of lines in the line buffer. Specifying a number of lines is not as convenient as specifying a maximum amount of memory, but it's easier to implement and better than running out of memory. I will comment here when it's done.

fstab (Owner) commented May 1, 2019

I implemented a config option for the input section in the configuration file:

input:
    type: file
    path: /var/log/messages
    max_lines_in_buffer: 100000

The example above would drop the buffered lines once the buffer reaches 100000 log lines. This is not ideal, because you lose 100000 log lines in monitoring, but it's better than running out of memory and crashing.

You need to build grok_exporter from source code to get this feature. It will be part of the next release.

fstab added a commit that referenced this issue May 1, 2019
raokrutarth (Author) commented:

Thanks!

Does the release script still need go version < 1.12?

If so, I'll have to wait for the release, since I'm using Go 1.12 and haven't figured out how to use two versions of Go simultaneously in my environment.

fstab (Owner) commented May 3, 2019

The release script works with go 1.12.

raokrutarth (Author) commented:

I can confirm the issue is resolved with the new fixes. The graph shows usage after the binary was swapped at 19:00.

[screenshot]

raokrutarth (Author) commented:

I started to see that a few counters are not getting incremented, even though tail -f on the log file shows these messages arriving at a fairly fast rate. I'm using https://grokconstructor.appspot.com/do/match to test my filter, and it shows the whole line matching.

fstab (Owner) commented May 10, 2019

@raokrutarth could you provide an example of a log line and the counter? I can have a look at why it's not incremented.

raokrutarth (Author) commented May 11, 2019

Re-posted the comment on #58. The memory issues are resolved with 0.2.8 and max_lines_in_buffer in the config.
