Numbers are crazy in diskstats plugin after reboot #426

Closed
toofishes opened this issue Apr 7, 2015 · 5 comments

Comments

@toofishes
Contributor

Both the monitoring server and the node are running Arch Linux, x86_64, munin version 2.0.25. This is after a node restart.

The graphs below show pretty clearly that something isn't right. This hasn't always been the case in the 2.0.x series, but I have noticed it happening for at least a little while:

[Graphs: diskstats_utilization-day, diskstats_throughput-day, diskstats_iops-day]

It looks like the problem is limited to the diskstats plugin? I see issues in these (screenshots of all above):

  • diskstats_iops
  • diskstats_utilization
  • diskstats_throughput

Not showing it:

  • iostat
  • if_, if_err_
  • forks
  • irqstats
@ssm
Member

ssm commented Apr 7, 2015

"After a node restart", is "node" the machine running munin-node, or is it the "munin-node" program?

@toofishes
Contributor Author

Machine restart. Restarting only munin-node doesn't seem to cause any issues.
I do notice that the diskstats plugin seems to be the only thing keeping data in /var/lib/munin/plugin-state/nobody/ on the node machine. I'm not sure if that helps or is just a red flag.
Let me know if dumps or anything else from the RRD files themselves would be helpful, or the plugin config. I'm happy to provide whatever is needed.

@ssm
Member

ssm commented Apr 7, 2015

It looks like a consequence of how the plugin reports its data.

When the machine boots, the disk IO counter resets. The plugin then reports wrong numbers to the munin master. This is visible as large spikes in the graphs.

A fix for this plugin may be to:

  • Change the RRD Data Source Type from GAUGE to DERIVE
  • Report the numbers directly instead of calculating a delta from the last number stored in the state file
  • Remove the use of the state file from the plugin entirely, unless something else in there needs it.
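
To make the difference concrete, here is a minimal Python sketch (not the plugin's actual code, which is not shown in this thread). The counter values, the 300-second interval, and both function names are made up for illustration. It compares a delta computed against a stored previous value with a DERIVE-style rate that is treated as unknown when it falls below a minimum of 0, which is how RRD handles a DERIVE data source with `min 0`.

```python
from typing import Optional


def naive_delta_rate(prev: int, curr: int, interval: float) -> float:
    """Rate computed by the plugin itself from a value kept in a state file.

    Hypothetical helper: a counter reset between the two samples produces
    a bogus (here negative) rate that would be graphed as-is under GAUGE.
    """
    return (curr - prev) / interval


def derive_rate(prev: int, curr: int, interval: float,
                minimum: float = 0.0) -> Optional[float]:
    """DERIVE-style rate: a result below `minimum` becomes unknown (None)."""
    rate = (curr - prev) / interval
    return rate if rate >= minimum else None


# Assumed raw counter samples taken 300 s apart; the machine reboots
# before the last sample, so the counter starts again near zero.
samples = [1_000_000, 1_030_000, 1_060_000, 4_000]
interval = 300.0

for prev, curr in zip(samples, samples[1:]):
    print(prev, curr,
          naive_delta_rate(prev, curr, interval),
          derive_rate(prev, curr, interval))
```

With the raw counter reported directly and the data source set to DERIVE with a minimum of 0, the sample spanning the reboot shows up as a gap rather than a spike, and no state file is needed.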

@ssm ssm added this to the 2.0.26 milestone Apr 7, 2015
@ssm ssm added the [type] bug label Apr 7, 2015
@toofishes
Contributor Author

I'm not totally sure about this diagnosis, but commit 0d7505f, and possibly also 768894f, looks suspicious. I don't remember diskstats always doing this, and that commit lies between the 2.0.21 and 2.0.22 releases, which weren't all that long ago. It also roughly matches when the spikes start showing up on this yearly graph:

[Graph: sda-year]

Let me know what I can do to help. I'm able to dive in a bit and can definitely test patches, but I'm no munin internals expert.

@steveschnepp
Member

Loosely related, but I created issue munin-monitoring/munin-c#29 to rewrite that plugin.

Coupled with issue munin-monitoring/munin-c#30, it might be very interesting.

mittyorz added a commit to mittyorz/munin-munin that referenced this issue Apr 26, 2015
 * munin-monitoring#426

 * check /proc/uptime to detect system reboot
 * use uptime in seconds instead of interval if uptime < interval
 * reset all previous status values to zero on reboot
mittyorz added a commit to mittyorz/munin-munin that referenced this issue Apr 26, 2015
… reboot)

diskstats version 2.0.22 and later gives weird numbers for some entries after a system reboot

 - how I fixed it
  - check /proc/uptime to detect system reboot
  - use the uptime in seconds instead of the interval if uptime < interval
  - reset all previous status values to zero if uptime < interval

The fundamental solution should be as described in munin-monitoring#426 (comment),
but it might be a significant rewrite
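
The workaround described in this commit message can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the commit: the function names, the field name `rd_sectors`, and the assumption that "interval" means the seconds elapsed since the previous plugin run are all hypothetical. The only external fact relied on is that the first field of /proc/uptime is the system uptime in seconds.

```python
from typing import Dict, Tuple


def parse_uptime(text: str) -> float:
    """Return the uptime in seconds from the contents of /proc/uptime."""
    return float(text.split()[0])


def effective_window(uptime: float, interval: float,
                     prev: Dict[str, int]) -> Tuple[float, Dict[str, int]]:
    """Pick the time window and baseline counters for the rate calculation.

    If the machine has been up for less than the interval, it rebooted
    since the last run: use the uptime as the window and zero the
    previous values, because the kernel counters restarted from zero.
    """
    if uptime < interval:
        return uptime, {name: 0 for name in prev}
    return interval, prev


def rates(prev: Dict[str, int], curr: Dict[str, int],
          uptime: float, interval: float) -> Dict[str, float]:
    """Per-second rates for every counter in `curr`."""
    window, base = effective_window(uptime, interval, prev)
    if window <= 0:
        return {}  # nothing sensible to report yet
    return {name: (value - base.get(name, 0)) / window
            for name, value in curr.items()}


# Hypothetical values: the state file holds a pre-reboot counter,
# and the machine has been up for only 120 s of a 300 s interval.
previous = {"rd_sectors": 1_060_000}
current = {"rd_sectors": 4_000}
print(rates(previous, current, parse_uptime("120.50 200.25\n"), 300.0))
```

This keeps the state file and the plugin-side delta, so it is a narrower change than switching the data source to DERIVE, at the cost of special-casing the first run after boot.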
@sumpfralle sumpfralle removed this from the 2.0.26 milestone Mar 6, 2018