network.py does not handle interface resets properly #663
The data presented in /proc/net/dev for tx/rx bytes are counters, not rates.
The code at https://github.com/python-diamond/Diamond/blob/master/src/collectors/network/network.py#L116 tries to turn these counters into rates, but does so incorrectly.
The existing code handles overflows by subtracting 2^64. It does not take interface resets into account, which also produce negative deltas, but not with a step size of 2^64.
Consequently, whenever an interface resets, the data sent to Graphite is mangled and shows petabyte-sized data rate peaks. These peaks trigger alerts and distort graph scaling.
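To illustrate the failure mode, here is a minimal sketch. It is not the actual Diamond code: `naive_delta`, `MAX_COUNTER` and the sample counter values are made up for this example, and it only assumes that every negative delta is corrected by 2^64.

```python
MAX_COUNTER = 2 ** 64  # assumed width of the kernel counter


def naive_delta(prev, cur, max_value=MAX_COUNTER):
    """Hypothetical helper: treat every negative delta as a counter wrap."""
    delta = cur - prev
    if delta < 0:
        # Correct for an assumed wrap-around, whatever the real cause was.
        delta += max_value
    return delta


# A genuine wrap: the counter was 100 below the limit and is now at 50.
wrap_delta = naive_delta(2 ** 64 - 100, 50)

# An interface reset: the counter restarted near 0 after 5,000,000 bytes.
reset_delta = naive_delta(5_000_000, 1_200)

print(wrap_delta)   # 150, which is correct
print(reset_delta)  # roughly 1.8e19 bytes, which is the bogus peak
```

The wrap case comes out right, but the reset case is off by almost the full 2^64, which is where the huge spikes come from.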
It would be better either to report the raw counter value and use nonNegativeDerivative() in Graphite, or to duplicate that function's logic in Diamond, since it handles resets to 0 as well as overflows (which hardly ever happen with 64-bit counters).
That code first checks whether a maxValue is present (it can be None). If a downward step is observed, it produces None (i.e. the measurement is marked as invalid). That's better than logging petabytes.
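A rough sketch of what such logic could look like in the collector, modeled on the behavior described above rather than copied from Graphite. The function name `non_negative_delta` and its signature are assumptions for illustration. With no `max_value` given, any downward step is treated as a reset and reported as invalid.

```python
def non_negative_delta(prev, cur, max_value=None):
    """Hypothetical helper: delta between two counter readings, or None.

    None means "no valid measurement" (first sample, out-of-range value,
    or a downward step that cannot be explained as a wrap).
    """
    if prev is None or cur is None:
        return None  # nothing to compare against yet
    if max_value is not None and cur > max_value:
        return None  # reading is outside the declared counter range
    if cur >= prev:
        return cur - prev  # normal case: the counter increased
    if max_value is not None:
        # Downward step with a known limit: assume the counter wrapped.
        return max_value + 1 + cur - prev
    # Downward step without a known limit: treat it as a reset.
    return None


print(non_negative_delta(100, 250))                # 150
print(non_negative_delta(5_000_000, 1_200))        # None
print(non_negative_delta(250, 5, max_value=255))   # 11
```

Dropping one sample after a reset costs a single data point, while the wrap assumption costs a spike that dominates the whole graph.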