network.py does not handle interface resets properly #663

Open
isotopp opened this Issue Jul 20, 2017 · 1 comment

isotopp commented Jul 20, 2017

The data presented in /proc/net/dev for tx/rx bytes are counters, not rates.

The code in https://github.com/python-diamond/Diamond/blob/master/src/collectors/network/network.py#L116 tries to handle this, but does so incorrectly.

The present code handles overflows by subtracting 2^64. It does not take interface resets into account, which also create negative deltas, but not with a step size of 2^64.

Consequently, whenever an interface resets, the data sent to graphite is mangled and shows petabyte-sized data rate peaks. These trigger alerts and distort graph scaling.
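To illustrate the effect, here is a minimal sketch of the arithmetic. It is not the Diamond code itself; `naive_delta`, `MAX_COUNTER` and the sample values are made up for this example, and the wrap size of 2^64 is taken from the description above.

```python
# Hypothetical illustration, not the actual Diamond implementation.
MAX_COUNTER = 2 ** 64  # wrap size assumed from the description above


def naive_delta(old, new, max_value=MAX_COUNTER):
    """Treat every negative step as a counter wrap-around."""
    if new < old:
        # Assumes the counter wrapped exactly once at max_value.
        return new - (old - max_value)
    return new - old


# Made-up samples: the interface was reset between two polls,
# so the byte counter restarted near zero.
before_reset = 5000000000
after_reset = 1200

delta = naive_delta(before_reset, after_reset)
print(delta)
```

The real traffic between the two polls was at most 1200 bytes, but the wrap compensation adds almost the full 2^64 to the delta, which is where the absurd peaks come from.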

It would in fact be better either to report the counter value unmangled and use nonNegativeDerivative() in graphite to handle this, or to duplicate the code from there in diamond, as that code handles resets to 0 as well as overflows (which, with 64-bit counters, hardly ever happen).

isotopp commented Jul 20, 2017

Graphite code in https://github.com/graphite-project/graphite-web/blob/master/webapp/graphite/render/functions.py#L1653

On a downstep, this code first checks whether a maxValue is set (it can be None). If none is set, a value of None is produced (i.e. the measurement is marked as invalid). That's better than logging petabytes.
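A rough sketch of that behaviour, written from the description above rather than copied from graphite-web. The function name `non_negative_deltas` and its parameters are hypothetical, and the wrap-around branch is my reading of how a maxValue would be used.

```python
# Sketch only: names are hypothetical, logic follows the description above.
def non_negative_deltas(values, max_value=None):
    """Turn a series of counter samples into per-interval deltas.

    A downstep yields None unless max_value is given and the new sample
    fits below it, in which case a single wrap-around is assumed.
    """
    deltas = []
    prev = None
    for val in values:
        if prev is None or val is None:
            # No usable pair of samples yet.
            deltas.append(None)
        else:
            diff = val - prev
            if diff >= 0:
                deltas.append(diff)
            elif max_value is not None and max_value >= val:
                # Assumed wrap: distance to max_value, plus the new value.
                deltas.append((max_value - prev) + val + 1)
            else:
                # Downstep without a known maximum: mark as invalid.
                deltas.append(None)
        prev = val
    return deltas


samples = [100, 250, 40, 90]  # made-up counter readings with one reset
print(non_negative_deltas(samples))
print(non_negative_deltas(samples, max_value=255))
```

Without a max_value the reset shows up as a gap in the series instead of a spike, and the samples after it are still usable.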
