
Rate function is calculating bad values #4404

Closed
sgrzemski opened this Issue Jul 23, 2018 · 6 comments

@sgrzemski

sgrzemski commented Jul 23, 2018

Bug Report

What did you do?
I am graphing a query like rate(node_network_receive_bytes_total{device="eth0"}[5m]) in Grafana. However, the rate function does not seem to work properly, as the calculated results are incorrect. I have two groups of machines in two subnets, with the same version of node_exporter on both. All servers are in one job in Prometheus. I will paste an example below.
The node exporters provide proper values: when you query for node_network_receive_bytes_total{device="eth0"}[5m] the raw samples are just fine, but the average calculated from them is wrong. Our Prometheus is dockerized.

What did you expect to see?
Correct average values over the time range specified in the query.

What did you see instead? Under which circumstances?
Here is an example, please see the metrics from node_exporters:

sgrzemski@proxy101:~$ curl http://localhost:4100/metrics | grep network_transmit_bytes_total
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 59890  100 59890    0     0  4369k      0 --:--:-- --:--:-- --:--:-- 4498k
# HELP node_network_transmit_bytes_total Network device statistic transmit_bytes.
# TYPE node_network_transmit_bytes_total counter
node_network_transmit_bytes_total{device="eth0"} 1.16792632e+08
node_network_transmit_bytes_total{device="lo"} 0
sgrzemski@proxy102:/proc/sys$ curl http://localhost:4100/metrics | grep network_transmit_bytes_total
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 59443  100 59443    0     0  4307k      0 --:--:-- --:--:-- --:--:-- 4465k
# HELP node_network_transmit_bytes_total Network device statistic transmit_bytes.
# TYPE node_network_transmit_bytes_total counter
node_network_transmit_bytes_total{device="eth0"} 1.19068203e+08
node_network_transmit_bytes_total{device="lo"} 0

What is more, the values for the same query without rate are also OK:
[screenshots: graphs of the raw counter values for both servers]

However same query with rate function returns incorrect values:
[screenshot: graph of the same query wrapped in rate(), showing the incorrect values]

Environment

  • System information:

Linux 4.10.0-35-lowlatency x86_64

  • Prometheus version:
/prometheus $ prometheus --version
prometheus, version 2.3.2 (branch: HEAD, revision: 71af5e29e815795e9dd14742ee7725682fa14b7b)
  build user:       root@5258e0bd9cc1
  build date:       20180712-14:02:52
  go version:       go1.10.3
  • Prometheus configuration file:
basic configuration 
  • Logs:
docker logs prometheus_prometheus_1 
level=info ts=2018-07-23T10:49:48.346883767Z caller=main.go:222 msg="Starting Prometheus" version="(version=2.3.2, branch=HEAD, revision=71af5e29e815795e9dd14742ee7725682fa14b7b)"
level=info ts=2018-07-23T10:49:48.346950167Z caller=main.go:223 build_context="(go=go1.10.3, user=root@5258e0bd9cc1, date=20180712-14:02:52)"
level=info ts=2018-07-23T10:49:48.346981667Z caller=main.go:224 host_details="(Linux 4.10.0-35-lowlatency #39~16.04.1-Ubuntu SMP PREEMPT Wed Sep 13 10:06:25 UTC 2017 x86_64 04c5e02577f2 (none))"
level=info ts=2018-07-23T10:49:48.347008767Z caller=main.go:225 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-07-23T10:49:48.348597475Z caller=web.go:415 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-07-23T10:49:48.348545275Z caller=main.go:533 msg="Starting TSDB ..."
level=info ts=2018-07-23T10:49:48.349076377Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531720800000 maxt=1531742400000 ulid=01CJHSP30XPT5GN378ATX08X1F
level=info ts=2018-07-23T10:49:48.349188378Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531742400000 maxt=1531764000000 ulid=01CJJE98C3AGCQ21AE0Y447R8E
level=info ts=2018-07-23T10:49:48.349307078Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531764000000 maxt=1531785600000 ulid=01CJK2WDR8XTF46XC45EAPWW75
level=info ts=2018-07-23T10:49:48.349396479Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531785600000 maxt=1531807200000 ulid=01CJKQFKKDQJMBQ6X54YZ1WCB2
level=info ts=2018-07-23T10:49:48.349529779Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531807200000 maxt=1531828800000 ulid=01CJMC2QZ9HQP7JBB5B97DB5YC
level=info ts=2018-07-23T10:49:48.34960438Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531828800000 maxt=1531850400000 ulid=01CJN0NY3SH983TKZF9SSXV7G3
level=info ts=2018-07-23T10:49:48.34968928Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531850400000 maxt=1531872000000 ulid=01CJNN949Z3SYNVE5SFFMW7FA0
level=info ts=2018-07-23T10:49:48.349813981Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531872000000 maxt=1531893600000 ulid=01CJP9WATF3PHR5MM0ZD8HBG4N
level=info ts=2018-07-23T10:49:48.349913381Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531893600000 maxt=1531915200000 ulid=01CJPYFGFPEV0D9534QR2TFSVF
level=info ts=2018-07-23T10:49:48.350032182Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531915200000 maxt=1531936800000 ulid=01CJQK2PE9T28MA0Q98B02ZQNN
level=info ts=2018-07-23T10:49:48.350113282Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531936800000 maxt=1531958400000 ulid=01CJR7NW3RSJKZV9EMAGHCWKQD
level=info ts=2018-07-23T10:49:48.350194783Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531958400000 maxt=1531980000000 ulid=01CJRW91X1E0END13ZPEY6NM0K
level=info ts=2018-07-23T10:49:48.350341083Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1531980000000 maxt=1532001600000 ulid=01CJSGW87YJF22AQTHYB7EBFSM
level=info ts=2018-07-23T10:49:48.350508684Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532001600000 maxt=1532023200000 ulid=01CJT5FDPAW3NV1HGK9A7JB87Z
level=info ts=2018-07-23T10:49:48.350776885Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532023200000 maxt=1532044800000 ulid=01CJTT2KF8X4GX7TX4SS2E5PTB
level=info ts=2018-07-23T10:49:48.350941986Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532044800000 maxt=1532066400000 ulid=01CJVENSC75022GCAKZEBVY7D1
level=info ts=2018-07-23T10:49:48.351071387Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532066400000 maxt=1532088000000 ulid=01CJW38YSWBRDT6T0HE1F49KS8
level=info ts=2018-07-23T10:49:48.351169387Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532088000000 maxt=1532109600000 ulid=01CJWQW4HRX40Z3GXH6ZXE52YP
level=info ts=2018-07-23T10:49:48.351252988Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532109600000 maxt=1532131200000 ulid=01CJXCFAMPE7D9RVWBVP0A1HYD
level=info ts=2018-07-23T10:49:48.351345988Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532131200000 maxt=1532152800000 ulid=01CJY12GD932YXPH1EVK4WSFRY
level=info ts=2018-07-23T10:49:48.351475989Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532152800000 maxt=1532174400000 ulid=01CJYNNP34PRYW1NT0VT1VFZN9
level=info ts=2018-07-23T10:49:48.351566589Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532174400000 maxt=1532196000000 ulid=01CJZA8VTVBSZFRC0MAM8RZY06
level=info ts=2018-07-23T10:49:48.35163549Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532196000000 maxt=1532217600000 ulid=01CJZYW1NV5N2234YVTX68QNWJ
level=info ts=2018-07-23T10:49:48.35174689Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532217600000 maxt=1532239200000 ulid=01CK0KF7EQ9PEFZ2CB26TGF2VA
level=info ts=2018-07-23T10:49:48.351847191Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532239200000 maxt=1532260800000 ulid=01CK182D9BX2SKP5ZRAVB153G5
level=info ts=2018-07-23T10:49:48.351917691Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532260800000 maxt=1532282400000 ulid=01CK1WNJX7Y0DN2CD0J68HTH6P
level=info ts=2018-07-23T10:49:48.351998591Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532282400000 maxt=1532304000000 ulid=01CK2H8RQ03W82Q88849WC9QX6
level=info ts=2018-07-23T10:49:48.352065192Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532325600000 maxt=1532332800000 ulid=01CK35VT1H9W4YWFS2GPHXEFS9
level=info ts=2018-07-23T10:49:48.352189392Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1532304000000 maxt=1532325600000 ulid=01CK35VYJ5JP6Y5ERQY0ZQEES0
level=warn ts=2018-07-23T10:49:58.391408243Z caller=head.go:320 component=tsdb msg="unknown series references in WAL samples" count=11
level=info ts=2018-07-23T10:49:58.422440993Z caller=main.go:543 msg="TSDB started"
level=info ts=2018-07-23T10:49:58.422553793Z caller=main.go:603 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-07-23T10:49:58.428821424Z caller=main.go:629 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-07-23T10:49:58.428914624Z caller=main.go:502 msg="Server is ready to receive web requests."
@simonpasquier

Member

simonpasquier commented Jul 23, 2018

Thanks for your report. It looks as if this is actually a question about usage and not development, because the rate() function is widely used and it is unlikely that there's an issue with it.

To make your question, and all replies, easier to find, we suggest you move this over to our user mailing list, which you can also search. If you prefer more interactive help, join our IRC channel, #prometheus on irc.freenode.net. Please be aware that our IRC channel has no logs, is not searchable, and that people might not answer quickly if they are busy or asleep. If in doubt, you should choose the mailing list.

Once your questions have been answered, please add a short line pointing to relevant replies in case anyone stumbles here via a search engine in the future.

@brian-brazil

Member

brian-brazil commented Jul 23, 2018

I don't see any evidence of a bug here, this all looks normal from some quick maths.
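
One way to redo that quick maths is to compare the raw samples in the window with the rate() result. A minimal sketch, assuming Prometheus is reachable on localhost:9090 as in the startup logs below:

# Fetch the raw counter samples in the 5m window, then the rate() result, and compare
# (the localhost:9090 endpoint is an assumption based on the listen address in the logs).
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=node_network_receive_bytes_total{device="eth0"}[5m]'

curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(node_network_receive_bytes_total{device="eth0"}[5m])'

# rate() is roughly (last sample - first sample) / (last timestamp - first timestamp),
# i.e. a per-second value, not the total number of bytes in the window.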

@sgrzemski

Author

sgrzemski commented Jul 24, 2018

Okay, let me put it another way. We have deployed servers with the same configuration files, the same Docker version, the same node_exporter container, and one Prometheus version. Some servers show OK values (like the real amount of bytes transferred), but some do not. Example below for the same query as in the original post.
[screenshot: graph of the rate() query showing a value of about 45]
I see 45 bytes as an impossible value for 5 minutes :D

@simonpasquier

Member

simonpasquier commented Jul 24, 2018

@sgrzemski this is a rate over 5 minutes, meaning 45 bytes/second.
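
In other words, rate() always returns a per-second value, so roughly 45 bytes/second over a 300-second window works out to about 45 * 300 = 13,500 bytes in total. If the total increase over the window is what should be graphed, increase() is an option; a small sketch against the same assumed local API:

# rate() is per-second: 45 B/s over 5 minutes (300 s) is roughly 13500 bytes in total.
# To graph the total bytes transferred in the window instead, use increase()
# (sketch; the localhost:9090 endpoint is an assumption based on the logs above):
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=increase(node_network_receive_bytes_total{device="eth0"}[5m])'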

@sgrzemski

Author

sgrzemski commented Jul 31, 2018

Hello all, I would like to thank you very much for your help!
It turned out that the exporter has to run on the host network in Docker; no other network mode works in our case.
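
For anyone landing here via a search engine: with Docker's default bridge network, the exporter reports the container's own eth0 counters rather than the host's, which would explain the very low rates. A minimal sketch of the host-network fix, assuming the commonly used prom/node-exporter image and the non-default port 4100 seen in the curl examples above:

# Run node_exporter on the host network so it sees the host's real interfaces
# (image name and port are assumptions; /proc and /sys mounts are omitted here).
docker run -d --name node_exporter \
  --net=host \
  prom/node-exporter \
  --web.listen-address=":4100"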

@sgrzemski sgrzemski closed this Jul 31, 2018

@lock


lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 22, 2019
