
negative values with irate #1287

Closed
f0 opened this Issue Jan 6, 2016 · 21 comments

f0 commented Jan 6, 2016

Hi,

I use this query to get the CPU utilisation (from http://www.robustperception.io/understanding-machine-cpu-usage/):

100 - (avg by (instance) (irate(node_cpu{cluster="test",mode="idle"}[5m])) * 100)

With this query I get negative values in my graphs:
[screenshot: negative_values]

If I change irate to rate, the negative values are gone.

The metric source is a node_exporter.

I think negative values should never happen with CPU usage.
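
For reference, the rate-based variant mentioned above (only irate swapped for rate, everything else unchanged) would presumably be:

100 - (avg by (instance) (rate(node_cpu{cluster="test",mode="idle"}[5m])) * 100)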

brian-brazil commented Jan 6, 2016

You've got a subtraction in there, so it's not the irate that's going negative. Can you share the raw values and timestamps around the artifact?
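
One way to narrow this down (just a sketch reusing the selector above) is to graph the averaged irate term on its own; wherever the full expression goes negative, this inner term must be exceeding 100:

avg by (instance) (irate(node_cpu{cluster="test",mode="idle"}[5m])) * 100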

f0 commented Jan 6, 2016

@brian-brazil sure, what's the best way to do this?

brian-brazil commented Jan 6, 2016

node_cpu{cluster="test",mode="idle"}[5m] for the time period in question.

f0 commented Jan 6, 2016

@brian-brazil
[screenshot: negative-values-1]

I'll also try to get the values from the API.

f0 commented Jan 6, 2016

raw_data.zip

@brian-brazil here is the API response for the specific time frame

brian-brazil commented Jan 6, 2016

There's something very odd in the data (all values exactly 10s apart):

5595210.02
5595224.21  + 1.4/s
5595238.4   + 1.4/s
5595238.4   + 0/s

Somehow every 3rd scrape is getting the same value, and its increase is in the first two of the triplet.

Are you using a node exporter that includes prometheus/node_exporter#177 ?

Why do some of the results not have an instance label?
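
For context, irate only uses the last two samples under the range it is given, so the per-step figures above come from divisions like:

(5595224.21 - 5595210.02) / 10s ≈ 1.4/s
(5595238.4 - 5595238.4) / 10s = 0/s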

f0 commented Jan 6, 2016

@brian-brazil I removed the instance label (policy...). No, that commit is not included; I use the last released version, 0.12.0rc1.

Hm, 10s: I used step=10s in the API call; without it I got an error.

This is the call:

curl -g 'http://10.61.112.227:9090/api/v1/query_range?query=node_cpu{cluster="test",mode="idle"}&start=2016-01-03T16:10:00.781Z&end=2016-01-03T16:15:00.781Z&step=10s' > raw_values.json

brian-brazil commented Jan 6, 2016

I removed the instance label

That's bad practice: all series of a given metric should have the same set of labels, and this may be causing some of your problems.

I used step=10s in the API call

Ah, please use the query endpoint as we want the raw data.

f0 commented Jan 6, 2016

I only removed the instance label in the provided data.

Hm, if I do a query against the query endpoint, how can I specify a timeframe?
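
For reference, the query endpoint takes a time parameter giving the evaluation timestamp, so a call along these lines should return the raw samples (server address and timestamp reused from the earlier query_range call, purely as an illustration):

curl -g 'http://10.61.112.227:9090/api/v1/query?query=node_cpu{cluster="test",mode="idle"}[5m]&time=2016-01-03T16:15:00.781Z' > raw_values.json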

f0 commented Jan 6, 2016

@brian-brazil
OK, here are the values from the query endpoint for the specific time. This time I did not remove any labels.
raw_data2.zip

brian-brazil commented Jan 6, 2016

I need node_cpu{cluster="test",mode="idle"}[5m]; you seem to have given me node_cpu{cluster="test",mode="idle"}.

f0 commented Jan 6, 2016

raw_data3.zip
Sorry, here it is with [5m].

brian-brazil commented Jan 6, 2016

Can you give me the same, but a minute later? I want to get some data points after the problem.

scrape_duration_seconds for the node exporters around that time would also help.
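
A selector along these lines should cover that, assuming the cluster label is attached to the node exporter targets as well:

scrape_duration_seconds{cluster="test"}[5m]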

f0 commented Jan 6, 2016

brian-brazil commented Jan 6, 2016

I think what happened was that your Prometheus server got paused or overloaded for ~5s, and that messed up the timestamps.

f0 commented Jan 6, 2016

@brian-brazil OK, if this happened, is the impact correct? I mean, are negative values the correct answer for this?

brian-brazil commented Jan 6, 2016

The data we ended up with showed 19s of idle time in 15s, so a negative answer is what comes out. There are always race conditions that can cause oddness (there are a few smaller ones in your data too); when things get overloaded, there's not much we can do, I'm afraid.
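
Spelled out with those figures: roughly 19 / 15 ≈ 1.27 idle seconds per second, so the expression evaluates to about 100 - (1.27 * 100) ≈ -27.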

f0 commented Jan 6, 2016

Hm, I do not think the Prometheus system (rkt container) was overloaded: CPU usage was at 10%, more than 50GB of memory was free, and there was not much disk activity in this timeframe.

grobie commented Jan 6, 2016

Can you check the Prometheus server logs? Also, what were the results of the up timeseries for that job? Maybe Prometheus wasn't able to scrape the targets?
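
For example, something like this over the affected window (reusing the cluster label from the queries above; the exact job label isn't known here):

up{cluster="test"}[5m]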

brian-brazil commented Jan 6, 2016

The timestamp consistency indicates that there were no scrape failures.

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
