
Regression: range vector queries report samples multiple times #3337

Closed
EdSchouten opened this Issue Oct 23, 2017 · 8 comments

EdSchouten (Contributor) commented Oct 23, 2017

What did you do? / What did you expect to see?
Set up Prometheus with the SNMP exporter. Ran the following query through the web interface:

ifInUcastPkts{ifIndex="10648"}[2h]

Got the following results:

3548205210 @1508765401.244
3548946632 @1508765701.244
3549646577 @1508766001.266
3550396212 @1508766301.268
3551081575 @1508766601.367
3551730505 @1508766901.244
3552413802 @1508767201.244
3553181994 @1508767501.244
3554009421 @1508767801.254
3554675942 @1508768101.244
3555286317 @1508768401.244
3555922170 @1508768701.244
3556718950 @1508769001.244
3547556798 @1508765101.244
3548205210 @1508765401.244
3548946632 @1508765701.244
3549646577 @1508766001.266
3550396212 @1508766301.268
3551081575 @1508766601.367
3551730505 @1508766901.244
3552413802 @1508767201.244
3553181994 @1508767501.244
3554009421 @1508767801.254
3554675942 @1508768101.244
3555286317 @1508768401.244
3555922170 @1508768701.244
3556718950 @1508769001.244
3557449169 @1508769301.244

Notice that the output contains 28 lines but only 15 unique entries: the first 13 lines are repeats of samples that appear again later in the listing. I would have expected this to return:

3547556798 @1508765101.244
3548205210 @1508765401.244
3548946632 @1508765701.244
3549646577 @1508766001.266
3550396212 @1508766301.268
3551081575 @1508766601.367
3551730505 @1508766901.244
3552413802 @1508767201.244
3553181994 @1508767501.244
3554009421 @1508767801.254
3554675942 @1508768101.244
3555286317 @1508768401.244
3555922170 @1508768701.244
3556718950 @1508769001.244
3557449169 @1508769301.244
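To spot this kind of duplication in pasted output, a small checker can help. The sketch below is a hypothetical helper (`parse_samples` and `check_range_result` are illustrative names, not part of Prometheus). It parses `value @timestamp` lines like the ones above and reports repeated timestamps along with the number of places where time fails to move forward. In a correct range-vector result for a single series, timestamps should strictly increase.

```python
from collections import Counter


def parse_samples(text):
    """Parse lines of the form '<value> @<unix timestamp>' into (timestamp, value) pairs."""
    samples = []
    for line in text.strip().splitlines():
        value, ts = line.split("@")
        samples.append((float(ts), int(value.strip())))
    return samples


def check_range_result(samples):
    """Return (repeated timestamps, number of non-increasing steps) for one series."""
    counts = Counter(ts for ts, _ in samples)
    repeated = sorted(ts for ts, n in counts.items() if n > 1)
    # In a correct range-vector result, timestamps strictly increase.
    non_increasing = sum(
        1 for (t0, _), (t1, _) in zip(samples, samples[1:]) if t1 <= t0
    )
    return repeated, non_increasing


# A shortened excerpt with the same shape as the buggy output above.
raw = """
3548205210 @1508765401.244
3548946632 @1508765701.244
3547556798 @1508765101.244
3548205210 @1508765401.244
3548946632 @1508765701.244
"""
repeated, non_increasing = check_range_result(parse_samples(raw))
print(repeated, non_increasing)
```

Given the full 28-line output from the report, this checker should find 13 repeated timestamps and a single backwards step, where the listing jumps from @1508769001.244 back to @1508765101.244.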

At the same time, I notice that more elaborate queries like rate(ifInUcastPkts{ifIndex="10648"}[30h]) don't work as expected. Every couple of hours, we see a huge jump in the graph that shouldn't be there:

[Screenshot 2017-10-23 17:27:40: graph of rate(ifInUcastPkts{ifIndex="10648"}[30h]) showing periodic spikes]

If I do the math by hand on the raw sample data, the graph should be relatively flat at a value of ~2000, not show peaks of millions.
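The by-hand check can be sketched as follows, using samples from the expected series above. This is a simplification under stated assumptions. It divides the counter difference by the time difference between consecutive samples. It does not handle counter resets or extrapolate to the window edges the way PromQL's rate() does, so it only approximates the value rate() would return. `per_interval_rates` is a hypothetical helper name.

```python
def per_interval_rates(samples):
    """Per-second increase between consecutive (timestamp_seconds, value) samples.

    Simplification: no counter-reset handling and no extrapolation to the
    window edges, unlike PromQL's rate().
    """
    return [
        (v1 - v0) / (t1 - t0)
        for (t0, v0), (t1, v1) in zip(samples, samples[1:])
    ]


# First three samples of the expected series.
expected = [
    (1508765101.244, 3547556798),
    (1508765401.244, 3548205210),
    (1508765701.244, 3548946632),
]
rates = per_interval_rates(expected)

# Average over the whole expected series: last minus first sample.
overall = (3557449169 - 3547556798) / (1508769301.244 - 1508765101.244)
print(rates, overall)
```

Both intervals come out at roughly 2,160 and 2,470 per second. The full 70-minute window averages 9892371 / 4200 s, or about 2,355 per second. That is in the ~2000 range the report expects, and nowhere near millions.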

Environment

  • System information: Linux 3.16.0-4-amd64 x86_64
  • Prometheus version: the official Prometheus 2.0.0-rc.1 Docker image

EdSchouten changed the title from "Regression: range vector queries report metrics multip" to "Regression: range vector queries report metrics multiple times" on Oct 23, 2017

EdSchouten changed the title from "Regression: range vector queries report metrics multiple times" to "Regression: range vector queries report samples multiple times" on Oct 23, 2017

fabxc (Member) commented Oct 23, 2017

Can you reproduce this across all metrics or just a few selected ones?

EdSchouten (Contributor, Author) commented Oct 23, 2017

(Hi Fabian! Thanks for your awesome work on the new storage backend!)

It seems I can reproduce this across all metrics, even implicit ones like up.

fabxc (Member) commented Oct 23, 2017

Okay, thanks. It looks like I can reproduce it. My first intuition is that this may be the reason for the spike artefact you are seeing, but I cannot reproduce the spikes themselves right away.
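The thread does not confirm whether the duplicated samples cause the spikes. One plausible mechanism can still be illustrated. PromQL's rate() and increase() treat any decrease in a counter as a counter reset. A duplicated, out-of-order prefix produces exactly such a decrease: the listing jumps from a later sample back to the earliest one. The sketch below is a simplified model and rests on two assumptions. It applies reset handling the same way (a drop counts the new value as the increase), and it skips extrapolation to the window edges. `increase_with_resets` is a hypothetical name, not the Prometheus implementation.

```python
def increase_with_resets(values):
    """Total increase of a counter, treating any decrease as a counter reset.

    Simplified model of PromQL's counter handling: when a value drops, the
    counter is assumed to have restarted from zero, so the new value itself is
    counted as the increase. No extrapolation to window edges is done.
    """
    total = 0
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            total += cur  # looks like a reset: count from zero
        else:
            total += cur - prev
    return total


# Four consecutive samples of the expected series.
expected = [3547556798, 3548205210, 3548946632, 3549646577]
# The same samples preceded by a duplicated prefix, as in the original report.
duplicated = [3548205210, 3548946632] + expected

print(increase_with_resets(expected), increase_with_resets(duplicated))
```

In this model, the duplicated prefix inflates the increase from about 2.1 million to about 3.55 billion. The backwards jump is read as a reset, so roughly the counter's full absolute value is added, which would show up as a large spike in a graph.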

fabxc (Member) commented Oct 23, 2017

So for me it seems the duplication in a simple range query only happens once: if you query over longer and longer periods of time, the same fixed number of lines at the beginning is always duplicated.

Can you confirm that behavior?

Also, do I see it correctly that your scrape interval is 5 minutes? This is kind of a threshold number for scraping. Just for debugging purposes, could you reduce it to at most 2 minutes and check whether the spiking behavior in the rate() queries remains (for the newly collected data)?
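The thread does not say why 5 minutes is a "threshold number". One likely reading, offered here as an assumption, involves PromQL's lookback. By default, PromQL looks back at most 5 minutes for the latest sample when resolving an instant-vector selector. A 5-minute scrape interval therefore leaves no slack for scrape jitter. The sketch below uses a hypothetical helper name and timestamps from the report, converted to milliseconds to avoid floating-point rounding. It flags pairs of consecutive samples that are further apart than that window.

```python
# Assumption: PromQL's default lookback window of 5 minutes, in milliseconds.
LOOKBACK_MS = 5 * 60 * 1000


def gaps_exceeding_lookback(timestamps_ms, lookback_ms=LOOKBACK_MS):
    """Return (start, end) pairs of consecutive samples further apart than lookback_ms.

    An instant-vector lookup evaluated inside such a gap (after start + lookback,
    before end) would find no sample for the series, boundary handling aside.
    """
    return [
        (t0, t1)
        for t0, t1 in zip(timestamps_ms, timestamps_ms[1:])
        if t1 - t0 > lookback_ms
    ]


# Sample timestamps from the report, in milliseconds.
timestamps = [1508766001266, 1508766301268, 1508766601367, 1508766901244]
gaps = gaps_exceeding_lookback(timestamps)
print(gaps)
```

Two of the three intervals here are slightly longer than 300 s (300.002 s and 300.099 s). With a 2-minute interval, consecutive samples would be roughly 120 s apart, which leaves ample margin under a 5-minute lookback.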

EdSchouten (Contributor, Author) commented Oct 23, 2017

> So for me it seems the repeated samples when doing a simple range query only happens once, i.e. if you do it over longer and longer periods of time, there's always the fixed same amount of lines at the beginning being duplicated.
>
> Can you confirm that behavior?

Yes, that seems to be the case exactly!

> Also, do I see it correctly that your scrape interval is 5 minutes?

That's correct.

> This is kind of a threshold number for scraping. Just for debugging purposes, could you reduce it to at most 2 minutes and check if the spiking behavior in the rate() queries remain (for the newly collected data)?

I'll give that a try first thing tomorrow morning, leave it running throughout the day and report back to you at the end of the day.

EdSchouten (Contributor, Author) commented Oct 24, 2017

It looks like switching to a two-minute scraping interval worked around the issue:

[Screenshot 2017-10-24 16:11:37: graph after switching to a two-minute scrape interval]

EdSchouten (Contributor, Author) commented Oct 26, 2017

Upgraded to RC2 this morning and set the scraping interval back to five minutes. No anomalies so far. I think it's safe to close this issue. Fabian, thanks for solving this so quickly!

[Screenshot 2017-10-26 11:00:11: graph after upgrading to RC2 with the scrape interval back at five minutes]

EdSchouten closed this Oct 26, 2017

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
