duplicate entries in range vector #4358

Closed
jbenoist opened this Issue Jul 6, 2018 · 7 comments

jbenoist commented Jul 6, 2018

Bug Report

What did you do?
I'm trying to identify the reason why, on a particular scrape target, rate() computation is systematically inconsistent for basically all counters.

For just one target on a particular instance, it appears that the range vectors rate() operates on aren't right to begin with. Looking at the data points of such a vector (anonymised):

counter_msg_total{label="mylabel", hostname="host4421"}[24h]
[..]
25863821 @1530790815
25863822 @1530790830
25863823 @1530790845
25863825 @1530790860
[..]

This target is scraped every 15s, so a 24h range for this series should contain no more than 5760 data points (86400 / 15), as is the case for other targets. I'm getting 7200 instead, some of them duplicates:

sort -k2n data-points.txt | perl -MPOSIX -pe 's/@(\d+).*/ctime $1/se' | uniq -c

      1 25863953 Thu Jul  5 04:59:30 2018
      1 25863954 Thu Jul  5 04:59:45 2018
      2 25863954 Thu Jul  5 05:00:00 2018              <- begin duplicate period
      2 25863956 Thu Jul  5 05:00:15 2018
[..]
      2 25864148 Thu Jul  5 05:29:30 2018
      2 25864148 Thu Jul  5 05:29:45 2018               <- end duplicate period
      1 25864148 Thu Jul  5 05:30:00 2018
      1 25864152 Thu Jul  5 05:30:15 2018     
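
For readers who prefer not to use the sort/perl/uniq pipeline, the same duplicate check can be sketched in Python. This is a minimal sketch, assuming the dump holds one `value @timestamp` pair per line as in the excerpt above; the function name `find_duplicate_timestamps` and the sample values are made up for illustration and are not taken from the real data.

```python
from collections import Counter


def find_duplicate_timestamps(lines):
    """Return {timestamp: count} for timestamps that occur more than once.

    Assumes each data line looks like '<value> @<unix_timestamp>';
    lines that do not match (such as '[..]') are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].startswith("@"):
            continue
        try:
            counts[int(parts[1][1:])] += 1
        except ValueError:
            continue
    return {ts: n for ts, n in counts.items() if n > 1}


# Hypothetical sample: 15s-spaced points, two of which appear twice.
sample = [
    "100 @1000",
    "101 @1015",
    "101 @1015",
    "103 @1030",
    "103 @1030",
    "104 @1045",
    "[..]",
]

duplicates = find_duplicate_timestamps(sample)
print(duplicates)
```

Passing an open file handle for data-points.txt instead of `sample` should work the same way, since the function only iterates over lines.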

I'm seeing 12 such occurrences of duplicated data, each made up of 120 consecutive points spanning 30 minutes. Interestingly, each one starts on an even hour, 2 hours apart, and given that we are using the following setting:

storage.tsdb.min-block-duration | 2h

I suspect periodic data snapshotting to disk might have something to do with this. Am I on the right track, or is this 2 hour alignment just coincidental?
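
The figures quoted so far can be checked against each other with a short calculation. This is a minimal sketch that uses only numbers from this report (15s scrape interval, 24h range, 12 duplicate periods of 120 points, 2h block duration). The helper `is_aligned` is hypothetical. It tests a Unix timestamp against UTC, which corresponds to an "even hour" in local time only if the local UTC offset is an even number of hours.

```python
SCRAPE_INTERVAL_S = 15          # scrape interval from the report
RANGE_S = 24 * 60 * 60          # the [24h] range selector
DUP_PERIODS = 12                # duplicate periods observed
POINTS_PER_PERIOD = 120         # duplicated points per period
BLOCK_DURATION_S = 2 * 60 * 60  # storage.tsdb.min-block-duration of 2h

expected_points = RANGE_S // SCRAPE_INTERVAL_S
extra_points = DUP_PERIODS * POINTS_PER_PERIOD
period_length_s = POINTS_PER_PERIOD * SCRAPE_INTERVAL_S
blocks_per_range = RANGE_S // BLOCK_DURATION_S


def is_aligned(unix_ts, step_s=BLOCK_DURATION_S):
    """True if unix_ts falls exactly on a multiple of step_s (in UTC)."""
    return unix_ts % step_s == 0


print(expected_points)                 # 5760
print(expected_points + extra_points)  # 7200
print(period_length_s // 60)           # 30 (minutes)
print(blocks_per_range)                # 12
```

So the 1440 extra points account exactly for the gap between 5760 and 7200, and 12 duplicate periods in 24h works out to one per 2h window. That is consistent with the block-duration hypothesis, though it does not prove it.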

FWIW, Prometheus interprets these as resets:

resets(counter_msg_total{label="mylabel", hostname="host4421"}[24h])
12

It doesn't get the rate() right, however:

rate(counter_msg_total{label="mylabel", hostname="host4421"}[24h])
3295.033605371303

It should be something like this:

$ sort -k1n data-points.txt | head -n1
25863820 @1530790800
$ sort -k1n data-points.txt | tail -n1
25879012 @1530877185
$ echo 'scale=8; (25879012-25863820)/86400' | bc
.17583333
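
To illustrate why samples that register as resets can inflate the result by several orders of magnitude: PromQL treats any decrease between consecutive samples of a counter as a reset, and rate() compensates for resets. The sketch below is a simplified model of that compensation, not Prometheus's actual implementation (among other things it omits extrapolation to the range boundaries). The function names and the sample series are hypothetical, and the report does not show how the duplicates were ordered in the returned range vector, so the replayed series is only one possible arrangement.

```python
def naive_rate(samples, window_s):
    """(last value - first value) / window, as in the bc calculation above."""
    return (samples[-1][1] - samples[0][1]) / window_s


def reset_aware_increase(samples):
    """Total increase, treating any drop in value as a counter reset.

    After a drop the counter is assumed to have restarted from zero,
    so the whole new value is counted as increase.
    """
    total = 0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            total += value          # assumed reset: counted from zero
        else:
            total += value - prev
        prev = value
    return total


# Hypothetical (timestamp, value) series: clean, then with two older
# points replayed after newer ones.
clean = [(0, 1000), (15, 1001), (30, 1002), (45, 1003)]
replayed = [(0, 1000), (15, 1001), (30, 1002),
            (15, 1001), (30, 1002), (45, 1003)]

print(reset_aware_increase(clean))     # 3
print(reset_aware_increase(replayed))  # 1005
```

In this model a single apparent reset adds roughly the full counter value to the increase, so with a counter around 25.8 million even a handful of them would dwarf the real increase of 15192 over the day.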

Environment

Version  2.2.1
Go  go1.10

EDIT:

  • 2 hour alignment remark

Member

brian-brazil commented Jul 6, 2018

Can you try this with 2.3.1?

Author

jbenoist commented Jul 6, 2018

I might be able to next week. I'll keep you posted.

My colleagues suspect this might be similar to the following issue, which was fixed in 2.3.1:
#3939

Do you remember whether this bug caused dups of raw data points as well?

Member

juliusv commented Jul 16, 2018

@jbenoist Any news on testing this with a newer Prometheus version?

Author

jbenoist commented Jul 19, 2018

@juliusv I haven't had the chance to give it a try on 2.3.1 yet :/
Hopefully in a week or so, I'll give you an update.

Author

jbenoist commented Jul 25, 2018

Good news: we have been running Prometheus 2.3.2 for almost 24 hours and the duplicate points are gone.

Member

simonpasquier commented Jul 30, 2018

Thanks for reporting @jbenoist. I'm closing the issue, feel free to reopen if you see the problem again.

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
