
Samples 'disappear' after upgrading to 2.2.0 #3981

Closed
ZeeShen opened this Issue Mar 19, 2018 · 2 comments

ZeeShen commented Mar 19, 2018

What did you do?
Upgraded Prometheus from 2.2.0-rc.1 to 2.2.0.

What did you see instead? Under which circumstances?
Samples disappeared from graph.

[screenshot: Prometheus graph with the missing samples; timezone: +8]

We first noticed samples disappearing (the range 03-13 13:50 ~ 03-14 02:50) at 03-15 20:00. On the morning of 03-16 we are sure that the data between 03-15 19:50 and 03-16 08:30 was still displayed correctly, but it has since disappeared from the graph as well (both in Grafana and in the Prometheus graph).

We checked the block's meta.json and the Prometheus log, and both look fine.
meta.json:

{
        "ulid": "01C8HE36XZ4WGTZ6Z7EP2YRN9A",
        "minTime": 1520791200000, // 2018-03-12 02:00:00 +0800
        "maxTime": 1520964000000, // 2018-03-14 02:00:00 +0800
        "stats": {
                "numSamples": 7510862718, // a fair number for 48h in our server
                "numSeries": 958405,
                "numChunks": 62832612
        }
}
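For reference, minTime and maxTime in meta.json are Unix timestamps in milliseconds. Below is a minimal Go sketch (the +0800 offset is just our local timezone, noted in the comments above) to double-check those conversions:

package main

// Minimal sketch: convert the millisecond Unix timestamps from meta.json
// to +0800 local time to confirm the block covers the window where the
// samples went missing. The two values are minTime/maxTime from above.

import (
	"fmt"
	"time"
)

func main() {
	loc := time.FixedZone("UTC+8", 8*60*60)
	for _, ms := range []int64{1520791200000, 1520964000000} {
		t := time.Unix(0, ms*int64(time.Millisecond)).In(loc)
		fmt.Println(ms, "->", t.Format("2006-01-02 15:04:05 -0700"))
	}
	// prints 2018-03-12 02:00:00 +0800 and 2018-03-14 02:00:00 +0800
}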

Graph from another Prometheus server (treating the previous Prometheus server as a scrape target):
[screenshot]
That one seems to be working properly.

Prometheus log
03-13 14:00 (03-13 06:00 UTC)

Mar 13 05:00:00 prometheus-prod prometheus[32461]: level=info ts=2018-03-13T05:00:00.015801441Z caller=compact.go:387 component=tsdb msg="compact blocks" count=1 mint=1520906400000 maxt=1520913600000
Mar 13 05:00:16 prometheus-prod prometheus[32461]: level=info ts=2018-03-13T05:00:16.539054115Z caller=head.go:348 component=tsdb msg="head GC completed" duration=678.655817ms
Mar 13 05:00:20 prometheus-prod prometheus[32461]: level=info ts=2018-03-13T05:00:20.784128562Z caller=head.go:357 component=tsdb msg="WAL truncation completed" duration=4.245023391s
Mar 13 06:02:15 prometheus-prod prometheus[32461]: level=error ts=2018-03-13T06:02:15.146000076Z caller=ec2.go:174 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 0be92c6b-d851-4513-9e7c-4fb8413f4430"
Mar 13 07:00:00 prometheus-prod prometheus[32461]: level=info ts=2018-03-13T07:00:00.015079191Z caller=compact.go:387 component=tsdb msg="compact blocks" count=1 mint=1520913600000 maxt=1520920800000
Mar 13 07:00:17 prometheus-prod prometheus[32461]: level=info ts=2018-03-13T07:00:17.064931216Z caller=head.go:348 component=tsdb msg="head GC completed" duration=701.693945ms
Mar 13 07:00:21 prometheus-prod prometheus[32461]: level=info ts=2018-03-13T07:00:21.332843545Z caller=head.go:357 component=tsdb msg="WAL truncation completed" duration=4.267861188s
Mar 13 07:00:22 prometheus-prod prometheus[32461]: level=info ts=2018-03-13T07:00:22.266165522Z caller=compact.go:387 component=tsdb msg="compact blocks" count=3 mint=1520899200000 maxt=1520920800000
Mar 13 07:00:42 prometheus-prod prometheus[32461]: level=info ts=2018-03-13T07:00:42.870212374Z caller=compact.go:387 component=tsdb msg="compact blocks" count=3 mint=1520856000000 maxt=1520920800000
Mar 13 07:02:15 prometheus-prod prometheus[32461]: level=error ts=2018-03-13T07:02:15.469390901Z caller=ec2.go:174 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: fc15efed-0672-4eaf-9f5e-6c0bf27e8285"
Mar 13 09:00:00 prometheus-prod prometheus[32461]: level=info ts=2018-03-13T09:00:00.020025437Z caller=compact.go:387 component=tsdb msg="compact blocks" count=1 mint=1520920800000 maxt=1520928000000

03-14 02:00 (03-13 18:00 UTC)

Mar 13 17:00:00 prometheus-prod prometheus[28992]: level=info ts=2018-03-13T17:00:00.038151501Z caller=compact.go:394 component=tsdb msg="compact blocks" count=1 mint=1520949600000 maxt=1520956800000
Mar 13 17:00:15 prometheus-prod prometheus[28992]: level=info ts=2018-03-13T17:00:15.911358876Z caller=head.go:348 component=tsdb msg="head GC completed" duration=676.502157ms
Mar 13 17:00:20 prometheus-prod prometheus[28992]: level=info ts=2018-03-13T17:00:20.189370573Z caller=head.go:357 component=tsdb msg="WAL truncation completed" duration=4.277959919s
Mar 13 19:00:00 prometheus-prod prometheus[28992]: level=info ts=2018-03-13T19:00:00.042806474Z caller=compact.go:394 component=tsdb msg="compact blocks" count=1 mint=1520956800000 maxt=1520964000000
Mar 13 19:00:15 prometheus-prod prometheus[28992]: level=info ts=2018-03-13T19:00:15.516542694Z caller=head.go:348 component=tsdb msg="head GC completed" duration=643.020464ms
Mar 13 19:00:19 prometheus-prod prometheus[28992]: level=info ts=2018-03-13T19:00:19.986127421Z caller=head.go:357 component=tsdb msg="WAL truncation completed" duration=4.469523497s
Mar 13 19:01:41 prometheus-prod prometheus[28992]: level=error ts=2018-03-13T19:01:41.836221415Z caller=ec2.go:174 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 7c38e3f6-b6e9-4f90-976f-6395dfb9e4d6"
Mar 13 19:02:43 prometheus-prod prometheus[28992]: level=error ts=2018-03-13T19:02:43.589281133Z caller=ec2.go:174 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 54b9f6d4-8c82-42f5-87de-c7cd0cac2585"
Mar 13 20:01:42 prometheus-prod prometheus[28992]: level=error ts=2018-03-13T20:01:42.535307714Z caller=ec2.go:174 component="discovery manager scrape" discovery=ec2 msg="Refresh failed" err="could not describe instances: RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 95115df6-c996-4af4-a93a-3658546aa117"

Logs between these two timestamps are just the normal compaction and head GC messages.

We're not entirely sure this is a problem introduced by upgrading to 2.2.0. Do you have any tsdb tools to decode these raw chunks so we can check what's going on with our data?
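In the meantime, one way to check whether the samples are still readable from disk is to replay the affected window through the HTTP query API. A minimal Go sketch; localhost:9090, the `up` query, and the exact timestamps are placeholders to adjust for the real server and the missing series:

package main

// Minimal sketch: query the window where samples disappeared via
// /api/v1/query_range. An empty result means the query path returns
// nothing for a range that the block's meta.json claims to cover.
// Host, query, and timestamps below are placeholders.

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	params := url.Values{}
	params.Set("query", "up")
	params.Set("start", "1520920200") // 2018-03-13 13:50 +0800
	params.Set("end", "1520967000")   // 2018-03-14 02:50 +0800
	params.Set("step", "60")

	resp, err := http.Get("http://localhost:9090/api/v1/query_range?" + params.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}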

gouthamve (Member) commented Mar 19, 2018

Hi, this is a known issue with 2.2.0; please upgrade to 2.2.1.

gouthamve closed this Mar 19, 2018

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
