prometheus drop many metrics sample after restart #3632
Comments
Same issue here. Running in a docker container with a bind mount to a directory on the host. All the files are still there but the data just doesn't get loaded back in upon startup, I guess.
mattouille
commented
Mar 29, 2018
I started seeing this as a Chef service restart behavior. I'm still troubleshooting it.
ijonsnow
commented
Aug 10, 2018
Hi, https://www.robustperception.io/reloading-prometheus-configuration
Looks similar to #4519
simonpasquier
added
component/scraping
and removed
component/scraping
labels
Aug 23, 2018
If anything, please retest with the latest Prometheus version (v2.3.2), which includes many bug fixes for tsdb. Also, it doesn't look exactly like #4519 to me, as this one is about restarting Prometheus (not reloading).
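To make the restart/reload distinction concrete: a reload asks the running process to re-read its configuration (via SIGHUP or the HTTP lifecycle endpoint), while a restart stops the process and replays the WAL on startup, which is where this issue shows up. Below is a minimal sketch of the reload side only. The helper name `build_reload_request` and the default `base_url` are hypothetical, and in Prometheus 2.x the `/-/reload` endpoint only responds if the server was started with `--web.enable-lifecycle`.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build (but do not send) a POST to the Prometheus /-/reload endpoint.

    base_url is an assumed address; adjust it to wherever Prometheus listens.
    """
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, method="POST")


if __name__ == "__main__":
    req = build_reload_request()
    # Sending is left to the caller, e.g. urllib.request.urlopen(req).
    print(req.get_method(), req.full_url)
```

Nothing here touches the WAL or the head block, so a reload alone should not produce the warning discussed in this issue.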
simonpasquier
added
kind/more-info-needed
component/scraping
labels
Sep 6, 2018
dominikh
commented
Sep 25, 2018
I've hit the same issue with 2.4.2. I wonder if there's any connection to prometheus/tsdb#21
Prometheus log
server1# ./prometheus
level=info ts=2018-09-25T02:38:29.526367852Z caller=main.go:238 msg="Starting Prometheus" version="(version=2.4.2, branch=HEAD, revision=c305ffaa092e94e9d2dbbddf8226c4813b1190a0)"
level=info ts=2018-09-25T02:38:29.526448646Z caller=main.go:239 build_context="(go=go1.10.3, user=root@dcde2b74c858, date=20180921-07:36:35)"
level=info ts=2018-09-25T02:38:29.526470926Z caller=main.go:240 host_details=(freebsd)
level=info ts=2018-09-25T02:38:29.526495291Z caller=main.go:241 fd_limits="(soft=3772890, hard=3772890)"
level=info ts=2018-09-25T02:38:29.526514529Z caller=main.go:242 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2018-09-25T02:38:29.527921517Z caller=main.go:554 msg="Starting TSDB ..."
level=info ts=2018-09-25T02:38:29.528080083Z caller=web.go:397 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-09-25T02:38:29.528444404Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537812000000 maxt=1537819200000 ulid=01CR77XARHSKQDZASHZZSX3P2Z
level=info ts=2018-09-25T02:38:29.528540541Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537819200000 maxt=1537826400000 ulid=01CR77XASEG4487PBXDGVTHHRQ
level=info ts=2018-09-25T02:38:29.528612724Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537826400000 maxt=1537833600000 ulid=01CR77XATFHGDJG9P78V99BW7W
level=info ts=2018-09-25T02:38:29.528769769Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537768800000 maxt=1537790400000 ulid=01CR77XAVE4ZWH0AZ917GYRGTM
level=info ts=2018-09-25T02:38:29.528871542Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537790400000 maxt=1537812000000 ulid=01CR77XAWTC46J928BEK6YE2CG
level=warn ts=2018-09-25T02:38:29.615108385Z caller=head.go:371 component=tsdb msg="unknown series references" count=50245
level=info ts=2018-09-25T02:38:29.618446503Z caller=main.go:564 msg="TSDB started"
level=info ts=2018-09-25T02:38:29.618531078Z caller=main.go:624 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2018-09-25T02:38:29.619960822Z caller=main.go:650 msg="Completed loading of configuration file" filename=prometheus.yml
level=info ts=2018-09-25T02:38:29.619993653Z caller=main.go:523 msg="Server is ready to receive web requests."
^Clevel=warn ts=2018-09-25T02:39:39.851666249Z caller=main.go:398 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2018-09-25T02:39:39.85174263Z caller=main.go:423 msg="Stopping scrape discovery manager..."
level=info ts=2018-09-25T02:39:39.851759987Z caller=main.go:437 msg="Stopping notify discovery manager..."
level=info ts=2018-09-25T02:39:39.85177354Z caller=main.go:459 msg="Stopping scrape manager..."
level=info ts=2018-09-25T02:39:39.851807335Z caller=main.go:433 msg="Notify discovery manager stopped"
level=info ts=2018-09-25T02:39:39.851784873Z caller=main.go:419 msg="Scrape discovery manager stopped"
level=info ts=2018-09-25T02:39:39.851862128Z caller=manager.go:638 component="rule manager" msg="Stopping rule manager..."
level=info ts=2018-09-25T02:39:39.851881953Z caller=manager.go:644 component="rule manager" msg="Rule manager stopped"
level=info ts=2018-09-25T02:39:39.851916549Z caller=main.go:453 msg="Scrape manager stopped"
level=info ts=2018-09-25T02:39:39.855099309Z caller=notifier.go:512 component=notifier msg="Stopping notification manager..."
level=info ts=2018-09-25T02:39:39.855183239Z caller=main.go:608 msg="Notifier manager stopped"
level=info ts=2018-09-25T02:39:39.85535852Z caller=main.go:620 msg="See you next time!"
server1# ./prometheus
level=info ts=2018-09-25T02:39:44.670238657Z caller=main.go:238 msg="Starting Prometheus" version="(version=2.4.2, branch=HEAD, revision=c305ffaa092e94e9d2dbbddf8226c4813b1190a0)"
level=info ts=2018-09-25T02:39:44.670318165Z caller=main.go:239 build_context="(go=go1.10.3, user=root@dcde2b74c858, date=20180921-07:36:35)"
level=info ts=2018-09-25T02:39:44.670340061Z caller=main.go:240 host_details=(freebsd)
level=info ts=2018-09-25T02:39:44.670361558Z caller=main.go:241 fd_limits="(soft=3772890, hard=3772890)"
level=info ts=2018-09-25T02:39:44.670380632Z caller=main.go:242 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2018-09-25T02:39:44.671788247Z caller=web.go:397 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-09-25T02:39:44.671772107Z caller=main.go:554 msg="Starting TSDB ..."
level=info ts=2018-09-25T02:39:44.672191239Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537812000000 maxt=1537819200000 ulid=01CR77XARHSKQDZASHZZSX3P2Z
level=info ts=2018-09-25T02:39:44.672279677Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537819200000 maxt=1537826400000 ulid=01CR77XASEG4487PBXDGVTHHRQ
level=info ts=2018-09-25T02:39:44.672354588Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537826400000 maxt=1537833600000 ulid=01CR77XATFHGDJG9P78V99BW7W
level=info ts=2018-09-25T02:39:44.672533257Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537768800000 maxt=1537790400000 ulid=01CR77XAVE4ZWH0AZ917GYRGTM
level=info ts=2018-09-25T02:39:44.672621292Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537790400000 maxt=1537812000000 ulid=01CR77XAWTC46J928BEK6YE2CG
level=warn ts=2018-09-25T02:39:44.769766936Z caller=head.go:371 component=tsdb msg="unknown series references" count=50245
level=info ts=2018-09-25T02:39:44.771257145Z caller=main.go:564 msg="TSDB started"
level=info ts=2018-09-25T02:39:44.771336206Z caller=main.go:624 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2018-09-25T02:39:44.77262786Z caller=main.go:650 msg="Completed loading of configuration file" filename=prometheus.yml
level=info ts=2018-09-25T02:39:44.772651386Z caller=main.go:523 msg="Server is ready to receive web requests."
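For anyone comparing startup logs across restarts, here is a rough sketch of pulling the warning count and the block time ranges out of logfmt-style lines like the ones above. The function names (`parse_logfmt`, `unknown_series_counts`, `block_range`) are made up for illustration and are not part of Prometheus. The regex is a simplified logfmt reader that assumes `key=value` pairs with optional double quotes, and `mint`/`maxt` are treated as milliseconds since the Unix epoch.

```python
import re
from datetime import datetime, timezone

# key=value pairs; a value is either a double-quoted string or a bare token.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_logfmt(line):
    """Parse one logfmt-style line into a dict of string fields."""
    fields = {}
    for key, value in PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # drop the surrounding quotes
        fields[key] = value
    return fields


def unknown_series_counts(lines):
    """Return (timestamp, count) for each 'unknown series references' warning."""
    hits = []
    for line in lines:
        f = parse_logfmt(line)
        msg = f.get("msg", "")
        if f.get("level") == "warn" and msg.startswith("unknown series references"):
            hits.append((f.get("ts"), int(f["count"])))
    return hits


def block_range(fields):
    """Convert a block's mint/maxt (assumed milliseconds) to UTC datetimes."""
    def to_dt(ms):
        return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)
    return to_dt(fields["mint"]), to_dt(fields["maxt"])


# Two lines copied from the log above.
SAMPLE = [
    'level=info ts=2018-09-25T02:38:29.528444404Z caller=repair.go:35 component=tsdb '
    'msg="found healthy block" mint=1537812000000 maxt=1537819200000 '
    'ulid=01CR77XARHSKQDZASHZZSX3P2Z',
    'level=warn ts=2018-09-25T02:38:29.615108385Z caller=head.go:371 component=tsdb '
    'msg="unknown series references" count=50245',
]

if __name__ == "__main__":
    print(unknown_series_counts(SAMPLE))
    print(block_range(parse_logfmt(SAMPLE[0])))
```

The prefix match on `msg` means the older wording from the original report ("unknown series references in WAL samples") is picked up as well. Feeding both runs from the log above through `unknown_series_counts` makes it easy to see that the count is the same (50245) on each start.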
I've also gotten a report of this with 2.4.2, which is OOMing.
hhoffstaette
commented
Nov 6, 2018
Still happens with 2.4.3/2.5.0.
bentsi
commented
Nov 21, 2018
Starefossen
commented
Jan 4, 2019
Just got hit by the same while upgrading from 1.4.2 to 1.6.0:
gt510
commented
Jan 16, 2019
Experiencing the same right after install. Images:
ranbochen commented Dec 28, 2017
What did you do?
restart prometheus
What did you expect to see?
All metrics reloaded correctly.
Instead, I see this log message: "unknown series references in WAL samples" count=35160
Prometheus drops a lot of metric samples!
See:
prometheus/tsdb#21
#3489