reference sequence 0 out of range #4845

Closed
theRealAJR opened this Issue Nov 8, 2018 · 11 comments

4 participants
theRealAJR commented Nov 8, 2018

What did you do?
I executed a query

What did you expect to see?
the result of the query

What did you see instead? Under which circumstances?
error message "reference sequence 0 out of range"

I have been running two Prometheus instances in Docker containers for about 2 years.
Since the end of August 2018, Prometheus has frequently been producing the error "reference sequence 0 out of range" (on average once per week; sometimes two days in a row, sometimes only after 10 days).

I updated the OS from Ubuntu 16.04 to 18.04 not long before the problems started, but I have no real indication that the OS upgrade is related (there are no other problems in the system or in other Docker containers).

Once I get the error message for the first time, all queries fail with this error.
Once the problem has started, the only thing that resolves it is restarting the entire machine; restarting the Docker containers or even the Docker daemon has no effect.

I have two Prometheus instances, one for short-term data and one for long-term data with a long scrape interval. The configs are almost identical. The main differences are that the short-term instance has alert rules and the long-term instance reads old data from a third Prometheus v1.8.2 instance (also in a Docker container, but it does not scrape and is not accessed directly; it only serves the old data via the long-term instance).

The problem usually happens on both instances simultaneously, but today I noticed for the first time that it is occurring only on the long-term instance.

The metrics are stored on a NAS which is accessed via NFS.
I thought that the NFS mount might be the problem and tried different configurations for it, but none helped. Note that the NFS shares worked without any problem for over a year, and the fact that one instance currently works while the other has the problem more or less rules out an NFS problem (the metrics for both instances are on the same NFS share).

Once the problem starts, it can be reproduced by entering any query in Prometheus' web interface: it then always shows the error message "Error executing query: reference sequence 0 out of range" in a red box.
I usually notice that the problem has started when I open Grafana, which shows no data but an exclamation mark in an orange triangle; hovering over it with the mouse pointer shows the error message "reference sequence 0 out of range".
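
The same symptom could also be checked without the web interface or Grafana. The following is only a sketch: it assumes the standard Prometheus HTTP API endpoint `/api/v1/query` and its JSON envelope with `status` and `error` fields. The function names `has_reference_error` and `probe` are made up for illustration and are not part of Prometheus.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

ERROR_TEXT = "reference sequence 0 out of range"


def has_reference_error(body: str) -> bool:
    """Return True if a query API response body reports the error from this issue."""
    reply = json.loads(body)
    return reply.get("status") == "error" and ERROR_TEXT in reply.get("error", "")


def probe(base_url: str, query: str = "up") -> bool:
    """Run one instant query against base_url and report whether it hit the error."""
    url = base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": query})
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # A failed query may come back with a non-2xx status and a JSON error body.
        body = err.read().decode("utf-8")
    return has_reference_error(body)
```

Under these assumptions, `probe("http://gytha.lancre:9290")` would check the long-term instance, and could be run periodically to catch the start of the problem earlier than a Grafana dashboard would.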

The log contains a line like

level=error ts=2018-11-08T19:21:52.880918076Z caller=engine.go:554 component="query engine" msg="error expanding series set" err="reference sequence 0 out of range"

for each query that is made. All queries fail once the problem has started.
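
To find out when the errors began, the container log can be scanned for this line. The sketch below assumes the `key=value` log format shown above; `parse_logfmt` and `failed_query_times` are hypothetical helper names, and the parser is deliberately naive (a log line containing an unbalanced quote would make `shlex.split` raise `ValueError`).

```python
import shlex

ERROR_TEXT = "reference sequence 0 out of range"


def parse_logfmt(line: str) -> dict:
    """Split one key=value log line into a dict; quoted values keep their spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def failed_query_times(lines):
    """Return the ts values of error-level lines whose err field equals ERROR_TEXT."""
    times = []
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") == "error" and fields.get("err") == ERROR_TEXT:
            times.append(fields.get("ts"))
    return times
```

Feeding it the output of `docker logs prometheusLong` (the container name from the compose file in this thread) would list the timestamp of every failed query; the first entry marks the start of the problem.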

Environment

  • System information:

Prometheus is running inside Docker, the image is currently:

REPOSITORY                             TAG                              IMAGE ID            CREATED             SIZE
prom/prometheus                        latest                           42e450d926a8        2 days ago          99.8MB

I update the image frequently.

Linux 4.15.0-38-generic x86_64

  • Prometheus version:
prometheus, version 2.5.0 (branch: HEAD, revision: 67dc912ac8b24f94a1fc478f352d25179c94ab9b)
  build user:       root@578ab108d0b9
  build date:       20181106-11:40:44
  go version:       go1.11.1

The problem also occurred with earlier versions, since the end of August 2018.

  • Prometheus configuration file:
    This is for the long-term instance; there are no rules in the configured directory.
global:
  scrape_interval: 1h
  scrape_timeout: 10s
  evaluation_interval: 1h
rule_files:
- "/etc/prometheus/rules/*.rule"
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
         - gytha:9100
         - server01.vpn:9100
  - job_name: 'pushGateway'
    static_configs:
      - targets:
         - gytha:9091
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - www.xyz.com
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        regex: (.*)
        target_label: instance
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: gytha:9115  # Blackbox exporter.
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - gytha.lancre:9093
remote_read:
  - url: "http://gytha.lancre:9094/api/v1/read"

  • Logs:
level=info ts=2018-11-08T19:21:45.284666262Z caller=main.go:244 msg="Starting Prometheus" version="(version=2.5.0, branch=HEAD, revision=67dc912ac8b24f94a1fc478f352d25179c94ab9b)"
level=info ts=2018-11-08T19:21:45.284710328Z caller=main.go:245 build_context="(go=go1.11.1, user=root@578ab108d0b9, date=20181106-11:40:44)"
level=info ts=2018-11-08T19:21:45.284728051Z caller=main.go:246 host_details="(Linux 4.15.0-38-generic #41-Ubuntu SMP Wed Oct 10 10:59:38 UTC 2018 x86_64 d7d517dac246 (none))"
level=info ts=2018-11-08T19:21:45.284743573Z caller=main.go:247 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-11-08T19:21:45.28476792Z caller=main.go:248 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2018-11-08T19:21:45.285177959Z caller=main.go:562 msg="Starting TSDB ..."
level=info ts=2018-11-08T19:21:45.285214249Z caller=web.go:399 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-11-08T19:21:45.288682108Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1523462400000 maxt=1527400800000 ulid=01CEGMF95QXHKCEAHRG497WVV6
level=info ts=2018-11-08T19:21:45.289830359Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1527400800000 maxt=1532649600000 ulid=01CKCVMRPAKRQM5KPQD40HE43D
level=info ts=2018-11-08T19:21:45.290808244Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1532649600000 maxt=1534399200000 ulid=01CN1068Y8Q35YF6H4GA1P9CV3
level=info ts=2018-11-08T19:21:45.291610853Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1534399200000 maxt=1534982400000 ulid=01CNJBSGS3ZZVWDYG8EQPMRVDD
level=info ts=2018-11-08T19:21:45.292383295Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1534982400000 maxt=1535565600000 ulid=01CP3R6J5NDHMNXMX6V6VK5EHH
level=info ts=2018-11-08T19:21:45.293130539Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535565600000 maxt=1535630400000 ulid=01CP5P036YGCH15N5FTCP8P539
level=info ts=2018-11-08T19:21:45.293823959Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535630400000 maxt=1535695200000 ulid=01CP7KXGNCZ0NV9C3XRR75BZ49
level=info ts=2018-11-08T19:21:45.294620407Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535716800000 maxt=1535724000000 ulid=01CP88GP7AHEPZ0FFPXCMDS0DJ
level=info ts=2018-11-08T19:21:45.295444495Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535695200000 maxt=1535716800000 ulid=01CP88GPE2K6QETY1YDR8EJ0AV
level=info ts=2018-11-08T19:21:45.296132395Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535724000000 maxt=1535731200000 ulid=01CP8FCDF18Z790Q03689WYSYE
level=info ts=2018-11-08T19:21:45.296827385Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535731200000 maxt=1535738400000 ulid=01CP8P84Q216MQ5Z3C3ZPK9FZA
level=info ts=2018-11-08T19:21:45.297568445Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535760000000 maxt=1535767200000 ulid=01CP9HQ1QBZS57TW0Q2SW0CD8B
level=info ts=2018-11-08T19:21:45.29829919Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535738400000 maxt=1535760000000 ulid=01CP9HQ1V47GZFWYHQ0QW45E89
level=info ts=2018-11-08T19:21:45.299024373Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535767200000 maxt=1535774400000 ulid=01CP9RJRZDWCK840EQHZ07KN4N
level=info ts=2018-11-08T19:21:45.299717087Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535774400000 maxt=1535781600000 ulid=01CP9ZEG7B0WR4YXYD6VZW7X1Z
level=info ts=2018-11-08T19:21:45.300398487Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535781600000 maxt=1535788800000 ulid=01CPA6A7FA91V7957SW2VP7ZFV
level=info ts=2018-11-08T19:21:45.301088301Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535788800000 maxt=1535796000000 ulid=01CPAD5YQCX4EZXC32ERY177KX
level=info ts=2018-11-08T19:21:45.301800679Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535796000000 maxt=1535803200000 ulid=01CPAM1NZB6YACKXBGQM8TZCBX
level=info ts=2018-11-08T19:21:45.302630315Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535803200000 maxt=1535824800000 ulid=01CPBFGK5SJ0DG542QYEG1MMW0
level=info ts=2018-11-08T19:21:45.303349889Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535889600000 maxt=1535896800000 ulid=01CPDDA47D0NAB58Q6KCQKHGWN
level=info ts=2018-11-08T19:21:45.304322169Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535824800000 maxt=1535889600000 ulid=01CPDDA4G7JM41805XHZD8NVC0
level=info ts=2018-11-08T19:21:45.305022968Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535896800000 maxt=1535904000000 ulid=01CPDM5VFEDDKFMB7XDJY0JD33
level=info ts=2018-11-08T19:21:45.305724461Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535904000000 maxt=1535911200000 ulid=01CPDV1JQ9TNBR9T5X2HWMFTPD
level=info ts=2018-11-08T19:21:45.306455608Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535911200000 maxt=1535932800000 ulid=01CPEPGFWCNPV972YV3RBH6YK8
level=info ts=2018-11-08T19:21:45.307243561Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535954400000 maxt=1535961600000 ulid=01CPFB3NF93K4DNCTHYBZ5A0FQ
level=info ts=2018-11-08T19:21:45.307962979Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535932800000 maxt=1535954400000 ulid=01CPFB3NKMYDBJ3YKHJAJ726G8
level=info ts=2018-11-08T19:21:45.3086747Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535961600000 maxt=1535968800000 ulid=01CPFHZCQDSANH8TNZBKQ51EDD
level=info ts=2018-11-08T19:21:45.30946926Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535968800000 maxt=1535976000000 ulid=01CPFRV3ZA158CG2W5HXGYXSA2
level=info ts=2018-11-08T19:21:45.310230203Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535976000000 maxt=1535983200000 ulid=01CPFZPV788Z325EH8SDZSKKW4
level=info ts=2018-11-08T19:21:45.310993104Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535983200000 maxt=1535990400000 ulid=01CPG6JJF9CY224VDQ3MAE0B1N
level=info ts=2018-11-08T19:21:45.311693526Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535990400000 maxt=1535997600000 ulid=01CPGDE9Q9PP3E0S98Y8WY1VD9
level=info ts=2018-11-08T19:21:45.312404993Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1535997600000 maxt=1536019200000 ulid=01CPH8X6X15MAZ9853B5Z30KEQ
level=info ts=2018-11-08T19:21:45.313195676Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536019200000 maxt=1536084000000 ulid=01CPK6QWQZ824T1G1ZWRDMB420
level=info ts=2018-11-08T19:21:45.31393847Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536084000000 maxt=1536148800000 ulid=01CPN4HE4EGF6FE787NE324WHJ
level=info ts=2018-11-08T19:21:45.314819691Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536148800000 maxt=1536213600000 ulid=01CPQ2AZCPXCBYFCN06QX5V4BC
level=info ts=2018-11-08T19:21:45.315752161Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536213600000 maxt=1536278400000 ulid=01CPS04GQ2GXMHXW1N1ZRM9P8M
level=info ts=2018-11-08T19:21:45.316535023Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536278400000 maxt=1536343200000 ulid=01CPTXY1V5Q1C4E65WWQSVGW4V
level=info ts=2018-11-08T19:21:45.317333881Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536408000000 maxt=1536415200000 ulid=01CPWVQJNVRFKH0520A40YF8KD
level=info ts=2018-11-08T19:21:45.318236572Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536343200000 maxt=1536408000000 ulid=01CPWVQK7X6CTS48DJV0NGJ5JY
level=info ts=2018-11-08T19:21:45.319118775Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536415200000 maxt=1536422400000 ulid=01CPX2K9XRMTGT813YJDMM71WA
level=info ts=2018-11-08T19:21:45.319898Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536422400000 maxt=1536429600000 ulid=01CPX9F15WZ3HAXCFJXGAR02B6
level=info ts=2018-11-08T19:21:45.320686295Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536429600000 maxt=1536451200000 ulid=01CPY4XYDZ8PZD482SH42466JW
level=info ts=2018-11-08T19:21:45.321542081Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536451200000 maxt=1536472800000 ulid=01CPYSH45FWPJM47KEPJP4EJKP
level=info ts=2018-11-08T19:21:45.32245207Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536472800000 maxt=1536537600000 ulid=01CQ0QANNDHP6AMAV0A24ZGKRF
level=info ts=2018-11-08T19:21:45.323330193Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536537600000 maxt=1536559200000 ulid=01CQ1BXV4EMSV14DVBMRF0YEPF
level=info ts=2018-11-08T19:21:45.324218848Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536559200000 maxt=1536580800000 ulid=01CQ20H0VKJ3QTHE483ZTJK94F
level=info ts=2018-11-08T19:21:45.325050692Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536580800000 maxt=1536602400000 ulid=01CQ2N46KDNRR3ZB8DSFJAKBSG
level=info ts=2018-11-08T19:21:45.3258778Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536602400000 maxt=1536624000000 ulid=01CQ39QCDCQFGG1GTTJSJWFS6N
level=info ts=2018-11-08T19:21:45.326775512Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536624000000 maxt=1536645600000 ulid=01CQ3YAJ5W7SFFSHDYF5Y7X1YA
level=info ts=2018-11-08T19:21:45.327665504Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536645600000 maxt=1536667200000 ulid=01CQ4JXQX0NKW9455XCT5DBXRN
level=info ts=2018-11-08T19:21:45.328461692Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536688800000 maxt=1536696000000 ulid=01CQ57GXDQ40CWK98PRF2H8486
level=info ts=2018-11-08T19:21:45.329296297Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536667200000 maxt=1536688800000 ulid=01CQ57GXM4VS5VCKNKXXXF4EY9
level=info ts=2018-11-08T19:21:45.330180695Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536696000000 maxt=1536703200000 ulid=01CQ5ECMNX5FPJ20HZ9YSYP4KQ
level=info ts=2018-11-08T19:21:45.331109011Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536703200000 maxt=1536710400000 ulid=01CQ5N8BXTJ0TRMQQH4TWRAQXG
level=info ts=2018-11-08T19:21:45.331916213Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536710400000 maxt=1536732000000 ulid=01CQ6GTQS5CWEC900XFVQTYY78
level=info ts=2018-11-08T19:21:45.332700652Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536732000000 maxt=1536753600000 ulid=01CQ74Y05DHPSJJTRYDXRXGY9C
level=info ts=2018-11-08T19:21:45.333565431Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536753600000 maxt=1536775200000 ulid=01CQ7SH5XQ8JB6RQQ77GWS2NDP
level=info ts=2018-11-08T19:21:45.334490744Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536775200000 maxt=1536796800000 ulid=01CQ8E4BPVFV0AMF9MVTSYS7PV
level=info ts=2018-11-08T19:21:45.33541019Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536796800000 maxt=1536861600000 ulid=01CQABXX97538W9766RF5C9JN3
level=info ts=2018-11-08T19:21:45.336301329Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536861600000 maxt=1536926400000 ulid=01CQCA6Z1GX3T2J8BY2R5XP9R9
level=info ts=2018-11-08T19:21:45.33710698Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536926400000 maxt=1536948000000 ulid=01CQCYT4F6177CCPBR0GYQXX2V
level=info ts=2018-11-08T19:21:45.337969779Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536948000000 maxt=1536969600000 ulid=01CQDKDA9KNKKQKVW93WAAK313
level=info ts=2018-11-08T19:21:45.338949714Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536969600000 maxt=1536991200000 ulid=01CQE80G0G9N7R6ZE7XZR1NTVY
level=info ts=2018-11-08T19:21:45.339834835Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1536991200000 maxt=1537056000000 ulid=01CQG5T1KGQTHQ7XQ6TZ2BJGQ7
level=info ts=2018-11-08T19:21:45.340764013Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537056000000 maxt=1537120800000 ulid=01CQJ3KJWG8ACNQHGDE55DRN4W
level=info ts=2018-11-08T19:21:45.341578391Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537120800000 maxt=1537185600000 ulid=01CQM1D44PV71S3MM79B7GPE2M
level=info ts=2018-11-08T19:21:45.34237599Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537185600000 maxt=1537207200000 ulid=01CQMP09GK2FV7KYFZM9R99D60
level=info ts=2018-11-08T19:21:45.34317804Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537228800000 maxt=1537236000000 ulid=01CQNAKF1GSJJNKQBDZBSKKPYE
level=info ts=2018-11-08T19:21:45.344002774Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537207200000 maxt=1537228800000 ulid=01CQNAKF9PJ5XZZX1GK11K09YS
level=info ts=2018-11-08T19:21:45.344851676Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537236000000 maxt=1537243200000 ulid=01CQNHF69H9TWNRMX8AXRWN85T
level=info ts=2018-11-08T19:21:45.345760507Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537243200000 maxt=1537250400000 ulid=01CQNRAXHC6MJ1B54XYM291CTV
level=info ts=2018-11-08T19:21:45.346660735Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537250400000 maxt=1537272000000 ulid=01CQPKSTS7KQ4X5YTFJQS40C2C
level=info ts=2018-11-08T19:21:45.347536712Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537293600000 maxt=1537300800000 ulid=01CQQ8D09HBW4CX5NNMYSQ47T2
level=info ts=2018-11-08T19:21:45.348462074Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537272000000 maxt=1537293600000 ulid=01CQQ8D0K5DWYXXW0TGR5794MX
level=info ts=2018-11-08T19:21:45.349311963Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537300800000 maxt=1537308000000 ulid=01CQQF8QHEEVK05HJ5CMB7CJ1Z
level=info ts=2018-11-08T19:21:45.350105653Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537308000000 maxt=1537315200000 ulid=01CQQP4ESHD0KFQ0PFH8K9S0ZZ
level=info ts=2018-11-08T19:21:45.350997328Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537315200000 maxt=1537898400000 ulid=01CR9962WP37GRCNTGQ6NV4JC2
level=info ts=2018-11-08T19:21:45.352029633Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1537898400000 maxt=1539648000000 ulid=01CSXDX56XH8MSSZWTC2V6CMS7
level=info ts=2018-11-08T19:21:45.353107146Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1539648000000 maxt=1541397600000 ulid=01CVHHSWX1V0RMKPCSH4DMFKN5
level=info ts=2018-11-08T19:21:45.354064929Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1541397600000 maxt=1541592000000 ulid=01CVQB38ZCQR0722WPXSHRCC6Q
level=info ts=2018-11-08T19:21:45.354991854Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1541592000000 maxt=1541613600000 ulid=01CVR07C2R6ASZQPA5K9CH0PNN
level=info ts=2018-11-08T19:21:45.355894329Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1541613600000 maxt=1541635200000 ulid=01CVRMTJ9FAPVFCPHRGP9FB7EP
level=info ts=2018-11-08T19:21:45.356778644Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1541635200000 maxt=1541656800000 ulid=01CVS9DQGBX84BECRJC8RGY31W
level=info ts=2018-11-08T19:21:45.357626283Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1541678400000 maxt=1541685600000 ulid=01CVSY0X1DXY6XCZ77D9K5GPJK
level=info ts=2018-11-08T19:21:45.358467705Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1541656800000 maxt=1541678400000 ulid=01CVSY0XARFBKZ08RQSEMH5N2J
level=info ts=2018-11-08T19:21:45.359285123Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1541685600000 maxt=1541692800000 ulid=01CVT4WM98AD72TF97FH9AP27N
level=info ts=2018-11-08T19:21:45.360077857Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1541692800000 maxt=1541700000000 ulid=01CVTBWB9NDK4S20AJ9GVNDEVE
level=info ts=2018-11-08T19:21:46.109899758Z caller=main.go:572 msg="TSDB started"
level=info ts=2018-11-08T19:21:46.10995506Z caller=main.go:632 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-11-08T19:21:46.110833788Z caller=main.go:658 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-11-08T19:21:46.110849189Z caller=main.go:531 msg="Server is ready to receive web requests."
level=error ts=2018-11-08T19:21:52.880918076Z caller=engine.go:554 component="query engine" msg="error expanding series set" err="reference sequence 0 out of range"
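
The `mint` and `maxt` values in the "found healthy block" lines are timestamps in milliseconds since the Unix epoch. As a rough aid for reading the startup log, the sketch below converts them to UTC times and looks for gaps between blocks. The names `block_ranges` and `gaps` are invented for this example, and the regular expression assumes the field order shown in the log above.

```python
import re
from datetime import datetime, timezone

BLOCK_RE = re.compile(r"mint=(\d+) maxt=(\d+) ulid=(\w+)")


def block_ranges(lines):
    """Yield (ulid, start, end) for every line that carries mint/maxt/ulid fields."""
    for line in lines:
        match = BLOCK_RE.search(line)
        if match:
            mint_ms, maxt_ms, ulid = match.groups()
            start = datetime.fromtimestamp(int(mint_ms) / 1000, tz=timezone.utc)
            end = datetime.fromtimestamp(int(maxt_ms) / 1000, tz=timezone.utc)
            yield ulid, start, end


def gaps(ranges):
    """Return (previous_end, next_start) pairs where blocks, sorted by start, do not touch."""
    ordered = sorted(ranges, key=lambda r: r[1])
    return [(a[2], b[1]) for a, b in zip(ordered, ordered[1:]) if b[1] > a[2]]
```

For the last block listed above (`ulid=01CVTBWB9NDK4S20AJ9GVNDEVE`), this gives a two-hour range from 16:00 to 18:00 UTC on 2018-11-08. The blocks are logged in ULID order rather than time order, which is why `gaps` sorts them by start time first.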

Author

theRealAJR commented Nov 8, 2018

This is the docker-compose.yml used to start all the containers:

  image: "prom/alertmanager:latest"
  container_name: "alertmanager"
  volumes:
    - "/etc/prometheus/alertmanager:/etc/alertmanager"
    - "/opt/prometheus/alertmanager:/alertmanager"
  ports:
    - "9093:9093"
  restart: always
  command: --config.file=/etc/alertmanager/config.yml --storage.path=/alertmanager --web.external-url http://gytha.lancre:9093


blackboxExporter:
  image: "prom/blackbox-exporter:latest"
  container_name: "blackboxExporter"
  volumes:
    - "/etc/prometheus:/config"
  ports:
    - "9115:9115"
  restart: always
  command: --config.file=/config/blackbox.yml


pushgateway:
  image: "prom/pushgateway:latest"
  container_name: "pushgateway"
  ports:
    - "9091:9091"
  restart: always


prometheusShort:
  image: "prom/prometheus:latest"
  container_name: "prometheusShort"
  volumes:
    - "/etc/prometheus/prometheusShort.yml:/etc/prometheus/prometheus.yml"
    - "/etc/prometheus/rulesShort:/etc/prometheus/rules"
    - "/file02/Data/prometheus/short/metrics:/prometheus"
  ports:
    - "9090:9090"
  restart: always
  command: --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/prometheus --storage.tsdb.retention=100d --web.external-url http://gytha.lancre:9090


prometheusLong:
  image: "prom/prometheus:latest"
  container_name: "prometheusLong"
  volumes:
    - "/etc/prometheus/prometheusLong.yml:/etc/prometheus/prometheus.yml"
    - "/etc/prometheus/rulesLong:/etc/prometheus/rules"
    - "/file02/Data/prometheus/long/metrics:/prometheus"
  ports:
    - "9290:9090"
  restart: always
  command: --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/prometheus --storage.tsdb.retention=100y --web.external-url http://gytha.lancre:9290


prometheusLong.v1.8:
  image: "prom/prometheus:v1.8.2"
  container_name: "prometheusLong.v1.8"
  volumes:
    - "/etc/prometheus/prometheusLong.old.nonScrape.yml:/etc/prometheus/prometheus.yml"
    - "/etc/prometheus/rulesLong:/etc/prometheus/rules"
    - "/file02/Data/prometheus/long/metrics.old:/prometheus"
  ports:
    - "9094:9094"
  restart: always
  command: -alertmanager.url http://gytha:9093 -config.file=/etc/prometheus/prometheus.yml -storage.local.path=/prometheus -storage.local.retention=876000h -web.console.libraries=/etc/prometheus/console_libraries -web.console.templates=/etc/prometheus/consoles -web.listen-address :9094

Member

simonpasquier commented Nov 9, 2018

> The metrics are stored on a NAS which is accessed via NFS.
> I thought that the NFS mount may be the problem and tried different configs for that, but none helped. Note that the NFS shares worked w/o any problem for over a year and the fact, that currently one instance works while the other has the problem, kind of rules out an NFS problem (the metrics for both instances are on the same NFS share).

There have been numerous problems reported with NFS. The fact that the issue vanishes after restarting the machine seems to indicate that the data on disk isn't corrupted.

Author

theRealAJR commented Nov 9, 2018

I forgot to mention that data is stored correctly while the problem exists, i.e. after I restart the machine, I can see data from the time period in which queries returned the "reference sequence 0 out of range" error.

Member

simonpasquier commented Nov 12, 2018

You can try to run promtool debug all ... and attach the output here. But as I wrote in my first comment, running Prometheus on top of NFS isn't recommended.

Author

theRealAJR commented Nov 12, 2018

Thanks for the tip with NFS. I have switched back from NFS 4 to NFS 3 for now (tried CIFS, but the permissions drove me crazy).
I'll test it for some time and will report back.

Member

krasi-georgiev commented Nov 29, 2018

@theRealAJR still facing any issues after switching?

Author

theRealAJR commented Nov 30, 2018

No, but I need to run it for a little longer to be sure.
I had one instance running with NFS V3 and one with NFS V4; the V4 instance had the problem again this week, but the V3 instance did not. So I switched all instances to V3. Now I have to let it run for two weeks or so to be sure it's working with NFS V3.

Member

simonpasquier commented Nov 30, 2018

Closing for now. Note that using NFS for storing Prometheus data is inherently buggy.

troy256 commented Mar 22, 2019

I'm running Prometheus and Grafana on top of Kubernetes and get the same issue periodically. I just checked and my NFS mounts are v4. Is falling back to v3 a viable workaround?

Author

theRealAJR commented Mar 22, 2019

Yes, I have no problems any more on NFS V3.
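
For anyone wanting to check which NFS version their mounts actually use before and after such a switch, here is a minimal sketch. It assumes a Linux host where `/proc/mounts` lists NFS mounts with file system type `nfs` or `nfs4` and a `vers=` (or `nfsvers=`) mount option; the function name `nfs_versions` is made up for this example.

```python
def nfs_versions(mounts_text: str) -> dict:
    """Map each NFS mount point to the value of its vers= option (None if absent)."""
    versions = {}
    for line in mounts_text.splitlines():
        parts = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(parts) < 4 or parts[2] not in ("nfs", "nfs4"):
            continue
        version = None
        for opt in parts[3].split(","):
            if opt.startswith(("vers=", "nfsvers=")):
                version = opt.split("=", 1)[1]
        versions[parts[1]] = version
    return versions
```

On the Docker host or Kubernetes node, `nfs_versions(open("/proc/mounts").read())` would return something like a mapping from each mount point to `"3"` or `"4.2"`, depending on the negotiated version.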

troy256 commented Mar 22, 2019

@theRealAJR Thanks, just made the switch. I will report back if there are any issues.
