TSDB error restarting prometheus #4896

Closed
marioanton opened this Issue Nov 22, 2018 · 2 comments

marioanton commented Nov 22, 2018

Bug Report

What did you do?
Just restarted the service.

What did you expect to see?
No errors when restarting the service.

What did you see instead? Under which circumstances?
See the logs below.
Environment

  • System information:

Linux 3.10.0-862.11.6.el7.x86_64 x86_64

  • Prometheus version:

/usr/local/bin/prometheus --version
prometheus, version 2.4.3 (branch: HEAD, revision: 167a4b4)
build user: root@1e42b46043e9
build date: 20181004-08:42:02
go version: go1.11.1

  • Prometheus configuration file:
---
  global:
    scrape_interval: "15s"
    evaluation_interval: "30s"
    external_labels:
      monitor: master
  rule_files:
    - "/etc/prometheus/rules/recording_rules/*"
    - "/etc/prometheus/rules/alerting_rules/*"
  scrape_configs:
    - job_name: prometheus
  • Logs:
    `Nov 22 11:53:07 en1bdess0190100 prometheus[5768]: level=error ts=2018-11-22T11:53:07.574181131Z caller=main.go:617 err="opening storage failed: invalid block sequence: block time ranges overlap: [mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s, blocks: 32]: <ulid: 01CWXAWEHPGG6HP82MH4JES3GJ, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXACJFPW016V5X91X7X1A1Q, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXAGWGMFBPT6F5Z01Z5K97J, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXAKGH4N26ZF0PP901PRJ9D, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXASRBCQS8XF4RGV8WNMS93, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXABGKSQBVG7VKFY649MXX9, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXBFVEN3PRKWV5F6PSCXSFD, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXBJC8E28K517NJ17B0XM4K, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXBN23K5FV63KWXZSVMNKSW, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXBSN7MEV5AH9YGWE4MYW4G, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXBYGETM4JYNRDHJ0C5G354, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXC1AX4DTVBAGQGMHYJRK36, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXC5V4VZ5B74503YMG6H49Z, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXCSA95KXAQDSYS7B7YG3TS, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXCVYDMW9YVDK7W9SGP68XG, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXDH17SJTCCGPCM688ANEBS, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXDKSP8JSVXGB96EBPF6X3B, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXDPJKTSQWR6DPPFT8E0GKV, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXE4XB1H097FZYG5PZJRPFZ, mint: 
1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXE7GQTAG1RXY83F3NJYT57, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXEA4V6XQN5F1T99KQCF9BZ, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXECQFHDBCZG76MFVCKFJ1R, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXEHBQWZ7VZFKJJMY5ZYPXJ, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXEM387D914Y912BC41PR4X, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXEPT3BBAQ2H5YX313W0SBY, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXESHP70SKPD0DYM4PQBN5E, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXFA2GA8E4HM1R4MBNK32EP, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXFCV580KTRXFS6VJ6R03GJ, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXFQ4NVC5B2VH78RB4HV3VS, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXGGR12YRBGV10P75PW0K5F, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXGS21APMM5NG067VKPWBAK, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>, <ulid: 01CWXGVT7VR0DQ9GJE3CD08TX3, mint: 1542866400000, maxt: 1542873600000, range: 2h0m0s>"
    Nov 22 11:53:07 en1bdess0190100 systemd[1]: prometheus.service: main process exited, code=exited, status=1/FAILURE
    Nov 22 11:53:07 en1bdess0190100 systemd[1]: Unit prometheus.service entered failed state.
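The error means many persisted TSDB blocks cover the identical two-hour window (mint 1542866400000 / maxt 1542873600000, i.e. 2018-11-22 06:00–08:00 UTC). A minimal sketch for listing each block's time range from its `meta.json` so the duplicates stand out; the data directory path in the example is an assumption, not something stated in this issue:

```shell
# Sketch: print "minTime maxTime ulid" for every TSDB block under a data dir.
# Assumes each block directory contains a meta.json with "minTime"/"maxTime"
# in milliseconds since epoch (the TSDB block format).
list_block_ranges() {
  dir=$1
  for meta in "$dir"/*/meta.json; do
    [ -f "$meta" ] || continue
    block=$(basename "$(dirname "$meta")")
    min=$(grep -Eo '"minTime": *[0-9]+' "$meta" | grep -Eo '[0-9]+')
    max=$(grep -Eo '"maxTime": *[0-9]+' "$meta" | grep -Eo '[0-9]+')
    printf '%s %s %s\n' "$min" "$max" "$block"
  done | sort -n
}

# Example (the path is an assumption):
#   list_block_ranges /var/lib/prometheus
# Lines repeating the same "minTime maxTime" pair are the overlapping blocks.
# To read a timestamp, drop the trailing ms and use GNU date:
#   date -u -d @1542866400 +%FT%TZ   # 1542866400000 ms -> 2018-11-22T06:00:00Z
```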

marioanton commented Nov 22, 2018

  • We've got a couple of nodes with the same config, and the other one is not having this issue. Same OS, arch, etc.
  • There's a massive difference in disk usage between the two nodes (around 100 G).

marioanton commented Nov 23, 2018

  • Sorted it by moving the data files for the last day to another folder.
  • Increased the open-files limit for Prometheus.
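Roughly what that workaround amounts to, as a sketch (the data directory, quarantine path, and systemd unit name are assumptions; the ULID is taken from the error message; note that moving blocks aside loses the samples in them until the blocks are restored or repaired):

```shell
# Sketch of the workaround: move an offending block directory out of the
# data dir so Prometheus can start again, keeping it around for inspection.
quarantine_block() {
  data_dir=$1
  ulid=$2
  backup_dir=$3
  mkdir -p "$backup_dir"
  mv "$data_dir/$ulid" "$backup_dir/"
}

# e.g. (hypothetical paths):
#   quarantine_block /var/lib/prometheus 01CWXAWEHPGG6HP82MH4JES3GJ /var/tmp/tsdb-quarantine
#
# Raising the open-files limit for a systemd-managed service (assumed unit name):
#   mkdir -p /etc/systemd/system/prometheus.service.d
#   printf '[Service]\nLimitNOFILE=65536\n' > /etc/systemd/system/prometheus.service.d/limits.conf
#   systemctl daemon-reload && systemctl restart prometheus
```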

@marioanton marioanton closed this Nov 23, 2018
