Prometheus not recovering gracefully after disk fill event #4194

Closed
xginn8 opened this Issue May 24, 2018 · 3 comments

xginn8 commented May 24, 2018

Bug Report

The partition containing Prometheus data (as set by --storage.tsdb.path) filled up, and subsequent writes failed with "no space left on device". The condition persists until the process is restarted, yet the /-/healthy endpoint and HTTP API stay up, and /-/healthy keeps reporting "Prometheus is Healthy."
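For anyone reproducing this, here is a minimal sketch of probing the health endpoint from a script. The helper name `probe_health` and the address `http://localhost:9090` are assumptions for illustration and do not come from the report. As described above, a 200 response only shows that the HTTP server is up, not that writes are succeeding.

```python
import urllib.error
import urllib.request


def probe_health(base_url, timeout=5.0):
    """Return (status_code, body) from the /-/healthy endpoint.

    A 200 here only shows that the HTTP server answers; per the report,
    it does not imply that WAL writes are succeeding.
    """
    url = base_url.rstrip("/") + "/-/healthy"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace").strip()
            return resp.status, body
    except urllib.error.HTTPError as exc:
        # Non-2xx responses still carry a status code and a body.
        return exc.code, exc.read().decode("utf-8", "replace").strip()
```

Calling `probe_health("http://localhost:9090")` (assumed address) against a server in the state described would be expected to return the healthy message even while every append fails, so this check needs a second signal, such as the scrape logs.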

What did you expect to see?
After freeing space on the partition, Prometheus should resume writing data to it without a restart.

What did you see instead? Under which circumstances?
Even after clearing space, Prometheus continues to fail all writes with the same error message:

May 24 16:58:18 prometheus prometheus[8155]: level=warn ts=2018-05-24T20:58:53.194997464Z caller=scrape.go:717 component="scrape manager" scrape_pool=consul-services target=http://host1:9100/metrics msg="append failed" err="WAL log samples: log series: write /var/lib/prometheus/data/wal/000257: no space left on device"
May 24 16:58:18 prometheus prometheus[8155]: level=warn ts=2018-05-24T20:58:53.195621359Z caller=scrape.go:713 component="scrape manager" scrape_pool=consul-services target=http://host2:9100/metrics msg="append failed" err="WAL log samples: log series: write /var/lib/prometheus/data/wal/000257: no space left on device"
Thu May 24 16:57:52 EDT 2018
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      4.9G  1.3G  3.4G  28% /var/lib/prometheus/data
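The evidence above is a mismatch between two things: `df` shows free space, while the scrape logs still report "no space left on device". A rough sketch of detecting that mismatch automatically follows. The function names and the 1 GiB default threshold are hypothetical, and the parser handles only the simple key=value layout seen in the log lines above.

```python
import re
import shutil

# logfmt pairs: key=value, where value is either "quoted" or a bare token.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Parse the key=value pairs of one log line into a dict."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if raw.startswith('"'):
            # Drop the surrounding quotes and unescape embedded quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def enospc_targets(lines):
    """Return scrape targets whose appends failed with ENOSPC."""
    targets = []
    for line in lines:
        f = parse_logfmt(line)
        err = f.get("err", "")
        if f.get("msg") == "append failed" and "no space left on device" in err:
            targets.append(f.get("target"))
    return targets


def looks_stuck(lines, data_path, min_free_bytes=1 << 30):
    """True if ENOSPC failures are logged while data_path has free space.

    min_free_bytes is an arbitrary threshold chosen for illustration.
    """
    free = shutil.disk_usage(data_path).free
    return bool(enospc_targets(lines)) and free >= min_free_bytes
```

Feeding it recent journal lines and the data directory, for example `looks_stuck(recent_lines, "/var/lib/prometheus/data")`, would flag the situation reported here, where only a restart clears the error.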

Environment

  • System information:
uname -srm
Linux 4.14.22 x86_64
  • Prometheus version:
    tested against 2.2.0, 2.2.1, and HEAD
prometheus, version 2.2.1 (branch: master, revision: 18e6fa7c8aafbe45ed0a9981fdd32affe819bbb4)
build user:       user@system
build date:       20180524-20:50:19
go version:       go1.10.2

hoffie commented May 26, 2018

Possibly relevant open issues/PRs:

  • #3283 (Recovery after disk is full)
  • #3807 #3816 (Improving health check in disk full situations)

Member

brian-brazil commented Jun 13, 2018

Thanks @hoffie, those look to already cover this.

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
