Prometheus v2.0.0 data corruption #3534
Comments
NFS is not supported by any version of Prometheus. It requires a POSIX filesystem.
Thanks for the quick reply.
Hi, this is a known issue with NFS and Windows systems, but other POSIX
systems should be fine. As you might have noticed, this has been fixed
upstream, and there will be a new release with the fixes soon.
Thanks,
Goutham.
On Sat, Dec 2, 2017 at 3:17 AM Arno Uhlig wrote:
Thanks for the quick reply.
Please also consider the 2nd part of the issue: In the same setup while
using a Kubernetes PVC the retention is not considered, so the volume fills
up, eventually leading to the error described above. I saw a couple
potentially related commits in prometheus/tsdb. Is this issue known?
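One way to check whether retention is being applied is to look for blocks that are entirely older than the configured retention window but still on disk. The sketch below is only an illustration under stated assumptions: it assumes each block directory holds a meta.json with a `maxTime` field in milliseconds since the Unix epoch, and the function name `blocks_past_retention` is hypothetical, not part of Prometheus.

```python
import json
import os
import time


def blocks_past_retention(data_dir, retention_seconds, now=None):
    """Return names of block directories whose newest sample is older than the retention window.

    Assumption: every block directory contains a meta.json with a
    "maxTime" field in milliseconds since the Unix epoch.
    `now` is in seconds and defaults to the current time.
    """
    now_ms = (time.time() if now is None else now) * 1000
    cutoff_ms = now_ms - retention_seconds * 1000
    expired = []
    for entry in sorted(os.listdir(data_dir)):
        meta_path = os.path.join(data_dir, entry, "meta.json")
        try:
            with open(meta_path, encoding="utf-8") as fh:
                max_time = json.load(fh)["maxTime"]
        except (OSError, ValueError, KeyError, TypeError):
            # Not a block, or meta.json is missing or unreadable: skip it here.
            continue
        if max_time < cutoff_ms:
            expired.append(entry)
    return expired
```

For example, `blocks_past_retention("/prometheus", 15 * 24 * 3600)` (the path is a placeholder) would list blocks that a 15-day retention should already have removed. A non-empty result that keeps growing would support the suspicion that retention is not kicking in.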
Thanks for the answer @gouthamve.
alexandrul commented Dec 5, 2017
I have encountered the same error messages on a Windows server with local storage (the disk appears as …). In my case, I was unable to start Prometheus after a config change until I deleted the affected folder.
Windows Server 2012 R2
brian-brazil closed this Dec 8, 2017
brian-brazil added component/local storage, kind/bug labels Dec 8, 2017
BugRoger commented Dec 8, 2017
For what it's worth, we built Prometheus against the latest prometheus/tsdb and that solved this particular issue with NFS.
wkruse commented Jan 16, 2018
Related to #3506 and should be fixed by prometheus/tsdb#213 and #3508.
anguslees commented Feb 14, 2018 (edited)
I still see the above issue with Prometheus v2.1.0, which afaict includes #3508. I believe this indicates that the tsdb change was not sufficient.
Edit: to clarify, my .nfs* file was created with Prometheus v2.0.0. So it's possible that v2.1.0 has removed the code that deleted files while they were still open (I need to run long enough to have a Prometheus node go offline before I can be sure). I was expecting/hoping that the fix would also involve correctly ignoring these files if present, but this part of the issue has not changed with v2.1.0.
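For context, an NFS client renames a file that is deleted while still open to a hidden `.nfs...` placeholder, which is where the files mentioned above come from. A minimal sketch for locating such leftovers under a data directory follows; the function name `find_nfs_placeholders` is hypothetical, and the sketch only lists files and removes nothing.

```python
import os


def find_nfs_placeholders(data_dir):
    """Return the sorted paths of all files under data_dir whose name starts with '.nfs'."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for name in filenames:
            if name.startswith(".nfs"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Listing rather than deleting is deliberate: a placeholder may still be held open by a running process, so whether it is safe to remove depends on the setup.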
AnilNeeluru commented Feb 27, 2018
I am seeing a similar data corruption issue with Prometheus v2.0.0 when I restart Prometheus. To clarify, I am not using NFS; it is just an ext4 filesystem. Below is the mount location where the empty metadata.json error occurred. This can only be fixed manually by deleting at least the affected directory from the mount location. Please let me know whether this issue can be fixed by upgrading Prometheus to v2.1.0 or a later version.
dhirajsb referenced this issue Mar 7, 2018: Prometheus Pod stopped with error - Opening Storage Failed #1634 (Closed)
fuxes commented Jul 31, 2018
Has this bug been fixed?
lock bot commented Mar 22, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

auhlig commented Dec 1, 2017 (edited)
At SAP we're using Prometheus to monitor our 13+ Kubernetes clusters. The recent upgrade to Prometheus v2.0.0 was initially very smooth but has since become somewhat painful, because we're seeing the following error on a daily basis. At first Prometheus returns inconsistent metric values, which affects alerting, and it eventually crashes with:
On restart it fails with
This can only be fixed manually by deleting at least the affected directory.
Memory usage is consistent. Nothing obvious here.
Prometheus stores the data on an NFS mount, which worked perfectly with previous versions.
Since this makes our monitoring setup quite unreliable, I'm thinking about downgrading to Prometheus v1.8.2, which did a fantastic job in the past.
I cannot see where Prometheus fails to write the meta.json. Hopefully you know more, @fabxc?
Similar to #2805.
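To narrow down which directory is affected before deleting anything, a small scan of the data directory can help. The following is a rough sketch, not Prometheus code: it assumes that every subdirectory of the data directory other than `wal` is a block that should contain a non-empty, valid meta.json, and the function name `find_broken_blocks` is hypothetical.

```python
import json
import os


def find_broken_blocks(data_dir):
    """Return (block_dir, reason) pairs for blocks whose meta.json is missing, empty, or invalid.

    Assumption: every subdirectory of data_dir except "wal" is a block
    directory that should hold a meta.json file.
    """
    broken = []
    for entry in sorted(os.listdir(data_dir)):
        block_dir = os.path.join(data_dir, entry)
        if entry == "wal" or not os.path.isdir(block_dir):
            continue
        meta_path = os.path.join(block_dir, "meta.json")
        if not os.path.isfile(meta_path):
            broken.append((block_dir, "missing"))
        elif os.path.getsize(meta_path) == 0:
            broken.append((block_dir, "empty"))
        else:
            try:
                with open(meta_path, encoding="utf-8") as fh:
                    json.load(fh)
            except ValueError:
                # json.JSONDecodeError is a subclass of ValueError.
                broken.append((block_dir, "invalid JSON"))
    return broken
```

The sketch only reports candidates; it does not delete or repair anything, so the decision to remove a directory stays manual.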
We also observed Prometheus v2.0.0 filling up the 300GiB volume with data. This resulted in "no space left on disk", followed by the above error. Best guess: retention was not kicking in.
Environment
System information:
Linux 4.13.16-coreos-r1 x86_64
Prometheus version:
prometheus, version 2.0.0 (branch: HEAD, revision: 0a74f98)
build user: root@615b82cb36b6
build date: 20171108-07:11:59
go version: go1.9.2
Prometheus configuration file:
Configuration can be found here.