Prometheus 2 failing to start after delete_series API call #3788
Comments
Any updates on this issue? I managed to get it started up again by deleting the offending directory, but isn't this a bug if an API call would cause Prometheus to crash in this way? I'm not confident anymore about running any more Would updating to version 2.1 help with this?
@gouthamve I think this one is for you :)
Hmm, sorry for not responding earlier. I need to see the logs of the service on which the Are you running on NFS or something similar?
gouthamve added the kind/more-info-needed label Feb 5, 2018
Hi @gouthamve - when you refer to service, do you mean the Prometheus service or the application that was generating the metrics? I am not sure if I can get the logs for the old Prometheus pod now as it was destroyed and had to be recreated. We are not using NFS, no. The Prometheus pod is using an attached PV though, which is just a Where is this rename call happening in Prometheus?
When you call delete we re-write the tombstones file (which is essentially missing now causing the error): https://github.com/prometheus/tsdb/blob/master/block.go#L463-L465 Which calls the rename: https://github.com/prometheus/tsdb/blob/d0982ac4d5057a45050c6cedf2d730ed50e5a19d/tombstones.go#L100 which essentially does
I wasn't able to get the logs of the previous pod directly, but I found some of the relevant entries in Stackdriver and I'll paste them here. I think these are the first errors that were thrown when that API call was made. The quoting might look a bit strange because I'm copy/pasting from Stackdriver:
It looks like it crashed as soon as the API call was made, then when it restarted it couldn't find the So it's possible something went wrong inside the writeTombstoneFile function but I'm not sure.
I have a test reproducing it. Will send a PR soon.
gouthamve closed this in prometheus/tsdb#277 Apr 3, 2018
lock bot commented Mar 22, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
kerneljack commented Feb 2, 2018
What did you do?
We used curl to call the following endpoint:
Prometheus immediately crashed, and is now outputting the following to the logs:
What did you expect to see?
Prometheus should have deleted the relevant data and continued working as normal.
What did you see instead? Under which circumstances?
The Prometheus pod is now continuously crashing (on Kubernetes).
Environment
Kubernetes pod on GKE (1.8.5)