Crash recovery should deal better with corrupt data for individual series in checkpoint file #2475
Comments
see for reference |
korovkin
referenced this issue
Mar 6, 2017
Closed
Too many open files (established connections to same nodes) #1873
beorn7
self-assigned this
Mar 6, 2017
The code should handle this case more gracefully (by discarding the affected series and moving on), but the root cause is data corruption in the checkpoint file. Most likely there are more corruptions, so even with the more graceful behavior in place, you might not be able to recover much.
Got it.
Should we close this issue, or keep it open?
I'll take it as a reminder to fix the code as described above. Will change title.
beorn7
changed the title
unable to recover after an upgrade
Crash recovery should deal better with corrupt data for individual series in checkpoint file
Mar 7, 2017
beorn7
added
component/local storage
kind/bug
labels
Mar 7, 2017
beorn7
added a commit
that referenced
this issue
Apr 6, 2017
beorn7
referenced this issue
Apr 6, 2017
Merged
storage: Guard against a corner case of data corruption #2594
beorn7
closed this
in
#2594
Apr 6, 2017
lock
bot
commented
Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
lock
bot
locked and limited conversation to collaborators
Mar 23, 2019
korovkin commented Mar 6, 2017
What did you do?
Upgraded to a newer version of Prometheus.
What did you expect to see?
clean start
What did you see instead? Under which circumstances?
crash
Environment
uname -srm
Linux 4.4.0-62-generic x86_64
./prometheus --version
prometheus, version 1.5.2 (branch: master, revision: bd1182d)
build user: root@a8af9200f95d
build date: 20170210-14:41:22
go version: go1.7.5
Alertmanager version:
insert output of alertmanager -version here (if relevant to the issue)
Prometheus configuration file: