crash recovery: Deal with un-open-able LevelDBs archived_fingerprint_to_timerange and archived_fingerprint_to_metric #2210
beorn7 added the labels component/local storage and kind/enhancement, and self-assigned this on Nov 21, 2016.
beorn7 referenced this issue on Mar 20, 2017 from #2509 (closed): "The storage is now inconsistent. Restart Prometheus ASAP to initiate recovery." When restarting
beorn7 referenced this issue on Apr 6, 2017 from #2593 (merged): storage: Recover from corrupted indices for archived series
beorn7 closed this in #2593 on Apr 7, 2017.
lock bot commented on Mar 23, 2019: This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
lock bot locked and limited conversation to collaborators on Mar 23, 2019.
beorn7 commented on Nov 21, 2016
Crash recovery properly deals with inconsistent data in the LevelDB directories archived_fingerprint_to_timerange and archived_fingerprint_to_metric. However, in rare cases, a LevelDB can be corrupted in a way that makes even opening it fail.
In that case, the whole crash recovery bails out.
Instead, we should nuke the respective LevelDBs and continue recovery as far as possible. Nuking archived_fingerprint_to_timerange is in fact fully recoverable: it will just take a long time, because all archived time series have to be unarchived. Note that this requires additional RAM, which has to be taken into account for #2139. Nuking archived_fingerprint_to_metric means losing all archived series, but that's still better than losing everything.