Crash due to violated storage invariant #367
When using the dumper tool, the storage around that fingerprint looks like this (note that there is only a single chunk for this fingerprint; it's possible that this is what triggers the error):
OK, that's actually a corrupt chunk. The FirstTime is larger than the LastTime, and you can see that this is due to out-of-order samples in the chunk. So now the question is: what corrupted this chunk? Probably a bug in the compactor. I remember we had this kind of problem in the compactor once before, when two arguments to an append() were the wrong way around.
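The violated invariant is exactly this time ordering within a chunk. A minimal sketch of such a check, using hypothetical simplified types (the real storage types of that era differ):

```go
package main

import "fmt"

// Sample and Chunk are illustrative stand-ins for the real storage types.
type Sample struct {
	TimestampMs int64
	Value       float64
}

type Chunk struct {
	Samples []Sample
}

// checkInvariant reports whether the chunk's samples are in
// non-decreasing time order, which implies FirstTime <= LastTime.
// A chunk like the one dumped above would fail this check.
func checkInvariant(c Chunk) error {
	for i := 1; i < len(c.Samples); i++ {
		if c.Samples[i].TimestampMs < c.Samples[i-1].TimestampMs {
			return fmt.Errorf("out-of-order sample at index %d: %d < %d",
				i, c.Samples[i].TimestampMs, c.Samples[i-1].TimestampMs)
		}
	}
	return nil
}

func main() {
	corrupt := Chunk{Samples: []Sample{
		{TimestampMs: 100, Value: 1},
		{TimestampMs: 90, Value: 2}, // earlier than its predecessor
	}}
	fmt.Println(checkInvariant(corrupt))
}
```

Running such a check over suspect chunks in the dumper would flag corruption like this immediately.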
I traced this down to two pernicious bugs, one in the compactor and one in the watermarker. I have fixes for both, but am holding on to them while I reflect on how best to add regression tests for them (so far I've tested them with a bunch of tools and scripts written in Go).
prometheus/storage/metric/processor.go, line 118 in 4a87c00
The check at prometheus/storage/metric/processor.go, line 139 in 4a87c00
That means that when the for loop exits, there will be unprocessed samples left over that belong to the wrong (next) fingerprint, and they are stuck into a chunk that gets stored under the previous fingerprint. Thus, chunks get corrupted in the way we saw above. Ouch!
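The fix amounts to never letting samples cross a fingerprint boundary during compaction. A hedged sketch of that idea, with hypothetical simplified types (the actual processor.go iterated a sorted on-disk key space instead):

```go
package main

import "fmt"

// Fingerprint and Sample are illustrative stand-ins for the real types.
type Fingerprint uint64

type Sample struct {
	FP          Fingerprint
	TimestampMs int64
	Value       float64
}

// groupByFingerprint splits a mixed stream of samples into
// per-fingerprint batches. Flushing each batch separately guarantees
// that leftover samples belonging to the next fingerprint can never be
// appended to a chunk stored under the previous fingerprint.
func groupByFingerprint(samples []Sample) map[Fingerprint][]Sample {
	out := make(map[Fingerprint][]Sample)
	for _, s := range samples {
		out[s.FP] = append(out[s.FP], s)
	}
	return out
}

func main() {
	mixed := []Sample{
		{FP: 1, TimestampMs: 10},
		{FP: 1, TimestampMs: 20},
		{FP: 2, TimestampMs: 5}, // belongs to the *next* fingerprint
	}
	batches := groupByFingerprint(mixed)
	fmt.Println(len(batches[1]), len(batches[2])) // 2 1
}
```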
/cc @matttproud
We should evaluate whether it makes sense to entirely get rid of
@matttproud Fixes, along with regression tests for the two bugs reported in this issue, are uploaded for review here: http://review.prometheus.io:8080/#/c/51/
There's also a test case included in http://review.prometheus.io:8080/#/c/54/ which is currently commented out and which triggers #368 (not yet fixed), in case you want to reproduce it.
@juliusv OK. Astute observations in the course of this! Please review the following for post-TBR code review:
@matttproud Thanks a lot! I'll get to fixing these comments in time! What is still most problematic right now is the compaction crash that I haven't figured out yet: #368. A number of production Prometheus servers are now crashing every few hours due to this. The compaction tests in http://review.prometheus.io/#/c/54/4/storage/metric/compaction_regression_test.go have a commented-out test case that reproduces this crash every time.
So far we've been using Go's native time.Time for anything related to sample timestamps. Since the range of time.Time is much bigger than what we need, this has created two problems:

- There could be time.Time values which were out of the range/precision of the time type that we persist to disk, therefore causing incorrectly ordered keys. One bug caused by this was #367. It would be good to use a timestamp type that's more closely aligned with what the underlying storage supports.
- sizeof(time.Time) is 192, while Prometheus should be fine with a single 64-bit Unix timestamp (possibly even a 32-bit one). Since we store samples in large numbers, this seriously affects memory usage. Furthermore, copying and working with the data is faster if it's smaller.

*MEMORY USAGE RESULTS*

Initial memory usage comparisons for a running Prometheus with 1 timeseries and 100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my tests, this advantage for some reason decreased a bit the more samples the timeseries had (to 5-7% for millions of samples). I can't fully explain this, but perhaps garbage collection issues were involved.

*WHEN TO USE THE NEW TIMESTAMP TYPE*

The new clientmodel.Timestamp type should be used whenever time calculations are either directly or indirectly related to sample timestamps, for example:

- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp passed into Target.Scrape())

When to still use time.Time:

- for measuring durations/times not related to sample timestamps, such as duration telemetry exporting, timers that indicate how frequently to execute some action, etc.

*NOTE ON OPERATOR OPTIMIZATION TESTS*

We don't use operator optimization code anymore, but it still lives in the code base as dead code. It still has tests, but I couldn't get all of them to pass with the new timestamp format. I commented out the failing cases for now, but we should probably remove the dead code soon. I just didn't want to do that in the same change as this.

Change-Id: I821787414b0debe85c9fffaeb57abd453727af0f
So far we've been using Go's native time.Time for anything related to sample timestamps. Since the range of time.Time is much bigger than what we need, this has created two problems:

- There could be time.Time values which were out of the range/precision of the time type that we persist to disk, therefore causing incorrectly ordered keys. One bug caused by this was prometheus/prometheus#367. It would be good to use a timestamp type that's more closely aligned with what the underlying storage supports.
- sizeof(time.Time) is 192, while Prometheus should be fine with a single 64-bit Unix timestamp (possibly even a 32-bit one). Since we store samples in large numbers, this seriously affects memory usage. Furthermore, copying and working with the data is faster if it's smaller.

*MEMORY USAGE RESULTS*

Initial memory usage comparisons for a running Prometheus with 1 timeseries and 100,000 samples show roughly a 13% decrease in total (VIRT) memory usage. In my tests, this advantage for some reason decreased a bit the more samples the timeseries had (to 5-7% for millions of samples). I can't fully explain this, but perhaps garbage collection issues were involved.

*WHEN TO USE THE NEW TIMESTAMP TYPE*

The new clientmodel.Timestamp type should be used whenever time calculations are either directly or indirectly related to sample timestamps, for example:

- the timestamp of a sample itself
- all kinds of watermarks
- anything that may become or is compared to a sample timestamp (like the timestamp passed into Target.Scrape())

When to still use time.Time:

- for measuring durations/times not related to sample timestamps, such as duration telemetry exporting, timers that indicate how frequently to execute some action, etc.

Change-Id: I253a467388774280c10400fda122369ff77c1730
A Prometheus instance was crashing every time a curator was run with:
I have a copy of the storage that triggers this crash if someone wants to look at it.