block time ranges overlap #4302

Closed
tcolgate opened this issue Jun 22, 2018 · 34 comments

@tcolgate
Contributor

Bug Report

What did you do?
Found prom in a crashloop

What did you expect to see?
Not that

What did you see instead? Under which circumstances?
level=error ts=2018-06-22T07:54:13.895344831Z caller=main.go:597 err="Opening storage failed invalid block sequence: block time ranges overlap: [mint: 1529604000000, maxt: 1529611200000, range: 2h0m0s, blocks: 2]: <ulid: 01CGJ2B565RBWABAM80KJ9N2CW, mint: 1529604000000, maxt: 1529611200000, range: 2h0m0s>, <ulid: 01CGJQDTK2PE8ZCTK5XN2FDMEZ, mint: 1529604000000, maxt: 1529625600000, range: 6h0m0s>\n[mint: 1529611200000, maxt: 1529618400000, range: 2h0m0s, blocks: 2]: <ulid: 01CGJQDTK2PE8ZCTK5XN2FDMEZ, mint: 1529604000000, maxt: 1529625600000, range: 6h0m0s>, <ulid: 01CGJ96WDY85B5RXBNHGTZY9TH, mint: 1529611200000, maxt: 1529618400000, range: 2h0m0s>\n[mint: 1529618400000, maxt: 1529625600000, range: 2h0m0s, blocks: 2]: <ulid: 01CGJQDTK2PE8ZCTK5XN2FDMEZ, mint: 1529604000000, maxt: 1529625600000, range: 6h0m0s>, <ulid: 01CGJG7NSPZKGGVJ53W3SMXRDR, mint: 1529618400000, maxt: 1529625600000, range: 2h0m0s>"

In addition, the server fails to start.
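
The mint/maxt values in that error are Unix timestamps in milliseconds. A minimal Go sketch for decoding the blocks named above (values copied from the error; not part of Prometheus itself):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// mint/maxt pairs copied from the error message above (Unix milliseconds).
	blocks := []struct {
		ulid       string
		mint, maxt int64
	}{
		{"01CGJ2B565RBWABAM80KJ9N2CW", 1529604000000, 1529611200000}, // 2h block
		{"01CGJ96WDY85B5RXBNHGTZY9TH", 1529611200000, 1529618400000}, // 2h block
		{"01CGJG7NSPZKGGVJ53W3SMXRDR", 1529618400000, 1529625600000}, // 2h block
		{"01CGJQDTK2PE8ZCTK5XN2FDMEZ", 1529604000000, 1529625600000}, // 6h block overlapping all three
	}
	for _, b := range blocks {
		mint := time.UnixMilli(b.mint).UTC()
		maxt := time.UnixMilli(b.maxt).UTC()
		fmt.Printf("%s  %s - %s  (%s)\n",
			b.ulid, mint.Format(time.RFC3339), maxt.Format(time.RFC3339), maxt.Sub(mint))
	}
}
```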

Environment

  • System information:

    insert output of uname -srm here

  • Prometheus version:

2.3.1

@tcolgate
Contributor Author

These were the contents of the data directory when startup failed:

01C72NDRM6B723ZRR8AQ4B5BTQ.tmp  01CFS0RBXQWXGTZ6ZM2SFMG6Y3.tmp
01C826QH3WCJT8EY4YBTXRYRKH.tmp  01CFS15TYJCWEV5EB50W6XJXQF.tmp
01C8CGADR4KF4CY4AH5NXNJXXA.tmp  01CFS1GD279M9J1KTE31WYQAR4.tmp
01CBWY56CGETRSZXBXAGY5K4S5.tmp  01CFT8BC8ME6Y7FE8DNE5X5WA5.tmp
01CE4TPXT9GD07NXHVKYJFZB2P      01CFY3RJWQMFCPZE00M0XXS32S.tmp
01CEAM7WDFKKGZ8A552GW6SD96      01CFY41BXSYNENZ5FFZDHMTN0B.tmp
01CEGDSEEZKWGAS7GKZ87G56S4      01CFYS9GNWPM6JG29W3690CH4N
01CEP6XG3Q7D4V5V8JG9MKR4K1.tmp  01CFZCYYCYFRKWRZ1TSBGC859Q.tmp
01CEP7K9K7AMHHV6Y0KVS533FS      01CFZTPCW4YMKPHA4XGPQN0C2T.tmp
01CEW0MNCAZWJN56H3HXE9G13A      01CG13WRC8GSEBW3ZF6BTWZ07W.tmp
01CF1T3PM77G0K38K7SVZ4Y1HG      01CG1ARFN195RN7KCR3M7M3XPC.tmp
01CF39ERN488YKJY1WEG6Q53YZ.tmp  01CG1RFY5S85KZZRQ2AX9H9C85.tmp
01CF62Q6W7GX71ZBG403FRC4MA.tmp  01CG267CN4EQD5S5DE48NE3HVE.tmp
01CF7KMT8CCRE6KHW6JDXXTX8T      01CG2KYV5TB2AT880QDPE6MWV3.tmp
01CF8N3XXP1T97XQ2H0GSN4KJH.tmp  01CG38YSHRVAZR475F7WYSH20S.tmp
01CF8W74A3MZVQ22ABEKY054X3.tmp  01CG4JWA9GGF8KPAXJF9TSY1FZ.tmp
01CFBEPJT5AQ65NCR8D553QB6A.tmp  01CG4K3YPHE9ZB49V8NEZK9Y9E
01CFDD1KK7VK73KMWVWBSHV2WM      01CGABSX821EG9SCZ5CFW47CSK
01CFENK94CMGGPEPD24P90QY45.tmp  01CGAZWEWPSXPZZ09VT1M8T379.tmp
01CFJH5WZACQEGVWG8ENGP3ASW.tmp  01CGC8YHCWT8A046E3EJPBZ8SC.tmp
01CFJYPSWBQF4JANXSRXSH35M4.tmp  01CGCPNZXHWPQMAC38KJ6PS343.tmp
01CFK5JH46K0MGG1X81E1P4P9N.tmp  01CGDJ4WZTXBJX61JKS7RPMRD6.tmp
01CFK6KEXVKC3Q78THSQ1M2BT6      01CGG57DNT3YXWFGZHWQMDAM09.tmp
01CFN3Q2A5YRNPVVPT6QYTVAZT      01CGG68PZRVWYJTWP2RHTT3P1S.tmp
01CFQ1AD8EEYT1S7V60G1J6B6F      01CGG7HYX5A8T55PQBNS21ZMMW
01CFQ81X3YWTN808RM91XJYQZJ.tmp  01CGJ2B565RBWABAM80KJ9N2CW
01CFQEX2756BT4RSYSFHWHRNBV.tmp  01CGJ2X7FJKVTVBE30ZK3S4TYA
01CFQF26WESA2CX9SRP74KW4R3.tmp  01CGJ96WDY85B5RXBNHGTZY9TH
01CFRYZ506VVXYF5DB8SWS06DQ.tmp  01CGJG7NSPZKGGVJ53W3SMXRDR
01CFRZ5H56VBK5BS7FRFTT9CN7.tmp  01CGJPYAXWDWP3CAGV0HVHVDXA
01CFRZHF6C213PD4YXAHPTBSVJ      01CGJQDTK2PE8ZCTK5XN2FDMEZ
01CFRZQQM4RPNWD27X5RGTNW4W.tmp  lock
01CFS02NT52ZAPV96BD103W174.tmp  wal
01CFS081ADFKRRFQV5BBPJ9MED.tmp

I killed off the 01CGJQDTK2PE8ZCTK5XN2FDMEZ directory; I think things are starting now.
Are all the .tmp dirs wasted disk?

@tcolgate
Contributor Author

I've hit startup failures on this, and on missing metadata JSON files (which I think may have happened after running out of disk space at one point).
Would it make more sense to skip broken data shards rather than bailing?

@brian-brazil
Contributor

Can you share the logs of the first failure?

Would it make more sense to skip broken data shards rather than bailing?

Continuing to work with a broken database seems unwise to me.

@brian-brazil
Contributor

This sounds like prometheus-junkyard/tsdb#347

@tcolgate
Contributor Author

@brian-brazil can the .tmp files be cleaned at startup?

@brian-brazil
Contributor

I'm not sure it's wise to delete information useful to debug issues like this.

@tcolgate
Contributor Author

@brian-brazil I meant more generally.

@brian-brazil
Contributor

It is safe for you to delete these files while Prometheus is shutdown.

@krasi-georgiev
Contributor

@brian-brazil prometheus-junkyard/tsdb#347 is different and it will not cause overlapping blocks.

I think this one is similar to #4200.
The fix is already merged, but @tcolgate, did you try to delete any time series at any point before the crash?

I think this one happened during a failed compaction: a new 6h block was created as the result of a compaction, but it failed somewhere in the middle, leaving the original 2h blocks behind, so at the next startup these overlapped.

@tcolgate .tmp directories are created during compaction; when the compaction is complete, the old blocks are deleted and the .tmp directory is renamed by removing the .tmp extension. In theory these are safe to delete, as the original blocks will not be deleted on failure.
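
For anyone wanting to reclaim that space, a minimal sketch that lists the leftover .tmp directories (the data directory path is hypothetical; only delete anything while Prometheus is shut down):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	dataDir := "/prometheus/data" // hypothetical; point this at your --storage.tsdb.path
	entries, err := os.ReadDir(dataDir)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		if e.IsDir() && strings.HasSuffix(e.Name(), ".tmp") {
			p := filepath.Join(dataDir, e.Name())
			fmt.Println("leftover compaction dir:", p)
			// Only remove while Prometheus is stopped, e.g.:
			// os.RemoveAll(p)
		}
	}
}
```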

@tcolgate
Contributor Author

@krasi-georgiev we've not deleted any TS for a while, though I did have to delete a shard a while back that had lost the metadata file (I think it failed to write during a situation with a full disk).
We've also had quite a few OOMs during compaction.

@krasi-georgiev
Contributor

It seems you are hitting #4108, which has exactly the same behaviour as what you have described.

Did you try to lower storage.tsdb.max-block-duration,
as per your own comment in #4110 (comment)?

I think that when the compaction runs it expands all series from all compacted blocks in RAM and then writes a new merged block, so lowering storage.tsdb.max-block-duration should in theory limit the compaction to smaller block sizes.
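
As a rough illustration of that relationship, here is a sketch only (assuming the factor-of-3 growth that the 2h → 6h compaction in the error suggests; not the actual tsdb code):

```go
package main

import (
	"fmt"
	"time"
)

// blockRanges mimics how compaction ranges could grow from the minimum block
// duration up to the configured maximum: each step is 3x the previous one,
// which is consistent with the 2h -> 6h compaction seen in the error above.
func blockRanges(min, max time.Duration) []time.Duration {
	var ranges []time.Duration
	for r := min; r <= max; r *= 3 {
		ranges = append(ranges, r)
	}
	return ranges
}

func main() {
	// With a larger max-block-duration, compaction may build big merged blocks.
	fmt.Println(blockRanges(2*time.Hour, 36*time.Hour)) // [2h0m0s 6h0m0s 18h0m0s]
	// A lowered max-block-duration caps how much data a single compaction
	// has to expand in RAM at once.
	fmt.Println(blockRanges(2*time.Hour, 6*time.Hour)) // [2h0m0s 6h0m0s]
}
```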

@tcolgate
Contributor Author

@krasi-georgiev I need to understand if that can be changed on a pre-existing tsdb. However, since the nil dedupe bug was fixed, our compactions have been completing without problems (though that was only a couple of days ago).

@krasi-georgiev
Contributor

Just double-checked, and I don't see any problem with reducing storage.tsdb.max-block-duration on an existing DB.
The logic is that all blocks outside the storage.tsdb.max-block-duration won't be touched at compaction.

@krasi-georgiev
Contributor

We've also had quite a few OOMs during compaction.

Did I misunderstand that you are still seeing OOMs during compaction?

@brian-brazil
Contributor

The block size settings should not be changed; those flags are only there for load testing.

@krasi-georgiev
Contributor

krasi-georgiev commented Jun 22, 2018

@brian-brazil would you also note why not, and suggest something to help with @tcolgate's issue?

The only downside I see is that reducing storage.tsdb.max-block-duration would leave more, smaller blocks, which will result in longer query times.

@brian-brazil
Contributor

That's just what they're for, and no user should ever have a need to adjust them. Changing them is likely to only cause more problems, and frustrate debugging.

@brian-brazil
Contributor

I also note that changing that setting would get us no closer to solving the actual issue. Let's debug the problem rather than changing random unrelated settings.

@krasi-georgiev
Contributor

@tcolgate can confirm, but I think the problem is identical to #4108: OOMing at compaction due to insufficient RAM to complete the compaction for the large blocks.

@brian-brazil
Contributor

Crashing shouldn't break the tsdb though, we should recover gracefully.

@krasi-georgiev
Contributor

krasi-georgiev commented Jun 22, 2018

I think I discussed this with @fabxc or @gouthamve, and we decided to leave any recovery decisions out of tsdb and put them in the tsdb scan command.

@krasi-georgiev
Contributor

I guess Prometheus can use the new tsdb.Scan method when it fails to open the DB and prompt the user for some recovery decisions: delete overlapping blocks, etc.

@brian-brazil
Contributor

A crash at any point shouldn't put us into a situation that requires manual action. Only bugs and data corruption should get us into such a state.

@gouthamve
Member

@tcolgate Could you give us the logs from Prometheus up until the point where it errors with overlapping blocks?

@tcolgate
Contributor Author

tcolgate commented Jun 25, 2018

Did I misunderstand that you are still seeing OOMs during compaction?

We are not seeing OOMs during compaction under 2.3.1 (this isn't because compaction is taking less RAM, but because our OOMs were largely due to other issues; compaction memory usage is still too high, but that is being dealt with elsewhere).

@tcolgate
Contributor Author

The logs are not terribly exciting, except for highlighting a few exits and OOMs prior to the crash. (We had an issue with our sidecar that was causing more frequent restarts than it should have. The OOMs in this case were caused by a runaway exporter (stackdriver-exporter exposing LB latencies) that decided to add an extra 500k time series.)

log.gz

@krasi-georgiev
Contributor

These logs show that the overlap was caused exactly by the OOMing:

2018/06/22 03:11:06 Unsolicited response received on idle HTTP channel starting with "\x00\x00\x00\x04\x00\x00\x00\x00\x00"; err=<nil>
level=info ts=2018-06-22T03:13:41.711884064Z caller=compact.go:347 component=tsdb msg="compact blocks" count=3 mint=1529604000000 maxt=1529625600000 ulid=01CGJQDTK2PE8ZCTK5XN2FDMEZ sources="[01CGJ2B565RBWABAM80KJ9N2CW 01CGJ96WDY85B5RXBNHGTZY9TH 01CGJG7NSPZKGGVJ53W3SMXRDR]"
Container exitted: error = "", exitcode = 137
Container died due to OOM

Since it didn't complete, it created overlapping blocks, which was caught at the next startup:

level=error ts=2018-06-22T03:14:10.823993497Z caller=main.go:597 err="Opening storage failed invalid block sequence: block time ranges overlap: [mint: 1529604000000, maxt: 1529611200000, range: 2h0m0s, blocks: 2]: <ulid: 01CGJ2B565RBWABAM80KJ9N2CW, mint: 1529604000000, maxt: 1529611200000, range: 2h0m0s>, <ulid: 01CGJQDTK2PE8ZCTK5XN2FDMEZ, mint: 1529604000000, maxt: 1529625600000, range: 6h0m0s>\n[mint: 1529611200000, maxt: 1529618400000, range: 2h0m0s, blocks: 2]: <ulid: 01CGJQDTK2PE8ZCTK5XN2FDMEZ, mint: 1529604000000, maxt: 1529625600000, range: 6h0m0s>, <ulid: 01CGJ96WDY85B5RXBNHGTZY9TH, mint: 1529611200000, maxt: 1529618400000, range: 2h0m0s>\n[mint: 1529618400000, maxt: 1529625600000, range: 2h0m0s, blocks: 2]: <ulid: 01CGJQDTK2PE8ZCTK5XN2FDMEZ, mint: 1529604000000, maxt: 1529625600000, range: 6h0m0s>, <ulid: 01CGJG7NSPZKGGVJ53W3SMXRDR, mint: 1529618400000, maxt: 1529625600000, range: 2h0m0s>"
level=info ts=2018-06-22T03:14:10.824081682Z caller=main.go:599 msg="See you next time!"

We need to think about how this would best be handled, but in the meantime you can use the tsdb scan tool to delete the overlapping blocks.
It is still WIP, but if you need it now let me know and I will compile it and attach it to a comment.
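
Until then, a minimal sketch of what such a check could look like (assuming each block directory carries a meta.json with ulid/minTime/maxTime fields; it only reports overlaps and does not delete anything):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

type blockMeta struct {
	ULID    string `json:"ulid"`
	MinTime int64  `json:"minTime"`
	MaxTime int64  `json:"maxTime"`
}

func main() {
	dataDir := "/prometheus/data" // hypothetical; point this at your --storage.tsdb.path
	entries, err := os.ReadDir(dataDir)
	if err != nil {
		panic(err)
	}
	var metas []blockMeta
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		b, err := os.ReadFile(filepath.Join(dataDir, e.Name(), "meta.json"))
		if err != nil {
			continue // .tmp dirs, wal, etc. have no meta.json
		}
		var m blockMeta
		if err := json.Unmarshal(b, &m); err == nil {
			metas = append(metas, m)
		}
	}
	sort.Slice(metas, func(i, j int) bool { return metas[i].MinTime < metas[j].MinTime })

	// Walk blocks in MinTime order and flag any block that starts before the
	// latest MaxTime seen so far (a simple, conservative overlap check).
	var latestMax int64
	var latestULID string
	for _, m := range metas {
		if latestULID != "" && m.MinTime < latestMax {
			fmt.Printf("overlap: %s [%d, %d] overlaps %s (maxTime %d)\n",
				m.ULID, m.MinTime, m.MaxTime, latestULID, latestMax)
		}
		if m.MaxTime > latestMax {
			latestMax = m.MaxTime
			latestULID = m.ULID
		}
	}
}
```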

@tcolgate
Contributor Author

tcolgate commented Jun 25, 2018

I inferred the block from the message and just blatted it by hand.
From my PoV, these things should be taken care of at startup if at all possible. We're running all our prom instances in Kubernetes, so prompting isn't really a viable option. We could run TSDB fixups at startup every time (assuming there's a -y or -f).
In general we prize availability over long-term retention (I believe prom does generally), so I'd much rather prom came up and lost some data than stayed down (and left us without monitoring for a weekend, as has happened more than 3 times since moving to 2.0).

@krasi-georgiev
Contributor

The tsdb tool is only a POC, but at some point we can add it to Prometheus itself and automate it, maybe adding an explicit flag at startup for how to handle such cases: auto-delete conflicts, prompt before deleting, etc.

We are still discussing and considering different options and your input would be useful.

@brian-brazil
Contributor

We should never automatically throw away user data, we should avoid getting into that situation in the first place.

@tcolgate
Contributor Author

@brian-brazil you could ignore bad data at start, though. By not starting you are de facto ignoring the data I most care about: the data you could be collecting. Clearly there is a fine line to tread, and certainly we shouldn't be getting into these situations in the first place. But refusing to start (and thus not collecting data) is just as bad as ignoring data.
Running two prom instances resulted in a pretty confusing query situation for users in these situations (especially if we round-robin queries to different proms). I did consider Thanos, but the instability of the on-disk format around 2.[12] put me off. Having two things shifting the on-disk data around feels wrong.

@fabxc
Contributor

fabxc commented Jun 28, 2018

Should this be fixed via prometheus-junkyard/tsdb#354?

@brian-brazil
Contributor

That's the presumption, so closing for now.

@lock

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 22, 2019