Consistent startup crash in storage adapter #3340

Closed
juliusv opened this Issue Oct 23, 2017 · 10 comments

juliusv commented Oct 23, 2017

When I run Prometheus built from current master against the attached data directory (data.tar.gz), which so far contains only a WAL, it consistently crashes like this:

level=info ts=2017-10-23T22:57:16.067433961Z caller=main.go:215 msg="Starting Prometheus" version="(version=2.0.0-rc.1, branch=master, revision=c3d6abc8e6444c33c0071aaddbef402aa553ff68)"
level=info ts=2017-10-23T22:57:16.067475452Z caller=main.go:216 build_context="(go=go1.9, user=julius@julius-samsung, date=20171023-22:56:03)"
level=info ts=2017-10-23T22:57:16.067491611Z caller=main.go:217 host_details="(Linux 4.11.0-14-generic #20~16.04.1-Ubuntu SMP Wed Aug 9 09:06:22 UTC 2017 x86_64 julius-samsung (none))"
level=info ts=2017-10-23T22:57:16.069739337Z caller=web.go:380 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2017-10-23T22:57:16.069779637Z caller=main.go:314 msg="Starting TSDB"
level=info ts=2017-10-23T22:57:16.069784013Z caller=targetmanager.go:69 component="target manager" msg="Starting target manager..."
level=info ts=2017-10-23T22:57:16.090802158Z caller=main.go:326 msg="TSDB started"
level=info ts=2017-10-23T22:57:16.090856994Z caller=main.go:394 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2017-10-23T22:57:16.091676177Z caller=main.go:371 msg="Server is ready to receive requests."
panic: runtime error: index out of range

goroutine 242 [running]:
github.com/prometheus/prometheus/storage/tsdb.adapter.StartTime(0xc4203348f0, 0x6ddd00, 0xc420064fa8, 0x42a529, 0xc400000008)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/storage/tsdb/tsdb.go:193 +0x4e0
github.com/prometheus/prometheus/storage/tsdb.(*ReadyStorage).StartTime(0xc420432320, 0x196bf00, 0xc420064f01, 0x297f098)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/storage/tsdb/tsdb.go:68 +0x4e
github.com/prometheus/prometheus/storage/tsdb.(*ReadyStorage).StartTime-fm(0x196bf00, 0x0, 0x0)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:227 +0x2a
github.com/prometheus/prometheus/storage/remote.(*Storage).Querier(0xc4204b9180, 0x2890c00, 0xc421898a80, 0x15f4b6fdbfa, 0x15f4b746fda, 0x0, 0x0, 0x0, 0x0)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/storage/remote/read.go:33 +0x10a
github.com/prometheus/prometheus/storage.(*fanout).Querier(0xc420052600, 0x2890c00, 0xc421898a80, 0x15f4b6fdbfa, 0x15f4b746fda, 0xffffffbbc166ed07, 0x295be80, 0xa60b19d1f4a2c767, 0xc4200651a8)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/storage/fanout.go:79 +0x1ed
github.com/prometheus/prometheus/promql.(*Engine).populateIterators(0xc420436690, 0x2890c00, 0xc421898a80, 0xc42189e550, 0x411bbc, 0xc4200207e0, 0xc4200652a8, 0x43c81c)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/promql/engine.go:510 +0x2f5
github.com/prometheus/prometheus/promql.(*Engine).execEvalStmt(0xc420436690, 0x2890c00, 0xc421898a80, 0xc420e661c0, 0xc42189e550, 0x0, 0x0, 0x0, 0x0)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/promql/engine.go:361 +0xf5
github.com/prometheus/prometheus/promql.(*Engine).exec(0xc420436690, 0x2890c00, 0xc421898a80, 0xc420e661c0, 0x0, 0x0, 0x0, 0x0)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/promql/engine.go:341 +0x317
github.com/prometheus/prometheus/promql.(*query).Exec(0xc420e661c0, 0x2890b80, 0xc420052f00, 0x2890b80)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/promql/engine.go:179 +0xb8
github.com/prometheus/prometheus/rules.(*AlertingRule).Eval(0xc420444c00, 0x7f05ed803ac8, 0xc420052f00, 0xbe73bad4b609b929, 0x19acba507, 0x295be80, 0xc420436690, 0xc420359700, 0x0, 0x0, ...)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/rules/alerting.go:164 +0x177
github.com/prometheus/prometheus/rules.(*Group).Eval.func1(0x1ba6776, 0x8, 0xc420444d80, 0xbe73bad4b609b929, 0x19acba507, 0x295be80, 0x0, 0x2890ec0, 0xc420444c00)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/rules/manager.go:307 +0x194
github.com/prometheus/prometheus/rules.(*Group).Eval(0xc420444d80, 0xbe73bad4b609b929, 0x19acba507, 0x295be80)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/rules/manager.go:375 +0xbf
github.com/prometheus/prometheus/rules.(*Group).run.func1()
	/home/julius/gosrc/src/github.com/prometheus/prometheus/rules/manager.go:182 +0x81
github.com/prometheus/prometheus/rules.(*Group).run(0xc420444d80)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/rules/manager.go:207 +0x23c
github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig.func1.1(0xc420052f40, 0xc420444d80)
	/home/julius/gosrc/src/github.com/prometheus/prometheus/rules/manager.go:510 +0x46
created by github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig.func1
	/home/julius/gosrc/src/github.com/prometheus/prometheus/rules/manager.go:505 +0x56

I'm not aware of anything I did to corrupt the data directory, so this seems like a Prometheus bug to me.

The code where it crashes:

    if p.Next() {
        var lset tsdbLabels.Labels
        var chks []tsdb.ChunkMeta
        indexr.Series(p.At(), &lset, &chks)
        if startTime > chks[0].MinTime {    // <--- CRASH HERE
            startTime = chks[0].MinTime
        }
    }

...it looks like the code expects there to always be at least one chunk, but maybe my storage just doesn't have one for this series yet?
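
For illustration, a minimal sketch of the kind of guard that would avoid the panic, assuming the surrounding StartTime code keeps the shape shown above (the error propagation and the (int64, error) return are my assumptions, not the current behavior):

    // Sketch only: skip series that have no chunks materialized yet
    // (e.g. data that so far exists only in the WAL), and propagate the
    // Series error instead of ignoring it.
    if p.Next() {
        var lset tsdbLabels.Labels
        var chks []tsdb.ChunkMeta
        if err := indexr.Series(p.At(), &lset, &chks); err != nil {
            return 0, err
        }
        if len(chks) > 0 && startTime > chks[0].MinTime {
            startTime = chks[0].MinTime
        }
    }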

juliusv commented Oct 23, 2017

@fabxc

fabxc commented Oct 24, 2017

Looks like a bug coming from bf4a279

@brian-brazil @Thib17 what are the intended semantics? It looks like calling this hits every single series in the entire storage and checks its first timestamp. That seems rather expensive, even if we were caching the result.
It seems we are not getting much in return for that high cost, and we even add a 'safety margin' further down.

Can we skip scanning all series entirely and just return the minTime of the first block plus the duration of --storage.tsdb.min-block-duration, which is typically 2h? Basically, just a bigger safety margin.
On average that should give about the same result after running for a few hours, and be a lot cheaper.
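
As a rough sketch of that idea (illustrative only; startTimeFromFirstBlock, blockMinTimes, and minBlockDuration are hypothetical stand-ins for reading the first block's meta and the flag value, not existing tsdb APIs):

    // Sketch only: derive the start time from the oldest persisted block
    // instead of scanning every series in the index.
    func startTimeFromFirstBlock(blockMinTimes []int64, minBlockDuration, now int64) int64 {
        if len(blockMinTimes) == 0 {
            // No persisted blocks yet (only a WAL): nothing is older than "now".
            return now
        }
        // minTime of the oldest block plus --storage.tsdb.min-block-duration
        // (typically 2h) acts as a larger, cheap-to-compute safety margin.
        return blockMinTimes[0] + minBlockDuration
    }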

Aside from the problem of not closing the index reader discovered yesterday, the code also seems to ignore errors returned by iterators.
This particular function should get some additional testing. Otherwise, I'd be inclined to temporarily revert the commit before the next release.

Thib17 commented Oct 24, 2017

Sorry for the bug.

The intended semantics were to look for the first chunk of every known job, which is more precise than the minTime of the first block. I should have avoided things like not closing the index reader or ignoring some errors; I'm sorry for that too.

As discussed in #3129, just returning the minTime of the first block doesn't fit our needs, but returning the maxTime of the first block would. At first I thought that scanning the index for all jobs would give us a more precise value, but I had no idea how to check whether that was too expensive.

@fabxc If you agree, I can prepare a new PR today that cleans this up (by just returning the maxTime of the first block).

fabxc commented Oct 24, 2017

@Thib17 thanks, and no worries – I changed the index reader API requirements recently, and your PR had long been in the works by then, so you basically had no chance to know :)

A new PR would be great! I was thinking, though, of actually taking the minTime of the first block plus --storage.tsdb.min-block-duration. Because the storage compacts old blocks into larger ones, the maxTime of the first block would otherwise keep growing over time. Since the original data doesn't change, the approach above should give us some consistency.

Thanks for jumping in so quickly!

juliusv commented Oct 24, 2017

I vaguely remember @brian-brazil telling me, back when I implemented the first remote read, that the remote storage should always be queried for some reason (maybe because it may generate extra results that aren't even in the local DB, like his MySQL querier demo?). I hope he can confirm/deny.

Side note (that is now probably irrelevant): it's unsafe to assume that all series have a job label. The label could have been removed by relabeling, and it also doesn't have to be present in the result of a recording rule.

brian-brazil commented Oct 24, 2017

I implemented the first remote read that the remote storage should always be queried for some reason

It should be possible for it always to be queried, but that's not what you want for remote storage. For the MVP, to keep things simple, we always queried.

juliusv commented Oct 25, 2017

It should be possible for it always to be queried, but that's not what you want for remote storage. For the MVP to keep things simple we always queried.

Ok, so a change as proposed above, which would unconditionally turn off remote queries when the timestamp is too recent, would not be what we want then, right? Either always keep them on, or allow it to be configured as a next step?

Thib17 commented Oct 25, 2017

It's not unconditional. In bf4a279 we made it configurable per remote read endpoint whether to read recent timestamps or not: we added a ReadRecent flag in the config file (default set to true).
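
For reference, a hypothetical prometheus.yml snippet showing how such a per-endpoint flag would look (the exact key name and default in bf4a279 may differ; read_recent is the name used in later Prometheus releases):

    remote_read:
      - url: "http://remote-storage.example.com/api/v1/read"
        # Whether queries for recent timestamps should also be sent to this
        # remote endpoint, or answered from local storage only.
        read_recent: true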

juliusv commented Oct 25, 2017

@Thib17 Ah, I wasn't aware of the new config option. Thanks for pointing that out.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
