Prom2: crash on opening WAL block #2795

Closed
mwitkow opened this Issue Jun 1, 2017 · 7 comments

4 participants
@mwitkow
Contributor

mwitkow commented Jun 1, 2017

What did you do?
Ran the latest dev-2.0 branch.
What did you see instead?

A crash-looping restart:

2017-06-01T19:19:18.768587000Z time="2017-06-01T19:19:18Z" level=error msg="Opening storage failed: open block /prometheus-data/01BGVHR6TMNFQHTSPK67E9K62W: open head block /prometheus-data/01BGVHR6TMNFQHTSPK67E9K62W: unknown series reference 40611 (max 40591); abort WAL restore" source="main.go:89" 

@fabxc for reference :)
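The log line says WAL replay hit a sample pointing at series reference 40611 while the `max` was 40591 (reading `max` as the highest series reference restored so far is an assumption). Below is a minimal Go sketch of that kind of check. The record types and the `replay` function are hypothetical stand-ins for illustration, not the actual tsdb WAL code, and the error text only mimics the shape of the real message:

```go
package main

import (
	"errors"
	"fmt"
)

// seriesRecord is a hypothetical WAL entry that introduces a new series.
type seriesRecord struct {
	Ref    uint64
	Labels string
}

// sampleRecord is a hypothetical WAL entry holding one sample for a series.
type sampleRecord struct {
	Ref uint64
	T   int64
	V   float64
}

// replay walks WAL entries in order and rejects any sample whose series
// reference was not introduced by an earlier series record.
func replay(entries []interface{}) (map[uint64]string, error) {
	series := map[uint64]string{}
	var maxRef uint64
	for _, e := range entries {
		switch r := e.(type) {
		case seriesRecord:
			series[r.Ref] = r.Labels
			if r.Ref > maxRef {
				maxRef = r.Ref
			}
		case sampleRecord:
			if _, ok := series[r.Ref]; !ok {
				return nil, fmt.Errorf("unknown series reference %d (max %d); abort WAL restore", r.Ref, maxRef)
			}
		default:
			return nil, errors.New("unexpected WAL entry")
		}
	}
	return series, nil
}

func main() {
	entries := []interface{}{
		seriesRecord{Ref: 1, Labels: `{__name__="up"}`},
		sampleRecord{Ref: 1, T: 1000, V: 1},
		sampleRecord{Ref: 2, T: 1000, V: 0}, // series 2 was never written
	}
	_, err := replay(entries)
	fmt.Println(err)
	// unknown series reference 2 (max 1); abort WAL restore
}
```

In this model replay is strictly sequential, so a sample is only valid if its series record was written to the WAL before it.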

@fabxc

Member

fabxc commented Jun 6, 2017

Seen this a couple of times myself. Not sure where it's coming from yet.

@gouthamve

Member

gouthamve commented Jun 7, 2017

@mwitkow Do you happen to have the offending WAL file / data dir? I haven't been able to reproduce this for some time now.

@gouthamve

Member

gouthamve commented Jun 8, 2017

One possible cause is a race condition, but I am not sure how Prometheus's usage could ever trigger it:

Suppose a HeadBlock already holds 100 series. We open two appenders, A1 and A2, in two goroutines; A1 appends 30 new series and A2 appends 60 new series, and some of those series are common to both.

Both commit at the same time, and the following happens in this order:

1. A2 executes createSeries().
2. A1 executes createSeries(); its common series reuse the IDs that A2 just created.
3. A1 persists its new labels and samples.
4. A2 persists its new labels and samples.

When reading it back, A1's records come first, and its samples reference IDs whose series records are only written later by A2, so the restore fails.

But we never have 2 appenders with the same series in Prometheus. Could this be due to the staleness changes, @fabxc?
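The four-step interleaving described above can be replayed deterministically in a single goroutine. This is a hypothetical model, not the tsdb code: `head` hands out series IDs from a counter the way createSeries() is described, each `appender` buffers its new series and samples until it persists them, and the WAL is just a slice. Two small label sets stand in for the 30 and 60 series:

```go
package main

import "fmt"

// Hypothetical model of the race described above; not the tsdb implementation.

// entry is a WAL record: either a new series or a sample for a series ref.
type entry struct {
	isSeries bool
	ref      uint64
	labels   string
}

// head models the series index: label set -> ref, plus the next free ref.
type head struct {
	refs    map[string]uint64
	nextRef uint64
}

// appender buffers the series it created and the refs it has samples for.
type appender struct {
	newSeries []entry
	samples   []uint64
}

// createSeries resolves each label set to a ref, allocating a new ref only
// when the head has not seen the label set (another appender may have).
func (h *head) createSeries(a *appender, labelSets []string) {
	for _, ls := range labelSets {
		ref, ok := h.refs[ls]
		if !ok {
			ref = h.nextRef
			h.nextRef++
			h.refs[ls] = ref
			a.newSeries = append(a.newSeries, entry{isSeries: true, ref: ref, labels: ls})
		}
		a.samples = append(a.samples, ref)
	}
}

// persist writes the appender's new series, then its samples, to the WAL.
func (a *appender) persist(wal *[]entry) {
	*wal = append(*wal, a.newSeries...)
	for _, ref := range a.samples {
		*wal = append(*wal, entry{ref: ref})
	}
}

// firstUnknownRef replays the WAL and returns the first sample ref that has
// no preceding series record; refs below `existing` are already known.
func firstUnknownRef(wal []entry, existing uint64) (uint64, bool) {
	known := map[uint64]bool{}
	for ref := uint64(0); ref < existing; ref++ {
		known[ref] = true
	}
	for _, e := range wal {
		if e.isSeries {
			known[e.ref] = true
		} else if !known[e.ref] {
			return e.ref, true
		}
	}
	return 0, false
}

func main() {
	const existing = 100 // series already in the head block (refs 0..99)
	h := &head{refs: map[string]uint64{}, nextRef: existing}
	a1, a2 := &appender{}, &appender{}
	var wal []entry

	h.createSeries(a2, []string{"b", "shared"}) // 1. A2 creates its series
	h.createSeries(a1, []string{"a", "shared"}) // 2. A1 reuses A2's ref for "shared"
	a1.persist(&wal)                            // 3. A1 writes first
	a2.persist(&wal)                            // 4. A2 writes second

	if ref, bad := firstUnknownRef(wal, existing); bad {
		fmt.Printf("unknown series reference %d\n", ref)
	}
}
```

With these inputs the sketch prints `unknown series reference 101`: A1's sample for `shared` lands in the WAL before A2's series record for ref 101. Persisting A2's records before A1's lets every reference resolve, which is why the gap between createSeries() and the WAL write matters in this model.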

@brian-brazil

Member

brian-brazil commented Jul 14, 2017

"But we never have 2 appenders with the same series in Prometheus."

This can and does happen.

@fabxc

Member

fabxc commented Jul 14, 2017

@brian-brazil

Member

brian-brazil commented Jul 14, 2017

sgtm

@lock


lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019
