
delta and doubleDelta chunks should be created with minimum length so that data corruption can be handled more gracefully #1653

Closed
CrawX opened this Issue May 24, 2016 · 5 comments

CrawX commented May 24, 2016

I just got the following slice bounds out of range panic while viewing a graph that has previously worked without problems. I'm using the official 0.18.0 build (linux amd64) available from the release page.

I'm guessing this is irrelevant, but the query I'm executing is:
label_replace(sum(archive_bytes_written{node="archiving1"}) by (archivename), "archivename_short", "$1$2", "archivename","(standard)|archive-instance-(.*)")

time="2016-05-24T13:54:52Z" level=error msg="parser panic: runtime error: slice bounds out of range
goroutine 118294 [running]:
github.com/prometheus/prometheus/promql.(*evaluator).recover(0xc821f31920, 0xc82391b070)
    /go/src/github.com/prometheus/prometheus/promql/engine.go:526 +0xdd
github.com/prometheus/prometheus/storage/local.(*deltaEncodedChunk).firstTime(0xc823b8e660, 0x1546cf6965b)
    <autogenerated>:22 +0x1c5
github.com/prometheus/prometheus/storage/local.(*memorySeriesIterator).ValueAtOrBeforeTime.func1(0x10, 0x0)
    /go/src/github.com/prometheus/prometheus/storage/local/series.go:583 +0x81
sort.Search(0x11, 0xc82391a4a0, 0x0)
    /usr/local/go/src/sort/search.go:66 +0x52
github.com/prometheus/prometheus/storage/local.(*memorySeriesIterator).ValueAtOrBeforeTime(0xc82372ec80, 0x153ae455648, 0x0, 0x0)
    /go/src/github.com/prometheus/prometheus/storage/local/series.go:582 +0x2a7
github.com/prometheus/prometheus/storage/local.(*boundedIterator).ValueAtOrBeforeTime(0xc821f318e0, 0x153ae455648, 0x0, 0x0)
    /go/src/github.com/prometheus/prometheus/storage/local/storage.go:404 +0x79
github.com/prometheus/prometheus/promql.(*evaluator).vectorSelector(0xc821f31920, 0xc821f5ed00, 0x0, 0x0, 0x0)
    /go/src/github.com/prometheus/prometheus/promql/engine.go:685 +0x157
github.com/prometheus/prometheus/promql.(*evaluator).eval(0xc821f31920, 0x7fc534c1a268, 0xc821f5ed00, 0x0, 0x0)
    /go/src/github.com/prometheus/prometheus/promql/engine.go:675 +0x161e
github.com/prometheus/prometheus/promql.(*evaluator).evalVector(0xc821f31920, 0x7fc534c1a268, 0xc821f5ed00, 0x0, 0x0, 0x0)
    /go/src/github.com/prometheus/prometheus/promql/engine.go:548 +0x64
github.com/prometheus/prometheus/promql.(*evaluator).eval(0xc821f31920, 0x7fc534cdcac8, 0xc821f5ed40, 0x0, 0x0)
    /go/src/github.com/prometheus/prometheus/promql/engine.go:612 +0x6bc
github.com/prometheus/prometheus/promql.(*evaluator).evalVector(0xc821f31920, 0x7fc534cdcac8, 0xc821f5ed40, 0x0, 0x0, 0x0)
    /go/src/github.com/prometheus/prometheus/promql/engine.go:548 +0x64
github.com/prometheus/prometheus/promql.funcLabelReplace(0xc821f31920, 0xc823f68780, 0x5, 0x8, 0x0, 0x0)
    /go/src/github.com/prometheus/prometheus/promql/functions.go:792 +0x7e
github.com/prometheus/prometheus/promql.(*evaluator).eval(0xc821f31920, 0x7fc534cdca90, 0xc8235f2b00, 0x0, 0x0)
    /go/src/github.com/prometheus/prometheus/promql/engine.go:645 +0x1712
github.com/prometheus/prometheus/promql.(*evaluator).Eval(0xc821f31920, 0x7fc534cdca90, 0xc8235f2b00, 0x0, 0x0, 0x0, 0x0)
    /go/src/github.com/prometheus/prometheus/promql/engine.go:599 +0x9a
github.com/prometheus/prometheus/promql.(*Engine).execEvalStmt(0xc8201ddc40, 0x7fc534c1a398, 0xc824eaba40, 0xc821f5edc0, 0xc8236b6c30, 0x0, 0x0, 0x0, 0x0)
    /go/src/github.com/prometheus/prometheus/promql/engine.go:435 +0xa9b
github.com/prometheus/prometheus/promql.(*Engine).exec(0xc8201ddc40, 0xc821f5edc0, 0x0, 0x0, 0x0, 0x0)
    /go/src/github.com/prometheus/prometheus/promql/engine.go:357 +0x4ca
github.com/prometheus/prometheus/promql.(*query).Exec(0xc821f5edc0, 0xc822915cc0)
    /go/src/github.com/prometheus/prometheus/promql/engine.go:195 +0x2e
github.com/prometheus/prometheus/web/api/v1.(*API).queryRange(0xc82026ce40, 0xc82378a700, 0x0, 0x0, 0xffffffffffffffff)
    /go/src/github.com/prometheus/prometheus/web/api/v1/api.go:202 +0x679
github.com/prometheus/prometheus/web/api/v1.(*API).(github.com/prometheus/prometheus/web/api/v1.queryRange)-fm(0xc82378a700, 0x0, 0x0, 0x4)
    /go/src/github.com/prometheus/prometheus/web/api/v1/api.go:126 +0x38
github.com/prometheus/prometheus/web/api/v1.(*API).Register.func1.1(0x7fc534c1a208, 0xc8235f2aa0, 0xc82378a700)
    /go/src/github.com/prometheus/prometheus/web/api/v1/api.go:110 +0x4b
net/http.HandlerFunc.ServeHTTP(0xc8204ff050, 0x7fc534c1a208, 0xc8235f2aa0, 0xc82378a700)
    /usr/local/go/src/net/http/server.go:1422 +0x3a
github.com/prometheus/prometheus/util/httputil.CompressionHandler.ServeHTTP(0x7fc534d1a708, 0xc8204ff050, 0x7fc534c1a130, 0xc8228db9b8, 0xc82378a700)
    /go/src/github.com/prometheus/prometheus/util/httputil/compression.go:90 +0x97
github.com/prometheus/prometheus/util/httputil.(*CompressionHandler).ServeHTTP(0xc8204ff060, 0x7fc534c1a130, 0xc8228db9b8, 0xc82378a700)
    <autogenerated>:5 +0xb6
net/http.(Handler).ServeHTTP-fm(0x7fc534c1a130, 0xc8228db9b8, 0xc82378a700)
    /go/src/github.com/prometheus/prometheus/web/web.go:171 +0x50
github.com/prometheus/prometheus/vendor/github.com/prometheus/client_golang/prometheus.InstrumentHandlerFuncWithOpts.func1(0x7fc534c1a058, 0xc824eb1d90, 0xc82378a700)
    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/client_golang/prometheus/http.go:158 +0x335
github.com/prometheus/prometheus/vendor/github.com/prometheus/common/route.handle.func1(0x7fc534c1a058, 0xc824eb1d90, 0xc82378a700, 0x0, 0x0, 0x0)
    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/common/route/route.go:49 +0x417
github.com/prometheus/prometheus/vendor/github.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc8201ddc80, 0x7fc534c1a058, 0xc824eb1d90, 0xc82378a700)
    /go/src/github.com/prometheus/prometheus/vendor/github.com/julienschmidt/httprouter/router.go:299 +0x193
github.com/prometheus/prometheus/vendor/github.com/prometheus/common/route.(*Router).ServeHTTP(0xc8201e5060, 0x7fc534c1a058, 0xc824eb1d90, 0xc82378a700)
    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/common/route/route.go:107 +0x42
net/http.serverHandler.ServeHTTP(0xc8200766c0, 0x7fc534c1a058, 0xc824eb1d90, 0xc82378a700)
    /usr/local/go/src/net/http/server.go:1862 +0x19e
net/http.(*conn).serve(0xc823434000)
    /usr/local/go/src/net/http/server.go:1361 +0xbee
created by net/http.(*Server).Serve
    /usr/local/go/src/net/http/server.go:1910 +0x3f6
" source="engine.go:528"

I manually replaced the escaped \n and \t sequences with actual newlines and tabs in the output. GitHub doesn't format the backtrace very well, though, so you can also find it here: http://pastebin.com/Qj31Qz8W.

I suspect there is some kind of file corruption here, but I assume that should not lead to a panic.

brian-brazil commented May 24, 2016

That looks like data corruption alright. @beorn7

brian-brazil commented May 24, 2016

There's a good chance that the disk this was on filled up at some point; a full disk is the only currently known source of data corruption we're aware of.

beorn7 commented May 24, 2016

Yes, this is definitely data corruption.

I have an idea why it is not handled more gracefully.

I'll change the title accordingly.

@beorn7 beorn7 self-assigned this May 24, 2016

@beorn7 beorn7 changed the title Slice bounds out of range panic while viewing graph delta and doubleDelta chunks should be created with minimum length so that data corruption can be handled more gracefully May 24, 2016

beorn7 commented May 24, 2016

Essentially: the used part of a delta or double-delta chunk is encoded in the chunk itself. If that part is corrupted, a chunk may be created whose claimed used length, out of its fixed 1k size, is not even long enough for the header to fit in completely.

Access to header fields can then lead to the kind of panic reported here.

The solution is easy: create chunks with a minimum length, or quarantine the series if a chunk is found that claims to be shorter than possible.
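
To illustrate the failure mode and the proposed guard, here is a minimal, hypothetical Go sketch. It is not the actual Prometheus storage code: the header size, the offset of the length field, and all names are assumptions. The idea is simply that a chunk whose self-recorded used length is smaller than its own header is rejected up front, instead of panicking later when header fields are read.

package main

import (
	"encoding/binary"
	"fmt"
)

const (
	chunkLen        = 1024 // chunks have a fixed 1k size
	deltaHeaderSize = 21   // assumed header size for this sketch
	lenOffset       = 19   // assumed offset of the 2-byte "used length" field
)

type deltaChunk []byte

// usedLen returns the used length as recorded inside the chunk itself.
func (c deltaChunk) usedLen() int {
	return int(binary.LittleEndian.Uint16(c[lenOffset:]))
}

// newCheckedChunk applies the minimum-length rule proposed in this issue:
// a chunk whose recorded length is smaller than its own header is corrupt
// and gets rejected with an error instead of being used.
func newCheckedChunk(buf []byte) (deltaChunk, error) {
	if len(buf) < deltaHeaderSize {
		return nil, fmt.Errorf("chunk data has only %d bytes, header needs %d", len(buf), deltaHeaderSize)
	}
	c := deltaChunk(buf)
	if l := c.usedLen(); l < deltaHeaderSize {
		return nil, fmt.Errorf("chunk claims used length %d, below header size %d: data corruption", l, deltaHeaderSize)
	}
	return c, nil
}

func main() {
	corrupt := make([]byte, chunkLen)
	// Simulate corruption: the recorded used length is smaller than the header.
	binary.LittleEndian.PutUint16(corrupt[lenOffset:], 5)
	if _, err := newCheckedChunk(corrupt); err != nil {
		fmt.Println("rejected:", err) // graceful error instead of a slice-bounds panic
	}
}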

dmilstein added a commit to dmilstein/prometheus that referenced this issue Aug 30, 2016

Catch errors when unmarshalling delta/doubleDelta encoded chunks
This is (hopefully) a fix for prometheus#1653

Specifically, this makes it so that if the length for the stored
delta/doubleDelta is somehow corrupted to be too small, the attempt to
unmarshal will return an error.

The current (broken) behavior is to return a malformed chunk, which can
then lead to a panic when there is an attempt to read header values.

The referenced issue proposed creating chunks with a minimum length -- I
instead opted to just error on the attempt to unmarshal, since I'm not
clear on how it could be safe to proceed when the length is
incorrect/unknown.

The issue also talked about possibly "quarantining series", but I don't
know the surrounding code well enough to understand how to make that
happen.
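
A minimal sketch of the approach the commit describes, not the actual patch: the unmarshal step itself rejects a stored length that is too small (or too large), so the corruption surfaces as an error rather than as a panic later on. Function and field names are assumed, and it reuses the hypothetical deltaChunk type and constants from the sketch above.

// unmarshalFromBuf validates the stored length before trusting it, returning
// an error for corrupted data instead of producing a malformed chunk that
// later panics on header access.
func (c deltaChunk) unmarshalFromBuf(buf []byte) error {
	if len(buf) < deltaHeaderSize {
		return fmt.Errorf("got %d bytes, need at least %d for the chunk header", len(buf), deltaHeaderSize)
	}
	storedLen := int(binary.LittleEndian.Uint16(buf[lenOffset:]))
	if storedLen < deltaHeaderSize || storedLen > len(buf) {
		return fmt.Errorf("stored chunk length %d outside valid range [%d, %d]: refusing to unmarshal", storedLen, deltaHeaderSize, len(buf))
	}
	copy(c, buf[:storedLen])
	return nil
}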

@beorn7 beorn7 closed this Aug 30, 2016
