Panic in LabelValues sorting (Prometheus 2.0) #3217

Closed
grobie opened this Issue Sep 25, 2017 · 5 comments

grobie commented Sep 25, 2017

What did you do?

Ran Prometheus v2.0.0-beta.5 (plus a tiny patch, #3215) against our node_exporter infrastructure, which means very static targets and almost no time series churn.

What did you expect to see?

Stable Prometheus servers, without any gaps in data.

What did you see instead? Under which circumstances?

I first noticed that our graphs had gaps. I then realized that these gaps were the result of crashes. All crashes seem to happen at the beginning of a compaction cycle.

Environment

  • System information:

Linux 4.4.10+soundcloud #1 SMP Thu Jun 16 15:17:20 UTC 2016 x86_64 GNU/Linux

  • Prometheus version:
prometheus, version 2.0.0-beta.5.sc0 (branch: grobie/http-accept, revision: 5964c0dc6f460d12ba8df62cde5849a62b24648d)
  build user:       grobie@grobook
  build date:       20170922-16:11:02
  go version:       go1.9
  • Logs:

The full stack trace is over 50k lines long. The trace of the crashing goroutine:

2017-09-25_13:05:07.99418 unexpected fault address 0x7f411bf728a3
2017-09-25_13:05:07.99421 fatal error: fault
2017-09-25_13:05:07.99748 [signal SIGSEGV: segmentation violation code=0x1 addr=0x7f411bf728a3 pc=0x45c232]
2017-09-25_13:05:07.99749 
2017-09-25_13:05:07.99750 goroutine 9874 [running]:
2017-09-25_13:05:07.99750 runtime.throw(0x1babb6f, 0x5)
2017-09-25_13:05:07.99750       /usr/lib/go/src/runtime/panic.go:605 +0x95 fp=0xc726366540 sp=0xc726366520 pc=0x42bc55
2017-09-25_13:05:07.99751 runtime.sigpanic()
2017-09-25_13:05:07.99751       /usr/lib/go/src/runtime/signal_unix.go:374 +0x227 fp=0xc726366590 sp=0xc726366540 pc=0x4426c7
2017-09-25_13:05:07.99752 runtime.cmpbody()
2017-09-25_13:05:07.99752       /usr/lib/go/src/runtime/asm_amd64.s:1561 +0xa2 fp=0xc726366598 sp=0xc726366590 pc=0x45c232
2017-09-25_13:05:07.99752 sort.StringSlice.Less(...)
2017-09-25_13:05:07.99752       /usr/lib/go/src/sort/sort.go:337
2017-09-25_13:05:07.99753 sort.(*StringSlice).Less(0xc6e12fb2c0, 0x0, 0x63d, 0xf00)
2017-09-25_13:05:07.99754       <autogenerated>:1 +0x85 fp=0xc7263665d0 sp=0xc726366598 pc=0x4dd985
2017-09-25_13:05:07.99754 sort.doPivot(0x28a07a0, 0xc6e12fb2c0, 0x0, 0x1fd0, 0x7f470df158b0, 0x0)
2017-09-25_13:05:07.99754       /usr/lib/go/src/sort/sort.go:123 +0x144 fp=0xc726366660 sp=0xc7263665d0 pc=0x4dae04
2017-09-25_13:05:07.99755 sort.quickSort(0x28a07a0, 0xc6e12fb2c0, 0x0, 0x1fd0, 0x1a)
2017-09-25_13:05:07.99755       /usr/lib/go/src/sort/sort.go:192 +0x8a fp=0xc7263666b8 sp=0xc726366660 pc=0x4db3aa
2017-09-25_13:05:07.99756 sort.Sort(0x28a07a0, 0xc6e12fb2c0)
2017-09-25_13:05:07.99756       /usr/lib/go/src/sort/sort.go:220 +0x79 fp=0xc7263666f8 sp=0xc7263666b8 pc=0x4db5b9
2017-09-25_13:05:07.99757 sort.Strings(0xc5f95f6000, 0x1fd0, 0x2600)
2017-09-25_13:05:07.99758       /usr/lib/go/src/sort/sort.go:353 +0x6d fp=0xc726366740 sp=0xc7263666f8 pc=0x4dbd7d
2017-09-25_13:05:07.99759 github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*headIndexReader).LabelValues(0xc6e12fb0a0, 0xc4e318e140, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0)
2017-09-25_13:05:07.99759       /home/grobie/code/go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:766 +0x24d fp=0xc726366830 sp=0xc726366740 pc=0x15882cd
2017-09-25_13:05:07.99760 github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*postingsReader).selectSingle(0xc726366b00, 0x2891aa0, 0xc6e12fb220, 0xc4205cb500, 0x7f470df158b0)
2017-09-25_13:05:07.99761       /home/grobie/code/go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/querier.go:251 +0x1c8 fp=0xc726366990 sp=0xc726366830 pc=0x1597188
2017-09-25_13:05:07.99761 github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*postingsReader).Select(0xc726366b00, 0xc6e12fb1a0, 0x2, 0x2, 0x20, 0xc6e12fb240, 0x7f4108a861c8, 0x0, 0x411108)
2017-09-25_13:05:07.99763       /home/grobie/code/go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/querier.go:202 +0x130 fp=0xc726366a78 sp=0xc726366990 pc=0x1596920
2017-09-25_13:05:07.99763 github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*blockQuerier).Select(0xc47a54e080, 0xc6e12fb1a0, 0x2, 0x2, 0xc6e12fb240, 0xc5d0c00d20)
2017-09-25_13:05:07.99764       /home/grobie/code/go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/querier.go:135 +0x92 fp=0xc726366b20 sp=0xc726366a78 pc=0x15960c2
2017-09-25_13:05:07.99768 github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*querier).sel(0xc6e12fb080, 0xc4e318e0f0, 0x1, 0x1, 0xc6e12fb1a0, 0x2, 0x2, 0x1, 0xc6e12fb240)
2017-09-25_13:05:07.99769       /home/grobie/code/go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/querier.go:94 +0x1d3 fp=0xc726366b90 sp=0xc726366b20 pc=0x1595c23
2017-09-25_13:05:07.99769 github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*querier).Select(0xc6e12fb080, 0xc6e12fb1a0, 0x2, 0x2, 0x0, 0x2)
2017-09-25_13:05:07.99770       /home/grobie/code/go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/querier.go:85 +0x64 fp=0xc726366be8 sp=0xc726366b90 pc=0x1595a24
2017-09-25_13:05:07.99770 github.com/prometheus/prometheus/storage/tsdb.querier.Select(0x28a17a0, 0xc6e12fb080, 0xc691507f50, 0x2, 0x2, 0x1, 0xc6e12fb180)
2017-09-25_13:05:07.99771       /home/grobie/code/go/src/github.com/prometheus/prometheus/storage/tsdb/tsdb.go:166 +0x1c7 fp=0xc726366c98 sp=0xc726366be8 pc=0x15aab67
2017-09-25_13:05:07.99771 github.com/prometheus/prometheus/storage/tsdb.(*querier).Select(0xc4e318e100, 0xc691507f50, 0x2, 0x2, 0x0, 0x2)
2017-09-25_13:05:07.99772       <autogenerated>:1 +0x63 fp=0xc726366ce0 sp=0xc726366c98 pc=0x15ac023
2017-09-25_13:05:07.99773 github.com/prometheus/prometheus/storage.(*mergeQuerier).Select(0xc6e12fb000, 0xc691507f50, 0x2, 0x2, 0x60, 0xc6a977ff80)
2017-09-25_13:05:07.99774       /home/grobie/code/go/src/github.com/prometheus/prometheus/storage/fanout.go:183 +0x105 fp=0xc726366d88 sp=0xc726366ce0 pc=0x155afd5
2017-09-25_13:05:07.99775 github.com/prometheus/prometheus/promql.(*Engine).populateIterators.func2(0x2885da0, 0xc5222e48c0, 0x0)
2017-09-25_13:05:07.99775       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/engine.go:530 +0x579 fp=0xc726366e80 sp=0xc726366d88 pc=0x15d3629
2017-09-25_13:05:07.99776 github.com/prometheus/prometheus/promql.inspector.Visit(0xc46e83d830, 0x2885da0, 0xc5222e48c0, 0x1, 0xc726366f70)
2017-09-25_13:05:07.99777       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/ast.go:306 +0x3a fp=0xc726366ea8 sp=0xc726366e80 pc=0x15ad2aa
2017-09-25_13:05:07.99778 github.com/prometheus/prometheus/promql.Walk(0x288d120, 0xc46e83d830, 0x2885da0, 0xc5222e48c0)
2017-09-25_13:05:07.99778       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/ast.go:255 +0x58 fp=0xc726366f30 sp=0xc726366ea8 pc=0x15ac938
2017-09-25_13:05:07.99779 github.com/prometheus/prometheus/promql.Walk(0x288d120, 0xc46e83d830, 0x288d0a0, 0xc6e12fb160)
2017-09-25_13:05:07.99779       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/ast.go:275 +0x1a8 fp=0xc726366fb8 sp=0xc726366f30 pc=0x15aca88
2017-09-25_13:05:07.99780 github.com/prometheus/prometheus/promql.Walk(0x288d120, 0xc46e83d830, 0x2885d20, 0xc6e12faea0)
2017-09-25_13:05:07.99781       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/ast.go:285 +0x5df fp=0xc726367040 sp=0xc726366fb8 pc=0x15acebf
2017-09-25_13:05:07.99782 github.com/prometheus/prometheus/promql.Walk(0x288d120, 0xc46e83d830, 0x2885c60, 0xc686591040)
2017-09-25_13:05:07.99783       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/ast.go:278 +0x47c fp=0xc7263670c8 sp=0xc726367040 pc=0x15acd5c
2017-09-25_13:05:07.99783 github.com/prometheus/prometheus/promql.Walk(0x288d120, 0xc46e83d830, 0x2885ce0, 0xc590de7f80)
2017-09-25_13:05:07.99784       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/ast.go:281 +0x7d9 fp=0xc726367150 sp=0xc7263670c8 pc=0x15ad0b9
2017-09-25_13:05:07.99785 github.com/prometheus/prometheus/promql.Walk(0x288d120, 0xc46e83d830, 0x288d0a0, 0xc6e12fb120)
2017-09-25_13:05:07.99786       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/ast.go:275 +0x1a8 fp=0xc7263671d8 sp=0xc726367150 pc=0x15aca88
2017-09-25_13:05:07.99786 github.com/prometheus/prometheus/promql.Walk(0x288d120, 0xc46e83d830, 0x2885d20, 0xc6e12fafa0)
2017-09-25_13:05:07.99787       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/ast.go:285 +0x5df fp=0xc726367260 sp=0xc7263671d8 pc=0x15acebf
2017-09-25_13:05:07.99788 github.com/prometheus/prometheus/promql.Inspect(0x2885d20, 0xc6e12fafa0, 0xc46e83d830)
2017-09-25_13:05:07.99789       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/ast.go:316 +0x4b fp=0xc726367290 sp=0xc726367260 pc=0x15ad34b
2017-09-25_13:05:07.99789 github.com/prometheus/prometheus/promql.(*Engine).populateIterators(0xc42055ff20, 0x7f470de80580, 0xc81f14a000, 0xc686591180, 0x411bbc, 0xc4202ec540, 0xc7263673e8, 0x43c81c)
2017-09-25_13:05:07.99790       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/engine.go:515 +0x3e6 fp=0xc726367398 sp=0xc726367290 pc=0x15b09a6
2017-09-25_13:05:07.99791 github.com/prometheus/prometheus/promql.(*Engine).execEvalStmt(0xc42055ff20, 0x7f470de80580, 0xc81f14a000, 0xc47a54e000, 0xc686591180, 0x0, 0x0, 0x0, 0x0)
2017-09-25_13:05:07.99792       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/engine.go:361 +0xf5 fp=0xc726367740 sp=0xc726367398 pc=0x15aee75
2017-09-25_13:05:07.99793 github.com/prometheus/prometheus/promql.(*Engine).exec(0xc42055ff20, 0x7f470de80580, 0xc81f14a000, 0xc47a54e000, 0x0, 0x0, 0x0, 0x0)
2017-09-25_13:05:07.99793       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/engine.go:341 +0x317 fp=0xc7263677b0 sp=0xc726367740 pc=0x15aeb47
2017-09-25_13:05:07.99794 github.com/prometheus/prometheus/promql.(*query).Exec(0xc47a54e000, 0x7f470dbb3408, 0xc420285a40, 0xbe6a5da0fb1c75e0)
2017-09-25_13:05:07.99795       /home/grobie/code/go/src/github.com/prometheus/prometheus/promql/engine.go:179 +0x94 fp=0xc726367850 sp=0xc7263677b0 pc=0x15ada04
2017-09-25_13:05:07.99796 github.com/prometheus/prometheus/rules.RecordingRule.Eval(0xc522601a00, 0x3d, 0x28963e0, 0xc52272a0e0, 0x298ee80, 0x0, 0x0, 0x7f470dbb3408, 0xc420285a40, 0xbe6a5da0fb1c75e0, ...)
2017-09-25_13:05:07.99797       /home/grobie/code/go/src/github.com/prometheus/prometheus/rules/recording.go:61 +0xf7 fp=0xc726367998 sp=0xc726367850 pc=0x15fe787
2017-09-25_13:05:07.99798 github.com/prometheus/prometheus/rules.(*RecordingRule).Eval(0xc52272c180, 0x7f470dbb3408, 0xc420285a40, 0xbe6a5da0fb1c75e0, 0x13c38805a6aa, 0x296bd00, 0xc42055ff20, 0xc420606400, 0x0, 0x0, ...)
2017-09-25_13:05:07.99799       <autogenerated>:1 +0xd0 fp=0xc726367a40 sp=0xc726367998 pc=0x1602970
2017-09-25_13:05:07.99800 github.com/prometheus/prometheus/rules.(*Group).Eval.func1(0x1bb152e, 0x9, 0xc522756100, 0xbe6a5da0fb1c75e0, 0x13c38805a6aa, 0x296bd00, 0x2, 0x28a0fa0, 0xc52272c180)
2017-09-25_13:05:07.99801       /home/grobie/code/go/src/github.com/prometheus/prometheus/rules/manager.go:307 +0x194 fp=0xc726367d60 sp=0xc726367a40 pc=0x16000f4
2017-09-25_13:05:07.99802 github.com/prometheus/prometheus/rules.(*Group).Eval(0xc522756100, 0xbe6a5da0fb1c75e0, 0x13c38805a6aa, 0x296bd00)
2017-09-25_13:05:07.99803       /home/grobie/code/go/src/github.com/prometheus/prometheus/rules/manager.go:375 +0xbf fp=0xc726367de0 sp=0xc726367d60 pc=0x15fbf4f
2017-09-25_13:05:07.99804 github.com/prometheus/prometheus/rules.(*Group).run.func1()
2017-09-25_13:05:07.99804       /home/grobie/code/go/src/github.com/prometheus/prometheus/rules/manager.go:182 +0x81 fp=0xc726367e30 sp=0xc726367de0 pc=0x15ffd81
2017-09-25_13:05:07.99805 github.com/prometheus/prometheus/rules.(*Group).run(0xc522756100)
2017-09-25_13:05:07.99805       /home/grobie/code/go/src/github.com/prometheus/prometheus/rules/manager.go:207 +0x23c fp=0xc726367fb0 sp=0xc726367e30 pc=0x15fb21c
2017-09-25_13:05:07.99806 github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig.func1.1(0xc420285a80, 0xc522756100)
2017-09-25_13:05:07.99807       /home/grobie/code/go/src/github.com/prometheus/prometheus/rules/manager.go:510 +0x46 fp=0xc726367fd0 sp=0xc726367fb0 pc=0x1601ec6
2017-09-25_13:05:07.99807 runtime.goexit()
2017-09-25_13:05:07.99807       /usr/lib/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc726367fd8 sp=0xc726367fd0 pc=0x45ca71
2017-09-25_13:05:07.99808 created by github.com/prometheus/prometheus/rules.(*Manager).ApplyConfig.func1
2017-09-25_13:05:07.99808       /home/grobie/code/go/src/github.com/prometheus/prometheus/rules/manager.go:505 +0x56
2017-09-25_13:05:07.99809 
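
The hexadecimal values in the frames above can be read as raw argument words. A minimal sketch, assuming the usual reading of Go tracebacks of this era in which a slice argument is printed as three words (data pointer, length, capacity); the helper names `parseArgs` and `decodeSlice` are hypothetical and exist only for this illustration. Applied to the `sort.Strings(0xc5f95f6000, 0x1fd0, 0x2600)` frame, it suggests that the slice being sorted held 8144 label values.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sliceHeader mirrors the three machine words of a Go slice as they are
// assumed to appear in a traceback: data pointer, length, capacity.
type sliceHeader struct {
	Ptr, Len, Cap uint64
}

// parseArgs (hypothetical helper) extracts the hexadecimal argument words
// from a traceback line such as "sort.Strings(0xc5f95f6000, 0x1fd0, 0x2600)".
func parseArgs(frame string) ([]uint64, error) {
	// Use the last "(" so that receivers like "(*StringSlice)" are skipped.
	open := strings.LastIndex(frame, "(")
	end := strings.LastIndex(frame, ")")
	if open < 0 || end < open {
		return nil, fmt.Errorf("no argument list in %q", frame)
	}
	var words []uint64
	for _, f := range strings.Split(frame[open+1:end], ",") {
		f = strings.TrimSpace(f)
		if f == "" || f == "..." {
			continue // elided or empty argument
		}
		v, err := strconv.ParseUint(strings.TrimPrefix(f, "0x"), 16, 64)
		if err != nil {
			return nil, err
		}
		words = append(words, v)
	}
	return words, nil
}

// decodeSlice (hypothetical helper) interprets the first three words as a
// slice header. It reports false if fewer than three words are present.
func decodeSlice(words []uint64) (sliceHeader, bool) {
	if len(words) < 3 {
		return sliceHeader{}, false
	}
	return sliceHeader{Ptr: words[0], Len: words[1], Cap: words[2]}, true
}

func main() {
	words, err := parseArgs("sort.Strings(0xc5f95f6000, 0x1fd0, 0x2600)")
	if err != nil {
		panic(err)
	}
	if h, ok := decodeSlice(words); ok {
		fmt.Printf("ptr=%#x len=%d cap=%d\n", h.Ptr, h.Len, h.Cap)
		// prints "ptr=0xc5f95f6000 len=8144 cap=9728"
	}
}
```

Under the same assumption, the `0x63d` in the `sort.(*StringSlice).Less` frame would be index 1597, which lies inside that length. That would fit a fault in the string bytes being compared rather than an out-of-range index.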

[Image: screenshot from 2017-09-25 16-09-47]

gouthamve commented Sep 25, 2017

Hmm, this looks like memory corruption, and the only place where I can see something going wrong is this: https://github.com/prometheus/prometheus/blob/dev-2.0/storage/tsdb/tsdb.go#L251-L257

I am not sure even this could cause a problem, as both definitions are the same. My simple tests revealed nothing wrong.

This is happening in headIndexReader, which means the faulty data does not come from persisted data but from data passed down from Prometheus. This kind of error is usually caused by compiler bugs or unsafe usage. Assuming it is actually caused by unsafe usage, the functions referenced above look the most suspicious.
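
For readers unfamiliar with the pattern discussed in the comment above, here is a minimal sketch of an `unsafe` conversion between two types whose definitions are the same. It assumes the referenced lines do something along these lines; the types `pkgALabel` and `pkgBLabel` and the function `toB` are hypothetical stand-ins, not the actual Prometheus or tsdb code.

```go
package main

import (
	"fmt"
	"unsafe"
)

// pkgALabel and pkgBLabel are hypothetical stand-ins for two label types
// with identical definitions that live in different packages.
type pkgALabel struct{ Name, Value string }
type pkgBLabel struct{ Name, Value string }

type pkgALabels []pkgALabel
type pkgBLabels []pkgBLabel

// toB reinterprets a pkgALabels slice as pkgBLabels without copying.
// This is only sound while both struct definitions stay identical in layout.
func toB(l pkgALabels) pkgBLabels {
	return *(*pkgBLabels)(unsafe.Pointer(&l))
}

func main() {
	a := pkgALabels{{"__name__", "node_cpu"}, {"instance", "host:9100"}}
	b := toB(a)
	fmt.Println(len(b), b[0].Name, b[1].Value) // prints "2 __name__ host:9100"

	// Both slices alias the same backing array, so a write through one
	// is visible through the other.
	a[0].Value = "changed"
	fmt.Println(b[0].Value) // prints "changed"
}
```

The conversion copies only the slice header, so the result shares its backing array and string data with the input. With identical definitions this behaves like a plain cast, which matches the observation that simple tests show nothing wrong.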

grobie commented Oct 2, 2017

Our Prometheus server scraping node_exporters in one of our datacenters has hit this problem 27 times during the last 7 days.

fabxc commented Oct 10, 2017

@grobie can this be considered fixed?

grobie commented Oct 10, 2017

None of our servers have experienced such a panic since rc.0.

grobie closed this Oct 10, 2017


lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
