Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock (deadlock?) on flushing #390

Closed
grobie opened this Issue Apr 13, 2014 · 3 comments

Comments

Projects
None yet
2 participants
@grobie
Copy link
Member

grobie commented Apr 13, 2014

A prometheus server got unusable (no more API queries possible) and needed a hard kill to get it back in a usable state. All metrics were lost since the moment the flushing started. That's the output of the log.

2014-04-13_16:25:24.90409 I0413 16:25:24.902636 21513 tiered.go:306] Flushing samples to disk...
2014-04-13_16:35:24.90230 W0413 16:35:24.902201 21513 tiered.go:273] Backlogging on flush...
2014-04-13_16:45:24.90230 W0413 16:45:24.902188 21513 tiered.go:273] Backlogging on flush...
2014-04-13_16:55:24.90231 W0413 16:55:24.902199 21513 tiered.go:273] Backlogging on flush...
2014-04-13_17:05:24.90255 W0413 17:05:24.902190 21513 tiered.go:273] Backlogging on flush...
2014-04-13_17:15:24.90229 W0413 17:15:24.902189 21513 tiered.go:273] Backlogging on flush...
2014-04-13_17:25:24.90230 W0413 17:25:24.902198 21513 tiered.go:273] Backlogging on flush...
2014-04-13_17:35:24.90253 W0413 17:35:24.902203 21513 tiered.go:273] Backlogging on flush...
2014-04-13_17:45:24.90229 W0413 17:45:24.902194 21513 tiered.go:273] Backlogging on flush...
2014-04-13_17:55:24.90231 W0413 17:55:24.902221 21513 tiered.go:273] Backlogging on flush...
2014-04-13_18:05:24.90229 W0413 18:05:24.902192 21513 tiered.go:273] Backlogging on flush...
2014-04-13_18:15:24.90229 W0413 18:15:24.902219 21513 tiered.go:273] Backlogging on flush...
2014-04-13_18:25:24.90098 I0413 18:25:24.900878 21513 main.go:316] Starting compaction...
2014-04-13_18:25:24.90120 W0413 18:25:24.900918 21513 main.go:106] Deferred compaction for 5m0s and %!s(int=500) due to existing operation.
2014-04-13_18:25:24.90168 I0413 18:25:24.900943 21513 main.go:322] Done
2014-04-13_18:25:24.90226 W0413 18:25:24.902193 21513 tiered.go:273] Backlogging on flush...
2014-04-13_18:35:24.90232 W0413 18:35:24.902199 21513 tiered.go:273] Backlogging on flush...
2014-04-13_18:45:24.90229 W0413 18:45:24.902192 21513 tiered.go:273] Backlogging on flush...
2014-04-13_18:55:24.90229 W0413 18:55:24.902191 21513 tiered.go:273] Backlogging on flush...
2014-04-13_19:05:24.90232 W0413 19:05:24.902200 21513 tiered.go:273] Backlogging on flush...
2014-04-13_19:15:24.90230 W0413 19:15:24.902185 21513 tiered.go:273] Backlogging on flush...
2014-04-13_19:25:24.90232 W0413 19:25:24.902231 21513 tiered.go:273] Backlogging on flush...
2014-04-13_19:35:24.90232 W0413 19:35:24.902216 21513 tiered.go:273] Backlogging on flush...
2014-04-13_19:45:24.90231 W0413 19:45:24.902219 21513 tiered.go:273] Backlogging on flush...
2014-04-13_19:55:24.90233 W0413 19:55:24.902228 21513 tiered.go:273] Backlogging on flush...
2014-04-13_20:05:24.90233 W0413 20:05:24.902227 21513 tiered.go:273] Backlogging on flush...
2014-04-13_20:15:24.90229 W0413 20:15:24.902191 21513 tiered.go:273] Backlogging on flush...
2014-04-13_20:25:24.90103 I0413 20:25:24.900917 21513 main.go:328] Starting deletion of stale values...
2014-04-13_20:25:24.90128 W0413 20:25:24.900961 21513 main.go:135] Deferred deletion for 360h0m0s due to existing operation.
2014-04-13_20:25:24.90159 I0413 20:25:24.900983 21513 main.go:334] Done
2014-04-13_20:25:24.90227 W0413 20:25:24.902183 21513 tiered.go:273] Backlogging on flush...
2014-04-13_20:35:24.90230 W0413 20:35:24.902194 21513 tiered.go:273] Backlogging on flush...
2014-04-13_20:45:24.90231 W0413 20:45:24.902202 21513 tiered.go:273] Backlogging on flush...
2014-04-13_20:55:24.90229 W0413 20:55:24.902195 21513 tiered.go:273] Backlogging on flush...
2014-04-13_21:05:24.90232 W0413 21:05:24.902200 21513 tiered.go:273] Backlogging on flush...
2014-04-13_21:15:24.90232 W0413 21:15:24.902229 21513 tiered.go:273] Backlogging on flush...
2014-04-13_21:22:46.19764 W0413 21:22:46.197011 21513 main.go:97] Received SIGINT/SIGTERM; Exiting gracefully...
2014-04-13_21:22:46.19766 I0413 21:22:46.197052 21513 targetmanager.go:120] Target manager exiting...
2014-04-13_21:25:24.90230 W0413 21:25:24.902194 21513 tiered.go:273] Backlogging on flush...

Prometheus: 0.3.0
OS: wheezy 7.0 amd64

@grobie grobie added bug labels Apr 13, 2014

@matttproud

This comment has been minimized.

Copy link
Member

matttproud commented Apr 13, 2014

As discussed over e-mail, this is likely the result of a deadlock.

@grobie

This comment has been minimized.

Copy link
Member Author

grobie commented Apr 13, 2014

It happened again. Same symptoms, SIGTERM wasn't respected, KILL necessary to get the server running again. I downgraded to version 0.2.1 for now, maybe there is a regression in 0.3.0.

2014-04-13_21:49:12.59679 I0413 21:49:12.596461 18216 tiered.go:306] Flushing samples to disk...
2014-04-13_21:59:12.59552 W0413 21:59:12.595408 18216 tiered.go:273] Backlogging on flush...
2014-04-13_22:06:39.53878 W0413 22:06:39.538145 18216 main.go:97] Received SIGINT/SIGTERM; Exiting gracefully...
2014-04-13_22:06:39.53880 I0413 22:06:39.538184 18216 targetmanager.go:120] Target manager exiting...

@juliusv juliusv closed this in 1b29975 Apr 17, 2014

simonpasquier pushed a commit to simonpasquier/prometheus that referenced this issue Oct 12, 2017

Merge pull request prometheus#390 from prometheus/remove-stars
Move GitHub star button to "Open Source" section.
@lock

This comment has been minimized.

Copy link

lock bot commented Mar 25, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 25, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.