Goroutine leak(s) #34

Closed
gebn opened this issue Oct 7, 2019 · 1 comment
Labels: bug (Something isn't working)

Comments

gebn (Owner) commented Oct 7, 2019

There is known to be one leak when requests are abandoned (K8s ingress restart).

Another only reveals itself when a single exporter is running.

gebn added the bug label on Oct 7, 2019
gebn (Owner, Author) commented Oct 7, 2019

pprof to the rescue. There are at least two leaks.

The first was lots of goroutines stuck in ServeHTTP(), sending a scrape request to the Target's channel. Grepping the goroutine dump for the Target pointer showed that these were actually many goroutines for a small number of targets, whose BMCs were simply slow. Running a single exporter created more contention, exacerbating the problem. The underlying issue is that the send has no way to terminate, even if Prometheus abandons the scrape. Interestingly, no goroutine was more than 45 minutes old; either this was the backlog, or some timeout applied (I presume the former).

This queue caused the second leak, where lots of goroutines were stuck in waitpoll. Because the request was still being served, the request goroutine could not terminate.

It's unclear whether the K8s ingress restart triggered this condition or whether it is a separate problem. The ingress will be restarted again at some point after this is fixed, so we can close this issue and keep an eye on goroutine counts, opening another issue later if there is still a bug.

gebn closed this as completed in 373c3ea on Oct 8, 2019