Debugging cause of high CPU usage queries #4923

Closed
weeco opened this Issue Nov 27, 2018 · 7 comments

weeco commented Nov 27, 2018

I am reposting my question about debugging high-CPU queries here, since I haven't received any responses or ideas on Stack Overflow, nor from colleagues who also run Prometheus clusters. I want to figure out how to log the Prometheus queries that cause high CPU usage.

The use case and more details are given in this stackoverflow post: https://stackoverflow.com/questions/53432660/figuring-out-high-cpu-usage-queries-in-prometheus

I hope someone can help with some ideas. I saw an issue about debugging slow queries, but I am afraid it is not going to help me in the near future, because it has been open for nearly 3 years now: #1315.

jaredeis commented Dec 12, 2018

I'm new to supporting Prometheus and this is one of my concerns. There's a fine line between scaling and correct usage, and a way to determine what's hammering the CPU would help. We have quite a few people onboarding into our EKS architecture, which includes Istio, and we have a ton of metrics now. It would be great to know if some badly formed query on a dashboard somewhere is causing the hurt.

fanhaozzu commented Mar 1, 2019

Same concerns and problem

fanhaozzu commented Mar 1, 2019

Same concerns and problems

weeco (Author) commented Mar 1, 2019

We mitigated the issue by setting a very strict limit for max samples. Some dashboards broke because of the limit, but they were probably also the ones causing the high CPU usage, as we don't have these issues anymore.

Still, I think there should be a better way to profile the CPU usage.
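
As a rough illustration of the kind of profiling asked for here, the sketch below replays a list of candidate queries (for example, ones copied out of dashboards) against the Prometheus HTTP API and ranks them by wall-clock latency. Several things are assumptions and not taken from this thread: the server address `http://localhost:9090`, the helper names `run_instant_query` and `rank_by_latency`, and the use of instant queries via `/api/v1/query` where dashboards typically issue range queries. Latency is only a proxy for CPU cost, so treat the ranking as a hint, not a measurement.

```python
import time
import urllib.parse
import urllib.request


def run_instant_query(query, base_url="http://localhost:9090"):
    """Send one instant query to the Prometheus HTTP API and discard the body.

    base_url is an assumed address; point it at your own server.
    """
    url = base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=120) as resp:
        resp.read()


def rank_by_latency(queries, run=run_instant_query, clock=time.perf_counter):
    """Return (query, seconds) pairs, slowest first.

    Hypothetical helper: wall-clock time is used as a rough stand-in for cost.
    """
    timings = []
    for q in queries:
        start = clock()
        try:
            run(q)
        except Exception:
            # A failed query (for example one rejected by a sample limit)
            # still gets its elapsed time recorded.
            pass
        timings.append((q, clock() - start))
    return sorted(timings, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_by_latency(["up", "rate(http_requests_total[5m])"])` against a test instance would list the slower of the two first; the metric names in that call are placeholders.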

simonpasquier (Member) commented Mar 4, 2019

I'm closing it for now. If you have further questions, please use our user mailing list, which you can also search.

weeco (Author) commented Mar 7, 2019

@simonpasquier I believe this issue should not be closed, because the underlying problem still exists. I just found a workaround, without knowing whether it addressed the actual cause. Others may still have issues with other root causes.

simonpasquier (Member) commented Mar 8, 2019

#1315 already exists for a similar reason.
