2.0.0-beta.3 long sluggish startup #3166
Update: This host is experiencing bad CPU steal, which is likely causing these issues. A duplicate host performed much better on initial startup. I'll continue to monitor both hosts and report anything I find, but I'm going to close this issue because it looks like it's just me. Sorry for the noise!
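For anyone who wants to rule out CPU steal on their own host before digging further, here is a minimal Python sketch. It assumes a Linux guest where the aggregate `cpu` line in `/proc/stat` carries the steal counter as its eighth value; the function names are illustrative and not part of any Prometheus tooling.

```python
import time


def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen between two aggregate `cpu` lines from /proc/stat."""

    def parse(line: str) -> list:
        fields = line.split()
        if not fields or fields[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(v) for v in fields[1:]]

    deltas = [b - a for a, b in zip(parse(before), parse(after))]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so sum only the first eight.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, which is the aggregate over all CPUs."""
    with open(path) as f:
        return f.readline()


def sample_steal(interval_s: float = 5.0) -> float:
    """Sample /proc/stat twice, interval_s seconds apart, and return the steal percentage."""
    first = read_cpu_line()
    time.sleep(interval_s)
    return steal_percent(first, read_cpu_line())
```

Calling `sample_steal()` on both the slow host and the duplicate host over the same window should show whether the two actually differ in steal.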
TimSimmons closed this Sep 13, 2017
lock bot commented Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
lock bot locked and limited conversation to collaborators Mar 23, 2019
TimSimmons commented Sep 13, 2017 (edited)
What did you do?
Installed Prometheus 2.0.0-beta.3 on a 64 GB RAM, 20 CPU cloud VM.

What did you expect to see?
Relatively stable collection of around 25-30k metrics/s across ~6.5 million time series from ~2600 targets.

What did you see instead? Under which circumstances?
When I start Prometheus, there is a long "spin-up" period where it seems to work on something and scrapes slowly before eventually getting up to speed. During this time (30+ minutes), the UI is very slow to respond (/targets taking >60s, simple queries timing out, /status taking 5+ seconds) and the CPU load is high. It takes over an hour before all targets have been scraped for the first time. Eventually the behavior levels out and becomes consistent with what you would expect.

Environment
System information:
Linux 4.4.0-78-generic x86_64

Prometheus version:

pprof svgs:
inuse: https://cdn.rawgit.com/TimSimmons/c6e25dbe3fcdffd9c6983c6cca6afef8/raw/96703350554d7e7aaf2c023a92eb91d7e7f7a815/inuse.svg
alloc_space: https://cdn.rawgit.com/TimSimmons/e3ff71417a269af0d21713c59fef807f/raw/b615602b721dae74b9be11a364e78f82bf230fe0/alloc.svg
cpuprofile for 30s after it has recovered: https://cdn.rawgit.com/TimSimmons/f33114c2d73e6205f391e46e03ab3259/raw/59429c87d024dfa488bb8be437903dc873842a66/healthycpu30s.svg
cpuprofile for 30s while starting up: https://cdn.rawgit.com/TimSimmons/53625cb5b30311f1def449755f6324f8/raw/9384f3acfd1e58bbb09dcc6a409841a260bc7688/startingup30s.svg
cpuprofile for 300s while starting up: https://cdn.rawgit.com/TimSimmons/69f5fdc6e23d3e375d3186e38471b673/raw/e15181ea014d9e2dc474b7e7446b5cf64f2ed2a9/startingup300s.svg
block while starting up: https://cdn.rawgit.com/TimSimmons/d9487c268e1bcde4818c19153e98eb3d/raw/8b9735202a1c82289e52daf550805f2d3ef86a69/block.svg
mutex while starting up: https://cdn.rawgit.com/TimSimmons/f7cac9d1fe0ed2b5a7c5c606b385486a/raw/50f1cc771bf202e5696174c322640533cdf6a629/mutex.svg

Graphs
Restarts are indicated by the local minima in the top graph.
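For anyone trying to gather comparable profiles to the pprof SVGs listed above, the following Python sketch builds URLs for the standard Go `net/http/pprof` endpoints, which Prometheus serves under `/debug/pprof/`. The base address `http://localhost:9090` and the helper name `pprof_url` are assumptions for illustration; the issue does not say how the linked SVGs were actually produced.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint names registered by Go's net/http/pprof package.
PPROF_KINDS = ("heap", "profile", "block", "mutex")


def pprof_url(base: str, kind: str, seconds: Optional[int] = None) -> str:
    """Build a /debug/pprof/ URL for one profile kind.

    `seconds` applies to the CPU profile ("profile"), which samples for that long.
    """
    if kind not in PPROF_KINDS:
        raise ValueError(f"unknown profile kind: {kind}")
    url = f"{base.rstrip('/')}/debug/pprof/{kind}"
    if seconds is not None:
        url += "?" + urlencode({"seconds": seconds})
    return url


# Hypothetical local instance; adjust to the real listen address.
startup_cpu_profile = pprof_url("http://localhost:9090", "profile", seconds=300)
```

A URL like this can be passed to `go tool pprof -svg <url>` to render an SVG. The inuse and alloc_space views both come from the `heap` endpoint and differ only in which sample type pprof displays.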
Note: I believe I saw similar behavior on beta.0, but it was a much shorter time, maybe 3-5 minutes. This often goes over an hour.

I also saw, after about 12 hours, that it ran up against my limit of 25k open files, which wasn't a problem before.
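To watch how close a process is getting to its open-file limit, here is a rough Python sketch. It assumes Linux, where `/proc/<pid>/limits` lists the soft and hard "Max open files" values and `/proc/<pid>/fd` holds one entry per open descriptor. The function names are made up for this example.

```python
import os
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Extract (soft, hard) open-file limits from the text of /proc/<pid>/limits.

    A value of None means the limit is reported as "unlimited".
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            soft, hard = line[len(prefix):].split()[:2]
            to_int = lambda v: None if v == "unlimited" else int(v)
            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid: int) -> Tuple[int, Optional[int]]:
    """Return (open descriptor count, soft limit) for a running process on Linux."""
    with open(f"/proc/{pid}/limits") as f:
        soft, _hard = parse_max_open_files(f.read())
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, soft
```

Prometheus also reports the same two numbers about itself through the `process_open_fds` and `process_max_fds` metrics, which may be easier to graph over a 12-hour window than polling `/proc`.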
This is new, every once in a while I'll get a few of these popping up: