rushed mode after decreasing retention time. #1740
After two hours of weird disk and CPU activity, we left rushed mode and the checkpoint time is slowly going down:
The config file is still the same, with only one target.
This is all "working as intended". Even if you have only one current target, you have a lot of older data in your storage. If you reduce the retention time by a lot, Prometheus will essentially rewrite all of your data into smaller files. That requires a lot of disk I/O. "Rushed mode" is a means to work through the backlog, so it's good, not bad, that Prometheus switches into rushed mode. What's bad is if it throttles ingestion. Apparently that happened in your case, but the log line you posted above tells us that throttling has just ended, so you might have suffered only a small amount of throttling.
Your checkpointing time seems OK to me, too. You had 5217499 chunks to persist. Those are the ones that have to be checkpointed. Your checkpoint file will be about 6GiB in size, so your checkpointing write speed is ~15MiB/s.
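To make the arithmetic behind those figures explicit, here is a minimal sketch. It only uses the numbers quoted in the comment (5217499 chunks, about 6GiB, ~15MiB/s). The 1024-byte chunk size is an assumption about the local storage format of that era, and the function names are hypothetical, not part of Prometheus.

```python
# Figures quoted in the comment above.
CHUNKS_TO_PERSIST = 5_217_499
CHECKPOINT_SIZE_GIB = 6   # "about 6GiB"
WRITE_SPEED_MIB_S = 15    # "~15MiB/s"

# Assumption (not stated in the thread): each chunk occupies 1024 bytes.
ASSUMED_CHUNK_BYTES = 1024


def implied_checkpoint_seconds(size_gib: float, speed_mib_s: float) -> float:
    """Seconds needed to write size_gib GiB at speed_mib_s MiB/s."""
    return size_gib * 1024 / speed_mib_s


def raw_chunk_payload_gib(chunks: int, chunk_bytes: int = ASSUMED_CHUNK_BYTES) -> float:
    """Size in GiB of the chunk data alone, ignoring any per-series overhead."""
    return chunks * chunk_bytes / 2**30


if __name__ == "__main__":
    secs = implied_checkpoint_seconds(CHECKPOINT_SIZE_GIB, WRITE_SPEED_MIB_S)
    print(f"implied checkpoint duration: {secs:.0f} s ({secs / 60:.1f} min)")
    print(f"raw chunk payload: {raw_chunk_payload_gib(CHUNKS_TO_PERSIST):.2f} GiB")
```

Under these assumptions, a checkpoint of about 6GiB written at ~15MiB/s takes roughly 410 seconds, a little under seven minutes. The chunk payload alone comes to just under 5GiB, which is in the same ballpark as the quoted file size.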
beorn7 closed this Jun 15, 2016
The problem is: I have the debug log level enabled and I still don't know what's going on with Prometheus :(
What are you missing?
lock bot commented Mar 24, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Andor commented Jun 14, 2016
I had ~1M time series with a retention time of 1440h and 470GB of data.
I decreased the retention time to 360h.
After applying this change, Prometheus is in rushed mode and never scrapes localhost.
Also, I have very long checkpointing:
According to atop, prometheus is constantly writing to and reading from the disks, with ~100% disk utilization:
After 15 minutes of running, prometheus shows incredible write values:
I'm using hardware RAID 10 with spinning SAS disks.
current config
running options:
before:
after:
prometheus version
I'm using prometheus 1.19.1 with custom service discovery via etcd