Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throttled scraping with small heap size doesn't get logged. #3011

Closed
discordianfish opened this Issue Aug 1, 2017 · 11 comments

Comments

Projects
None yet
4 participants
@discordianfish
Copy link
Member

discordianfish commented Aug 1, 2017

I'm running a tiny prometheus instance (v1.7.1, 50 targets, 2500 series, 70 samples/second). When setting -storage.local.target-heap-size "too low" Prometheus silently stops scraping my targets. Some targets got scraped but became stale for >10 minutes while others never got scraped at all.
In my case -storage.local.target-heap-size 50000000 is 'too low' while 90000000 works without problems.

For such tiny setup 90MB heap size seems already quite a lot, but either way Prometheus should log something in this case.

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Aug 1, 2017

That's intended behavior. If your desired heap size cannot support your performance demands, scrapes are being skipped. Skipping that many is unfortunate of course, but I don't think there's a feasible way to change this right now.

prometheus_target_skipped_scrapes_total should indicate this. Not sure whether we should log here. Seems like something that can very easily become the next source of extreme logging noise.
Setting up an alert on the metric should be a better alternative – though I'm not 100% sure right now whether alerts would still work in this case as they also write samples

@beorn7

@discordianfish

This comment has been minimized.

Copy link
Member Author

discordianfish commented Aug 1, 2017

@fabxc Ah didn't know about that metric. Well, monitoring prometheus_target_skipped_scrapes_total is okay I suppose. Maybe we should provide a set of example alerts for all these things? My main concern is that right now the average user won't know about these and have a hard time realizing what is going on.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Aug 1, 2017

I think #1481 would help with a lot of these in user support terms.

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Aug 7, 2017

Are you sure nothing was logged? Prometheus should log Storage needs throttling. Scrapes and rule evaluations will be skipped.
Perhaps there is a cornercase for such small instances that it doesn't get triggered. But I doubt that.

@discordianfish could you please double check that the above log message is not logged? It should be on error level. And please also check if you ever see Storage does not need throttling anymore. on info level.

@beorn7 beorn7 changed the title Low target heap size causes scrapes to stall Throttled scraping with small heap size does not appear to get logged. Aug 7, 2017

@discordianfish

This comment has been minimized.

Copy link
Member Author

discordianfish commented Aug 7, 2017

@beorn7 Just confirmed, it's not logging that. It's easy to reproduce too, just set the heap too small. With low heap size you see frequent frequent sweeps logged but no throttling messages:

time="2017-08-07T16:29:14Z" level=info msg="Completed full maintenance sweep through 34 in-memory fingerprints in 609.702µs." source="storage.go:1398" 
time="2017-08-07T16:29:24Z" level=info msg="Completed full maintenance sweep through 34 in-memory fingerprints in 1.984457ms." source="storage.go:1398" 
time="2017-08-07T16:29:34Z" level=info msg="Completed full maintenance sweep through 34 in-memory fingerprints in 1.825138ms." source="storage.go:1398" 
@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Aug 7, 2017

Eeeenteresting…

@beorn7 beorn7 changed the title Throttled scraping with small heap size does not appear to get logged. Throttled scraping with small heap size doesn't get logged. Aug 7, 2017

@beorn7 beorn7 self-assigned this Aug 7, 2017

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented Aug 18, 2017

@discordianfish I tried to reproduce as you described, but I do get the correct logging:

prometheus -storage.local.target-heap-size 50000000
INFO[0000] Starting prometheus (version=, branch=, revision=)  source="main.go:88"
INFO[0000] Build context (go=go1.8.1, user=, date=)      source="main.go:89"
INFO[0000] Host details (Linux 4.4.0-92-generic #115-Ubuntu SMP Thu Aug 10 09:04:33 UTC 2017 x86_64 reasonableresolution (none))  source="main.go:90"
INFO[0000] Loading configuration file prometheus.yml     source="main.go:252"
INFO[0000] Loading series map and head chunks...         source="storage.go:428"
INFO[0000] 569 series loaded.                            source="storage.go:439"
INFO[0000] Listening on :9090                            source="web.go:259"
INFO[0000] Starting target manager...                    source="targetmanager.go:63"
WARN[0016] Storage has entered rushed mode.              chunksToPersist=0 memoryChunks=582 source="storage.go:1867" urgencyScore=1
ERRO[0016] Storage needs throttling. Scrapes and rule evaluations will be skipped.  chunksToPersist=0 memoryChunks=582 source="storage.go:1007" urgencyScore=1
INFO[0020] Completed initial partial maintenance sweep through 154 in-memory fingerprints in 10.004917421s.  source="storage.go:1398"
INFO[0030] Completed full maintenance sweep through 582 in-memory fingerprints in 12.477596ms.  source="storage.go:1398"
INFO[0040] Completed full maintenance sweep through 582 in-memory fingerprints in 20.384172ms.  source="storage.go:1398"
INFO[0050] Completed full maintenance sweep through 582 in-memory fingerprints in 18.831748ms.  source="storage.go:1398"

As you can see, line starting ERRO[0016] tells about the throttling.
I need more information to reproduce.
From staring at the code, I can hardly see a way this does not get logged (unless the logging goroutine never gets scheduled or stdout is blocked somehow...)

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Aug 21, 2017

@discordianfish Can you confirm prometheus_target_skipped_scrapes_total is not being incremented?

@discordianfish

This comment has been minimized.

Copy link
Member Author

discordianfish commented Aug 21, 2017

It was incremented and I can't reproduce the behavior anymore. Now I see storage throttling being logged.
Maybe I missed the 'Storage needs throttling' message or runsvdir somehow ate it. Going to close this for now and will re-open with something to reproduce it if I run into this again.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Aug 21, 2017

prometheus_target_skipped_scrapes_total happens if scrapes aren't happening fast enough, usually due to being starved of resources - the storage here I guess.

@lock

This comment has been minimized.

Copy link

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.