
Support priority between jobs during throttling #3179

Closed
kaoet opened this Issue Sep 15, 2017 · 2 comments

Comments

Projects
None yet
2 participants
@kaoet
Copy link

kaoet commented Sep 15, 2017

What did you do?
I configured Prometheus to scrape both itself and other targets.

When the underlying storage degrades, Prometheus enters throttled mode.

What did you expect to see?
I want to be able to configure the prometheus job with a higher priority than any other job.

Prometheus would then scrape itself at high priority and other targets at a lower priority, so that I could keep monitoring prometheus_local_storage_persistence_urgency_score and rescue Prometheus sooner.
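
For illustration, an alerting rule on this metric in the legacy 1.x rule syntax could look something like the following; the rule name and the 0.8 threshold are my own choices, not anything Prometheus ships with:

# Illustrative rule (legacy 1.x syntax); name and threshold chosen for this example.
# Warn before the score reaches 1, where scrapes and rule evaluations start being skipped.
ALERT PrometheusUrgencyScoreHigh
  IF prometheus_local_storage_persistence_urgency_score > 0.8
  FOR 5m
  LABELS { severity = "warning" }
  ANNOTATIONS {
    summary = "Prometheus persistence urgency score is high",
    description = "The urgency score has been above 0.8 for 5 minutes; throttling may follow.",
  }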

What did you see instead? Under which circumstances?
I cannot configure job priority.

Prometheus scrapes all jobs at the same priority, so the prometheus job itself rarely gets scraped. As a result, the graph of prometheus_local_storage_persistence_urgency_score has gaps.

Environment

  • System information:
    Linux 4.4.0-93-generic x86_64

  • Prometheus version:
    prometheus, version 1.7.1 (branch: master, revision: 3afb3ff)
    build user: root@0aa1b7fc430d
    build date: 20170612-11:44:05
    go version: go1.8.3

  • Prometheus configuration file:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
- job_name: 'prometheus'
  scrape_interval: 5s
  static_configs:
  - targets: ['localhost:9090']

- job_name: 'some-other-jobs'
  ... ...

  • Logs:
time="2017-09-15T02:17:02Z" level=warning msg="Storage has entered rushed mode." chunksToPersist=0 memoryChunks=167413 source="storage.go:1867" urgencyScore=1 
time="2017-09-15T02:17:02Z" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=0 memoryChunks=167422 source="storage.go:1007" urgencyScore=1 
brian-brazil commented Sep 15, 2017

If your Prometheus is throttling, your monitoring is likely so badly broken that this is just rearranging deck chairs on the Titanic. For meta-monitoring, you should have another Prometheus to scrape it rather than always depending on it being healthy enough to do so itself.
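
A minimal scrape configuration for such a second, meta-monitoring Prometheus might look roughly like this; the target hostname is a placeholder for the address of the server being watched:

# Illustrative config for a separate meta-monitoring Prometheus instance.
# 'primary-prometheus:9090' is a placeholder, not a real host.
global:
  scrape_interval: 15s

scrape_configs:
- job_name: 'primary-prometheus'
  scrape_interval: 5s
  static_configs:
  - targets: ['primary-prometheus:9090']

That way the urgency score and the rest of the primary server's self-metrics stay visible even while it is throttling its own scrapes.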

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019
