Strange performance issues #3189

Closed
sashgorokhov opened this Issue Sep 18, 2017 · 3 comments

sashgorokhov commented Sep 18, 2017

For the last few days I've noticed a serious Prometheus performance problem. CPU usage is pegged at 100%, and memory and I/O usage have gone far beyond the limits I set (-storage.local.target-heap-size at 150 MB). I've also noticed periodic gaps in metric values in Grafana. At first I thought it was some kind of bug, so I stopped Prometheus, removed its volume, and went a day or two without it. Yesterday I launched it again, and again I had to shut it down because of the same insane resource usage.
If you need additional info, please let me know. You may close this issue, but I'd really like to know what went wrong with my poor little Prometheus server.

P.S. Could the flag -storage.local.retention=604800s, which I added recently, be what has driven Prometheus insane?

Environment

Docker containers

  prometheus:
    image: prom/prometheus
    restart: unless-stopped
    network_mode: host
    command:
      - -config.file=/etc/prometheus/prometheus.yml
      - -storage.local.path=/prometheus
      - -web.console.libraries=/etc/prometheus/console_libraries
      - -web.console.templates=/etc/prometheus/consoles
      - -storage.local.target-heap-size=157286400 # 150mb
      - -storage.local.retention=604800s # 7 days
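
As a quick sanity check on the two storage flag values above (an illustrative Python snippet, not part of the original report), the raw byte and second counts do match the inline comments:

# Sanity-check the raw values passed to the Prometheus 1.x storage flags
# (illustrative only; the constants simply mirror the compose file above).
target_heap_bytes = 150 * 1024 * 1024   # 150 MiB
retention_seconds = 7 * 24 * 60 * 60    # 7 days
assert target_heap_bytes == 157286400   # -storage.local.target-heap-size
assert retention_seconds == 604800      # -storage.local.retention=604800s
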
  • System information:
time="2017-09-18T16:29:42Z" level=info msg="Starting prometheus (version=1.7.1, branch=master, revision=3afb3fffa3a29c3de865e1172fb740442e9d0133)" source="main.go:88" 
time="2017-09-18T16:29:42Z" level=info msg="Build context (go=go1.8.3, user=root@0aa1b7fc430d, date=20170612-11:44:05)" source="main.go:89" 
time="2017-09-18T16:29:42Z" level=info msg="Host details (Linux 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 pikaspy (none))" source="main.go:90" 
  • Prometheus configuration file:
global:
  scrape_interval:     15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  - job_name: redis
    static_configs:
      - targets: ['127.0.0.1:9121']

  - job_name: node
    static_configs:
      - targets: ['127.0.0.1:9122']

  - job_name: nginx
    static_configs:
      - targets: ['127.0.0.1:9123']

  - job_name: cadvisor
    static_configs:
      - targets: ['127.0.0.1:9124']

  - job_name: scheduler
    static_configs:
      - targets: ['127.0.0.1:9125']

  - job_name: pushgateway
    honor_labels: true
    static_configs:
      - targets: ['127.0.0.1:9126']

  - job_name: web
    static_configs:
      - targets: ['127.0.0.1:9127']
  • Logs:
time="2017-09-18T16:29:52Z" level=warning msg="Storage has entered rushed mode." chunksToPersist=0 memoryChunks=72954 source="storage.go:1867" urgencyScore=1 
time="2017-09-18T16:29:52Z" level=info msg="Completed initial partial maintenance sweep through 20582 in-memory fingerprints in 253.988099ms." source="storage.go:1398" 
time="2017-09-18T16:29:54Z" level=info msg="Completed full maintenance sweep through 72954 in-memory fingerprints in 1.343759207s." source="storage.go:1398" 
time="2017-09-18T16:29:55Z" level=info msg="Completed full maintenance sweep through 72954 in-memory fingerprints in 846.866976ms." source="storage.go:1398" 
time="2017-09-18T16:29:56Z" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=0 memoryChunks=72954 source="storage.go:1007" urgencyScore=1 
time="2017-09-18T16:31:50Z" level=info msg="Completed full maintenance sweep through 72954 in-memory fingerprints in 866.210736ms." source="storage.go:1398" 
time="2017-09-18T16:31:52Z" level=info msg="Completed full maintenance sweep through 72954 in-memory fingerprints in 1.345504189s." source="storage.go:1398" 
time="2017-09-18T16:31:53Z" level=info msg="Completed full maintenance sweep through 72954 in-memory fingerprints in 818.321266ms." source="storage.go:1398" 
time="2017-09-18T16:31:55Z" level=info msg="Completed full maintenance sweep through 72954 in-memory fingerprints in 1.458991238s." source="storage.go:1398" 
time="2017-09-18T16:31:56Z" level=info msg="Completed full maintenance sweep through 72954 in-memory fingerprints in 858.181051ms." source="storage.go:1398" 
time="2017-09-18T16:31:58Z" level=info msg="Completed full maintenance sweep through 72954 in-memory fingerprints in 1.488280281s." source="storage.go:1398" 
time="2017-09-18T16:31:59Z" level=info msg="Completed full maintenance sweep through 72954 in-memory fingerprints in 950.765037ms." source="storage.go:1398" 
time="2017-09-18T16:32:01Z" level=info msg="Completed full maintenance sweep through 72954 in-memory fingerprints in 1.986861457s." source="storage.go:1398" 
time="2017-09-18T16:32:03Z" level=info msg="Completed full maintenance sweep through 72954 in-memory fingerprints in 1.209511563s." source="storage.go:1398" 
^ repeats 10-20 times

beorn7 commented Sep 19, 2017

150MiB is way too little for the 73k time series you have in your setup. Prometheus stops ingestion because it has hit the memory limit you have given to it.

Since this is not a bug but more a discussion about how to use Prometheus correctly in a certain scenario, it makes more sense to bring this to the prometheus-users mailing list rather than seeking support in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.
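
For a rough sense of scale (a back-of-the-envelope sketch in Python, not an official sizing formula): Prometheus 1.x stores series data in fixed 1024-byte chunks, and the 1.x storage documentation's rule of thumb is to budget roughly three chunks per active series, plus index and bookkeeping overhead on top. With the ~73k in-memory fingerprints reported in the logs above, chunk data alone already exceeds the 150 MiB target heap, which is consistent with urgencyScore=1 and the throttling message:

# Back-of-the-envelope heap estimate for Prometheus 1.x (illustrative only).
# Assumes ~3 chunks of 1024 bytes per active series, a rule of thumb from the
# 1.x storage docs; real usage is higher due to indexes and Go runtime overhead.
series = 72954                 # in-memory fingerprints reported in the logs
chunk_bytes = 1024             # fixed chunk size in Prometheus 1.x local storage
chunks_per_series = 3          # assumed rule-of-thumb minimum
chunk_heap = series * chunk_bytes * chunks_per_series
target_heap = 157286400        # -storage.local.target-heap-size (150 MiB)
print(f"chunk data alone: {chunk_heap / 2**20:.0f} MiB")   # ~214 MiB
print(f"target heap:      {target_heap / 2**20:.0f} MiB")  # 150 MiB
print("over budget:", chunk_heap > target_heap)            # True
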

beorn7 closed this Sep 19, 2017


sashgorokhov commented Sep 19, 2017

@beorn7 I'll try it, thank you!


lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
