Prometheus Crash Recovery Consumes Excessive Amount of Memory #4609

Open
PeterZaitsev opened this Issue Sep 14, 2018 · 6 comments

@PeterZaitsev

PeterZaitsev commented Sep 14, 2018

As of Prometheus 2.3.2, crash recovery can consume an excessive amount of memory, to the point where a normally running system is never able to recover after an abnormal restart.

How to repeat:

Run Prometheus with a high ingest rate, consuming ~60% of memory
kill -9 Prometheus

If auto-restart is configured, Prometheus may enter a crash loop, running out of memory during crash recovery and restarting again.

The only way I found to recover from such a situation is to restart Prometheus with all targets disabled, wait for recovery to complete, and then perform a normal restart with all targets enabled. This also confirms that the issue is related to crash recovery.
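
For reference, a minimal sketch of what I mean by "restart with all targets disabled" (the file below is illustrative, not my exact config): keep pointing Prometheus at the same data directory so the existing WAL is replayed, but leave scrape_configs empty so nothing new is ingested while recovery runs.

    # prometheus-recovery.yml -- illustrative sketch, not my real configuration.
    # Start Prometheus with the usual --storage.tsdb.path so the existing WAL
    # is replayed, but scrape nothing while recovery is in progress.
    global:
      scrape_interval: 15s

    scrape_configs: []   # all targets disabled during crash recovery

Once the head block is rebuilt and memory settles, restore the real configuration and do a normal restart with all targets enabled.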

Sorry for not having an exact, repeatable example.

@violenti

violenti commented Sep 19, 2018

Hi, I have the same problem with Prometheus 2.3.2, but in Kubernetes. I created a federated cluster and the main node consumed 5 GB of RAM. Is that normal performance?

The federation scrape config is:

  - job_name: 'federate'
    scrape_interval: 60s
    honor_labels: true
    metrics_path: '/federate'

    params:
      'match[]':
        - '{__name__=~"job:.*"}'
        - '{job="prometheus"}'
        - '{job="kubernetes-nodes"}'
        - '{job="kubernetes-cadvisor"}'
        - '{name=~".+"}'
        - '{job="kubernetes-service-endpoints"}'
        - '{job="kubernetes-pods"}'
        - '{job="kubernetes-apiservers"}'
        - '{release="prometheus-production"}'
        - '{release="prometheus-development"}'
        - '{release="prometheus-sandbox"}'
        - '{pod_name=".+"}'
    static_configs:
      - targets:
        - 'prometheus.1'
        - 'prometheus2'
        - 'prometheus3'
@dswarbrick

dswarbrick commented Nov 6, 2018

I'm also running into this problem quite frequently with Prometheus 2.4.3. We have some moderately large instances, with about 2.8M series and about 45k samples/sec. They are VMs, so it's relatively easy to add more memory to them, but one is already spec'd with 24 GB, and I'm getting a little nervous about how much higher it's going to go.

The RAM usage settles down after about 20 minutes, but you have to get over that (very steep) hill first.

@brian-brazil

Member

brian-brazil commented Dec 7, 2018

Can you try out 2.6.0 and see if it's better? A number of performance improvements have been made.

@viberan

viberan commented Dec 20, 2018

Same issue with 2.6.0.

@hectorhuertas

hectorhuertas commented Feb 14, 2019

We are seeing the same issue in 2.7.1. We have two Prometheus replicas in Kubernetes with the same configuration and around 1M series. Replica 1 is using around 8Gi, while replica 2 keeps getting killed by Kubernetes when it reaches its 20Gi limit. Two hours after raising the limit and letting it start, memory is down to 10Gi.
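
For context, this is roughly the shape of the memory limit on the Prometheus container (values illustrative, not our exact manifest):

    # Illustrative resources section from the Prometheus container spec
    # (placeholder values, not the exact manifest).
    resources:
      requests:
        memory: 10Gi
      limits:
        memory: 20Gi   # replica 2 was OOM-killed at this limit during crash recovery

Temporarily raising that limit was what let replica 2 get through recovery.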

@violenti

violenti commented Feb 18, 2019

Can you try out 2.6.0 and see if it's better? A number of performance improvements have been made.

Yes, I'm now running 2.6.1 and the performance is better.
