Crash recovery uses too much memory compared to target-heap-size #3038

Closed
JorritSalverda opened this issue Aug 8, 2017 · 7 comments

@JorritSalverda (Contributor)

What did you do?

I restarted Prometheus with the default heap size and the 50% memory headroom suggested by the documentation, using this flag:

-storage.local.target-heap-size=2147483648

And the following values for the Kubernetes pod:

resources.requests.memory: 3072Mi
resources.limits.memory: 3072Mi
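
For context, this is roughly how that flag and the memory settings fit together in the container spec (a minimal sketch, not the actual manifest; the config-file path is an assumption):

    containers:
      - name: prometheus
        image: prom/prometheus:v1.7.1
        args:
          - -config.file=/etc/prometheus/prometheus.yml  # assumed mount path for the config
          - -storage.local.target-heap-size=2147483648   # 2GiB target heap (the default)
        resources:
          requests:
            memory: 3072Mi   # 2GiB target heap + ~50% headroom
          limits:
            memory: 3072Mi

In other words, 2147483648 bytes is 2GiB, and the 3072Mi request/limit is that heap size plus the suggested ~50% headroom.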

What did you expect to see?

The memory usage to stay within reasonable limits so the Prometheus pod can recover.

What did you see instead? Under which circumstances?

When Prometheus starts and runs crash recovery, I get errors like the following, and Kubernetes kills the pod with OOMKilled as the reason:

level=error msg="Error updating archived time range for fingerprint 00d5856f9d626ba1: write/prometheus/archived_fingerprint_to_timerange/001392.log: cannot allocate memory" source="storage.go:1791
level=error msg="Error dropping persisted chunks: read /prometheus/00/d5f1e1c2dd8195.db: cannot allocate memory" source="storage.go:1779
level=error msg="Error updating archived time range for fingerprint 00d5f1e1c2dd8195: write /prometheus/archived_fingerprint_to_timerange/001392.log: cannot allocate memory" source="storage.go:1791

This happens even with a memory limit of 4x the target heap size (8192Mi in my case).

Environment

The official Docker container prom/prometheus:v1.7.1 running on GKE version 1.7.2 with a 200Gi pd-ssd persistent volume, and the following CPU resources for the pod:

resources.requests.cpu: 1500m
resources.limits.cpu: 5500m
  • System information:
Linux 4.4.52+ x86_64
  • Prometheus version:
prometheus, version 1.7.1 (branch: master, revision: 3afb3fffa3a29c3de865e1172fb740442e9d0133)
  build user:       root@0aa1b7fc430d
  build date:       20170612-11:44:05
  go version:       go1.8.3
  • Prometheus configuration file:

The default config as shown at https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml, with the only difference being:

    global:
      scrape_interval: 10s
      evaluation_interval: 10s
  • Logs:

When the pod finally starts after a number of restarts, it logs the following, which might give an indication as to how many time ranges, metrics, etc. are processed during recovery.

time="2017-08-08T12:54:31Z" level=info msg="680000 archived time ranges checked." source="crashrecovery.go:480"
time="2017-08-08T12:54:31Z" level=info msg="Clean-up of archive indexes complete." source="crashrecovery.go:504"
time="2017-08-08T12:54:31Z" level=info msg="Rebuilding label indexes." source="crashrecovery.go:512"
time="2017-08-08T12:54:31Z" level=info msg="Indexing metrics in memory." source="crashrecovery.go:513"
time="2017-08-08T12:54:31Z" level=info msg="10000 metrics queued for indexing." source="crashrecovery.go:518"
...
time="2017-08-08T12:54:45Z" level=info msg="1090000 metrics queued for indexing." source="crashrecovery.go:518"
time="2017-08-08T12:54:45Z" level=info msg="Indexing archived metrics." source="crashrecovery.go:521"
time="2017-08-08T12:54:45Z" level=info msg="1100000 metrics queued for indexing." source="crashrecovery.go:534"
...
time="2017-08-08T12:55:02Z" level=info msg="1770000 metrics queued for indexing." source="crashrecovery.go:534"
time="2017-08-08T12:55:02Z" level=info msg="All requests for rebuilding the label indexes queued. (Actual processing may lag behind.)" source="crashrecovery.go:540"
time="2017-08-08T12:55:02Z" level=warning msg="Crash recovery complete." source="crashrecovery.go:153"
time="2017-08-08T12:55:03Z" level=info msg="1094563 series loaded." source="storage.go:439"
time="2017-08-08T12:55:03Z" level=info msg="Starting target manager..." source="targetmanager.go:63"
time="2017-08-08T12:55:03Z" level=info msg="Listening on :9090" source="web.go:259"
time="2017-08-08T12:55:03Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:104"
time="2017-08-08T12:55:03Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:104"
time="2017-08-08T12:55:03Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:104"
time="2017-08-08T12:55:03Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:104"
time="2017-08-08T12:55:03Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:104"
time="2017-08-08T12:55:03Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:104"
time="2017-08-08T12:55:03Z" level=info msg="Using pod service account via in-cluster config" source="kubernetes.go:104"
time="2017-08-08T12:55:04Z" level=warning msg="Storage has entered rushed mode." chunksToPersist=870434 memoryChunks=1648511 source="storage.go:1867" urgencyScore=1
time="2017-08-08T12:55:04Z" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=870468 memoryChunks=1648547 source="storage.go:1007" urgencyScore=1
time="2017-08-08T13:00:03Z" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:633"
time="2017-08-08T13:00:31Z" level=info msg="Done checkpointing in-memory metrics and chunks in 28.605482826s." source="persistence.go:665"
@beorn7 (Member) commented Aug 8, 2017

Please check out #2139. Bottom line: the sometimes excessive RAM usage during crash recovery comes from LevelDB, which blows up while re-indexing everything (as required by crash recovery). LevelDB won't be used anymore in Prometheus 2.x, so there is very little incentive to fix those issues. Sorry for that…

beorn7 closed this as completed Aug 8, 2017
@JorritSalverda (Contributor, Author)

@beorn7 Thanks for answering so quickly. It makes sense to make the best use of your time and not address this. I'll manage in the meantime. Looking forward to version 2.

FYI, I run Prometheus in an HA setup on GKE using preemptible nodes, so the pods get relocated frequently - at least every 24 hours - which makes this startup process pretty crucial.
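
For anyone curious about the HA part: it's essentially the usual pattern of running identical, independent Prometheus servers side by side, spread across nodes so a single preemption never takes both down at once. A rough sketch of that idea (illustrative only; names, API version, and the anti-affinity details are placeholders, not my actual manifests):

    # Sketch only: an identical "prometheus-b" Deployment runs alongside this one,
    # each with its own persistent disk and the flags/resources shown above.
    apiVersion: apps/v1beta1
    kind: Deployment
    metadata:
      name: prometheus-a
    spec:
      replicas: 1
      template:
        metadata:
          labels:
            app: prometheus
        spec:
          affinity:
            podAntiAffinity:
              # Force the two replicas onto different nodes, so one node
              # preemption never takes down both Prometheus servers.
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchLabels:
                      app: prometheus
                  topologyKey: kubernetes.io/hostname
          containers:
            - name: prometheus
              image: prom/prometheus:v1.7.1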

@ahume commented Nov 22, 2017

@JorritSalverda I'd be really interested to know more about your configuration for HA Prometheus on GKE. Is there anything you can share? Or any advice?

@brian-brazil (Contributor)

@ahume Please don't ask unrelated questions on issues.

@JorritSalverda (Contributor, Author)

In version 2 this is no longer an issue, so I think we can close this ticket if everyone's okay with that.

@beorn7 (Member) commented Nov 23, 2017

This issue was closed long ago.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited the conversation to collaborators Mar 23, 2019