Memory issues on short block sizes #4952

Closed
kacper-jackiewicz opened this Issue Dec 4, 2018 · 2 comments

kacper-jackiewicz commented Dec 4, 2018

Bug Report

We are testing Prometheus with Thanos to gather metrics from multiple OpenShift clusters. Each cluster runs a significant number of ephemeral pods that are scraped by Prometheus. During a test that mirrors our live metrics almost 1:1 we see a strange memory consumption pattern. We can process millions of metrics when the value of a particular label changes only every ~2 hours, but when the label changes more often we see a constant increase in memory consumption. We tried lowering the retention time to 15 minutes and the block size to 5 minutes, but this did not help much: memory consumption kept increasing even past the retention period. Further investigation showed that short block sizes combined with short retention degraded the performance of writing blocks to disk; after 30 minutes of the test, flushing a 5-minute block took longer than 10 minutes, +/- 3 minutes for head GC.
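
Roughly, the tuning above looks like the sketch below (not our exact manifest; flag names as in Prometheus 2.x, where the block-duration flags are hidden flags commonly set in Thanos setups):

```yaml
# Sketch only, assuming Prometheus 2.x started as a container.
# --storage.tsdb.{min,max}-block-duration are hidden flags in 2.x.
args:
  - --config.file=/etc/prometheus/prometheus.yml
  - --storage.tsdb.retention=15m          # retention lowered to 15 minutes
  - --storage.tsdb.min-block-duration=5m  # cut the head into 5-minute blocks
  - --storage.tsdb.max-block-duration=5m  # keep compaction from producing larger blocks
```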

After locking the label so that it changes only every 2 hours, we could easily use longer blocks and retention periods, as memory consumption stayed stable: it peaked at 16 GB with an average of 11 GB for 3M metrics per minute. With the label changing every minute, at 1.5M metrics per minute Prometheus crashed at ~100 GB after about 30 minutes.

In both cases the ratio between the number of distinct series and the number of blocks was constant, which raises the question: why does Prometheus behave like this, and how can we tweak it to perform better with high-cardinality labels (e.g. cron job id / pod name)? We can access and aggregate such metrics in Thanos, so Prometheus would only need to scrape and keep the data for the time required to flush a block to disk.
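
For context, dropping the high-cardinality label at scrape time would look roughly like the sketch below (label and job names are placeholders), but that would also remove the label from the data Thanos aggregates, so we are looking for a way to keep it:

```yaml
# Hypothetical sketch: drop a per-pod / per-cron-job label at ingestion time.
# "cron_job_id" stands in for whatever high-cardinality label is involved.
scrape_configs:
  - job_name: example-job
    metric_relabel_configs:
      - action: labeldrop
        regex: cron_job_id
```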

What did you expect to see?
We expected memory consumption to be roughly proportional to block size, the number of distinct series (cardinality / data points), and retention.

What did you see instead? Under which circumstances?
A significant increase in memory consumption for metrics with a high-cardinality / frequently changing label.

Environment
OpenShift v3.9 with live data
RHEL 6 VM with artificial data (Linux 2.6.32-696.18.7.el6.x86_64 x86_64)

  • Prometheus version:
    Tested with versions 2.3, 2.4, and 2.5

  • Prometheus configuration file:
    3000 jobs, 1 minute scrape interval, 1-3M metrics (total); a minimal sketch of the shape is below
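
A minimal sketch of the shape of that configuration (job names and targets are placeholders; the real file defines ~3000 jobs):

```yaml
# Sketch of the configuration shape only.
global:
  scrape_interval: 1m
scrape_configs:
  - job_name: job-0001                    # placeholder job name
    static_configs:
      - targets: ['target-0001:9090']     # placeholder target
  # ... repeated for ~3000 jobs
```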

lstanczak commented Dec 4, 2018

Pprof memory profile from tests:
prometheus.zip

krasi-georgiev (Member) commented Dec 26, 2018

A duplicate of prometheus/tsdb#480; we are discussing it there.
