heavy memory usage, keep getting OOM killed with data scrape only #4553

Closed · Ehekatl opened this issue Aug 28, 2018 · 4 comments

Ehekatl commented Aug 28, 2018

I'm running a Prometheus instance ingesting an average of 100K samples/s, with a 7-day retention period, scraping 4 targets every 15s. Even without any queries or rules running, it uses about 50% of system memory right after startup, and memory usage keeps climbing until the process is killed by the OS.

[Screenshot: Grafana dashboard (netdata / Prometheus performance) showing memory usage climbing after startup, captured 2018-08-28]

Is this memory usage normal? How can I limit the total memory consumed by Prometheus, or make GC run more often? The top of a heap profile (go tool pprof output) is below:

Time: Aug 28, 2018 at 2:54pm (CST)
Showing nodes accounting for 6.02GB, 99.14% of 6.07GB total
Dropped 75 nodes (cum <= 0.03GB)
      flat  flat%   sum%        cum   cum%
    1.98GB 32.61% 32.61%     1.98GB 32.61%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*bstream).writeByte (inline)
    0.83GB 13.61% 46.22%     0.83GB 13.61%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*bstream).writeBit (inline)
    0.72GB 11.92% 58.14%     0.72GB 11.92%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.NewXORChunk (inline)
    0.46GB  7.60% 65.74%     0.46GB  7.60%  github.com/prometheus/prometheus/pkg/labels.(*Builder).Labels
    0.36GB  5.95% 71.69%     1.15GB 18.93%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*memSeries).cut
    0.29GB  4.71% 76.40%     0.29GB  4.71%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*decbuf).uvarintStr
    0.28GB  4.68% 81.08%     0.57GB  9.39%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*walReader).decodeSeries
    0.26GB  4.28% 85.36%     0.26GB  4.28%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.newMemSeries (inline)
    0.23GB  3.73% 89.09%     0.23GB  3.73%  github.com/prometheus/prometheus/pkg/textparse.(*Parser).Metric
    0.17GB  2.76% 91.85%     0.17GB  2.76%  github.com/prometheus/prometheus/scrape.newScrapePool.func1
    0.14GB  2.25% 94.10%     0.14GB  2.25%  github.com/prometheus/prometheus/scrape.(*scrapeCache).addRef (inline)
    0.10GB  1.72% 95.82%     0.10GB  1.72%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.seriesHashmap.set
    0.06GB  1.06% 96.88%     0.06GB  1.06%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*XORChunk).Appender
    0.05GB  0.87% 97.75%     0.05GB  0.87%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*MemPostings).addFor
    0.05GB  0.86% 98.61%     0.16GB  2.58%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*stripeSeries).getOrSet
    0.03GB  0.53% 99.14%     0.03GB  0.53%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*MemPostings).Delete
         0     0% 99.14%     0.17GB  2.76%  github.com/prometheus/prometheus/pkg/pool.(*Pool).Get
         0     0% 99.14%     4.70GB 77.42%  github.com/prometheus/prometheus/scrape.(*scrapeLoop).append
         0     0% 99.14%     4.87GB 80.20%  github.com/prometheus/prometheus/scrape.(*scrapeLoop).run
         0     0% 99.14%     3.88GB 63.82%  github.com/prometheus/prometheus/scrape.(*timeLimitAppender).Commit
         0     0% 99.14%     0.46GB  7.60%  github.com/prometheus/prometheus/scrape.mutateSampleLabels
         0     0% 99.14%     0.46GB  7.60%  github.com/prometheus/prometheus/scrape.newScrapePool.func2.1
         0     0% 99.14%     3.88GB 63.82%  github.com/prometheus/prometheus/storage.(*fanoutAppender).Commit
         0     0% 99.14%     3.88GB 63.82%  github.com/prometheus/prometheus/storage/tsdb.(*appender).Commit
         0     0% 99.14%     0.60GB  9.85%  github.com/prometheus/prometheus/storage/tsdb.Open
         0     0% 99.14%     3.88GB 63.82%  github.com/prometheus/prometheus/storage/tsdb.appender.Commit
         0     0% 99.14%     0.60GB  9.85%  github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run.func1
         0     0% 99.14%     0.03GB  0.57%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*DB).compact
         0     0% 99.14%     0.06GB  0.99%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*DB).reload
         0     0% 99.14%     0.03GB  0.57%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*DB).run
         0     0% 99.14%     0.57GB  9.39%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).ReadWAL
         0     0% 99.14%     0.09GB  1.46%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).ReadWAL.func1
         0     0% 99.14%     0.47GB  7.70%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).ReadWAL.func2
         0     0% 99.14%     0.03GB  0.54%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).Truncate
         0     0% 99.14%     0.03GB  0.54%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).gc
         0     0% 99.14%     0.47GB  7.73%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).getOrCreateWithID
         0     0% 99.14%     0.09GB  1.46%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).processWALSamples
         0     0% 99.14%     3.88GB 63.82%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*dbAppender).Commit
         0     0% 99.14%     3.88GB 63.82%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*headAppender).Commit
         0     0% 99.14%     3.96GB 65.15%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*memSeries).append
         0     0% 99.14%     0.57GB  9.39%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*repairingWALReader).Read
         0     0% 99.14%     0.57GB  9.39%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*walReader).Read
         0     0% 99.14%     0.47GB  7.70%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*walReader).Read.func1
         0     0% 99.14%     0.60GB  9.85%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.Open
         0     0% 99.14%     3.88GB 63.82%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.dbAppender.Commit
         0     0% 99.14%     2.60GB 42.82%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*bstream).writeBits
         0     0% 99.14%     2.81GB 46.22%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*xorAppender).Append
         0     0% 99.14%     1.68GB 27.73%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*xorAppender).writeVDelta
         0     0% 99.14%     0.05GB  0.87%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*MemPostings).Add
         0     0% 99.14%     0.60GB  9.85%  main.main.func20
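
On the question above about limiting memory or making GC run more often: the Go runtime's GOGC environment variable (or debug.SetGCPercent from inside a Go program) controls how much the heap may grow before a collection is triggered. The sketch below only illustrates that generic runtime knob; it is not a Prometheus feature, and whether lowering it would help with this OOM is an assumption.

    // Minimal sketch of the Go runtime GC knob. Starting a Go binary with
    // GOGC=50 (or calling debug.SetGCPercent(50) in-process) triggers a
    // collection once the heap grows 50% over the live set instead of the
    // default 100%, trading CPU for a smaller steady-state heap.
    package main

    import (
    	"fmt"
    	"runtime/debug"
    )

    func main() {
    	previous := debug.SetGCPercent(50) // equivalent to running with GOGC=50
    	fmt.Printf("GC percent lowered from %d to 50\n", previous)
    }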

Environment

  • System information:

    Linux 4.4.0-31-generic x86_64

  • Prometheus version:

    prometheus, version 2.3.2 (branch: HEAD, revision: 71af5e2)
    build user: root@5258e0bd9cc1
    build date: 20180712-14:02:52
    go version: go1.10.3

  • Sample of heap file:

heap.pprof.zip

simonpasquier (Member) commented Aug 28, 2018

Thanks for your report. This looks like a question about usage rather than development. If I read correctly, Prometheus is using roughly 10GB of memory for 1.5M series in the head, which is in line with the empirical rule of about 8kB per active series. Note that this rule doesn't take query load into account.

I'm closing it for now. If you have further questions, please use our user mailing list, which you can also search.
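
For context, a sketch of the arithmetic behind that estimate. The ingestion rate and scrape interval come from the report above; the 8kB-per-series figure is the empirical rule of thumb cited here, not an exact accounting.

    // Rough estimate of head-block memory from the numbers in this thread.
    package main

    import "fmt"

    func main() {
    	ingestionRate := 100000.0 // samples/s, from the report
    	scrapeInterval := 15.0    // seconds, from the report
    	bytesPerSeries := 8192.0  // empirical rule of thumb (~8kB per active series)

    	activeSeries := ingestionRate * scrapeInterval // ≈ 1.5M series in the head
    	estimate := activeSeries * bytesPerSeries      // ≈ 12e9 bytes

    	fmt.Printf("active series ≈ %.1fM\n", activeSeries/1e6)
    	fmt.Printf("estimated head memory ≈ %.1f GiB\n", estimate/(1<<30))
    }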

Ehekatl (Author) commented Aug 28, 2018

@simonpasquier I don't think this is just a usage question; please consider reopening my issue:

  1. To demonstrate that the memory consumption is abnormal, I split the same amount of workload from 4 targets across 9 targets. With the same ingested-sample rate and the same number of active series, total memory consumption dropped significantly.

[Screenshot: Grafana dashboard (netdata / Prometheus performance) showing the memory drop after splitting the workload across 9 targets, captured 2018-08-28]

  2. Regardless of the high memory usage, I think the number of active series kept in memory should be configurable (as well as the compaction ratio); this was possible prior to Prometheus 2.

gouthamve (Member) commented Aug 28, 2018

@Ehekatl What do you mean by the memory usage dropping when going from 4 targets to 9? From what I can see, a compaction was triggered, which wrote a bunch of data to disk and caused memory to drop.

Due to the architecture of the new TSDB, we keep 2 hours of data in memory and then compact it to disk. You can read about it here: https://fabxc.org/tsdb/

As @simonpasquier mentioned, I too think 10GB for 1.5M series is on the low side, and you should consider looking at bigger machines.
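
To illustrate why memory climbs between compactions, here is a rough sketch of how many samples accumulate in the in-memory head block under the reported load, using the ~2-hour window mentioned above. It deliberately does not estimate bytes, since the per-sample cost in the head (chunks, labels, index, WAL buffers) varies; the point is only that the head keeps growing until a compaction flushes it to disk.

    // Sketch: samples buffered in the head block between compactions,
    // using numbers from this thread (100K samples/s, ~2h in-memory window,
    // ~1.5M active series).
    package main

    import "fmt"

    func main() {
    	ingestionRate := 100000.0 // samples/s
    	headWindow := 2 * 3600.0  // seconds of data kept in memory
    	activeSeries := 1500000.0

    	samplesInHead := ingestionRate * headWindow
    	fmt.Printf("samples buffered in head ≈ %.0fM\n", samplesInHead/1e6)
    	fmt.Printf("≈ %.0f samples per series before compaction\n", samplesInHead/activeSeries)
    }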

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited the conversation to collaborators on Mar 22, 2019
