Memory Issue #2472

Closed
kiranalii opened this Issue Mar 5, 2017 · 4 comments

kiranalii commented Mar 5, 2017

What did you do?
Hey! I'm using Prometheus to store NetFlow data. Each NetFlow record metric has 10 labels:

netflow_xxxxxxx_packetDeltaCount{From="xxxxxxxxx",destinationIPv4Address="xxxxxx",destinationTransportPort="xxxxxxx",egressInterface="1",flowDirection="1",ingressInterface="0",ipClassOfService="0",protocolIdentifier="6",sourceIPv4Address="172.31.16.49",sourceTransportPort="xxxxxx"} 1 1488721942012

Prometheus adds two more labels, instance and job, while scraping. The label values change so frequently that almost every sample ends up in a new time series.

What did you expect to see?
I expected Prometheus to keep working without crashing, and that memory usage would decrease once I stopped the exporter.

Environment
Ubuntu 16.04

  • Prometheus version: 1.5.2

What did you see instead? Under which circumstances?
If I run a one-week query that involves a sum, Prometheus loads those metrics into memory. I expect it to keep them in memory for some time, and I expect the number of samples held in memory to be bounded by the value of the following flag:

storage.local.memory-chunks

In my server configuration, the relevant flags are set as follows:

storage.local.index-cache-size.fingerprint-to-metric 10485760
storage.local.index-cache-size.fingerprint-to-timerange 5242880
storage.local.index-cache-size.label-name-to-label-values 10485760
storage.local.index-cache-size.label-pair-to-fingerprints 20971520
storage.local.max-chunks-to-persist 524288
storage.local.memory-chunks 1048576
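
For reference, the arithmetic behind the 1 GB expectation below, assuming the Prometheus 1.x chunk size of 1024 bytes (as far as I understand, this flag counts chunk payloads only, not per-series bookkeeping, index caches, or query buffers):

1048576 chunks × 1024 bytes per chunk = 1073741824 bytes ≈ 1 GiB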

Right now it has to scrape around 1k samples every 5 minutes (Prometheus scrapes every 5 minutes). My machine has 8 GB of RAM. When I run a query over one week of data, memory usage grows, and after some time Prometheus does not seem to release the occupied memory. According to the flag storage.local.memory-chunks 1048576, it should occupy only about 1 GB of RAM, but memory grows up to 5 GB (I observed the Prometheus container with htop). If I stop my exporter, there is nothing left to scrape and memory usage should decrease, but it stays at 5 GB. In the future there could be 70k to 1 million samples to scrape every 5 minutes, which will require a lot of resources.

My question is: why does memory usage not decrease after I stop the exporter?
Help would be much appreciated.

danni-m commented Mar 5, 2017

Regardless of the ticket itself, it is usually recommended to keep label cardinality low. NetFlow data by nature contains a lot of src_ip/dst_ip/port combinations, which makes Prometheus hold a lot of series in memory (it usually keeps about 3 chunks in memory per series).
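
As an illustration only (a sketch, not from this ticket; the job name, target address, and choice of labels are made up), high-cardinality labels can be dropped at scrape time with metric_relabel_configs, at the cost of losing those dimensions:

scrape_configs:
  - job_name: 'netflow'                 # hypothetical job name
    static_configs:
      - targets: ['localhost:9191']     # hypothetical exporter address
    metric_relabel_configs:
      # Drop the highest-cardinality labels before samples are stored,
      # so far fewer distinct series are created.
      - action: labeldrop
        regex: 'sourceTransportPort|destinationTransportPort'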

beorn7 commented Mar 5, 2017

Recommended reading/watching:

If you have more questions, I recommend discussing them on the prometheus-users mailing list. A GitHub issue is not very suitable for support questions.

beorn7 closed this Mar 5, 2017

kiranalii commented Mar 5, 2017

Sure, thanks @danni-m and @beorn7.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
