Prometheus is eating almost 6GB of memory, how could this be possible? Where does the memory usage end? #2358
Comments
Add args to Prometheus on 176 since we have enough memory; refer to #1459.
With those settings Prometheus will use at least 26GB of RAM.
How did you calculate 26GB of RAM?
What is your advice for lowering the RAM usage?
songjiayang commented Jan 23, 2017
This confuses me. As far as I know, a chunk is roughly 1 KB, so memory-chunks=6666666 should be about 6 GB. Is there something I'm missing?
There are various overheads, so on 1.4.1 you're talking a minimum of 3.9KB per chunk.
Is there a formula to calculate that the RAM usage will be 26GB?
What is the relationship between a minimum of 3.9KB per chunk and 26GB of RAM? @brian-brazil
Yeah, it's now using almost 6GB because I raised the chunks value, and when it reaches 6GB the Prometheus server tends to crash and keeps restarting, until I can do nothing but restart the Prometheus node manually. @songjiayang
songjiayang commented Jan 24, 2017
@xixikaikai That's bad. You can try to set
@songjiayang Have you encountered this problem before in your production setup?
songjiayang commented Jan 24, 2017
@xixikaikai Memory problems are a big problem. Maybe you can find more information at https://prometheus.io/docs/operating/storage/#persistence-pressure-and-rushed-mode
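For reference, the knobs that page discusses are the Prometheus 1.x local-storage flags. A minimal sketch of a more conservative setting for a 16GB host like the one in this issue; the values are illustrative only and are not taken from this thread:

# Keep memory-chunks small enough that chunks x ~3-4 KiB stays well below physical RAM,
# and leave headroom between max-chunks-to-persist and memory-chunks so rushed mode
# can catch up on persistence before the process runs out of memory.
prometheus \
  -storage.local.memory-chunks=1048576 \
  -storage.local.max-chunks-to-persist=524288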
@brian-brazil You mean that one memory chunk in Prometheus takes at least 3.9KB, so we can calculate the RAM usage from the way we configure Prometheus?
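For what it's worth, the 26GB figure follows directly from that per-chunk minimum and the memory-chunks value quoted earlier in the thread; this is back-of-the-envelope arithmetic, not an exact formula:

6,666,666 chunks x 3.9 KiB/chunk ≈ 26,000,000 KiB ≈ 26 GB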
brian-brazil closed this Mar 27, 2017

xixikaikai commented Jan 23, 2017
What did you do?

I made the following changes:
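The exact flags are not preserved in this copy of the issue. Judging from the rest of the thread, where memory-chunks=6666666 is quoted and the metric prometheus_local_storage_max_chunks_to_persist reads 3.333332e+06, the changes were presumably along these lines; treat the values as reconstructed, not quoted:

# presumed from the thread; not preserved verbatim in the archived issue
-storage.local.memory-chunks=6666666
-storage.local.max-chunks-to-persist=3333332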
What did you expect to see?
I hoped the memory usage would fall at some point and not rise endlessly.
When I limit the memory to 1G, Prometheus is restarted on Marathon (DC/OS + Mesos) so frequently that the host cannot even be accessed over SSH. I guess the disk is kept busy by the constant restarts, since Prometheus needs to recover its state from disk on startup and write it back to disk when it is shut down.
What did you see instead? Under which circumstances?
curl -s http://172.19.0.176:31090/metrics | grep '^prometheus_local_storage'
prometheus_local_storage_checkpoint_duration_seconds 5.366887795
prometheus_local_storage_chunk_ops_total{type="clone"} 1
prometheus_local_storage_chunk_ops_total{type="create"} 2.575955e+06
prometheus_local_storage_chunk_ops_total{type="load"} 421
prometheus_local_storage_chunk_ops_total{type="persist"} 2.460509e+06
prometheus_local_storage_chunk_ops_total{type="pin"} 8353
prometheus_local_storage_chunk_ops_total{type="transcode"} 2.486157e+06
prometheus_local_storage_chunk_ops_total{type="unpin"} 8353
prometheus_local_storage_chunkdesc_ops_total{type="evict"} 25448
prometheus_local_storage_chunkdesc_ops_total{type="load"} 125
prometheus_local_storage_chunks_to_persist 94579
prometheus_local_storage_fingerprint_mappings_total 0
prometheus_local_storage_inconsistencies_total 0
prometheus_local_storage_indexing_batch_duration_seconds{quantile="0.5"} 0.014941447000000002
prometheus_local_storage_indexing_batch_duration_seconds{quantile="0.9"} 0.016533442000000002
prometheus_local_storage_indexing_batch_duration_seconds{quantile="0.99"} 0.019961967
prometheus_local_storage_indexing_batch_duration_seconds_sum 137.80948704499997
prometheus_local_storage_indexing_batch_duration_seconds_count 11727
prometheus_local_storage_indexing_batch_sizes{quantile="0.5"} 1
prometheus_local_storage_indexing_batch_sizes{quantile="0.9"} 1
prometheus_local_storage_indexing_batch_sizes{quantile="0.99"} 1
prometheus_local_storage_indexing_batch_sizes_sum 11995
prometheus_local_storage_indexing_batch_sizes_count 11727
prometheus_local_storage_indexing_queue_capacity 16384
prometheus_local_storage_indexing_queue_length 0
prometheus_local_storage_ingested_samples_total 1.208599541e+09
prometheus_local_storage_maintain_series_duration_seconds{location="archived",quantile="0.5"} NaN
prometheus_local_storage_maintain_series_duration_seconds{location="archived",quantile="0.9"} NaN
prometheus_local_storage_maintain_series_duration_seconds{location="archived",quantile="0.99"} NaN
prometheus_local_storage_maintain_series_duration_seconds_sum{location="archived"} 0
prometheus_local_storage_maintain_series_duration_seconds_count{location="archived"} 0
prometheus_local_storage_maintain_series_duration_seconds{location="memory",quantile="0.5"} 0.003913998
prometheus_local_storage_maintain_series_duration_seconds{location="memory",quantile="0.9"} 0.006114805
prometheus_local_storage_maintain_series_duration_seconds{location="memory",quantile="0.99"} 0.022362947
prometheus_local_storage_maintain_series_duration_seconds_sum{location="memory"} 1289.5624914789798
prometheus_local_storage_maintain_series_duration_seconds_count{location="memory"} 285090
prometheus_local_storage_max_chunks_to_persist 3.333332e+06
prometheus_local_storage_memory_chunkdescs 3.155664e+06
prometheus_local_storage_memory_chunks 2.576376e+06
prometheus_local_storage_memory_series 20889
prometheus_local_storage_non_existent_series_matches_total 0
prometheus_local_storage_out_of_order_samples_total{reason="multiple_values_for_timestamp"} 0
prometheus_local_storage_out_of_order_samples_total{reason="timestamp_out_of_order"} 0
prometheus_local_storage_persist_errors_total 0
prometheus_local_storage_persistence_urgency_score 0.02837491134996454
prometheus_local_storage_rushed_mode 0
prometheus_local_storage_series_ops_total{type="archive"} 1248
prometheus_local_storage_series_ops_total{type="create"} 11995
prometheus_local_storage_series_ops_total{type="maintenance_in_archive"} 0
prometheus_local_storage_series_ops_total{type="maintenance_in_memory"} 285090
prometheus_local_storage_series_ops_total{type="purge_from_archive"} 0
prometheus_local_storage_series_ops_total{type="purge_from_memory"} 0
prometheus_local_storage_series_ops_total{type="purge_on_request"} 0
prometheus_local_storage_series_ops_total{type="quarantine_completed"} 0
prometheus_local_storage_series_ops_total{type="quarantine_dropped"} 0
prometheus_local_storage_series_ops_total{type="quarantine_failed"} 0
prometheus_local_storage_series_ops_total{type="unarchive"} 6
prometheus_local_storage_started_dirty 0
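A rough reading of these numbers (my own arithmetic, not from the original report): prometheus_local_storage_memory_chunks is about 2.58 million, so at roughly 1 KiB of raw chunk data each that is already

2,576,376 chunks x ~1 KiB ≈ 2.6 GB

before the 3.16 million chunk descriptors, per-series index structures, and Go heap overhead are counted, which is broadly consistent with the ~6GB of resident memory reported in the title.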
Environment
4 CPU
16GB memory
System information:
Linux 3.10.0-327.36.1.el7.x86_64 x86_64
CentOS 7.0+
Prometheus version:
1.4.1
Alertmanager version:
none
Prometheus configuration file:
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'lkt-monitor-prod'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'lkt-prometheus-prod'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - labels:
          instance: prometheus

  - job_name: 'host'
    scrape_interval: 5s
    metrics_path: '/metrics'
    scheme: 'http'
    static_configs:
      - targets: [
          '172.19.0.175:31902',
          '172.19.0.176:31902',
          '172.19.0.177:31902',
          '172.19.0.176:31666',
          '172.19.0.176:31888'
        ]
PS:
31666 is for the mesos exporter metrics
31888 is for the marathon exporter metrics
Alertmanager configuration file:
none
Logs: