
Prometheus uses much more RAM than configured #3459

Closed
honorking opened this Issue Nov 13, 2017 · 2 comments

honorking commented Nov 13, 2017

  • what happened

(screenshot omitted)

  • System information:

     Linux 4.4.0-31-generic x86_64
    
  • Prometheus version:

    prometheus, version 1.4.1 (branch: master, revision:)
    build user: root@e685d23d8809
    build date: 20161128-09:59:22
    go version: go1.7.3

  • prometheus error logs
    time="2017-11-13T16:33:26+08:00" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=27528012 maxChunksToPersist=27525120 maxToleratedMemChunks=60555264 memoryChunks=55066764 source="storage.go:908"
    time="2017-11-13T16:46:26+08:00" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=27527741 maxChunksToPersist=27525120 maxToleratedMemChunks=60555264 memoryChunks=55067326 source="storage.go:908"
    time="2017-11-13T17:05:20+08:00" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=27530310 maxChunksToPersist=27525120 maxToleratedMemChunks=60555264 memoryChunks=55060778 source="storage.go:908"
    time="2017-11-13T17:26:19+08:00" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=27528404 maxChunksToPersist=27525120 maxToleratedMemChunks=60555264 memoryChunks=55062451 source="storage.go:908"

  • relevant metrics output:

prometheus_local_storage_checkpoint_duration_seconds 809.391851948
prometheus_local_storage_chunk_ops_total{type="clone"} 1.7806917e+07
prometheus_local_storage_chunk_ops_total{type="create"} 5.6688908e+07
prometheus_local_storage_chunk_ops_total{type="drop"} 2.756955e+07
prometheus_local_storage_chunk_ops_total{type="evict"} 3.058718e+06
prometheus_local_storage_chunk_ops_total{type="load"} 1.573277e+06
prometheus_local_storage_chunk_ops_total{type="persist"} 2.6284268e+07
prometheus_local_storage_chunk_ops_total{type="pin"} 2.576032244e+09
prometheus_local_storage_chunk_ops_total{type="transcode"} 5.9461996e+07
prometheus_local_storage_chunk_ops_total{type="unpin"} 2.576032244e+09
prometheus_local_storage_chunkdesc_ops_total{type="evict"} 5.0182746e+07
prometheus_local_storage_chunkdesc_ops_total{type="load"} 1.0686893e+07
prometheus_local_storage_chunks_to_persist 2.7363845e+07
prometheus_local_storage_fingerprint_mappings_total 128
prometheus_local_storage_inconsistencies_total 0
prometheus_local_storage_indexing_batch_duration_seconds{quantile="0.5"} 9.637830991
prometheus_local_storage_indexing_batch_duration_seconds{quantile="0.9"} 47.032961849
prometheus_local_storage_indexing_batch_duration_seconds{quantile="0.99"} 51.199579204
prometheus_local_storage_indexing_batch_duration_seconds_sum 6129.972741586999
prometheus_local_storage_indexing_batch_duration_seconds_count 812
prometheus_local_storage_indexing_batch_sizes{quantile="0.5"} 1
prometheus_local_storage_indexing_batch_sizes{quantile="0.9"} 41
prometheus_local_storage_indexing_batch_sizes{quantile="0.99"} 69
prometheus_local_storage_indexing_batch_sizes_sum 3.963987e+06
prometheus_local_storage_indexing_batch_sizes_count 812
prometheus_local_storage_indexing_queue_capacity 16384
prometheus_local_storage_indexing_queue_length 0
prometheus_local_storage_ingested_samples_total 5.528012165e+09
prometheus_local_storage_maintain_series_duration_seconds{location="archived",quantile="0.5"} 0.003939789
prometheus_local_storage_maintain_series_duration_seconds{location="archived",quantile="0.9"} 0.0076295140000000004
prometheus_local_storage_maintain_series_duration_seconds{location="archived",quantile="0.99"} 0.033108174000000004
prometheus_local_storage_maintain_series_duration_seconds_sum{location="archived"} 1674.767393987979
prometheus_local_storage_maintain_series_duration_seconds_count{location="archived"} 238853
prometheus_local_storage_maintain_series_duration_seconds{location="memory",quantile="0.5"} 0.007118083000000001
prometheus_local_storage_maintain_series_duration_seconds{location="memory",quantile="0.9"} 0.012552301
prometheus_local_storage_maintain_series_duration_seconds{location="memory",quantile="0.99"} 0.040874002
prometheus_local_storage_maintain_series_duration_seconds_sum{location="memory"} 18608.19823851928
prometheus_local_storage_maintain_series_duration_seconds_count{location="memory"} 2.145201e+06
prometheus_local_storage_max_chunks_to_persist 2.752512e+07
prometheus_local_storage_memory_chunkdescs 3.46538924e+08
prometheus_local_storage_memory_chunks 5.5061705e+07
prometheus_local_storage_memory_series 3.116566e+06
prometheus_local_storage_non_existent_series_matches_total 0
prometheus_local_storage_out_of_order_samples_total{reason="multiple_values_for_timestamp"} 0
prometheus_local_storage_out_of_order_samples_total{reason="timestamp_out_of_order"} 0
prometheus_local_storage_persist_errors_total 0
prometheus_local_storage_persistence_urgency_score 0.9906942095075335
prometheus_local_storage_rushed_mode 1
prometheus_local_storage_series_ops_total{type="archive"} 109110
prometheus_local_storage_series_ops_total{type="create"} 16808
prometheus_local_storage_series_ops_total{type="maintenance_in_archive"} 231896
prometheus_local_storage_series_ops_total{type="maintenance_in_memory"} 2.145201e+06
prometheus_local_storage_series_ops_total{type="purge_from_archive"} 285
prometheus_local_storage_series_ops_total{type="purge_from_memory"} 0
prometheus_local_storage_series_ops_total{type="purge_on_request"} 0
prometheus_local_storage_series_ops_total{type="quarantine_completed"} 0
prometheus_local_storage_series_ops_total{type="quarantine_dropped"} 0
prometheus_local_storage_series_ops_total{type="quarantine_failed"} 0
prometheus_local_storage_series_ops_total{type="unarchive"} 54973
prometheus_local_storage_started_dirty 1

  • my problem

As described above, I understand that scrapes are being throttled because
chunks_to_persist > max_chunks_to_persist,
but
chunks_to_persist + prometheus_local_storage_memory_series < prometheus_local_storage_memory_chunks
and
(chunks_to_persist + prometheus_local_storage_memory_series) * 3.9 kB ≈ 119 GB
(see the sketch below),
so
why does my Prometheus server use ~200 GB of RAM?
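
For reference, here is that arithmetic as a small Python sketch, using the values quoted in the log lines and metrics output above; the ~3.9 kB-per-chunk footprint is my own rough estimate, not an official Prometheus figure:

    # Back-of-the-envelope check of the numbers quoted above.
    # Snapshot from the "Storage needs throttling" log lines:
    chunks_to_persist_logged = 27_528_012   # chunksToPersist
    max_chunks_to_persist    = 27_525_120   # maxChunksToPersist

    # Values from the metrics output:
    chunks_to_persist = 27_363_845   # prometheus_local_storage_chunks_to_persist
    memory_series     = 3_116_566    # prometheus_local_storage_memory_series
    memory_chunks     = 55_061_705   # prometheus_local_storage_memory_chunks

    # Scrapes are skipped while chunksToPersist exceeds maxChunksToPersist:
    print(chunks_to_persist_logged > max_chunks_to_persist)    # True

    # chunks_to_persist + memory_series is still well below memory_chunks ...
    print(chunks_to_persist + memory_series < memory_chunks)   # True

    # ... and at ~3.9 kB each that sum is only about 119 GB:
    estimate_bytes = (chunks_to_persist + memory_series) * 3.9e3
    print("~%.0f GB" % (estimate_bytes / 1e9))                 # ~119 GB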

brian-brazil commented Nov 13, 2017

It makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
