
prometheus 2.3.2 memory leak #4774

Closed
mrsiano opened this Issue Oct 23, 2018 · 2 comments

mrsiano commented Oct 23, 2018

Background
We identified what appears to be a memory leak on a populated cluster in steady state (same scale over continuous hours of operation) using prometheus 2.3.2.

Cluster distribution:
2 compute nodes
2 infra nodes
250 pods
83 namespaces

Summary bullets
- A memory leak was found in the prometheus component, around ~100MB per day for 500 pods.
- Currently we can’t tell whether the leak grows linearly with the scale of the cluster.
- There is also a leak in other prometheus pods such as prometheus-config and prometheus-proxy (see the Graph section).
- By comparing two heap files (taken 5 days apart) we can identify the largest memory consumers; the biggest ones are related to the tsdb and promql components.

[Graph: Prometheus RSS usage over the last 5 days]

pprof comparison

 ~/prometheus_profiler # go tool pprof -inuse_objects -base heap_1539815522.pprof /usr/bin/prometheus heap_1540294587.pprof
File: prometheus
Type: inuse_objects
Time: Oct 17, 2018 at 10:32pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 25810, 1.01% of 2550832 total
Dropped 15 nodes (cum <= 12754)
Showing top 10 nodes out of 156
      flat  flat%   sum%        cum   cum%
   -314256 12.32% 12.32%    -270564 10.61%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*SegmentWAL).Truncate /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go
    266509 10.45%  1.87%     266509 10.45%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*uint32slice).Less <autogenerated>
     65537  2.57%   0.7%      65537  2.57%  sync.(*Map).Store /usr/local/go/src/sync/map.go
     43692  1.71%  2.41%      43692  1.71%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*Writer).WritePostings /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/index.go
     32768  1.28%  3.69%      32768  1.28%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*Writer).Close /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/index.go
    -32767  1.28%  2.41%     -32767  1.28%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*SegmentWAL).LogSeries /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go
    -20229  0.79%  1.62%     -20229  0.79%  github.com/prometheus/prometheus/storage.(*fanoutAppender).Rollback /go/src/github.com/prometheus/prometheus/storage/fanout.go
    -17876   0.7%  0.92%     -81331  3.19%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*SegmentWAL).truncate /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go
     16384  0.64%  1.56%      16384  0.64%  github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1.(*NodeAffinity).codecDecodeSelfFromArray /go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1/types.generated.go
    -13952  0.55%  1.01%     -13952  0.55%  github.com/prometheus/prometheus/scrape.(*Target).URL /go/src/github.com/prometheus/prometheus/scrape/target.go

cumulative

(pprof) top --cum
Showing nodes accounting for -314241, 12.32% of 2550832 total
Dropped 15 nodes (cum <= 12754)
Showing top 10 nodes out of 156
      flat  flat%   sum%        cum   cum%
         0     0%     0%    -368839 14.46%  github.com/prometheus/prometheus/vendor/github.com/golang/protobuf/protoc-gen-go/descriptor.(*FileDescriptorProto).GetWeakDependency /go/src/github.com/prometheus/prometheus/vendor/github.com/golang/protobuf/protoc-gen-go/descriptor/descriptor.pb.go
         0     0%     0%    -315143 12.35%  github.com/prometheus/prometheus/promql.(*evaluator).VectorOr /go/src/github.com/prometheus/prometheus/promql/engine.go
         0     0%     0%    -315143 12.35%  github.com/prometheus/prometheus/promql.(*evaluator).eval /go/src/github.com/prometheus/prometheus/promql/engine.go
         0     0%     0%    -315143 12.35%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*xorIterator).Next /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc/xor.go
   -314256 12.32% 12.32%    -270564 10.61%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*SegmentWAL).Truncate /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go
         0     0% 12.32%     268253 10.52%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*safeChunk).Iterator /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go
         0     0% 12.32%     268029 10.51%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*DB).beyondRetention /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/db.go
        15 0.00059% 12.32%     268029 10.51%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*LeveledCompactor).Compact /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/compact.go
         0     0% 12.32%     268029 10.51%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*LeveledCompactor).Plan /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/compact.go
         0     0% 12.32%     268029 10.51%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.newCompactorMetrics /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/compact.go
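
For reference, heap snapshots like the two being diffed above can be captured from Prometheus's built-in pprof endpoint and compared with -base. A minimal sketch, assuming the server listens on the default web port 9090 (file names are only examples):

# capture a heap snapshot from the running server
curl -s http://localhost:9090/debug/pprof/heap > heap_$(date +%s).pprof
# repeat a few days later, then diff the two snapshots against the running binary,
# e.g. `go tool pprof -inuse_objects -base old.pprof /usr/bin/prometheus new.pprof` as shown above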

subissues
prometheus/tsdb#424


mrsiano commented Nov 19, 2018

Looks like the leak is gone in prometheus 2.5.0.
After 24 hours on a populated system with 250 nodes and 10K pods, we can't identify any leak anymore.


It would be nice to have some sort of printout of GC'ed objects over time in the logs.
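
In the meantime, a rough workaround (not a Prometheus feature, just the Go runtime's own tracing, and it requires restarting the binary with an extra environment variable) is to let the runtime print one line per GC cycle to stderr:

# prints a summary line per GC cycle (heap sizes before/after, live heap, pause times) to stderr
# the config path below is only an example
GODEBUG=gctrace=1 /usr/bin/prometheus --config.file=/etc/prometheus/prometheus.yml

Prometheus also already exports go_memstats_* and go_gc_duration_seconds metrics about its own runtime, which can be graphed to get a similar view over time.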

@mrsiano mrsiano closed this Nov 19, 2018


krasi-georgiev commented Nov 19, 2018

thanks for the update!
