Expose tiller server metrics to prometheus #2171

Merged
technosophos merged 1 commit into helm:master from appscode:tiler-prom on Apr 13, 2017

Conversation

@sadlil
Contributor

sadlil commented Mar 23, 2017

Fixes #2163

@thomastaylor312

Collaborator

thomastaylor312 commented Mar 24, 2017

I am going to hold off on putting this in a milestone until more discussion can happen on this.

@technosophos

Member

technosophos commented Mar 24, 2017

@fibonacci1729 and @jchauncey -- Could I beg you two to pair on this and validate that this is the correct approach for collecting Prometheus metrics? Admittedly, this is outside my expertise.

@jchauncey

Contributor

jchauncey commented Mar 24, 2017

Here is a sample of the metrics that go-grpc-prometheus exposes:

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.4359e-05
go_gc_duration_seconds{quantile="0.25"} 4.5177e-05
go_gc_duration_seconds{quantile="0.5"} 5.5868e-05
go_gc_duration_seconds{quantile="0.75"} 8.0119e-05
go_gc_duration_seconds{quantile="1"} 0.000207363
go_gc_duration_seconds_sum 0.001685209
go_gc_duration_seconds_count 23
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 17
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 4.147952e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 7.9673296e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.487611e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 665234
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 739328
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 4.147952e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 7.340032e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 6.651904e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 22738
# HELP go_memstats_heap_released_bytes_total Total number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes_total counter
go_memstats_heap_released_bytes_total 6.955008e+06
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 1.3991936e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.490370241239965e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 302
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 687972
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 9600
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 107920
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 196608
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 8.15904e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.834493e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 688128
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 688128
# HELP go_memstats_sys_bytes Number of bytes obtained by system. Sum of all system allocations.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.8954488e+07
# HELP grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_code="OK",grpc_method="UninstallRelease",grpc_service="hapi.services.tiller.ReleaseService",grpc_type="unary"} 1
# HELP grpc_server_msg_received_total Total number of RPC stream messages received on the server.
# TYPE grpc_server_msg_received_total counter
grpc_server_msg_received_total{grpc_method="UninstallRelease",grpc_service="hapi.services.tiller.ReleaseService",grpc_type="unary"} 1
# HELP grpc_server_msg_sent_total Total number of gRPC stream messages sent by the server.
# TYPE grpc_server_msg_sent_total counter
grpc_server_msg_sent_total{grpc_method="UninstallRelease",grpc_service="hapi.services.tiller.ReleaseService",grpc_type="unary"} 1
# HELP grpc_server_started_total Total number of RPCs started on the server.
# TYPE grpc_server_started_total counter
grpc_server_started_total{grpc_method="UninstallRelease",grpc_service="hapi.services.tiller.ReleaseService",grpc_type="unary"} 1
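
For anyone who wants to check a scrape like the one above by hand, here is a minimal sketch that reads lines in this text format into a map. It uses only the Go standard library; `parseSamples` is a hypothetical helper name, and it assumes sample lines carry no trailing timestamp, which holds for the output shown here.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSamples reads text-format metric lines and returns a map from the
// series (metric name plus label set, exactly as written) to its value.
// Assumption: no sample line ends with a timestamp.
func parseSamples(text string) (map[string]float64, error) {
	samples := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and "# HELP" / "# TYPE" comments
		}
		// The value is whatever follows the last space on the line.
		i := strings.LastIndex(line, " ")
		if i < 0 {
			return nil, fmt.Errorf("malformed sample line: %q", line)
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", line, err)
		}
		samples[strings.TrimSpace(line[:i])] = v
	}
	return samples, sc.Err()
}

func main() {
	const sample = `# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 17
grpc_server_started_total{grpc_method="UninstallRelease",grpc_service="hapi.services.tiller.ReleaseService",grpc_type="unary"} 1
`
	samples, err := parseSamples(sample)
	if err != nil {
		panic(err)
	}
	for series, value := range samples {
		fmt.Println(series, "=", value)
	}
}
```

The map key keeps the label set verbatim, so the gRPC counters stay distinguishable per method without a full label parser.
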
@jchauncey

Contributor

jchauncey commented Mar 24, 2017

@technosophos I think this is fine for now. It's definitely a good start to monitoring the Tiller process, and Prometheus is used extensively in other parts of the Kubernetes infrastructure. I think it would be nice to have another issue that starts a conversation on the types of metrics Tiller should expose that are not covered by this library.
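
To make the idea of application-level metrics a bit more concrete, here is a rough sketch of a labeled counter rendered in the same text format as the sample output. Everything in it is an assumption for illustration: `counterVec`, the metric name `example_release_installs_total`, and the `namespace` label are hypothetical and are not something Tiller exposes. A real implementation would more likely use the Prometheus Go client's own counter types than a hand-rolled one like this.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// counterVec is a tiny, concurrency-safe counter keyed by one label value.
// It is a stand-in for a real Prometheus counter vector (hypothetical type).
type counterVec struct {
	mu     sync.Mutex
	name   string
	label  string
	counts map[string]uint64
}

func newCounterVec(name, label string) *counterVec {
	return &counterVec{name: name, label: label, counts: make(map[string]uint64)}
}

// Inc adds one to the series identified by the given label value.
func (c *counterVec) Inc(value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[value]++
}

// Render returns the series as text-format lines, sorted by label value
// so that the output is stable between calls.
func (c *counterVec) Render() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	keys := make([]string, 0, len(c.counts))
	for k := range c.counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	fmt.Fprintf(&b, "# TYPE %s counter\n", c.name)
	for _, k := range keys {
		// %q quoting matches the text format for simple label values.
		fmt.Fprintf(&b, "%s{%s=%q} %d\n", c.name, c.label, k, c.counts[k])
	}
	return b.String()
}

func main() {
	// Hypothetical metric: installs counted per namespace.
	installs := newCounterVec("example_release_installs_total", "namespace")
	installs.Inc("default")
	installs.Inc("default")
	installs.Inc("kube-system")
	fmt.Print(installs.Render())
}
```
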

@thomastaylor312

Collaborator

thomastaylor312 commented Mar 24, 2017

@technosophos Are we ok with putting this in 2.3?

@sadlil sadlil force-pushed the appscode:tiler-prom branch from 9027f33 to 56c1ec4 Mar 28, 2017
@technosophos

Member

technosophos commented Apr 3, 2017

@thomastaylor312 No, we will put this on 2.4 and get it merged as soon as tests are passing.

@technosophos technosophos added this to the 2.4.0 milestone Apr 3, 2017
@sadlil

Contributor Author

sadlil commented Apr 4, 2017

@technosophos @thomastaylor312 All tests are passing.

@sadlil sadlil force-pushed the appscode:tiler-prom branch 2 times, most recently from e2e64af to 6a64581 Apr 5, 2017
@technosophos

Member

technosophos commented Apr 12, 2017

A quick rebase, and we can get this tested and merged into 2.4.0.

@@ -130,6 +131,11 @@ func start(c *cobra.Command, args []string) {

	go func() {
		mux := newProbesMux()

@technosophos

technosophos Apr 12, 2017

Member

Is there any way to disable prometheus collection?

@sadlil sadlil force-pushed the appscode:tiler-prom branch from 6a64581 to 8c81e73 Apr 13, 2017
@sadlil

Contributor Author

sadlil commented Apr 13, 2017

@technosophos rebased, tests passing.

> Is there any way to disable prometheus collection?

Did you mean turning off the metrics endpoint? Currently, no. But there is scope to add a flag to Tiller that disables exposing metrics.
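
A rough sketch of what such a switch could look like, assuming the metrics are served from the same HTTP mux as the probes. This is not code from this PR: the flag name `-enable-metrics`, the helpers `newMux`, `statusFor`, and `parseEnable`, and the placeholder `/metrics` handler are all hypothetical, and only the standard library is used so the example stays self-contained.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
)

// newMux builds a mux with a liveness probe and, optionally, a metrics endpoint.
func newMux(enableMetrics bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/liveness", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	if enableMetrics {
		// Placeholder handler; a real build would mount the Prometheus handler here.
		mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintln(w, "example_up 1")
		})
	}
	return mux
}

// statusFor reports the HTTP status the mux returns for a GET on path.
func statusFor(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

// parseEnable reads the hypothetical -enable-metrics flag from args.
// Metrics stay on by default, matching the current always-on behavior.
func parseEnable(args []string) (bool, error) {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	enable := fs.Bool("enable-metrics", true, "serve the /metrics endpoint")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *enable, nil
}

func main() {
	enable, err := parseEnable(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	mux := newMux(enable)
	fmt.Println("/liveness ->", statusFor(mux, "/liveness"))
	fmt.Println("/metrics  ->", statusFor(mux, "/metrics"))
}
```

Leaving the route unregistered when the flag is off means a scrape gets a plain 404 while the probe endpoint keeps working.
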

Member

technosophos left a comment

Manually tested on a system without Prometheus enabled to cover that case, and everything seemed to work fine.

@technosophos technosophos merged commit 0c11033 into helm:master Apr 13, 2017
2 checks passed
ci/circleci Your tests passed on CircleCI!
cla/linuxfoundation sadlil authorized
@sadlil sadlil deleted the appscode:tiler-prom branch Apr 14, 2017