metrics/prometheus: added prometheus metrics #17077

chapsuk · 2018-06-26T01:23:24Z

Hi, guys!

I added two flags:

  --metrics.prometheus             Enable prometheus HTTP server
  --metrics.prometheus.addr value  Prometheus server listening address (default: "localhost:9455")

When the --metrics.prometheus flag is provided, geth starts an HTTP server that returns the internal geth metrics in Prometheus format. Log output:

≻ ./build/bin/geth --metrics --metrics.prometheus
INFO [06-26|04:18:12.544] Enabling metrics collection
INFO [06-26|04:18:12.544] Starting prometheus http server          addr=localhost:9455
INFO [06-26|04:18:12.546] Maximum peer count                       ETH=25 LES=0 total=25

Example response: https://gist.github.com/chapsuk/60b90b042526bb25a03da12d3326b0fb
I wrote a small sample for testing and for building a dashboard: https://github.com/chapsuk/geth-prometheus
[Screenshot: dashboard example]
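For illustration, the originally proposed setup (a dedicated HTTP server gated by the two flags above) could be sketched as below. This is a minimal sketch, not the PR's code: it uses Go's standard `flag` and `net/http` packages rather than geth's own CLI library, and the `/metrics` path, `exampleSnapshot` and `renderMetrics` are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"log"
	"net/http"
	"sort"
	"strings"
)

// exampleSnapshot is a hypothetical stand-in for geth's metrics registry.
var exampleSnapshot = map[string]float64{
	"example_gauge": 3,
	"example_total": 42,
}

// renderMetrics writes one "name value" line per metric in the Prometheus
// text format, sorted by name so the output is deterministic.
func renderMetrics(snapshot map[string]float64) string {
	names := make([]string, 0, len(snapshot))
	for name := range snapshot {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s %v\n", name, snapshot[name])
	}
	return b.String()
}

// metricsHandler serves the rendered snapshot over HTTP.
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	io.WriteString(w, renderMetrics(exampleSnapshot))
}

func main() {
	// Flag names and default mirror the ones listed above (hypothetical wiring).
	enabled := flag.Bool("metrics.prometheus", false, "Enable prometheus HTTP server")
	addr := flag.String("metrics.prometheus.addr", "localhost:9455", "Prometheus server listening address")
	flag.Parse()
	if !*enabled {
		return // no flag, no server
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", metricsHandler) // assumed path
	log.Printf("Starting prometheus http server addr=%s", *addr)
	log.Fatal(http.ListenAndServe(*addr, mux))
}
```

Run with `--metrics.prometheus` and scrape `http://localhost:9455/metrics`; without the flag the program exits immediately.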

chapsuk · 2018-06-28T10:09:54Z

Take a look, please.
@karalabe @nonsense

karalabe · 2018-06-28T10:23:30Z

There's no reason to start up a new HTTP server for Prometheus. The metrics are already exposed on the pprof server, you can just add an additional endpoint there to expose further metrics.

chapsuk · 2018-06-28T11:58:03Z

Thanks @karalabe!

I looked at #1697: the metrics flags are put in a separate block there, so it looks like all metrics settings belong in that block.

Several reasons to start up a new HTTP server:

- enabling or disabling pprof does not affect the collection of internal metrics; that is controlled by the --metrics flag;
- the metrics and pprof ports can have different access levels;
- the Prometheus handler can be enabled only when needed. This could be done through a --pprof.prometheus flag, but that would be confusing, because it is not part of the standard pprof handler.

In my case, it would be great to be able to enable the Prometheus handler while pprof is disabled. This is true for those environments where it is not possible to use pprof but metrics still need to be collected and the geth node monitored.

Separate settings make it more obvious how to configure geth according to one's own requirements.

chapsuk · 2018-07-04T08:22:05Z

@karalabe, what do you think about #17077 (comment)?

2color · 2019-04-04T08:14:08Z

This would be greatly appreciated!

holiman · 2019-04-04T09:11:35Z

> This is true for those environments where it is not possible to use pprof but need to collect metrics and monitor geth node.

Why would that not be possible?

We're planning to include this into 1.9, but it needs to be reworked to use the same endpoint as pprof (not a separate http server) and also needs licensing headers. @chapsuk are you willing to work more on this? Otherwise we'll put it into the backlog to try and finish it ourselves.

chapsuk · 2019-04-04T09:21:01Z

Thanks for the reply, @holiman.
No problem, I will switch to the pprof endpoint in 2-3 days.
See you later

chapsuk · 2019-04-04T17:01:54Z

Done, PTAL

karalabe · 2019-04-05T07:35:59Z

Hey @chapsuk, thanks a lot for this. I've been playing around with it a bit and have a few questions though.

Prometheus generally reports each metric as a separate instance:

# Minimalistic line:
metric_without_timestamp_and_labels 12.47

# A weird metric from before the epoch:
something_weird{problem="division by zero"} +Inf -3982045

However, you grouped everything under four metrics (counters, gauges, meters, timers):

timers{name="chain/account/reads",quantile="0.50"} 0
timers{name="chain/account/reads",quantile="0.75"} 0
timers{name="chain/account/reads",quantile="0.95"} 0
timers{name="chain/account/reads",quantile="0.99"} 0
timers{name="chain/account/reads",quantile="0.999"} 0
timers{name="chain/account/reads",quantile="0.9999"} 0
timers_count{name="chain/storage/commits"} 22094
timers{name="chain/storage/commits",quantile="0.50"} 0
timers{name="chain/storage/commits",quantile="0.75"} 0
timers{name="chain/storage/commits",quantile="0.95"} 0
timers{name="chain/storage/commits",quantile="0.99"} 0
timers{name="chain/storage/commits",quantile="0.999"} 0
timers{name="chain/storage/commits",quantile="0.9999"} 0
timers_count{name="chain/execution"} 22094

I don't have much experience with Prometheus, but this seems like the wrong approach, since all these data points will then be pushed into the same table of the time series database instead of separate ones per metric.

It also seems to make Grafana charting more complex, since you need to fetch timers first, and then filter for chain/storage/commits.

Instead of creating these groupings, which imho only add complexity but don't provide any value, couldn't we simply convert / to _ and report every metric on its own? That seems to be the official way. E.g.:

chain_storage_commits_count 22094
chain_storage_commits{quantile="0.50"} 0
chain_storage_commits{quantile="0.75"} 0
chain_storage_commits{quantile="0.95"} 0
chain_storage_commits{quantile="0.99"} 0
chain_storage_commits{quantile="0.999"} 0
chain_storage_commits{quantile="0.9999"} 0

This would also simplify your code a lot, since it completely removes the need for the collector type, which just facilitates the groupings. Instead you can just use a single byte buffer in the Handler and push every meter in the order of appearance; there's no need to change it.

A second issue is that you seem to convert go-metrics counters into Prometheus gauges. I think we can retain the counter type in Prometheus too.

2color · 2019-04-05T07:48:16Z

@karalabe

You're right about naming/grouping metrics based on their semantic value rather than unit type.

Some good pointers can be found:

Regarding Prometheus counters: they can only increase, unlike go-metrics counters, which can also decrease.

> A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.
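To see why a go-metrics counter that can decrement should not be exported as a Prometheus counter, here is a simplified model of how Prometheus treats a counter series when computing its increase: any drop in value is read as a reset, so the new value is counted as fresh growth from zero. This ignores the time-window extrapolation that `rate()`/`increase()` also perform, and `promIncrease` is a hypothetical helper:

```go
package main

import "fmt"

// promIncrease approximates Prometheus's counter-reset handling:
// a decrease is assumed to be a restart from zero, so the new value
// itself is added as increase. Simplified: no extrapolation, no time window.
func promIncrease(samples []float64) float64 {
	var total float64
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur < prev {
			total += cur // reset assumed
		} else {
			total += cur - prev
		}
	}
	return total
}

func main() {
	// A go-metrics counter that was incremented, decremented, then incremented.
	samples := []float64{10, 12, 11, 13}
	fmt.Println("net change:", samples[len(samples)-1]-samples[0])
	fmt.Println("counter-style increase:", promIncrease(samples))
}
```

For the samples 10, 12, 11, 13 the net change is 3, but the counter interpretation reports 2 + 11 + 2 = 15, because the decrement from 12 to 11 looks like a restart. Exporting such values as a gauge avoids the problem.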

chapsuk · 2019-04-05T08:44:37Z

@karalabe, named metrics are better than typed groups 👍

> This would also simplify your code a lot, since it completely removes the need for the collector type, which just facilitates the groupings. Instead you can just use a single byte buffer in the Handler and push every meter in the order of appearance; there's no need to change it.

Not sure about that; I need to send the type of each metric to Prometheus. But I will try.

@2color is absolutely right about the second point.

karalabe · 2019-04-05T08:48:48Z

> Not sure about that; I need to send the type of each metric to Prometheus. But I will try.

Based on the Prometheus sample, you don't need to explicitly specify the type. But even if you want to, you can still do it with the type switch you currently have; you just don't need to "group" all counters together, for example. You can simply output them as they come along: Counter1, Meter1, Counter2, Counter3, Meter2, etc.
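Emitting metrics in order of appearance while still choosing the type via a type switch might look like the sketch below. `Counter`, `Gauge` and `named` are hypothetical stand-ins for the go-metrics types and the registry iteration, and, following the conclusion above, counters are written with a `gauge` TYPE because they can decrement:

```go
package main

import (
	"fmt"
	"strings"
)

// Counter and Gauge are hypothetical stand-ins for go-metrics types.
type Counter struct{ Value int64 }
type Gauge struct{ Value float64 }

// named pairs a registry name with its metric, in registration order.
type named struct {
	Name   string
	Metric interface{}
}

// render writes each metric as it comes along, with a "# TYPE" line
// chosen by a type switch; metrics are not grouped by kind.
func render(metrics []named) string {
	var b strings.Builder
	for _, m := range metrics {
		name := strings.ReplaceAll(m.Name, "/", "_")
		switch v := m.Metric.(type) {
		case Counter:
			// go-metrics counters can decrement, so declare them as gauges.
			fmt.Fprintf(&b, "# TYPE %s gauge\n%s %d\n", name, name, v.Value)
		case Gauge:
			fmt.Fprintf(&b, "# TYPE %s gauge\n%s %v\n", name, name, v.Value)
		default:
			// Unknown metric kinds are skipped in this sketch.
		}
	}
	return b.String()
}

func main() {
	fmt.Print(render([]named{
		{"example/requests", Counter{Value: 5}},
		{"example/load", Gauge{Value: 0.25}},
		{"example/errors", Counter{Value: 1}},
	}))
}
```

Dropping the `# TYPE` lines would still produce valid output (Prometheus then treats the samples as untyped), which matches the observation that the type does not have to be specified explicitly.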

chapsuk · 2019-04-05T18:02:06Z

> Based on the Prometheus sample, you don't need to explicitly specify the type.

Yep, it works without TYPE, but I left it in.

Fixed names of metrics, PTAL

chapsuk · 2019-04-11T09:10:32Z

@karalabe
Is anything else needed?

karalabe · 2019-04-11T09:32:42Z

Nope, just pushed a tiny typo fix on top. LGTM, will merge when CI is green.

karalabe

LGTM

karalabe · 2019-04-11T09:34:09Z

Oh @chapsuk, would you be willing to share your Grafana dashboard setup? I'd like to write up a blog post about metrics for the 1.9 release, and it would be nice to include a Grafana dashboard there too.

chapsuk · 2019-04-11T09:40:24Z

This repo contains a dev environment and the Grafana dashboard. I updated the dashboard and tested the latest changes with it, but the README screenshot is out of date. I will try to fix it tonight.

karalabe · 2019-04-11T09:57:01Z

Thanks a lot for this PR! Btw, we'll probably simplify the flags a bit even further so not even --pprof should be needed. I'll ping you on the followup PR when I open it.

2color · 2019-04-11T15:27:29Z

Thanks @chapsuk! Much appreciated

chapsuk mentioned this pull request Jun 26, 2018

Proposal: Prometheus metrics #17086

Closed

chapsuk force-pushed the prom branch 2 times, most recently from 0a29635 to ed9ae6c July 2, 2018 13:15

chapsuk force-pushed the prom branch from ed9ae6c to b4d5879 July 6, 2018 11:25

adamschmideg added the status:triage label Apr 4, 2019

karalabe added this to the 1.9.0 milestone Apr 4, 2019

karalabe self-requested a review April 4, 2019 09:11

karalabe removed the status:triage label Apr 4, 2019

chapsuk force-pushed the prom branch from b4d5879 to cdd3b0b April 4, 2019 16:22

karalabe force-pushed the prom branch from cdd3b0b to de5398f April 5, 2019 07:27

karalabe force-pushed the prom branch from de5398f to e419f5c April 5, 2019 07:36

chapsuk and others added 3 commits April 11, 2019 10:56

metrics/prometheus: added prometheus http server and metrics collector

ae5c0fc

metrics/prometheus: minor cleanups

8e8d55e

metrics/prometheus: named keys instead name in tag

55f2232

metrics/prometheus: minor typo cleanups, sorted report

88339c3

karalabe force-pushed the prom branch from df701f6 to 88339c3 April 11, 2019 09:32

karalabe approved these changes Apr 11, 2019


karalabe merged commit 31bc2a2 into ethereum:master Apr 11, 2019

chapsuk deleted the prom branch April 11, 2019 12:09
