This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

R4R: prometheus exporter in substrate #4511

Closed

Conversation

nodebreaker0-0
Contributor

@nodebreaker0-0 nodebreaker0-0 commented Dec 30, 2019

We are B-Harvest, working on a W3F-granted project
(https://github.com/w3f/Web3-collaboration/blob/master/grants/speculative/substrate%20x%20(prometheus%20%2B%20grafana)%20by%20B-Harvest.md).

This PR is a draft implementation; we are requesting a review from the Parity team so it can be merged. We will make the necessary modifications if issues are raised.

Please refer to README.md for detailed function information.

Created anew due to an error in #4505.

@parity-cla-bot

It looks like @nodebreaker0-0 signed our Contributor License Agreement. 👍

Many thanks,

Parity Technologies CLA Bot

@expenses
Contributor

expenses commented Jan 2, 2020

Hi, thanks for your PR! Have you considered integrating these changes with the existing metrics recording set up? #4187

@expenses
Contributor

expenses commented Jan 2, 2020

To expand, it would be nice to be able to do

record_metrics!(
    a => 1,
    b => 2,
    ...
)

and have it go to either the built-in metrics server or the Prometheus server depending on a --metrics-server flag or something similar.

@nodebreaker0-0
Contributor Author

Integrating with the metrics server is a good idea, but your code would need to include Prometheus, and a bit more testing is needed because of the wasm and no_std compatibility issues.

The reason for using Prometheus is Grafana's extensibility, and to use its functionality completely you must use Prometheus code such as prometheus::register. Let's consider whether the basic metrics received through record_metrics! can be integrated.

But the heart of the project is v0.4, which will be integrated later:
https://github.com/nodebreaker0-0/substrate/blob/prometheus_v0.4/utils/prometheus/src/expansion.rs

Further consideration is needed for integrations that take extended metrics into account:
https://github.com/nodebreaker0-0/substrate/blob/prometheus_v0.4/utils/prometheus/README.md#seting-grafana

First, if the code runs without any problems, I think we should consider metrics and macro integration.

@nodebreaker0-0 nodebreaker0-0 changed the title WIP: prometheus exporter in substrate R4R: prometheus exporter in substrate Jan 2, 2020
@expenses
Contributor

expenses commented Jan 3, 2020

Your code must include Prometheus, and you will need to test a bit more because of the wasm and no_std compatibility issues.

This shouldn't be a problem, as I think that the metrics recording will only happen outside the runtime.

@Hyung-bharvest

Your code must include Prometheus, and you will need to test a bit more because of the wasm and no_std compatibility issues.

This shouldn't be a problem, as I think that the metrics recording will only happen outside the runtime.

Hi. I think the metric-serving part should be separated from grafana-data-source so that we can connect our metrics to the separated module. If you can separate it soon (within two weeks from now; we have already spent too much time on this project), we can migrate our metric part to the newly separated metric-serving module.

@gavofyork gavofyork added the A3-in_progress Pull request is in progress. No review needed at this stage. label Jan 3, 2020
Contributor

@mxinden mxinden left a comment

I am one of the maintainers of the Prometheus project, thus naturally I am very happy about this effort.

I don't think we should expose both a Grafana endpoint and a Prometheus endpoint. Given that Grafana can connect to a Prometheus server, and given that the Prometheus metric format is adopted by many software projects and is pretty much the industry standard, I would prefer a Prometheus endpoint. But before we rush anything, this should be properly discussed.

Resolved review threads: client/service/src/config.rs; utils/prometheus/README.md (×4); utils/prometheus/src/lib.rs (×4); utils/prometheus/src/metrics.rs
Contributor

@expenses expenses left a comment

Looks decent in general. A few nitpicks and things to tidy up around gauge creation etc.

Resolved review threads: client/service/src/builder.rs; utils/prometheus/Cargo.toml; utils/prometheus/src/lib.rs (×3); utils/prometheus/src/metrics.rs (×5)
@expenses
Contributor

I've made a PR on this fork with a few changes :^) nodebreaker0-0#2

log::warn!("Unable to send metrics to grafana: {:?}", e);
}
fn send_prometheus(span_datum: SpanDatum) {
unimplemented!()
}
Contributor

@mxinden do you know of a way that we could implement this API? It seems like Prometheus gauges etc. are intended to be created as static refs.

Contributor Author

My guess is that you can't implement it without creating a macro.

Or you can simply make a static reference.
https://github.com/tikv/rust-prometheus/blob/master/examples/example_edition_2018.rs

Contributor

@mattrutherford mattrutherford Jan 14, 2020

I don't yet understand fully how prometheus works, but to properly support this I think we would need to be able to:

  • batch together all values (spans) collected since last scrape
  • partition by target and name (by the use of labels perhaps).

With all targets currently in the code enabled, we can be approaching 200 measurements per second just across the runtime. Is this feasible to do while preserving timestamps (which we would need to explicitly provide for this receiver)?

Contributor

Hmm. I'll have a go at this later.

Contributor

be approaching 200 measurements

@mattrutherford 200 measurements are not an issue for Prometheus. Prometheus can easily handle a couple of megabytes of metric data per scrape. See kubernetes/kube-state-metrics#498 for some numbers.

batch together all values (spans) collected since last scrape

This sounds more like we want a Histogram, right?


Would it be possible to tackle the effort within client/tracing as a follow up pull request? I would like to keep this one small. What do you think @mattrutherford @expenses @nodebreaker0-0?

Contributor

I agree it's good to be clear on terminology, but that wasn't my main concern. My concern is that if we're ripping out the Grafana server as part of introducing this Prometheus feature, we'd lose the ability to plot individual observations*; however, that seems unavoidable based on what you said.

*For my use-case it's OK, because I use the Telemetry receiver in substrate-tracing to send the data to substrate-analytics, which uses a PostgreSQL datasource that can in turn be queried by Grafana (because I want to archive the data). However, this is not trivial to set up, so the question is: do we really want to kill the substrate Grafana server yet? It's a relatively new feature, so maybe not many people use it, but I think it's something we should consider as part of our decision to remove it, particularly since the current primary use-case for substrate-tracing is profiling, and an easy way to look at that might be handy for developers.

Pinging @marcio-diaz @shawntabrizi @DarkEld3r in case you have opinion on this

Contributor

Imo the Grafana server in its current state isn't really good at doing anything. If Prometheus isn't going to work for tracing, we should probably find something that is. I'm happy enough to leave the Grafana server in for the moment until that happens.

Contributor

OK, good to know. Just to be clear, I have no problem removing it if it's not going to be a big detriment to people; I just wanted to make sure everyone was aware of the full implications.

@Hyung-bharvest Hyung-bharvest Jan 20, 2020

I think tracing other people's nodes is not (and should not be) the purpose of the Prometheus exporter. It is purely for monitoring an operator's own nodes. Extracting information from nodes operated by others does not sound right from an information-privacy perspective.

I guess tracing functionality should remain in telemetry, and it should also be off by default for the privacy of node operators.

And storing and querying historical data for one's own nodes can be done on the monitoring server (Grafana); it should not burden the node itself.

Please let me know if I misunderstood this context.

Contributor

@mattrutherford mattrutherford Jan 23, 2020

OK, I can see what the misunderstanding is here, and it warrants some more explanation. Tracing only captures local data, which we then have the option to send somewhere: Log (output tracing data via the logger), the built-in Grafana server, or Telemetry. When using telemetry to send the tracing data, it is not expected (or desired) to send to the default telemetry URL, so we override that via the CLI, e.g. --telemetry-url 'ws://localhost:8080 9', to send the data to an analytics server (often running on the same machine, but it can be anywhere you want).

@nodebreaker0-0
Contributor Author

Fixed input validation on /metrics and enabled --no-prometheus.

Contributor

@mxinden mxinden left a comment

Thanks for introducing the --no-prometheus flag and restricting exposition to /metrics!

.map_err(Error::Http)
} else {
Response::builder()
.status(StatusCode::NOT_FOUND)
Contributor

This line and the ones below would need to be indented. See Response::builder().status(...) above.

Contributor Author

complete

Resolved review thread: client/tracing/src/lib.rs
#[structopt(long = "prometheus-port", value_name = "PORT")]
pub prometheus_port: Option<u16>,

/// Disable connecting to the Substrate prometheus.
Contributor

Suggested change
/// Disable connecting to the Substrate prometheus.
/// Do not expose a Prometheus metric endpoint.

Contributor

One does not disable others to connect to it, but the exposition in itself.

Contributor Author

complete


/// Disable connecting to the Substrate prometheus.
///
/// prometheus is on by default on global chains.
Contributor

Suggested change
/// prometheus is on by default on global chains.
/// Prometheus metric endpoint is enabled by default.

Contributor Author

complete

@mxinden
Contributor

mxinden commented Feb 10, 2020

As far as I can tell ccb3179 is missing the following change in order to compile:

diff --git a/client/cli/src/params.rs b/client/cli/src/params.rs
index 128c66b30..1a4b28443 100644
--- a/client/cli/src/params.rs
+++ b/client/cli/src/params.rs
@@ -299,7 +299,6 @@ arg_enum! {
        pub enum TracingReceiver {
                Log,
                Telemetry,
-               Prometheus,
        }
 }
 
@@ -308,7 +307,6 @@ impl Into<sc_tracing::TracingReceiver> for TracingReceiver {
                match self {
                        TracingReceiver::Log => sc_tracing::TracingReceiver::Log,
                        TracingReceiver::Telemetry => sc_tracing::TracingReceiver::Telemetry,
-                       TracingReceiver::Prometheus => sc_tracing::TracingReceiver::Prometheus,
                }
        }
 }

Contributor

@mxinden mxinden left a comment

This looks good to me once:

  1. The branch compiles (see small diff above).

  2. The merge conflicts have been addressed.

Thanks for the hard work and again thanks for bearing with us.

@nodebreaker0-0
Contributor Author

This looks good to me once:

  1. The branch compiles (see small diff above).
  2. The merge conflicts have been addressed.

Thanks for the hard work and again thanks for bearing with us.

Done

Contributor

@mxinden mxinden left a comment

This is good to go from my side. (One small comment, but that can be deferred to a follow up.)

@expenses do you have any further comments?


let _ = to_spawn_tx.unbounded_send((
Box::pin(future),
From::from("prometheus-on-block")
Contributor

I am not sure what "prometheus-on-block" relates to. I would expect something like prometheus-server or prometheus-endpoint. Can you explain your thoughts?

Contributor Author

I didn't fully understand what this was, so I copied it from Telemetry, but I think it should be prometheus-endpoint.

Contributor

@expenses expenses left a comment

LGTM! I have a few nitpicks around style (missing newlines at EOF etc) but I don't want to delay the PR further.

@NikVolf
Contributor

NikVolf commented Feb 17, 2020

@nodebreaker0-0 you need to update Cargo.lock so that CI is happy.

run cargo test --all and commit the diff

@nodebreaker0-0
Contributor Author

Really done!
Thank you for joining me, see you again at Milestone 2.

@gnunicorn
Contributor

please merge master and we are ready to merge.

@gnunicorn gnunicorn added A7-looksgoodcantmerge and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Feb 18, 2020
@mxinden
Contributor

mxinden commented Feb 19, 2020

Closing here in favor of #4981 which includes the required master merge.

I am hoping to get #4981 into master before the code freeze on Friday.

@mxinden mxinden closed this Feb 19, 2020