-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add stagedsync headers metrics #498
Conversation
Codecov Report
@@ Coverage Diff @@
## main #498 +/- ##
==========================================
+ Coverage 72.53% 72.90% +0.36%
==========================================
Files 243 247 +4
Lines 24584 24688 +104
==========================================
+ Hits 17833 17998 +165
+ Misses 6751 6690 -61
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm supportive of this in general, but defer to @onbjerg here.
what are the implications if we run this without prometheus? are these just no-ops then?
bin/reth/src/prometheus_exporter.rs
Outdated
use metrics_exporter_prometheus::PrometheusBuilder; | ||
use metrics_util::layers::{PrefixLayer, Stack}; | ||
|
||
pub(crate) fn initialize_prometheus_exporter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of panicking here, let's add a try_
option as well, we should have eyre as error handling here so letting this return a result instead is nicer.
crates/stages/src/stages/headers.rs
Outdated
histogram!("stages.headers.request_time", start.elapsed()); | ||
counter!("stages.headers.counter", res.len() as u64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably want these metrics, but it would be nicer to extract this to a separate call, so it doesn't clutter this function here.
crates/stages/src/stages/headers.rs
Outdated
@@ -107,20 +113,23 @@ impl<DB: Database, D: HeaderDownloader, C: Consensus, H: HeadersClient, S: Statu | |||
} | |||
Err(e) => match e { | |||
DownloadError::Timeout => { | |||
increment_counter!("stages.headers.timeout_errors"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
bin/reth/src/metrics_describer.rs
Outdated
|
||
use metrics::{describe_counter, describe_histogram}; | ||
|
||
pub(crate) fn describe_metrics() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add brief, one-line, docs here.
bin/reth/src/metrics_describer.rs
Outdated
|
||
pub(crate) fn describe_metrics() { | ||
// Describe stagedsync headers metrics | ||
describe_counter!("stages.headers.counter", "Number of headers successfully retrived"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, not entirely sure if this should be here. we should find a way to describe the metrics in the crates they belong to for maximum reusability
crates/stages/src/stages/headers.rs
Outdated
@@ -92,6 +95,9 @@ impl<DB: Database, D: HeaderDownloader, C: Consensus, H: HeadersClient, S: Statu | |||
while let Some(headers) = stream.next().await { | |||
match headers.into_iter().collect::<Result<Vec<_>, _>>() { | |||
Ok(res) => { | |||
// Register elapsed time of request | |||
histogram!("stages.headers.request_time", start.elapsed()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably be in the downloader itself since we might replace this with a concurrent downloader (and in that case this metric is wrong)
e35a021
to
95841ea
Compare
95841ea
to
ffcc993
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this - let's just inline the functions so we don't pay overheads when the macro is not enabled
crates/stages/src/stages_metrics.rs
Outdated
use metrics::{counter, increment_counter}; | ||
use reth_interfaces::p2p::error::DownloadError; | ||
|
||
pub(crate) fn update_headers_metrics(n_headers: u64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we inline(always) these calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of individual calls, I'd rather have a StageMetrics
struct with Gauge/Counter
s as fields that are mapped to the registered metrics. And this struct then encapsulates all the metrics that we have, this is easier to maintain than standalone metric functions.
I'm not sure how the exact workflow looks like in metrics
but imo it should be possible to register them and create a Gauge/Counter for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if this sounds good:
I will implement a struct HeaderMetrics
like so,
struct HeaderMetrics {
pub headers_counter: Counter,
pub headers_timeout_errors: Counter,
pub headers_validation_errors: Counter,
pub headers_unexpected_errors: Counter,
}
This struct could be initialized inside the HeaderStage
, in a field metrics
and when a metric should be updated, e.g. the headers counter, in the code this could be done with
self.metrics.headers_counter.increment(res.len() as u64);
Is this consitent with your suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds right. @mattsse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have refactored the metrics to encapsulate them in the HeaderMetrics
struct now. Let me know what you think @mattsse!
docs/design/metrics.md
Outdated
@@ -53,6 +56,14 @@ How the metrics are exposed to the end-user is determined by the CLI. | |||
|
|||
[^1]: The top-level namespace is added by the CLI using [`metrics_util::layers::PrefixLayer`][metrics_util.PrefixLayer]. | |||
|
|||
### Current metrics | |||
#### StagedSync Headers | |||
- `stages.headers.counter`: Number of headers successfully retrived |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- `stages.headers.counter`: Number of headers successfully retrived | |
- `stages.headers.counter`: Number of headers successfully retrieved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
use metrics_util::layers::{PrefixLayer, Stack}; | ||
use std::net::SocketAddr; | ||
|
||
pub(crate) fn initialize(listen_addr: SocketAddr) -> eyre::Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eventually, the exporter will be spawned with our TaskManager
.
I'd like a separate function that returns the recorder and exporter.
perhaps even a type that bundles those and allows to configure the Stack, like a builder or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be ok to create a separate issue for this and tackle it in another PR?
crates/stages/src/stages_metrics.rs
Outdated
use metrics::{counter, increment_counter}; | ||
use reth_interfaces::p2p::error::DownloadError; | ||
|
||
pub(crate) fn update_headers_metrics(n_headers: u64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of individual calls, I'd rather have a StageMetrics
struct with Gauge/Counter
s as fields that are mapped to the registered metrics. And this struct then encapsulates all the metrics that we have, this is easier to maintain than standalone metric functions.
I'm not sure how the exact workflow looks like in metrics
but imo it should be possible to register them and create a Gauge/Counter for it.
d34e682
to
835be6f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work overall! Supportive of @mattsse's suggestion to use structs for metrics.
In the future, we might want to start exporting the time spent on downloading each batch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
smol nit
crates/stages/src/stages_metrics.rs
Outdated
pub fn new() -> Self { | ||
Self { | ||
headers_counter: register_counter!("stages.headers.counter"), | ||
headers_timeout_errors: register_counter!("stages.headers.timeout_errors"), | ||
headers_validation_errors: register_counter!("stages.headers.validation_errors"), | ||
headers_unexpected_errors: register_counter!("stages.headers.unexpected_errors"), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant, new, move to default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
56b4fa6
to
c309f9a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just needs rebase
This reduces image build times by ~50% after the first one. Uses cargo-chef inside the image. More info in: https://morioh.com/p/987a2bda4526
1d9c8b2
to
30f6f6b
Compare
final blocker: https://github.com/paradigmxyz/reth/actions/runs/3758567946/jobs/6387061410 replace |
This PR adds the following metrics:
stages.headers.counter
: Number of headers successfully retrivedstages.headers.timeout_error
: Number of timeout errors while requesting headersstages.headers.validation_errors
: Number of validation errores while requesting headersstages.headers.unexpected_errors
: Number of unexpected errors while requesting headersstages.headers.request_time
: Elapsed time of successful header requestsThe prometheus exporter is modularized intro its own module. Also, a description for the metrics is added and the
metrics
docs is updated.