-
Notifications
You must be signed in to change notification settings - Fork 623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add Prometheus metrics #8728
Conversation
b37d0d0
to
f4538a8
Compare
I have removed the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good overall, just a few nits here and there
CHANGELOG.md
Outdated
@@ -8,6 +8,8 @@ | |||
|
|||
* Experimental option to dump state of every epoch to external storage. [#8661](https://github.com/near/nearcore/pull/8661) | |||
* State-viewer tool to dump and apply state changes from/to a range of blocks [#8628](https://github.com/near/nearcore/pull/8628) | |||
* Add prometheus metrics for tacked shards, block height within epoch, if is block/chunk producer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tick tack, it should say "track"
Also what exactly do you mean by block height within epoch?
chain/client/src/client_actor.rs
Outdated
last_final_block_height, | ||
last_final_ds_block_height, | ||
epoch_height, | ||
block_height_within_epoch, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: For consistency I would rename block_height_within_epoch -> last_final_block_height_in_epoch and maybe move it up so the all block heights are together.
chain/client/src/info.rs
Outdated
@@ -119,6 +120,57 @@ impl InfoHelper { | |||
metrics::FINAL_BLOCK_HEIGHT.set(last_final_block_height as i64); | |||
metrics::FINAL_DOOMSLUG_BLOCK_HEIGHT.set(last_final_ds_block_height as i64); | |||
metrics::EPOCH_HEIGHT.set(epoch_height as i64); | |||
metrics::BLOCK_HEIGHT_WITHIN_EPOCH.set(block_height_within_epoch as i64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, how about FINAL_BLOCK_HEIGHT_IN_EPOCH ?
metrics::BLOCK_HEIGHT_WITHIN_EPOCH.set(block_height_within_epoch as i64); | ||
} | ||
|
||
fn record_tracked_shards(head: &Tip, client: &crate::client::Client) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you add a small comment?
chain/client/src/info.rs
Outdated
|
||
fn record_tracked_shards(head: &Tip, client: &crate::client::Client) { | ||
if let Ok(num_shards) = client.runtime_adapter.num_shards(&head.epoch_id) { | ||
(0..num_shards).for_each(|shard_id| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: personally I think for loops are much more readable but that's just my opinion, up to you
for shard_id in (0..num_shards) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to @wacban . For-each is unnecessary here.
chain/client/src/info.rs
Outdated
let epoch_info = client.runtime_adapter.get_epoch_info(&head.epoch_id); | ||
epoch_info.map(|epoch_info| { | ||
for (shard_id, validators) in | ||
epoch_info.chunk_producers_settlement().into_iter().enumerate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% sure but I think you may not need enumerate and the for loop should handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That array holds the validators for each shard_id. I think that this is the easiest way to get the shard_id.
chain/client/src/info.rs
Outdated
epoch_info.chunk_producers_settlement().into_iter().enumerate() | ||
{ | ||
let is_chunk_producer_for_shard = validators.iter().any(|&validator_id| { | ||
epoch_info.validator_account_id(validator_id).clone() == account_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need that clone just for the comparison?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, Thanks for catching this!
chain/client/src/metrics.rs
Outdated
pub(crate) static IS_CHUNK_PRODUCER_FOR_SHARD: Lazy<IntGaugeVec> = Lazy::new(|| { | ||
try_create_int_gauge_vec( | ||
"near_is_chunk_producer_for_shard", | ||
"Number of blocks expected to be produced by a validator", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that so? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and please clarify that it's for the current epoch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy-pasta
chain/client/src/client_actor.rs
Outdated
@@ -1280,6 +1280,12 @@ impl ClientActor { | |||
.runtime_adapter | |||
.get_epoch_height_from_prev_block(block.hash()) | |||
.unwrap_or(0); | |||
let epoch_start_height = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function does a tiny bit of work, and a whole lot of metrics work. Can that be moved to a separate function?
Also move the last_final_stuff
into that function.
chain/client/src/info.rs
Outdated
|
||
fn record_tracked_shards(head: &Tip, client: &crate::client::Client) { | ||
if let Ok(num_shards) = client.runtime_adapter.num_shards(&head.epoch_id) { | ||
(0..num_shards).for_each(|shard_id| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to @wacban . For-each is unnecessary here.
chain/client/src/info.rs
Outdated
false, | ||
); | ||
metrics::TRACKED_SHARDS | ||
.with_label_values(&[shard_id.to_string().as_str()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.with_label_values(&[shard_id.to_string().as_str()]) | |
.with_label_values(&[&shard_id.to_string()]) |
chain/client/src/info.rs
Outdated
fn record_block_producers(head: &Tip, client: &crate::client::Client) { | ||
let me = client.validator_signer.as_ref().map(|x| x.validator_id().clone()); | ||
let is_bp = me | ||
.map(|account_id| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to replace map(stuff).unwrap_or(false)
with map_or(false, stuff)
?
chain/client/src/info.rs
Outdated
let me = client.validator_signer.as_ref().map(|x| x.validator_id().clone()); | ||
me.map(|account_id| { | ||
let epoch_info = client.runtime_adapter.get_epoch_info(&head.epoch_id); | ||
epoch_info.map(|epoch_info| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More natural way would be:
if let Ok(epoch_info) = get_epoch_info() {
do stuff with epoch_info
}
Maybe do the same for let me =
chain/client/src/info.rs
Outdated
let is_chunk_producer_for_shard = validators.iter().any(|&validator_id| { | ||
epoch_info.validator_account_id(validator_id).clone() == account_id | ||
}); | ||
metrics::IS_CHUNK_PRODUCER_FOR_SHARD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the metric unconditionally. Currently it's only updated if the node has a validator key and if there is an epoch. Don't want to accidentally miss setting the metric to 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
chain/client/src/metrics.rs
Outdated
pub(crate) static IS_BLOCK_PRODUCER: Lazy<IntGauge> = Lazy::new(|| { | ||
try_create_int_gauge( | ||
"near_is_block_producer", | ||
"Bool to denote if the node is currently a block producer", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently
is ambiguous. This block? This epoch?
chain/client/src/metrics.rs
Outdated
pub(crate) static IS_CHUNK_PRODUCER_FOR_SHARD: Lazy<IntGaugeVec> = Lazy::new(|| { | ||
try_create_int_gauge_vec( | ||
"near_is_chunk_producer_for_shard", | ||
"Number of blocks expected to be produced by a validator", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and please clarify that it's for the current epoch.
We need to export this to improve the blockchain monitoring. Tested on a localnet: Run: python3 nearup run localnet --binary-path ../nearcore/target/debug/ --interactive Open http://127.0.0.1:3030/metrics - # HELP block_height_within_epoch Height of the block within the epoch. - # TYPE block_height_within_epoch gauge - near_block_height_within_epoch 1
* say which shards are tracked from the current node Testing * run a localnet with 3 validators and 3 shards. * go to localhost:3030/metrics - ** skipped lines ** - # HELP near_client_tracked_shards Tracked shards - # TYPE near_client_tracked_shards gauge - near_client_tracked_shards{shard_id="0"} 1 - near_client_tracked_shards{shard_id="1"} 1 - near_client_tracked_shards{shard_id="2"} 1 - ** skipped lines **
* say which shards is this node producing chunks for Testing * run a localnet with 10 validators and 4 shards. * go to localhost:3030/metrics - # HELP near_is_block_producer Bool to denote if the node is currently a block producer in the current epoch - # TYPE near_is_block_producer gauge - near_is_block_producer 1 - # HELP near_is_chunk_producer_for_shard Bool to denote if the node is a chunk producer for a shard in the current epoch - # TYPE near_is_chunk_producer_for_shard gauge - near_is_chunk_producer_for_shard{shard_id="0"} 0 - near_is_chunk_producer_for_shard{shard_id="1"} 1 - near_is_chunk_producer_for_shard{shard_id="2"} 0 - near_is_chunk_producer_for_shard{shard_id="3"} 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed comments and refactored the process_accepted_blocks
function to separate the monitoring logic.
chain/client/src/info.rs
Outdated
let epoch_info = client.runtime_adapter.get_epoch_info(&head.epoch_id); | ||
epoch_info.map(|epoch_info| { | ||
for (shard_id, validators) in | ||
epoch_info.chunk_producers_settlement().into_iter().enumerate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That array holds the validators for each shard_id. I think that this is the easiest way to get the shard_id.
chain/client/src/info.rs
Outdated
epoch_info.chunk_producers_settlement().into_iter().enumerate() | ||
{ | ||
let is_chunk_producer_for_shard = validators.iter().any(|&validator_id| { | ||
epoch_info.validator_account_id(validator_id).clone() == account_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, Thanks for catching this!
chain/client/src/info.rs
Outdated
let is_chunk_producer_for_shard = validators.iter().any(|&validator_id| { | ||
epoch_info.validator_account_id(validator_id).clone() == account_id | ||
}); | ||
metrics::IS_CHUNK_PRODUCER_FOR_SHARD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
chain/client/src/metrics.rs
Outdated
pub(crate) static IS_CHUNK_PRODUCER_FOR_SHARD: Lazy<IntGaugeVec> = Lazy::new(|| { | ||
try_create_int_gauge_vec( | ||
"near_is_chunk_producer_for_shard", | ||
"Number of blocks expected to be produced by a validator", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy-pasta
CHANGELOG.md
Outdated
@@ -9,6 +9,8 @@ | |||
* Experimental option to dump state of every epoch to external storage. [#8661](https://github.com/near/nearcore/pull/8661) | |||
* State-viewer tool to dump and apply state changes from/to a range of blocks [#8628](https://github.com/near/nearcore/pull/8628) | |||
* Node can restart if State Sync gets interrupted [#8732](https://github.com/near/nearcore/pull/8732) | |||
* Add prometheus metrics for tracked shards, block height within epoch, if is block/chunk producer | |||
>>>>>>> f4538a8ce (feat: changelog for PR#8728) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merge artifact
557d152
to
495a828
Compare
Tested on localnet and mainnet on a canary rpc node.
The gist of metrics from the node in prod. link
For additional test results check each commit message