Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Prometheus metrics #8728

Merged
merged 5 commits into from
Mar 16, 2023
Merged

feat: add Prometheus metrics #8728

merged 5 commits into from
Mar 16, 2023

Conversation

VanBarbascu
Copy link
Contributor

@VanBarbascu VanBarbascu commented Mar 14, 2023

  • Current block height since the start of the current epoch
  • shards tracked by the current node. (currently all shards but in the feature it might change)
  • whether the node is a block producer
  • whether the node is a chunk producer and for which shard

Tested on localnet and mainnet on a canary rpc node.
The gist of metrics from the node in prod. link
For additional test results check each commit message

@VanBarbascu VanBarbascu requested a review from a team as a code owner March 14, 2023 15:04
@VanBarbascu VanBarbascu force-pushed the master branch 5 times, most recently from b37d0d0 to f4538a8 Compare March 15, 2023 15:57
@VanBarbascu
Copy link
Contributor Author

I have removed the number of expected blocks/chunks to be produced in current epoch item and I will create a different PR for it.

Copy link
Contributor

@wacban wacban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good overall, just a few nits here and there

CHANGELOG.md Outdated
@@ -8,6 +8,8 @@

* Experimental option to dump state of every epoch to external storage. [#8661](https://github.com/near/nearcore/pull/8661)
* State-viewer tool to dump and apply state changes from/to a range of blocks [#8628](https://github.com/near/nearcore/pull/8628)
* Add prometheus metrics for tacked shards, block height within epoch, if is block/chunk producer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tick tack, it should say "track"
Also what exactly do you mean by block height within epoch?

Comment on lines 1295 to 1298
last_final_block_height,
last_final_ds_block_height,
epoch_height,
block_height_within_epoch,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: For consistency I would rename block_height_within_epoch -> last_final_block_height_in_epoch and maybe move it up so the all block heights are together.

@@ -119,6 +120,57 @@ impl InfoHelper {
metrics::FINAL_BLOCK_HEIGHT.set(last_final_block_height as i64);
metrics::FINAL_DOOMSLUG_BLOCK_HEIGHT.set(last_final_ds_block_height as i64);
metrics::EPOCH_HEIGHT.set(epoch_height as i64);
metrics::BLOCK_HEIGHT_WITHIN_EPOCH.set(block_height_within_epoch as i64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, how about FINAL_BLOCK_HEIGHT_IN_EPOCH ?

metrics::BLOCK_HEIGHT_WITHIN_EPOCH.set(block_height_within_epoch as i64);
}

fn record_tracked_shards(head: &Tip, client: &crate::client::Client) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you add a small comment?


fn record_tracked_shards(head: &Tip, client: &crate::client::Client) {
if let Ok(num_shards) = client.runtime_adapter.num_shards(&head.epoch_id) {
(0..num_shards).for_each(|shard_id| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: personally I think for loops are much more readable but that's just my opinion, up to you
for shard_id in (0..num_shards) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @wacban . For-each is unnecessary here.

let epoch_info = client.runtime_adapter.get_epoch_info(&head.epoch_id);
epoch_info.map(|epoch_info| {
for (shard_id, validators) in
epoch_info.chunk_producers_settlement().into_iter().enumerate()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure but I think you may not need enumerate and the for loop should handle it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That array holds the validators for each shard_id. I think that this is the easiest way to get the shard_id.

epoch_info.chunk_producers_settlement().into_iter().enumerate()
{
let is_chunk_producer_for_shard = validators.iter().any(|&validator_id| {
epoch_info.validator_account_id(validator_id).clone() == account_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need that clone just for the comparison?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, Thanks for catching this!

pub(crate) static IS_CHUNK_PRODUCER_FOR_SHARD: Lazy<IntGaugeVec> = Lazy::new(|| {
try_create_int_gauge_vec(
"near_is_chunk_producer_for_shard",
"Number of blocks expected to be produced by a validator",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that so? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and please clarify that it's for the current epoch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-pasta

@@ -1280,6 +1280,12 @@ impl ClientActor {
.runtime_adapter
.get_epoch_height_from_prev_block(block.hash())
.unwrap_or(0);
let epoch_start_height = self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does a tiny bit of work, and a whole lot of metrics work. Can that be moved to a separate function?
Also move the last_final_stuff into that function.


fn record_tracked_shards(head: &Tip, client: &crate::client::Client) {
if let Ok(num_shards) = client.runtime_adapter.num_shards(&head.epoch_id) {
(0..num_shards).for_each(|shard_id| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @wacban . For-each is unnecessary here.

false,
);
metrics::TRACKED_SHARDS
.with_label_values(&[shard_id.to_string().as_str()])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.with_label_values(&[shard_id.to_string().as_str()])
.with_label_values(&[&shard_id.to_string()])

fn record_block_producers(head: &Tip, client: &crate::client::Client) {
let me = client.validator_signer.as_ref().map(|x| x.validator_id().clone());
let is_bp = me
.map(|account_id| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to replace map(stuff).unwrap_or(false) with map_or(false, stuff)?

let me = client.validator_signer.as_ref().map(|x| x.validator_id().clone());
me.map(|account_id| {
let epoch_info = client.runtime_adapter.get_epoch_info(&head.epoch_id);
epoch_info.map(|epoch_info| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More natural way would be:

if let Ok(epoch_info) = get_epoch_info() {
  do stuff with epoch_info
}

Maybe do the same for let me =

let is_chunk_producer_for_shard = validators.iter().any(|&validator_id| {
epoch_info.validator_account_id(validator_id).clone() == account_id
});
metrics::IS_CHUNK_PRODUCER_FOR_SHARD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the metric unconditionally. Currently it's only updated if the node has a validator key and if there is an epoch. Don't want to accidentally miss setting the metric to 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

pub(crate) static IS_BLOCK_PRODUCER: Lazy<IntGauge> = Lazy::new(|| {
try_create_int_gauge(
"near_is_block_producer",
"Bool to denote if the node is currently a block producer",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently is ambiguous. This block? This epoch?

pub(crate) static IS_CHUNK_PRODUCER_FOR_SHARD: Lazy<IntGaugeVec> = Lazy::new(|| {
try_create_int_gauge_vec(
"near_is_chunk_producer_for_shard",
"Number of blocks expected to be produced by a validator",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and please clarify that it's for the current epoch.

Razvan Barbascu and others added 4 commits March 16, 2023 16:55
We need to export this to improve the blockchain monitoring.
Tested on a localnet:
Run: python3 nearup run localnet --binary-path ../nearcore/target/debug/ --interactive
Open http://127.0.0.1:3030/metrics

- # HELP block_height_within_epoch Height of the block within the epoch.
- # TYPE block_height_within_epoch gauge
- near_block_height_within_epoch 1
* say which shards are tracked from the current node
Testing
* run a localnet with 3 validators and 3 shards.
* go to localhost:3030/metrics

- ** skipped lines **
- # HELP near_client_tracked_shards Tracked shards
- # TYPE near_client_tracked_shards gauge
- near_client_tracked_shards{shard_id="0"} 1
- near_client_tracked_shards{shard_id="1"} 1
- near_client_tracked_shards{shard_id="2"} 1
- ** skipped lines **
* say which shards is this node producing chunks for
Testing
* run a localnet with 10 validators and 4 shards.
* go to localhost:3030/metrics

- # HELP near_is_block_producer Bool to denote if the node is currently a block producer in the current epoch
- # TYPE near_is_block_producer gauge
- near_is_block_producer 1
- # HELP near_is_chunk_producer_for_shard Bool to denote if the node is a chunk producer for a shard in the current epoch
- # TYPE near_is_chunk_producer_for_shard gauge
- near_is_chunk_producer_for_shard{shard_id="0"} 0
- near_is_chunk_producer_for_shard{shard_id="1"} 1
- near_is_chunk_producer_for_shard{shard_id="2"} 0
- near_is_chunk_producer_for_shard{shard_id="3"} 0
Copy link
Contributor Author

@VanBarbascu VanBarbascu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed comments and refactored the process_accepted_blocks function to separate the monitoring logic.

let epoch_info = client.runtime_adapter.get_epoch_info(&head.epoch_id);
epoch_info.map(|epoch_info| {
for (shard_id, validators) in
epoch_info.chunk_producers_settlement().into_iter().enumerate()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That array holds the validators for each shard_id. I think that this is the easiest way to get the shard_id.

epoch_info.chunk_producers_settlement().into_iter().enumerate()
{
let is_chunk_producer_for_shard = validators.iter().any(|&validator_id| {
epoch_info.validator_account_id(validator_id).clone() == account_id
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, Thanks for catching this!

let is_chunk_producer_for_shard = validators.iter().any(|&validator_id| {
epoch_info.validator_account_id(validator_id).clone() == account_id
});
metrics::IS_CHUNK_PRODUCER_FOR_SHARD
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

pub(crate) static IS_CHUNK_PRODUCER_FOR_SHARD: Lazy<IntGaugeVec> = Lazy::new(|| {
try_create_int_gauge_vec(
"near_is_chunk_producer_for_shard",
"Number of blocks expected to be produced by a validator",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-pasta

CHANGELOG.md Outdated
@@ -9,6 +9,8 @@
* Experimental option to dump state of every epoch to external storage. [#8661](https://github.com/near/nearcore/pull/8661)
* State-viewer tool to dump and apply state changes from/to a range of blocks [#8628](https://github.com/near/nearcore/pull/8628)
* Node can restart if State Sync gets interrupted [#8732](https://github.com/near/nearcore/pull/8732)
* Add prometheus metrics for tracked shards, block height within epoch, if is block/chunk producer
>>>>>>> f4538a8ce (feat: changelog for PR#8728)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge artifact

@VanBarbascu VanBarbascu force-pushed the master branch 2 times, most recently from 557d152 to 495a828 Compare March 16, 2023 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants