Protocol upgrade related metrics #5339

Closed
kucharskim opened this issue Nov 17, 2021 · 6 comments
Assignees
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. Groomed Node Node team T-node Team: issues relevant to the node experience team

Comments

@kucharskim

kucharskim commented Nov 17, 2021

I would like to see, via Prometheus metrics (127.0.0.1:3030/metrics), how the NEAR protocol_version upgrade is progressing. I am not sure whether this granularity can be achieved within an epoch rather than only at the epoch boundary, but if it is possible, it would be great to have something like:

$ curl -s 127.0.0.1:3030/metrics | grep -e ^near_protocol_upgrade_progress
52

The above is just an example showing that 52% of validators have upgraded to the new protocol version. If no upgrade is in progress, the metric is either missing or 0, probably missing.
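To illustrate the "missing means no upgrade" convention, here is a minimal sketch of how a monitoring script might read such a gauge from the text returned by the metrics endpoint. The metric name near_protocol_upgrade_progress is only the name proposed in this issue, and read_gauge is a hypothetical helper; the parsing is deliberately simplified and assumes label values contain no "}" character.

```python
from typing import Optional


def read_gauge(metrics_text: str, name: str) -> Optional[float]:
    """Return the value of the first sample named `name`, or None if absent."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like "<name>[{labels}] <value> [timestamp]".
        # Require a space or "{" right after the name so that a longer
        # metric name sharing the same prefix is not matched by accident.
        if line.startswith(name) and line[len(name):len(name) + 1] in (" ", "{"):
            rest = line[len(name):]
            if rest.startswith("{"):
                rest = rest[rest.index("}") + 1:]
            return float(rest.split()[0])
    return None


# Hypothetical scrape output; the metric name is the one proposed in this issue.
sample = """\
# HELP near_protocol_upgrade_progress Percent of validators upgraded (hypothetical)
# TYPE near_protocol_upgrade_progress gauge
near_protocol_upgrade_progress 52
"""

print(read_gauge(sample, "near_protocol_upgrade_progress"))    # 52.0
print(read_gauge(sample, "near_protocol_upgrade_epoch_left"))  # None
```

Returning None for an absent metric keeps "no upgrade in progress" distinct from "0% upgraded", which matters if the metric is later used for alerting.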

Another metric that would be great is how many blocks (or seconds, if the epoch is time-based) are left until the cut-off epoch. For example, after 80% of validators have upgraded and the epoch switches:

$ curl -s 127.0.0.1:3030/metrics | grep -e ^near_protocol_upgrade_epoch_left
34207

The above shows that there are 34207 blocks left before the protocol upgrade takes effect (or, counted in seconds, 9h30m7s: (9*60*60)+(30*60)+7 = 34207). If no upgrade is active, the metric is missing.
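The arithmetic in that example can be checked with a short sketch. It assumes one block per second, which is what the 9h30m7s figure implies; BLOCK_TIME_SECONDS and format_duration are hypothetical names used only for illustration, and the real block time may differ.

```python
def format_duration(seconds: int) -> str:
    """Format a number of seconds as e.g. '9h30m7s'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h{minutes}m{secs}s"


# Assumption: one block per second, as implied by the example in this issue.
BLOCK_TIME_SECONDS = 1

blocks_left = 34207
print(format_duration(blocks_left * BLOCK_TIME_SECONDS))  # 9h30m7s
```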

Based on both metrics we would like to create alarms. If the metric is present with a value above X (let's say 23 hours), a warning alarm is created; if only 6 hours are left, a critical alarm is created and the on-call is paged. If the metric is absent, no alarm is raised.
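The alarm policy described here could be sketched roughly as follows. This is one possible reading, not an agreed design: it assumes the metric has already been converted to seconds, that any present value above the critical threshold counts as a warning, and that 6 hours or less is critical. The names classify and CRITICAL_THRESHOLD_SECONDS are hypothetical.

```python
from typing import Optional

# Assumption: 6 hours or less remaining pages the on-call.
CRITICAL_THRESHOLD_SECONDS = 6 * 3600


def classify(seconds_left: Optional[float]) -> Optional[str]:
    """Map the time left before the upgrade takes effect to an alarm level."""
    # Metric absent: no upgrade is scheduled, so no alarm.
    if seconds_left is None:
        return None
    # Little time left: critical alarm, page the on-call.
    if seconds_left <= CRITICAL_THRESHOLD_SECONDS:
        return "critical"
    # Metric present but still far from the cut-off: warning only.
    return "warning"


print(classify(None))       # None
print(classify(23 * 3600))  # warning
print(classify(6 * 3600))   # critical
```

In a real deployment the same thresholds would more likely live in Prometheus alerting rules than in a script; the function above only makes the three cases explicit.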

@janewang janewang assigned janewang and unassigned janewang Nov 17, 2021
@janewang janewang added the T-node Team: issues relevant to the node experience team label Nov 17, 2021
@nikurt nikurt self-assigned this Nov 17, 2021
@janewang janewang added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Nov 17, 2021
@nikurt
Contributor

nikurt commented Nov 17, 2021

This issue is complementary to near/NEPs#205 and provides a short-term fix.

@kucharskim
Author

With a configurable number of epochs for the upgrade, I think this feature request makes even more sense, especially the second type of metric requested here: blocks (or seconds) left before the new protocol version becomes active. That gives the node runner a clear signal that time is running out for an upgrade.

@kucharskim
Author

Given the comment in #5331 (comment), with the second type of metric (blocks left until the protocol switchover) it would already be too late and NEAR would panic() anyway, even if the upgrade is done within the last 2 epochs.

@stale

stale bot commented Feb 16, 2022

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.

@stale stale bot added the S-stale label Feb 16, 2022
@exalate-issue-sync exalate-issue-sync bot added T-nodeX and removed T-node Team: issues relevant to the node experience team labels Jun 28, 2022
@matklad matklad added T-node Team: issues relevant to the node experience team and removed T-nodeX labels Aug 4, 2022
@nikurt
Contributor

nikurt commented Nov 8, 2022

Fixed by #7877

@nikurt nikurt closed this as completed Nov 8, 2022
@kucharskim
Author

Which nearcore release is this going to be included in?

@gmilescu gmilescu added the Node Node team label Oct 19, 2023

6 participants