Add Metric to Track Greatest Epoch Final View #698
One note to use the existing metrics field, otherwise looks great 👍
Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>
Hi Danu. Would we not want to have unit tests for these changes?
Hey Danu, you can include tests for opening the state here and for bumping the view upon committing an epoch in the epoch flow test by using the mock metrics object rather than the noop.
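To illustrate the suggestion above, here is a minimal sketch of why a mock metrics object makes the behaviour testable where a noop does not. The interface `EpochMetrics`, the method `CommittedEpochFinalView`, the types `NoopMetrics` and `RecordingMetrics`, and the function `CommitEpoch` are all hypothetical stand-ins for illustration; the actual interface and mock in flow-go may be named and shaped differently.

```go
package main

import "fmt"

// EpochMetrics is a hypothetical stand-in for the metrics interface touched
// by this PR. The real interface and method name in flow-go may differ.
type EpochMetrics interface {
	CommittedEpochFinalView(view uint64)
}

// NoopMetrics discards every observation, so a test has nothing to assert on.
type NoopMetrics struct{}

func (NoopMetrics) CommittedEpochFinalView(uint64) {}

// RecordingMetrics is a hand-written mock that remembers every reported view.
type RecordingMetrics struct {
	Views []uint64
}

func (r *RecordingMetrics) CommittedEpochFinalView(view uint64) {
	r.Views = append(r.Views, view)
}

// CommitEpoch is a hypothetical placeholder for the code path that handles an
// EpochCommit service event and reports the final view of the committed epoch.
func CommitEpoch(m EpochMetrics, finalView uint64) {
	m.CommittedEpochFinalView(finalView)
}

func main() {
	mock := &RecordingMetrics{}
	CommitEpoch(mock, 2000)

	// With the mock, the test can check exactly what was reported.
	fmt.Println("views reported to the mock:", mock.Views)

	// With the noop, the same call succeeds but leaves nothing to inspect.
	CommitEpoch(NoopMetrics{}, 2000)
}
```

In a real test the hand-written recorder would likely be replaced by whatever generated mock the repository already provides, with an expectation set on the metric call.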
lgtm - just a reminder, you will have to Greenlist the metric as well by following this: https://www.notion.so/dapperlabs/Prometheus-metric-greenlisting-da74d2352b8f440185b1113270ba8bc5
I am also doing it for a metric that I have added - https://github.com/dapperlabs/dapper-flow-hosting/pull/174
Added some comments. I would like to flag that I think the metric will not have a value after we restart the consensus nodes (we sometimes do this during mainnet fires):
- The metric is only set during bootstrapping or when an EpochCommit service event is encountered. The latter can happen only once during an epoch.
- So if we restart the consensus node, the metric might not be set again for the remainder of the epoch, which defeats its purpose.
The suggestion I made will partially mitigate this issue, as the metric is then updated on every finalized block. However, we might also consider setting the metric value when recovering from a node crash, i.e. in this method:
flow-go/state/protocol/badger/state.go, line 398 in 861c833: func OpenState(
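The restart concern above can be sketched with a simplified model. Everything here is an assumption made for illustration: `Gauge`, `Store`, `State`, `reportCommittedEpochFinalView`, the `refreshOnOpen` flag, and this toy `OpenState` only mimic the shape of the discussion and are not the flow-go implementation. The point is that an in-memory gauge starts empty after a restart, so it stays unset until something re-reads the persisted value and reports it.

```go
package main

import "fmt"

// Gauge is a hypothetical in-memory metric; its value does not survive a restart.
type Gauge struct {
	value uint64
	set   bool
}

func (g *Gauge) Set(v uint64) { g.value, g.set = v, true }

// Store models persisted protocol state, which does survive a restart.
type Store struct {
	CommittedEpochFinalView uint64
}

// State pairs the persisted store with a fresh, process-local gauge.
type State struct {
	store *Store
	gauge *Gauge
}

// reportCommittedEpochFinalView re-reads the persisted value and sets the gauge.
func reportCommittedEpochFinalView(s *State) {
	s.gauge.Set(s.store.CommittedEpochFinalView)
}

// OpenState models reopening the state after a restart. The refreshOnOpen
// flag is hypothetical and exists only to contrast the two behaviours.
func OpenState(store *Store, refreshOnOpen bool) *State {
	s := &State{store: store, gauge: &Gauge{}}
	if refreshOnOpen {
		reportCommittedEpochFinalView(s)
	}
	return s
}

// OnEpochCommit models handling an EpochCommit service event.
func (s *State) OnEpochCommit(finalView uint64) {
	s.store.CommittedEpochFinalView = finalView
	reportCommittedEpochFinalView(s)
}

// OnBlockFinalized models the partial mitigation: refresh on every finalized block.
func (s *State) OnBlockFinalized() { reportCommittedEpochFinalView(s) }

func main() {
	store := &Store{}
	OpenState(store, false).OnEpochCommit(5000)

	// Restart without refreshing: the gauge has no value yet.
	stale := OpenState(store, false)
	fmt.Println("set after plain restart:", stale.gauge.set)

	// Restart with a refresh when the state is opened: the value is back immediately.
	fresh := OpenState(store, true)
	fmt.Println("set after refreshing restart:", fresh.gauge.set, fresh.gauge.value)
}
```

Updating on every finalized block closes the gap as soon as the next block is finalized, while refreshing when the state is opened closes it immediately, even if finalization has stalled.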
thanks for the revisions. Looks good. Just a couple minor comments.
Another suggestion I wanted to make is:
- Shall we also call updateCommittedEpochFinalView (flow-go/state/protocol/badger/state.go, line 490 in 9e2a2b6: func updateCommittedEpochFinalView(state *State) error {) in Bootstrap, e.g. here: flow-go/state/protocol/badger/state.go, line 118 in 9e2a2b6 (return state, nil), before returning the State instance?
Codecov Report
@@ Coverage Diff @@
## master #698 +/- ##
=========================================
Coverage ? 56.43%
=========================================
Files ? 423
Lines ? 24816
Branches ? 0
=========================================
Hits ? 14006
Misses ? 8915
Partials ? 1895
Greenlist metric in https://github.com/dapperlabs/dapper-flow-hosting/pull/198
Resolves https://github.com/dapperlabs/flow-go/issues/5373.