op-node: Add a histogram to report current peer scores #5870

Merged: 4 commits into develop from aj/score-histogram, Jun 6, 2023

Conversation

ajsutton
Contributor

@ajsutton ajsutton commented Jun 2, 2023

Description

Reports the current peer scores in a histogram that is replaced on each update cycle, so that it shows only the current score distribution rather than every observation of each score as it changes. Preserving the history of observations in the histogram would effectively hide banned peers' scores: they are disconnected so quickly that they spend almost no time at such a low score, while the rest of the time they were connected contributes better score observations. Similarly, "good" peers would have their observations smeared from 0 up to their current score, making it impossible to see the current score.
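The smearing effect described above can be illustrated with a small standard-library sketch. It is not the op-node implementation: the bucket bounds, the `snapshot` and `accumulate` helpers, and the peer names are all hypothetical, chosen only to contrast a histogram that keeps every observation with one that is rebuilt on each update cycle.

```go
package main

import "fmt"

// bucketBounds are hypothetical upper bounds for score buckets; the real
// metric may use different boundaries.
var bucketBounds = []float64{-100, -40, -20, 0, 5, 10, 20, 100}

// bucketIndex returns the index of the first bucket whose upper bound is
// >= score. Scores above the last bound land in a final overflow bucket.
func bucketIndex(score float64) int {
	for i, b := range bucketBounds {
		if score <= b {
			return i
		}
	}
	return len(bucketBounds)
}

// snapshot builds a fresh distribution from the current scores only,
// mimicking a histogram that is replaced on every update cycle.
func snapshot(current map[string]float64) []int {
	counts := make([]int, len(bucketBounds)+1)
	for _, s := range current {
		counts[bucketIndex(s)]++
	}
	return counts
}

// accumulate adds the current scores to a running histogram, so every
// update cycle leaves a permanent observation behind.
func accumulate(running []int, current map[string]float64) {
	for _, s := range current {
		running[bucketIndex(s)]++
	}
}

func main() {
	// Three update cycles: "good" climbs from 0 to 12, while "bad" is
	// banned in cycle 2 and already disconnected by cycle 3.
	cycles := []map[string]float64{
		{"good": 0, "bad": 0},
		{"good": 6, "bad": -100},
		{"good": 12},
	}
	running := make([]int, len(bucketBounds)+1)
	var latest []int
	for _, c := range cycles {
		accumulate(running, c)
		latest = snapshot(c)
	}
	fmt.Println("accumulated:", running) // "good" appears in three buckets
	fmt.Println("latest only:", latest)  // "good" appears once, at its current score
}
```

In the accumulated counts the single "good" peer shows up in three different buckets, whereas the latest snapshot places it only in the bucket for its current score.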

The component scores that make up the total are reported under separate labels. Getting these values requires returning the updated values from SetScores so that SnapshotHook has the full set of scores instead of just the gossip scores. The full set of scores is already loaded as part of applying the diff, so it can be returned efficiently.
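A minimal sketch of that idea, assuming a much simplified store: the `PeerScores` fields, the `ScoreDiff` interface, `GossipDiff`, and the `scoreStore` type below are illustrative stand-ins, not the actual op-node types or signatures. The point is only that the record is already in hand after the diff is applied, so returning it costs nothing extra.

```go
package main

import (
	"fmt"
	"sync"
)

// PeerScores is a hypothetical, simplified set of component scores.
type PeerScores struct {
	Gossip  float64
	ReqResp float64
}

// ScoreDiff is a hypothetical partial update that touches only some components.
type ScoreDiff interface {
	Apply(scores *PeerScores)
}

// GossipDiff replaces only the gossip component.
type GossipDiff float64

func (d GossipDiff) Apply(s *PeerScores) { s.Gossip = float64(d) }

// scoreStore is an in-memory stand-in for the peer score store.
type scoreStore struct {
	mu     sync.Mutex
	scores map[string]PeerScores
}

// SetScore applies the diff and returns the full, updated set of scores.
// The record has to be loaded to apply the diff anyway, so the caller gets
// every component without a second lookup.
func (s *scoreStore) SetScore(peerID string, diff ScoreDiff) PeerScores {
	s.mu.Lock()
	defer s.mu.Unlock()
	current := s.scores[peerID] // zero value for unknown peers
	diff.Apply(&current)
	s.scores[peerID] = current
	return current
}

func main() {
	store := &scoreStore{scores: map[string]PeerScores{
		"peerA": {ReqResp: 3},
	}}
	// Only the gossip score changes, but the caller sees all components.
	updated := store.SetScore("peerA", GossipDiff(7.5))
	fmt.Printf("gossip=%v reqresp=%v\n", updated.Gossip, updated.ReqResp)
}
```

A caller such as a snapshot hook could then report each component under its own label from the returned value.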

Additional context

Need to do some more serious testing of how this works with Prometheus histogram functions, given that the resetting nature of the histogram is a bit unusual. If it works well, we can then remove the older score-band-based metrics, since that data can be derived from the histogram.

Metadata

  • Fixes #[Link to Issue]

TODOs

@changeset-bot

changeset-bot bot commented Jun 2, 2023

⚠️ No Changeset found

Latest commit: 74419cc

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@netlify

netlify bot commented Jun 2, 2023

Deploy Preview for opstack-docs canceled.

Name Link
🔨 Latest commit 74419cc
🔍 Latest deploy log https://app.netlify.com/sites/opstack-docs/deploys/647f9812d1ab210008e3e2f2

@mergify
Contributor

mergify bot commented Jun 2, 2023

Hey @ajsutton! This PR has merge conflicts. Please fix them before continuing review.

@codecov

codecov bot commented Jun 4, 2023

Codecov Report

Merging #5870 (74419cc) into develop (0638daf) will decrease coverage by 2.54%.
The diff coverage is 46.15%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5870      +/-   ##
===========================================
- Coverage    43.00%   40.46%   -2.54%     
===========================================
  Files          477      320     -157     
  Lines        30816    26110    -4706     
  Branches       877        0     -877     
===========================================
- Hits         13252    10566    -2686     
+ Misses       16543    14560    -1983     
+ Partials      1021      984      -37     
Flag Coverage Δ
bedrock-go-tests 40.46% <46.15%> (-0.15%) ⬇️
common-ts-tests ?
contracts-bedrock-tests ?
contracts-tests ?
core-utils-tests ?
dtl-tests ?
fault-detector-tests ?
sdk-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
op-node/flags/p2p_flags.go 100.00% <ø> (ø)
op-node/metrics/metrics.go 3.01% <0.00%> (ø)
op-node/p2p/cli/load_config.go 0.00% <ø> (ø)
op-node/p2p/config.go 30.00% <ø> (-2.70%) ⬇️
op-node/p2p/mocks/GossipMetricer.go 0.00% <ø> (-20.00%) ⬇️
op-node/p2p/prepared.go 0.00% <ø> (ø)
op-node/p2p/store/iface.go 100.00% <ø> (ø)
op-node/p2p/mocks/ScoreMetrics.go 25.00% <25.00%> (ø)
op-node/p2p/store/records_book.go 57.60% <40.00%> (ø)
op-node/p2p/mocks/Peerstore.go 21.73% <50.00%> (+2.82%) ⬆️
... and 6 more

... and 161 files with indirect coverage changes

@ajsutton
Contributor Author

ajsutton commented Jun 4, 2023

This does work. The heat map takes a bit of fiddling with options because most peers have a 0 score. Over time the scores spread out a bit and become more visible, but 0 scores still dominate the data heavily.

The catch is that when working with the data you need to avoid using rate even though you would typically use that with histograms (and Grafana's suggestions will insert it).

image

Will try just a plain histogram and see if rate can avoid the smear effect.

@ajsutton
Contributor Author

ajsutton commented Jun 5, 2023

It does work when using rate and a normal histogram so that's certainly simpler.

image

But you can get the same heat map from the existing metric and the values displayed in the tooltips are then the actual number of peers in that bucket:
image

@ajsutton
Contributor Author

ajsutton commented Jun 5, 2023

I think the two options we should consider are:

  1. Continue with the band based scoring we have now, just update it to report the component values as well as the total
  2. Use a standard histogram rather than resetting repeatedly. This will mean the current number of peers in each bucket isn't available but using rate you can still get a heat map that gives a good overview of where peer scores sit relative to each other. More specific details can be retrieved via RPC requests. Remove the existing band based scoring. This is then the simplest metric code.

I'm leaning towards option 2 just for the simplicity of the code. I've updated this PR to reflect that, but can easily revert the last couple of commits if we want to keep the band-based scores or go with the resetting histogram to only show current peer scores.
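To make option 2 concrete, here is a standard-library sketch of a cumulative (never reset) histogram and the per-bucket growth between two scrapes, which is roughly the quantity that rate-style queries work from. The bucket bounds and the `histogram`, `scrape`, and `increase` names are hypothetical, and counter resets are deliberately not handled.

```go
package main

import "fmt"

// bounds are hypothetical "less than or equal" bucket boundaries.
var bounds = []float64{-40, -20, 0, 5, 10, 20}

// histogram is cumulative in both senses: it is never reset, and counts[i]
// holds the number of observations <= bounds[i]. The last slot is +Inf.
type histogram struct {
	counts []uint64
}

func newHistogram() *histogram {
	return &histogram{counts: make([]uint64, len(bounds)+1)}
}

// Observe records one score in every bucket whose bound it does not exceed.
func (h *histogram) Observe(v float64) {
	for i, b := range bounds {
		if v <= b {
			h.counts[i]++
		}
	}
	h.counts[len(bounds)]++ // +Inf bucket counts every observation
}

// scrape returns a copy of the bucket counts, as a metrics scrape would see them.
func (h *histogram) scrape() []uint64 {
	return append([]uint64(nil), h.counts...)
}

// increase returns per-bucket growth between two scrapes of equal length.
// It assumes the counters never reset between prev and curr.
func increase(prev, curr []uint64) []uint64 {
	delta := make([]uint64, len(curr))
	for i := range curr {
		delta[i] = curr[i] - prev[i]
	}
	return delta
}

func main() {
	h := newHistogram()
	h.Observe(0) // earlier update cycle
	before := h.scrape()

	// Later update cycle: one well-scored peer and one poorly scored peer.
	h.Observe(7)
	h.Observe(-30)
	after := h.scrape()

	fmt.Println("raw counts:   ", after)
	fmt.Println("recent growth:", increase(before, after))
}
```

The raw counts keep growing forever, so on their own they say little about the present; the growth over a recent window is what gives the relative picture of where scores currently sit, at the cost of no longer reporting an exact peer count per bucket.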

@ajsutton ajsutton marked this pull request as ready for review June 5, 2023 04:54
@ajsutton ajsutton requested a review from a team as a code owner June 5, 2023 04:54
Contributor

@protolambda protolambda left a comment

Changes LGTM. Nice to have more detailed score metrics without the metric-label churn of peer IDs.
One nit on the doc comment, and I think we need to be more careful with the flag removal.

op-node/metrics/metrics.go Outdated Show resolved Hide resolved
op-node/flags/p2p_flags.go Show resolved Hide resolved
@mergify
Contributor

mergify bot commented Jun 6, 2023

This PR has been added to the merge queue, and will be merged soon.

@mergify
Contributor

mergify bot commented Jun 6, 2023

This PR is next in line to be merged, and will be merged as soon as checks pass.

@OptimismBot OptimismBot merged commit c79a1de into develop Jun 6, 2023
79 of 80 checks passed
@OptimismBot OptimismBot deleted the aj/score-histogram branch June 6, 2023 20:48
@mergify mergify bot removed the on-merge-train label Jun 6, 2023