Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashgraph Branch Detector #13284

Closed
cody-littley opened this issue May 14, 2024 · 13 comments · Fixed by #13518
Closed

Hashgraph Branch Detector #13284

cody-littley opened this issue May 14, 2024 · 13 comments · Fixed by #13518
Assignees
Milestone

Comments

@cody-littley
Copy link
Contributor

cody-littley commented May 14, 2024

Create a simple component that detects hashgraph branching.

Output

For the time being, the output of the branch detector will be metrics and rate-limited logging. Eventually, the node may decide to take additional actions when it observes branching.

The following metrics will be produced by the node:

  • the number of branching events per second
  • the number of unique nodes that have a non-ancient branching event
  • the fraction (out of 1.0) of nodes by weight that have a non-ancient branching event

Algorithm

  • Create a map from NodeId to the event descriptor of the most recent event for each node.
    • the map contains the latest known non-ancient event descriptor from each node in the network
    • if there are no known non-ancient events from a creator, the map has a null entry
  • for each event in topological order
    • if the event's self parent is not the in the map, flag it as a branching event
    • make an exception if the self parent is ancient and the node has a null entry in the map

Architecture

Note: the event window is wired as input to the stale event detector. Our standard for the wiring diagram is to treat any text under the component name as a "hidden input", i.e. an input where we don't draw the arrow (to reduce clutter).

branchDetector

@cody-littley cody-littley added this to the v0.51 milestone May 14, 2024
@cody-littley cody-littley self-assigned this May 14, 2024
@poulok
Copy link
Member

poulok commented May 14, 2024

Would the metric just be a single counter for each branched event detected or a counter per node? In general I think we need to move away from per-node metrics. Perhaps a single counter is sufficient, paired with a log message that says which creator branched.

@cody-littley
Copy link
Contributor Author

Would the metric just be a single counter for each branched event detected or a counter per node? In general I think we need to move away from per-node metrics. Perhaps a single counter is sufficient, paired with a log message that says which creator branched.

We could do either. If a per-node metric is not desirable, we could just go with a single metric for all nodes plus a rate limited log.

@lpetrovic05
Copy link
Member

What would the metric look like? number of nodes branching? if so, when would this value be reset?

@cody-littley
Copy link
Contributor Author

@lpetrovic05

What would the metric look like? number of nodes branching? if so, when would this value be reset?

If it were just a single metric, then I'd just count the number of branches per second. For figuring out which nodes are branching I'd use logging. One rate limited logger per node configured to log at most once per 10 minutes. The metric would alert us to the branching, and we could use the logs to figure out how severe the branching was.

@lpetrovic05
Copy link
Member

the rate limited logging might not be great. we would detect the first branch, log it, then silence for 10 min. we would not know how many nodes are branching

@poulok
Copy link
Member

poulok commented May 15, 2024

the rate limited logging might not be great. we would detect the first branch, log it, then silence for 10 min. we would not know how many nodes are branching

Maybe we need to get more details about how we count a branch. Should we would increment a counter every time a new branch is created, and decrement it every time the tip of a branch goes ancient? This would tell us how many times a node created a branch new branch instead of continuing a branched line and would be a single metric.

Would it be worth tracking the number of events in a branch in addition to this?

@cody-littley
Copy link
Contributor Author

the rate limited logging might not be great. we would detect the first branch, log it, then silence for 10 min. we would not know how many nodes are branching

I agree, could hide data. But it's probably the best we can do if we don't want to have one metric per node.

@cody-littley
Copy link
Contributor Author

cody-littley commented May 16, 2024

Maybe we need to get more details about how we count a branch. Should we would increment a counter every time a new branch is created, and decrement it every time the tip of a branch goes ancient?

This significantly complicates the implementation. My suggestion is to do something simple so that we catch branching in the short term, and consider saving deeper anaysis for a future effort.

@rbair23
Copy link
Member

rbair23 commented May 17, 2024

Does this box need to get information from hashgraph as to which round is ancient?

@rbair23
Copy link
Member

rbair23 commented May 17, 2024

Modulo any details around how metrics/logs are counted and reported, the idea is a +1 from me. Will be handy now and the basis of what we need to do in the future.

@lpetrovic05
Copy link
Member

I suggest a metric like maxNonAncientWeight. Keeps track of the maximum number of weight (as a fraction of total weight) at any given time, this can immediately tell us if an ISS was caused by branching

@cody-littley
Copy link
Contributor Author

Does this box need to get information from hashgraph as to which round is ancient?

Yes, we will wire it up to the event window wire. There is a bunch of stuff in the intake pipeline which currently subscribes to that data flow.

@cody-littley
Copy link
Contributor Author

I suggest a metric like maxNonAncientWeight. Keeps track of the maximum number of weight (as a fraction of total weight) at any given time, this can immediately tell us if an ISS was caused by branching

I will add this metric.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants