-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tortoise: update opinion on the terminated layers within sliding window when received new data #3573
Comments
@countvonzero i think existing interface for rerun was based on misinformed design. we definitely can't have just that interface. and most likely that interface is not even needed, because we can't recover from partition that lasted longer than sliding window anyway |
i am not sure which interface you are referring to. do you mean this one?
? in my prototype, i changed this to
my plan is: for every ballot, we do in the following order
and there are two scenarios that will trigger rerun()
|
yes basically about rerun
i don't understand how this interface can be supported using current implementation. rerun can start either from genesis, or within sliding window. and this issue describes what we need to change for it to work within sliding window.
those two are not sufficient. as i mentioned opinion may change for other reasons |
how do you know the weight? it was just communicated over rpc? i don't think that we can use this, anybody can trigger rerun by communicating high weight and a different hash |
if rerun can start from genesis (which i assume is outside the sliding window), why can't it start from a random layer in this range
in the scenarios you listed
i think this is consistent with
i don't understand how these could happen if known ATX weight remain constant. i thought the security assumption is, once a block cross the global threshold, it's final until more ATX weight are discovered.
so i'm adding fetch API (direct P2P request) to support hash resolution. when syncer finds that it has a different accumulated hash from a peer X, it requests the list of ATXs from X. the node needs to determine how much new ATXs weight X introduced and whether the weight is enough for it to change opinions. whatever weight X claims it has, it needs to back it up by the actual ATXs. note that syncer currently download ATXs in two ways
|
tortoise votes encoding is recursive. there might be a way to bootstrap tortoise from arbitrary layer, but there are many assumptions that needs to be handled. currently such interface is not supported
there is no such assumption. there is an assumption that if global threshold is crossed and adversarial weight is cancelled - node still crosses local threshold. but otherwise crossing global threshold after counting votes from one layer doesn't guarantee any finality every ballot votes on all previous layers, therefore opinion may change because of voting even if crossed global threshold. moreover global threshold total value depends on the number of layers that can vote on a layer that supposed to be verified |
thanks for the explanation. do you think it can be simplified and made possible after mutable ballot is made possible?
right. my mental model keeps forgetting that global threshold is a moving target as layers grow. |
closes: #3569 main changes: - removed auxiliary maps to track various data about blocks and ballots. all data is stored directly in layerInfo, ballotInfo or blockInfo objects. - votes are represented as a reverse linked list. child ballot copies votes list from the base ballot and extends it with new votes. if it overwrites some of the opinions below base ballot - it creates a copy of the list starting at a layer where opinion was changed. in future there is a room for optimization by reusing list for ballots that vote consistently with local opinion. - votes outside of the window are evicted by iterating reverse linked list and dropping pointers for votes that are outside sliding window. i didn't notice any overhead with window size of 10 000 - removed 'abstained' condition that was used for verifying tortoise, as new data structure allows to inherit abstained votes without additional work other info: - this change doesn't count votes for blocks that are received after layer was terminated. change is already big and that bug fits in the scope of #3573 - representing votes this way allows to simplify the process of encoding votes, i will address it in future prs - in profile it is obvious that doing weight computation with big.Rat with large sliding window is inefficient. using https://github.com/spacemeshos/fixed allows to reduce BenchmarkRerun by almost two times and lowers memory requirement. most likely will switch to it in the future pr
extracted from #3573 this refactoring makes it unnecessary to modify layer votes for each ballot after late block is received. for example late block can be received after recovering from partition.
depends on: #3569 closes: #3575 closes: #2919 1. onBlock and onBallot refactored in a way to avoid failures if required state wasn't loaded and to support counting ballots separately onBlock will not fail if reference height wasn't computed yet. reference height is always computed together with the first layer in the epoch. this change makes it safe to call onBlock before such first layer was processed. onBallot will not fail if beacon is not yet available. beacon needs to be known only before counting ballot (in both modes). this change (and changes to onBlock) allows to decode votes before calling a method to verify a layer, in practice decoding votes will be the most expensive operation. with an exception of counting votes in full mode. this refactoring also allows to seamlessly count votes from late ballot without any additional logic in place, this will be required for #3573. 2. new TallyVotes method this method is safe to call whenever node needs to know the latest local opinion, in current codebase it is called: - to check that local opinion is consistent with previously applied blocks - before casting votes. this ensures that the node always votes according to the latest changes in the opinion
depends on #3569
once layer is verified we do not revisit opinion on that layer unless rerun is triggered. it was assumed that syncer will trigger a rerun when it notices a good reason for it. opinion may change because of the following cases:
from that list above only 2nd case can be tracked successfully by syncer, in other cases it is either impossible (4th) or inefficient (1st and 3rd) to trigger full rerun.
implementation plan
revisit opinions on terminated layer if changes are within sliding window
epoch weight
it is already computed in current implementation, but there are some changes:
OnAtx handler
add OnAtx handler to the tortoise. every time when tortoise receives new atx update expected weight by adding full atx weight or a fraction, using same computation as above. OnAtx is expected to be called before OnBallot that references such atx.
related tortoise state
changes in the counting weight
count weight for layers past verified layer within whole sliding window. notify mesh to perform a revert if opinion changed.
full tortoise doesn't need any changes. but remember to profile implementation how long it takes to count votes from a single ballot for 2 000 - 10 000 previous layers.
verifying tortoise needs to be refactored avoid changes to the total good weight when layer is verified.
The text was updated successfully, but these errors were encountered: