Problem
Due to issue #10082, we disabled repairing alternative versions of duplicate slots.
Proposed Solution
We need to re-enable the code disabled for this issue to support dumping duplicate slots that were confirmed by the network.
Remaining work items:
1. The disabled code in `repair_service` (https://github.com/solana-labs/solana/blob/master/core/src/repair_service.rs#L223-L246) dumps slots and tries to repair them if it sees that >1/3 of validators have gossiped them as "completed" through EpochSlots. A slot is "completed" when blockstore has received all of its shreds.
2. We disabled the code above because "completed" in EpochSlots doesn't necessarily mean "confirmed". That is, blockstore can mark slots "completed" that still error on replay (InvalidTickCount, for instance!). Thus InvalidTickCount errors were causing slots to be continually dumped and repaired, spamming the network. The proposed solution: Add confirmed slots to EpochSlots #10246. This PR essentially takes half of the EpochSlots and repurposes them to be "confirmed" EpochSlots. Validators then only gossip a slot in these "confirmed" EpochSlots if they saw >2/3 of the network vote on it (confirmed!). Then for 1), instead of dumping and repairing on >1/3 "completed", you would dump and repair on >1/3 "confirmed" (meaning at least one good validator saw `>2/3` of the network confirm that slot), which would solve the issue in 2).
3. Dumping a slot now also needs to purge those nodes from the `HeaviestSubtreeForkChoice` structures, which were introduced in `repair_weight.rs` and `replay_stage.rs` after this code for Add confirmed slots to EpochSlots #10246 was disabled.
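
To make the two thresholds concrete, here is a minimal sketch of the checks described above. It is not the code from `repair_service` or #10246: the function names, the `ValidatorId` alias, and the choice to weight validators by stake (rather than counting them) are all assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical validator identity used only in this sketch.
type ValidatorId = &'static str;

/// Returns true if `observed` is strictly more than `num/den` of `total`.
/// Cross-multiplies in u128 to avoid floating point and overflow.
fn exceeds_fraction(observed: u64, total: u64, num: u64, den: u64) -> bool {
    (observed as u128) * (den as u128) > (total as u128) * (num as u128)
}

/// A validator would advertise a slot in the "confirmed" EpochSlots only
/// after seeing more than 2/3 of the (assumed stake-weighted) network vote on it.
fn should_gossip_confirmed(voted_stake: u64, total_stake: u64) -> bool {
    exceeds_fraction(voted_stake, total_stake, 2, 3)
}

/// Dump and repair a slot only once more than 1/3 of the (assumed
/// stake-weighted) network has advertised it as "confirmed".
fn should_dump_and_repair(
    stakes: &HashMap<ValidatorId, u64>,
    advertised_confirmed: &HashSet<ValidatorId>,
) -> bool {
    let total: u64 = stakes.values().sum();
    // Validators without a known stake contribute nothing.
    let observed: u64 = advertised_confirmed
        .iter()
        .filter_map(|id| stakes.get(id))
        .sum();
    exceeds_fraction(observed, total, 1, 3)
}
```

Both comparisons are strict, so exactly 1/3 or exactly 2/3 does not trigger. With >1/3 advertising "confirmed", at least one honest validator must have seen the >2/3 vote, assuming fewer than 1/3 are faulty.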