Re-enable and finish work for repairing duplicate slots confirmed by the network #11713

Open
carllin opened this issue Aug 19, 2020 · 0 comments
carllin commented Aug 19, 2020

Problem

Due to issue #10082, we disabled repairing alternative versions of duplicate slots.

Proposed Solution

We need to re-enable the code that was disabled for that issue in order to support dumping duplicate slots that were confirmed by the network.

Remaining work items:

  1. The disabled code in repair_service (https://github.com/solana-labs/solana/blob/master/core/src/repair_service.rs#L223-L246) dumps slots and tries to repair them if it sees that >1/3 of the network has gossiped them as "completed" through EpochSlots. A slot is "completed" when blockstore has received all of its shreds.

  2. We disabled the code above because "completed" in EpochSlots doesn't necessarily mean "confirmed". That is, blockstore can mark slots as "completed" that still error on replay (InvalidTickCount, for instance). As a result, InvalidTickCount errors were causing slots to be dumped and repaired continually, spamming the network. The proposed solution is Add confirmed slots to EpochSlots #10246. That PR essentially takes half of the EpochSlots and repurposes them as "confirmed" EpochSlots. Validators then gossip in these "confirmed" EpochSlots only the slots that they saw >2/3 of the network vote on (confirmed!). Then, for item 1, instead of dumping and repairing on >1/3 "completed", you would dump and repair on >1/3 "confirmed", which means at least one good validator saw >2/3 of the network confirm that slot. That would solve the issue described in item 2.

  3. Dumping a slot now also needs to purge the corresponding nodes from the HeaviestSubtreeForkChoice structures, which were introduced in repair_weight.rs and replay_stage.rs after the code for Add confirmed slots to EpochSlots #10246 was disabled.
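
To make the two thresholds in items 1 and 2 concrete, here is a rough sketch of the checks. It is not the code in repair_service.rs: `ValidatorId`, `exceeds_fraction`, `can_gossip_confirmed`, and `should_dump_and_repair` are hypothetical names, and it assumes every validator counts equally, whereas a real implementation would more likely weigh validators by stake.

```rust
use std::collections::{HashMap, HashSet};

type Slot = u64;
// Hypothetical stand-in for a validator identity (assumption, not a Solana type).
type ValidatorId = u32;

/// Returns true if `part` is strictly more than `num/den` of `total`.
/// Integer cross-multiplication avoids floating-point rounding.
fn exceeds_fraction(part: usize, total: usize, num: usize, den: usize) -> bool {
    part * den > total * num
}

/// Item 2: a validator gossips a slot as "confirmed" only after it has seen
/// more than 2/3 of the network vote on that slot.
fn can_gossip_confirmed(voters_seen: usize, total_validators: usize) -> bool {
    exceeds_fraction(voters_seen, total_validators, 2, 3)
}

/// Item 1, with the proposed change: dump and repair a slot only when more
/// than 1/3 of the network has gossiped it as "confirmed".
/// Assumption: one validator, one unit of weight.
fn should_dump_and_repair(
    slot: Slot,
    confirmed_gossip: &HashMap<ValidatorId, HashSet<Slot>>,
    total_validators: usize,
) -> bool {
    let confirmers = confirmed_gossip
        .values()
        .filter(|slots| slots.contains(&slot))
        .count();
    exceeds_fraction(confirmers, total_validators, 1, 3)
}
```

Both comparisons are strict, so exactly 1/3 of the network gossiping "confirmed" would not trigger a dump in this sketch.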

@carllin carllin added this to the The Future! milestone Aug 19, 2020
@carllin carllin self-assigned this Aug 19, 2020
@mvines mvines modified the milestones: The Future!, v1.4.0 Aug 19, 2020
@mvines mvines modified the milestones: v1.4.0, v1.5.0 Oct 8, 2020
@mvines mvines modified the milestones: v1.5.0, v1.6.0 Dec 17, 2020
@mvines mvines modified the milestones: v1.6.0, v1.7.0 Mar 11, 2021
@mvines mvines modified the milestones: v1.7.0, v1.8.0 May 10, 2021
2 participants