
persist downloaded atx identifiers and download them in background #5553

Closed
dshulyak opened this issue Feb 8, 2024 · 0 comments
dshulyak commented Feb 8, 2024

in the current implementation we use a request that returns the set of activations known by a peer, for two purposes:

  • download all activations during initial synchronization
  • download the difference with the peer

this request is not cheap: its cost grows linearly with the number of activations, and it became very problematic as that number grew. in many cases the high number of such requests was caused by the other problematic sync mechanism, #5522. but even with that disabled, we run it on every node restart for the activations known in the last epoch. if it fails the node can't fully sync activations, and if the node is restarted the request has to be retried all over again.

overview

once we have successfully downloaded N sets of activation identifiers, we can persist them in the database. N can be 1, but it can also be larger so that we avoid relying on a single poorly synced node. once they are saved we don't have to keep asking peers on every restart, and can instead download activations from the set saved in the database.
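
a minimal sketch of the persistence step above, in go. everything here is an assumption made for illustration: `Store` is an in-memory stand-in for the database, and `ATXID`, `CollectAndPersist` and the parameter `n` are hypothetical names, not part of the actual codebase.

```go
package main

import (
	"fmt"
	"sort"
)

// ATXID is a hypothetical stand-in for an activation identifier.
type ATXID string

// Store is a hypothetical in-memory substitute for the node database.
type Store struct {
	byEpoch map[uint32]map[ATXID]struct{}
}

func NewStore() *Store {
	return &Store{byEpoch: map[uint32]map[ATXID]struct{}{}}
}

// Save merges ids into the set stored for the epoch.
func (s *Store) Save(epoch uint32, ids []ATXID) {
	set, ok := s.byEpoch[epoch]
	if !ok {
		set = map[ATXID]struct{}{}
		s.byEpoch[epoch] = set
	}
	for _, id := range ids {
		set[id] = struct{}{}
	}
}

// Load returns the ids stored for the epoch, sorted for stable output.
func (s *Store) Load(epoch uint32) []ATXID {
	ids := make([]ATXID, 0, len(s.byEpoch[epoch]))
	for id := range s.byEpoch[epoch] {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

// CollectAndPersist stores the union of the responses, but only if at least
// n peers answered. It reports whether anything was persisted.
func CollectAndPersist(s *Store, epoch uint32, responses [][]ATXID, n int) bool {
	if len(responses) < n {
		return false
	}
	for _, r := range responses {
		s.Save(epoch, r)
	}
	return true
}

func main() {
	s := NewStore()
	responses := [][]ATXID{{"a", "b"}, {"b", "c"}}
	if CollectAndPersist(s, 7, responses, 2) {
		// after a restart the node would read this set instead of asking peers
		fmt.Println(s.Load(7))
	}
}
```

taking the union of N responses means one poorly synced peer can't shrink the saved set; whether a union is the right merge rule is a design choice left open by the issue.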

all activations that target the current epoch or below have to be downloaded before downloading ballots. but ongoing activations, which target the next epoch, can be offloaded to a background thread. this way they will not block "syncedness" progress on restart, and the node can rejoin faster if it was offline for a short period.

in that same background thread we can also ask a random peer whether it has learned of any more atxs. this can be done rarely, e.g. every 2 hours.
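
the background polling above could be sketched as follows. this is an illustration under assumptions, not the real implementation: `PollOnce`, `RunEvery`, the `ask` callback and the string ids are all hypothetical, and the interval is shortened so the example finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// PollOnce asks one peer, chosen by pick, for the ids it knows and returns
// only the ids that were not in known yet. New ids are added to known.
func PollOnce(
	peers []string,
	pick func(n int) int,
	ask func(peer string) ([]string, error),
	known map[string]struct{},
) ([]string, error) {
	if len(peers) == 0 {
		return nil, errors.New("no peers to ask")
	}
	ids, err := ask(peers[pick(len(peers))])
	if err != nil {
		return nil, err
	}
	var fresh []string
	for _, id := range ids {
		if _, ok := known[id]; !ok {
			known[id] = struct{}{}
			fresh = append(fresh, id)
		}
	}
	return fresh, nil
}

// RunEvery calls fn once per interval until ctx is cancelled.
func RunEvery(ctx context.Context, interval time.Duration, fn func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			fn()
		}
	}
}

func main() {
	// the issue suggests polling rarely (e.g. every 2 hours);
	// 20ms is used here only to keep the example short
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	peers := []string{"p1", "p2"}
	known := map[string]struct{}{}
	// hypothetical peer response
	ask := func(peer string) ([]string, error) {
		return []string{"atx-1", "atx-from-" + peer}, nil
	}
	RunEvery(ctx, 20*time.Millisecond, func() {
		fresh, err := PollOnce(peers, rand.Intn, ask, known)
		if err != nil {
			fmt.Println("poll failed:", err)
			return
		}
		fmt.Println("newly learned:", fresh)
	})
}
```

in a node, `RunEvery` would run in its own goroutine so that it never blocks the sync path that downloads activations for the current epoch.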

storing identifiers

we want to avoid asking nodes for possibly invalid identifiers, as it creates a trivial dos opportunity. malicious nodes may create a false set of activation identifiers, send it out to the network, and make everyone ask for those identifiers repeatedly.

to prevent that we should track how many times an activation was requested and failed to download, and, for example, stop asking for it after 2 failed attempts. the counter is reset every time a peer advertises, through a get_epoch_info response, that it knows the identifier. we should also prioritize asking the peer that advertised the identifier; this is already implemented, but this information is also lost on restart.

@dshulyak dshulyak self-assigned this Feb 8, 2024
@dshulyak dshulyak changed the title persist download atx identifiers and update them in background persist downloaded atx identifiers and update them in background Feb 8, 2024
@dshulyak dshulyak changed the title persist downloaded atx identifiers and update them in background persist downloaded atx identifiers and download them in background Feb 12, 2024
@dshulyak dshulyak removed their assignment Feb 12, 2024
@dshulyak dshulyak self-assigned this Feb 23, 2024
spacemesh-bors bot pushed a commit that referenced this issue Mar 7, 2024
part of: #5553

when requested, we ask a configured number of peers for epoch info (the collection of atxs from that epoch). on a successful response we save the known ids, and will ask again only after 30 minutes (configurable). also, on restart we check persisted data and potentially avoid eager queries if the last query was made close to the epoch end.

concurrently with requesting epoch info updates, we download atxs from peers. downloads are scheduled in batches so that we can report progress. if a peer advertised an invalid atx id, we evict that id after reaching the max number of retries (20 in the pr).

to make error checking possible i extended errors emitted by p2p/server and fetcher.
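
the batched download with progress reporting mentioned in this commit message could look roughly like the sketch below. it is not taken from the pr: `Batches`, `DownloadAll`, and the `fetch` and `progress` callbacks are hypothetical, and eviction after the retry limit is left out.

```go
package main

import "fmt"

// Batches splits ids into consecutive slices of at most size elements.
func Batches(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

// DownloadAll fetches ids batch by batch, calls progress after every batch,
// and returns the ids that fetch reported as failed.
func DownloadAll(
	ids []string,
	size int,
	fetch func(batch []string) (failed []string),
	progress func(done, total int),
) []string {
	var failed []string
	done := 0
	for _, batch := range Batches(ids, size) {
		failed = append(failed, fetch(batch)...)
		done += len(batch)
		progress(done, len(ids))
	}
	return failed
}

func main() {
	ids := []string{"a", "b", "c", "d", "e"}
	// hypothetical fetcher that cannot download "c"
	fetch := func(batch []string) []string {
		var failed []string
		for _, id := range batch {
			if id == "c" {
				failed = append(failed, id)
			}
		}
		return failed
	}
	failed := DownloadAll(ids, 2, fetch, func(done, total int) {
		fmt.Printf("downloaded %d/%d\n", done, total)
	})
	fmt.Println("failed:", failed)
}
```

the failed ids returned here are what a retry counter would consume when deciding whether to evict an id.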