persist downloaded atx identifiers and download them in background #5553
This was referenced Feb 24, 2024
spacemesh-bors bot pushed a commit that referenced this issue on Mar 7, 2024:
Part of #5553. When requested, we ask a configured number of peers for epoch info (the collection of ATXs from that epoch). On a successful response we save the known IDs and ask again only after 30 minutes (configurable). On restart we also check the persisted data, which may let us avoid eager queries if the last query was made close to the epoch end. Concurrently with requesting epoch info updates, we download ATXs from peers. Downloads are scheduled in batches so that we can report progress. If a peer advertised an invalid ATX ID, we evict that ID after reaching the maximum number of retries (20 in the PR). To make error checking possible, I extended the errors emitted by p2p/server and fetcher.
In the current implementation, we use a request that returns the set of activations known by a peer for two purposes:
This request is not cheap: its cost grows linearly with the number of activations, and it became very problematic as that number grew. In many cases the volume of such requests was caused by another problematic sync mechanism, #5522. But even with that disabled, we still run it for the activations of the last epoch every time the node restarts. If it fails, the node can't fully sync activations, and if the node is restarted, the request has to be retried from scratch.
overview
Once we have successfully downloaded N sets of activation identifiers, we can persist them in the database. N can be 1, but it can also be larger so that we avoid relying on a single poorly synced node. Once the identifiers are saved, we don't have to keep asking peers on every restart and can instead download activations from the set saved in the database.
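One way to read "persist after N sets" is sketched below: collect epoch-info responses from distinct peers and persist the union of their identifiers once `required` peers have answered, so that one poorly synced peer's incomplete set is complemented by the others. This is an assumption about the intended semantics, not the project's implementation; `epochIDCollector` and its methods are hypothetical, and an in-memory map stands in for the database.

```go
package main

import "fmt"

// ATXID and PeerID are hypothetical stand-ins for the node's real types.
type ATXID [32]byte
type PeerID string

// epochIDCollector accumulates epoch-info responses from distinct peers and
// "persists" the union of identifiers once at least `required` peers answered.
// The persisted map is an in-memory stand-in for the node's database.
type epochIDCollector struct {
	required  int
	responded map[PeerID]bool
	pending   map[ATXID]bool
	persisted map[ATXID]bool
}

func newEpochIDCollector(required int) *epochIDCollector {
	return &epochIDCollector{
		required:  required,
		responded: map[PeerID]bool{},
		pending:   map[ATXID]bool{},
		persisted: map[ATXID]bool{},
	}
}

// Add records one successful response from peer. Repeated responses from the
// same peer count once. It returns true once the ids have been persisted.
func (c *epochIDCollector) Add(peer PeerID, ids []ATXID) bool {
	c.responded[peer] = true
	for _, id := range ids {
		c.pending[id] = true
	}
	if len(c.responded) < c.required {
		return false
	}
	for id := range c.pending {
		c.persisted[id] = true
	}
	c.pending = map[ATXID]bool{}
	return true
}

// Persisted returns how many identifiers are stored; after a restart a
// non-zero value would let the node skip the expensive epoch-info query.
func (c *epochIDCollector) Persisted() int { return len(c.persisted) }

func main() {
	c := newEpochIDCollector(2) // hypothetical: require answers from 2 peers
	var a, b, d ATXID
	a[0], b[0], d[0] = 1, 2, 3
	fmt.Println(c.Add("peer-1", []ATXID{a, b})) // false: only one peer so far
	fmt.Println(c.Add("peer-2", []ATXID{b, d})) // true: union persisted
	fmt.Println(c.Persisted())                  // 3
}
```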
All activations that target the current epoch or below have to be downloaded before downloading ballots. Ongoing activations that target the next epoch, however, can be offloaded to a background thread. This way they will not block "syncedness" progress on restart, and the node can rejoin faster if it was offline for a short period.
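The split between blocking and background downloads might look like the following sketch. `pendingATX`, `splitByTarget`, and the goroutine standing in for the background thread are hypothetical; the only rule taken from the text is that activations targeting the current epoch or below must be downloaded before ballots.

```go
package main

import (
	"fmt"
	"sync"
)

// ATXID is a hypothetical stand-in for an activation identifier.
type ATXID [32]byte

// pendingATX pairs an identifier with the epoch its activation targets.
type pendingATX struct {
	ID          ATXID
	TargetEpoch uint32
}

// splitByTarget separates activations that must be downloaded before ballots
// (target <= current) from those targeting a later epoch, which can be
// downloaded in the background.
func splitByTarget(all []pendingATX, current uint32) (blocking, background []pendingATX) {
	for _, a := range all {
		if a.TargetEpoch <= current {
			blocking = append(blocking, a)
		} else {
			background = append(background, a)
		}
	}
	return blocking, background
}

func main() {
	all := []pendingATX{{TargetEpoch: 4}, {TargetEpoch: 5}, {TargetEpoch: 6}}
	blocking, background := splitByTarget(all, 5)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Hypothetical: download next-epoch activations without blocking sync.
		fmt.Println("background downloads:", len(background))
	}()

	// Activations for the current epoch or below gate ballot sync.
	fmt.Println("blocking downloads:", len(blocking))
	wg.Wait()
}
```

The two print lines may appear in either order, since the background goroutine runs concurrently with the blocking path.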
In the same background thread we can also ask a random peer whether it has learned about any more ATXs. This can be done rarely, e.g. every 2 hours.
storing identifiers
We want to avoid asking nodes for possibly invalid identifiers, as that creates a trivial DoS opportunity: malicious nodes can create a false set of activation identifiers, send it out to the network, and make everyone ask for them repeatedly.
To prevent that, we should track how many times each activation was requested and failed to download, and, for example, stop asking for it after two failed attempts. The counter should be reset every time someone advertises, through a get_epoch_info response, that their node knows about that identifier. We should also prioritize asking the peer that advertised the identifier. This is already implemented, but that information is also lost on restart.
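The bookkeeping proposed above could be sketched as follows: a per-identifier failure counter, a reset on every advertisement, and a record of which peers advertised each identifier so they can be asked first. `idTracker` and its methods are hypothetical, and everything is kept in memory here; the point of the proposal is that this state would be persisted so it survives restarts.

```go
package main

import (
	"fmt"
	"sort"
)

// ATXID and PeerID are hypothetical stand-ins for the node's real types.
type ATXID [32]byte
type PeerID string

// idTracker counts failed download attempts per identifier and remembers
// which peers advertised each identifier. In a real node this state would
// live in the database so it survives restarts; here it is in memory.
type idTracker struct {
	maxFailures int
	failures    map[ATXID]int
	advertisers map[ATXID]map[PeerID]bool
}

func newIDTracker(maxFailures int) *idTracker {
	return &idTracker{
		maxFailures: maxFailures,
		failures:    map[ATXID]int{},
		advertisers: map[ATXID]map[PeerID]bool{},
	}
}

// Advertised is called for every id in a get_epoch_info response from peer.
// It resets the failure counter and records the peer as a preferred source.
func (tr *idTracker) Advertised(peer PeerID, id ATXID) {
	tr.failures[id] = 0
	if tr.advertisers[id] == nil {
		tr.advertisers[id] = map[PeerID]bool{}
	}
	tr.advertisers[id][peer] = true
}

// Failed records an unsuccessful download attempt for id.
func (tr *idTracker) Failed(id ATXID) { tr.failures[id]++ }

// ShouldDownload reports whether id is still worth requesting.
func (tr *idTracker) ShouldDownload(id ATXID) bool {
	return tr.failures[id] < tr.maxFailures
}

// Sources returns the peers that advertised id, sorted for determinism.
func (tr *idTracker) Sources(id ATXID) []PeerID {
	var peers []PeerID
	for p := range tr.advertisers[id] {
		peers = append(peers, p)
	}
	sort.Slice(peers, func(i, j int) bool { return peers[i] < peers[j] })
	return peers
}

func main() {
	tr := newIDTracker(2) // the issue suggests 2 attempts; the PR used 20 retries
	var id ATXID
	tr.Advertised("peer-a", id)
	tr.Failed(id)
	tr.Failed(id)
	fmt.Println(tr.ShouldDownload(id)) // false: limit reached
	tr.Advertised("peer-b", id)
	fmt.Println(tr.ShouldDownload(id), tr.Sources(id)) // true [peer-a peer-b]
}
```

Resetting on advertisement keeps a genuinely new ATX downloadable, while an identifier that only a malicious peer keeps advertising costs at most `maxFailures` attempts between advertisements.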