Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consistent peer scoring for missing non-finalized parent #3381

Merged
merged 3 commits into from
Sep 16, 2022

Conversation

etan-status
Copy link
Contributor

When the sync queue processes results for a blocks by range request,
and the requested range contained some slots that are already finalized,
BlockError.MissingParent currently leads to PeerScoreBadBlocks even
when the error occurs on a non-finalized slot in the requested range.
This patch changes the scoring in that case to PeerScoreMissingBlocks
for consistency with range requests solely covering non-finalized slots,
and, likewise, rewinds the sync queue to the next rewindSlot.

@github-actions
Copy link

github-actions bot commented Feb 12, 2022

Unit Test Results

       12 files  ±0       860 suites  ±0   1h 13m 22s ⏱️ - 5m 3s
  1 914 tests ±0    1 766 ✔️ ±0  148 💤 ±0  0 ±0 
10 373 runs  ±0  10 167 ✔️ ±0  206 💤 ±0  0 ±0 

Results for commit 84354a9. ± Comparison against base commit 615f366.

♻️ This comment has been updated with latest results.

@zah zah deleted the branch status-im:unstable February 15, 2022 20:57
@zah zah closed this Feb 15, 2022
@etan-status etan-status reopened this Feb 15, 2022
@zah
Copy link
Member

zah commented Feb 15, 2022

@cheatfate, shall we merge this?

@cheatfate
Copy link
Contributor

No i dont think so. This PR legend is about different item and this change actually breaks proper behavior.

@etan-status
Copy link
Contributor Author

My thoughts behind it:

  • req.slot is the startSlot of beaconBlocksByRange P2P request (start/count/step).
  • failSlot is the first one that reported MissingParent when applying them in-order (ascending for Forward, and descending for Backward).
  • safeSlot is the finalized head epoch start slot.

If the range request from req.slot through req.slot+count*step-1 contains safeSlot, e.g., clock advanced, safeSlot advanced because of that, and req.slot got finalized, then, in forward case if error happens after safeSlot (i.e. in non-finalized part), req.slot <= safeSlot < failSlot <= req.slot+count*step-1, this currently leads to PeerScoreBadBlocks, no?

And for backward case, the existing logic seems a bit asymmetric as well, because req.slot refers to the lowest slot, i.e. the slot farthest away from safeSlot.

But yeah, it could definitely also be that this is not a problem as is, and/or that the proposed change is incorrect.

@cheatfate
Copy link
Contributor

if req.slot == safeSlot we are in trouble. It means that we have broken logic or database in some broken state, or we have pretty bad peer which returns incorrect response. if req.slot <> safeSlot depends on type of syncing (Forward/Backward) means that we just met legal case when peer's response are with gaps.

@etan-status
Copy link
Contributor Author

etan-status commented Feb 15, 2022

Neither req.slot nor safeSlot depend on the peer though, req.slot == safeSlot just means that the request we sent out had its startSlot at our finalized head (at time when we received the response).

So, if we send such a request, and peer sends us 30 good blocks, then 1 with missing parent, because maybe it is still doing some backfill sync and does not have one of the intermediate blocks available, we treat it as PeerScoreBadBlocks instead of PeerScoreMissingBlocks.
I understand that if the peer sends us blocks < safeSlot that do not apply, that this is indeed a bad peer. But for future blocks, we don't know if their block is bad, it's just that they may lack some data locally.

If the intention is indeed that behaviour, I still think the backwards sync condition needs an adjustment, checking for safeSlot >= req.slot + req.count * req.step to have the equivalent result. safeSlot is at the highest slot of the requested range, and req.slot is at the lowest one.

EDIT: In that example, the 30 good blocks could actually be enough to advance safeSlot due to a finality change, so safeSlot could even be > req.slot regardless of time advancing.

@etan-status etan-status marked this pull request as draft March 3, 2022 14:34
When the sync queue processes results for a blocks by range request,
and the requested range contained some slots that are already finalized,
`BlockError.MissingParent` currently leads to `PeerScoreBadBlocks` even
when the error occurs on a non-finalized slot in the requested range.
This patch changes the scoring in that case to `PeerScoreMissingBlocks`
for consistency with range requests solely covering non-finalized slots,
and, likewise, rewinds the sync queue to the next `rewindSlot`.
@etan-status
Copy link
Contributor Author

  • Rebase to unstable
  • Extend tests (only passes with the change from this PR)

@etan-status etan-status marked this pull request as ready for review June 1, 2022 10:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants