fix: retrieval rewrite #3428
Conversation
@metacertain writes: The definition of requestRound in the current retrieval actually continues with a skiplist that has had the overdraft entries removed by a skiplist reset, and the time difference between the beginnings of two requestRounds is at least 600 milliseconds (with 1024 rounds this covers around 10 minutes at minimum). Without this, we would try to retrieve too many chunks at the same time from our peers, and many of the chunks would fail, because all accounting balances become saturated with pending requests after a short while (especially as an ultra-light node).

The current logic is: for up to 1024 rounds, we start with a reset skiplist that only contains nodes with real failures (either the request failed to send, or it errored later). This means the maximum number of next-closest-peer selections is 1024 * 32, but we give up much sooner if we run out of peers with no failure/overdraft, or if we accumulate enough real failures from successfully sent requests. As an originator, we give up once 32 successfully sent requests ended in an actual failure, across any of the rounds.

What I see in the rewrite attempt is that it never does a skiplist reset, so overdraft entries are never removed and never retried, even though we could later get free accounting balance.
cont.: Actually, this 1024 limit is completely artificial and could just as well be an infinite loop of request rounds, so that a request keeps trying until the user explicitly cancels it, no matter how much time has passed. That would give us unlimited download size on an ultra-light node. We would not stay in the loop forever anyway unless there are still peers yet to be tried (because of accounting blocks).
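For reference, a rough sketch of the round structure described above, under stated assumptions: the skiplist type and the selectPeer/request helpers below are hypothetical placeholders for illustration, not the bee codebase API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

const (
	maxRounds     = 1024 // artificial cap on request rounds
	maxSelections = 32   // next-closest-peer selections per round
	maxRealFails  = 32   // sent-but-failed requests before giving up
	minRoundGap   = 600 * time.Millisecond // minimum spacing between round starts
)

var errOverdraft = errors.New("accounting overdraft")

type peerID int

// skiplist tracks peers to skip: real failures are permanent for this
// retrieval, overdraft entries are cleared at the start of every round.
type skiplist struct{ failed, overdrafted map[peerID]bool }

func newSkiplist() *skiplist {
	return &skiplist{failed: map[peerID]bool{}, overdrafted: map[peerID]bool{}}
}

func (s *skiplist) resetOverdrafts() { s.overdrafted = map[peerID]bool{} }

// Stubs so the sketch compiles; a real implementation would consult the
// topology driver and issue a retrieval protocol request.
func selectPeer(s *skiplist) (peerID, bool)       { return 0, false }
func request(ctx context.Context, p peerID) error { return errOverdraft }

func retrieveWithRounds(ctx context.Context) error {
	skip := newSkiplist()
	realFails := 0

	for round := 0; round < maxRounds; round++ {
		roundStart := time.Now()
		skip.resetOverdrafts() // overdrafted peers become retryable again

		for sel := 0; sel < maxSelections; sel++ {
			peer, ok := selectPeer(skip)
			if !ok {
				break // no peer left without a failure or overdraft entry
			}
			switch err := request(ctx, peer); {
			case err == nil:
				return nil // chunk retrieved
			case errors.Is(err, errOverdraft):
				skip.overdrafted[peer] = true // retried after the next reset
			default:
				skip.failed[peer] = true // never retried in this retrieval
				realFails++
				if realFails >= maxRealFails {
					return fmt.Errorf("giving up after %d failed requests", realFails)
				}
			}
		}

		// keep at least 600ms between the beginnings of two rounds
		if wait := minRoundGap - time.Since(roundStart); wait > 0 {
			select {
			case <-time.After(wait):
			case <-ctx.Done():
				return ctx.Err()
			}
		}
	}
	return errors.New("exhausted all request rounds")
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	fmt.Println(retrieveWithRounds(ctx))
}
```

The key point relative to the rewrite is the resetOverdrafts call at the top of every round: overdrafted peers become eligible again once accounting balance may have freed up, while real failures stay excluded.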
The latest commit should address @metacertain's comments.
should we have a short mob review session for this one? |
force-pushed from 70e6b4b to 1e25986
force-pushed from 23fa316 to 9e8a7f4
pkg/retrieval/retrieval.go
Outdated
lastTime := time.Now().Unix()
resultC := make(chan retrievalResult, 1)
retryC := make(chan struct{}, 1)
Why is this a buffered channel?
The retry function has a default case, so it's not blocking. If this is buffered, we might end up starting more retrieval requests, no? E.g. if the preemptive ticker and an error from a previous request call retry at the same time, we should start only one new request, right? In which case this should not be buffered?
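For context, a minimal sketch of the coalescing behaviour in question (simplified; not the PR's actual code). With a capacity-1 buffer and a default case, two concurrent triggers queue at most one retry signal, because the second send finds the buffer full and falls through to default:

```go
package main

import "fmt"

func main() {
	retryC := make(chan struct{}, 1)

	retry := func() {
		select {
		case retryC <- struct{}{}: // queued a retry signal
		default: // a signal is already pending; drop this one
		}
	}

	retry() // e.g. from the preemptive ticker
	retry() // e.g. from a request error at (nearly) the same time

	fmt.Println("pending retry signals:", len(retryC)) // prints 1, not 2
}
```

An unbuffered channel with the same default case would instead drop the signal entirely whenever the receiving goroutine is not already blocked on retryC.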
force-pushed from dac1d18 to a1f538f
force-pushed from a1f538f to f6a3a8b
Aside from the comment, LGTM.
pkg/retrieval/retrieval.go
Outdated
}
}()

defer func() {
No need for a second defer call; the error check can be put at the beginning of the defer above.
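A minimal illustration of the suggested shape, with hypothetical cleanup/releaseCredit stand-ins for whatever the two deferred functions in retrieval.go actually do:

```go
package main

import (
	"errors"
	"fmt"
)

// cleanup and releaseCredit are placeholders for illustration only.
func cleanup()       { fmt.Println("cleanup") }
func releaseCredit() { fmt.Println("release credit") }

func retrieve() (err error) {
	// One deferred function instead of two: the error check simply runs
	// first, at the beginning of the existing defer.
	defer func() {
		if err != nil {
			releaseCredit()
		}
		cleanup()
	}()

	return errors.New("request failed")
}

func main() { _ = retrieve() }
```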
LGTM, I am also questioning the buffered channel.
Checklist
Description
This PR simplifies the retrieval logic while maintaining preemptive and on-error retries.
Tests passed without any edits.
Open API Spec Version Changes (if applicable)
Motivation and Context (Optional)
Related Issue (Optional)
ethersphere/bee-backlog#36
Screenshots (if appropriate):