
fix: retrieval rewrite #3428

Merged: 9 commits into master from retrieval-rewrite on Jan 28, 2023

Conversation

istae
Member

@istae istae commented Oct 13, 2022

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

This PR simplifies the retrieval logic while preserving both preemptive and on-error retries.
Existing tests passed without modification.

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

Related Issue (Optional)

ethersphere/bee-backlog#36

Screenshots (if appropriate):



@istae istae requested review from a team, aloknerurkar and notanatol and removed request for a team October 13, 2022 15:08
@istae
Member Author

istae commented Oct 17, 2022

@metacertain writes:

So the definition of a requestRound in the current retrieval is: continue with a skiplist whose overdraft entries have been removed by a skiplist reset, where the time between the start of two requestRounds is at least 600 milliseconds (with 1024 rounds this covers at least around 10 minutes).
This currently lets an ultra-light node trying to retrieve all the chunks of a 200 MB file keep trying for up to 10 minutes to find a peer with free accounting balance for each of these chunks, so sooner or later they will all be retrieved.
The effective file-size limit for an ultra-light node comes from this 10-minute window and the per-second free bandwidth: 600 * 7 KB/s from each peer (say with 64 peers, that means 600 * 448 KB ≈ 270 MB).

without this, we would try to retrieve too many chunks at the same time from our peers, and many of the chunks would fail because all accounting balances are saturated with pending requests after a short while (especially as an ultra light node)

The current logic is: up to 1024 times, we start with a reset skiplist that contains only peers with real failures (either the request failed to send, or it errored later).
Each of these times, we select up to 32 next-closest addresses and try to get accounting to enable the request, etc.

This means the maximum number of next-closest peer selections is 1024 * 32, but we give up much sooner if we run out of peers without a failure or overdraft entry, or once we have enough real failures from successfully sent requests.

We give up (as an originator) when 32 successfully sent requests end in an actual failure, in any of the rounds.
We give up (as a forwarder) when 1 successfully sent request ends in an actual failure (and there are no rounds; just one batch of 32 next-closest addresses is selected).
We also give up if we run out of next-closest peers and nobody on the skiplist is an overdraft entry.

What I see in the rewrite attempt is that it never does a skiplist reset, so overdraft entries are never removed and never retried when we might later get free accounting balance.
Even with that fixed, you still need the request rounds so that a number of peers are tried before each skiplist reset; otherwise only a single peer is repeatedly waited on in case of an accounting block.

@istae
Copy link
Member Author

istae commented Oct 17, 2022

cont..

Actually, this 1024 limit is completely artificial and could just as well be an infinite loop of request rounds: until the user explicitly cancels a request, it keeps trying no matter how long it takes.

That would give us unbounded download size on an ultra-light node.

We would not stay in the loop forever anyway, unless there are still peers yet to be tried (because of accounting blocks).

@istae
Member Author

istae commented Oct 18, 2022

The latest commit should address @metacertain's comments

@notanatol
Contributor

should we have a short mob review session for this one?

@istae istae requested a review from mrekucci January 26, 2023 12:00

lastTime := time.Now().Unix()
resultC := make(chan retrievalResult, 1)
retryC := make(chan struct{}, 1)
Contributor
Why is this a buffered channel?

The retry function has a default case, so it is non-blocking. If this channel is buffered, might we end up starting more retrieval requests than intended? For example, if the preemptive ticker and an error from a previous request call retry at the same time, we should start only one new request, right? In which case this should not be buffered?

Contributor
@mrekucci mrekucci left a comment


Aside from the comment, LGTM.

}
}()

defer func() {
Contributor
No need for a second defer call; the error check can be put at the beginning of the defer above.

Member

@janos janos left a comment

LGTM, I am also questioning the buffered channel.

@istae istae merged commit e167428 into master Jan 28, 2023
@istae istae deleted the retrieval-rewrite branch January 28, 2023 09:50
@istae istae added this to the 1.12.0 milestone Feb 15, 2023