feat: Retrial behavior in retrieval #1780
Conversation
(force-pushed from bef39b9 to 58d2295)
👍
Reviewed 4 of 4 files at r1.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @metacertain)
pkg/retrieval/retrieval.go (outdated)
```diff
 	retrieveChunkTimeout = 10 * time.Second
 	retrieveRetryIntervalDuration = 5 * time.Second
 )

-func (s *Service) RetrieveChunk(ctx context.Context, addr swarm.Address) (swarm.Chunk, error) {
+func (s *Service) RetrieveChunk(ctx context.Context, addr swarm.Address, origin bool) (swarm.Chunk, error) {
 	s.metrics.RequestCounter.Inc()

 	v, err, _ := s.singleflight.Do(addr.String(), func() (interface{}, error) {
```
If a chunk is already being retrieved with origin false, doesn't the singleflight here imply that we would never try with maxPeers=8 if an origin true call happens at the same time?
Good point, the origin flag should be part of the singleflight key.
(force-pushed from b406e1e to ff99f01)
Let's merge this after the upcoming release (0.6.2).
(force-pushed from ae5f4c8 to e18113d)
(force-pushed from aa06a82 to 150d81f)
Reviewable status: 2 of 6 files reviewed, 2 unresolved discussions (waiting on @aloknerurkar, @metacertain, and @ralph-pichler)
pkg/retrieval/retrieval.go, line 111 at r2 (raw file):
```go
}
v, err, _ := s.singleflight.Do(flightRoute, func() (interface{}, error) {
```
The problem with this singleflight pattern is that cancellation of the parent context does not cancel the shared singleflight requests, resulting in goroutine leaks. This was inspected during #1891 and we decided to use @janos's lib, which adds context semantics correctly to singleflight. I would recommend using that here.
pkg/retrieval/retrieval.go, line 187 at r2 (raw file):
```go
requestAttempt++
timeNow := time.Now().Unix()
if timeNow > lastTime {
```
When will this not be true?
If we are still in the same second.
(force-pushed from 150d81f to 695324e)
pkg/retrieval/retrieval.go (outdated)
```go
if !peer.IsZero() {
	logger.Debugf("retrieval: failed to get chunk %s from peer %s: %v", addr, peer, err)
}
for requestAttempt < 5 {
```
Maybe we should stay consistent and have this as a variable at the top of the file.
```go
requestAttempt++
timeNow := time.Now().Unix()
if timeNow > lastTime {
```
This would always be true, no? What is this checking?
If we are still in the same second. We wait only 600 milliseconds, so we should get into the next second within 2 rounds of requestAttempts.
OK, can we add a test case that covers the area starting at `requestAttempt++` and down?
I leave the judgement to you if this is necessary
(force-pushed from bd0e3fc to aefba42)
Reviewed 1 of 3 files at r4.
Reviewable status: 3 of 6 files reviewed, 2 unresolved discussions (waiting on @aloknerurkar, @esadakar, and @ralph-pichler)
pkg/retrieval/retrieval.go, line 188 at r3 (raw file): Previously, metacertain wrote…
It would be good if you could add a comment here explaining this.
(force-pushed from 89f551e to fad6396)
(force-pushed from fad6396 to f934c85)
This PR attempts to mirror some of the push-sync behaviour changes introduced in #1662 and to enable retrying until retrieval of a chunk has been attempted from at least one peer.