Skip to content
This repository has been archived by the owner on Apr 18, 2024. It is now read-only.

Improve retry logic #151

Merged
merged 2 commits into from
Aug 24, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions caboose.go
Original file line number Diff line number Diff line change
Expand Up @@ -85,11 +85,12 @@ const DefaultLoggingInterval = 5 * time.Second
const DefaultSaturnOrchestratorRequestTimeout = 30 * time.Second

const DefaultSaturnBlockRequestTimeout = 19 * time.Second
const DefaultSaturnCarRequestTimeout = 30 * time.Minute

const DefaultSaturnCarRequestTimeout = 1 * time.Minute
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Max P95 for a successful CAR fetch over the last 24 hours in prod is 30 seconds:

https://protocollabs.grafana.net/d/bxJz6bJ4k/bifrost-gw-metrics?orgId=1&from=now-24h&to=now&editPanel=134

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean that a large movie won't work anymore?

I don't think this is an acceptable limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I see what you mean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willscott Reverted this change.

const DefaultSaturnMirrorRequestTimeout = 30 * time.Second

// default retries before failure unless overridden by MaxRetrievalAttempts
const defaultMaxRetries = 3
const defaultMaxRetries = 2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems fine


// default percentage of requests to mirror for tracking how nodes perform unless overridden by MirrorFraction
const defaultMirrorFraction = 0.01
Expand Down
4 changes: 2 additions & 2 deletions fetcher.go
Original file line number Diff line number Diff line change
Expand Up @@ -236,9 +236,9 @@ func (p *pool) fetchResource(ctx context.Context, from string, resource string,
// which is the amount of time without any NEW data from the server, but
// that can be added later. We need both because a slow trickle of data
// could take a large amount of time.
requestTimeout := DefaultSaturnCarRequestTimeout
requestTimeout := time.Duration(DefaultSaturnCarRequestTimeout.Milliseconds()/int64(attempt+1)) * time.Millisecond
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduces timeout duration with each retry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems okay

if isBlockRequest {
requestTimeout = DefaultSaturnBlockRequestTimeout
requestTimeout = time.Duration(DefaultSaturnBlockRequestTimeout.Milliseconds()/int64(attempt+1)) * time.Millisecond
}

reqCtx, cancel := context.WithTimeout(ctx, requestTimeout)
Expand Down