Use an exponential backoff on SU failed data item uploads #890

Closed
arielmelendez opened this issue Jul 17, 2024 · 3 comments
Labels
enhancement New feature or request su ao Scheduler Unit

Comments


arielmelendez commented Jul 17, 2024

The code in question:
https://github.com/permaweb/ao/blob/main/servers/su/src/domain/clients/uploader.rs#L75-L90

Retrying a failing request 100 times at 1-second intervals can be a harmful pattern when connecting to external services. At the scale at which AO SUs operate, this behavior can cause substantial heartburn for downstream services. It would be great if an exponential backoff pattern could be used here. Thanks!
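
For illustration, here is a minimal sketch of how a capped exponential delay could be computed. The function name `backoff_delay` and its parameters are hypothetical and not taken from uploader.rs; the base and cap values are whatever the caller chooses.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before retry number `attempt` (0-based).
/// The delay is `base * 2^attempt`, never exceeding `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // 2^attempt, saturating instead of overflowing for large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    // If the multiplication itself overflows, fall back to the cap.
    base.checked_mul(factor).unwrap_or(max).min(max)
}
```

With a 1-second base and a 60-second cap, the first retries would wait 1, 2, 4, 8, ... seconds and then level off at 60 seconds, rather than hitting the downstream service once per second.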

@TillaTheHun0 TillaTheHun0 added enhancement New feature or request su ao Scheduler Unit labels Jul 18, 2024
@ppedziwiatr
Collaborator

I would also consider adding some jitter here :) Also, I believe the ultimate solution is to move the dataItems delivery to a separate/background job (#877).
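
One common way to add jitter is "full jitter": instead of sleeping for the whole backoff delay, sleep for a random duration up to that delay, so that many SUs failing at the same moment do not retry in lockstep. The sketch below is only an assumption about how that might look. `XorShift64` and `full_jitter` are hypothetical names, and the tiny generator exists only so the example needs no external crate; real code would more likely use an established random number crate.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Tiny xorshift64 generator, used only to keep this sketch dependency-free.
struct XorShift64(u64);

impl XorShift64 {
    /// Seeds the generator from the wall clock.
    fn from_clock() -> Self {
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos() as u64)
            .unwrap_or(0x9E37_79B9_7F4A_7C15);
        // The xorshift state must never be zero.
        XorShift64(nanos | 1)
    }

    /// Returns a value in [0, 1).
    fn next_unit(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // Keep the top 53 bits so the value is exactly representable as f64.
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// "Full jitter": scale the backoff delay by a random factor in [0, 1),
/// so the result is never longer than `capped_delay`.
fn full_jitter(capped_delay: Duration, rng: &mut XorShift64) -> Duration {
    capped_delay.mul_f64(rng.next_unit())
}
```

A caller would create one generator with `XorShift64::from_clock()` and pass the already capped backoff delay through `full_jitter` before sleeping.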

@VinceJuliano
Collaborator

Yes, we should have exponential backoff, but then also a robust way to ensure the item makes it through the upload eventually if it fails.

@VinceJuliano
Collaborator

VinceJuliano commented Jul 22, 2024

In uploader.rs, add exponential backoff to the loop that retries the upload. If it reaches some specified number of retries, say 10, add the item to a persistent queue of items that need to be tried again later, and have a background process pull from this queue and upload them.
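
The flow described above could be sketched roughly as follows. This is not the actual uploader.rs code: `upload_with_backoff`, `UploadOutcome`, and the three closures (`try_upload`, `sleep`, `enqueue`) are hypothetical stand-ins for the real upload call, the async sleep, and the persistent queue, passed in as parameters so the control flow can be shown on its own.

```rust
use std::time::Duration;

/// Hypothetical result type: either the upload succeeded, or the item
/// was handed to the retry queue for the background process.
#[derive(Debug, PartialEq)]
enum UploadOutcome {
    Uploaded,
    Deferred,
}

/// Tries the upload up to `max_attempts` times with capped exponential
/// backoff between attempts, then defers the item to the queue.
fn upload_with_backoff<U, S, Q>(
    item: &[u8],
    max_attempts: u32,
    base: Duration,
    max_delay: Duration,
    mut try_upload: U,
    mut sleep: S,
    mut enqueue: Q,
) -> UploadOutcome
where
    U: FnMut(&[u8]) -> Result<(), String>,
    S: FnMut(Duration),
    Q: FnMut(&[u8]),
{
    for attempt in 0..max_attempts {
        if try_upload(item).is_ok() {
            return UploadOutcome::Uploaded;
        }
        // Wait before the next attempt; no wait is needed after the last one.
        if attempt + 1 < max_attempts {
            let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
            let delay = base.checked_mul(factor).unwrap_or(max_delay).min(max_delay);
            sleep(delay);
        }
    }
    // Retries exhausted: persist the item so it can be retried later.
    enqueue(item);
    UploadOutcome::Deferred
}
```

Injecting the sleep and enqueue steps is a choice made for this sketch; it keeps the retry policy independent of the runtime and of the storage layer, at the cost of a longer signature.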

uploader.rs will need access to store.rs, so store.rs will become a dependency of uploader.rs. New methods will need to be added to store.rs to persist the list of items that need to be retried. Perhaps they can be stored in RocksDB under a different key.
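
As a rough idea of what the new store methods might look like, the sketch below keeps pending items under a dedicated key prefix. Everything here is an assumption: `RetryStore`, the method names, and the `upload_retry:` prefix are invented for illustration, and a sorted in-memory `BTreeMap` stands in for RocksDB so the example is self-contained. It does not reflect the existing store.rs API.

```rust
use std::collections::BTreeMap;

/// Hypothetical key prefix that separates retry entries from other keys.
const RETRY_PREFIX: &str = "upload_retry:";

/// In-memory stand-in for the key-value store. A BTreeMap keeps keys
/// sorted, which is what makes the prefix scan below possible.
#[derive(Default)]
struct RetryStore {
    kv: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl RetryStore {
    fn retry_key(item_id: &str) -> Vec<u8> {
        format!("{}{}", RETRY_PREFIX, item_id).into_bytes()
    }

    /// Records a data item whose upload should be retried later.
    fn save_pending_upload(&mut self, item_id: &str, data_item: &[u8]) {
        self.kv.insert(Self::retry_key(item_id), data_item.to_vec());
    }

    /// Lists all pending items as (item id, bytes), for the background process.
    fn pending_uploads(&self) -> Vec<(String, Vec<u8>)> {
        let prefix = RETRY_PREFIX.as_bytes();
        self.kv
            .range(prefix.to_vec()..)
            // Stop at the first key that no longer carries the prefix.
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| {
                let id = String::from_utf8_lossy(&k[prefix.len()..]).into_owned();
                (id, v.clone())
            })
            .collect()
    }

    /// Removes an item once the background process has uploaded it.
    fn remove_pending_upload(&mut self, item_id: &str) {
        self.kv.remove(&Self::retry_key(item_id));
    }
}
```

The background process would then loop over `pending_uploads()`, attempt each upload, and call `remove_pending_upload` only after a success, so an item that keeps failing stays in the queue across restarts.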
