-
Notifications
You must be signed in to change notification settings - Fork 657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement task retry with delay #5263
base: master
Are you sure you want to change the base?
Implement task retry with delay #5263
Conversation
ffb10d0
to
3cecbf0
Compare
Signed-off-by: mucahitkantepe <mucahitkantepe@gmail.com>
Signed-off-by: mucahitkantepe <mucahitkantepe@gmail.com>
4602d27
to
1fe65cc
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #5263 +/- ##
==========================================
+ Coverage 58.60% 59.44% +0.84%
==========================================
Files 568 336 -232
Lines 51121 25289 -25832
==========================================
- Hits 29958 15033 -14925
+ Misses 18748 8741 -10007
+ Partials 2415 1515 -900
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@mucahitkantepe , can we help you move this PR out of draft so that we can get it properly reviewed? |
@eapolinario sure, this is not the complete implementation as it's lacking the tests/docs etc but as the codebase is new for me, I wanted to make sure the logic I added is at the right place |
sleepDuration := time.Until(nodeStatus.GetTaskNodeStatus().GetLastPhaseUpdatedAt().Add(currentNode.GetRetryStrategy().RetryDelay.Duration)) | ||
if sleepDuration > 0 { | ||
logger.Infof(currentNodeCtx, "Sleeping for [%v] before retrying", sleepDuration) | ||
time.Sleep(sleepDuration) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oooh we do not want to add sleep to the event loop. We just need to return and not actually do a retry till the time has elapsed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok, I thought this was part of a green thread.
If we just return and the delay is configured to be let's say 1 hour. Will propeller receive the same event in a loop for an hour that will overload it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it wont overload, it wont check it for next refresh interval. Another way to do this is to implement a timer.wheel that can check for active timers and fire the necessary workflow event. We can do that on a separate goroutine
@@ -180,4 +180,7 @@ message RetryStrategy { | |||
// Number of retries. Retries will be consumed when the job fails with a recoverable error. | |||
// The number of retries must be less than or equals to 10. | |||
uint32 retries = 5; | |||
|
|||
// Delay between retries. | |||
google.protobuf.Duration retry_delay = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we make this more elaborate -
min_delay
max_delay
exponent
we can simply implement exponent of 1 for now and we can add other exponents later? we can help
Tracking issue
Closes #2333
Related to flyteorg/flytekit#2368