Add LongSuccessResetBackoff path in backoff function#179
Add LongSuccessResetBackoff path in backoff function#179evanj80-illumio merged 9 commits intomainfrom
LongSuccessResetBackoff path in backoff function#179Conversation
Let's say we fail 7 times, then succeed for 45 minutes minutes before failing again. Sometimes this 45 minute period of good behavior should be considered a success, even though the function didn't return "no error". ActionTimeToConsiderSuccess is used in the case that the happy-path of our function is blocking forever. A function like this may have issues that require exponentialBackoff but should be considered recovered after the action runs for ActionTimeToConsiderSuccess. If you don't configure an ActionTimeToConsiderSuccess then this feature is ignored
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
This make sense to me! What do you think we should make this value to start. Also I thought this backoff logic already handled this case? |
I think 15 minutes and 30 minutes are fine. I will make them configurable with these default values, though. Realistically the important thing to hit is: can it connect and send a few resources. We basically know that after a few seconds. But I'd rather fail open then fail closed. If we restart streams every 5 minutes it could be normal it could be bad. If we restart streams every 24 hours its definitely fine. Where do u draw the line? I took my best guess, that's how I came up with 15 minutes. As for the 30 minute delay - well if it's a real issue that requires re-auth then it should be less than 30 minutes for the 10 failures. (we could spend 10*1min in backoff) |
There was a problem hiding this comment.
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
internal/controller/streams_retry.go:132
- [nitpick] The commit title refers to a 'LongSuccessBackoffReset' path, while the function is named 'LongSuccessResetBackoff'. Consider renaming the function to 'LongSuccessBackoffReset' for naming consistency.
func (s *state) LongSuccessResetBackoff() {
That was not my observation. We could be 7 Actions are |
LongSuccessBackoffReset path in backoff functionLongSuccessResetBackoff path in backoff function
|
As of #180 (review) listing is expected to take longer. 15 minutes as a default may be too small. I will increase it to 1 hour. And then I will increase the severe error time to 2 hours. cc @ganesh-talla thanks for these comments! |
…k a success If it takes 20 minutes to ingest resources but then we fail to watch, then 15 mintues is too short. Bumping this up seemed like a good idea. And then instead of hardcoding it, I made it configurable
LongSuccessResetBackoff path in backoff functionLongSuccessResetBackoff path in backoff function
Co-authored-by: Romain Lenglet <romain.lenglet@berabera.info>
evanj80-illumio
left a comment
There was a problem hiding this comment.
LGTM 🥳 Thanks for the general spacing clean up as well!
f9ba7f1
Let's say we fail 7 times, then succeed for 45 minutes minutes before failing again. Sometimes this 45 minute period of good behavior should be considered a success, even though the function didn't return "no error". This is the `LongSuccessBackoffReset` path To use this feature, configure the `exponentialBackoff` with the opts field `ActionTimeToConsiderSuccess`. A function that runs forever in the happy path should be considered recovered after the action runs for `ActionTimeToConsiderSuccess`. If you don't configure an ActionTimeToConsiderSuccess then this feature is ignored. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Romain Lenglet <romain.lenglet@berabera.info>
Let's say we fail 7 times, then succeed for 45 minutes minutes before failing again. Sometimes this 45 minute period of good behavior should be considered a success, even though the function didn't return "no error". This is the
LongSuccessBackoffResetpathTo use this feature, configure the
exponentialBackoffwith the opts fieldActionTimeToConsiderSuccess. A function that runs forever in the happy path should be considered recovered after the action runs forActionTimeToConsiderSuccess.If you don't configure an ActionTimeToConsiderSuccess then this feature is ignored.