Distinguish between fatal and non-fatal retryable errors #1384
Curious about this choice - why shouldn't a fatal error count towards the number of retries? This would result in taking more retries than the policy calls for. Under pathological conditions (eg, perhaps there is a bug in how we count pixels), we could be retrying until we run out of orchestrators.
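To make the concern concrete, here is a minimal sketch (not the code in this PR) that compares the two counting rules. The names `attemptsUsed`, `errFatal`, and `errNonFatal` are hypothetical, and the retry semantics (one initial attempt plus up to `maxRetries` retries) are an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values standing in for the two classes discussed here.
var (
	errFatal    = errors.New("fatal retryable error")
	errNonFatal = errors.New("non-fatal retryable error")
)

// attemptsUsed replays a sequence of verification outcomes (nil means
// success) and returns how many attempts are made before stopping.
// Assumption: a policy allows one initial attempt plus maxRetries retries.
// When countFatal is false, fatal errors do not consume the retry budget.
func attemptsUsed(outcomes []error, maxRetries int, countFatal bool) int {
	count := 0
	for i, err := range outcomes {
		if err == nil {
			return i + 1
		}
		if countFatal || !errors.Is(err, errFatal) {
			count++
		}
		if count > maxRetries {
			return i + 1
		}
	}
	// Ran out of orchestrators before the budget was exhausted.
	return len(outcomes)
}

func main() {
	// Five orchestrators that all fail with a fatal error.
	outcomes := []error{errFatal, errFatal, errFatal, errFatal, errFatal}
	fmt.Println(attemptsUsed(outcomes, 2, true))  // budget respected
	fmt.Println(attemptsUsed(outcomes, 2, false)) // every orchestrator is tried
}
```

With a budget of 2 retries, counting fatal errors stops after 3 attempts, while skipping them walks through all 5 orchestrators in this example.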
@yondonfu made this decision, perhaps he will be able to explain this better than I would.
I originally thought it made sense to count only non-fatal errors towards the # of retries, since we were only appending results that trigger a non-fatal error to `sv.results`. But I see now that the # of retries does not need to match the length of `sv.results`, and that always incrementing the retry count, regardless of the error associated with a result, would better match the expected behavior given a policy with a max # of retries. My mistake @mk-livepeer!

On a related note, I also originally thought it made sense to append only results that trigger a non-fatal error to `sv.results`, because these results could still be valid (perhaps they were misclassified by the verifier). However, I now realize that results that trigger `ErrPixelMismatch` could also be valid: the # of reported pixels might be wrong, but the video/audio content could be fine. The only error that should make a result ineligible to be appended to `sv.results` is `ErrAudioMismatch`, because if this error is triggered then we know with certainty that the audio was tampered with, so this result should not be eligible for insertion into a playlist.

So, I think the categorization of errors is actually:

Results that trigger a fatal non-usable error [1] should be excluded from `sv.results`. Results that trigger a fatal usable error or a non-fatal error can be included in `sv.results`.

[1] Open to better naming here.

TLDR: a result should be excluded from `sv.results` only if it triggers an `ErrAudioMismatch` error.
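A rough sketch of how this rule could look, under simplifying assumptions: `segmentVerifier`, `result`, and `record` are hypothetical, trimmed-down stand-ins rather than the actual types in this PR, and the two error values are redefined locally so the example is self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for the errors named in this thread; the real definitions differ.
var (
	ErrPixelMismatch = errors.New("PixelMismatch")
	ErrAudioMismatch = errors.New("AudioMismatch")
)

// result is a hypothetical, simplified verification result.
type result struct {
	orch string
	err  error
}

// segmentVerifier is a hypothetical, trimmed-down verifier state.
type segmentVerifier struct {
	maxRetries int
	count      int
	results    []result
}

// record processes one failed result and reports whether to retry.
func (sv *segmentVerifier) record(r result) bool {
	// Always consume the retry budget, whatever the error was.
	sv.count++
	// Keep the result as a fallback unless the audio was tampered with.
	if !errors.Is(r.err, ErrAudioMismatch) {
		sv.results = append(sv.results, r)
	}
	return sv.count <= sv.maxRetries
}

func main() {
	sv := &segmentVerifier{maxRetries: 1}
	fmt.Println(sv.record(result{"O1", ErrPixelMismatch}), len(sv.results))
	fmt.Println(sv.record(result{"O2", ErrAudioMismatch}), len(sv.results))
}
```

Here the pixel-mismatch result is retained and a retry is allowed, while the audio-mismatch result is dropped and exhausts the budget of one retry.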
I restored the count field and previous handling such that fatal errors will count towards the retries count.
Your TLDR says `ErrAudioMismatch` but the same holds for `ErrPixelMismatch`, no? I believe the PR in its current state is correct; if not, please clarify.
👍
See this section of my previous comment:
I think the following update to the conditional evaluated prior to appending a result to `sv.results` should work:
If we make that change, then we'll no longer be using the `IsFatal()` function nor considering the `Fatal` error type at all. Should I go ahead and revert it all?
Done. Also had to change @j0sh's test to use `ErrAudioMismatch` instead of `ErrPixelMismatch`.
We do still ultimately want to distinguish between fatal vs. non-fatal retryable errors because we'll want to re-add Os back to the working set for non-fatal retryable errors after finishing a segment - right now we just remove Os for all retryable errors. As already mentioned, we can handle re-adding Os back to the working set for non-fatal retryable errors outside of this PR though.
I do see that, given the latest changes, the distinction between fatal vs. non-fatal retryable errors wouldn't actually be useful until we do the above. I'm ok with leaving that out of this PR then.
So we're only defining `ErrAudioMismatch` as a fatal error now, which allows results that trigger `ErrPixelMismatch` to be appended to `sv.results` - ok, that works for now.

We eventually might need an error type that describes both `ErrAudioMismatch` and `ErrPixelMismatch` - this error type would indicate that the orchestrator definitely did something wrong (the broadcaster would not re-add these orchestrators to its working set - see this comment). But, we can address that outside of this PR when we actually handle re-adding orchestrators back to the working set.