Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinguish between fatal and non-fatal retryable errors #1384

Merged
merged 1 commit into from
Mar 2, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion verification/epic.go
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ var VerifierPath string
var ErrMissingSource = errors.New("MissingSource")
var ErrVerifierStatus = errors.New("VerifierStatus")
var ErrVideoUnavailable = errors.New("VideoUnavailable")
var ErrAudioMismatch = Retryable{errors.New("AudioMismatch")}
var ErrAudioMismatch = Fatal{Retryable{errors.New("AudioMismatch")}}
yondonfu marked this conversation as resolved.
Show resolved Hide resolved
var ErrTampered = Retryable{errors.New("Tampered")}

type epicResolution struct {
Expand Down
17 changes: 13 additions & 4 deletions verification/verify.go
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,10 @@ type Retryable struct {
error
}

type Fatal struct {
kyriediculous marked this conversation as resolved.
Show resolved Hide resolved
Retryable
}

var ErrPixelMismatch = Retryable{errors.New("PixelMismatch")}
var ErrPixelsAbsent = errors.New("PixelsAbsent")
var errPMCheckFailed = errors.New("PM Check Failed")
Expand Down Expand Up @@ -126,7 +130,7 @@ func (sv *SegmentVerifier) Verify(params *Params) (*Params, error) {
}

// Check pixel counts
if (err == nil || IsRetryable(err)) && res != nil && params.Results != nil {
if (err == nil || (err != ErrAudioMismatch && IsRetryable(err))) && res != nil && params.Results != nil {
pxls := res.Pixels
if len(pxls) != len(params.Results.Segments) {
pxls, err = countPixelParams(params)
Expand All @@ -146,9 +150,9 @@ func (sv *SegmentVerifier) Verify(params *Params) (*Params, error) {
}
sv.count++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about this choice - why shouldn't a fatal error count towards the number of retries? This would result in taking more retries than the policy calls for. Under pathological conditions (eg, perhaps there is a bug in how we count pixels), we could be retrying until we run out of orchestrators.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yondonfu made this decision, perhaps he will be able to explain this better than I would.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why shouldn't a fatal error count towards the number of retries?

I originally thought it made sense to only count non-fatal errors towards the # of retries since we were only appending results that trigger a non-fatal error to sv.results. But, I see now that the # of retries do not need to match up with the length of sv.results and that always incrementing the retry count regardless of the error associated with a result would better match the expected behavior given a policy with a max # of retries. My mistake @mk-livepeer !

On a related note, I also originally thought it made sense to only append results that trigger a non-fatal error to sv.results because these results could still be valid (perhaps they were misclassified by the verifier). However, I now realize that results that trigger ErrPixelMismatch could also be valid - the # of reported pixels might be wrong, but the video/audio content could be fine. The only error that should make a result ineligible to be appended to sv.results is ErrAudioMismatch because if this error is triggered then we know with certainty that the audio was tampered so this result should not be eligible for insertion into a playlist.

So, I think the categorization of errors is actually:

  • Fatal: The orchestrator definitely did something wrong
    • Usable: The result is still usable
    • Non-usable: The result is not usable
  • Non-fatal: Unclear if the orchestrator did something wrong

Results that trigger a fatal non-usable error [1] should be excluded from sv.results. Results that trigger a fatal usable error or a non-fatal error can be included in sv.reuslts.

[1] Open to better naming here.

TLDR:

  • We should keep a running count of the # of retries that is incremented regardless of the error triggered by a result
  • We should exclude a result from sv.results if it triggers an ErrAudioMismatch error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I restored the count field and previous handling such that fatal errors will count towards the retries count.

Your TLDR says ErrAudioMismatch but the same holds for ErrPixelMismatch, no? I believe the PR in it's current state is correct, if not please clarify.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I restored the count field and previous handling such that fatal errors will count towards the retries count.

👍

Your TLDR says ErrAudioMismatch but the same holds for ErrPixelMismatch, no?

See this section of my previous comment:

However, I now realize that results that trigger ErrPixelMismatch could also be valid - the # of reported pixels might be wrong, but the video/audio content could be fine. The only error that should make a result ineligible to be appended to sv.results is ErrAudioMismatch because if this error is triggered then we know with certainty that the audio was tampered so this result should not be eligible for insertion into a playlist.

I think the following update to the conditional evaluated prior to appending a result to sv.results should work:

if err != ErrAudioMismatch && IsRetryable(err) {
    ...
    sv.results = append(sv.results, r)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make that change, then we'll no longer be using the IsFatal() function nor considering the Fatal error type at all. Should I go ahead and revert it all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Also had to change @j0sh 's test to use ErrAudioMismatch instead of ErrPixelMismatch

Copy link
Member

@yondonfu yondonfu Feb 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do still ultimately want to distinguish between fatal vs. non-fatal retryable errors because we'll want to re-add Os back to the working set for non-fatal retryable errors after finishing a segment - right now we just remove Os for all retryable errors. As already mentioned, we can handle re-adding Os back to the working set for non-fatal retryable errors outside of this PR though.

I do see that given the latest changes that the distinction between fatal vs. non-fatal retryable errors wouldn't actually be useful until we do the above. I'm ok with leaving that out of this PR then.


// Append retryable errors to results
// Append non-fatal retryable errors to results
// The caller should terminate processing for non-retryable errors
if IsRetryable(err) {
if !IsFatal(err) && IsRetryable(err) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we're only defining ErrAudioMismatch as a fatal error now which allows results that trigger ErrPixelMismatch to be appended to sv.results - ok, that works for now.

We eventually might need an error type that describes both ErrAudioMismatch and ErrPixelMismatch - this error type would indicate that the orchestrator definitely did something wrong (the broadcaster would not re-add these orchestrators to its working set - see this comment). But, we can address that outside of this PR when we actually handle re-adding orchestrators back to the working set.

r := SegmentVerifierResults{params: params, res: res}
sv.results = append(sv.results, r)
}
Expand All @@ -166,9 +170,14 @@ func (sv *SegmentVerifier) Verify(params *Params) (*Params, error) {
return nil, err
}

func IsFatal(err error) bool {
_, fatal := err.(Fatal)
return fatal
}

func IsRetryable(err error) bool {
_, retryable := err.(Retryable)
return retryable
return retryable || IsFatal(err)
}

func (sv *SegmentVerifier) sigVerification(params *Params) error {
Expand Down
42 changes: 42 additions & 0 deletions verification/verify_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,39 @@ import (
"github.com/livepeer/lpms/ffmpeg"
)

func TestFatalRetryable(t *testing.T) {
errNonRetryable := errors.New("NonRetryable")
assert.False(t, IsRetryable(errNonRetryable))
assert.False(t, IsFatal(errNonRetryable))

errRetryable := Retryable{errors.New("Retryable")}
assert.True(t, IsRetryable(errRetryable))
assert.False(t, IsFatal(errRetryable))

errFatalRetryable := Fatal{Retryable{errors.New("FatalRetryable")}}
assert.True(t, IsRetryable(errFatalRetryable))
assert.True(t, IsFatal(errFatalRetryable))

errNonFatalRetryable := Retryable{errors.New("NonFatalRetryable")}
assert.True(t, IsRetryable(errNonFatalRetryable))
assert.False(t, IsFatal(errNonFatalRetryable))

// check that these Retryable errors are classified
// correctly as Fatal / Non-fatal:

// check ErrAudioMismatch
assert.True(t, IsRetryable(ErrAudioMismatch))
assert.True(t, IsFatal(ErrAudioMismatch))

// check ErrPixelMismatch
assert.True(t, IsRetryable(ErrPixelMismatch))
assert.False(t, IsFatal(ErrPixelMismatch))

// check ErrTampered
assert.True(t, IsRetryable(ErrTampered))
assert.False(t, IsFatal(ErrTampered))
}

type stubVerifier struct {
results *Results
err error
Expand Down Expand Up @@ -217,6 +250,15 @@ func TestVerify(t *testing.T) {
assert.Equal(ErrPixelsAbsent, err)
assert.Equal("mno", string(res.ManifestID)) // Still return best result

// A high score should still fail if audio checking fails
sv.policy = &Policy{Verifier: &stubVerifier{
results: &Results{Score: 21.0, Pixels: []int64{-1, -2}},
err: ErrAudioMismatch,
}, Retries: 2}
res, err = sv.Verify(&Params{ManifestID: "vws", Results: data})
assert.Equal(ErrAudioMismatch, err)
yondonfu marked this conversation as resolved.
Show resolved Hide resolved
assert.Equal("mno", string(res.ManifestID))

// Check *not* retryable; should never get a result
sv = NewSegmentVerifier(&Policy{Verifier: verifier, Retries: 1}) // reset
verifier.err = errors.New("Stub Verifier Non-Retryable Error")
Expand Down