B eagerly switches to another O if it has used that O before and thinks it will be faster even if the current O has a real-time latency score #2660

yondonfu · 2022-11-21T15:14:05Z

Is your feature request related to a problem? Please describe.

Background

After B runs discovery to create a pool of Os for selection, the following selection rules are relevant to this issue:

A "known session" is defined as a session that uses an O that B has sent a segment to before and that has a last known latency score
An "unknown session" is defined as a session that uses an O that B has not sent a segment to before and that does not have a last known latency score
A latency score is stored for each BroadcastSession and is calculated as the ratio of the round trip transcode time for a segment (upload + transcode + download) to the segment duration
If there are zero known sessions, B selects from the list of unknown sessions [1]
If there are existing known sessions, but the one with the best latency score is > the threshold (currently set to 1.0), B selects from the list of unknown sessions [2]
If there are existing known sessions and the one with the best latency score is <= the threshold, B selects the known session with the best latency score [3]
B keeps track of "segments in flight" for a session. If there is a segment in flight for a session, B will only re-use the session for the next segment if the criteria here are fulfilled; otherwise, B will not re-use the session

[1] go-livepeer/server/selection.go, line 137 in a5045a2: return s.selectUnknownSession(ctx)

[2] go-livepeer/server/selection.go, line 142 in a5045a2: return s.selectUnknownSession(ctx)

[3] go-livepeer/server/selection.go, line 145 in a5045a2: return heap.Pop(s.knownSessions).(*BroadcastSession)
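
To make rules [1] to [3] concrete, here is a minimal sketch in Go. It is not the go-livepeer implementation: the `session` type, `latencyScore`, and `selectSession` are hypothetical names introduced for illustration, and picking the first unknown session is an assumption, since the rules above do not say how an unknown session is chosen.

```go
package main

import "fmt"

// session is a simplified, hypothetical stand-in for BroadcastSession.
type session struct {
	Orch         string
	LatencyScore float64
}

// threshold mirrors the value of 1.0 quoted in the rules above.
const threshold = 1.0

// latencyScore is the round trip time (upload + transcode + download)
// divided by the segment duration, both in seconds.
func latencyScore(roundTripSecs, segDurSecs float64) float64 {
	return roundTripSecs / segDurSecs
}

// selectSession applies rules [1]-[3]. fromKnown reports whether the chosen
// session came from the known list; ok is false if nothing could be selected.
func selectSession(known, unknown []session) (chosen session, fromKnown, ok bool) {
	if len(known) > 0 {
		best := known[0]
		for _, s := range known[1:] {
			if s.LatencyScore < best.LatencyScore {
				best = s
			}
		}
		if best.LatencyScore <= threshold {
			return best, true, true // rule [3]
		}
	}
	// Rules [1] and [2]: fall back to the unknown sessions.
	// Assumption: take the first one; the real selection may differ.
	if len(unknown) == 0 {
		return session{}, false, false
	}
	return unknown[0], false, true
}

func main() {
	known := []session{
		{Orch: "O1", LatencyScore: 0.7},
		{Orch: "O2", LatencyScore: 0.8},
	}
	unknown := []session{{Orch: "O3"}}
	s, fromKnown, _ := selectSession(known, unknown)
	fmt.Println(s.Orch, fromKnown)
}
```

Note that a lower score is better: a score at or below 1.0 means the round trip finished within the segment duration.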

Problem

Suppose the known session list for B is [O1.LatencyScore = 0.7, O2.LatencyScore = 0.8] and consider the following scenario:

B submits segment N to O1
The known session list = [O2.LatencyScore = 0.8]
B receives the results for segment N
completeSession is called for O1 and the SegsInFlight field is cleared for the session since there is no longer a segment in flight.
The latency score of O1 for segment N was 0.9 so now the known session list = [O2.LatencyScore = 0.8, O1.LatencyScore = 0.9]
B submits segment N + 1 to O2 because it has the best latency score, even though O1's latency score still met the acceptable threshold. This causes a swap from O1 to O2 for the stream
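
The scenario above can be replayed with a small min-heap keyed on latency score. This is a sketch under stated assumptions, not the go-livepeer code: `session`, `sessHeap`, and `replay` are hypothetical names, and only the use of `heap.Pop` on the known sessions is taken from reference [3].

```go
package main

import (
	"container/heap"
	"fmt"
)

// session is a simplified, hypothetical stand-in for BroadcastSession.
type session struct {
	Orch         string
	LatencyScore float64
}

// sessHeap is a min-heap of sessions ordered by LatencyScore (lower is better).
type sessHeap []*session

func (h sessHeap) Len() int           { return len(h) }
func (h sessHeap) Less(i, j int) bool { return h[i].LatencyScore < h[j].LatencyScore }
func (h sessHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *sessHeap) Push(x any)        { *h = append(*h, x.(*session)) }
func (h *sessHeap) Pop() any {
	old := *h
	n := len(old)
	s := old[n-1]
	*h = old[:n-1]
	return s
}

// replay walks through the scenario and returns the Os used for
// segment N and segment N + 1.
func replay() (segN, segN1 string) {
	known := &sessHeap{
		&session{Orch: "O1", LatencyScore: 0.7},
		&session{Orch: "O2", LatencyScore: 0.8},
	}
	heap.Init(known)

	// Segment N goes to the session with the best score (O1).
	first := heap.Pop(known).(*session)

	// Results arrive: O1 scored 0.9 for segment N and is put back in the list.
	first.LatencyScore = 0.9
	heap.Push(known, first)

	// Segment N + 1 again goes to the best score, which is now O2.
	second := heap.Pop(known).(*session)
	return first.Orch, second.Orch
}

func main() {
	a, b := replay()
	fmt.Printf("segment N -> %s, segment N + 1 -> %s\n", a, b)
}
```

Both scores are under the 1.0 threshold throughout, yet the stream still moves from O1 to O2.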

This type of swapping behavior can be problematic because:

If the switch is to an O that was previously used but no longer has an active transcoding session, there will be overhead for initializing a new session, which increases latency
Video players may not handle data arriving at very different rates, i.e. the last segment arrived in X s and this segment arrived in Y s, where X and Y are quite different
There are likely to be diminishing returns once transcoding is faster than real-time, for example if writing transcoded data (so that it can be delivered) is slower than transcoding it

Describe the solution you'd like

B should stick with the last O that it used for transcoding a stream as long as it met the acceptable threshold.
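
One way to picture the requested behavior is the sketch below. It is an illustration only, not the change merged for this issue: `session`, `selectSticky`, and the `threshold` constant are hypothetical names, and returning nil stands in for falling back to the unknown sessions.

```go
package main

import "fmt"

// session is a simplified, hypothetical stand-in for BroadcastSession.
type session struct {
	Orch         string
	LatencyScore float64
}

// threshold mirrors the 1.0 value mentioned in the Background section.
const threshold = 1.0

// selectSticky prefers the session used for the previous segment as long as
// its latency score met the threshold. Only if it did not does it fall back
// to the best known session. A nil result means the caller would need to
// pick from the unknown sessions instead.
func selectSticky(last *session, known []*session) *session {
	if last != nil && last.LatencyScore <= threshold {
		return last
	}
	var best *session
	for _, s := range known {
		if best == nil || s.LatencyScore < best.LatencyScore {
			best = s
		}
	}
	if best != nil && best.LatencyScore <= threshold {
		return best
	}
	return nil
}

func main() {
	o1 := &session{Orch: "O1", LatencyScore: 0.9} // score after segment N
	o2 := &session{Orch: "O2", LatencyScore: 0.8}
	chosen := selectSticky(o1, []*session{o2, o1})
	fmt.Println("segment N + 1 ->", chosen.Orch)
}
```

With the scores from the Problem section, this keeps the stream on O1 even though O2 has a slightly better last known score.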

Describe alternatives you've considered

Using only the last known latency score for each session also contributes to this problem: if an O is slightly slower than real-time for a single segment, a swap to another O could be triggered, which may not be ideal because:

The new O may not do better, especially given the overhead of session initialization for the first segment it receives
Being slightly slower than real-time might not be that big of a deal for a single segment

A better scoring mechanism for the session would incorporate latency scores over time, something like a moving average or an EWMA (exponentially weighted moving average). This type of latency score formulation would dampen the impact of a single slightly slower than real-time segment on selection, which can help stabilize selection for streams. I suspect this would be a bigger change though, and I think it can be tackled separately/in parallel to this issue in #1232.
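
As a rough illustration of the dampening effect, here is a minimal EWMA sketch. The `ewma` type, its `update` method, and the smoothing factor of 0.2 are assumptions chosen for the example, not values or names from go-livepeer.

```go
package main

import "fmt"

// ewma is a hypothetical smoothed latency score for one session.
// alpha in (0, 1] controls how much weight the newest sample gets.
type ewma struct {
	alpha  float64
	value  float64
	seeded bool
}

// update folds a new per-segment latency score into the average
// and returns the smoothed value.
func (e *ewma) update(sample float64) float64 {
	if !e.seeded {
		// The first sample seeds the average.
		e.value = sample
		e.seeded = true
		return e.value
	}
	e.value = e.alpha*sample + (1-e.alpha)*e.value
	return e.value
}

func main() {
	score := &ewma{alpha: 0.2} // assumed smoothing factor
	score.update(0.8)          // steady history around 0.8
	// One segment slightly slower than real-time (1.1).
	smoothed := score.update(1.1)
	fmt.Printf("smoothed score: %.2f\n", smoothed)
}
```

Here a single 1.1 sample moves the smoothed score from 0.8 to about 0.86 (0.2 × 1.1 + 0.8 × 0.8), which stays under the 1.0 threshold, whereas the last known score alone would have been 1.1.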


github-actions bot added the status: triage this issue has not been evaluated yet label Nov 21, 2022

yondonfu added type: bug Something isn't working area: broadcasting status: backlog this issue has been triaged and will be worked on. refer to the issue for timeline and removed status: triage this issue has not been evaluated yet labels Nov 21, 2022

yondonfu self-assigned this Nov 23, 2022

yondonfu added status: core contributors working on it in progress and removed status: backlog this issue has been triaged and will be worked on. refer to the issue for timeline labels Nov 23, 2022

yondonfu mentioned this issue Nov 24, 2022

B re-use last session if it passes latency score threshold check #2666

Merged

5 tasks

yondonfu closed this as completed in #2666 Nov 28, 2022
