B eagerly switches to another O if it has used that O before and thinks it will be faster even if the current O has a real-time latency score #2660

Closed
yondonfu opened this issue Nov 21, 2022 · 0 comments · Fixed by #2666
yondonfu commented Nov 21, 2022

Is your feature request related to a problem? Please describe.

Background

After B (the broadcaster) runs discovery to create a pool of Os (orchestrators) for selection, the following selection rules are relevant to this issue:

  • A "known session" is defined as a session that uses an O that B has sent a segment to before and that has a last known latency score
  • An "unknown session" is defined as a session that uses an O that B has not sent a segment to before and that does not have a last known latency score
  • A latency score is stored for each BroadcastSession and is calculated as the ratio of the round trip transcode time for a segment (upload + transcode + download) to the segment duration
  • If there are zero known sessions, B selects from the list of unknown sessions [1]
  • If there are existing known sessions, but the one with the best latency score is > the threshold (currently set to 1.0), B selects from the list of unknown sessions [2]
  • If there are existing known sessions and the one with the best latency score is <= the threshold, B selects the known session with the best latency score [3]
  • B keeps track of "segments in flight" for a session. If there is a segment in flight for a session, B will only re-use the session for the next segment if the criteria here are fulfilled. Otherwise, B will not re-use the session
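
The latency score definition and the threshold comparison above can be sketched in Go as follows. This is a minimal illustration, not the go-livepeer code: the function names `latencyScore` and `meetsThreshold` and the use of plain seconds as `float64` are assumptions made for the example.

```go
package main

import "fmt"

// threshold mirrors the value quoted in the issue (currently 1.0).
const threshold = 1.0

// latencyScore is a hypothetical helper: round trip transcode time
// (upload + transcode + download) divided by the segment duration,
// both given in seconds. A zero duration yields +Inf, which fails the
// threshold check below.
func latencyScore(roundTripSecs, segDurSecs float64) float64 {
	return roundTripSecs / segDurSecs
}

// meetsThreshold reports whether a score is at or below the threshold,
// i.e. the segment came back at least as fast as real-time.
func meetsThreshold(score float64) bool {
	return score <= threshold
}

func main() {
	// A 2s segment that took 1.4s round trip.
	score := latencyScore(1.4, 2.0)
	fmt.Printf("score=%.2f meetsThreshold=%t\n", score, meetsThreshold(score))
}
```

A score below 1.0 means the round trip was faster than the segment's own duration, which is why the threshold sits at 1.0.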

[1]

return s.selectUnknownSession(ctx)

[2]
return s.selectUnknownSession(ctx)

[3]
return heap.Pop(s.knownSessions).(*BroadcastSession)
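
Rules [1] to [3] can be put together in a small, self-contained sketch. The `session`, `sessHeap`, `selector`, and `newSelector` names are hypothetical stand-ins for illustration; in particular, `selectUnknown` here simply takes the first unknown session, which is a simplification and not how go-livepeer picks an unknown session.

```go
package main

import (
	"container/heap"
	"fmt"
)

const threshold = 1.0

// session is a simplified, hypothetical stand-in for BroadcastSession.
type session struct {
	orch         string
	latencyScore float64
}

// sessHeap is a min-heap ordered by latency score (lowest first).
type sessHeap []*session

func (h sessHeap) Len() int            { return len(h) }
func (h sessHeap) Less(i, j int) bool  { return h[i].latencyScore < h[j].latencyScore }
func (h sessHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *sessHeap) Push(x interface{}) { *h = append(*h, x.(*session)) }
func (h *sessHeap) Pop() interface{} {
	old := *h
	s := old[len(old)-1]
	*h = old[:len(old)-1]
	return s
}

type selector struct {
	known   *sessHeap
	unknown []*session
}

// newSelector copies the known sessions into a heap.
func newSelector(known, unknown []*session) *selector {
	h := sessHeap(append([]*session(nil), known...))
	heap.Init(&h)
	return &selector{known: &h, unknown: unknown}
}

// selectUnknown is a placeholder: it takes the first unknown session.
func (s *selector) selectUnknown() *session {
	if len(s.unknown) == 0 {
		return nil
	}
	sess := s.unknown[0]
	s.unknown = s.unknown[1:]
	return sess
}

func (s *selector) selectSession() *session {
	// [1] No known sessions: select from the unknown sessions.
	if s.known.Len() == 0 {
		return s.selectUnknown()
	}
	// [2] Best known session is above the threshold: select an unknown one.
	if best := (*s.known)[0]; best.latencyScore > threshold {
		return s.selectUnknown()
	}
	// [3] Otherwise pop the known session with the best latency score.
	return heap.Pop(s.known).(*session)
}

func main() {
	s := newSelector(
		[]*session{{orch: "O1", latencyScore: 0.7}, {orch: "O2", latencyScore: 0.8}},
		[]*session{{orch: "O3"}},
	)
	fmt.Println(s.selectSession().orch) // prints "O1"
}
```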

Problem

Suppose the known session list for B is [O1.LatencyScore = 0.7, O2.LatencyScore = 0.8] and consider the following scenario:

  1. B submits segment N to O1
  2. The known session list = [O2.LatencyScore = 0.8]
  3. B receives the results for segment N
  4. completeSession is called for O1 and the SegsInFlight field is cleared for the session since there is no longer a segment in flight.
  5. The latency score of O1 for segment N was 0.9, so the known session list is now [O2.LatencyScore = 0.8, O1.LatencyScore = 0.9]
  6. B submits segment N + 1 to O2 because it now has the best latency score, even though O1's latency score of 0.9 still met the acceptable threshold. This causes a swap from O1 to O2 for the stream
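
The scenario above can be replayed with a toy model. The `pool` type and its `selectBest` and `complete` methods are hypothetical names for this sketch; they only mimic the behaviour described in the steps (remove the best session on selection, put it back with a new score on completion) and are not the go-livepeer implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// session is a simplified, hypothetical stand-in for BroadcastSession.
type session struct {
	orch         string
	latencyScore float64
}

// pool is a hypothetical stand-in for the known session list.
type pool struct{ known []*session }

// selectBest removes and returns the known session with the lowest score.
func (p *pool) selectBest() *session {
	if len(p.known) == 0 {
		return nil
	}
	sort.SliceStable(p.known, func(i, j int) bool {
		return p.known[i].latencyScore < p.known[j].latencyScore
	})
	best := p.known[0]
	p.known = p.known[1:]
	return best
}

// complete records the latest score and returns the session to the list.
func (p *pool) complete(s *session, score float64) {
	s.latencyScore = score
	p.known = append(p.known, s)
}

// replay runs steps 1 to 6 and returns the Os used for segments N and N + 1.
func replay() (string, string) {
	p := &pool{known: []*session{
		{orch: "O1", latencyScore: 0.7},
		{orch: "O2", latencyScore: 0.8},
	}}
	first := p.selectBest()  // step 1: segment N goes to O1
	p.complete(first, 0.9)   // steps 3 to 5: O1 comes back with score 0.9
	second := p.selectBest() // step 6: segment N + 1 goes to O2
	return first.orch, second.orch
}

func main() {
	a, b := replay()
	fmt.Printf("segment N -> %s, segment N + 1 -> %s\n", a, b)
	// prints "segment N -> O1, segment N + 1 -> O2"
}
```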

This type of swapping behavior can be problematic because:

  1. If the switch is to an O that was previously used but no longer has an active transcoding session, there will be overhead for initializing a new session, which increases latency

  2. Video players may not handle data arriving at very different rates, e.g. the last segment arrived in X seconds and this segment arrived in Y seconds, where X and Y are quite different

  3. There are likely to be diminishing returns once transcoding is faster than real-time, e.g. if writing the transcoded data (so that it can be delivered) is slower than transcoding it

Describe the solution you'd like

B should stick with the last O that it used for transcoding a stream as long as that O's latest latency score meets the acceptable threshold.
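
One way to picture this "sticky" behaviour is the sketch below. It is an illustration under stated assumptions, not the change that closed this issue: `stickySelector` and its fields are hypothetical, and the fallback simply takes the best remaining known session, leaving out the unknown-session path described in the Background section.

```go
package main

import "fmt"

const threshold = 1.0

// session is a simplified, hypothetical stand-in for BroadcastSession.
type session struct {
	orch         string
	latencyScore float64
}

// stickySelector is a hypothetical selector that prefers the last used session.
type stickySelector struct {
	last  *session   // session used for the previous segment, nil if none
	known []*session // other known sessions
}

// next keeps the last session while its score meets the threshold and
// otherwise falls back to the best of the other known sessions (or nil).
func (s *stickySelector) next() *session {
	if s.last != nil && s.last.latencyScore <= threshold {
		return s.last
	}
	var best *session
	for _, c := range s.known {
		if best == nil || c.latencyScore < best.latencyScore {
			best = c
		}
	}
	return best
}

func main() {
	o1 := &session{orch: "O1", latencyScore: 0.9}
	o2 := &session{orch: "O2", latencyScore: 0.8}
	s := &stickySelector{last: o1, known: []*session{o2}}

	// O1 scored 0.9, which meets the threshold, so B stays on O1
	// even though O2 has a better last known score.
	fmt.Println(s.next().orch) // prints "O1"

	// Once O1 is slower than real-time, B moves on.
	o1.latencyScore = 1.2
	fmt.Println(s.next().orch) // prints "O2"
}
```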

Describe alternatives you've considered

Using only the last known latency score for each session also contributes to this problem. If an O is slightly slower than real-time for a single segment, a swap to another O could be triggered, which may not be ideal because:

  • The new O may not do better, especially given the overhead of session initialization for the first segment it receives
  • Being slightly slower than real-time for a single segment might not be a big deal

A better scoring mechanism for the session would incorporate latency scores over time, using something like a moving average or an EWMA (exponentially weighted moving average). This type of latency score formulation would dampen the impact of a single slightly-slower-than-real-time segment on selection, which can help stabilize selection for streams. I suspect this would be a bigger change, though, and I think it can be tackled separately from, or in parallel to, this issue in #1232.
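
To illustrate the dampening effect, here is a minimal EWMA sketch. The `ewma` type, its `update` method, and the smoothing factor of 0.2 are assumptions chosen for the example; the issue does not specify a formulation or a weight.

```go
package main

import "fmt"

// ewma tracks an exponentially weighted moving average of latency scores.
type ewma struct {
	alpha float64 // weight of the newest sample, assumed to be in (0, 1]
	value float64
	init  bool
}

// update folds a new per-segment latency score into the average and
// returns the smoothed score. The first sample seeds the average.
func (e *ewma) update(sample float64) float64 {
	if !e.init {
		e.value = sample
		e.init = true
		return e.value
	}
	e.value = e.alpha*sample + (1-e.alpha)*e.value
	return e.value
}

func main() {
	e := &ewma{alpha: 0.2} // hypothetical smoothing factor
	for _, score := range []float64{0.7, 0.7, 1.1} {
		fmt.Printf("sample=%.2f smoothed=%.2f\n", score, e.update(score))
	}
}
```

With these inputs, the single 1.1 sample lifts the smoothed score to roughly 0.78, which stays under the 1.0 threshold, so one slow segment alone would not trigger a swap.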


@github-actions github-actions bot added the status: triage this issue has not been evaluated yet label Nov 21, 2022
@yondonfu yondonfu added type: bug Something isn't working area: broadcasting status: backlog this issue has been triaged and will be worked on. refer to the issue for timeline and removed status: triage this issue has not been evaluated yet labels Nov 21, 2022
@yondonfu yondonfu self-assigned this Nov 23, 2022
@yondonfu yondonfu added status: core contributors working on it in progress and removed status: backlog this issue has been triaged and will be worked on. refer to the issue for timeline labels Nov 23, 2022