
Connection quality #1490

Merged
merged 23 commits into master from raja_cq on Mar 5, 2023

Conversation

@boks1971 (Contributor) commented Mar 4, 2023

An attempt to redo connection quality so that it is not too optimistic.

@davidzhao @cnderrauber @paulwe There is still one bit missing, but wanted to get this out to get 👀 and feedback. I will look at how to wire up the missing bit. Will also leave inline notes.

Also, this requires a bunch more testing. And I am guessing there will be some iterations needed 😄 .

Overview

  • Still look at all tracks (both upstream and downstream) to determine quality for a participant.
  • Change: use the minimum of all track qualities (instead of the average) as the representative quality.
  • New qualityScorer module.
    • This module can operate in two modes:
      • Packet loss, RTT, and jitter based quality evaluation.
      • Bit rate based quality evaluation. This is used only for camera tracks. As publishers do not vary audio bit rate based on bandwidth or other constraints, audio is not included in bit rate based evaluation. Screen share tracks are excluded due to the very high variance in bit rate based on content. There is probably some algorithm that can handle high-variance bit rates and still figure out whether a bit rate is much lower than expected at evaluation time, but this version does not attempt that.
    • The lower of the two scores is used as the representative score in the evaluation window.
    • When quality drops, there is some hysteresis before quality climbs back up, but a drop in quality is applied immediately.
  • Removed: rtcscore-go dependency and the old mos.go module.
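
A rough sketch of the min-based aggregation (hypothetical names, assuming a MOS-like 1..5 scale; the actual logic lives in the new qualityScorer module):

// representativeQuality picks the weakest track score instead of averaging,
// so one bad track cannot be masked by several good ones.
func representativeQuality(trackScores []float64) float64 {
    if len(trackScores) == 0 {
        return 5.0 // nothing to measure; assume excellent
    }
    minScore := trackScores[0]
    for _, s := range trackScores[1:] {
        if s < minScore {
            minScore = s
        }
    }
    return minScore
}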

Commentary on bit rate based quality evaluation:
The idea behind bit rate based measurement is that the sender is expected to stream one or more layers at some expected bit rate. For example, for an up track, the expected bit rate is the sum of all layer bit rates up to the max expected layer. The actual bit rate (across all layers up to the max expected layer) is then an indicator of the publisher's ability to satisfy that expectation. For example, if a layer cannot be published, the actual bit rate will be lower than the expected bit rate, and that will drop the quality. So, the aggregate bit rate is used as an indicator of the publisher's ability to publish at the expected/advertised bit rate.

The idea behind this is to make it work well for simulcast/SVC tracks with multiple layers, but also for down tracks, which forward only certain layers. Basically, by recording the expected bit rate, a common method can be applied irrespective of track type. A minimal sketch of that method follows.
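
Illustrative only, again assuming a MOS-like 1..5 scale (the PR's actual layer accounting differs):

// bitrateQuality compares the actual aggregate bit rate against the expected
// bit rate (sum of layer bit rates up to the max expected layer) and maps
// the shortfall to a score.
func bitrateQuality(expectedBps, actualBps int64) float64 {
    if expectedBps == 0 {
        return 5.0 // nothing expected (e.g. no layers active), nothing to judge
    }
    ratio := float64(actualBps) / float64(expectedBps)
    if ratio > 1.0 {
        ratio = 1.0 // exceeding expectation does not improve quality
    }
    return 1.0 + 4.0*ratio // full shortfall -> 1.0, full delivery -> 5.0
}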

The missing bit:
Bit rate transitions are not hooked up for down tracks yet. For a down track, the ideal layer bit rate needs to be propagated into the scorer as it transitions. That is probably quite a few lines of code change (including in tests), so it is postponed for now. For now, down tracks will use purely packet loss based quality measurement.

With score normalization, the quality indicator showed GOOD
under conditions which should normally have shown some badness.

So, a few things in this PR
- Do not normalize scores
- Pick the weakest link as the representative score (moving away from
  averaging)
- For the down track direction, when reporting delta stats, use the number
  of packets actually sent. If there are holes in the feed (upstream
  packet loss), down tracks should not be penalised for that loss.

State of things in the connection quality feature
- Audio uses rtcscore-go (with a change to accommodate the RED codec). This
  follows the E-model.
- Camera uses rtcscore-go. No change here. NOTE: the rtcscore here is
  purely based on bits per pixel per frame (bpf). This has the following
  existing issues (no change, these were already there)
  o Does not take packet loss, jitter, or RTT into account
  o Expected frame rate is not available. So, the measured frame rate is
    used as the expected frame rate as well. If the expected frame rate
    were available, the score could be reduced for lower frame rates.
- Screen share tracks: No change. These use the very old simple loss
  based thresholding for scoring. As the bit rate varies a lot based on
  content, and the rtcscore video algorithm used for camera relies on
  bits per pixel per frame, it could produce a very low value
  (large width/height encoded in a small number of bits because of static
  content) and hence a low score. So, the old loss based thresholding is used.
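
For reference, bpf is roughly bit rate divided by pixel throughput; a hedged sketch (illustrative names, not the rtcscore-go code):

// bitsPerPixelPerFrame computes the bpf measure the camera score relies on.
// With static screen-share content the bit rate collapses while width/height
// stay large, so bpf (and hence the score) would crater -- which is why
// screen share keeps the loss-based thresholding instead.
func bitsPerPixelPerFrame(bitrateBps, fps float64, width, height int) float64 {
    pixelsPerSecond := float64(width*height) * fps
    if pixelsPerSecond == 0 {
        return 0
    }
    return bitrateBps / pixelsPerSecond
}
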
DynacastPauseDelay: d.params.DynacastPauseDelay,
Logger: d.params.Logger,
MimeType: mime,
Logger: d.params.Logger,
Contributor Author:

unrelated clean up. Just roaming the code and noticed.

}
return 0, 0
}

Contributor Author:

Removing unused stuff

return
}

// take median of scores in a longer window to prevent quality reporting oscillations
Contributor Author:

This is the hysteresis. I am not too happy about the responsiveness of this. Have set it to 25 seconds of wait if quality drops to POOR, i.e. we will take the median quality 25 seconds after POOR starts. And that wait is 15 seconds for GOOD. So, if things are transitioning POOR -> GOOD -> EXCELLENT, it could take 40 seconds. That might be fine, but there should be better ways. Let me know if you can think of something.

The reason for 25 and 15 seconds is that the analysis window is 5 seconds and I wanted enough windows to take a median over.
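
In terms of windows, that works out to something like this (hypothetical sketch, not the PR's exact code):

// With a 5-second analysis window: wait 5 windows (25s) after POOR and
// 3 windows (15s) after GOOD before taking the median and letting the
// quality climb back up. Drops are still applied immediately.
type quality int

const (
    qualityPoor quality = iota
    qualityGood
    qualityExcellent
)

func windowsBeforeUpgrade(q quality) int {
    switch q {
    case qualityPoor:
        return 5 // 25 seconds
    case qualityGood:
        return 3 // 15 seconds
    default:
        return 1
    }
}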

Member:

seems fine to me especially for first pass. if it feels too slow reacting, we can come back to it.

Contributor:

if this flapped too much the ui updates might be distracting so having a bit of delay might be better user experience

Contributor Author:

yeah, that's the goal. Just don't want it to be too slow 😄

@@ -264,62 +264,6 @@ func (s *StreamTrackerManager) SetMaxExpectedSpatialLayer(layer int32) int32 {
return prev
}

func (s *StreamTrackerManager) DistanceToDesired() int32 {
Contributor Author:

Cleaning up unused stuff.

for _, subTrack := range subscribedTracks {
if subTrack.IsMuted() || subTrack.MediaTrack().IsMuted() {
continue
}
Contributor Author:

Removing the muted check from here. The quality scorer is notified of mute transitions and reports quality accordingly (i.e. when muted, EXCELLENT is reported). So, leaving it completely to the quality scorer to report based on its current state.
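
Something along these lines in the scorer (sketch with illustrative names and fields; assumes "sync"):

// The scorer tracks mute state itself and short-circuits scoring.
type qualityScorer struct {
    lock    sync.Mutex
    muted   bool
    current float64
}

const maxScore = 5.0

func (q *qualityScorer) UpdateMute(isMuted bool) {
    q.lock.Lock()
    defer q.lock.Unlock()
    q.muted = isMuted
}

func (q *qualityScorer) score() float64 {
    q.lock.Lock()
    defer q.lock.Unlock()
    if q.muted {
        return maxScore // muted tracks report EXCELLENT
    }
    return q.current // otherwise, whatever the loss/bit rate evaluation produced
}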

Comment on lines 296 to 150
select {
case <-cs.done:
<-tk.C
Contributor:

this is super minor but we should standardize on using selects. i'll fix the place i did this in room the other day. there's no reason to wait for the next tick to exit these loops.

Contributor Author:

That would be great to not wait. Please let me know how to do that.

Contributor:

the way you had it before was good

select {
case <-cs.done:
    return
case <-ticker.C:
    // work...
}
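
For completeness, the full loop shape that exits immediately (self-contained sketch, hypothetical worker; assumes "time"):

func worker(done <-chan struct{}) {
    ticker := time.NewTicker(5 * time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-done:
            return // exits right away, without waiting for the next tick
        case <-ticker.C:
            // work...
        }
    }
}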

Contributor Author:

cool, will change it back. Was trying to get rid of one field in the struct 😄

@paulwe (Contributor) Mar 4, 2023:

oh, we could use dc's fuse if you want something more ergonomic. i guess this doesn't avoid the struct field though :/

Contributor Author:

@paulwe Changed in this commit. Please check when you get a chance.

Contributor:

lgtm 👍

//
// For video:
// o No in-built codec repair available, hence same for all codecs
func getPacketLossWeight(mimeType string, isFecEnabled bool) float64 {
Contributor:

how were these constants picked?

Contributor Author:

Empirical. There is the E-model, an ITU-T standard, but it is not tuned for modern codecs. rtcscore was an attempt at a model, but that is a bit complex and also does not consider packet loss for video codecs. So, this is the result of filtering a bunch of those inputs/signals into something simple.

@paulwe (Contributor) Mar 4, 2023:

yeah i assumed as much i was just asking out of curiosity. red is like 2 or 3x redundant broadcasts? and fec is some sort of error correcting code. do the qualities represent some probability of having undecodable segments of some length within a sample?

Contributor Author:

> yeah i assumed as much i was just asking out of curiosity. red is like 2 or 3x redundant broadcasts? and fec is some sort of error correcting code. do the qualities represent some probability of having undecodable segments of some length within a sample?

Yes, these are representative. Ideally, we would get data from the edge about the following:

  • how many packets/samples had to be concealed.
  • how many were recovered using FEC.

But those are harder to do. For example, an audio track could get forwarded to multiple subscribers, and each subscriber could behave differently. To get a purely upstream quality, the SFU would have to actually decode the track (including running a jitter buffer) to really figure out its quality.

RED used to be 3x (every packet includes the 2 previous packets), but I thought I saw 2 packets recently; not sure if something has changed. If it is 3x, we can lose two packets out of every three and still decode 100%. But losses are bursty, and empirically (from user reports) quality is not great beyond 10% loss. We could measure the loss characteristic (isolated vs bursty) and apply a better weight based on the pattern in the analysis window, but that is more compute. We do measure that for the whole stream, but I have been avoiding doing it per analysis window as it is per-packet processing.

Opus has in-built FEC: based on the loss reported in RTCP Receiver Reports, Opus will add FEC bits if negotiated. Again, it does not do great against bursty losses, but does quite well with isolated losses (https://opus-codec.org/examples/).

So, this combination of numbers is a filtering of all those factors and a balance between the available data, the available knowledge on this matter (of which I am probably privy to only 1% or less), and implementation considerations.
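
For illustration, the shape such a weighting function might take, with placeholder constants (the real values in the PR were picked empirically, as noted above; assumes "strings"):

func getPacketLossWeight(mimeType string, isFecEnabled bool) float64 {
    mt := strings.ToLower(mimeType)
    switch {
    case mt == "audio/red":
        // redundant copies make raw loss overstate the damage
        return 0.5 // placeholder, not the PR's constant
    case strings.HasPrefix(mt, "audio/") && isFecEnabled:
        // Opus in-band FEC recovers isolated losses well
        return 0.75 // placeholder, not the PR's constant
    default:
        // video: no in-built codec repair, hence same for all codecs
        return 1.0
    }
}

As a back-of-the-envelope check on why RED deserves a lower weight: with 3x redundancy and independent loss at rate p, a packet is unrecoverable only if all three copies are lost, so effective loss is roughly p^3 (10% raw -> 0.1% effective). Bursty losses break exactly that independence assumption, which matches the "not great beyond 10%" observation.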

- track no layer expected case
- always update transition
- always call updateScore
@davidzhao (Member) left a comment:

awesome work!!

@boks1971 boks1971 merged commit 9e327b1 into master Mar 5, 2023
@boks1971 boks1971 deleted the raja_cq branch March 5, 2023 07:25