
Video Freezing in Chrome #156

Open
stongo opened this issue Mar 1, 2016 · 39 comments

Comments
@stongo

stongo commented Mar 1, 2016

Using newer versions of the bridge, we have been experiencing random freezing of the video channel. It doesn't occur for every participant.
I've included a webrtc-internals graph showing it happening just before 7:55pm. There is nothing abnormal in the bridge logs or in our client logs (we are not using Jitsi Meet). We ended up rolling back to version 564, where this does not happen.
[image: talky-jvb-video-freeze] https://cloud.githubusercontent.com/assets/1449748/13441660/683e8f46-dfc6-11e5-823e-4d6514f96681.png

@jitsi-developers

Just before 7:55 NACKs and PLIs increase and the received frame rate drops to 0, but I notice some peculiar behavior starting at 7:54. You don't seem to be using simulcast. Are you using RTCP termination? Could you please enable fine logging at the bridge and logging at the client, and share the log files with us?


@stongo
Author

stongo commented Mar 1, 2016

We aren't using simulcast, it's true.

I'm glad you actually bring up RTCP termination. The documentation is outdated, and we aren't sure how to make it work.

We'd like to enable what was org.jitsi.impl.neomedia.rtcp.termination.strategies.HighestQualityRTCPTerminationStrategy, but setting it according to the docs fails.

Maybe setting it correctly would fix the issue?

@jitsi-developers

HQRTS no longer exists; I would suggest either enabling BasicRTCPTerminationStrategy or disabling RTCP termination completely (just don't set anything). The correct way to set the BRTS is this:

org.jitsi.videobridge.rtcp.strategy=org.jitsi.impl.neomedia.rtcp.termination.strategies.BasicRTCPTerminationStrategy


@stongo
Author

stongo commented Mar 1, 2016

We tried with BasicRTCPTerminationStrategy, but it was throwing an index-out-of-bounds exception.
One change I did make, which seems to help, was removing org.jitsi.impl.neomedia.transform.srtp.SRTPCryptoContext.checkReplay=false.
Testing in our staging environment, the issue seems to have gone away, but staging isn't always reliable for reproducing the bug. I'll plan a production deploy tomorrow, turning off RTCP termination and removing the checkReplay setting, and will update this issue again.
Thanks for the help!

@stongo
Author

stongo commented Mar 3, 2016

Video is still freezing. It's pretty easy to reproduce right now on https://talky.io with 3+ callers, as I haven't rolled back yet.
I also have a webrtc-internals dump, if that would help.

@jitsi-developers

Hi Marcus, in order to help us understand the situation we need logs from the bridge, the sip-communicator.properties file you use to configure the bridge, and screenshots from the webrtc-internals page (because that's much quicker than having to graph the raw data).

Best,
George


@jitsi-developers

P.S. Before you get any logs from the bridge, it would be helpful to set the global log level to FINE by editing the logging.properties file.
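For example, a minimal logging.properties change would be the two lines below (the exact handler names depend on how your bridge is launched; this assumes the stock java.util.logging ConsoleHandler):

.level=FINE
java.util.logging.ConsoleHandler.level=FINE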


@stongo
Author

stongo commented Mar 3, 2016

George, thanks for the help! I think the graph above should suffice, then.
Here's the sip-communicator.properties:

org.jitsi.videobridge.TCP_HARVESTER_MAPPED_PORT=443
org.jitsi.videobridge.TCP_HARVESTER_PORT=4443
org.jitsi.videobridge.STATISTICS_TRANSPORT=pubsub
org.jitsi.videobridge.ENABLE_STATISTICS=true
org.jitsi.videobridge.STATISTICS_INTERVAL=15000
org.jitsi.videobridge.PUBSUB_SERVICE=pubsub.foo.bar
org.jitsi.videobridge.PUBSUB_NODE=videobridge
org.jitsi.videobridge.SINGLE_PORT_HARVESTER_PORT=-1
org.ice4j.ice.harvest.ALLOWED_INTERFACES=bond0

Here's a sampling of the logs: https://ghostbin.com/paste/adfg9

@fippo
Member

fippo commented Mar 3, 2016

george: http://fippo.github.io/webrtc-dump-importer/ gives you nice graphs in a matter of seconds. Zoomable even.

@damencho
Member

damencho commented Mar 3, 2016

Hey @fippo, is there a way to make those dumps from JS code? I'm asking whether it is possible to do the dumps while testing with Selenium. Thanks.

@fippo
Member

fippo commented Mar 3, 2016

@damencho do you know your own code? :-p
traceablepeerconnection was built exactly for this. I suppose you could also open webrtc-internals in an extra tab in Selenium, but I've never tried it.
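For what it's worth, here's a rough sketch of collecting dumps straight from the page with the standard RTCPeerConnection.getStats API, in case you want something a Selenium harness can pull out without traceablepeerconnection (the polling interval and the shape of the dump are arbitrary choices here):

// Snapshot getStats() periodically so a Selenium harness can read
// the accumulated dump out of the page at the end of a run.
type StatsSnapshot = { t: number; reports: Record<string, unknown> };

const dump: StatsSnapshot[] = [];

function startStatsPolling(pc: RTCPeerConnection, intervalMs = 1000): number {
  return window.setInterval(async () => {
    const stats = await pc.getStats();
    const reports: Record<string, unknown> = {};
    stats.forEach((report) => { reports[report.id] = report; });
    dump.push({ t: Date.now(), reports });
  }, intervalMs);
}

// The harness can then fetch it with something like
// driver.execute_script('return JSON.stringify(dump)').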

@jitsi-developers

Thanks @fippo!

@marcus The log snapshot that you shared is filled with XMPP ping timeouts and retransmission requests from the clients. There could be something wrong with our NACK termination implementation (which is enabled by default in recent versions of the bridge), or it could be something wrong with the network.

You can add the following two lines to the sip-communicator.properties file to disable NACK termination:

org.jitsi.service.neomedia.VideoMediaStream.REQUEST_RETRANSMISSIONS=false
org.jitsi.videobridge.DISABLE_NACK_TERMINATION=true

Try that and let us know how it goes.

Best,
George


@bgrozev
Member

bgrozev commented Mar 3, 2016

Here's a sampling of the logs: https://ghostbin.com/paste/adfg9

2016-03-03 11:46:34.065 WARNING: [90782] org.jitsi.videobridge.transform.RtxTransformer.warn() Cannot find SSRC for RTX, retransmitting plain.

This could well indicate a problem with packet retransmissions, which could explain the freeze. We don't run into it because we haven't yet enabled RTX.

One reason for the bridge not finding the SSRC could be that it wasn't signaled to it. Can you include more of the logs, specifically the RECV/SENT lines? Also make sure you are using a recent bridge version (one which includes Lance's fix).
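(For context on the warning above: an RTX stream is normally tied to its primary stream in the signaling via RFC 4588's "apt" (associated payload type) parameter and an ssrc-group:FID line; if either is missing, the bridge can't map the RTX SSRC back to a primary SSRC. Schematically, with illustrative payload types and SSRCs:

a=rtpmap:100 VP8/90000
a=rtpmap:96 rtx/90000
a=fmtp:96 apt=100
a=ssrc-group:FID 1234 5678

Here payload type 96 carries retransmissions of payload type 100, and 5678 is the RTX SSRC paired with the primary SSRC 1234.)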

@bgrozev
Member

bgrozev commented Mar 3, 2016

Just found a little bug, preparing a fix. You may want to delay your testing a bit.

@stongo
Author

stongo commented Mar 3, 2016

Okay, great. I'll test on our staging site as soon as you let me know.
Just so you know, we had been using one of the latest versions with Lance's fix.

@bgrozev
Member

bgrozev commented Mar 3, 2016

Videobridge 672 includes the fix.

@stongo
Author

stongo commented Mar 4, 2016

Deployed 672 with and without the suggested NACK settings, and unfortunately it doesn't work at all now:

org.jitsi.impl.osgi.framework.launch.FrameworkImpl.startLevelChanged() Error changing start level
org.osgi.framework.BundleException: BundleActivator.start
    at org.jitsi.impl.osgi.framework.BundleImpl.start(BundleImpl.java:313)
    at org.jitsi.impl.osgi.framework.launch.FrameworkImpl.startLevelChanged(FrameworkImpl.java:460)
    at org.jitsi.impl.osgi.framework.startlevel.FrameworkStartLevelImpl$Command.run(FrameworkStartLevelImpl.java:126)
    at org.jitsi.impl.osgi.framework.AsyncExecutor.runInThread(AsyncExecutor.java:111)
    at org.jitsi.impl.osgi.framework.AsyncExecutor.access$000(AsyncExecutor.java:17)
    at org.jitsi.impl.osgi.framework.AsyncExecutor$1.run(AsyncExecutor.java:220)
Caused by: java.lang.NoClassDefFoundError: net/java/sip/communicator/impl/protocol/jabber/extensions/colibri/HealthCheckIQ
    at org.jitsi.videobridge.VideobridgeBundleActivator.start(VideobridgeBundleActivator.java:59)
    at org.jitsi.impl.osgi.framework.BundleImpl.start(BundleImpl.java:293)
    ... 5 more
Caused by: java.lang.ClassNotFoundException: net.java.sip.communicator.impl.protocol.jabber.extensions.colibri.HealthCheckIQ
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 7 more
2016-03-03 23:00:00.766 SEVERE: [17] org.jitsi.videobridge.stats.PubSubStatsTransport.publishStatistics().282 Failed to publish to PubSub node: videobridge - it does not exist yet

Going to roll back one version and test with the NACK settings as well.

@stongo
Author

stongo commented Mar 4, 2016

670 with the suggested NACK settings passes staging tests.
Will deploy to production tomorrow morning and report back.

@bgrozev
Member

bgrozev commented Mar 4, 2016

Not sure what the problem with 672 is; possibly the package was just not built properly. In any case, if you get a chance to test this on 672+ without disabling NACK termination, please let us know.

@stongo
Author

stongo commented Mar 4, 2016

Still freezing in production on 670 with the NACK changes. Will give a release > 672 a try.

@jitsi-developers

Hi Marcus, did you have any better luck with jvb > 672? I've had a discussion with Boris, and please note that it is NOT a good idea to disable NACK termination as I suggested initially. So if you still have problems, please remove the two NACK termination related configuration options from the sip-communicator.properties file and try again.


@stongo
Author

stongo commented Mar 11, 2016

Still freezing in 681.

@stongo
Author

stongo commented Mar 11, 2016

Disabling RTX seems like a promising fix. I wasn't able to reproduce the freezing on staging. Will confirm for sure with a production deploy on Monday.

@stongo
Author

stongo commented Mar 18, 2016

Been running 681 for most of the week with RTX disabled in production.
Our feedback form didn't have a single report of freezing, and our Friday update conference also went well.
Seems to be fixed!

@bgrozev
Member

bgrozev commented Mar 19, 2016

Thanks for the feedback, @stongo! We should be looking into enabling RTX in jitsi-meet in the next couple of weeks; we'll let you know if we find any issues.

@davidertel
Contributor

@stongo how did you disable RTX as you mentioned above?

@bgrozev
Member

bgrozev commented Apr 14, 2016

An update on this: we've been working on RTX for the last couple of weeks. We fixed multiple issues, and as far as we know, current videobridge versions work correctly with RTX. So I think this is ready for testing.

We are not yet enabling it in jitsi-meet, because we are running into some problems managing the SDP when muting/unmuting (these are jitsi-meet-specific issues).

@stongo
Author

stongo commented Apr 14, 2016

@bgrozev awesome, I'll give it a try on staging again and let you know

@davidertel are you using Jitsi Meet or something else?

@jitsi-developers

We've been looking at retransmission in general (not necessarily out-of-band RTX, but in-band via NACK as well) and have noticed that, when we limit the bandwidth on clients, Chrome seems to do a poor job of obeying the detected bandwidth. Because of this, when loss occurs (due to Chrome sending more bits than it should), lots of NACKs start up, and Chrome can refuse to retransmit the lost packets because it detects that it's sending too much data via retransmissions. We thought maybe this was H.264-specific, but we were able to repro with VP8 as well. From looking at the bweforvideo graphs, Chrome seems to properly detect the correct amount of bandwidth, but regularly sends more. The way this plays out is lots of periods of frozen video on the receiver.

Just a heads up on something we've seen... we want to gather some more data and get a bug filed on Chrome.


@xdumaine
Contributor

@stongo We (Dave and I and company) are using a web client with jingle.js via a focus controller à la Talky (but it's our own Node.js focus controller).

@bgrozev
Member

bgrozev commented Apr 14, 2016

You need to remove the "a=rtpmap:XXX RTX/90000" lines from the SDP you pass to your clients.
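As a rough sketch, a JS/TS focus controller could do that filtering like this (the helper name is mine; a fuller version would also drop the RTX payload types from the m=video line and remove any a=ssrc lines belonging to the RTX SSRCs):

// Strip RTX from an SDP blob before handing it to a client.
// Assumes CRLF line endings, as in the offer dumps above.
function stripRtx(sdp: string): string {
  const lines = sdp.split('\r\n');

  // Collect the payload types declared as rtx.
  const rtxPts = new Set<string>();
  for (const line of lines) {
    const m = line.match(/^a=rtpmap:(\d+) rtx\/90000/i);
    if (m) rtxPts.add(m[1]);
  }

  return lines
    .filter((line) => {
      // Drop each rtx rtpmap and its "apt=" fmtp line.
      for (const pt of rtxPts) {
        if (line.startsWith(`a=rtpmap:${pt} `)) return false;
        if (line.startsWith(`a=fmtp:${pt} `)) return false;
      }
      // Drop FID groupings pairing a primary SSRC with its RTX SSRC.
      return !line.startsWith('a=ssrc-group:FID');
    })
    .join('\r\n');
}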

@xdumaine
Contributor

I've found that more often than not, filing the bug early leads to quicker results. The Chrome team is helpful in identifying workarounds and fixes. We can nudge them for feedback. If you have a dump showing that behavior, let's get it filed with all the info we have. I'll try to get one as well.

@fippo
Member

fippo commented Apr 15, 2016

no chrome bug here...

@xdumaine
Contributor

xdumaine commented Apr 15, 2016

We're getting freezing without including RTX in the SDP.

type: offer, sdp:
v=0
o=- 1460724079835 1460724079848 IN IP4 0.0.0.0
s=-
t=0 0
a=group:BUNDLE video audio data
m=video 1 UDP/TLS/RTP/SAVPF 100 116 117
c=IN IP4 0.0.0.0
a=rtcp:1 IN IP4 0.0.0.0
a=ice-ufrag:apt0u1agcv17b5
a=ice-pwd:1tpk3nc33btjm76p0h6mmk0pt7
a=fingerprint:sha-1 86:1A:F4:87:D2:95:59:62:38:6C:54:E0:A1:0C:C1:AA:08:B4:DF:0F
a=setup:actpass
a=sendrecv
a=mid:video
a=rtcp-mux
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=candidate:1 1 SSLTCP 2130706431 172.18.27.111 4443 typ host generation 0
a=candidate:3 1 UDP 2130706431 172.18.27.111 10000 typ host generation 0
a=candidate:2 1 SSLTCP 1694498815 54.165.101.244 4443 typ srflx raddr 172.18.27.111 rport 4443 generation 0
a=candidate:4 1 UDP 1677724415 54.165.101.244 10000 typ srflx raddr 172.18.27.111 rport 10000 generation 0
m=audio 1 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8
c=IN IP4 0.0.0.0
a=rtcp:1 IN IP4 0.0.0.0
a=ice-ufrag:apt0u1agcv17b5
a=ice-pwd:1tpk3nc33btjm76p0h6mmk0pt7
a=fingerprint:sha-1 86:1A:F4:87:D2:95:59:62:38:6C:54:E0:A1:0C:C1:AA:08:B4:DF:0F
a=setup:actpass
a=sendrecv
a=mid:audio
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=candidate:1 1 SSLTCP 2130706431 172.18.27.111 4443 typ host generation 0
a=candidate:3 1 UDP 2130706431 172.18.27.111 10000 typ host generation 0
a=candidate:2 1 SSLTCP 1694498815 54.165.101.244 4443 typ srflx raddr 172.18.27.111 rport 4443 generation 0
a=candidate:4 1 UDP 1677724415 54.165.101.244 10000 typ srflx raddr 172.18.27.111 rport 10000 generation 0
m=application 1 DTLS/SCTP 5000
c=IN IP4 0.0.0.0
a=ice-ufrag:apt0u1agcv17b5
a=ice-pwd:1tpk3nc33btjm76p0h6mmk0pt7
a=fingerprint:sha-1 86:1A:F4:87:D2:95:59:62:38:6C:54:E0:A1:0C:C1:AA:08:B4:DF:0F
a=setup:actpass
a=sctpmap:5000 webrtc-datachannel 1024
a=mid:data
a=candidate:1 1 SSLTCP 2130706431 172.18.27.111 4443 typ host generation 0
a=candidate:3 1 UDP 2130706431 172.18.27.111 10000 typ host generation 0
a=candidate:2 1 SSLTCP 1694498815 54.165.101.244 4443 typ srflx raddr 172.18.27.111 rport 4443 generation 0
a=candidate:4 1 UDP 1677724415 54.165.101.244 10000 typ srflx raddr 172.18.27.111 rport 10000 generation 0

@brianh5
Contributor

brianh5 commented Apr 15, 2016

We just repro'd the freezes on AppRTC by limiting the uplink bandwidth on one sender to 1.5 Mbps. We see it with both H.264 and VP8: Chrome detects the available send bandwidth correctly, but with RTX it regularly goes over it, which causes more loss and more freezes (Chrome will also refuse to send RTX if its bandwidth usage is too high, so I think there's a bad feedback cycle here). We just got some screenshots from webrtc-internals and are going to file something today.

@brianh5
Contributor

brianh5 commented Apr 15, 2016

Filed against Chrome here: https://bugs.chromium.org/p/webrtc/issues/detail?id=5797

@bradrlaw

Where do we stand on this issue? The linked Chrome issue appears to have been closed without anything being resolved. We are running into this issue constantly, making Jitsi unusable for any production-type use. This is happening with our own installs, regardless of patch level, as well as with the demo at http://meet.jit.si.

A symptom is extremely high packet loss once an endpoint has less than 1.5 Mbps available. The video will intermittently freeze for upwards of 5 to 15 (or more) seconds.

@joelbrewer

Any update on this? We are considering a switch to jitsi-videobridge; however, random freezing on Chrome could be a non-starter.

@bbaldino
Member

Regarding my previous comment about the Chrome issue (posted as "brianh5" above): we found that the way we were simulating low-bandwidth links was inaccurate. The network simulator on Mac, for example, introduces loss to emulate a lower-bandwidth link but does not add any delay, and Chrome keys quite a bit on delay to lower its bandwidth estimate. Once we simulated things properly, we no longer saw that issue, so we told them they could close that bug.

I also found a bug a couple of months ago in the port of the WebRTC bandwidth estimation logic on the bridge, which was failing to take delay into account (jitsi/libjitsi#212). Fixing that resulted in much better performance on links with high delay (common for poor links that also have low bandwidth). Other than those two scenarios, I wasn't aware of any other freezing issues with Chrome.
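For reference, a sketch of one way to emulate a constrained link with both rate and delay on Linux, using netem's built-in rate option (the interface name and numbers are illustrative):

tc qdisc add dev eth0 root netem delay 100ms rate 1.5mbit

Shaping the rate without also adding delay is exactly the trap described above: a delay-based estimator never sees the queue build up, so it doesn't back off.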

bbaldino added a commit to bbaldino/jitsi-videobridge that referenced this issue Jan 22, 2020
* add ability to parse bandwidth from a string

* allow space between amount and unit

* tweak test

* change units string to lower case

* add case test
bbaldino added a commit to bbaldino/jitsi-videobridge that referenced this issue Sep 24, 2020
JonathanLennox pushed a commit to JonathanLennox/jitsi-videobridge that referenced this issue Jun 1, 2022