
High latency when audio is enabled. #7

Closed
danisla opened this issue Jan 5, 2022 · 58 comments · Fixed by #96
Labels
bug (Something isn't working) · encoding (Audio or video encoders but not the OS interfaces) · funding (Requires funding to implement) · help wanted (External contribution is required) · performance (Performance or latency issues, not critical but impacts usage) · transport (Underlying media or data transport protocols) · upstream (Requires upstream development from dependencies)

Comments

@danisla
Member

danisla commented Jan 5, 2022

Enabling audio results in high latency (700-1000 ms) after the first connection. It eventually drops down to something more reasonable (100-250 ms).

danisla added the bug label Jan 5, 2022
danisla added the help wanted label Jan 18, 2022
@ehfd
Member

ehfd commented Feb 3, 2022

xpra-html5, a similar project with a different direction, seems to want to decode audio in a worker or use Web Audio API.

Meh. Probably best to use Web Audio API (Xpra-org/xpra-html5#64) instead.
FWIW, this fork of aurora is more up to date: https://github.com/Kukunin/aurora.js - https://github.com/Kukunin/opus.js

More technologies available to explore: Xpra-org/xpra-html5#56

@danisla
Member Author

danisla commented Feb 4, 2022

The xpra-html5 project has very high latency for audio (10-45 seconds). Not sure if this is because of an outdated Opus decoder or just the nature of the WebSocket transport and its latency.

If we want to keep the audio in sync with the video, the GStreamer pipeline has to be driven by the clock of the audio stream.

If it's just an issue with how PulseAudio is being integrated with GStreamer, maybe there is some bug, or we could look into using PipeWire with PulseAudio to see if we can reduce the latency.

I was also thinking about offering a decoupled audio/video experience, where we send the audio in a separate stream from the video, sacrificing audio synchronization. This could be done with a new WebRTC transceiver, leveraging most or all of the browser's MediaSource capabilities for decoding and buffering. Another option would be to send Opus packets over a data channel and decode them with a WASM Opus decoder, like what Aurora is doing.

The bigger issue with MediaSource in the browser is that it's designed for voice applications (video conferencing), so its algorithms are optimized to maintain audio continuity and synchronization, with latency as the tradeoff. Unfortunately, these algorithms and their jitter buffers cannot be configured.
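
To make the data-channel idea concrete, here is a minimal server-side sketch in Python (signalling and negotiation omitted; the element choices and channel label are assumptions, and the client would still need a WASM or WebCodecs Opus decoder):

# Sketch: ship encoded Opus packets over a WebRTC data channel instead of
# an audio transceiver. Names are illustrative, not the Selkies implementation.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)
pipeline = Gst.parse_launch(
    "pulsesrc ! audioconvert ! audioresample ! opusenc frame-size=10 "
    "! appsink name=opussink emit-signals=true"
)
webrtcbin = Gst.ElementFactory.make("webrtcbin", "sendrecv")
channel = webrtcbin.emit("create-data-channel", "opus-audio", None)

def on_sample(appsink):
    sample = appsink.emit("pull-sample")
    buf = sample.get_buffer()
    ok, info = buf.map(Gst.MapFlags.READ)
    if ok:
        # One encoded Opus packet per buffer; the client decodes and schedules it.
        channel.emit("send-data", GLib.Bytes.new(bytes(info.data)))
        buf.unmap(info)
    return Gst.FlowReturn.OK

pipeline.get_by_name("opussink").connect("new-sample", on_sample)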

@totaam

totaam commented Feb 23, 2022

If we want to keep the audio in sync with the video, the GStreamer pipeline has to be driven by the clock of the audio stream.

Audio cannot have any jitter or it sounds terrible, so the receiver always maintains a buffer. It's the video that must be synced with the audio.
The xpra server detects video content and then delays the video frames (and only the video part of the window, nothing else) by the same amount.
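
In GStreamer terms, that idea amounts to something like the following sketch (values and element names are assumptions, not what xpra does internally):

# Hold the video branch back by roughly the receiver's audio buffer depth,
# so audio, which cannot tolerate jitter, stays the master.
video_delay = Gst.ElementFactory.make("queue", "video-delay")
video_delay.set_property("min-threshold-time", 120 * Gst.MSECOND)  # ~120 ms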

If it's just an issue with how PulseAudio is being integrated with GStreamer ...

Unlikely. The server-side latency is very small.

or we look into using PipeWire with PulseAudio to see if we can reduce the latency

The latency problems are almost always client side. I very much doubt that using pipewire server-side will help with anything.

... and leveraging most/all of the browsers MediaSource capabilities for decoding and buffering

xpra-html5 does use the MediaSource API... YMMV!

Another option would be to send opus packets over a data channel and decode them with a WASM OPUS decoder like what Aurora is doing

That's what xpra does in "non-native" mode.

Unfortunately, these algorithms and their jitter buffers cannot be configured

Indeed!

@ehfd
Member

ehfd commented Feb 24, 2022

The latency problems are almost always client side. I very much doubt that using pipewire server-side will help with anything.

This is consistent with our observations so far.

@totaam Thanks a lot for visiting!

@xhejtman
Contributor

Hi, in my setup I see about 200 ms total latency. I made some improvements to the code; they are available here:
https://github.com/CERIT-SC/webrtc

nvimage contains a GStreamer plugin that uses FBC + NVENC to grab and encode frames directly on the GPU. Officially, FBC is available on 'Tesla' GPUs only (though patches for desktop cards exist).

The nvimage element is added into a streamer that is a slightly modified version of yours. It adds full support for NACKs and retransmissions, which helps a lot. This patch for GStreamer might be needed as well:

--- gstreamer/subprojects/gst-plugins-bad/ext/webrtc/gstwebrtcbin.c.orig        2022-04-14 20:48:20.800615949 +0200
+++ gstreamer/subprojects/gst-plugins-bad/ext/webrtc/gstwebrtcbin.c     2022-04-15 00:17:46.560747338 +0200
@@ -2676,7 +2676,7 @@
     gst_sdp_media_add_attribute (media, "rtpmap", str);
     g_free (str);

-    str = g_strdup_printf ("%u apt=%d", pt, target_pt);
+    str = g_strdup_printf ("%u apt=%d;rtx-time=125", pt, target_pt);
     gst_sdp_media_add_attribute (media, "fmtp", str);
     g_free (str);
   }
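
(As an aside, on recent GStreamer the retransmission machinery can also be requested per stream from the application, without patching gstwebrtcbin.c; a hedged Python sketch, assuming the first transceiver is the one of interest:)

# Enable NACK/RTX on a transceiver after webrtcbin has created it.
transceiver = webrtcbin.emit("get-transceiver", 0)
transceiver.set_property("do-nack", True)  # request NACKs and retransmissions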

I also use GStreamer 1.20.1, which seems far better than 1.19.1.

I also tried to set up H.265, but no luck so far, and it seems to be a waste of time: Chrome does not support H.265, and H.265 from NVENC does not seem to be supported on Safari (the only remaining browser with H.265 support).

What might actually be interesting is to utilize Moonlight clients for receiving the stream, as Moonlight uses RTP/RTSP/SDP under the hood.

@danisla
Member Author

danisla commented Apr 18, 2022

@xhejtman This is great, thank you so much for the info and for sharing the nvimage element. Looking forward to getting this integrated. Would love to collaborate on a PR.

Any interest in merging this into the gst-plugins-bad upstream repo as a new element?

Does the support for NACKs and retransmissions as well as the rtx-time patch help with latency and stability?

@xhejtman
Contributor

[screenshot: The Witcher 2 running over the stream]

As for NACKs, I can play the Linux port of The Witcher 2 now ;) see the screenshot above. RTT to the coturn server is about 11 ms and I am on a ~100 Mbps line at home.

Without NACKs, if a packet is lost, you need to wait for the next keyframe to recover, which introduces image lags/stuttering. With NACKs, I see maybe 2-3 keyframes over a very long time and all lost packets are retransmitted in time. So it looks pretty good.

There remain small issues, mostly with the mouse. Without mouse lock (ctrl-shift-left button), it is completely unplayable; with mouse lock, it would be OK if only the mouse sensitivity were better - mouse motion is very slow (I have a local 4K resolution, 1920x1080 remote). Also, automatic remote resizing is a terrible idea in this case, as the game wants to change the resolution and GStreamer does as well, and they fight each other :)

I would like to merge nvimage into GStreamer; however, it requires the NVIDIA Video Capture SDK installed, which is licensed only for data-center GPUs. So I am not sure it would be accepted. Also, the code is a bit of a mess, as it needs to work around threading issues - the API must not be called from different threads.

@ehfd
Member

ehfd commented Apr 19, 2022

Is it possible to take advantage of NACKs and retransmissions without including features that officially require datacenter GPUs?
Looks marvelous nonetheless!

@xhejtman
Contributor

Is it possible to take advantage of NACKs and retransmissions without including features that officially require datacenter GPUs?

Yes, NACKs and nvimage are two totally independent things.

@danisla
Member Author

danisla commented Apr 19, 2022

@xhejtman love the demo of running Witcher 2 on Selkies! Nice work.

I'll test the FBC+NVENC soon if I can get a copy of the NVIDIA Video Codec SDK... It would be nice if this element could use dynamic loading, similar to how the nvcodec element works.

I'm happy to include the patch for rtx-time as I don't think it will break anything, though I'm not sure what the default value is, or whether GStreamer has an explicit implementation of it. This is how it's defined in RFC 4588:

rtx-time: indicates the time in milliseconds (measured from the
time a packet was first sent) that the sender keeps an RTP packet
in its buffers available for retransmission.
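
A hedged alternative to carrying the source patch would be to append the attribute to the generated SDP before sending it, along these lines (illustrative helper, not existing Selkies code):

import re

def add_rtx_time(sdp: str, ms: int = 125) -> str:
    # Append rtx-time to each RTX fmtp line of the form "a=fmtp:<pt> apt=<pt>".
    return re.sub(r"(a=fmtp:\d+ apt=\d+)", rf"\1;rtx-time={ms}", sdp)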

Regarding the mouse lock issue, if you don't mind opening a new issue for that, I can walk you through how to fix it with the uinput mouse emulation feature.

@xhejtman
Contributor

You can get the SDK here:
https://developer.nvidia.com/capture-sdk

@xhejtman
Contributor

Regarding the mouse lock issue, if you don't mind opening a new issue for that, I can walk you through how to fix it with the uinput mouse emulation feature.

Perhaps extend issue #28?

danisla mentioned this issue Apr 19, 2022
@ehfd
Member

ehfd commented Sep 9, 2022

Using webrtcsink instead of webrtcbin may substantially improve or fix this issue.
https://github.com/centricular/webrtcsink
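
For reference, a minimal sketch of the swap (assuming the gst-webrtc-signalling-server that ships with gst-plugins-rs is running with its defaults; test sources used for brevity):

# webrtcsink handles negotiation, congestion control, and encoder
# configuration itself, unlike webrtcbin.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch(
    "webrtcsink name=ws "
    "videotestsrc is-live=true ! videoconvert ! queue ! ws. "
    "audiotestsrc is-live=true ! audioconvert ! queue ! ws."
)
pipeline.set_state(Gst.State.PLAYING)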

@ehfd
Member

ehfd commented Sep 11, 2022

centricular/gstwebrtc-demos#102
Interestingly, this issue highlights the same problems as ours, and this might be the solution.
Why not just turn off the audio syncing?
@danisla @xhejtman

@Xosrov

Xosrov commented Sep 19, 2022

Hey, I'm working on a similar project with the C API for Windows and I've come across the exact same problem. Starting the stream with audio introduces a noticeable lag after 2-3 seconds that persists thereafter. No other element in the pipeline had any effect on this problem, so something in webrtcbin seems to be the culprit. Here's what I've tried so far, to no avail:

  • Setting sync=false in internal webrtcbin clocksyncs
  • Disabling audio FEC from encoder and webrtc transceivers
  • Reducing max-size-packets in internal webrtcbin rtprtxsend
  • Reducing latency in internal webrtcbin rtpbin
  • Reducing processing-deadline and max-lateness in internal webrtcbin nicesink
  • Using a different audio source like audiotestsrc is-live=1 instead of wasapi2src
  • Adding queues with leaky=downstream at different points of the pipeline to prevent potential congestion

I'm still gonna mess around with it, but if no solution exists I suppose the best thing to do is just send audio over a new data channel or WebSocket connection, at least until the source of the problem is found.
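
For anyone reproducing these experiments: webrtcbin's internal elements can be reached without patching by hooking the bin's deep-element-added signal; a hedged sketch with illustrative property values:

# Tweak webrtcbin internals as they are created.
def on_deep_element_added(toplevel, subbin, element):
    factory = element.get_factory()
    if factory and factory.get_name() == "rtpbin":
        element.set_property("latency", 50)            # jitter-buffer target, ms
        element.set_property("drop-on-latency", True)  # drop rather than queue

pipeline.connect("deep-element-added", on_deep_element_added)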

@ehfd
Member

ehfd commented Sep 21, 2022

@Xosrov Thanks for reaching out! I have not contacted the GStreamer devs about this yet, so I still have to ask them. Meanwhile, Neko got a head start compared to us, so it might be worth checking what they do.

@ehfd
Member

ehfd commented Sep 21, 2022

@Xosrov We have a contributor working to implement Windows functionalities by the way.

@Xosrov

Xosrov commented Sep 21, 2022

@ehfd In the past few hours I've actually found somewhat of a solution to the whole lag issue and will keep you updated once I come up with a less drastic fix.
I actually have some ideas to improve this project as well! Though I don't have much time these coming days, I'll try to help out as much as I can, especially with the Windows implementation.

@ehfd
Member

ehfd commented Sep 21, 2022

@Xosrov Please come to the Discord (button in README.md)! @callTx is developing right now.

@ehfd
Member

ehfd commented Sep 21, 2022

I noticed that self.webrtcbin.set_property("latency", 0) might do something. I can't test right now, but this call is not anywhere in the code.
Edit: this does nothing, even when also setting drop-on-latency on the rtpbin embedded inside webrtcbin.

@ehfd
Member

ehfd commented Oct 12, 2022

Turning off synchronization with pulsesrc.set_property("provide-clock", False) and pulsesrc.set_property("do-timestamp", False) improved the latency from around 200 ms on a good network to 100 ms.
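
In code, the tweak amounts to this (sketch, assuming a pulsesrc-based audio branch):

# pulsesrc stops providing the pipeline clock and stops timestamping
# buffers against capture time, removing one source of added latency.
pulsesrc = Gst.ElementFactory.make("pulsesrc", "audiosrc")
pulsesrc.set_property("provide-clock", False)
pulsesrc.set_property("do-timestamp", False)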

@ehfd
Member

ehfd commented Oct 16, 2022

The reported latency in the side panel is calculated by summing the latency components reported by the WebRTC stats. It's effectively network latency + video jitter buffer latency + audio jitter buffer latency.

The code is here: https://github.com/selkies-project/selkies-gstreamer/blob/master/addons/gst-web/src/app.js#L378

This probably needs to be double-checked to make sure we aren't double-counting the audio and video latency, since with AV sync they may be overlapping. It might be that we need to take the max(audio, video jitter buffer latency) rather than adding each separately. I haven't confirmed this yet though.
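
As illustrative arithmetic only (a hypothetical helper, not the actual app.js code):

# If A/V sync makes the two jitter buffers overlap, take their max
# instead of their sum when reporting total latency.
def panel_latency_ms(network_ms, video_jitter_ms, audio_jitter_ms, av_sync=True):
    jitter = (max(video_jitter_ms, audio_jitter_ms) if av_sync
              else video_jitter_ms + audio_jitter_ms)
    return network_ms + jitter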

@danisla It seems that the Latency value under the Video stats only measures the distance to the fronting reverse proxy or web server, rather than to the host itself. When using HAProxy, it shows precisely the expected latency between the client and the ingress instead of the host, so the current max latency is more accurate.

@ehfd
Member

ehfd commented Oct 18, 2022

I am starting to believe that for both WebTransport and WebRTC, using custom Python solutions outside of GStreamer, such as https://github.com/aiortc/aioquic or https://github.com/aiortc/aiortc, is becoming plausible: as long as an RTP stream is generated with GStreamer, the WebRTC/WebTransport operations can definitely be separated out.

This could be done soon instead of waiting for GStreamer.

This is because using GStreamer plugins for transport is showing less and less flexibility and exhibiting performance issues that take time to fix.

The core developers ourselves are short on time, so we will focus on maintaining the native GStreamer webrtcbin/webrtcsink pipelines in the meantime; people are welcome to set bounties or help out in this direction instead of implementing a GStreamer plugin. A substantial bounty may allow us to implement this ourselves.

Please follow up at #48.

@Xosrov

Xosrov commented Oct 18, 2022

@ehfd

I have some things to add to this from personal experience. Here are 2 ways one might approach what you've suggested:

  1. Using rtpbin + appsink. This is attractive since rtpbin also handles RTCP; however, I ran into some issues with this before and I'm not sure if it works properly.
  2. Using appsink with raw RTP packets (see the sketch after this list). Here is a simple example with Pion in Go. I've had success with streaming RTP over UDP to a central Go server, with no lag/sync issues. RTCP needs to be handled manually though.
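
A sketch of approach 2 on the GStreamer side (pipeline and names assumed; the pulled packets would be handed to an external WebRTC stack such as Pion or aiortc, which also has to take care of RTCP):

# Payload Opus into RTP and pull each packet from an appsink.
pipeline = Gst.parse_launch(
    "pulsesrc ! audioconvert ! opusenc ! rtpopuspay pt=111 "
    "! appsink name=rtpsink emit-signals=true sync=false"
)

def on_rtp_sample(appsink):
    sample = appsink.emit("pull-sample")
    buf = sample.get_buffer()
    ok, info = buf.map(Gst.MapFlags.READ)
    if ok:
        packet = bytes(info.data)      # one complete RTP packet
        buf.unmap(info)
        hand_to_webrtc_stack(packet)   # hypothetical hand-off function
    return Gst.FlowReturn.OK

pipeline.get_by_name("rtpsink").connect("new-sample", on_rtp_sample)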

I personally think it's more beneficial long-term to focus on fixing the problems in GStreamer though. These solutions make a lot of sense when you separate the WebRTC and encoding servers, but when it's all on the same system, GStreamer alone has the potential to deliver better performance.

@ehfd
Member

ehfd commented Oct 19, 2022

I agree, it would be cleaner if we could make it work fully within GStreamer. But that means we need C programmers instead of Python ones.

@ehfd
Member

ehfd commented Oct 27, 2022

webrtcsink is now in https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/tree/main/net/webrtc by Mathieu Duponchelle.

@ehfd
Member

ehfd commented Oct 27, 2022

https://twitter.com/gstreamer/status/1584612758141947904

@danisla Is WHIP/WHEP relevant for us, or no?

@ehfd
Member

ehfd commented Nov 14, 2022

Issue on GStreamer, please follow the discussion:

https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1261

@ehfd
Member

ehfd commented Aug 22, 2023

From @m1k1o:

We face it as well, and found out it must be something with the client's internal audio jitter buffer that we are not able to control using JavaScript.
But it only seems to be reproducible in Google Chrome, and only for the first couple of seconds (latency is around 800-1200 ms). Then the latency drops to ~400 ms.
In Firefox it seems to behave better, with consistently lower latency.

We must understand whether it is the web browser's issue. Perhaps the Parsec web clients listed right above may provide insights; they use WebAssembly for Opus decoding.

@ehfd
Member

ehfd commented Aug 23, 2023

https://www.w3.org/2018/12/games-workshop/slides/21-webtransport-webcodecs.pdf

Looks like WebTransport + WebCodecs is the modern alternative to WebAssembly or MSE; both are in the process of being implemented in Firefox.

@ehfd
Member

ehfd commented Sep 9, 2023

https://www.w3.org/2018/12/games-workshop/slides/21-webtransport-webcodecs.pdf

Looks like WebTransport + WebCodecs is the modern alternative to WebAssembly or MSE; both are in the process of being implemented in Firefox.

Some input from @wanjohiryan in this regard would be extremely valuable.

@danisla
Member Author

danisla commented Sep 9, 2023

I’ve been tracking WebTransport closely and prototyping it locally off and on for the past year.

It’s interesting but not ready for the Selkies Project yet. Right now, it works well in a tightly controlled production environment. However, today it requires a lot of configuration, valid certificates, browser command-line arguments, and OS exceptions that make working with it locally very challenging.

@ehfd
Member

ehfd commented Sep 10, 2023

I’ve been tracking WebTransport closely and prototyping it locally off and on for the past year.

It’s interesting but not ready for the Selkies Project yet. Right now, it works well in a tightly controlled production environment. However, today it requires a lot of configuration, valid certificates, browser command-line arguments, and OS exceptions that make working with it locally very challenging.

Self-signed certificates look possible but complicated. By the way, is mainly using WebSocket (as a fall-back protocol) that bad compared to WebRTC? It's what Guacamole, noVNC, etc. use.

https://docs.libp2p.io/concepts/transports/webtransport/

When connecting to a WebSocket server, browsers require the server to present a TLS certificate signed by a trusted CA (certificate authority). Few nodes have such a certificate, which is the reason that WebSocket never saw widespread adoption in the libp2p network. libp2p WebTransport offers a browser API that includes a way to accept the server’s certificate by checking the (SHA-256) hash of the certificate (using the serverCertificateHashes option), even if the certificate is “just” a self-signed certificate. This allows us to connect any browser node to any server node, as long as the browser knows the certificate hash in advance (see WebTransport in libp2p for how WebTransport addresses achieve this).

WebTransport in libp2p
WebTransport multiaddresses are composed of a QUIC multiaddr, followed by /webtransport and a list of multihashes of the node certificates that the server uses.

For instance, in the multiaddress /ip4/192.0.2.0/udp/123/quic/webtransport/certhash/, a standard local QUIC connection is defined up until and including /quic. Then /webtransport/ runs over QUIC. The /certhash/ component carries the hash of the self-signed certificate that the server will use to secure the connection.

The WebTransport CONNECT request is sent to an HTTPS endpoint. libp2p WebTransport servers use /.well-known/libp2p-webtransport. For instance, the WebTransport URL of a WebTransport server advertising /ip4/192.0.2.0/udp/1234/quic/webtransport/ would be https://192.0.2.0:1234/.well-known/libp2p-webtransport?type=noise (the ?type=noise refers to the authentication scheme using Noise).

@wanjohiryan

Oh, hey Kim @ehfd it's been a minute

Yeah, I can second what @danisla is saying... local development for WebTransport is hectic at the moment, especially for anything working with the browser (server-side QUIC development is the easiest).

For example, for warp we must generate self-signed certs every 14 days, as well as a fingerprint hash, and execute bash scripts each time we need to refresh our site in order to force Chrome to allow the QUIC connection - all as a workaround.

Also, there is currently no way to load-balance QUIC connections; UDP load balancers are great and all, but they take away QUIC's roaming support (being able to change internet connections, e.g. mobile data to Wi-Fi, without having to redo the handshake with the server).

QUIC is shiny, promising and all... however, we are not there yet.

@ehfd
Member

ehfd commented Sep 12, 2023

Thanks for all the information. The spec makes things very complicated.
@wanjohiryan

@ehfd
Member

ehfd commented Sep 20, 2023

Perhaps using good old WebSockets might not be that bad?

@alxlive

alxlive commented Sep 21, 2023

The reported latency in the side panel is calculated by summing the latency components reported by the WebRTC stats. It's effectively network latency + video jitter buffer latency + audio jitter buffer latency.
The code is here: https://github.com/selkies-project/selkies-gstreamer/blob/master/addons/gst-web/src/app.js#L378
This probably needs to be double-checked to make sure we aren't double-counting the audio and video latency, since with AV sync they may be overlapping. It might be that we need to take the max(audio, video jitter buffer latency) rather than adding each separately. I haven't confirmed this yet though.

I borrowed some of the code from your project and used it in ours to compare using two webrtcbins vs. one with sync enabled/disabled (via do-timestamp). Here are the results for video after a few minutes of playing colorful animation scenes:

  1. Audio & Video in one webrtcbin and sync enabled
{
  "connectionPacketsReceived": 98305,
  "connectionPacketsLost": 9134,
  "connectionBytesReceived": "144.38 MBytes",
  "connectionBytesSent": "0.66 MBytes",
  "connectionVideoLatency": 136,
  "connectionCodec": "VP9",
  "connectionVideoDecoder": "libvpx",
  "connectionResolution": "1280x720",
  "connectionFrameRate": 60
}
  2. Audio & Video in one webrtcbin and sync disabled
{
  "connectionPacketsReceived": 99821,
  "connectionPacketsLost": 11039,
  "connectionBytesReceived": "151.52 MBytes",
  "connectionBytesSent": "0.64 MBytes",
  "connectionVideoLatency": 125,
  "connectionCodec": "VP9",
  "connectionVideoDecoder": "libvpx",
  "connectionResolution": "1280x720",
  "connectionFrameRate": 45
}
  3. Audio & Video in different webrtcbins
{
  "connectionPacketsReceived": 85268,
  "connectionPacketsLost": 8331,
  "connectionBytesReceived": "125.72 MBytes",
  "connectionBytesSent": "0.43 MBytes",
  "connectionVideoLatency": 46,
  "connectionCodec": "VP9",
  "connectionVideoDecoder": "libvpx",
  "connectionResolution": "1280x720",
  "connectionFrameRate": 67
}

It doesn't seem like do-timestamp is doing much for me; the latency for the first two kept rising from 30 all the way to 100+, but in the third it stayed pretty much the same after a few minutes. The problem is definitely in webrtcbin, as everything else is the same. P.S. Don't pay too much attention to the framerate; my network is very lossy, so it varies a lot around 60.

Hi @Xosrov, do you have an implementation of using two webrtcbins in selkies-gstreamer? Could you publish it for others to try?

But as I re-read your message I see that you said "I borrowed some of the code from your project and used it in ours". Which project are you referring to? Is there any code we could take a look at?

@Xosrov

Xosrov commented Sep 21, 2023

@alxlive Hey! I was referring to the JavaScript code for generating the reports in my post. The code adding the actual webrtcbin element is in this C file.
I don't have an implementation in Selkies ready, but I can get a simple working version done by the end of the week so we can compare.
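
For readers following along, the split boils down to giving each media its own webrtcbin (and thus its own PeerConnection); a hedged sketch with illustrative element choices, not the code that was later merged:

# Independent sessions per media: the client attaches both streams and
# trades strict A/V sync for a shallower audio jitter buffer.
video_pipe = Gst.parse_launch(
    "ximagesrc use-damage=false ! videoconvert ! vp9enc deadline=1 "
    "! rtpvp9pay pt=96 ! webrtcbin name=videobin"
)
audio_pipe = Gst.parse_launch(
    "pulsesrc provide-clock=false do-timestamp=false ! audioconvert "
    "! opusenc ! rtpopuspay pt=111 ! webrtcbin name=audiobin"
)
# Each webrtcbin runs its own offer/answer and ICE; signalling omitted.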

@alxlive

alxlive commented Sep 21, 2023

That'd be great! Would be awesome to confirm that your fix works in selkies too.

@ehfd
Member

ehfd commented Sep 29, 2023

That'd be great! Would be awesome to confirm that your fix works in selkies too.

Merged in main.

@ehfd
Member

ehfd commented Feb 22, 2024

For future reference, make sure the web interface verifies that the audio WebRTC stream and the video WebRTC stream are both initialized and running before starting the interface.
Make sure to fail or reload if either one of the streams fails.

Sometimes the audio stream alone fails and the video stream keeps going without the audio.

This still hasn't been 100% fixed and is tracked at #109.
However, it's just a matter of the web interface checking that both the audio and video streams are alive.
