
Pion WebRTC Media API

This document details a completely new media API for Pion WebRTC. The current media API has deficiencies that prevent it from being used in some production workloads. This document doesn't aim to modify or extend the existing API; we are looking at it with fresh eyes.

Adding Comments

I encourage everyone to comment on this page! When adding a comment, add it in italics and include your GitHub username, e.g. "I believe this API can be improved by doing X -- Sean-Der"

API Requirements

API Users

If you can think of more use cases please provide them, this list is not exhaustive!

Sending pre-recorded content to viewer(s)

A user has an audio/video file on disk and wants to send the content to many viewers. There will be no congestion control, but there will be some loss handling (NACK). If the remote viewer doesn't support the codec we offer, handshaking will fail.

Relaying RTP Traffic (with no feedback)

A user has an existing RTP feed (such as an RTSP camera) and wants to send the content to many viewers. There will be no congestion control, but there will be some loss handling (NACK). If the remote viewer doesn't support the codec we offer, handshaking will fail.

Sending live generated content

A user will be encoding content and sending it to many viewers; this could be an MCU, or capturing a webcam or desktop (like github.com/nerdism/neko). There will be congestion control and packet loss handling (NACK/PLI). The user should be informed of the codecs the remote supports, and then be able to generate what is requested on the fly.

Ingesting WebRTC for Later Playback

A user wants to save media from a remote peer to disk. This could be for playback later, or some other async task. We need to ensure the best experience possible by providing loss handling and congestion control. Latency doesn't matter as much.

Ingesting WebRTC for Live Playback

A user wants to consume media from a remote peer live. This could be used for processing (like GoCV) or live playback. We need to ensure the best experience possible by providing loss handling and congestion control. We will also need to be careful not to add much latency, as this could hurt the entire experience.

Relaying WebRTC Traffic

Users should be able to build the classical SFU use cases. For each peer you will have one PeerConnection and transfer all tracks across it. If possible we should support Simulcast and SVC. However, if neither is supported we should just request the lowest bitrate that works for all peers. Beyond that we should pass everything through and let de-jittering happen on each receiver's side. This needs more research.

Code that works in native and web

Users should be able to write idiomatic WebRTC code that works in both their native and Web applications. They should be able to call getUserMedia and have it work across both platforms. This portability is also very important for our ability to test.

API Features

An exact API is defined below; this is a high-level view of what the user interaction will look like.

Sending Media

Set supported codecs at PeerConnection Level

A user on startup will declare what codecs they will support.

The user can add/remove from a list of RTCRtpCodecCapability

This allows us to express

  • All codecs (H264, Opus, VPx)
  • Attributes of that codec (packetization, profile)
  • RTCPFeedback (NACK, REMB)
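
As a rough sketch, wiring this up could look like the following. The Codecs field and the RTCRtpCodecCapabilityDefaultOpus value are assumptions modeled on the webcam example later in this document, not a finalized API.

// Illustrative only: declare the codecs (and their RTCP feedback) we are
// willing to negotiate, before creating any PeerConnection.
s := webrtc.SettingEngine{
    Codecs: []RTCRtpCodecCapability{
        webrtc.RTCRtpCodecCapabilityDefaultVP8,  // video; assumed to carry NACK/REMB feedback
        webrtc.RTCRtpCodecCapabilityDefaultOpus, // audio (name is an assumption)
    },
}
api := webrtc.NewAPI(webrtc.WithSettingEngine(s))
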
Create a MediaStreamTrack

A user creates a MediaStreamTrack by either calling mediadevices.GetUserMedia() OR creating a Track via webrtc.NewTrack(kind RTCCodecType, id, label string, selectCodec func(RtpSender, supportedCodecs []RTCRtpCodecCapability) (RTCRtpCodecCapability, error))

Tracks must match MediaStreamTrack, so codec/ssrc will no longer be defined at the Track level.
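
A minimal sketch of the proposed constructor in use; the RTCCodecTypeVideo constant and the behavior of the callback are assumptions, and AddTrack is covered in the next step.

// Hypothetical usage of the proposed webrtc.NewTrack constructor. The callback
// runs once signaling completes, with the negotiated codec intersection.
track, err := webrtc.NewTrack(webrtc.RTCCodecTypeVideo, "video", "pion",
    func(sender RtpSender, supportedCodecs []RTCRtpCodecCapability) (RTCRtpCodecCapability, error) {
        if len(supportedCodecs) == 0 {
            return RTCRtpCodecCapability{}, fmt.Errorf("no supported codecs")
        }
        return supportedCodecs[0], nil // take the first codec both sides support
    })
if err != nil {
    panic(err)
}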

Add a MediaStreamTrack to the PeerConnection

No change from the current Pion API: peerConnection.AddTrack(track)

On SetRemoteDescription a callback is fired on MediaStreamTrack with an RtpSender and the supported codecs

Every time a PeerConnection that has added that track finishes signaling, a callback is fired. Only then do we know the intersection of codecs. We can't pick H264 (or VPx) until we know the other side supports it.

func(sender RtpSender, supportedCodecs []RTCRtpCodecCapability) (RTCRtpCodecCapability, error) {
    if len(supportedCodecs) == 0 {
        return RTCRtpCodecCapability{}, fmt.Errorf("no supported codecs")
    }

    // Track the new sender so the fan-out goroutine starts writing to it
    fanOutSlice = append(fanOutSlice, sender)

    // Send using the first codec both sides support
    return supportedCodecs[0], nil
}

The example above shows the typical fan-out case. We get a new RtpSender and add it to a list that another goroutine loops over and writes to. When one of the RtpSenders returns io.EOF it is removed from the list. This pattern is possible with the Pion API today, but the new API solves the following problems.
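
The writing side of that fan-out might look roughly like the sketch below; sampleChan, WriteSample, and the locking around fanOutSlice are assumptions for illustration, not part of the proposed API.

// Illustrative fan-out loop (assumes "io" and "sync" are imported).
var fanOutLock sync.Mutex

go func() {
    for sample := range sampleChan { // sampleChan supplies encoded media
        fanOutLock.Lock()
        active := fanOutSlice[:0]
        for _, sender := range fanOutSlice {
            if err := sender.WriteSample(sample); err == io.EOF {
                continue // the peer is gone, drop this sender
            }
            active = append(active, sender)
        }
        fanOutSlice = active
        fanOutLock.Unlock()
    }
}()

The codec-selection callback above would take the same lock before appending to fanOutSlice.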

SSRC/PayloadType will be internally managed

Juggling these values makes the API hard to use. Browsers use different PayloadTypes, so this creates a lot of pain for users. It is also hard to debug when an SSRC is wrong.

Codec can be chosen on the fly

You don't know if the remote supports H264/VP9/AV1. You can now pick whichever codec you prefer from the intersection.

RTP and RTCP must be tightly coupled

The current API doesn't allow us to implement congestion control or error correction easily. By instead giving the user direct access to the RTPSender they have the hooks they need.

WriteSample should take time.Duration instead of (samples uint32)

The user shouldn't need to do the math. Internally we should convert the duration to a sample count using the codec's sample rate and pass that to pion/rtp.
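
A minimal sketch of the conversion that would happen internally, assuming the track knows the negotiated codec's clock rate:

// Convert a sample's duration into the RTP sample count pion/rtp expects.
// clockRate comes from the negotiated codec (e.g. 90000 for video, 48000 for Opus).
func durationToSamples(d time.Duration, clockRate uint32) uint32 {
    return uint32(d.Seconds() * float64(clockRate))
}

// e.g. a 20ms Opus frame at 48000 Hz -> 960 samples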

Handling Jitter, Loss and Congestion

SettingEngine allows a user to pass their own JitterBuffer and CongestionController

We will provide a sensible default, but these will both be interfaces that a user just has to satisfy. The details are out of the scope of this document; the only thing we need to ensure is that this is possible without an API break.

A user can then interact with the JitterBuffer/CongestionController as they wish, mutating it at runtime or modifying values. This will allow them to choose how much loss they are willing to tolerate, etc. This will also be helpful for building an SFU: you can have a CongestionController whose upper bound is set to the lowest bitrate of all receivers. The REMB is then constructed and sent back to the sender.
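
What those interfaces could look like is sketched below; the method sets are assumptions for illustration, not a finalized API.

// Hypothetical interfaces a user could satisfy and hand to the SettingEngine
// (assumes the pion/rtp and pion/rtcp packages are imported).
type JitterBuffer interface {
    Push(p *rtp.Packet)       // packets arriving from the network
    Pop() (*rtp.Packet, bool) // the next packet in order, or false if none is ready
}

type CongestionController interface {
    // SetMaxBitrate caps the estimate, e.g. to the lowest bitrate of all SFU receivers.
    SetMaxBitrate(bitsPerSecond uint64)
    // OnFeedback feeds incoming RTCP (REMB, receiver reports, etc.) into the estimator.
    OnFeedback(pkts []rtcp.Packet)
}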

RTPSender will have callbacks for RTCP Feedback results

We will put two callbacks on the RTPSender, and the user can ignore them if they wish. These aren't portable, but I think putting them in the SettingEngine is the wrong thing to do.

rtpSender.OnBitrateSuggestion(func(bitrate float64) {
    // Congestion control suggests a new target bitrate; reconfigure the encoder here
})

rtpSender.OnKeyframeRequest(func() {
    // The remote requested a keyframe (PLI/FIR); generate one here
})

API In Action

Webcam capture that works in WASM and Go mode

This will capture a video device and will work in WASM or Go mode. When running in WASM mode the VP8 selection has no effect, though. In the future, if the WebRTC API allows it, we will support it there as well.

func main() {
    // We only want to send VP8
    s := webrtc.SettingEngine{
        Codecs: []RTCRtpCodecCapability{
            webrtc.RTCRtpCodecCapabilityDefaultVP8,
        },
    }
    api := webrtc.NewAPI(webrtc.WithSettingEngine(s))

    peerConnection, err := api.NewPeerConnection(webrtc.Configuration{})
    if err != nil {
        panic(err)
    }

    // Capture a video device (the constraint syntax here is illustrative)
    track, err := mediadevices.GetUserMedia(mediadevices.MediaStreamConstraints{Video: true})
    if err != nil {
        panic(err)
    }

    if _, err := peerConnection.AddTrack(track); err != nil {
        panic(err)
    }
}

Fan-out video from one PeerConnection to many

Distributing pre-recorded content

TODO/Questions

  • How do we accomplish SVC?
  • How do we accomplish Simulcast?