This repository has been archived by the owner on Jan 19, 2022. It is now read-only.


fluffy edited this page Jul 9, 2011 · 41 revisions

RTC API Proposal

CJ - Cullen's comments are in emphasis and start with CJ. Feel free to remove them, but it seemed easier to put them inline here.

The WhatWG proposal for real time media streams presents many fine ideas, as does an extension to the Streams API presented in that document (as proposed to the W3C audio working group). This proposal builds on those two documents to present an API for media capture and transmission in web browsers.

The primary motivations for this document are:

  1. Some use-cases are not satisfied with either of the earlier proposals.
  2. Some aspects of earlier proposals are amenable to simplification, and others may present unique implementation challenges, which this proposal takes into account.
  3. Firefox already supports a rich Audio API for manipulating streams and we would like to ensure that subsequent work on video and real time communication plays well with other media APIs.

Use cases

For purposes of designing this API, we present the following use-cases. We omit the use-cases that do not pertain to the RTC working group (such as local-only media capture or audio processing), but it suffices to state that from an implementation perspective, it is important to consider all media-related APIs for coherence, and that the API proposed in this document does take those use-cases into account, even though they are not presented here.

  • Simple video and voice calling site
  • Broadcasting real time video & audio streams
  • Browser based MMORPG that enables player voice communication (push to talk)
  • Video conferencing between 3 or more users (potentially on different sites)
  • [Fill in more use cases from IETF document]

API Specification

The API proposed in this section is intended to be the baseline that should be provided by the browser and to give web applications the maximum amount of flexibility. Some use-cases (such as a simple video chat application) may be fulfilled by a simpler API more intuitive to web developers; however, it is hoped that such an API may be built on top of the proposed baseline. We do not preclude that a simpler API be specified by the working group, but suggest that it be mandatory for browsers to implement the following specification to ensure that all targeted use-cases are satisfied.

We split the specification into three distinct portions for clarity: definition of media streams, obtaining device access, and establishing peer connections. Implementation of all three sections is required for an end-to-end solution satisfying all the targeted use-cases.

Media streams

A media stream is an abstraction over a particular window of time-coded audio or video data (or both). The time-codes are in the stream's own internal timeline. The internal timeline can have any base offset but always advances at the same rate as real time. Media streams are not seekable in any direction.

CJ This is not a huge deal for me, but I find it weird that a single track could have both audio and video. Take video with stereo audio: I think of this as three tracks, a video track, a left audio track, and a right audio track. Also, if we had a high-res and a low-res version of the same video, we could model this as two tracks. If we had right and left images for 3D video, two tracks. The tracks may all come from the same file or container, as there are obviously coding advantages to joint coding of highly correlated information. But from the API point of view, and the way users see them, separate tracks make this clear. I'm not really worked up about how we do this, but I think we need a clear strategy so that when something a bit weird, like DTMF (my canonical example of weirdness), comes along we will know whether it goes in an existing track or a new track.

interface MediaStream {
    readonly attribute DOMString label;
    readonly attribute double currentTime;

    MediaStreamTrack[] tracks;
    MediaStreamRecorder record(in DOMString type);
    void stop();

    const unsigned short LIVE = 1;
    const unsigned short BLOCKED = 2;
    const unsigned short ENDED = 3;
    readonly attribute unsigned short readyState;

    attribute Function onReadyStateChange;   
    ProcessedMediaStream createProcessor(in optional Worker worker);
}

When the readyState of a media stream is LIVE, the window is advancing in real-time. When the state is BLOCKED, the stream does not advance (the user-agent may replace this with silence until the stream is LIVE again); and ENDED implies no further data will be received on the stream. Every stream has an associated set of tracks:
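As a sketch of how an application might react to these transitions, the helper below dispatches on the numeric readyState constants declared above. The handler shape, callback names, and the assumption that the stream is passed to the event handler are all hypothetical; the proposal does not specify the onReadyStateChange signature.

```javascript
// Numeric readyState constants from the MediaStream interface above.
var LIVE = 1, BLOCKED = 2, ENDED = 3;

// Returns a function suitable for assignment to onReadyStateChange.
// The callbacks object and its field names are assumptions for illustration.
function makeReadyStateHandler(callbacks) {
    return function (stream) {
        switch (stream.readyState) {
            case LIVE:    callbacks.onLive(stream); break;    // window advancing in real time
            case BLOCKED: callbacks.onBlocked(stream); break; // UA may substitute silence
            case ENDED:   callbacks.onEnded(stream); break;   // no further data will arrive
        }
    };
}
```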

CJ - I'm pretty sure there is a typo in the RFC 8421 number

 interface MediaStreamTrack {
     readonly attribute MediaStream stream;
     readonly attribute boolean audio;
     readonly attribute boolean video;
     
     readonly attribute DOMString type; // RFC 8421 
     readonly attribute DOMString label;

     attribute Function onTypeChanged;
     attribute DOMString[] supportedTypes;
     attribute MediaStreamTrackHint hint; **CJ - perhaps hints instead of hint**

     readonly attribute double volume; **CJ - we need to be clear on volume means and units here. The voice people might think that volume refers to the average level of sound in this stream, not the gain we want applied**
     void setVolume(in double volume, in double optional startTime, in double optional duration);

     const unsigned short ENABLED = 1;
     const unsigned short DISABLED = 2;
     attribute unsigned short state;

     readonly attribute MediaBuffer buffer;
 };
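The setVolume() overload above, with optional startTime and duration, suggests ramped rather than instantaneous gain changes. A minimal sketch of the interpolation a user-agent might perform, assuming a linear ramp (the proposal does not specify the ramp curve, so the shape here is an assumption):

```javascript
// Gain at time `now` for a setVolume(newVol, startTime, duration) call that
// ramps away from `oldVol`. Before startTime the old volume holds; after
// startTime + duration the new volume holds; in between we interpolate
// linearly (an assumed curve, not specified by the proposal).
function rampedVolume(oldVol, newVol, startTime, duration, now) {
    if (now <= startTime) return oldVol;
    if (now >= startTime + duration) return newVol;
    var t = (now - startTime) / duration; // fraction of the ramp completed
    return oldVol + (newVol - oldVol) * t;
}
```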

The audio attribute is set if the stream carries audio data, and the video attribute is set if it carries video data (if both are set, audio and video are both included in the same buffer, depending on the encapsulation format). MediaBuffer allows web applications to access the underlying media data:

interface MediaBuffer {
    readonly attribute DOMString type; // RFC 8421 **CJ typo**

    Object getBufferData(args); // codec specific (may return the next Ogg packet in the stream, for example)
};

CJ Do we want to add a way to get a sequence number here? With some codecs you cannot reconstruct the media without knowing something about the ordering of the packets, and packets will sometimes arrive out of order.

The programmer can also provide "hints" to the MediaStreamTrack as to the kind of data it is carrying. The MediaStreamTrack's type may then change to accommodate the provided hints, and if this is done, the onTypeChanged event handler will be called. CJ I think this hints idea is going in the right direction of something simple enough that people could use. I'm sure lots of work will be needed, but let's remind everyone "less is more".

interface MediaStreamTrackHint {
    attribute boolean isMusic;
    attribute boolean isSpokenVoice;

    unsigned short AUDIO_BROADBAND = 1; **CJ - upon reflection, I don't like these; they produce a sharp cutoff and it's hard to know which is right. Lately I have been more of a fan of an algorithm that can be parameterized into a gradual change than a few modes. Let's think more about what the user of the API knows, and what they want to accomplish.**
    unsigned short AUDIO_NARROWBAND = 2;
    attribute unsigned short audioBand;

    attribute unsigned long videoWidth;
    attribute unsigned long videoHeight;
    attribute unsigned long videoFrameRate; **CJ - worth noting we are seeing more cameras support 72 fps or more**

    unsigned short VIDEO_SLOW_MOVING = 1;
    unsigned short VIDEO_FAST_MOVING = 2;
    attribute unsigned short videoType;

    attribute double percentageCPU;
    attribute double percentageGPU;
};
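For illustration, here is a plain object populated with the hint fields above, as an application capturing a slide presentation with spoken narration might fill them in. The concrete values are assumptions chosen for the example, not recommendations from the proposal:

```javascript
// Hypothetical hint set for a mostly-static presentation capture with
// narration. Field names mirror MediaStreamTrackHint; the numeric values
// for audioBand and videoType are the constants declared in the interface.
var presentationHint = {
    isMusic: false,
    isSpokenVoice: true,
    audioBand: 1,          // AUDIO_BROADBAND
    videoWidth: 1280,
    videoHeight: 720,
    videoFrameRate: 5,     // slide content changes rarely (assumed value)
    videoType: 1,          // VIDEO_SLOW_MOVING
    percentageCPU: 25.0,   // rough budget the app is willing to spend (assumed)
    percentageGPU: 10.0
};
// track.hint = presentationHint; // would fire onTypeChanged if the type adapts
```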

Streams can be associated with existing HTML media elements such as <video> and <audio>, and video streams with <canvas>. Each of these tags may serve as either the input or output for a media stream, by setting or getting the stream attribute as appropriate.

partial interface HTMLMediaElement {
    attribute MediaStream stream;
};
partial interface HTMLCanvasElement {
    attribute MediaStream stream;
};

Streams can be recorded to files, which can then be accessed via the DOM File APIs:

interface MediaStreamRecorder {
    readonly attribute MediaStream stream;
    void getRecordedData(in Function onsuccess, in Function onerror);
    void stop();
};
function onsuccess(DOMString type, DOMFile file);
function onerror(DOMString error);

The type argument passed to the onsuccess callback is a string as defined in RFC 8421. (This is the same format as the type attribute in MediaBuffer.)
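To make the callback contract concrete, the stub below mimics the getRecordedData() success and error paths. It is not a real recorder (a real one comes from MediaStream.record(type)); it only models the call shape described above, and the error string is invented for the example.

```javascript
// Stub modelling the MediaStreamRecorder.getRecordedData() callback contract.
// `type` plays the role of the RFC-style type string; `data` stands in for
// the DOMFile a real implementation would hand back.
function StubRecorder(type, data) {
    this.type = type;
    this._data = data;
}
StubRecorder.prototype.getRecordedData = function (onsuccess, onerror) {
    if (this._data !== undefined) {
        onsuccess(this.type, this._data); // (DOMString type, DOMFile file)
    } else {
        onerror("no recorded data");      // hypothetical error string
    }
};
```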

Device access

MediaStreams can be obtained from <video>, <audio> and <canvas> elements; but they can also be obtained from a user's local media devices such as a webcam or microphone:

interface NavigatorMedia {
    void getMediaStream(in boolean video, in boolean audio, in Function onsuccess, in optional Function onerror);
};
Navigator implements NavigatorMedia;

function onsuccess(MediaStream stream);

const unsigned short PERMISSION_DENIED = 1;
const unsigned short RESOURCE_BUSY = 2;
const unsigned short RESOURCE_UNAVAILABLE = 3;
function onerror(unsigned short errorCode);

The caller may set the values of 'audio' and 'video' to true if they require those inputs. If either of the requested inputs is unavailable, the success callback is still called; thus the application must check the type attribute of the resulting tracks in the stream handed to it to determine whether the stream contains only audio, only video, or both. If no hardware exists to fulfil the request, the error callback is invoked with RESOURCE_UNAVAILABLE; if hardware exists but is currently being used by another application, RESOURCE_BUSY is returned. Additionally, the user-agent may offer the user the option of selecting a local file to act as the source of the media stream in place of real hardware.
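A sketch of the track inspection the paragraph above requires. The function name is hypothetical, and the fake stream objects in the test stand in for real MediaStream instances; only the tracks/audio/video attributes from the interfaces above are assumed.

```javascript
// Classify what getMediaStream() actually granted by walking the tracks,
// since the success callback fires even when one requested input is missing.
function describeStream(stream) {
    var hasAudio = false, hasVideo = false;
    for (var i = 0; i < stream.tracks.length; i++) {
        if (stream.tracks[i].audio) hasAudio = true;
        if (stream.tracks[i].video) hasVideo = true;
    }
    if (hasAudio && hasVideo) return "audio+video";
    if (hasAudio) return "audio-only";
    if (hasVideo) return "video-only";
    return "empty";
}
```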

Peer connections

A peer connection provides a UDP channel of communication between two user-agents.

constructor PeerConnection(DOMString config, Function sendSignal, optional DOMString negotiationServerURN)
interface PeerConnection {
    void receivedSignal(DOMString msg);

    const unsigned short LISTENING = 1;
    const unsigned short OPENING = 2;
    const unsigned short INCOMING = 3;
    const unsigned short ACTIVE = 4;
    const unsigned short CLOSED = 5;
    readonly attribute unsigned short readyState;

    void addLocalStream(in MediaStream stream);
    void removeLocalStream(in MediaStream stream);
    readonly attribute MediaStream[] localStreams;
    readonly attribute MediaStream[] remoteStreams;

    void open();
    void accept();
    void close();
    void send(in DOMString text);

    attribute Function onMessage;
    attribute Function onRemoteStreamAdded;
    attribute Function onRemoteStreamRemoved;
    attribute Function onReadyStateChange;
};
Window implements PeerConnection;

The configuration string gives the address of a STUN or TURN server used to establish the connection. sendSignal is a function provided by the caller which will be called when the user-agent needs to transport an out-of-band signalling message to the remote peer. When a message is received from the remote peer via this channel, it must be passed to the user-agent by calling receivedSignal(). The ordering of messages is important and must be preserved.
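Since ordering matters and an out-of-band transport such as parallel XHRs can deliver messages out of order, an application might resequence them before calling receivedSignal(). The wrapper below tags outgoing messages with a sequence number and buffers out-of-order arrivals; the "seq:payload" framing and the class itself are assumptions for illustration, not part of the proposal.

```javascript
// Resequencing wrapper for the out-of-band signalling channel.
// `deliver` is the in-order sink, e.g. function (m) { conn.receivedSignal(m); }
function OrderedSignalChannel(deliver) {
    this._nextOut = 0;       // sequence number for the next outgoing message
    this._nextIn = 0;        // sequence number expected next on delivery
    this._pending = {};      // out-of-order arrivals, keyed by sequence number
    this._deliver = deliver;
}
// Frame an outgoing message; send the result via sendSignal's transport.
OrderedSignalChannel.prototype.wrap = function (msg) {
    return (this._nextOut++) + ":" + msg;
};
// Handle an arrival from the transport, releasing messages in order.
OrderedSignalChannel.prototype.onArrival = function (framed) {
    var sep = framed.indexOf(":");
    var seq = parseInt(framed.slice(0, sep), 10);
    this._pending[seq] = framed.slice(sep + 1);
    while (this._pending.hasOwnProperty(this._nextIn)) {
        this._deliver(this._pending[this._nextIn]);
        delete this._pending[this._nextIn++];
    }
};
```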

Examples

Simple video calling between two users A and B. A is making a call to B:

// User-agent A executes
<video id="localPreview"/><video id="remoteView"/>
<script>
navigator.getMediaStream(true, true, function(stream) {
    document.getElementById("localPreview").stream = stream;
    var conn = new PeerConnection("STUN foobar.net:3476", sendToB);

    function sendToB(msg) { /* send via XHR to B */ }
    function gotFromB(msg) { conn.receivedSignal(msg); }

    conn.addLocalStream(stream);
    conn.onRemoteStreamAdded = function(remoteStream) {
        document.getElementById("remoteView").stream = remoteStream;
    };

    conn.open();
});
</script>

// User-agent B executes
<video id="localPreview"/><video id="remoteView"/>
<script>
navigator.getMediaStream(true, true, function(stream) {
    document.getElementById("localPreview").stream = stream;
    var conn = new PeerConnection("STUN foobar.net:3476", sendToA);

    function sendToA(msg) { /* send via XHR to A */ }
    function gotFromA(msg) { conn.receivedSignal(msg); }

    conn.addLocalStream(stream);
    conn.onRemoteStreamAdded = function(remoteStream) {
        document.getElementById("remoteView").stream = remoteStream;
    };
    conn.onReadyStateChange = function() {
        if (conn.readyState == conn.INCOMING) conn.accept();
    };
    };
});
</script>

Broadcasting real-time video & audio streams:

// This code runs on the "server". Some other part of the web page magically paints the game to the canvas
<canvas id="hockeyGame"/>
<script>
function sendToPeer(msg) { /* out of band send */ }
function gotFromPeer(msg) { conn.receivedSignal(msg); } // out of band receive

var conn = new PeerConnection("TURNS example.org", sendToPeer);
conn.addLocalStream(document.getElementById("hockeyGame").stream);
conn.open(); 
</script>

// All clients subscribing to the broadcast run this code.
// TURN server does the job of initiating onRemoteStreamAdded for every client?
<video id="gameStream"/>
<script>
    function sendToPeer(msg) { /* out of band send */ }
    function gotFromPeer(msg) { conn.receivedSignal(msg); } // out of band receive

    var conn = new PeerConnection("TURNS example.org", sendToPeer);
    conn.onRemoteStreamAdded = function(stream) {
        document.getElementById("gameStream").stream = stream;
    };

    conn.accept(); // accept() may also be called before readyState reaches INCOMING;
    // this implies that when the transition from LISTENING -> INCOMING is made, the connection is simply accepted
</script>

Browser based MMORPG that enables player voice communication (push to talk):

// All players
<button id="ptt"/>
<audio id="otherPlayers"/>
<script>
var mixer;
var worker = new Worker("muxer.js");
navigator.getMediaStream(true, false, function(stream) {
    function sendToPeer(msg) { /* out of band send */ }
    function gotFromPeer(msg) { conn.receivedSignal(msg); } // out of band receive

    var conn = new PeerConnection("STUNS game-server.net:3345", sendToPeer);
    conn.addLocalStream(stream);

    conn.onRemoteStreamAdded = function(remoteStream) {
        if (!mixer) {
            mixer = remoteStream.createProcessor(worker); // StreamProcessor API TBD
            document.getElementById("otherPlayers").stream = mixer.outputStream;
        } else {
            mixer.addInput(remoteStream);
        }
    };
    conn.accept();

    var streaming = false;
    document.getElementById("ptt").onclick = function() {
        streaming = !streaming;
        // Toggle the local audio track rather than the (read-only) stream readyState
        stream.tracks[0].state = streaming ? stream.tracks[0].ENABLED
                                           : stream.tracks[0].DISABLED;
    };
});