Video SEI event Explainer

What is SEI?

SEI (Supplemental Enhancement Information) is a user-data message defined in the video bitstream (for example, in H.264/AVC). We can use it to transmit custom data along with the video content.

Problem and Motivation

In recent years, interactive features have been widely used in live streaming, such as shopping coupons and livestream quizzes. The most convenient way to keep these activities in sync with the livestream is to use the SEI (Supplemental Enhancement Information) messages of H.264.

When we use MSE to play a livestream, it is easy to access the H.264 NAL units by demuxing the live video format. But on iOS Safari or an iOS WebView, we can only use <video src="{HLS_URL}"> to play the live stream, so there is no way to access the raw content of the stream (for example, the SEI).

Therefore, we propose a new video sei event to solve this problem.

Key use cases

Render elements synchronously with video frames in <video> elements

Sometimes during a live activity we interact with the audience, for example with subtitles, face-recognition-based stickers, question forms, or goods for sale. This information and the SEI can both be produced with an NTP timestamp, so the web app knows when to render it.

Calculate end to end delay from broadcaster to client

End-to-end delay is an important measure of the live experience, especially for outdoor live and e-commerce livestreaming, where broadcasters care about feedback from the audience. End-to-end delay is therefore a key indicator of CDN quality.
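As a minimal sketch of this measurement, assume the broadcaster writes its NTP wall-clock time (in milliseconds) into each SEI message; the `ntpTimestampMs` field name is hypothetical, not part of any spec. The client, with an NTP-synced clock of its own, can then subtract:

```javascript
// Sketch: estimating end-to-end delay, assuming the broadcaster embeds its
// NTP wall-clock time (ms) into each SEI payload. Both clocks are assumed
// to be NTP-synchronized; `ntpTimestampMs` is a hypothetical field name.
function estimateEndToEndDelay(seiPayload, clientNowMs) {
  // Delay = time the frame is shown on the client minus the time it was
  // captured on the broadcaster side.
  return clientNowMs - seiPayload.ntpTimestampMs;
}

// Example: frame captured at t=1000 ms, rendered on the client at t=3500 ms.
const delayMs = estimateEndToEndDelay({ ntpTimestampMs: 1000 }, 3500);
console.log(delayMs); // 2500
```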

Get speaker information when watching live stream from video conference

In a video conference scenario there is often more than one speaker. When the conference is recorded as a video file and replayed, we want to know who is talking, along with other user information about the speaker. We can use the SEI to carry this information and render it synchronously with the video track.

Body or face information generated by media server

Sometimes real-time body or face recognition is not efficient in web browsers, in particular on older devices or mobile phones. So we need the server side to run the algorithm and put the resulting information into the SEI.

Proposed Solution

Get SEI information from a web video, with a loosely accurate timestamp so we can use it to sync with video.currentTime.

It can be used with Media Source Extensions™ (w3.org): parse the livestream with a JavaScript demuxer and remuxer to get an fMP4 stream with SEI data. The SEI event is then triggered when the video element parses the H.264 NALs.

It can be used with both the WebCodecs API and Media Source Extensions™ (w3.org): demux the live stream, generate EncodedVideoChunks, and pass them to a SourceBuffer directly.

We can also use WebCodecs to process coded video frames to get the SEI information; a relevant issue is: here
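Whichever of these paths is taken, the application ends up scanning coded bytes for SEI NAL units. The sketch below shows one way to do that, under assumptions of ours: an H.264 Annex B stream with 3- or 4-byte start codes, and emulation-prevention bytes ignored (an AVCC length-prefixed stream, as typically produced for fMP4, would need a different scan):

```javascript
// Sketch: locating SEI NAL units (nal_unit_type === 6 in H.264) in an
// Annex B byte stream, e.g. bytes copied out of an EncodedVideoChunk.
// Simplified: assumes 00 00 01 / 00 00 00 01 start codes and does not
// strip emulation-prevention bytes from the returned payloads.
function findSeiNalUnits(bytes) {
  // Record the offset right after each 00 00 01 start code.
  const starts = [];
  for (let i = 0; i + 2 < bytes.length; i++) {
    if (bytes[i] === 0 && bytes[i + 1] === 0 && bytes[i + 2] === 1) {
      starts.push(i + 3);
      i += 2;
    }
  }
  const units = [];
  for (let s = 0; s < starts.length; s++) {
    let end = s + 1 < starts.length ? starts[s + 1] - 3 : bytes.length;
    // A 4-byte start code (00 00 00 01) leaves a stray zero before the
    // next match; trim it off the previous NAL unit.
    if (end > starts[s] && bytes[end - 1] === 0) end--;
    const nal = bytes.subarray(starts[s], end);
    if ((nal[0] & 0x1f) === 6) units.push(nal); // nal_unit_type 6 = SEI
  }
  return units;
}
```

A demuxer would run this over each coded buffer and hand the returned SEI payloads (NAL header included) to the application's SEI parser.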

Proposal #1: New video event

A new SEI event structure is defined as follows:

interface SEIEvent extends Event {
  type: 'sei';
  mediaTime: number;
  byteLength: number;
  copyTo: (dest: Uint8Array) => void;
}

We receive this event when the video element has parsed SEI information from the video bitstream, and we can handle it through its attributes and methods:

  • mediaTime: The media presentation timestamp (PTS), in seconds, of the presented frame (i.e. its timestamp on the video.currentTime timeline)
  • byteLength: Length of the SEI payload data in bytes
  • copyTo: Copies the SEI data into a typed array for processing

Example

  let seiList = [];  

  function parseSEI (data) {
    // parse the predefined SEI structure from the copied bytes
  }

  function renderSEI (currentTime, seiList) {
    // render the SEI content whose timestamp matches currentTime
  }

  video.addEventListener('sei', (e) => {
    const seiData = new Uint8Array(e.byteLength);
    e.copyTo(seiData);
    seiList.push({
      data: parseSEI(seiData),
      timestamp: e.mediaTime
    })
  })

  video.addEventListener('timeupdate', e => {
    const curTime = e.target.currentTime;

    renderSEI(curTime, seiList);
  })
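What the parseSEI stub above does depends on the SEI layout the broadcaster chose. One possible sketch, under assumptions of ours: a single user_data_unregistered message (payloadType 5) whose payload is a 16-byte UUID followed by UTF-8 JSON, with the input already being the SEI RBSP (emulation-prevention bytes removed). The JSON convention is an application-level choice, not part of H.264:

```javascript
// Sketch of one way parseSEI() could be implemented, assuming a single
// user_data_unregistered SEI message (payloadType 5) carrying a 16-byte
// UUID plus UTF-8 JSON. `data` is assumed to be the SEI RBSP with
// emulation-prevention bytes already removed.
function parseSEI(data) {
  let i = 0;
  // payloadType and payloadSize use ff-byte extension coding:
  // each 0xFF adds 255, the first non-0xFF byte terminates the value.
  let payloadType = 0;
  while (data[i] === 0xff) { payloadType += 255; i++; }
  payloadType += data[i++];
  let payloadSize = 0;
  while (data[i] === 0xff) { payloadSize += 255; i++; }
  payloadSize += data[i++];
  if (payloadType !== 5) return null; // not user_data_unregistered
  const uuid = data.subarray(i, i + 16);
  const body = data.subarray(i + 16, i + payloadSize);
  return { uuid, value: JSON.parse(new TextDecoder().decode(body)) };
}
```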

Work with video rvfc

The requestVideoFrameCallback() (rvfc) API fires a callback when the video element has rendered a frame. From its metadata we can get the mediaTime of that frame and render the matching SEI data accurately.

function draw(now, metadata) {
  const mediaTime = metadata.mediaTime;
  renderSEI(mediaTime, seiList);
  video.requestVideoFrameCallback(draw);  
}
video.requestVideoFrameCallback(draw);

Limitation

Not available when used with EME

If you are using EME for encrypted media playback, the SEI data may not be accessible, because the EME module only emits decoded video frames, not AVC samples.

Not suitable for high-accuracy scenarios

We obtain the SEI timestamp when parsing the AVC bitstream, but at render time it is difficult to sync the SEI with the exact frame. So if you want to render SEI information and care about synchronization between the video frame and the SEI, we suggest carrying the SEI data only in keyframes.

Performance issues

As some applications inject SEI into every frame, an event or callback is not a good way to access the SEI, because it could block the JavaScript main thread.

Proposal #2: Use DataCue

DataCue is a proposed web API to allow support for timed metadata, i.e., metadata information that is synchronized to audio or video media.

We can use DataCue to handle the SEI information: when the video element parses a SEI NAL unit, it generates a DataCue and adds it to a metadata textTrack.

Example

Here is an example of how an application could handle the SEI cues from a video element.

const cueEnterHandler = (event) => {
  const cue = event.target;
  console.log('cueEnter', cue.startTime, cue.endTime);
};

const cueExitHandler = (event) => {
  const cue = event.target;
  console.log('cueExit', cue.startTime, cue.endTime);
};

const addCueHandler = (event) => {
  const cue = event.cue;

  cue.onenter = cueEnterHandler;
  cue.onexit = cueExitHandler;
};

const video = document.getElementById('video');

video.textTracks.addEventListener('addtrack', (event) => {
  const textTrack = event.track;

  if (textTrack.kind === 'metadata') {
    textTrack.mode = 'hidden';

    textTrack.addEventListener('addcue', addCueHandler);
  }
});

When you want to enumerate the currently active SEI cues on the video timeline, you can use activeCues to access them:

const metadataTrack = getMetaDataTrack(video); // app-defined helper to find the metadata track
const activeCues = metadataTrack.activeCues;
// TextTrackCueList is not an Array, so convert it before filtering.
const seiCues = [...activeCues].filter(cue => cue.type === 'org.mpeg.sei'); // The type string used here needs to be updated

Limitation

A cue is normally generated with a non-zero duration, but when used for SEI information the startTime and the endTime may be the same. As far as I have tested on Safari, that is not an error, but it also needs to be considered when other platforms implement the DataCue API.

Recommended usage

Which proposal to use depends on the frequency of the SEI messages and the accuracy you need.

If SEI messages arrive at a per-frame frequency, DataCue is a good fit; there is no need to listen for the addcue event, reading textTrack.activeCues is enough.

If you want to render the SEI with the exact frame, you can use WebCodecs so that you control the rendering yourself. Or you can use the sei event together with rvfc to reduce the desynchronization to one or two frames. If you can accept an error of 200 to 300 ms, the plain sei event with timeupdate (as in the example above) is sufficient.