
Video Server Design

by Alexander Eichhorn (,

Stream Concept

Here is an attempt at a concise and practical definition of a stream. It is meant to help us identify how to handle and manage many streams originating from different sources over time.

A stream is

  • a single, unique, sequential, unidirectional, uninterrupted and long-running flow of data items
  • generated by a single entity (the streaming source)
  • optionally forwarded, processed, cached, and archived by multiple entities (streaming servers)
  • displayed live, time-shifted or on-demand by zero, one, or multiple entities (streaming clients)

This definition implies in particular

  • the timeline of a stream is continuous and gap-free
  • an intended or unintended interruption during generation (capture) results in the end of a stream
  • restarting a data-flow after interruption generates a new stream
  • a data item (image, text, sound) can be part of multiple streams
  • timeline manipulations (trim, concat) create new streams
  • mixing of multiple streams creates new streams (such as mixing video with audio)


Required Server Features

  1. receive, forward and store live video from mobile devices and fixed cameras
  2. support parallel ingest sessions from multiple sources (each with individual start times)
  3. support dynamic sessions (initiated on request by a source)
  4. keep an infinite archive of live streams
  5. HTTPS mode
  6. support stream ingest modes
    • live: HTTP POST individual encoded frames
    • upload: HTTP POST chunked video files, continuations, byte-range
  7. support stream consumption modes
    • HLS: index and segment file download
    • Progressive Download: HTTP GET with byte range?
    • Download: HTTP GET?
  8. support seeking and replay of archived video
  9. generate, store and serve thumbnail images of each video stream
  10. generate, store and serve poster image(s) of each video stream
  11. authenticate streaming source and client
  12. connection throttling (connections per IP per time)
  13. bandwidth throttling (bytes per stream per time)
  14. upload continuation (byte-range)
  15. auto-compress text file types (such as m3u8, mpd)
  16. check video format conformance (TS headers, H264 NALU headers, SPS, PPS)
  17. websocket for uploading raw h264 video frames and TS packets (may avoid HTTP POST overhead)

Required Video Format Features

  • keyframes should be aligned with later segmentation boundaries to support pseudo-random access (2 sec?)
  • no audio necessary (for now, although the downstream pipeline would support it)
  • H264/AVC inside MPEG2-TS over HTTP(S) as transport
  • raw H264/AVC-TS (NALU stream) over HTTP(S) as transport
  • avoid re-encoding at server
  • container format may be repacked at the server
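
The keyframe alignment requirement above comes down to simple arithmetic: the GOP length in frames has to equal the frame rate times the segment duration. The following is a minimal sketch, assuming a constant frame rate and the tentative 2 second segment duration from the list; the function name `keyframe_interval` is hypothetical.

```python
from fractions import Fraction


def keyframe_interval(fps, segment_seconds):
    """Return the GOP length in frames so that every segment starts on a keyframe.

    fps may be an int, a Fraction, or a string such as "30000/1001".
    """
    frames = Fraction(fps) * Fraction(segment_seconds)
    if frames.denominator != 1:
        # The segment boundary would fall between two frames.
        raise ValueError("segment duration is not a whole number of frames")
    return int(frames)


print(keyframe_interval(30, 2))  # 60
print(keyframe_interval(25, 2))  # 50
```

For frame rates such as 30000/1001 fps, a 2 second segment is not a whole number of frames, so the sketch raises an error instead of silently rounding.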

Ingest Format and Codec

The ingest format should be an H264 profile/level that is supported by mobile encoders and playable by Web browsers to avoid transcoding.

  • HTML5 spec defines the following H264 video and audio formats
    • avc1.42E01E, mp4a.40.2 (H.264 Constrained baseline profile video (main and extended video compatible) level 3.0 and Low-Complexity AAC audio)
    • avc1.4D401E, mp4a.40.2 (H.264 Main profile video level 3.0 and Low-Complexity AAC audio)
    • avc1.58A01E, mp4a.40.2 (H.264 Extended profile (baseline-compatible) level 3.0 and Low-Complexity AAC audio)
    • avc1.64001E, mp4a.40.2 (H.264 'High' profile video (incompatible with main, baseline, or extended profiles) level 3.0 and Low-Complexity AAC)
  • Android support : avc1.42001E (H.264 Baseline Profile Level 3.0)
  • iOS support: Baseline, Main 3.0/3.1 (>=iOS 4.0), Baseline 4.1, Main 3.2/4.1 (>= iOS 5.0), High 4.0/4.1 (iOS >= 6.0)

Distribution Format

On the browser side, H264 appears to be supported by all major browsers on all major desktop and mobile platforms (as of 2013.03).

  • iOS >= H.264 Baseline 3.0, see Apple documentation
  • Android >= H.264 Baseline 3.0, see supported formats
  • Chrome >= H.264 Baseline 3.0
  • Firefox >= H.264 Baseline 3.0 (>=nightly 20)
  • Safari >= H.264 Baseline 3.0
  • IE >= H.264 Baseline 3.0 (>=9.0)

Video System Architecture

  +---------------+     +--------+      +------------+     +--------------+     +---------+
  |    Source     | --> | Ingest | -->  | Transcoder | --> | Distribution | --> |   Web   |
  | (Android/iOS) |  :  | Server |  :   |  (FFMpeg)  |  :  |    Server    |  :  | Browser |
  +---------------+  :  | (HTTP) |  :   +------------+  :  |    (HTTP)    |  :  +---------+
                     :  +--------+  :                   :  +--------------+  :
                     :              :                   :                    :
              H264 in MPEG2-TS   Loopback       M3U8, TS, MP4, OGV,       HTTP GET
                  over HTTP    Pipe/UDP/RTP       WEBM, JPG Files         Download

Server Endpoints

POST /v1/ingest/:videoid/avc              # live ingest point for raw H264 AVC bitstreams
POST /v1/ingest/:videoid/ts               # live ingest point for MPEG2-TS streams
PUT  /v1/ingest/:videoid/chunk            # chunked file upload ingest point
GET  /v1/videos/:videoid/video.mp4        # mp4 (H264, AAC) file download
GET  /v1/videos/:videoid/video.ogv        # ogg (Theora, Vorbis) file download
GET  /v1/videos/:videoid/video.webm       # Webm (VP8, Vorbis) file download
GET  /v1/videos/:videoid/index.m3u8       # HLS index file download
GET  /v1/videos/:videoid/index.mpd        # DASH index file download
GET  /v1/videos/:videoid/hls/:segment     # HLS named segment download
GET  /v1/videos/:videoid/dash/:segment    # DASH named segment download
GET  /v1/videos/:videoid/thumb/:thumbid   # preview thumbnail image download
GET  /v1/videos/:videoid/poster/:posterid # video poster image download
GET  /v1/videos/:videoid/play             # HTML5 video tag player (test and demo only)

Server Ingest API

The API assumes the client obtained a valid video source token from a backend app, including

  • URI of the video stream to be created
  • timestamp to verify token freshness
  • consumer (IP address + port) to identify valid sources
  • cryptographically signed hash for authenticity and integrity verification
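
As an illustration of how the ingest server could check such a token, here is a minimal sketch based on HMAC-SHA256. It assumes the backend app and the ingest server share a secret key, that the three fields are joined with newlines before signing, and that a token older than 300 seconds counts as stale. None of these choices come from the design; the names `sign_token` and `verify_token` are hypothetical.

```python
import hashlib
import hmac
import time

SECRET = b"shared-secret-between-backend-and-ingest"  # assumption: shared key
MAX_AGE_SECONDS = 300  # assumption: freshness window


def sign_token(uri, timestamp, consumer, secret=SECRET):
    """Return a hex signature over the URI, timestamp and consumer fields."""
    message = f"{uri}\n{timestamp}\n{consumer}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_token(uri, timestamp, consumer, signature, now=None, secret=SECRET):
    """Return "ok", "stale" or "invalid" for a presented token."""
    now = time.time() if now is None else now
    expected = sign_token(uri, timestamp, consumer, secret)
    # compare_digest avoids leaking the position of the first mismatch.
    if not hmac.compare_digest(expected, signature):
        return "invalid"
    if now - timestamp > MAX_AGE_SECONDS:
        return "stale"
    return "ok"
```

Checking the signature before the timestamp means a forged timestamp is always reported as an invalid token rather than a stale one.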

The ingest HTTP server accepts incoming video upload requests, checks source credentials, and starts an individual transcoder per incoming stream. The transcoder segments the stream and stores HLS compatible files plus other video formats at a location accessible by the distribution web server.

Supported formats for incoming live streams

  • transport protocol: HTTP(S)
  • packetisation format (signalled via appropriate MIME-TYPE)
    • raw H264/AVC Annex-B NALUs (using nalu start code)
    • H264/AVC packed in MPEG2-TS
  • encoding format: 640x360 H264/AVC baseline profile 3.0, 600kbit - 1200kbit
  • no audio
  • see Apple HLS Recommendations for details
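
For the raw Annex-B case, a basic conformance check can split the byte stream at NALU start codes and look at the NAL unit type in the low five bits of the first byte (7 = SPS, 8 = PPS, 5 = IDR slice). The sketch below only illustrates that idea; the function names are hypothetical and it does not validate the contents of the parameter sets.

```python
START_CODE = b"\x00\x00\x01"


def split_annexb(data):
    """Yield NAL units from an Annex-B byte stream with start codes removed."""
    i = data.find(START_CODE)
    while i != -1:
        start = i + len(START_CODE)
        nxt = data.find(START_CODE, start)
        end = len(data) if nxt == -1 else nxt
        # A NAL unit never ends in 0x00, so trailing zeros belong to a
        # 4-byte start code (00 00 00 01) or to padding and can be dropped.
        nalu = data[start:end].rstrip(b"\x00")
        if nalu:
            yield nalu
        i = nxt


def nal_types(data):
    """Return the NAL unit type of every unit in the stream, in order."""
    return [nalu[0] & 0x1F for nalu in split_annexb(data)]


def has_parameter_sets(data):
    """True if the stream carries at least one SPS (7) and one PPS (8)."""
    types = nal_types(data)
    return 7 in types and 8 in types
```

A source that never sends SPS and PPS in-band would fail this check, which is one reason to run it on the first bytes of an ingest session.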

Ingest HTTP Server Tasks

  • authenticate source
  • prepare transcoder pipe
  • configure and start ffmpeg transcoder
  • pipe raw data streamed from source into transcoder
  • gracefully handle source disconnection and cleanup (shutdown transcoder, remove pipe)

Transcoder Tasks

  • write HLS/DASH segment index file
  • get raw data from pipe
  • optionally transcode raw data (correct GOP/I-frame distance, H264 Profile/Level)
  • segment stream into files
  • update index file
  • push files to Distribution HTTP Server
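
In the design above ffmpeg writes the index file itself. Purely to illustrate what "update index file" produces for HLS, here is a minimal sketch that renders an m3u8 media playlist from a list of finished segments. The function name and the argument layout are assumptions.

```python
import math


def render_hls_index(segments, media_sequence=0, finished=False):
    """Render an HLS media playlist.

    segments: list of (filename, duration_in_seconds) for fully written files.
    finished: append the end marker once the stream has ended.
    """
    longest = max((duration for _, duration in segments), default=1)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3 permits fractional EXTINF durations
        f"#EXT-X-TARGETDURATION:{max(1, math.ceil(longest))}",
        f"#EXT-X-MEDIA-SEQUENCE:{media_sequence}",
    ]
    for name, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(name)
    if finished:
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


print(render_hls_index([("000000000.ts", 2.0), ("000000001.ts", 2.0)]))
```

While a stream is live the playlist is left without the end marker, so clients keep polling it for new segments.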


  • URI, timestamp, consumer of the ingest resource plus application server signature
  • use HTTP auth headers for token passing
  • report auth error (stale token, invalid token)

Ingest Video Dataflow

  • Initialisation
    • authenticate source
    • create directory for video id
    • send HTTP 200 on success to make source start the stream
    • start transcoder
  • On incoming data
    • receive stream data (frames, etc.) from source on ingest endpoint
    • acknowledge reception HTTP 200
    • push raw data to transcoding pipe
  • End of Stream
    • EOS signalling with empty HTTP POST body from source
    • source can simply close connection (no handshake required)
    • on connection failure wait for source reconnection, and fail after a timeout
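
The dataflow above can be read as a small state machine per ingest session. The sketch below is one possible interpretation, not the server's actual implementation: the class name, the state names, the 30 second reconnection timeout and the 409 status for requests in the wrong state are all assumptions, and a plain list stands in for the transcoding pipe.

```python
class IngestSession:
    """Tracks one ingest session: init -> streaming -> ended or failed."""

    def __init__(self, reconnect_timeout=30.0):
        self.reconnect_timeout = reconnect_timeout  # assumed value
        self.state = "init"
        self.disconnected_at = None
        self.chunks = []  # stand-in for the transcoding pipe

    def start(self):
        """Call after authentication succeeded and the directory exists."""
        self.state = "streaming"
        return 200  # tells the source to start the stream

    def on_post(self, body):
        """Handle one HTTP POST body and return the status to send back."""
        if self.state not in ("streaming", "waiting"):
            return 409  # assumption: reject data outside an active session
        self.state = "streaming"  # a POST after a disconnect is a reconnection
        self.disconnected_at = None
        if not body:
            self.state = "ended"  # an empty body signals end of stream
        else:
            self.chunks.append(body)
        return 200

    def on_disconnect(self, now):
        """Record a dropped connection and start waiting for a reconnect."""
        if self.state == "streaming":
            self.state = "waiting"
            self.disconnected_at = now

    def tick(self, now):
        """Fail the session if the source did not reconnect in time."""
        if self.state == "waiting" and now - self.disconnected_at > self.reconnect_timeout:
            self.state = "failed"
```

Passing the clock in as `now` keeps the timeout logic testable without sleeping.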

Server Distribution API

Segment files must be entirely written before they become accessible to clients. This adds a conceptual latency of one segment duration plus the time needed to distribute segments to downstream web servers.

For on-demand video files:

  • segment based streaming
  • use HTTP server/caching infrastructure

Distribution Server Tasks

  • authenticate client
  • deliver stored index and media files
  • set proper mime types and caching headers
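
A sketch of the last task, mapping the file types served by the endpoints to MIME types and a caching policy. The MIME types are the commonly registered ones; the caching policy (never cache a live index, cache everything else for a year) is an assumption for illustration, and `response_headers` is a hypothetical helper.

```python
import os

MIME_TYPES = {
    ".m3u8": "application/vnd.apple.mpegurl",
    ".mpd": "application/dash+xml",
    ".ts": "video/mp2t",
    ".mp4": "video/mp4",
    ".webm": "video/webm",
    ".ogv": "video/ogg",
    ".jpg": "image/jpeg",
}

INDEX_EXTENSIONS = (".m3u8", ".mpd")


def response_headers(filename, live=True):
    """Return Content-Type and Cache-Control headers for a stored file."""
    ext = os.path.splitext(filename)[1].lower()
    content_type = MIME_TYPES.get(ext, "application/octet-stream")
    if live and ext in INDEX_EXTENSIONS:
        # The index of a live stream changes with every new segment.
        cache = "no-cache"
    else:
        # Segments and archived files never change once written (assumed policy).
        cache = "public, max-age=31536000"
    return {"Content-Type": content_type, "Cache-Control": cache}
```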

Server Rate Limiting

Ingest endpoints are rate limited to keep sources from uploading too much data. The limits apply to the

  • total number of parallel ingest sessions (10)
  • cumulative ingest bandwidth (10000 kbit)
  • connection limit per source IP (100 per 15 min)
  • upload per source IP (128MB per 15 min)

HTTP Response Headers

  • X-Rate-Limit-Limit: the rate limit ceiling for that given request
  • X-Rate-Limit-Remaining: the number of requests left for the 15 minute window
  • X-Rate-Limit-Reset: the remaining window before the rate limit resets in UTC epoch seconds
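
The per-IP connection limit and the three response headers can be sketched with a fixed-window counter. This is a minimal in-memory illustration, assuming windows aligned to multiples of the window length and a reset value expressed as the epoch second at which the current window ends; the class name is hypothetical.

```python
class FixedWindowLimiter:
    """Counts connections per source IP in fixed windows (default 15 min)."""

    def __init__(self, limit=100, window=900):
        self.limit = limit
        self.window = window
        self.counters = {}  # ip -> (window_start, count)

    def check(self, ip, now):
        """Return (allowed, headers) for a new connection at epoch time now."""
        window_start = int(now) - int(now) % self.window
        start, count = self.counters.get(ip, (window_start, 0))
        if start != window_start:
            start, count = window_start, 0  # a new window has begun
        allowed = count < self.limit
        if allowed:
            count += 1
        self.counters[ip] = (start, count)
        headers = {
            "X-Rate-Limit-Limit": str(self.limit),
            "X-Rate-Limit-Remaining": str(self.limit - count),
            "X-Rate-Limit-Reset": str(start + self.window),
        }
        return allowed, headers
```

A fixed window is the simplest scheme that matches the "per 15 min" wording; it allows short bursts across a window boundary, which a sliding window would smooth out.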

Performance Monitoring

Measure performance-related events observable at the server's ingest and distribution endpoints. This data is useful for debugging connection problems and for gaining insights into network, protocol and client performance. Anonymize clients (hash source IP+port).
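
One way to anonymize clients is a keyed hash over the source IP and port, so that log entries of the same connection can still be correlated without storing the address. This is a minimal sketch; the secret salt, the truncation to 16 hex characters and the function name are assumptions.

```python
import hashlib
import hmac


def anonymize_client(ip, port, salt):
    """Return a short, stable pseudonym for a client address.

    salt is a server-side secret (bytes); without it the small IPv4 address
    space would make a plain hash easy to reverse by brute force.
    """
    message = f"{ip}:{port}".encode("utf-8")
    return hmac.new(salt, message, hashlib.sha256).hexdigest()[:16]
```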

Client phases

  • connecting and joining --> join time
  • playing --> play time
  • paused
  • buffering --> buffering time
  • stopped

Interesting performance metrics

  • client timing statistics (min, mean, median, max)
    • pre-roll time (first request sent to play button press) [msec]
    • start-up time (play button press to playback start) [msec]
    • first stall/rebuffer event duration [msec]
    • cumulative stall/rebuffer durations [msec]
    • stall/rebuffer frequency (min, mean, median, max) [Hz]
    • rebuffering rate (total buffering time to total play time) [%]
    • bandwidth fluctuations (max-min bandwidth)/average bandwidth [%]
    • seek-to-play time (seek event to playback start) [msec]
    • live E2E latency (capture-to-display time) [msec]
    • requested video data (requests may be aborted or fail) [msec]
    • fetched/served video duration (data sent to client) [msec]
    • play time = completed/played video duration [msec]
    • join time = pre-roll + start-up
  • resource availability
    • download timing [mean, peak, min bandwidth]
  • client engagement (event counters, possibility to aggregate with stream popularity)
    • #seeks, #seek-to-live
    • #play, #pause,
    • #stalls
    • #views started
    • #views complete
  • stream popularity per stream id (possibility to split by region, device, browser)
    • live sessions served [total]
    • archived sessions served [total]
    • per stream id popularity [%]
    • per video format popularity [%]
  • server performance rolling over time (query periods: hour, day, month, etc.)
    • inbound live sessions [total]
    • inbound data rate [bps]
    • outbound live sessions [total]
    • outbound archived sessions [total]
    • outbound data rate [bps]
    • total bytes served [bytes]
    • total live time served [sec]
    • total archived time served [sec]
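
Several of the client timing metrics above are derived values. The following sketch shows how they could be computed for one playback session and summarized across sessions, assuming the raw timings have already been collected in milliseconds; the function and key names are hypothetical.

```python
from statistics import mean, median


def session_metrics(preroll_ms, startup_ms, stall_durations_ms, play_time_ms):
    """Derive per-session metrics from raw timings (all in milliseconds)."""
    total_stall = sum(stall_durations_ms)
    play_seconds = play_time_ms / 1000.0
    return {
        "join_time_ms": preroll_ms + startup_ms,
        "first_stall_ms": stall_durations_ms[0] if stall_durations_ms else 0,
        "cumulative_stall_ms": total_stall,
        # total buffering time relative to total play time, in percent
        "rebuffering_rate_pct": 100.0 * total_stall / play_time_ms if play_time_ms else 0.0,
        # stall events per second of play time
        "stall_frequency_hz": len(stall_durations_ms) / play_seconds if play_time_ms else 0.0,
    }


def summarize(values):
    """Return the (min, mean, median, max) statistics used in the list above."""
    return {
        "min": min(values),
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
    }
```

For example, a session with 200 ms pre-roll, 800 ms start-up, two stalls of 500 ms and 1500 ms, and 40 s of play time has a join time of 1000 ms and a rebuffering rate of 5 %.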

Source-related measures (ingest side)

  • source browser type/version
  • location (GeoIP)
  • provider (ISP, GeoIP)
  • session start time
  • session end time
  • request start time (HTTP POST and chunked transfer)
  • request end time
  • request volume

Client-related measures (download side)

  • browser type/version
  • browser API support
  • download events: [filename + offset + length] request start, request finish
  • player events: [type, time] seek, seek-to-live, start, play, pause, stall, complete
  • cumulative times: preroll, startup, first stall, sum stalls, seek-to-play, played

Server-related measures (distribution side)

  • request start [id, time, client(browser, location, provider), file(id, format), offset, size]
  • request end [id, time]


Copy Live Stream from Source to Segmented Files

This example assumes the incoming stream is already encoded at the correct H264 profile/level and that its GOP structure (key frames) is aligned with the segmentation points. If not, segmentation will happen at key frames, resulting in segments of different durations.

ffmpeg -y -v error -i pipe:0 -f segment -codec copy -map 0 -segment_time 2 -segment_format \
mpegts -segment_list_flags +live -segment_list_type hls -individual_header_trailer 1 \
-segment_list ${index_path}/${video.uid}/index.m3u8 ${segment_path}/${video.uid}/%09d.ts
  • -v error log errors only
  • -i filename input (named pipe for live streaming from HTTP server)
  • -codec copy copy encoded audio and video tracks
  • -map 0 (in->out mapping) use the first input stream for the single output in our case
  • -f segment output format is HLS segmented files
  • -segment_time <sec> segment duration in seconds
  • -segment_list <m3u8-file> index file location
  • -segment_format mpegts use MPEG2TS as container format for segment files (required by Apple HLS)
  • -segment_list_flags +live create a live-friendly index file
  • -segment_list_size 0 how many segments to keep in the index file (0=all)
  • -segment_list_type index file format (m3u8, csv, flat)
  • %09d.ts pattern for generation of segment filenames (counter is automatically increased when writing)
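
When the ingest server starts this command per stream, building it as an argument list avoids shell quoting problems with paths and ids. The helper below is a sketch that mirrors the stream-copy command above; the function name and parameters are hypothetical stand-ins for the `${index_path}`, `${segment_path}` and `${video.uid}` placeholders.

```python
def build_segmenter_args(index_path, segment_path, video_uid, segment_time=2):
    """Assemble the stream-copy segmenter command as an argument list."""
    return [
        "ffmpeg", "-y", "-v", "error",
        "-i", "pipe:0",                       # read the live stream from stdin
        "-f", "segment",
        "-codec", "copy",                     # no re-encoding at the server
        "-map", "0",
        "-segment_time", str(segment_time),
        "-segment_format", "mpegts",
        "-segment_list_flags", "+live",
        "-segment_list_type", "hls",
        "-individual_header_trailer", "1",
        "-segment_list", f"{index_path}/{video_uid}/index.m3u8",
        f"{segment_path}/{video_uid}/%09d.ts",
    ]
```

The list can be handed to `subprocess.Popen(args, stdin=subprocess.PIPE)`, after which the server writes each received chunk to the process's stdin and closes it at end of stream.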

Transcode Live Stream from Source into distribution format

ffmpeg -y -v error -i pipe:0 -an -c:v libx264 -b:v 600k -vpre ipod320 \
-flags -global_header -map 0 -f segment -segment_time 2 -segment_format mpegts \
-segment_list ${index_path}/${video.uid}/index.m3u8 ${segment_path}/${video.uid}/%09d.ts
  • -vstats_file file dump video coding statistics to file.
  • -force_key_frames expr:gte(t,n_forced*2) force key frames every 2 seconds
  • -flags -global_header place global headers in extradata instead of every keyframe.

Output periodic thumbnail images

ffmpeg -y -v quiet -i pipe:0 -f image2 -s 160x90 -vsync 1 -vf fps=fps=1/25 thumb/%09d.jpg
  • -i filename input (named pipe for live streaming from HTTP server)
  • -vsync 1 video sync method 'cfr' to duplicate and drop frames to achieve exactly the requested constant framerate.
  • -r rate video frame rate which may be a rational number, e.g. 0.5
  • -f image2 use image output (codec is guessed from filename extension)
  • -s res output resolution, e.g. 160x90
  • %09d.jpg filename template