TTML in MP4 and MPEG-DASH guidelines
Latest commit c078760 Sep 10, 2016 @rbouqueau committed on GitHub typos

About

Authors:

Contributors:

Introduction

This document describes different workflows for the delivery of TTML content in MP4 and MPEG-DASH. It tries to provide hints on how to build such workflows based on existing tools. Its goal is to drive the development of TTML tools so that maximum interoperability is achieved.

History and profiles for TTML: EBU work, HbbTV 2.0, IMSC

About the technologies used in this article:

  • MP4 is a container format standardized by MPEG.
  • MPEG-DASH is an adaptive bitrate (ABR) streaming standard, also standardized by MPEG, that allows content to be delivered from conventional HTTP web servers.
  • TTML (Timed Text Markup Language) is a subtitling format designed by W3C.
  • EBU-TT-D is a profile of TTML, designed by the EBU, for live and on-demand distribution of subtitles over IP-based networks. It restricts the set of TTML features that may be used.
  • HbbTV 2.0 is a standard for hybrid digital TV delivery. HbbTV 2.0 mandates the implementation of EBU-TT-D, MP4 and MPEG-DASH.
  • IMSC is a pair of TTML profiles designed for subtitles and captions: one for text and one for images. The IMSC Text profile is a superset of EBU-TT-D.

Overview of the workflow

Generally speaking, we can assume that TTML workflows follow the architecture shown in the following figure:

[Figure: Image of Workflow]

In this workflow, the MP4 packager and the DASH packager could be the same tool, as is the case with MP4Box. Similarly, the DASH Access Engine, the MP4 Parser and the TTML Renderer could be the same tool, or separate tools such as, respectively, DASH.js, MP4Box.js and a TTML-to-HTML rendering tool.

Producing TTML content over MP4 and DASH

Given this workflow, there are several options to produce, package and deliver TTML content over MP4 and DASH. All options share the same goal of minimizing the amount of data downloaded during the streaming session: the same TTML content should not be downloaded multiple times, and the session should not require downloading the whole TTML content before it can start (which is especially relevant for live applications). Packaging the TTML content of the entire session as a single DASH segment is therefore not optimal; instead, the TTML content should be spread over multiple DASH segments. This also helps when seeking or when inserting ads between segments.

DASH segments are typically of constant duration and aligned across audio and video representations, although this is not a strict requirement. Since TTML content does not change at a constant rate, segmenting it leads either to variable-duration segments or to data duplicated across segments. Such duplication should be avoided, or at least limited, ideally to the case where the last sample of a segment contains data that is also present in the first sample of the next segment.
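
To make this trade-off concrete, the following Python sketch (a hypothetical helper, not part of any existing tool) takes subtitle cues as (begin, end) pairs in milliseconds and reports which of them cross a constant-duration segment boundary. Each such cue would have to be duplicated in every segment it overlaps; avoiding that means moving the boundary, i.e. accepting variable-duration segments.

```python
from typing import List, Tuple

# Hypothetical cue representation: (begin_ms, end_ms) of a TTML <p> element.
Cue = Tuple[int, int]


def straddling_cues(cues: List[Cue], seg_dur_ms: int) -> List[Cue]:
    """Return the cues that cross at least one constant-duration segment boundary.

    With fixed-duration segments, each such cue must be carried (duplicated)
    in every segment it overlaps.
    """
    result = []
    for begin, end in cues:
        # Index of the segment holding the cue start, and of the one holding its
        # last millisecond (end is exclusive).
        first_seg = begin // seg_dur_ms
        last_seg = (end - 1) // seg_dur_ms
        if last_seg > first_seg:
            result.append((begin, end))
    return result


cues = [(0, 1500), (1800, 2500), (3000, 3900), (3900, 4200)]
print(straddling_cues(cues, 2000))  # [(1800, 2500), (3900, 4200)]
```

With 2-second segments, two of the four cues straddle a boundary at 2 s or 4 s; with a segment duration longer than the whole sequence, none do.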

Note: preliminary figures show that subtitles account for 0.0004% to 0.07% of the whole "programme" bandwidth. Consequently, we do not think that debating the storage, network and distribution costs of such duplication is worthwhile.
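
For a sense of scale, the share can be converted into a bitrate. The 5 Mbit/s programme bitrate below is an assumption chosen for illustration, not one of our measurements; at that rate the quoted range corresponds to roughly 20 bit/s to 3.5 kbit/s.

```python
def subtitle_bitrate(programme_bps: float, share_percent: float) -> float:
    """Bitrate used by subtitles, given their share of the programme bandwidth in percent."""
    return programme_bps * share_percent / 100.0


# Assumed programme bitrate of 5 Mbit/s (illustrative only).
for share in (0.0004, 0.07):
    print(f"{share}% of 5 Mbit/s = {subtitle_bitrate(5_000_000, share):.0f} bit/s")
```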

Need for a TTML Segmenter

In the above workflow, tools with only basic TTML capabilities may find it difficult to process a TTML document in order to create small, self-contained, non-timewise-overlapping TTML documents. A TTML Segmenter addresses this: it splits one (or more) input TTML document(s) into output TTML documents, each containing only the timed data needed for presentation within its segment time, which avoids unnecessary data duplication.

The example below shows a TTML document with successive p elements overlapping in time:

<?xml version="1.0" encoding="UTF-8"?>
<tt:tt  xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xmlns:tts="http://www.w3.org/ns/ttml#styling" xmlns:tt="http://www.w3.org/ns/ttml" ttp:timeBase="media" xml:lang="de" ttp:cellResolution="50 30">
    <tt:head>
        <tt:styling>
            <tt:style xml:id="textWhite" tts:color="#ffffff" tts:backgroundColor="#000000" tts:fontSize="160%" tts:fontFamily="monospace" />
            <tt:style xml:id="paragraphAlign" tts:textAlign="left" />
        </tt:styling>
        <tt:layout>
            <tt:region xml:id="top" tts:origin="10% 10%" tts:extent="80% 80%" tts:displayAlign="before"/>
            <tt:region xml:id="bottom" tts:origin="10% 10%" tts:extent="80% 80%" tts:displayAlign="after" />    
        </tt:layout>
    </tt:head>
    <tt:body>
        <tt:div>
            <tt:p xml:id="subtitle1" region="top" begin="00:00:00.000" end="00:00:04.000" style="paragraphAlign">
                <tt:span style="textWhite">Title at the top.</tt:span>
            </tt:p>
            <tt:p xml:id="subtitle2" region="bottom" begin="00:00:02.000" end="00:00:06.000" style="paragraphAlign">
                <tt:span style="textWhite">Text at the bottom.</tt:span>
            </tt:p>
        </tt:div>
    </tt:body>
</tt:tt>
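
One way to see how such a document maps onto non-timewise-overlapping documents is to split the timeline at every begin and end time and list the paragraphs active in each piece. The sketch below does this with Python's standard xml.etree.ElementTree; it assumes clock times of the form HH:MM:SS.mmm and timing attributes only on p elements, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set, Tuple

TT = "{http://www.w3.org/ns/ttml}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def clock_to_ms(value: str) -> int:
    """Convert an 'HH:MM:SS.mmm' clock time to milliseconds (other TTML time forms are not handled)."""
    hh, mm, ss = value.split(":")
    return round((int(hh) * 3600 + int(mm) * 60 + float(ss)) * 1000)


def active_intervals(ttml: bytes) -> List[Tuple[int, int, Set[str]]]:
    """Split the timeline at every begin/end of a <p> and list the paragraphs active in each piece.

    Assumes timing attributes appear only on <p> elements (no timed ancestors).
    """
    root = ET.fromstring(ttml)
    timings: Dict[str, Tuple[int, int]] = {}
    for p in root.iter(f"{TT}p"):
        timings[p.get(XML_ID)] = (clock_to_ms(p.get("begin")), clock_to_ms(p.get("end")))
    # Between two consecutive change points, the set of visible paragraphs is constant.
    points = sorted({t for span in timings.values() for t in span})
    intervals = []
    for start, stop in zip(points, points[1:]):
        active = {pid for pid, (b, e) in timings.items() if b <= start and stop <= e}
        if active:  # skip gaps where nothing is displayed
            intervals.append((start, stop, active))
    return intervals
```

For the document above it yields three intervals: 0 to 2 s with subtitle1, 2 to 4 s with both paragraphs, and 4 to 6 s with subtitle2. Each interval could become one self-contained TTML document; note that the middle one has to carry both paragraphs.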

Some workflows may decide that the TTML Authoring tool will post-process the TTML content to produce those non-timewise-overlapping TTML documents, with a granularity fine enough to support the smallest segment duration, and take care of timebase conversions. Such a post-processing TTML Segmenting tool would make the task of downstream tools easier. Other workflows may leave the segmentation to downstream tools such as the DASH packager, because the segment duration is only known at that point in the workflow. Yet other workflows may place tools in between, so that the TTML authoring is DASH-unaware and the DASH processing is TTML-unaware. Depending on this design choice, the interface between the tools in the workflow will differ.
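
A timebase conversion of this kind typically maps TTML time expressions onto the integer timescale of the MP4 track. A minimal Python sketch, assuming clock-time expressions only (HH:MM:SS, HH:MM:SS.fraction or HH:MM:SS:FF with frames) and ignoring ttp:frameRateMultiplier, drop-frame and sub-frames; the function name and the default frame rate of 30 are our assumptions:

```python
from fractions import Fraction


def clock_time_to_ticks(value: str, timescale: int, frame_rate: int = 30) -> int:
    """Convert a TTML clock-time expression to integer ticks of an MP4 timescale."""
    parts = value.split(":")
    hours, minutes = int(parts[0]), int(parts[1])
    # Exact rational arithmetic: Fraction("05.500") == Fraction(11, 2).
    seconds = Fraction(parts[2])
    if len(parts) == 4:
        # HH:MM:SS:FF form: the last field counts frames at frame_rate.
        seconds += Fraction(int(parts[3]), frame_rate)
    total = hours * 3600 + minutes * 60 + seconds
    # Round to the nearest tick; exact when the timescale is a multiple of the denominator.
    return round(total * timescale)
```

For example, "00:00:01:15" at 25 frames per second and a 90 kHz timescale gives 144000 ticks. Using Fraction rather than float avoids rounding errors accumulating at segment boundaries.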

The TTML content may be segmented in the following ways:

  • A single TTML document is created for the entire streaming session. The content is marked as redundant in all but the first sample or segment (technical details below in this document).
  • The TTML Segmenter splits the input documents using a strategy that chooses the best times at which to begin and end each segment, based on the content and on other heuristics such as a maximum segment size in bytes or in duration. This strategy would typically try to avoid any content overlapping with other segments, but this may not always be possible. With this approach, the output of the TTML Segmenter is both a set of TTML documents and some kind of manifest listing the times of each segment, suitable for use by the packager.
  • The TTML Segmenter splits the input documents into segments of predefined duration. For each segment, it selects all of the content that overlaps in time with the segment period, and then selects all referenced styles, regions, etc. in the head. Some segments may contain no content, and some adjacent segments may duplicate some or all of their content.
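
The third approach can be sketched in Python as follows. This is a simplified illustration under several assumptions: clock times of the form HH:MM:SS.mmm, p elements that are direct children of div elements and are the only timed elements, and styles or regions referenced only from body content (chained references from style or region elements are not followed). Clipped times stay on the original timeline. segment_ttml is a hypothetical helper, not the API of an existing segmenter.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List, Set

TT = "{http://www.w3.org/ns/ttml}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def clock_to_ms(value: str) -> int:
    hh, mm, ss = value.split(":")
    return round((int(hh) * 3600 + int(mm) * 60 + float(ss)) * 1000)


def ms_to_clock(ms: int) -> str:
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, msec = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{msec:03d}"


def segment_ttml(root: ET.Element, seg_dur_ms: int, total_ms: int) -> List[ET.Element]:
    """Cut a TTML tree into fixed-duration segments (hypothetical helper)."""
    segments = []
    for start in range(0, total_ms, seg_dur_ms):
        stop = min(start + seg_dur_ms, total_ms)
        doc = copy.deepcopy(root)
        # Keep only the <p> elements overlapping [start, stop), clipped to that window.
        for div in list(doc.iter(f"{TT}div")):
            for p in list(div.findall(f"{TT}p")):
                b, e = clock_to_ms(p.get("begin")), clock_to_ms(p.get("end"))
                if e <= start or b >= stop:
                    div.remove(p)
                else:
                    p.set("begin", ms_to_clock(max(b, start)))
                    p.set("end", ms_to_clock(min(e, stop)))
        # Collect the style and region IDs still referenced from the body.
        used: Set[str] = set()
        body = doc.find(f"{TT}body")
        if body is not None:
            for el in body.iter():
                used.update(el.get("style", "").split())
                used.update(el.get("region", "").split())
        # Drop unreferenced <style> and <region> definitions from the head.
        head = doc.find(f"{TT}head")
        if head is not None:
            for container in head.findall(f"{TT}styling") + head.findall(f"{TT}layout"):
                for child in list(container):
                    if child.get(XML_ID) not in used:
                        container.remove(child)
        segments.append(doc)
    return segments
```

Applied to the example document with 2-second segments, the first segment keeps only subtitle1 and the "top" region, the second keeps both paragraphs clipped to 2 to 4 s, and the third keeps only subtitle2.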

Interface between MP4 Parser and TTML Renderer

The MP4 standard assumes that only one sample at a time is active. This means that the MP4 parser delivers one TTML document at a time to the TTML renderer, that each new TTML document replaces the previous one, and that it is used for a given duration. This standard behavior thus constrains the upstream part of the workflow: samples cannot overlap in time, so the TTML documents they contain should not overlap in time either. This improves interoperability by reducing the number of choices left open.
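
On the parser side, this rule follows from how MP4 timing works: sample start times are derived from consecutive sample durations (as in the stts box), so samples are back to back. A minimal Python sketch of looking up the TTML sample to hand to the renderer at a given time, assuming the track starts at time 0 with no edit list (active_sample_index is a hypothetical name):

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Optional


def active_sample_index(durations: List[int], t: int) -> Optional[int]:
    """Index of the sample active at time t, given consecutive sample durations in ticks.

    Because each sample starts where the previous one ends, at most one
    sample (i.e. one TTML document) is active at any time.
    """
    starts = [0] + list(accumulate(durations))  # starts[i] = start of sample i; last entry = track end
    if t < 0 or t >= starts[-1]:
        return None  # outside the track: nothing to render
    return bisect_right(starts, t) - 1
```

With three 2000-tick samples, time 1999 maps to sample 0, time 2000 to sample 1, and time 6000 to no sample; the renderer simply replaces its current document whenever the index changes.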

Some optimizations at the MP4 level allow the MP4 Parser to indicate that a new TTML document is the same as the previous one, for example using the sample_has_redundancy field of the sdtp box. In this case, the new document only extends the duration of the previous one. This can be useful when a TTML document has been duplicated between the last sample of a segment and the first sample of the next segment. We do not expect TTML samples to be large, so repeating them seems acceptable.
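
The sdtp box stores one byte per sample, made of four 2-bit fields: is_leading, sample_depends_on, sample_is_depended_on and sample_has_redundancy (ISO/IEC 14496-12). The Python sketch below flags a TTML sample as redundant (value 1) when its payload is byte-for-byte identical to the previous sample, following the usage suggested above, and shows how a parser could then extend the previous document's duration. It only builds the per-sample entries, not the full box with its header; the helper names are hypothetical, and leaving is_leading and sample_is_depended_on at 0 ("unknown") is our choice.

```python
from typing import List, Tuple

# sample_has_redundancy values defined for sdtp entries in ISO/IEC 14496-12.
REDUNDANT = 1      # "there is redundant coding in this sample"
NOT_REDUNDANT = 2  # "there is no redundant coding in this sample"
DOES_NOT_DEPEND = 2  # sample_depends_on: the sample does not depend on others


def sdtp_byte(is_leading: int, depends_on: int, is_depended_on: int, has_redundancy: int) -> int:
    """Pack the four 2-bit fields of one sdtp entry into a byte."""
    return (is_leading << 6) | (depends_on << 4) | (is_depended_on << 2) | has_redundancy


def redundancy_entries(samples: List[bytes]) -> bytes:
    """Packager side: one sdtp entry per TTML sample; repeats of the previous sample are flagged."""
    out = bytearray()
    for i, payload in enumerate(samples):
        flag = REDUNDANT if i > 0 and payload == samples[i - 1] else NOT_REDUNDANT
        out.append(sdtp_byte(0, DOES_NOT_DEPEND, 0, flag))
    return bytes(out)


def effective_samples(durations: List[int], entries: bytes) -> List[Tuple[int, int]]:
    """Parser side: (first sample index, merged duration); a redundant sample only extends the previous one."""
    merged: List[Tuple[int, int]] = []
    for i, (dur, entry) in enumerate(zip(durations, entries)):
        if merged and (entry & 0b11) == REDUNDANT:
            idx, total = merged[-1]
            merged[-1] = (idx, total + dur)
        else:
            merged.append((i, dur))
    return merged
```

For samples b"A", b"A", b"B" the entries are 0x22, 0x21, 0x22, and with durations 2000, 1000 and 3000 the parser sees two documents, each lasting 3000 ticks.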

Interface between TTML Authoring Tool and MP4 Packager

There are several possibilities here. To achieve interoperability, workflow designers have to choose a strategy and make sure that the selected tools support it. The choice depends on the TTML Authoring tool, which may produce:

  • A single TTML document valid for the entire streaming session. If so, either the MP4 packager will have to split the TTML document into multiple samples, or the DASH packager will have to split the sample into multiple samples and segments to avoid unnecessary downloads. This task can be complex for general TTML documents, but it can be simpler for some profiles, such as EBU-TT-D. Hence, the workflow architecture may differ depending on the type of TTML documents.
  • Multiple non-timewise-overlapping TTML documents. If the TTML authoring tool is aware of the target DASH segment duration, it should ideally provide one TTML document per segment. If the TTML authoring tool is not aware of the DASH delivery parameters, it should try to produce the TTML documents with the smallest duration that cannot be further split.
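
When the authoring tool follows the second option without knowing the DASH parameters, the packager still has to group the fine-grained documents into segments. One possible greedy strategy, sketched in Python with a hypothetical function name: consecutive documents are appended to the current segment until it reaches the target duration, so documents are never split and segment durations vary instead.

```python
from typing import List


def group_into_segments(doc_durations: List[int], target_ms: int) -> List[List[int]]:
    """Greedily group consecutive TTML document indices into segments of about target_ms.

    A segment is closed as soon as its accumulated duration reaches the target;
    any remaining documents form a final, shorter segment.
    """
    segments: List[List[int]] = []
    current: List[int] = []
    elapsed = 0
    for i, dur in enumerate(doc_durations):
        current.append(i)
        elapsed += dur
        if elapsed >= target_ms:
            segments.append(current)
            current, elapsed = [], 0
    if current:
        segments.append(current)
    return segments
```

With document durations of 1000, 1000, 500, 1500, 3000 and 200 ms and a 2000 ms target, this yields the groups [0, 1], [2, 3], [4] and [5]: the 3000 ms document gets a longer segment rather than being split.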

Examples

Examples can be found here.

Conclusion

If you have any feedback, remarks or questions, please feel free to contact us directly or via our GitHub project page. Thank you.