EPUB 2 sunset

Dave Cramer edited this page Aug 13, 2018 · 32 revisions

Latest Draft: 13 August 2018

Editors

Ric Wright
Luc Audrain
Dave Cramer
George Kerscher

Abstract

Although EPUB 3 has been a recommended specification since 2011, many publishers are still creating EPUB 2 files. This white paper describes the many advantages of moving to EPUB 3, and recommends that all publishers stop creating EPUB 2.

Introduction

EPUB has been around since 1999, although it was known as OEB back then. What we know as EPUB 2 was finalized in 2007. OEB and EPUB have always had the great strength of adapting the content to the reader, in marked contrast to PDF. And they were built on existing standards like XHTML 1.1 and CSS 2, although EPUB has always relied on "profiles" of these standards, often supporting only a subset.

EPUB 3 was designed in 2010 by the IDPF, with help from the DAISY Consortium, on top of modern and sustainable standards from the Open Web Platform (HTML5/CSS3). It addresses digital publishing needs in several areas, including:

  • Structured documents
  • Enhanced typography
  • Better user experience
  • Support of any language
  • Improved accessibility
  • Richer navigation
  • MathML support

EPUB 3 was chartered in May 2010 and became a recommendation in 2011. Despite that, a significant proportion of the EPUBs produced today are still EPUB 2. This white paper explores why that might be and shows why migrating to EPUB 3 makes sense in so many ways.

Why Switch to EPUB 3?

Why should EPUB 2 producers switch to EPUB 3? The short answer is to enable delivery of richer, more accessible content. EPUB 3 is simply more capable, more expressive, more powerful. The following sections examine various use-cases, which show how EPUB 3, based on today's web technology, leaves EPUB 2 behind.

1. Document structure and semantics

EPUB 2 content documents are based on XHTML 1.1, which is essentially HTML 4 expressed as XML.

EPUB 3 is based on the XML serialization of HTML5. HTML5 has a much richer vocabulary for expressing the content of complex documents, including the section, header, figure, and aside elements. HTML5 also includes native support for audio and video. This has benefits for both creating and consuming content. Most publishers have production tools that work with structured documents. It's now easier to transform those documents into HTML5, and to maintain this critical semantic information for use in other products and systems. This also allows EPUB 3 reading systems to provide new features based on this richer markup, as we will see below.

Content creators can also leverage EPUB 3's Structural Semantics Vocabulary, which allows even richer tagging of document content than is built into HTML5.
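
As a rough illustration of the markup described above, a chapter opening might be tagged as follows. This is a minimal sketch rather than an excerpt from any of the sample files: the ids, file names, alt text, and note text are hypothetical, and the epub:type values shown (chapter, sidebar) are assumed to be drawn from the Structural Semantics Vocabulary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en" lang="en">
  <head>
    <meta charset="utf-8" />
    <title>Chapter 1</title>
  </head>
  <body>
    <!-- epub:type layers publishing semantics on top of the HTML5 element -->
    <section epub:type="chapter" id="ch01">
      <header>
        <h1>I. Down the Rabbit-Hole</h1>
      </header>
      <p>Alice was beginning to get very tired of sitting by her sister on the bank ...</p>
      <!-- figure and figcaption keep an image and its caption together -->
      <figure id="fig01">
        <img src="images/alice01a.gif" alt="Illustration for chapter 1 (hypothetical description)" />
        <figcaption>A hypothetical caption</figcaption>
      </figure>
      <!-- aside marks content that sits outside the main narrative -->
      <aside epub:type="sidebar">
        <p>A hypothetical editorial note.</p>
      </aside>
    </section>
  </body>
</html>
```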

2. Enhanced typography

Because EPUB 2 offers only limited support for high-quality typography, ebook composition of even simple text has dropped to a level that would not be acceptable in the printing industry. EPUB 3, on the other hand, enables high-quality text rendering by leveraging the ongoing improvements in the CSS3 standard, including drop caps and text wrapping around images.

3. Better user experience

The richer semantic markup that HTML5 provides can be leveraged by EPUB 3 reading systems to create better user experiences. For example, using HTML5's <aside> element with EPUB's structural semantics vocabulary (epub:type), several reading systems have implemented pop-up footnotes, so that readers don't lose their place in the main text while navigating back and forth to notes. There is an excellent article by Liz Castro on how this can be done.
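
The pattern described above can be sketched roughly as follows. This is an illustrative fragment, not the markup from the article mentioned: the id fn1 and the note text are hypothetical, and whether the note actually pops up depends on the reading system.

```xml
<section xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <!-- The reference: epub:type="noteref" marks the link as a note reference -->
  <p>Some text in the main narrative<a epub:type="noteref" href="#fn1">1</a> that continues here.</p>

  <!-- The note itself: an aside marked as a footnote, targeted by the link above -->
  <aside epub:type="footnote" id="fn1">
    <p>1. Hypothetical footnote text.</p>
  </aside>
</section>
```

A reading system that does not recognize these semantics can still treat the reference as an ordinary internal link.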

4. Support of any language

EPUB 2 cannot support right-to-left text, bidirectional (bidi) text, or vertical writing. EPUB 3 supports virtually every language in the world, including Arabic, Chinese, horizontal and vertical Japanese, and Hebrew. RTL and LTR languages can be mixed.
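
As a small sketch of mixed-direction content, the fragment below embeds a right-to-left word in a left-to-right paragraph and then sets a whole paragraph right-to-left. The sample words are only placeholders chosen for illustration.

```xml
<section xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <!-- Left-to-right paragraph with an embedded right-to-left Hebrew word -->
  <p>The Hebrew word <span xml:lang="he" lang="he" dir="rtl">שלום</span> means "peace".</p>

  <!-- A whole paragraph set right-to-left, here in Arabic -->
  <p xml:lang="ar" lang="ar" dir="rtl">مرحبا بالعالم</p>
</section>
```

For a publication that reads right to left as a whole, EPUB 3 also lets the spine element in the package document carry page-progression-direction="rtl".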

5. Accessibility Conformance Requirements

EPUB 2 has some basic accessibility features, but is not capable of full compliance with the latest Web accessibility guidelines (WCAG), which depend on the richer semantics offered by HTML5 and included in EPUB 3.

EPUB 3 now has a formal accessibility standard, which provides guidance for authors, and enables certification of quality for readers. The DAISY Consortium has even created an accessibility checker for EPUB 3.

EPUB 3 also supports media overlays, which provide synchronized audio narration and are widely used by people with print disabilities. In EPUB 3, these types of books are created by using Media Overlay Documents to describe the timing for the pre-recorded audio narration and how it relates to the EPUB Content Document markup. The file format for Media Overlays is defined as a subset of SMIL, a W3C recommendation for representing synchronized multimedia information in XML.
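
A Media Overlay Document along these lines might look like the sketch below. The file names, fragment ids, and clip times are hypothetical; each par element pairs one piece of text in the content document with the stretch of audio that narrates it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <!-- First sentence: text fragment plus the matching audio clip -->
    <par id="par1">
      <text src="chapter01.xhtml#sentence1"/>
      <audio src="audio/chapter01.mp3" clipBegin="0:00:00.000" clipEnd="0:00:05.250"/>
    </par>
    <!-- Second sentence continues from where the first clip ended -->
    <par id="par2">
      <text src="chapter01.xhtml#sentence2"/>
      <audio src="audio/chapter01.mp3" clipBegin="0:00:05.250" clipEnd="0:00:09.800"/>
    </par>
  </body>
</smil>
```

In the package document, the overlay is listed in the manifest with the media type application/smil+xml, and the content document's manifest item points to it through the media-overlay attribute.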

6. Rich Navigation

One of the criticisms of EPUB 2 was that its support for navigation was fairly primitive. The NCX provided the machine-readable table of contents (TOC). The guide element provided optional links to specific sections of an ebook. But the NCX could not be formatted, not even to include an italic word. Authors had to live with the NCX’s limitations, or else provide a second, redundant TOC in HTML.

EPUB 3 replaced the NCX with the HTML nav element, which can be displayed to the reader with the full power of HTML/CSS and processed by the reading system as the NCX was before.

EPUB 3 also supports more specialized nav elements. The page-list nav element maps print page numbers to locations in the EPUB, which is crucial for accessibility in contexts like classrooms. Landmarks nav elements can provide links to the fundamental structures of a publication.
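
The two specialized nav elements mentioned above could be sketched like this, inside the body of the navigation document. The targets are hypothetical, and the page-list links assume that the content documents contain matching anchors (for example, elements with the ids page1 and page2).

```xml
<body xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <!-- Maps print page numbers to locations in the EPUB -->
  <nav epub:type="page-list" hidden="hidden">
    <ol>
      <li><a href="chapter01.xhtml#page1">1</a></li>
      <li><a href="chapter01.xhtml#page2">2</a></li>
    </ol>
  </nav>

  <!-- Points to the fundamental structures of the publication -->
  <nav epub:type="landmarks" hidden="hidden">
    <ol>
      <li><a epub:type="toc" href="nav.xhtml#toc">Table of Contents</a></li>
      <li><a epub:type="bodymatter" href="chapter01.xhtml">Start of Content</a></li>
    </ol>
  </nav>
</body>
```

The hidden attribute keeps these lists out of the rendered page while leaving them available to the reading system.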

7. MathML Support

EPUB 3 also introduced support for a subset of MathML, which enables authors to create EPUB documents with markup that is rendered as mathematical equations.
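
As a brief sketch, the equation x² + y² = z² could be marked up as below and placed directly inside an XHTML content document. The alttext value is an illustrative choice, not a required wording.

```xml
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block"
      alttext="x squared plus y squared equals z squared">
  <mrow>
    <msup><mi>x</mi><mn>2</mn></msup>
    <mo>+</mo>
    <msup><mi>y</mi><mn>2</mn></msup>
    <mo>=</mo>
    <msup><mi>z</mi><mn>2</mn></msup>
  </mrow>
</math>
```

The manifest item for a content document that contains MathML declares properties="mathml".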

Sunsetting EPUB 2 production

Publishing Industry EPUB 2 Use Cases

1. Novels and Essays

Novels and essays are the perfect example of highly textual books where EPUB 3 is a must. By enabling high quality typography and layout, readers are provided with first-class text composition that brings the pleasure of reading to the forefront - the way it should be!

2. STM Publishing

With CSS3 layout techniques, text and graphics can be highly designed while still remaining responsive enough to adapt to different screen sizes. Specialized content like mathematical equations benefits from the MathML support of EPUB 3.

Official Recommendations

Several official bodies endorse or even recommend EPUB 3 for textual works in digital form:

  • Library of Congress: "The Library of Congress Recommended Formats Statement (RFS) includes EPUB 3 as a preferred format for textual works in digital form." in LC preferences

  • DAISY Consortium: in Baseline for Accessible EPUB 3

Appendices

Appendix A - EPUB 2 vs. EPUB 3 Features

The following table provides a summary of the key features added in EPUB 3. For more detailed information, please see the official IDPF document here as well as the links below.

| Feature | Comment |
| --- | --- |
| HTML5 support | EPUB 3 still requires the XML serialization. |
| SVG documents in the spine | EPUB 3 allows SVG documents directly in the spine, whereas in EPUB 2 they had to be embedded in an XHTML page. However, reading system support for this feature is limited. |
| Support for MathML | XHTML Content Documents support embedded MathML but limit its usage to a restricted subset of the full MathML markup language. |
| Fixed Layout | |
| Navigation | A TOC in HTML is now required. An NCX is still permitted, but it does not replace the HTML TOC. |
| Accessibility | Most notably the inclusion of ARIA attributes for making dynamic content accessible. |
| Linking | The IDPF has established a registry of linking schemes. EPUBCFI is the first scheme added to the registry, and can be used for linking into, between and within Publications. Reading System support for this scheme is required. |
| Scripting | EPUB 3 Reading Systems may optionally support scripting, which was explicitly discouraged in EPUB 2. Scripted content must be identified as such in the package manifest [Publications30]. |
| Audio and video | Support for audio and video embedded via the HTML5 audio and video elements is strongly encouraged. Reading Systems should support at least one of the MP4/H.264 and WebM/VP8 video codecs. For audio, MP3 support is required and MP4 support is recommended. |
| Media overlays | EPUB Media Overlays 3.0 defines a usage of [SMIL] (Synchronized Multimedia Integration Language), the Package Document, the EPUB® Style Sheet, and the EPUB Content Document for representation of audio synchronized with the EPUB Content Document. |
| Additional modules from CSS3 | EPUB 3 defines a profile of CSS based on CSS 2.1 with added modules from CSS3, whereas EPUB 2 was based on a specific subset of CSS 2. Refer to EPUB Style Sheets for more information. |
| WOFF | EPUB 3 now requires Reading Systems to support both the OpenType and WOFF font formats for embedded fonts in conjunction with the CSS @font-face rules. |
| Semantic Inflection | Addition of the epub:type attribute for semantic inflection. |
| Text to Speech | Multiple features to assist Text-to-Speech (TTS) engines have been added. These include SSML, PLS pronunciation lexicons, and CSS3 Speech for enhanced text-to-speech playback. |
| Reading System Object | The epubReadingSystem object provides an interface through which a Scripted Content Document can query information about a user's Reading System. The object exposes properties of the Reading System (its name and version), and provides the hasFeature() method, which can be invoked to determine which features it supports. |
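
To make the audio and video row above more concrete, the fragment below sketches an embedded video that offers both codecs named in the table, plus a plain-text fallback. The file names are hypothetical.

```xml
<section xmlns="http://www.w3.org/1999/xhtml">
  <!-- The reading system picks the first source it can play -->
  <video controls="controls" poster="images/clip-poster.jpg">
    <source src="video/clip.mp4" type="video/mp4" />
    <source src="video/clip.webm" type="video/webm" />
    <!-- Shown only when the video element is not supported -->
    <p>Your reading system does not support embedded video.</p>
  </video>
</section>
```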

Features that have been removed or deprecated:

| Feature | Comment |
| --- | --- |
| DTBook | Now deprecated in favor of HTML and CSS markup for audio accessibility. |
| Out-of-Line XML Islands | A controversial and ultimately unused feature. |
| Triggers | The trigger element provided declarative control of audio and video content (cf. EPUB 3.0.1 trigger element). Authors are advised to use the native controls provided by the [HTML] audio and video elements. |
| Bindings | EPUB no longer supports the use of bindings in the Package Document to provide an alternative scripted fallback for foreign resources embedded in an object element. The [HTML] object element's intrinsic fallback mechanism (embedded content) can be used to provide a Core Media Type fallback. |
| Tours | The tours element, which was deprecated in OPF 2.0.1, has been dropped entirely from the Package Document schema in EPUB 3. |
| Filesystem Container | OCF 3.0 [OCF3] only defines a single-file (ZIP-based) container, and no longer defines a "Filesystem Container" abstraction. This change, along with new restrictions in Publications 3.0 on references to remote resources, means that the only instantiation of an EPUB Publication defined at this time is the EPUB ZIP Container, and that EPUB files must in general contain all constituent parts of the Publication, with certain well-defined exceptions. For more info, please see the discussion here. |
| Guide | Use of the optional guide element in the Package Document has been deprecated in favor of the EPUB Navigation Document landmarks feature. Refer to EPUB Navigation Documents [ContentDocs30] for more information. |
| NCX | The NCX has been superseded by HTML-based (TOC.html) EPUB Navigation Documents. |
| 2.0.1 meta element | The meta element defined in [OPF2] has been obsoleted and replaced by the new meta element, but may be included as an optional repeatable child of the metadata element for forwards compatibility purposes. |

Appendix B - EPUB 3 Basics

This section provides some details about the structure of EPUB 2 and EPUB 3 files by taking a brief look at the actual markup.

Structure and Naming

The EPUB specs do not require any particular folder structure or file naming within the EPUB, with the exception of the mimetype file and the contents of the META-INF folder. The specification has not changed significantly in this respect since EPUB 2, but current best practice suggests the following structure:

mimetype
META-INF/container.xml
META-INF/encryption.xml
package.opf
EPUB/html
EPUB/css
EPUB/fonts/
EPUB/js
EPUB/images
EPUB/svg

But in practice this is largely up to the author.
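
Whatever layout is chosen, the reading system finds the package document through META-INF/container.xml. A minimal sketch, assuming the package file sits at the root of the container as package.opf, as in the layout above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <!-- full-path is relative to the root of the ZIP container -->
    <rootfile full-path="package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
```

If the package file is moved into a subfolder, only the full-path value needs to change.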

The OPF Package File

This example of EPUB 2 and 3 package files is based on two versions of Alice in Wonderland, one EPUB 2, the other EPUB 3. As Alice is a very simple document, the changes are not radical, but they are critical. Almost all the differences are in the package (OPF) document.

Here is the EPUB 2 version.

<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0" unique-identifier="pubid">
  <metadata>
    <dc:title>Alice's Adventures in Wonderland</dc:title> 
    <dc:creator>Lewis Carroll</dc:creator> 
    <dc:date xmlns:opf="http://www.idpf.org/2007/opf" opf:event="creation">2013-08-29</dc:date>
    <dc:subject>fiction</dc:subject> 
    <dc:language>en-GB</dc:language> 
    <dc:coverage>England - 19th Century</dc:coverage> 
    <dc:rights>Public Domain</dc:rights> 
    <dc:publisher>D. Appleton and Co</dc:publisher> 
    <dc:identifier id="pubid">fab106a7-1f9f-4716-8c80-08932fe21b66</dc:identifier>
  </metadata>
  <manifest>
    <!-- fonts -->
    <item id="font0" href="fonts/MinionPro.otf" media-type="application/vnd.ms-opentype"/>
    ...
    <!-- navigation -->
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml" />
    <!-- body content -->
    <item id="titlepage" href="titlepage.xhtml" media-type="application/xhtml+xml"/>
    <item id="chapter01" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
    ...
    <!-- styling -->
    <item id="css" href="style.css" media-type="text/css"/>
    <!-- images -->
    <item id="img01a" href="images/alice01a.gif" media-type="image/gif"/>
    ...
  </manifest>
 
  <spine toc="ncx">
    <itemref idref="titlepage"/>
    <itemref idref="chapter01"/>
    ...
  </spine>
</package>

Here, by comparison, is the EPUB 3 package:

<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="pub-id" version="3.0" >
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:dcterms="http://purl.org/dc/terms/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Alice's Adventures in Wonderland</dc:title> 
    <dc:creator>Lewis Carroll</dc:creator> 
    <dc:date>1865-07-04</dc:date>
    <dc:subject>fiction</dc:subject>
    <dc:language>en-GB</dc:language> 
    <dc:coverage>England - 19th Century</dc:coverage> 
    <dc:rights>Public Domain</dc:rights> 
    <dc:publisher>D. Appleton and Co</dc:publisher> 
    <dc:identifier id="pub-id">urn:uuid:7408D53A-5383-40AA-8078-5256C872AE41</dc:identifier>
    <meta property="dcterms:modified">2016-03-14T11:23:26Z</meta>
    <meta name="cover" content="img01a" />
  </metadata>
  <manifest>
    <!-- fonts -->
    <item id="font0" href="fonts/MinionPro.otf" media-type="application/vnd.ms-opentype"/>
    ...
    <!-- navigation -->
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <!-- body content -->
    <item id="titlepage" href="titlepage.xhtml" media-type="application/xhtml+xml"/>
    <item id="chapter01" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
    ...
    <!-- styling -->
    <item id="css" href="style.css" media-type="text/css"/>
    <!-- images -->
    <item id="img01a" href="images/alice01a.gif" media-type="image/gif" properties="cover-image"/>
    <item id="img02a" href="images/alice02a.gif" media-type="image/gif"/>
    ...
  </manifest>
 
  <spine>
    <itemref idref="titlepage"/>
    <itemref idref="chapter01"/>
    ...
  </spine>
</package>

As one can see, the changes are not large, but a few of them MUST be present for the document to be a valid EPUB 3.

  • The version attribute MUST be 3.0.
  • The metadata MUST include a meta element with the property dcterms:modified, holding the date and time the publication was last modified, in the form CCYY-MM-DDThh:mm:ssZ.
  • Exactly one item in the manifest MUST be the EPUB 3 nav document, declared with the property "nav".
  • If the publication has a cover image, its manifest item should declare the property "cover-image". The meta element named "cover" is the EPUB 2 convention; it can be kept for older reading systems, and its content attribute must then match the id of the cover image item.

Finally, the spine element in the package must NOT carry a toc attribute pointing to an NCX unless an NCX navigation file is present in ADDITION to the HTML nav file.
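
For publishers who want to keep an NCX for older reading systems, the relevant parts of the package might look like the sketch below. It is abbreviated for illustration: the metadata and most manifest items are left out, and the file names follow the Alice examples above.

```xml
<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="pub-id" version="3.0">
  <!-- metadata omitted for brevity -->
  <manifest>
    <!-- EPUB 3 navigation document (required) -->
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <!-- NCX kept only as a fallback for EPUB 2 reading systems -->
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="chapter01" href="chapter01.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <!-- toc points at the id of the NCX manifest item, so that item must exist -->
  <spine toc="ncx">
    <itemref idref="chapter01"/>
  </spine>
</package>
```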

Navigation

The EPUB nav document is a very flexible entity. It can be very simple, as in this EPUB 3 Alice file:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <meta charset="utf-8" />
    <title>Alice's Adventures in Wonderland</title>
    <link rel="stylesheet" type="text/css" href="style.css" />
  </head>
  <body class="reflow">
    <nav xmlns:epub="http://www.idpf.org/2007/ops" epub:type="toc" id="toc">
      <ol>
        <li class="toc" id="chapter01">
          <a href="chapter01.xhtml">I. Down the Rabbit-Hole</a>
        </li>
        <li class="toc" id="chapter02">
          <a href="chapter02.xhtml">II. The Pool of Tears</a>
        </li>
      </ol>
    </nav>
  </body>
</html>

Alternatively, the nav doc can leverage the new semantics introduced in EPUB 3 to produce rich, flexible navigation for the user.
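
A slightly richer table of contents could be sketched as follows, with a heading, a nested list, and inline formatting inside an entry, none of which the NCX could express. The file part01.xhtml and the grouping into a part are hypothetical additions for illustration.

```xml
<nav xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops"
     epub:type="toc" id="toc">
  <h1>Contents</h1>
  <ol>
    <li>
      <a href="part01.xhtml">Part One</a>
      <!-- Nested list: chapters grouped under their part -->
      <ol>
        <li><a href="chapter01.xhtml">I. Down the <em>Rabbit-Hole</em></a></li>
        <li><a href="chapter02.xhtml">II. The Pool of Tears</a></li>
      </ol>
    </li>
  </ol>
</nav>
```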

Appendix C - Sample EPUB Files

This appendix provides a guide to a series of example EPUB 3 files. The files are intended to illustrate best practices for a variety of common EPUB files. Each of the files listed below is available as both a fully built EPUB and as source code on GitHub. Naturally, given the complexity and breadth of the EPUB spec, the possibilities are nearly endless. The intent of these examples is not to cover all possibilities but to provide guidance in best practices.

| Name | EPUB | Sources | Online Example | Comment |
| --- | --- | --- | --- | --- |
| Tiny-EPUB | tiny3.epub | tiny-epub3 | tiny3.epub | The simplest EPUB 3 possible |
| Tiny-FXL | tiny3-FXL.epub | tiny-fxl-epub3 | tiny3-FXL.epub | A minimalist fixed-layout EPUB 3 |
| Tiny-SVG | tiny3-SVG.epub | tiny-svg-epub3 | tiny3-SVG.epub | A minimalist EPUB 3 with SVG |
| Tiny-RTL | tiny3-RTL.epub | tiny-rtl-epub3 | tiny3-RTL.epub | A minimalist EPUB 3 with RTL text |
| Alice3 | alice3.epub | alice3-source | alice.epub | A simple, basic epub |