The content.opf File

Mukli Krisztián edited this page Sep 29, 2016 · 7 revisions

Overview

The content.opf file is the most important part of the EPUB package, because it defines the structure of the eBook and the metadata. It is also the file that will tend to cause EPUB validation errors for newcomers, so please be careful with the syntax and markup. The OPF file is an XML document, and it uses a defined set of tags to encode data (similar to HTML) specified by the IDPF.

The content.opf contains four sections as follows:

  • Metadata Section – This section contains data about the eBook such as the title, author, and product description. The eReading devices vary in how they utilize this metadata, but certain elements are required for a valid EPUB.
  • Manifest Section – This section is a list of all the content files, media, fonts, and stylesheets used in the eBook. The files can be listed in any order. However, you should not list a file in the Manifest Section that is not in the EPUB package, and you should not leave files in the EPUB package that are not declared in the Manifest Section.
  • Spine Section – This section defines the linear reading order of the eBook. The content files should be listed from top to bottom in the order they are meant to be read, from the first page to the last.
  • Guide Section – This section contains links to the cover, beginning of the eBook, and the HTML Table of Contents. eReading devices vary widely in how this information is interpreted and rendered.

The basic XML layout of the content.opf file for the EPUB 2.0.1 standard is as follows:

<?xml version="1.0" encoding="utf-8" ?>
<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId">
  <!-- Metadata section -->
  <!-- Manifest section -->
  <!-- Spine section -->
  <!-- Guide section -->
</package>

The first statement is the XML declaration, which states that the file is an XML 1.0 document using UTF-8 encoding. UTF-8 allows for characters outside the ASCII character set, such as curly quotes, accented letters, and text in non-Latin scripts. The package element contains all four sections of the content.opf. Now, let’s discuss the XML structure section by section.

Metadata Section

Metadata is essentially data about data, and it is not actual content. Different eReading devices will render the metadata declared in this section in different ways. As an example, the Kindle Fire will display the title specified in the Metadata Section on the top of the viewport inside the eBook on every single page.

As another example, the iBooks app will list the keywords from the Metadata Section when browsing the user’s eBook library, although the Kindle completely ignores them. It is difficult to predict how the eReading devices will render this metadata in the future, so it is best to be as accurate as possible. Metadata in the EPUB package also has the potential to aid in Search Engine Optimization (SEO) to help market your eBook. However, it is unclear how that will work at this stage, because the separate metadata entered when the eBook is uploaded to the various eBook stores, rather than the metadata inside the EPUB package, seems to be what affects their algorithms.

Important Note: Much of the metadata entered in the Metadata Section such as author, title, and description has to be re-entered when uploaded to the major eBook stores. This is frustrating, but it’s the way it is right now. However, you should always enter accurate metadata in the content.opf file.

It is essential that you have a unique identifier for your eBook (a series of digits and/or letters). A Universally Unique Identifier (UUID) is a randomly generated series of numbers and letters for which the chance of the same value being generated twice is negligible. You can obtain a UUID online at no cost using the BB eBooks Meta Pad. You can also use an ISBN, if you prefer to go that route. However, this guide does not recommend using the ISBN system for eBooks, since most eBook stores do not require one. Please note that if you do use ISBNs, you need one for the EPUB and another for the MOBI/KF8 file, and neither may be the same as the ISBN of your print edition.

Important Note: Besides the IDPF guidelines on the Metadata Section, Amazon, Barnes & Noble, and iBookstore have some additional requirements on how the cover image is referenced inside the EPUB package. Fortunately, they all have the same exact requirement. This is discussed below.

Per the IDPF specification, certain metadata is required inside every single eBook. The IDPF builds on the open metadata standard from an organization called the Dublin Core Metadata Initiative, which is based in Singapore. Unfortunately, reading through the Dublin Core requirements is rather challenging. That is why this section of the guide goes into great detail to help alleviate the confusion.

Tip: There is an online metadata generator at BB Meta Pad to help you construct the metadata of your eBook and avoid errors.

Some sample XML of the Metadata Section for the EPUB 2.0.1 standard is as follows:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <!-- Required metadata -->
    <dc:title>[TITLE]</dc:title>
    <dc:creator opf:file-as="[LASTNAME, NAME]" opf:role="aut">[NAME LASTNAME]</dc:creator>
    <dc:publisher>[PUBLISHER NAME]</dc:publisher>
    <dc:date opf:event="publication">[YYYY-MM-DD]</dc:date>
    <dc:identifier id="BookId" opf:scheme="[ISBN/UUID]">urn:isbn:[ISBN]/[UUID]</dc:identifier><!--Use the Same for the toc.ncx file -->
    <dc:language>[LANGUAGE CODE]</dc:language>
    <meta name="cover" content="[COVER_NAME].jpg" /> <!--Required for KindleGen-->
    <!-- Optional metadata -->
    <dc:subject>[LIST / OF / SUBJECTS]</dc:subject><!-- Recommended at least 7 keywords (in 7 different lines) -->
    <dc:description>[DESCRIPTION]</dc:description><!-- Don't use HTML -->
    <dc:rights>Copyright © [YEAR] [COPYRIGHT HOLDER]. All rights reserved.</dc:rights>
    <dc:type>Text</dc:type><!-- Do not change -->
    <dc:source>[SOURCE URL/ISBN/UUID]</dc:source>
    <dc:relation>[RELATION URL/ISBN/UUID]</dc:relation>
    <dc:coverage>[Worldwide/Territorial]</dc:coverage>   
    <dc:contributor opf:file-as="[LASTNAME, NAME]" opf:role="[MARC CODE]">[NAME LASTNAME]</dc:contributor>
</metadata>

Required Metadata Help

eBook’s Title: This is the content within the dc:title element. If you want to file your eBook as “Adventures of Huck Finn, The” rather than “The Adventures of Huck Finn”, you can do so by entering it that way. Do not use the file-as attribute in the dc:title markup.

eBook’s Author: This is the name of the primary author, which should appear within the dc:creator element. The opf:role="aut" attribute is not required by the IDPF, but it is “suggested.” It simply specifies that the creator is the author. You should always put the author’s name in this field and never the publishing company, editor, or someone else. If you want to file the author’s name with their last name first, simply write that in the opf:file-as attribute of the dc:creator element. The opf:file-as attribute is not required and should be omitted if you want to file the eBook under the author’s first name.

eBook Publisher: Even if you are self-publishing, this metadata is required by the IDPF. If you do not have a publishing company, simply use the author’s name.

Publishing Date: This is the date the eBook is published. It should be in the YYYY-MM-DD format (e.g. 2012-07-06 for July 6th, 2012). If you make a modification to your eBook, you should always update this value.

ISBN or UUID: As mentioned previously, ISBNs are expensive and most eBook stores do not require them. However, you do have to use some sort of unique identifier. A UUID is a good solution for those who don’t want to buy an ISBN. UUIDs are randomly generated combinations of letters and numbers in an 8-4-4-4-12 format. You can generate one at the BB eBooks website using the BB Meta Pad or at Fam Kruithof's site. Please note the difference in syntax between the ISBN and UUID forms (urn:isbn: versus urn:uuid:) and feel free to cut and paste the preceding example, replacing the unique identifier with your own. If you insist upon using an ISBN, please keep in mind that the EPUB and MOBI versions each need their own. Also, ensure that you use this exact same identifier in your toc.ncx Table of Contents or your EPUB may fail validation. Consult the NCX portion of this guide for more details.
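
As a hedged illustration of the syntax difference, the two forms of the dc:identifier element might look like the following. The UUID shown is a placeholder value, not one generated for any real book, and [ISBN] stands for your own number. Use only one of the two elements in your file.

```xml
<!-- Option 1: UUID (placeholder value; generate your own) -->
<dc:identifier id="BookId" opf:scheme="UUID">urn:uuid:123e4567-e89b-12d3-a456-426614174000</dc:identifier>

<!-- Option 2: ISBN ([ISBN] is a placeholder for your own number) -->
<dc:identifier id="BookId" opf:scheme="ISBN">urn:isbn:[ISBN]</dc:identifier>
```

In either case, the id value (BookId here) is the one named by the unique-identifier attribute on the package element.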

Language: The OPF file requires the code for the eBook’s language inside the dc:language element. If you are submitting your eBook for sale, this will most likely be en-US for American English or en-GB for British English. Please consult Wikipedia for the codes of the world’s other languages.

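Meta cover: Amazon, Barnes & Noble, and the iBookstore require a reference to the cover image within the Metadata Section. The value of the content attribute must match the id of the cover image that you define in the Manifest Section. This guide calls that value My_Cover_ID; in the Manifest Section example below it is cover.jpg.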

Optional Metadata Help

Keywords: These are labels that apply to your eBook, similar to the way bloggers tag their posts. Each keyword gets placed inside its own dc:subject element. So, if your steamy romance could be described with the keywords “Romance, Steamy and Hot, Erotic, Love, Women, Caribbean, Buns”, you would place each keyword within the XML element like <dc:subject>Romance</dc:subject>, <dc:subject>Steamy and Hot</dc:subject>, etc. You can list as many as you like, but seven is generally recommended. You can use the BISAC Subject Headings List for inspiration.
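
Using the keywords from the steamy romance example, the entries could be written one per line as sketched here:

```xml
<dc:subject>Romance</dc:subject>
<dc:subject>Steamy and Hot</dc:subject>
<dc:subject>Erotic</dc:subject>
<dc:subject>Love</dc:subject>
<dc:subject>Women</dc:subject>
<dc:subject>Caribbean</dc:subject>
<dc:subject>Buns</dc:subject>
```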

Description: This is typically the backjacket description or blurb of the eBook. You should only have one long paragraph and do not enter HTML here; this will cause your EPUB to fail validation. Fancy quotes are okay, but make sure your text editor can support UTF-8 encoding.

Rights/Copyright Information: This is a standard statement on the rights of the eBook such as “All Rights Reserved” or “Public Domain”. It is not widely recognized by eReading devices, but it does no harm to add this metadata. It is placed within the dc:rights XML markup.

Type: This is from the Dublin Core metadata specification for dc:type. For eBooks, this will always be “Text”. Please note that eReading devices rarely use this metadata.

Source: This is more Dublin Core metadata, which defines the dc:source XML markup as “A Reference to a resource from which the present resource is derived.” If your eBook is part of a series or a portion of another larger periodical, you may consider adding this metadata by including an ISBN or UUID for the relevant publication. You can also reference the print version of your book by its ISBN. If your eBook is a compilation of blog posts, you can consider placing the URL of your blog here.

Relation: This is some more Dublin Core metadata that also seems rather ambiguous. The dc:relation XML markup is defined as “A reference to a related resource.” If your eBook is a spin-off from another publication, you may consider placing a URL, ISBN, or UUID. However, eReading devices rarely make use of this metadata.

Coverage: In this guide, dc:coverage states the territorial scope of your rights, which is typically “Worldwide” or “Territorial”. If your eBook is public domain, you can simply say “Public Domain.” Note that Dublin Core itself defines coverage more broadly, as the spatial or temporal scope of the resource or the jurisdiction under which it is relevant. You don’t typically see this element in eBook metadata; it is included here for completeness.

Extra Metadata Help

Contributors: With the dc:contributor element, you can add one or more entries for additional people who contributed to the eBook publication. This can include an illustrator, editor, and even the rubricator (we’re not sure what that guy does). You state what their contribution is with the opf:role attribute, whose value is a three-letter code indicating the nature of the contribution; for example, edt is the code for an editor. The three-letter codes come from the MARC Code List for Relators, which is maintained by the United States Library of Congress. As with the dc:creator element, you can add the opf:file-as attribute if you want to file the last name first.
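
A sketch of two contributor entries follows. The names are hypothetical; edt and ill are the MARC relator codes for editor and illustrator.

```xml
<!-- Hypothetical names, for illustration only -->
<dc:contributor opf:file-as="Doe, Jane" opf:role="edt">Jane Doe</dc:contributor>
<dc:contributor opf:file-as="Roe, Richard" opf:role="ill">Richard Roe</dc:contributor>
```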

Manifest Section

The Manifest Section is a listing of all HTML, CSS, media files, and other assets inside the EPUB package. It is divided into self-closing, individual XML elements denoted as item. For each item element there are three required attributes:

  • href – specifies the relative path from the content.opf location to your asset
  • id – a unique identifier in the content.opf file. Each id value should follow the same syntax conventions as id values in HTML (i.e. must start with a letter, must not have special characters, and it must be unique).
  • media-type – the MIME Type (or Internet Media Type) of the asset.

The media-type value specifies the file format of the asset, following the same standard used by email clients and websites. The MIME Type (now called the “Internet Media Type”) is the convention utilized in the EPUB standard, and the official registry of media types is maintained by IANA. Below is a list of MIME Types commonly used in eBook production for your convenience:

  • toc.ncx Meta TOC File – application/x-dtbncx+xml
  • .html Content files – application/xhtml+xml
  • .css Stylesheets – text/css
  • .jpg, .jpeg, or .jpe images – image/jpeg
  • .png images – image/png
  • .gif images – image/gif
  • .svg images – image/svg+xml
  • .ttf True Type Fonts – font/truetype
  • .otf OpenType Fonts – font/opentype
  • .mp3 Audio file – audio/mpeg
  • .mp4 Video File – video/mp4

An example listing of the Manifest Section is below. While the values of the id attributes are arbitrary, you should assign them based on some sort of naming convention so that your XML is human-readable. For example, it is much easier to assign your first HTML content file with an id of content001 rather than xsd324-sd2784f or something else that is meaningless to human eyes.

Important Note: Recall that everything in the content.opf file is case-sensitive.

<manifest>
    <item href="Images/cover.jpg" id="cover.jpg" media-type="image/jpeg" />
    <item href="Text/cover.xhtml" id="cover" media-type="application/xhtml+xml" />
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
    <item href="Styles/style.css" id="css" media-type="text/css" />
    <item href="Text/title-page.xhtml" id="title-page" media-type="application/xhtml+xml" />
    <item href="Text/colophon.xhtml" id="colophon" media-type="application/xhtml+xml" />
    <item href="Text/dedication.xhtml" id="dedication" media-type="application/xhtml+xml" />
    <item href="Text/epigraph.xhtml" id="epigraph" media-type="application/xhtml+xml" />
    <item href="Text/toc.xhtml" id="toc" media-type="application/xhtml+xml" />
    <item href="Text/foreword.xhtml" id="foreword" media-type="application/xhtml+xml" />
    <item href="Text/preface.xhtml" id="preface" media-type="application/xhtml+xml" />
    <item href="Text/acknowledgements.xhtml" id="acknowledgements" media-type="application/xhtml+xml" />
    <item href="Text/chapter01.xhtml" id="chapter01" media-type="application/xhtml+xml" />
    <item href="Text/chapter02.xhtml" id="chapter02" media-type="application/xhtml+xml" />
    <item href="Text/chapter03.xhtml" id="chapter03" media-type="application/xhtml+xml" />
    <item href="Text/glossary.xhtml" id="glossary" media-type="application/xhtml+xml" />
    <item href="Text/bibliography.xhtml" id="bibliography" media-type="application/xhtml+xml" />    
    <item href="Text/index.xhtml" id="index" media-type="application/xhtml+xml" />
    <item href="Text/loi.xhtml" id="loi" media-type="application/xhtml+xml" />
    <item href="Text/lot.xhtml" id="lot" media-type="application/xhtml+xml" />
    <item href="Text/notes.xhtml" id="notes" media-type="application/xhtml+xml" />
    <item href="Text/copyright-page.xhtml" id="copyright-page" media-type="application/xhtml+xml" />
</manifest>

This Manifest Section is fairly standard. It contains the HTML content broken into some separate pieces along with some additional media files. You will notice that the href attribute uses relative paths to declare where the files are located. Recall that this guide uses the directories Text, Styles, Images, and Fonts as sub-directories of the OEBPS folder. You can use an alternative directory structure if you wish, but try to maintain consistency across different eBook projects. Bad links inside the Manifest Section are a common source of errors and will result in failed EPUB validation.

You will also notice that the cover.xhtml file is declared in the manifest. Ensure that this is removed if your EPUB will be the source for the MOBI/KF8 compilation with KindleGen.

The cover.jpg file must have an id value that matches the content attribute of the cover meta element in the Metadata Section (the value this guide calls My_Cover_ID). For example, if the Metadata Section contains <meta name="cover" content="cover.jpg" />, then the cover image item in the Manifest Section must have id="cover.jpg". This ensures that Kindle, Nook, and iBooks users can access the cover directly from their devices, and it is the image displayed in the reader’s eBook library.
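
Put side by side, the two entries that must agree look like this, using the cover.jpg id from the examples in this guide:

```xml
<!-- In the Metadata Section: content names the id of the cover image item -->
<meta name="cover" content="cover.jpg" />

<!-- In the Manifest Section: the id matches the content value above -->
<item href="Images/cover.jpg" id="cover.jpg" media-type="image/jpeg" />
```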

Tip: This guide uses an indentation scheme for XML, but you are not required to use one. However, it certainly helps with readability.

Important Note: The order in which you list the assets in the Manifest Section does not matter. The order of the eBook is defined in the Spine Section.

Spine Section

The Spine Section specifies the exact linear order of the eBook. It is analogous to the “spine” of a print book: the first item listed is the start of the eBook, while the last item listed is the back of the eBook. This section is constructed entirely of self-closing itemref XML elements. The only required attribute is idref, whose value must match the id attribute of an item in the Manifest Section.

Below is a sample Spine Section that correlates to the Manifest Section example above:

<spine toc="ncx">
    <itemref idref="cover" /> <!-- Remove for Kindle -->
    <!-- Front matter -->
    <itemref idref="title-page" />
    <itemref idref="colophon" />
    <itemref idref="dedication" />
    <itemref idref="epigraph" />
    <itemref idref="toc" />
    <itemref idref="foreword" />
    <itemref idref="preface" />
    <itemref idref="acknowledgements" />
    <!-- Body matter -->
    <itemref idref="chapter01" />
    <itemref idref="chapter02" />
    <itemref idref="chapter03" />
    <!-- Back matter -->
    <itemref idref="glossary" />
    <itemref idref="bibliography" />
    <itemref idref="index" />
    <itemref idref="loi" />
    <itemref idref="lot" />
    <itemref idref="notes" />
    <itemref idref="copyright-page" />
</spine>

The opening toc="ncx" attribute for the spine element is required. This defines the NCX Table of Contents for the eBook, and the ncx value is the id declared for the toc.ncx file in the Manifest Section. As you can see from this example, when the reader goes to the beginning of the eBook, they will be at the cover.xhtml file. As the reader keeps paging down, they will cycle through title-page.xhtml, colophon.xhtml, and dedication.xhtml, until finally they get to the last piece of content in copyright-page.xhtml. Most eReading devices insert an automatic page break as they jump from one itemref element to the next.

eReading devices utilize the Spine Section to build the reading order of the eBook, so making a mistake here can be rather embarrassing (e.g. making the chapters appear in the wrong order). Please exercise caution when constructing the Spine Section. Labeling your id attributes in a logical order in the Manifest Section can be extremely helpful.

Tip: While not required, you can add the attribute linear="no" to any of the itemref elements. This means that the section will be skipped if the reader is paging through the eBook. However, you can permit access to the content by creating a hyperlink. This may be a useful feature if you want to create an educational eBook with hidden answer keys. The IDPF standard has an example.
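
As a minimal sketch of this tip, assume a hypothetical answer-key file declared in the Manifest Section with the id answer-key (neither the file nor the id appears in the examples above):

```xml
<!-- Manifest: hypothetical answer-key file -->
<item href="Text/answer-key.xhtml" id="answer-key" media-type="application/xhtml+xml" />

<!-- Spine: skipped during normal paging, but still reachable through a hyperlink -->
<itemref idref="answer-key" linear="no" />
```

A hyperlink to answer-key.xhtml from one of the chapter files would then give the reader access to it.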

Guide Section

The Guide Section provides extra metadata that declares target locations for the extra buttons that are available on some eReaders such as “Cover” or “Beginning”. It was supposed to be a way for eBook designers to annotate where commonly used sections such as footnotes, the bibliography, and index were located in the HTML, so that the reader could have easy access to them on her device. A full list of the type attribute values you can define in the Guide Section is available in the IDPF EPUB 2.0.1 standard. Unfortunately, the interpretation of the XML in the Guide Section varies widely from device to device. Due to the confusion, EPUB 3 deprecates this section in favor of the landmarks list in the EPUB Navigation Document.

Due to the poor adoption of the standards laid out by the IDPF for this section, this guide recommends using only three XML entries (and only two for Kindle-source EPUBs), but we specify every available type in the starter kit file:

<guide>
    <reference href="coverpage.html" type="cover" title="Cover" /> <!-- Remove for Kindle -->
    <reference href="content/htmltoc.html" type="toc" title="Table of Contents" />
    <reference href="content/content002.html" type="text" title="Beginning" />
</guide>

In this example of the Guide Section, clicking “Cover” in the eReader would go to coverpage.html, clicking “Table of Contents” would go to htmltoc.html, and clicking “Beginning” would go to the first part of the story after the front matter (i.e. content002.html). Note that these file names come from a different directory layout than the Manifest Section example above; in your own EPUB, each href must point to a file declared in the Manifest Section. Also, for the Kindle, when the reader opens the MOBI/KF8 file for the first time, they will automatically start where “Beginning” is defined.
