A library for inspecting Zhooks, Ochooks and EPUB ebooks, and converting between them.
Invented by Inventive Labs. Released under the MIT license.
More info: http://ochook.org/peregrin
Ruby, at least 1.8.x.
You must have ImageMagick installed — specifically, you must have the 'convert' utility provided by ImageMagick somewhere in your PATH.
Required Ruby gems:
Peregrin from the command-line
You can use Peregrin to inspect a Zhook, Ochook or EPUB file from the command-line. It will perform very basic validation of the file and output an analysis.
    $ peregrin strunk.epub
    [EPUB]

    Cover
      images/cover.png

    Components
      cover.xml
      title.xml
      about.xml
      main0.xml
      main1.xml
      main2.xml
      main3.xml
      main4.xml
      main5.xml
      main6.xml

    Resources
      css/main.css
      images/cover.png

    Chapters
      - Title
      - About
      - Chapter 1 - Introductory
      - Chapter 2 - Elementary Rules of Usage
      - Chapter 3 - Elementary Principles of Composition
      - Chapter 4 - A Few Matters of Form
      - Chapter 5 - Words and Expressions Commonly Misused
      - Chapter 6 - Words Commonly Misspelled

    Properties
      title: The Elements of Style
      identifier: urn:uuid:6f82990c-9394-11df-920d-001cc0a62c0b
      language: en
      creator: William Strunk Jr.
      subject: Non-Fiction
Note that file type detection is quite naive — it just uses the path extension, and if the extension is not .zhook or .epub, it assumes the path is an Ochook directory.
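The detection rule above can be sketched in a few lines of Ruby. This is a hypothetical helper (`detect_ebook_format` is not part of Peregrin) that only mirrors the rule as documented; it returns class names as strings so it runs without the gem installed, and its exact string comparison assumes the extension check is case-sensitive.

```ruby
# Hypothetical sketch of the documented extension-based detection rule.
# Not Peregrin's own code; returns the name of the format class to use.
def detect_ebook_format(path)
  case File.extname(path)
  when '.zhook' then 'Peregrin::Zhook'
  when '.epub'  then 'Peregrin::Epub'
  else 'Peregrin::Ochook' # anything else is assumed to be an Ochook directory
  end
end

puts detect_ebook_format('strunk.epub')      # prints Peregrin::Epub
puts detect_ebook_format('strunk.zhook')     # prints Peregrin::Zhook
puts detect_ebook_format('strunk')           # prints Peregrin::Ochook
puts detect_ebook_format('strunk.epub.zip')  # prints Peregrin::Ochook
```

As the last call shows, only the final extension counts: a zipped EPUB falls through to the Ochook case.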
You can also use Peregrin to convert from one format to another. Just provide two paths to the utility; it will convert from the first to the second.
    $ peregrin strunk.epub strunk.zhook
    [Zhook]

    Cover
      cover.png

    Components
      index.html

    Resources
      css/main.css
      cover.png

    Chapters
      - Title
      - About
      - Chapter 1 - Introductory
      - Chapter 2 - Elementary Rules of Usage
      - Chapter 3 - Elementary Principles of Composition
      - Chapter 4 - A Few Matters of Form
      - Chapter 5 - Words and Expressions Commonly Misused
      - Chapter 6 - Words Commonly Misspelled

    Properties
      title: The Elements of Style
      identifier: urn:uuid:6f82990c-9394-11df-920d-001cc0a62c0b
      language: en
      creator: William Strunk Jr.
      subject: Non-Fiction
The three formats are represented in the Peregrin::Epub, Peregrin::Zhook and Peregrin::Ochook classes. Each format class responds to the following methods:
- read(path) - creates an instance of the class from the path
- new(book) - creates an instance of the class from a Peregrin::Book
Each instance of a format class responds to the following methods:
- to_book(options) - returns a Peregrin::Book object
- write(path) - writes the ebook to the path
Here's what a conversion routine might look like:
    require 'peregrin'

    # Read a Zhook, turn it into a Book, then write that Book out as an EPUB
    zhook = Peregrin::Zhook.read('foo.zhook')
    epub = Peregrin::Epub.new(zhook.to_book(:componentize => true))
    epub.write('foo.epub')
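Because every format class shares the same read/new/to_book/write interface, the conversion routine generalizes to any pair of formats. The following is a minimal sketch: `convert_ebook` is a hypothetical helper, not part of Peregrin, and it assumes nothing beyond the methods listed above.

```ruby
# Hypothetical helper (not part of Peregrin). It relies only on the
# read / to_book / new / write methods described above, so any pair of
# format classes can be passed in.
def convert_ebook(from_class, from_path, to_class, to_path, options = {})
  book = from_class.read(from_path).to_book(options) # source format -> Book
  ebook = to_class.new(book)                         # Book -> target format
  ebook.write(to_path)
  ebook
end

# Usage with the gem installed (not run here):
#   require 'peregrin'
#   convert_ebook(Peregrin::Epub, 'strunk.epub', Peregrin::Zhook, 'strunk.zhook')
```

Passing the classes in, rather than hard-coding them, keeps the helper free of format-specific logic; the classes themselves could be chosen from the file extensions, as the command-line tool does.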
Between the three supported formats, there is an abstracted concept of "book" data, which holds the following information:
- components - an array of Components that make up the linear content
- chapters - an array of Chapters (with title, src and children)
- properties - an array of Property metadata tuples (key/value + attributes)
- resources - an array of Resources contained in the ebook, other than components
- cover - the Resource that should be used as the cover of the ebook
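To make the shape of this data concrete, here is a minimal sketch using plain Ruby Structs. The class names and any fields beyond those listed above (such as `contents` and `media_type`) are assumptions for illustration, not Peregrin's actual `Peregrin::Book` API, and the sample book is made up. The recursive `chapter_outline` helper shows how each Chapter's nested `children` form a table of contents, much like the Chapters section of the command-line analysis.

```ruby
# Hypothetical stand-ins for the "book" interchange data. Peregrin's real
# classes may use different names and fields.
Component = Struct.new(:src, :contents)           # a piece of linear content
Chapter   = Struct.new(:title, :src, :children)   # children: nested Chapters
Property  = Struct.new(:key, :value, :attributes) # key/value + attributes
Resource  = Struct.new(:src, :media_type)         # media_type is an assumed field
Book      = Struct.new(:components, :chapters, :properties, :resources, :cover)

# Walks the chapter tree depth-first and returns one indented line per chapter.
def chapter_outline(chapters, depth = 0, lines = [])
  chapters.each do |chapter|
    lines << "#{'  ' * depth}- #{chapter.title}"
    chapter_outline(chapter.children, depth + 1, lines)
  end
  lines
end

cover = Resource.new('images/cover.png', 'image/png')
book = Book.new(
  [Component.new('part1.xml', '<html>...</html>'),
   Component.new('part2.xml', '<html>...</html>')],
  [Chapter.new('Part One', 'part1.xml', [
     Chapter.new('Section 1.1', 'part1.xml#s1', []),
     Chapter.new('Section 1.2', 'part1.xml#s2', [])
   ]),
   Chapter.new('Part Two', 'part2.xml', [])],
  [Property.new('title', 'Sample Book', {}),
   Property.new('language', 'en', {})],
  [Resource.new('css/main.css', 'text/css'), cover],
  cover
)

puts chapter_outline(book.chapters)
# - Part One
#   - Section 1.1
#   - Section 1.2
# - Part Two
```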
There will probably be some changes to the shape of this data as Peregrin develops, to ensure that the Book interchange object retains all relevant information about an ebook without loss. For the moment, though, it's being kept as simple as possible.
All this rhyming on "ook" put me in mind of the Took family. There is no deeper meaning.
- Metadata files like OPF, OCX now first-class citizens called 'blueprints'
- Page progression direction from EPUB3 (@nono)
- Fixed-layout attributes for components (@nono)
- Basic EPUB3 and EPUB fixed-layout read support (@klacointe)