Which parsers should be implemented and how? #117

qnga · 2020-02-17T17:09:44Z

Currently supported in Kotlin:

Readium Audiobook "packaged" in a directory (not an archive; the path given to PublicationParser points to the directory containing the manifest)
Cbz
DiViNa "packaged" in a directory (like Audiobook)
Epub

Currently supported in Swift:

Cbz
Epub
Readium Manifest (the path given to PublicationParser.parse is the manifest URL).
PDF and LCPDF

Remarks

Laurent has requested a W3C manifest parser (Create a W3C WebPub Manifest parser kotlin-toolkit#197)
Readium Audiobook now has a W3C competitor (https://w3c.github.io/audiobooks/)
Packaged publications can have remote resources, so the boundary between HTTPContainer and ArchiveContainer is blurred.

Ideas

Container type (archive, directory, no container) should be decoupled from format type since packaging is not (always) uniquely defined by the format. Mickaël suggests using the composition pattern to handle access to resources via different methods.
Ideally, when a manifest URL is given to an app, the latter should offer the possibility to download all resources for offline reading, and so internally "package" the publication. So, either publication (re)making tools should be exposed to apps, or a packager should be available in the streamer or in shared.
PublicationParser, with the internal parser factory that I suggested for the streamer API (Clarify and refine the streamer API on mobile platforms #116), should check the file extension and maybe sniff the file to determine which container and which parser have to be used.
The following parsers should be available: W3CManifest, W3CAudiobook, ReadiumManifest (DiViNa, Readium Audiobook, Lcpdf), WildPackage (Cbz and zipped audiobooks without manifest, but maybe with a tracklist), Epub, Pdf
The following packaging schemes should be supported: Readium Packaged Publication, Epub, WildPackage, Pdf, Lightweight Packaging Format
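
The extension check plus sniffing idea above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the ContainerKind, ParserKind and Detection types, the detect function, and the list of extensions treated as packaged Readium publications are hypothetical and not part of the existing streamer. It only shows how container type and parser can be chosen as two separate results.

```kotlin
// Hypothetical types for illustration; none of them exist in the Readium toolkits.
enum class ContainerKind { ARCHIVE, DIRECTORY, NONE }
enum class ParserKind { EPUB, PDF, READIUM_MANIFEST, WILD_PACKAGE }

data class Detection(val container: ContainerKind, val parser: ParserKind)

// First bytes of a ZIP local file header and of a PDF file.
private val ZIP_MAGIC = byteArrayOf(0x50, 0x4B, 0x03, 0x04)
private val PDF_MAGIC = "%PDF-".toByteArray(Charsets.US_ASCII)

// Extensions assumed here for packaged Readium publications.
private val READIUM_PACKAGE_EXTENSIONS = setOf("audiobook", "divina", "webpub", "lcpdf")

private fun ByteArray.hasPrefix(prefix: ByteArray): Boolean =
    size >= prefix.size && prefix.indices.all { this[it] == prefix[it] }

/**
 * Picks a container kind and a parser kind from the file name, then checks
 * that the first bytes of the file ([head]) agree with that guess.
 * Returns null when nothing matches.
 */
fun detect(name: String, head: ByteArray, isDirectory: Boolean = false): Detection? {
    // A directory is assumed to hold a Readium manifest next to its resources.
    if (isDirectory) return Detection(ContainerKind.DIRECTORY, ParserKind.READIUM_MANIFEST)

    val ext = name.substringAfterLast('.', "").lowercase()
    val isZip = head.hasPrefix(ZIP_MAGIC)
    return when {
        ext == "epub" && isZip -> Detection(ContainerKind.ARCHIVE, ParserKind.EPUB)
        ext in READIUM_PACKAGE_EXTENSIONS && isZip ->
            Detection(ContainerKind.ARCHIVE, ParserKind.READIUM_MANIFEST)
        ext == "pdf" && head.hasPrefix(PDF_MAGIC) -> Detection(ContainerKind.NONE, ParserKind.PDF)
        // A bare manifest; telling a W3C manifest apart would need a look at the JSON itself.
        ext == "json" -> Detection(ContainerKind.NONE, ParserKind.READIUM_MANIFEST)
        // Any other ZIP (e.g. a CBZ) is treated as a package without manifest.
        isZip -> Detection(ContainerKind.ARCHIVE, ParserKind.WILD_PACKAGE)
        else -> null
    }
}
```

A caller would read the first few bytes of the file and pass them as head, for example detect("comic.cbz", firstBytes). A mismatch between extension and content (an .epub that is not a ZIP) yields null here; a real implementation might prefer to trust the sniffed content instead.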


HadrienGardeur · 2020-02-17T21:35:59Z

> Container type (archive, directory, no container) should be decoupled from format type since packaging is not (always) uniquely defined by the format. Mickaël suggests using the composition pattern to handle access to resources via different methods.

👍 to that.

> Ideally, when a manifest URL is given to an app, the latter should offer the possibility to download all resources for offline reading, and so internally "package" the publication. So, either publication (re)making tools should be exposed to apps, or a packager should be available in the streamer or in shared.

It makes sense in the context of "Web Publications" (where we consume a manifest) but can be tricky for a number of reasons:

the publication can be updated
if the publication is "protected" we might need to store its resources in a "protected" space instead of a generic container/package

I think that this is a "good to have" feature, but shouldn't be our priority right now.

> The following parsers should be available: W3CManifest, W3CAudiobook, ReadiumManifest (DiViNa, Readium Audiobook, Lcpdf), WildPackage (Cbz and zipped audiobooks without manifest, but maybe with a tracklist), Epub, Pdf

W3CManifest and W3CAudiobook are the same thing since they both use the Publication Manifest.

> The following packaging schemes should be supported: Readium Packaged Publication, Epub, WildPackage, Pdf, Lightweight Packaging Format

Do we really need package-level features for PDF? Are you talking more specifically about LCPDF?

It's worth pointing out that aside from PDF, these are all ZIP based containers which might share a lot of utilities between them.
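
As a rough illustration of the kind of utility the ZIP-based formats could share, here is a minimal sketch built only on java.util.zip from the JDK. The function names readZipEntry and zipOf are made up for this example, and the archive is held in memory to keep the sketch short; a real implementation would more likely work on files or streams.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.util.zip.ZipEntry
import java.util.zip.ZipInputStream
import java.util.zip.ZipOutputStream

/** Returns the bytes stored at [path] inside the ZIP archive [zip], or null if there is no such entry. */
fun readZipEntry(zip: ByteArray, path: String): ByteArray? {
    ZipInputStream(ByteArrayInputStream(zip)).use { input ->
        var entry: ZipEntry? = input.nextEntry
        while (entry != null) {
            // readBytes() stops at the end of the current entry.
            if (!entry.isDirectory && entry.name == path) return input.readBytes()
            entry = input.nextEntry
        }
    }
    return null
}

/** Helper for trying the function out: builds an in-memory ZIP from path/content pairs. */
fun zipOf(vararg files: Pair<String, String>): ByteArray {
    val out = ByteArrayOutputStream()
    ZipOutputStream(out).use { zos ->
        for ((name, content) in files) {
            zos.putNextEntry(ZipEntry(name))
            zos.write(content.toByteArray(Charsets.UTF_8))
            zos.closeEntry()
        }
    }
    return out.toByteArray()
}
```

The same readZipEntry call could then serve an EPUB parser looking for its mimetype entry and a Readium package parser looking for manifest.json, which is the sort of reuse suggested above.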

mickael-menu · 2020-02-18T09:11:51Z

> Container type (archive, directory, no container) should be decoupled from format type since packaging is not (always) uniquely defined by the format. Mickaël suggests using the composition pattern to handle access to resources via different methods.

I meant the composite pattern. For example, we could have some low-level Container implementations for HTTP, ZIP, file system, etc. and some higher-level implementations to coalesce them depending on the needs, such as a CompositeContainer which would take several sub-containers and route read access depending on the requested resource.

So a CBZ parser would directly return a ZIPContainer, but for more complicated resources (webpub), it might return a CompositeContainer composed of an HTTPContainer and a FileContainer for some local resources. This could also be used to implement caching at the container level.
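
A minimal sketch of that composite idea follows. The Container interface, its read signature and the MapContainer stand-in are assumptions for illustration only, not the toolkit's API. The routing rule chosen here (ask each sub-container in order, first hit wins) is just one possible strategy.

```kotlin
// Hypothetical interface; the real Container API may look quite different.
interface Container {
    /** Returns the resource at [path], or null if this container does not hold it. */
    fun read(path: String): ByteArray?
}

/** Stand-in for a leaf container (ZIP, file system, HTTP...), backed by a map. */
class MapContainer(private val entries: Map<String, String>) : Container {
    override fun read(path: String): ByteArray? = entries[path]?.toByteArray(Charsets.UTF_8)
}

/** Routes each read to the first sub-container that holds the requested resource. */
class CompositeContainer(private val children: List<Container>) : Container {
    override fun read(path: String): ByteArray? {
        for (child in children) {
            child.read(path)?.let { return it }
        }
        return null
    }
}
```

With a local container listed before a remote one, CompositeContainer(listOf(local, remote)) serves a resource from local when it is present there and falls back to remote otherwise, while callers only ever see a Container.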

Using this, we could also expose "virtual" resources such as the serialized manifest.json, media-overlays, position-lists, etc. without having to add custom routes to the HTTP server (which wouldn't work for publications not served through the server anyway).

Thanks to the composite pattern, it's very easy to extend without modifying the existing classes, and the outside only sees a single Container interface.
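
To illustrate the extension point, the sketch below adds "virtual" resources and caching as wrappers around any existing container. All names (Container, MapContainer, VirtualOverlay, CachingContainer) are hypothetical, and the manifest content is a placeholder. The point is only that both features are new classes layered on top, with no change to the wrapped container.

```kotlin
// Hypothetical interface, redeclared so that this sketch stands on its own.
interface Container {
    fun read(path: String): ByteArray?
}

/** Stand-in for a real leaf container (ZIP, file system, HTTP...). */
class MapContainer(private val entries: Map<String, String>) : Container {
    override fun read(path: String): ByteArray? = entries[path]?.toByteArray(Charsets.UTF_8)
}

/** Serves generated resources first and delegates every other path to [base]. */
class VirtualOverlay(
    private val base: Container,
    private val generators: Map<String, () -> ByteArray>
) : Container {
    override fun read(path: String): ByteArray? =
        generators[path]?.invoke() ?: base.read(path)
}

/** Remembers resources that were already read, so [source] is asked only once per path. */
class CachingContainer(private val source: Container) : Container {
    private val cache = HashMap<String, ByteArray>()

    override fun read(path: String): ByteArray? {
        cache[path]?.let { return it }
        return source.read(path)?.also { cache[path] = it }
    }
}
```

For example, CachingContainer(VirtualOverlay(zipContainer, mapOf("manifest.json" to { serializedManifest }))) would expose a generated manifest.json next to the packaged files, where zipContainer and serializedManifest are whatever the parser already has at hand. Missing resources are not cached in this sketch, so repeated misses still reach the source.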

> Ideally, when a manifest URL is given to an app, the latter should offer the possibility to download all resources for offline reading, and so internally "package" the publication. So, either publication (re)making tools should be exposed to apps, or a packager should be available in the streamer or in shared.

I'm not sure, but wouldn't this cause copyright issues? Or security ones if the web content was protected with authentication?

qnga · 2020-02-18T09:36:03Z

> W3CManifest and W3CAudiobook are the same thing since they both use the Publication Manifest.

I was not sure Audiobook was a proper subset of Manifest both in definition and processing, but maybe it is.

> Do we really need package-level features for PDF? Are you talking more specifically about LCPDF?

Shouldn't simple PDF files be supported? I guess it is very easy to implement.

> I'm not sure, but wouldn't this cause copyright issues? Or security ones if the web content was protected with authentication?

I think many books would allow such a behaviour. Users would be responsible for using this tool legally, as with any piece of software (including book readers). However, there are indeed some technical limitations and I agree this is not a priority. I just wanted to keep it in mind for design purposes.

mickael-menu · 2020-05-26T12:06:27Z

This is addressed in this new proposal: Streamer API.

mickael-menu mentioned this issue Feb 18, 2020

Clarify and refine the streamer API on mobile platforms #116

Closed

mickael-menu closed this as completed May 26, 2020
