Which parsers should be implemented and how? #117

Closed
qnga opened this issue Feb 17, 2020 · 4 comments

@qnga
Contributor

qnga commented Feb 17, 2020

Currently supported in Kotlin:

  • Readium AudioBook "packaged" in a directory (not an archive; the path given to PublicationParser points to the directory containing the manifest)
  • Cbz
  • DiViNa "packaged" in a directory (like Audiobook)
  • Epub

Currently supported in Swift:

  • Cbz
  • Epub
  • Readium Manifest (the path given to PublicationParser.parse is the manifest URL).
  • PDF and LCPDF

Remarks

Ideas

  • Container type (archive, directory, no container) should be decoupled from format type since packaging is not (always) uniquely defined by the format. Mickaël suggests using the composition pattern to handle access to resources via different methods.
  • Ideally, when a manifest URL is given to an app, the latter should offer the possibility to download all resources for offline reading, and so internally "package" the publication. So, either publication (re)making tools should be exposed to apps, or a packager should be available in the streamer or in shared.
  • PublicationParser, with the internal parser factory that I suggested for the streamer API (Clarify and refine the streamer API on mobile platforms #116), should check the file extension and maybe sniff the file to determine which container and which parser have to be used.
  • The following parsers should be available: W3CManifest, W3CAudiobook, ReadiumManifest (DiViNa, Readium Audiobook, Lcpdf), WildPackage (Cbz and zipped audiobooks without manifest, but maybe with a tracklist), Epub, Pdf
  • The following packaging schemes should be supported: Readium Packaged Publication, Epub, WildPackage, Pdf, Lightweight Packaging Format
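The extension check and sniffing step mentioned above could be sketched roughly as follows, in Python for brevity. This is not the Readium API: the function name `sniff`, the returned labels, and the assumption that a packaged manifest is stored as `manifest.json` at the archive root are all hypothetical. It relies only on well-known signatures: PDF files start with `%PDF-`, and an EPUB archive carries a `mimetype` entry containing `application/epub+zip`.

```python
import io
import os
import zipfile
from typing import Tuple


def sniff(name: str, data: bytes) -> Tuple[str, str]:
    """Guess (container, format) from a file name and its raw bytes.

    Labels are illustrative only, not Readium identifiers.
    """
    ext = os.path.splitext(name)[1].lower()

    # A bare PDF has no container at all.
    if data.startswith(b"%PDF-"):
        return ("none", "pdf")

    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
            # EPUB declares itself through its "mimetype" entry.
            if "mimetype" in names:
                if archive.read("mimetype").strip() == b"application/epub+zip":
                    return ("zip", "epub")
            # Assumption: a packaged Readium publication ships manifest.json.
            if "manifest.json" in names:
                return ("zip", "readium-manifest")
        # No manifest: fall back on the extension, else treat as a wild package.
        return ("zip", "cbz" if ext == ".cbz" else "wild-package")

    # Content gave no answer; the extension is the only remaining hint.
    return ("unknown", ext.lstrip(".") or "unknown")
```

Content is checked before the extension here so that a misnamed file is still routed to the right parser; the extension only breaks ties between archives that have no manifest.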
@HadrienGardeur

> Container type (archive, directory, no container) should be decoupled from format type since packaging is not (always) uniquely defined by the format. Mickaël suggests using the composition pattern to handle access to resources via different methods.

👍 to that.

> Ideally, when a manifest URL is given to an app, the latter should offer the possibility to download all resources for offline reading, and so internally "package" the publication. So, either publication (re)making tools should be exposed to apps, or a packager should be available in the streamer or in shared.

It makes sense in the context of "Web Publications" (where we consume a manifest) but can be tricky for a number of reasons:

  • the publication can be updated
  • if the publication is "protected" we might need to store its resources in a "protected" space instead of a generic container/package

I think that this is a "good to have" feature, but shouldn't be our priority right now.

> The following parsers should be available: W3CManifest, W3CAudiobook, ReadiumManifest (DiViNa, Readium Audiobook, Lcpdf), WildPackage (Cbz and zipped audiobooks without manifest, but maybe with a tracklist), Epub, Pdf

W3CManifest and W3CAudiobook are the same thing since they both use the Publication Manifest.

> The following packaging schemes should be supported: Readium Packaged Publication, Epub, WildPackage, Pdf, Lightweight Packaging Format

Do we really need package-level features for PDF? Are you talking more specifically about LCPDF?

It's worth pointing out that, aside from PDF, these are all ZIP-based containers, which might share a lot of utilities between them.

@mickael-menu
Member

mickael-menu commented Feb 18, 2020

> Container type (archive, directory, no container) should be decoupled from format type since packaging is not (always) uniquely defined by the format. Mickaël suggests using the composition pattern to handle access to resources via different methods.

I meant the composite pattern. For example, we could have some low-level Container implementations for HTTP, ZIP, the file system, etc., and some higher-level implementations that combine them as needed, such as a CompositeContainer which would take several sub-containers and route read access depending on the requested resource.

So a CBZ parser would directly return a ZIPContainer, but for more complicated resources (webpub), it might return a CompositeContainer composed of an HTTPContainer and a FileContainer for some local resources. This could also be used to implement caching at the container level.

Using this, we could also expose "virtual" resources such as the serialized manifest.json, media-overlays, position-lists, etc. without having to add custom routes to the HTTP server (which wouldn't work for publications not served through the server anyway).

Thanks to the composite pattern, it's very easy to extend without modifying the existing classes, and the outside only sees a single Container interface.
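A minimal sketch of the composite idea described above, written in Python for brevity rather than Kotlin or Swift. The class names mirror those used in this comment, but the `read` signature, the "first sub-container that has the resource wins" routing rule, and the `InMemoryContainer` stand-in for real ZIP, HTTP, or file-system containers are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class Container(ABC):
    """The single interface callers see, whatever the underlying storage."""

    @abstractmethod
    def read(self, path: str) -> Optional[bytes]:
        """Return the resource bytes, or None if this container lacks it."""


class InMemoryContainer(Container):
    """Hypothetical leaf standing in for a ZIP, HTTP, or file container."""

    def __init__(self, resources: Dict[str, bytes]):
        self._resources = dict(resources)

    def read(self, path: str) -> Optional[bytes]:
        return self._resources.get(path)


class CompositeContainer(Container):
    """Routes each read to the first sub-container holding the resource."""

    def __init__(self, children: List[Container]):
        self._children = list(children)

    def read(self, path: str) -> Optional[bytes]:
        for child in self._children:
            data = child.read(path)
            if data is not None:
                return data
        return None


# A "virtual" manifest, a local cache, and a remote source behind one interface.
virtual = InMemoryContainer({"manifest.json": b"{}"})
local = InMemoryContainer({"cover.jpg": b"local-cover"})
remote = InMemoryContainer({"cover.jpg": b"remote-cover",
                            "chapter1.html": b"<p>Hi</p>"})
publication = CompositeContainer([virtual, local, remote])
```

Because a `CompositeContainer` is itself a `Container`, composites can be nested, and listing the local container before the remote one gives a simple form of the container-level caching mentioned above.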

> Ideally, when a manifest URL is given to an app, the latter should offer the possibility to download all resources for offline reading, and so internally "package" the publication. So, either publication (re)making tools should be exposed to apps, or a packager should be available in the streamer or in shared.

I'm not sure, but wouldn't this cause copyright issues? Or security ones if the web content was protected with authentication?

@qnga
Contributor Author

qnga commented Feb 18, 2020

> W3CManifest and W3CAudiobook are the same thing since they both use the Publication Manifest.

I was not sure Audiobook was a proper subset of Manifest both in definition and processing, but maybe it is.

> Do we really need package-level features for PDF? Are you talking more specifically about LCPDF?

Shouldn't simple PDF files be supported? I guess that is very easy to implement.

> I'm not sure, but wouldn't this cause copyright issues? Or security ones if the web content was protected with authentication?

I think many books would allow such behaviour. Users would be responsible for using this tool legally, as with any piece of software (including book reading). However, there are indeed some technical limitations and I agree this is not a priority. I just wanted to keep it in mind for design purposes.

@mickael-menu
Member

This is addressed in this new proposal: Streamer API.

3 participants