research: ebook format conversion engine (ebook-convert parity) #211

@forkwright

Description

Harmonia has no format conversion capability. The format quality scoring (#165) decides which format is "best" for a given book but does not produce alternate formats: if a library holds epub-only and a user wants mobi for a device, no path exists.
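
To make the gap concrete, here is a minimal sketch of the lookup a conversion feature would add. Given the formats a library already holds and the format a device needs, it picks a direct conversion from the candidate v1 pairs listed under research question 1 below. The `Format` and `Plan` types and the `plan` function are hypothetical, not existing harmonia code, and the pair list is the proposed scope, not a decided matrix.

```rust
use std::collections::HashSet;

/// Formats named in this issue (hypothetical type, not existing harmonia code).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Format {
    Epub,
    Mobi,
    Azw3,
    Pdf,
    Docx,
    Odt,
}

/// Candidate v1 pairs from research question 1, as (source, target).
/// Assumption: this is the proposed scope, not a decided matrix.
const V1_PAIRS: &[(Format, Format)] = &[
    (Format::Epub, Format::Mobi),
    (Format::Mobi, Format::Epub),
    (Format::Epub, Format::Azw3),
    (Format::Azw3, Format::Epub),
    (Format::Pdf, Format::Epub), // best-effort
    (Format::Epub, Format::Pdf),
    (Format::Docx, Format::Epub),
    (Format::Odt, Format::Epub),
];

/// Outcome of trying to serve `target` from what the library holds.
#[derive(Debug, PartialEq, Eq)]
pub enum Plan {
    AlreadyHeld,
    Convert { from: Format, to: Format },
    NoPath,
}

/// Pick the first direct v1 conversion whose source format is already held.
pub fn plan(held: &HashSet<Format>, target: Format) -> Plan {
    if held.contains(&target) {
        return Plan::AlreadyHeld;
    }
    match V1_PAIRS
        .iter()
        .find(|(from, to)| *to == target && held.contains(from))
    {
        Some(&(from, to)) => Plan::Convert { from, to },
        None => Plan::NoPath,
    }
}
```

With an epub-only library and a mobi target, `plan` returns `Convert { from: Epub, to: Mobi }`, which is exactly the path missing today. It only considers direct pairs: a pdf-only book targeting mobi returns `NoPath` even though pdf→epub→mobi could be chained, and ranking candidate sources by the #165 quality score would be a natural refinement.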

Motivation

ebook-convert is calibre's other headline feature. For many users, "I need to convert this epub to mobi for my Kindle" or "this pdf needs to be an epub to be readable" is the only reason calibre stays installed. Without conversion, retiring calibre is incomplete: users fall back to calibre's ebook-convert CLI or other external tools, and harmonia is not sovereign over the ebook pipeline.

Proposed solution

Two-phase path: v1 shells out to an existing converter; v2 evaluates a native Rust implementation.
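
A minimal sketch of what the v1 shell-out could look like, assuming calibre's ebook-convert is installed and on PATH. ebook-convert takes the input and output paths as positional arguments and infers both formats from the file extensions. The function names are hypothetical; building the `Command` separately from running it keeps argument construction testable without calibre installed.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Build (without running) `ebook-convert <input> <output>`.
/// calibre infers both formats from the file extensions.
pub fn ebook_convert_command(input: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("ebook-convert"); // assumes calibre's CLI is on PATH
    cmd.arg(input).arg(output);
    cmd
}

/// Run the conversion; a non-zero exit becomes an error carrying calibre's stderr.
pub fn convert(input: &Path, output: &Path) -> io::Result<()> {
    let result = ebook_convert_command(input, output).output()?;
    if result.status.success() {
        return Ok(());
    }
    Err(io::Error::new(
        io::ErrorKind::Other,
        format!(
            "ebook-convert exited with {}: {}",
            result.status,
            String::from_utf8_lossy(&result.stderr)
        ),
    ))
}
```

For example, `convert(Path::new("book.epub"), Path::new("book.mobi"))` would ask calibre to write book.mobi from book.epub. Conversion options such as output profiles are deliberately left out of this sketch.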

Research questions

  1. Conversion matrix scope. Which pairs must work in v1? epub↔mobi, epub↔azw3, pdf→epub (best-effort), epub→pdf, docx/odt→epub. Which are realistic via open tools?
  2. Tool selection for v1 (shell-out).
    • calibre's ebook-convert itself: ironic, but viable if treated as a headless dependency (calibre-bin, no GUI). Least risk, most coverage, best fidelity.
    • pandoc: native binary (Haskell); strong for docx/md/odt, weaker for epub, no mobi/azw3 support, and no pdf input.
    • kepubify: epub→kepub (Kobo) only. Useful as an adjacent tool.
    • Rust crates (rbook, epub-rs, mobi-rs): immature.
     Build a coverage + quality matrix across these options.
  3. Native Rust feasibility (v2). Can a combination of rbook + lopdf + mobi-rs cover the core pairs with acceptable fidelity? What's the LOC + maintenance cost vs the shell-out?
  4. Layout fidelity benchmarks. Build a test corpus of N books with known-good calibre outputs. Matching calibre byte-for-byte is one bar; "close enough" needs a rubric.
  5. Trigger model. On-demand (UI), automatic on import (quality-driven), or both? Does the converted file replace the source or sit alongside it?
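
For research question 4, one possible shape for a "close enough" rubric, sketched under loose assumptions: both the candidate output and calibre's reference have already been reduced to plain text plus a chapter count (that extraction step is not shown), text agreement is measured as Jaccard similarity over lowercase word sets, and the thresholds are placeholders to be tuned against the corpus. The `Extracted` and `Rubric` types are hypothetical, and Jaccard is a swapped-in stand-in rather than a decided metric; it says nothing about layout.

```rust
use std::collections::HashSet;

/// Plain-text view of one converted book (hypothetical; extraction not shown).
pub struct Extracted {
    pub text: String,
    pub chapters: usize,
}

/// Jaccard similarity of the lowercase word sets of two texts, in [0.0, 1.0].
/// Ignores word order, repetition, and all layout information.
pub fn word_jaccard(candidate: &str, reference: &str) -> f64 {
    let words = |s: &str| -> HashSet<String> {
        s.split_whitespace().map(|w| w.to_lowercase()).collect()
    };
    let a = words(candidate);
    let b = words(reference);
    if a.is_empty() && b.is_empty() {
        return 1.0; // two empty texts are trivially identical
    }
    let shared = a.intersection(&b).count() as f64;
    let total = a.union(&b).count() as f64;
    shared / total
}

/// Placeholder thresholds; real values would come out of the corpus study.
pub struct Rubric {
    pub min_text_similarity: f64,
    pub require_same_chapter_count: bool,
}

impl Rubric {
    /// True if the candidate is "close enough" to calibre's reference output.
    pub fn passes(&self, candidate: &Extracted, reference: &Extracted) -> bool {
        let text_ok =
            word_jaccard(&candidate.text, &reference.text) >= self.min_text_similarity;
        let chapters_ok =
            !self.require_same_chapter_count || candidate.chapters == reference.chapters;
        text_ok && chapters_ok
    }
}
```

A per-book pass/fail like this could be aggregated over the corpus into a single pass rate per tool, which would slot directly into the coverage + quality matrix from research question 2.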

Deliverables

  • Research note kanon/workflow/research/harmonia-R*-ebook-format-conversion.md
  • Phase line-item for v1 (shell-out) with acceptance criteria
  • Decision record on v2 native path (go / defer / drop)

Phase target

Phase 6 (polish). Not a blocker for Phase 3.5 canonical-storage completion.

Labels: enhancement (new feature or request)