Turn scanned paper books into clean EPUBs — OCR by a vision LLM, assembled with pandoc.
scan-to-ebook converts photos or scans of a paper book (PNG / JPG / HEIC / HEIF), a PDF file, or a Google Drive file link — into an EPUB you can read in Books.app or on a Kindle. It runs each page through an OpenRouter vision model for OCR, cleans and merges the text with the Python standard library, and builds the EPUB with pandoc.
It was built for hard Vietnamese corpora and verified with zero OCR errors on an early-Quốc-ngữ journal (Nam Phong, 1917 — 75 pages) and on a 152-image iPhone-scanned book (119 HEIC + 33 JPG). See the samples to judge the output before running your own book.
Two standalone pipelines:
OCR Prose Pipeline (scan2ebook all)
- Pure-stdlib runtime — no third-party Python packages. pandoc and rclone are the only external tools, and only pandoc is required.
- Book-aware OCR — a context pre-pass reads a handful of pages first to detect the title, proper names, spelling conventions, two-page spreads, and the color cover (auto-embedded), then feeds that back into every page's prompt for consistent results — and keeps cover/back-matter decoration out of the TOC.
- Image, PDF, or Google Drive input — a folder of page images, a local PDF, or a publicly-shared Drive file link. PDFs (scanned or born-digital with a broken text layer) are rendered to per-page images and OCR'd.
- Cross-platform HEIC/HEIF — iPhone photos auto-converted at import.
- Resumable & cost-gated — already-OCR'd pages are skipped on re-run; a smoke run OCRs 10 pages and estimates full cost before you commit.
- Agent-friendly CLI —
doctorself-check,--dry-run,--json/--json-lines, and--yesfor non-interactive runs.
Manga EPUB3 Fixed-Layout Pipeline (scan2ebook manga)
- Zero-cost offline mode — EPUB3 pre-paginated RTL assembly with no OCR or pandoc, just images → EPUB. Optional auto-cover detection via LLM (strictly opt-in, requires OPENROUTER_API_KEY, costs ~$0.01/book).
- Flexible input — folder of images, .mobi/.azw3, .cbz/.cbr, or Google Drive file/folder.
- Series metadata — auto-derive title from series name + index if title omitted.
OCR Prose Pipeline:
brew install pandoc rclone # pandoc required; rclone only for upload
git clone <repo> ~/workspace/scan-to-ebook && cd ~/workspace/scan-to-ebook
python3 -m venv .venv && .venv/bin/pip install -e .
cp .env.example .env && $EDITOR .env # set OPENROUTER_API_KEY=...
.venv/bin/scan2ebook doctor # check python / pandoc / key
.venv/bin/scan2ebook init my-book --from <folder | book.pdf | drive-link>
.venv/bin/scan2ebook all my-book --smoke # OCR 10, preview, then confirm full runManga EPUB3 Pipeline:
.venv/bin/scan2ebook manga my-manga --from ./images --author "Author Name" --series "My Series" --series-index 1The finished book is at ~/scan2ebook/<slug>/dist/<slug>.epub.
Full walkthroughs — preparing scans, editing context, manual fixes, upload — are in the User guide (includes both pipelines). Automating it in CI or from an agent? See For agents and automated pipelines.
The default OCR model is qwen/qwen3.7-plus, picked from a benchmark against
google/gemini-3.1-pro-preview on a modern translated book and an old-spelling
scan (Nam Phong, 1917). It reads every page (including dense old-text pages where
Gemini blanked or truncated), preserves archaic spelling (nhời, nhơn,
hyphenation), and costs ~$0.003/page — roughly 15× cheaper than Gemini with
no read-failures. A typical 100-page book costs ≈ $0.30–0.40. The pipeline is
model-agnostic — any OpenRouter vision model works via --model or OCR_MODEL;
see Operations → Model swap for the full benchmark.
Three finished EPUBs below — each a 20-page clean-cache run on qwen/qwen3.7-plus.
Download and open in Books.app / Kindle to judge quality before spending anything:
| Sample (20 pages) | Book | Cost | Highlight |
|---|---|---|---|
tho-ngu-ngon-la-fontaine-20pages.epub |
Thơ Ngụ-Ngôn (La Fontaine, 1951) | $0.059 | old-spelling verse, line breaks preserved |
ke-nam-vung-20pages.epub |
Kẻ Nằm Vùng (Viet Thanh Nguyen) | $0.076 | dense modern prose, 60 footnotes |
truong-hoc-don-ba-20pages.epub |
Trường Học Đờn Bà (André Gide, 2008) | $0.050 | blank divider pages correctly skipped |
See samples/README.md for the full input→output chain.
These files are short excerpts included only to demonstrate OCR quality — see the Legal note.
- Product overview — problem, audience, value, non-goals
- Architecture — pipeline stages, data flow, design decisions
- User guide — install, preparing scans, running the pipeline, editing
- For agents — the non-interactive CLI path for CI / agents
- Operations — cost, OpenRouter credit/key caps, rclone, swapping models, debugging
This tool is for personal use with books you physically own. Do not publish its
output or share generated EPUBs beyond your own devices — copyright compliance is
the user's responsibility. The sample files under samples/ are short
excerpts included only to demonstrate OCR quality, not a redistribution of any book.
MIT.
