scan-to-ebook

Turn scanned paper books into clean EPUBs — OCR by a vision LLM, assembled with pandoc.

scan-to-ebook converts photos or scans of a paper book (PNG / JPG / HEIC / HEIF), a PDF file, or a Google Drive file link — into an EPUB you can read in Books.app or on a Kindle. It runs each page through an OpenRouter vision model for OCR, cleans and merges the text with the Python standard library, and builds the EPUB with pandoc.

It was built for hard Vietnamese corpora and verified with zero OCR errors on an early-Quốc-ngữ journal (Nam Phong, 1917 — 75 pages) and on a 152-image iPhone-scanned book (119 HEIC + 33 JPG). See the samples to judge the output before running your own book.

Features

Two standalone pipelines:

OCR Prose Pipeline (scan2ebook all)

Pure-stdlib runtime — no third-party Python packages. pandoc and rclone are the only external tools, and only pandoc is required.
Book-aware OCR — a context pre-pass reads a handful of pages first to detect the title, proper names, spelling conventions, two-page spreads, and the color cover (auto-embedded), then feeds that back into every page's prompt for consistent results — and keeps cover/back-matter decoration out of the TOC.
Image, PDF, or Google Drive input — a folder of page images, a local PDF, or a publicly-shared Drive file link. PDFs (scanned or born-digital with a broken text layer) are rendered to per-page images and OCR'd.
Cross-platform HEIC/HEIF — iPhone photos auto-converted at import.
Resumable & cost-gated — already-OCR'd pages are skipped on re-run; a smoke run OCRs 10 pages and estimates full cost before you commit.
Agent-friendly CLI — doctor self-check, --dry-run, --json / --json-lines, and --yes for non-interactive runs.

Manga EPUB3 Fixed-Layout Pipeline (scan2ebook manga)

Zero-cost offline mode — EPUB3 pre-paginated RTL assembly with no OCR or pandoc, just images → EPUB. Optional auto-cover detection via LLM (strictly opt-in, requires OPENROUTER_API_KEY, costs ~$0.01/book).
Flexible input — folder of images, .mobi/.azw3, .cbz/.cbr, or Google Drive file/folder.
Series metadata — auto-derive title from series name + index if title omitted.

Quickstart

OCR Prose Pipeline:

brew install pandoc rclone                                   # pandoc required; rclone only for upload
git clone <repo> ~/workspace/scan-to-ebook && cd ~/workspace/scan-to-ebook
python3 -m venv .venv && .venv/bin/pip install -e .
cp .env.example .env && $EDITOR .env                         # set OPENROUTER_API_KEY=...
.venv/bin/scan2ebook doctor                                  # check python / pandoc / key
.venv/bin/scan2ebook init my-book --from <folder | book.pdf | drive-link>
.venv/bin/scan2ebook all my-book --smoke                     # OCR 10, preview, then confirm full run

Manga EPUB3 Pipeline:

.venv/bin/scan2ebook manga my-manga --from ./images --author "Author Name" --series "My Series" --series-index 1

The finished book is at ~/scan2ebook/<slug>/dist/<slug>.epub.

Full walkthroughs — preparing scans, editing context, manual fixes, upload — are in the User guide (includes both pipelines). Automating it in CI or from an agent? See For agents and automated pipelines.

OCR model & samples

The default OCR model is qwen/qwen3.7-plus, picked from a benchmark against google/gemini-3.1-pro-preview on a modern translated book and an old-spelling scan (Nam Phong, 1917). It reads every page (including dense old-text pages where Gemini blanked or truncated), preserves archaic spelling (nhời, nhơn, hyphenation), and costs ~$0.003/page — roughly 15× cheaper than Gemini with no read-failures. A typical 100-page book costs ≈ $0.30–0.40. The pipeline is model-agnostic — any OpenRouter vision model works via --model or OCR_MODEL; see Operations → Model swap for the full benchmark.

Three finished EPUBs below — each a 20-page clean-cache run on qwen/qwen3.7-plus. Download and open in Books.app / Kindle to judge quality before spending anything:

Sample (20 pages)	Book	Cost	Highlight
`tho-ngu-ngon-la-fontaine-20pages.epub`	Thơ Ngụ-Ngôn (La Fontaine, 1951)	$0.059	old-spelling verse, line breaks preserved
`ke-nam-vung-20pages.epub`	Kẻ Nằm Vùng (Viet Thanh Nguyen)	$0.076	dense modern prose, 60 footnotes
`truong-hoc-don-ba-20pages.epub`	Trường Học Đờn Bà (André Gide, 2008)	$0.050	blank divider pages correctly skipped

See samples/README.md for the full input→output chain. These files are short excerpts included only to demonstrate OCR quality — see the Legal note.

Documentation

Product overview — problem, audience, value, non-goals
Architecture — pipeline stages, data flow, design decisions
User guide — install, preparing scans, running the pipeline, editing
For agents — the non-interactive CLI path for CI / agents
Operations — cost, OpenRouter credit/key caps, rclone, swapping models, debugging

Legal

This tool is for personal use with books you physically own. Do not publish its output or share generated EPUBs beyond your own devices — copyright compliance is the user's responsibility. The sample files under samples/ are short excerpts included only to demonstrate OCR quality, not a redistribution of any book.

License

MIT.

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
assets		assets
docs		docs
plans/260608-1954-ocr-model-benchmark		plans/260608-1954-ocr-model-benchmark
samples		samples
src/scan_to_ebook		src/scan_to_ebook
tests		tests
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

scan-to-ebook

Features

Quickstart

OCR model & samples

Documentation

Legal

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

scan-to-ebook

Features

Quickstart

OCR model & samples

Documentation

Legal

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages