bookvoice is a deterministic, pay-as-you-go CLI pipeline that converts text-based source documents (.pdf, .epub) into audiobook outputs.
- Convert a source document (.pdf, .epub) into deterministic audio outputs (wav, optional m4a/mp3).
- Process the whole book or only selected chapters.
- Resume interrupted runs from a manifest.
- Keep reproducible artifacts for audit, replay, and troubleshooting.
What it is not:
- It is not a DRM bypass tool.
- It is not intended for copyrighted material without proper rights.
poetry install
poetry run bookvoice --help
poetry run bookvoice credentials --set-api-key
poetry run bookvoice build input.pdf --out out/
poetry run bookvoice build input.epub --out out/
poetry run bookvoice build --config bookvoice.yaml

poetry run bookvoice build input.pdf --out out/
poetry run bookvoice build input.epub --out out/
poetry run bookvoice build --config bookvoice.yaml
Common options:
- --config: load command defaults from YAML (input_pdf and output_dir can come from file).
- Source input accepts .pdf and .epub.
- --chapters: process only selected 1-based chapters (for example: 5; 1,3,7; 2-4; 1,3-5).
- --model-translate, --model-rewrite, --model-tts, --tts-voice.
- --provider-translator, --provider-rewriter, --provider-tts (currently openai).
- --prompt-api-key: hidden API-key prompt for this run.
- --interactive-provider-setup: prompts for provider/model/voice values.
- --store-api-key / --no-store-api-key.
- --rewrite-bypass / --no-rewrite-bypass.
- --language: output language for translate/rewrite/tts (for example cs, en).
- --output-format: wav, m4a, mp3, or the combined form m4a,mp3.
- --package-mode: legacy compatibility mode (none, aac, mp3, both).
- --package-chapters / --no-package-chapters.
- --package-chapter-numbering: source or sequential.
- --package-naming: deterministic or reader_friendly.
- --package-encoding-bitrate: explicit target bitrate (96k, 128k, 160k).
- --package-encoding-profile: balanced, voice, or music.
- --package-keep-merged / --no-package-keep-merged.
Compatibility note:
- --package-mode remains supported for existing users and maps to the new output-format intent.
- When no output-format flag is provided, deterministic WAV output remains the default behavior.
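The --chapters selection syntax listed above (single indices, comma lists, and ranges) could be expanded roughly as follows. This is a minimal sketch, not bookvoice's own parser: the function name parse_chapter_selection is hypothetical, and sorting plus de-duplicating the result is an assumption about the intended behavior.

```python
def parse_chapter_selection(selection: str) -> list[int]:
    """Expand a selection such as "1,3-5" into sorted, unique 1-based indices.

    Hypothetical helper; the real CLI may validate or order differently.
    """
    chapters: set[int] = set()
    for token in selection.split(","):
        token = token.strip()
        if not token:
            raise ValueError(f"empty item in chapter selection: {selection!r}")
        # "3-5" splits into ("3", "-", "5"); "7" splits into ("7", "", "").
        start_text, sep, end_text = token.partition("-")
        start = int(start_text)
        end = int(end_text) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"invalid chapter range: {token!r}")
        chapters.update(range(start, end + 1))
    return sorted(chapters)


print(parse_chapter_selection("1,3-5"))  # [1, 3, 4, 5]
```

Rejecting zero, negative, and reversed ranges early keeps a mistyped selection from silently processing the wrong chapters.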
Runtime feedback during build:
- Deterministic progress lines per stage (extract, clean, split, chunk, translate, rewrite, tts, merge, package, manifest).
- Structured phase logs ([phase]) for stage start/complete/failure.
- Output is concise and CI-friendly, with no credential material in logs.
Example output excerpt:
[progress] command=build | 1/10 stage=extract
[phase] level=INFO stage=extract event=start
[phase] level=INFO stage=extract event=complete
...
[progress] command=build | 10/10 stage=manifest
[phase] level=INFO stage=manifest event=complete
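Because the phase lines are plain key=value pairs, a CI job could pick them apart with a few lines of Python. The sketch below assumes only the line shape shown in the excerpt above; parse_phase_line is a hypothetical name, and any fields beyond level, stage, and event are not documented here.

```python
from typing import Optional


def parse_phase_line(line: str) -> Optional[dict[str, str]]:
    """Return the key=value fields of a "[phase] ..." line, or None otherwise."""
    prefix = "[phase] "
    if not line.startswith(prefix):
        return None
    fields: dict[str, str] = {}
    for pair in line[len(prefix):].split():
        key, sep, value = pair.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


print(parse_phase_line("[phase] level=INFO stage=extract event=start"))
```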
poetry run bookvoice chapters-only input.pdf --out out/
poetry run bookvoice chapters-only input.epub --out out/
poetry run bookvoice chapters-only input.pdf --out out/ --chapters 1-3

poetry run bookvoice translate-only input.pdf --out out/
poetry run bookvoice translate-only input.epub --out out/
poetry run bookvoice translate-only input.pdf --out out/ --chapters 2-4
poetry run bookvoice translate-only input.pdf --out out/ --reader-output-format epub
poetry run bookvoice translate-only input.pdf --out out/ --reader-output-format epub,pdf
poetry run bookvoice translate-only --config bookvoice.yaml
Behavior:
- Runs stages extract, clean, split, chunk, translate, manifest.
- Persists deterministic text artifacts (raw, clean, chapters, chunks, translations, translated_document) and run_manifest.json.
- Does not execute rewrite, tts, or merge.
- Optional --reader-output-format contract accepts none, epub, pdf, or epub,pdf.
- epub reader export is emitted deterministically from text/translated_document.json to <run-root>/reader/<source>.<lang>.<scope>.translated.epub.
- pdf reader export is emitted deterministically from text/translated_document.json to <run-root>/reader/<source>.<lang>.<scope>.translated.pdf.
- Supports the same provider/model/runtime precedence and secure credential flow as build.
poetry run bookvoice tts-only out/run-<id>/run_manifest.json
Behavior:
- Runs only tts, merge, package, and manifest.
- Requires valid text/rewrites.json and text/chunks.json artifacts from a prior run.
- Preserves deterministic part naming and artifact schemas used by full build/resume.
- Reapplies deterministic merged-output postprocessing (silence trim + peak normalization) and WAV metadata tagging on every replay.
- Does not execute upstream text stages (extract through rewrite).
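Before replaying with tts-only, it can help to confirm that the two required artifacts exist and parse as JSON. The following is a sketch under stated assumptions: missing_tts_inputs is a hypothetical helper, and it assumes the artifacts sit next to run_manifest.json in the run directory, as in the run layout documented in this README. It does not check the artifact schemas.

```python
import json
from pathlib import Path

# Artifacts that tts-only is documented to require from a prior run.
REQUIRED_ARTIFACTS = ("text/rewrites.json", "text/chunks.json")


def missing_tts_inputs(manifest_path: Path) -> list[str]:
    """List problems with the text artifacts next to a run manifest."""
    run_root = manifest_path.parent  # assumption: artifacts live beside the manifest
    problems: list[str] = []
    for relative in REQUIRED_ARTIFACTS:
        artifact = run_root / relative
        try:
            json.loads(artifact.read_text(encoding="utf-8"))
        except FileNotFoundError:
            problems.append(f"{relative}: missing")
        except json.JSONDecodeError:
            problems.append(f"{relative}: malformed JSON")
    return problems
```

An empty list means both files are present and syntactically valid.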
poetry run bookvoice list-chapters input.pdf
poetry run bookvoice list-chapters input.epub
poetry run bookvoice list-chapters --chapters-artifact out/run-*/text/chapters.json

poetry run bookvoice resume out/run-<id>/run_manifest.json

poetry run bookvoice credentials
poetry run bookvoice credentials --set-api-key
poetry run bookvoice credentials --clear-api-key
Default models/voice:
- Translate model: gpt-4.1-mini
- Rewrite model: gpt-4.1-mini
- TTS model: gpt-4o-mini-tts
- TTS voice: echo
Resolution precedence:
- Runtime values (provider_*, model_*, tts_voice, api_key, rewrite_bypass): CLI explicit input > secure credential storage > environment > config/defaults
- Command fields (input_path/input_pdf, output_dir, chapter_selection): explicit CLI option/argument overrides --config values.
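The runtime-value precedence above amounts to "first source that supplies a value wins". A minimal sketch of that rule follows; resolve_runtime_value and its parameter names are hypothetical, and treating None as "not provided" is an assumption rather than a description of bookvoice's internals.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def resolve_runtime_value(
    cli: Optional[T],
    stored: Optional[T],
    env: Optional[T],
    config: Optional[T],
    default: T,
) -> T:
    """Pick the first provided value: CLI > secure storage > environment > config/default."""
    for candidate in (cli, stored, env, config):
        if candidate is not None:
            return candidate
    return default


# No CLI flag or stored value, so the environment value beats the config file.
print(resolve_runtime_value(None, None, "env-model", "config-model", "gpt-4.1-mini"))
```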
Environment keys:
- OPENAI_API_KEY
- BOOKVOICE_PROVIDER_TRANSLATOR
- BOOKVOICE_PROVIDER_REWRITER
- BOOKVOICE_PROVIDER_TTS
- BOOKVOICE_MODEL_TRANSLATE
- BOOKVOICE_MODEL_REWRITE
- BOOKVOICE_MODEL_TTS
- BOOKVOICE_TTS_VOICE
- BOOKVOICE_REWRITE_BYPASS
- BOOKVOICE_LANGUAGE
- BOOKVOICE_OUTPUT_FORMAT
- BOOKVOICE_PACKAGE_MODE (legacy compatibility)
- BOOKVOICE_PACKAGE_CHAPTERS
- BOOKVOICE_PACKAGE_CHAPTER_NUMBERING
- BOOKVOICE_PACKAGE_KEEP_MERGED
- BOOKVOICE_PACKAGE_NAMING_MODE
- BOOKVOICE_PACKAGE_ENCODING_BITRATE
- BOOKVOICE_PACKAGE_ENCODING_PROFILE
- BOOKVOICE_READER_OUTPUT_FORMAT (none, epub, pdf, or epub,pdf)
ConfigLoader.from_yaml supported keys:
- input_path (required, backward-compatible alias: input_pdf)
- output_dir (required)
- language
- provider_translator
- provider_rewriter
- provider_tts
- model_translate
- model_rewrite
- model_tts
- tts_voice
- rewrite_bypass (true/false, 1/0, yes/no)
- api_key
- chunk_size_chars (positive integer)
- chapter_selection
- resume (true/false, 1/0, yes/no)
- output_format (wav, m4a, mp3, or m4a,mp3)
- package_mode (legacy compatibility: none, aac, mp3, both)
- package_chapters (true/false, 1/0, yes/no)
- package_chapter_numbering (source/sequential)
- package_keep_merged (true/false, 1/0, yes/no)
- package_naming (deterministic/reader_friendly)
- package_encoding_bitrate (for example 128k)
- package_encoding_profile (balanced/voice/music)
- reader_output_format (none, epub, pdf, or epub,pdf)
- extra (string-to-string mapping)
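Several keys above accept true/false, 1/0, or yes/no. One way such values could be normalized is sketched below. parse_bool is a hypothetical helper, not ConfigLoader's actual code; case-insensitive matching and rejecting anything outside the listed spellings are assumptions.

```python
_TRUE_VALUES = {"true", "1", "yes"}
_FALSE_VALUES = {"false", "0", "no"}


def parse_bool(value: object) -> bool:
    """Map the documented boolean spellings to a Python bool.

    Accepts strings as well as values a YAML parser may already have
    converted (bool or int), by comparing their lowercase string form.
    """
    text = str(value).strip().lower()
    if text in _TRUE_VALUES:
        return True
    if text in _FALSE_VALUES:
        return False
    raise ValueError(f"expected true/false, 1/0, or yes/no, got {value!r}")


print(parse_bool("yes"), parse_bool(0))  # True False
```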
Example bookvoice.yaml:
input_pdf: tests/files/canonical_synthetic_fixture.pdf
output_dir: out
provider_translator: openai
provider_rewriter: openai
provider_tts: openai
model_translate: gpt-4.1-mini
model_rewrite: gpt-4.1-mini
model_tts: gpt-4o-mini-tts
tts_voice: echo
rewrite_bypass: false
chapter_selection: 1-3
language: cs
output_format: m4a,mp3
package_chapter_numbering: sequential
package_naming: deterministic
package_encoding_profile: voice
package_keep_merged: true
reader_output_format: epub,pdf

For deterministic local verification, prefer the repository-owned synthetic PDF fixture
at tests/files/canonical_synthetic_fixture.pdf.
An EPUB counterpart is also available at tests/files/canonical_synthetic_fixture.epub.
ConfigLoader.from_env supported keys:
- BOOKVOICE_INPUT_PATH (required, backward-compatible alias: BOOKVOICE_INPUT_PDF)
- BOOKVOICE_OUTPUT_DIR
- BOOKVOICE_LANGUAGE
- BOOKVOICE_CHUNK_SIZE_CHARS
- BOOKVOICE_CHAPTER_SELECTION
- BOOKVOICE_RESUME
- BOOKVOICE_PROVIDER_TRANSLATOR
- BOOKVOICE_PROVIDER_REWRITER
- BOOKVOICE_PROVIDER_TTS
- BOOKVOICE_MODEL_TRANSLATE
- BOOKVOICE_MODEL_REWRITE
- BOOKVOICE_MODEL_TTS
- BOOKVOICE_TTS_VOICE
- BOOKVOICE_REWRITE_BYPASS
- BOOKVOICE_OUTPUT_FORMAT
- BOOKVOICE_PACKAGE_MODE
- BOOKVOICE_PACKAGE_CHAPTERS
- BOOKVOICE_PACKAGE_CHAPTER_NUMBERING
- BOOKVOICE_PACKAGE_KEEP_MERGED
- BOOKVOICE_PACKAGE_NAMING_MODE
- BOOKVOICE_PACKAGE_ENCODING_BITRATE
- BOOKVOICE_PACKAGE_ENCODING_PROFILE
- BOOKVOICE_READER_OUTPUT_FORMAT
- OPENAI_API_KEY
Each build creates a deterministic run directory:
- out/run-<hash>/text/raw.txt
- out/run-<hash>/text/clean.txt
- out/run-<hash>/text/chapters.json
- out/run-<hash>/text/chunks.json
- out/run-<hash>/text/translations.json
- out/run-<hash>/text/translated_document.json
- out/run-<hash>/text/rewrites.json
- out/run-<hash>/audio/chunks/001_01_<title-slug>.wav
- out/run-<hash>/audio/parts.json
- out/run-<hash>/audio/bookvoice_merged.wav (or chapter-scope variant)
- out/run-<hash>/audio/package/chapter_<NNN>_<title-slug>.m4a|.mp3 (when enabled)
- out/run-<hash>/audio/packaged.json
- out/run-<hash>/run_manifest.json
audio/parts.json includes deterministic chapter_index, part_index, part_id,
final emitted filename, source source_order_indices, and per-part
provider/model/voice metadata.
audio/bookvoice_merged*.wav is postprocessed deterministically in-place:
leading/trailing silence trimming followed by peak normalization to 95%.
Merged WAV outputs include RIFF LIST/INFO tags: INAM (title), ISBJ
(chapter/part context), and ICMT (source identifier).
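The merged-WAV postprocessing described above (silence trimming, then peak normalization to 95%) can be illustrated on a plain list of float samples in the range -1.0 to 1.0. This is only a sketch of the two steps: the function names and the silence threshold are assumptions, and bookvoice's actual thresholds and sample handling are not documented here.

```python
def trim_silence(samples: list[float], threshold: float = 1e-3) -> list[float]:
    """Drop leading and trailing samples whose magnitude is at or below threshold."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) <= threshold:
        start += 1
    while end > start and abs(samples[end - 1]) <= threshold:
        end -= 1
    return samples[start:end]


def normalize_peak(samples: list[float], target: float = 0.95) -> list[float]:
    """Scale samples so the largest magnitude equals target (95% of full scale)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # empty or all-silent input: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]


processed = normalize_peak(trim_silence([0.0, 0.0, 0.25, -0.5, 0.1, 0.0]))
print(len(processed), max(abs(s) for s in processed))
```

Trimming before normalizing keeps the gain calculation from depending on padding that will be discarded anyway.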
When packaging is enabled, chapter-split AAC (.m4a) and/or MP3 outputs are emitted
under audio/package/ with configurable naming (deterministic or reader_friendly).
Chapter numbering can follow source indices or sequential ordering.
Packaged chapter metadata tags are written deterministically for both formats:
- Canonical payload: title, album, track, chapter_context, source_identifier.
- MP3 (ID3): title, album, track, comment (chapter context), publisher (source/run).
- M4A (MP4 atoms): title, album, track, description (chapter context), comment (source/run).
Player support for description/publisher may vary by platform; title/album/track remain primary.
run_manifest.json extra includes compact chapter/part mapping and referenced structure indices for resume/rebuild stability, packaging intent metadata, resolved output language (output_language), packaged-tag summary metadata (packaging_tags_*), emitted packaged artifact references (packaging_emitted_*), and translate-only reader-export contract metadata (reader_export_* planned output keys) plus the canonical translated-document artifact path (translated_document).
text/chunks.json includes planner metadata under metadata.planner and chunk-level boundary_strategy metadata.
Filename examples:
- 001_01_chapter-one.wav
- 007_03_cesky-nazev-uvod.wav (non-ASCII title normalized to ASCII slug)
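A filename in the shape shown above could be produced roughly like this. The sketch assumes a three-digit chapter index, a two-digit part index, and NFKD-based accent stripping, all inferred from the examples; title_slug and part_filename are hypothetical names, and the project's real slug rules may differ.

```python
import re
import unicodedata


def title_slug(title: str) -> str:
    """Lowercase ASCII slug: accents stripped, other characters collapsed to '-'."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def part_filename(chapter_index: int, part_index: int, title: str) -> str:
    """Build <chapter>_<part>_<title-slug>.wav with zero-padded indices."""
    return f"{chapter_index:03d}_{part_index:02d}_{title_slug(title)}.wav"


print(part_filename(1, 1, "Chapter One"))        # 001_01_chapter-one.wav
print(part_filename(7, 3, "Český název: Úvod"))  # 007_03_cesky-nazev-uvod.wav
```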
- build failed at stage extract: verify input PDF path and pdftotext availability.
- build failed at stage translate/rewrite/tts: verify API key and model/provider config.
- build failed at stage credentials: configure a working keyring backend or use --no-store-api-key.
- list-chapters failed at stage chapters-artifact: verify artifact path points to valid text/chapters.json.
- resume failed at stage resume-manifest: manifest missing or malformed JSON.
- resume failed at stage resume-artifacts: artifact JSON is missing/corrupted; remove broken artifact and rerun resume.
If you install Bookvoice on Windows from GitHub Releases (portable ZIP or installer), use:
docs/WINDOWS_USER_GUIDE.md
Implemented today:
- Real OpenAI translation (chat/completions).
- Real OpenAI rewrite-for-audio (chat/completions), plus --rewrite-bypass.
- Real OpenAI TTS per segmented part (audio/speech) with deterministic <chapter>_<part>_<title-slug>.wav naming.
- Structure-aware segment planning with chapter-local merging and paragraph-preferred boundaries (chunk_size_chars default 1800, planner hard ceiling 9300 chars).
- Resumable artifact-driven pipeline with run manifest and cost summary.
- Chapter listing and chapter-scope processing (--chapters).
- Secure API-key storage via keyring (bookvoice credentials).
Still intentionally limited:
- Packaging and metadata tagging rely on local ffmpeg runtime and codec/container support.

- Fallback chapter chunking now targets sentence-complete boundaries and prefers . before !/?.
- If no boundary exists near the target size, the chunker extends forward to the next sentence boundary within a bounded safety margin.
- If no sentence boundary exists within that margin (for example very long punctuation-free text), the chunker performs a deterministic forced split and marks the chunk with boundary_strategy = "forced_split_no_sentence_boundary".
- During cleanup, decorative drop-cap initials split across lines (for example E + VERY) are conservatively merged when safety guards pass.
- When deterministic splitting still lands mid-sentence, chunk-boundary repair can carry the minimum continuation prefix from the next chunk to complete the sentence.
Current limitation:
- Drop-cap merging is conservative by design and can miss borderline layouts to avoid incorrect merges.
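The fallback chunking rules listed above (prefer a sentence boundary near the target, otherwise extend forward within a margin, otherwise force a split) can be sketched as a single boundary search. This is an illustration, not the project's chunker: find_split, the lookback and margin parameters, and every strategy label except "forced_split_no_sentence_boundary" are assumptions made for the example.

```python
def find_split(text: str, target: int, lookback: int = 200, margin: int = 400) -> tuple[int, str]:
    """Return (split_index, boundary_strategy) for the first chunk of text."""
    if len(text) <= target:
        return len(text), "end_of_text"

    # 1. Look back from the target for a sentence end, trying "." before "!"/"?".
    window_start = max(0, target - lookback)
    for marks in (".", "!?"):
        best = max(text.rfind(mark, window_start, target) for mark in marks)
        if best != -1:
            return best + 1, "sentence_boundary"

    # 2. Nothing nearby: extend forward to the next sentence end within the margin.
    limit = min(len(text), target + margin)
    forward = [i for i in (text.find(mark, target, limit) for mark in ".!?") if i != -1]
    if forward:
        return min(forward) + 1, "extended_to_sentence_boundary"

    # 3. No sentence end within the margin: deterministic forced split at the target.
    return target, "forced_split_no_sentence_boundary"


print(find_split("a" * 50, target=10, lookback=5, margin=10))
```

The last call has no punctuation at all, so it falls through to the forced split at the target index.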
PDF Input
|
v
[Extract Text] --> [Clean/Normalize] --> [Split Chapters] --> [Plan Segments + Chunk]
| |
| v
| [Translate]
| |
| v
| [Rewrite for Audio]
| |
v v
Artifacts + Cache <-------------------------------- [TTS Synthesis]
|
v
[Postprocess + Merge + Tags]
|
v
[Optional Chapter Packaging: M4A/MP3]
|
v
Run Manifest + Outputs
Build and smoke-check a self-contained bookvoice.exe using PyInstaller:
poetry run python -m pip install pyinstaller
poetry run pyinstaller --noconfirm --clean packaging/windows/pyinstaller/bookvoice.spec --distpath dist/windows/pyinstaller --workpath build/windows/pyinstaller
./dist/windows/pyinstaller/bookvoice/bookvoice.exe --help
See detailed maintainer instructions in docs/WINDOWS_PYINSTALLER.md.
For Inno Setup installer packaging, see docs/WINDOWS_INNO_SETUP.md.
poetry run pytest

- bookvoice/: core package modules.
- bookvoice/models/: shared typed dataclasses.
- bookvoice/io|text|llm|tts|audio|telemetry/: pipeline stage modules.
- docs/: architecture and module overviews.

- docs/ARCHITECTURE.md: data model and orchestration strategy.
- docs/ARTIFACTS.md: generated run artifacts and file formats.
- docs/MODULES.md: module responsibilities and dependencies.
- docs/ROADMAP.md: phased implementation plan and milestones.
MIT. See LICENSE.