Script to help convert from scanned PDF to DjVu+OCR. Dependencies: pdfsandwich tesseract pdf2djvu
LICENSE-MIT.md
README.md
benchmark.sh
pdf2djvu-ocr.sh

pdf2djvu-ocr

IMPORTANT (QUALITY) DISCLAIMER

This script is still young, and the resulting .djvu files are not very good: they are often larger than the original and of medium to low quality. I hope people will help me improve it. Before converting a large number of documents, run some performance and quality benchmarks.
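
As a starting point for the size part of such a benchmark, here is a minimal sketch that reports how large a converted file is relative to its source. It is not the repository's benchmark.sh; the function name `size_percent` and the file names in the usage line are hypothetical, and it says nothing about OCR or image quality.

```shell
# Hypothetical helper: print the size of the converted file as a
# percentage of the size of the original file.
# Usage: size_percent original.pdf converted.djvu
size_percent() {
  # Arithmetic expansion strips the padding some wc implementations add.
  sp_pdf_bytes=$(( $(wc -c < "$1") ))
  sp_djvu_bytes=$(( $(wc -c < "$2") ))
  if [ "$sp_pdf_bytes" -eq 0 ]; then
    echo "size_percent: $1 is empty" >&2
    return 1
  fi
  # Integer division: 150 means the DjVu is 1.5 times the PDF size.
  echo $(( sp_djvu_bytes * 100 / sp_pdf_bytes ))
}
```

For example, `size_percent scan.pdf scan.djvu` prints a value above 100 when the DjVu came out larger than the PDF, which is the case the disclaimer warns about.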

Description

This script follows the discussion on SuperUser to help convert scanned PDFs to DjVu+OCR.

Dependencies

pdfsandwich, tesseract, pdf2djvu
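
Before running a conversion, it can help to confirm that the three tools named in the project description (pdfsandwich, tesseract, pdf2djvu) are on the PATH. The sketch below is one way to do that; the function name `missing_commands` is hypothetical and is not part of the script.

```shell
# Hypothetical helper: print each command name from the arguments
# that cannot be found on the PATH, one per line.
missing_commands() {
  for mc_cmd in "$@"; do
    command -v "$mc_cmd" >/dev/null 2>&1 || printf '%s\n' "$mc_cmd"
  done
  return 0
}

# Report anything that still needs to be installed.
missing=$(missing_commands pdfsandwich tesseract pdf2djvu)
if [ -n "$missing" ]; then
  echo "Missing dependencies:" $missing >&2
fi
```

How to install each tool depends on the system, so the sketch only reports what is missing.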

Usage

The default behavior, i.e. when called without arguments, is to look for PDF files in the current working directory (glob: ./*.pdf):

pdf2djvu-ocr

Otherwise, you can specify one or more paths, for example with a glob pattern:

pdf2djvu-ocr /path/to/files/**/*.pdf
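
The two calling conventions above can be sketched as a small input-selection function. This is an illustration of the documented behavior under stated assumptions, not the script's actual code; the name `collect_pdfs` is hypothetical.

```shell
# Hypothetical sketch: list the files to convert, one per line.
# With no arguments, fall back to the documented default glob ./*.pdf.
collect_pdfs() {
  if [ "$#" -eq 0 ]; then
    set -- ./*.pdf
  fi
  for cp_file in "$@"; do
    # Skips paths that do not exist, including an unmatched glob,
    # which the shell leaves as the literal string ./*.pdf.
    if [ -f "$cp_file" ]; then
      printf '%s\n' "$cp_file"
    fi
  done
}
```

Note that a pattern such as /path/to/files/**/*.pdf is expanded by the calling shell, not by the script. In bash, `**` only recurses into subdirectories when the globstar option is enabled (`shopt -s globstar`, available from bash 4); otherwise it matches like a single `*`.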