copyable-pdf

A lightweight, dependency-minimal bash script to convert scanned PDFs into searchable PDFs using Tesseract OCR.

copyable-pdf takes a PDF, renders each page to an image, runs OCR (Optical Character Recognition) on each image with Tesseract, and merges the resulting pages back into a single searchable PDF.
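The render, OCR, and merge steps above can be sketched as a small bash function. This is a minimal illustration, not the code of `script.sh`: it assumes `pdftoppm` and `pdfunite` (poppler) and `tesseract` are on your PATH, `ocr_pdf_sketch` is a hypothetical name, and it omits the parallelism, text/Markdown output, and option handling the real script provides.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a render -> OCR -> merge pipeline (not script.sh itself).
# Requires pdftoppm + pdfunite (poppler) and tesseract on PATH.

ocr_pdf_sketch() {
  if [ $# -lt 2 ]; then
    echo "usage: ocr_pdf_sketch input.pdf output.pdf [lang] [dpi]" >&2
    return 2
  fi
  local input=$1 output=$2 lang=${3:-eng} dpi=${4:-300}
  local tmp img
  tmp=$(mktemp -d) || return 1

  # 1. Render every page to PNG at the requested DPI.
  #    pdftoppm pads page numbers to a fixed width, so globs expand in page order.
  pdftoppm -r "$dpi" -png "$input" "$tmp/page" || { rm -rf "$tmp"; return 1; }

  # 2. OCR each image. The trailing "pdf" config makes tesseract write <base>.pdf:
  #    the page image with an invisible, selectable text layer on top.
  for img in "$tmp"/page-*.png; do
    tesseract "$img" "${img%.png}" -l "$lang" pdf || { rm -rf "$tmp"; return 1; }
  done

  # 3. Concatenate the single-page PDFs into the final document.
  pdfunite "$tmp"/page-*.pdf "$output" || { rm -rf "$tmp"; return 1; }

  rm -rf "$tmp"
}
```

For example, `ocr_pdf_sketch scan.pdf scan_ocr.pdf fra 300` would produce `scan_ocr.pdf` with a French text layer.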

Features

OCR: Make scanned documents searchable and copyable.
Parallel Processing: Uses multiple cores for faster OCR.
Dependency Check: Automatically checks for missing tools.
Customizable: Set language and DPI.
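The README does not say how the parallel OCR is implemented. One common way to fan per-page work out across cores in shell is `xargs -P`; the sketch below assumes page images named `page-*.png` (as `pdftoppm` produces) and uses hypothetical helper names `detect_jobs` and `ocr_pages_parallel`. The actual script may use a different mechanism.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating parallel per-page OCR (not taken from script.sh).

# Print the number of CPU cores, falling back to 1 if it cannot be determined.
detect_jobs() {
  local n
  n=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)  # Linux, then macOS
  case $n in
    ''|*[!0-9]*|0) n=1 ;;  # guard against empty or non-numeric output
  esac
  echo "$n"
}

# OCR every page-*.png in a directory, running up to $3 tesseract processes at once.
# Each image writes its own <base>.pdf, so processing order does not matter.
ocr_pages_parallel() {
  local dir=$1 lang=${2:-eng} jobs=${3:-$(detect_jobs)}
  find "$dir" -maxdepth 1 -name 'page-*.png' -print0 |
    xargs -0 -P "$jobs" -I{} sh -c 'tesseract "$1" "${1%.png}" -l "$2" pdf' _ {} "$lang"
}
```

Passing the file name and language as positional arguments to `sh -c` (rather than splicing them into the command string) keeps paths with spaces or quotes safe.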

Installation

Via Homebrew

```
brew tap maxgfr/homebrew-tap
brew install copyable-pdf
```

Manual Installation

Clone the repository:

```
git clone https://github.com/maxgfr/copyable-pdf.git
cd copyable-pdf
```

Make the script executable:
```
chmod +x script.sh
```

(Optional) Move it into a directory on your PATH (this may require `sudo`):

```
mv script.sh /usr/local/bin/copyable-pdf
```

Dependencies

Ensure you have the following installed:

tesseract: performs the OCR.
poppler: provides `pdftoppm` (page rendering) and `pdfunite` (merging pages).

On macOS (Homebrew):

```
brew install tesseract poppler
```

On Ubuntu/Debian:

```
sudo apt-get install tesseract-ocr poppler-utils
```
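A dependency check like the one mentioned under Features can be done with the POSIX `command -v` builtin. The sketch below is illustrative only; `check_deps` is a hypothetical name and the message format is an assumption, not the script's actual output.

```bash
#!/usr/bin/env bash
# Hypothetical dependency check: report every missing command, fail if any are missing.
check_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script would call it once at startup, e.g. `check_deps tesseract pdftoppm pdfunite || exit 1`, so all missing tools are listed together instead of failing on the first one.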

Usage

```
copyable-pdf [options] input.pdf
```

Options

| Option | Description | Default |
|---|---|---|
| `-l, --lang <code>` | Language code (e.g., `fra`, `eng`) | `eng` |
| `-o, --output <path>` | Custom output file path | `input_ocr.pdf` |
| `-d, --dpi <num>` | DPI resolution for OCR | `300` |
| `-j, --jobs <num>` | Number of parallel jobs | Auto-detect |
| `-t, --text` | Generate an additional .txt file | `false` |
| `-m, --markdown` | Generate an additional .md file | `false` |
| `-k, --keep` | Keep temporary files (debug) | `false` |
| `-v, --verbose` | Verbose output | `false` |
| `-h, --help` | Show help message | - |
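The option table maps naturally onto a `while`/`case` loop over the arguments. The sketch below is a hypothetical parser, not the one in `script.sh`: the variable names are illustrative, an empty `JOBS` stands for "auto-detect", and the default output name `<input>_ocr.pdf` is an assumption read from the table's `input_ocr.pdf`.

```bash
#!/usr/bin/env bash
# Hypothetical parser mirroring the options table; sets global variables.
parse_args() {
  LANG_CODE=eng OUTPUT="" DPI=300 JOBS="" TEXT=false MARKDOWN=false
  KEEP=false VERBOSE=false HELP=false INPUT=""
  while [ $# -gt 0 ]; do
    case $1 in
      -l|--lang|-o|--output|-d|--dpi|-j|--jobs)
        # Options that take a value: refuse to run past the end of the arguments.
        if [ $# -lt 2 ]; then echo "option $1 requires a value" >&2; return 2; fi
        case $1 in
          -l|--lang)   LANG_CODE=$2 ;;
          -o|--output) OUTPUT=$2 ;;
          -d|--dpi)    DPI=$2 ;;
          -j|--jobs)   JOBS=$2 ;;
        esac
        shift 2 ;;
      -t|--text)     TEXT=true; shift ;;
      -m|--markdown) MARKDOWN=true; shift ;;
      -k|--keep)     KEEP=true; shift ;;
      -v|--verbose)  VERBOSE=true; shift ;;
      -h|--help)     HELP=true; shift ;;
      -*)            echo "unknown option: $1" >&2; return 2 ;;
      *)             INPUT=$1; shift ;;
    esac
  done

  if [ "$HELP" = true ]; then
    echo "usage: copyable-pdf [options] input.pdf"
    return 0
  fi
  if [ -z "$INPUT" ]; then echo "missing input.pdf" >&2; return 2; fi

  # Numeric validation for DPI and (when given) the job count.
  case $DPI in ''|*[!0-9]*) echo "invalid DPI: $DPI" >&2; return 2 ;; esac
  if [ -n "$JOBS" ]; then
    case $JOBS in *[!0-9]*|0) echo "invalid job count: $JOBS" >&2; return 2 ;; esac
  fi

  # Assumed default: strip a trailing .pdf and append _ocr.pdf.
  [ -n "$OUTPUT" ] || OUTPUT="${INPUT%.pdf}_ocr.pdf"
}
```

With this sketch, `parse_args -l fra -d 600 document.pdf` leaves `OUTPUT=document_ocr.pdf`, while `-o searchable_doc.pdf` overrides it.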

Examples

Basic usage:

```
copyable-pdf document.pdf
```

Specify language (French) and higher DPI:

```
copyable-pdf -l fra -d 600 document.pdf
```

Explicitly set output filename:

```
copyable-pdf -o searchable_doc.pdf scan.pdf
```

License

MIT
