pdf-image-to-text

A utility to convert a PDF made of images to text.

Requirements

ImageMagick (the script calls the convert utility that ships with ImageMagick)
An OCR engine (tesseract-ocr >= 3.01 with the French language data installed)
Python 3
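
A quick way to confirm that the external tools are reachable before installing is sketched below. This helper is hypothetical and not part of pdf-image-to-text; it only assumes the executables are named convert and tesseract, as the requirements above suggest. It does not check the tesseract version or whether the French language data is present. Note that ImageMagick 7 installs magick as its main command, so convert may be missing there.

```python
import shutil

# Hypothetical helper, not shipped with pdf-image-to-text.
# Executable names are assumed from the requirements list.
REQUIRED_TOOLS = ("convert", "tesseract")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of required executables that are not on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All required executables found.")
```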

Install

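Tested with pip 9.0.1. The --process-dependency-links option was removed in pip 19.0, so newer pip versions reject the command below as written.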

pip install pdf-image-to-text --process-dependency-links

Usage

Common use (output redirected to a file)

pdf-image-to-text.py <file.pdf> > <file.txt>
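
The same redirection can be scripted when several PDFs need processing. The sketch below is an illustration only: run_to_file and text_path_for are hypothetical names, and it assumes pdf-image-to-text.py is on PATH and prints its result to standard output, as the usage above implies.

```python
import subprocess
import sys
from pathlib import Path


def run_to_file(command, output_path):
    """Run a command and write its standard output to output_path."""
    with open(output_path, "wb") as out:
        subprocess.run(command, stdout=out, check=True)


def text_path_for(pdf_path):
    """Map report.pdf to report.txt in the same directory."""
    return Path(pdf_path).with_suffix(".txt")


if __name__ == "__main__":
    # Each PDF given on the command line gets a sibling .txt file.
    for pdf in sys.argv[1:]:
        run_to_file(["pdf-image-to-text.py", pdf], text_path_for(pdf))
```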

Generic use (output to standard output)

pdf-image-to-text.py <file.pdf>
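
For readers curious how a tool like this can work, here is a minimal sketch of the general approach the requirements point to: rasterize each PDF page with ImageMagick, then run tesseract on every page image. It is not the script's actual implementation. The function names, the 300 dpi density, and the page-%04d.png naming are all assumptions made for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def convert_command(pdf_path, output_pattern, density=300):
    """Build the ImageMagick command that renders each PDF page to an image."""
    return ["convert", "-density", str(density), str(pdf_path), str(output_pattern)]


def tesseract_command(image_path, output_base, lang="fra"):
    """Build the tesseract command; tesseract writes <output_base>.txt."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def pdf_to_text(pdf_path, lang="fra", density=300):
    """Rasterize the PDF, OCR each page, and return the concatenated text."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        pattern = tmp_dir / "page-%04d.png"  # assumed naming scheme
        subprocess.run(convert_command(pdf_path, pattern, density), check=True)
        pages = []
        # Zero-padded names keep the pages in order when sorted.
        for image in sorted(tmp_dir.glob("page-*.png")):
            base = image.with_suffix("")
            subprocess.run(tesseract_command(image, base, lang), check=True)
            pages.append(base.with_suffix(".txt").read_text(encoding="utf-8"))
        return "\n".join(pages)
```

Calling pdf_to_text("file.pdf") requires both external tools to be installed. The two command builders are kept separate so the arguments can be inspected or adjusted without running anything.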

License: BSD (3-Clause)
