process_images

Takes a folder of images, binarizes and deskews them, and converts them to a PDF ready for OCR.

It uses the deskew function from wand and the iSauvola algorithm from doxapy.
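As a rough illustration of how these two libraries can be combined, here is a minimal sketch. It is not the code from process_images.py: the function names, the `window` and `k` values, and the 0.4 deskew fraction are assumptions chosen for the example, and Pillow and NumPy are assumed to be installed for loading the image.

```python
import io


def binarize(path, window=75, k=0.2):
    """Binarize one image with doxapy's ISauvola algorithm; returns a PIL image.

    The window and k defaults are illustrative assumptions, not the script's values.
    """
    # Imported lazily so this module loads even without the optional dependencies.
    import numpy as np
    import doxapy
    from PIL import Image

    gray = np.array(Image.open(path).convert("L"))  # 8-bit grayscale input
    binary = np.empty(gray.shape, gray.dtype)       # output buffer filled by doxapy
    algorithm = doxapy.Binarization(doxapy.Binarization.Algorithms.ISAUVOLA)
    algorithm.initialize(gray)
    algorithm.to_binary(binary, {"window": window, "k": k})
    return Image.fromarray(binary)


def deskew(pil_image, fraction=0.4):
    """Deskew a PIL image with Wand and return the result as PNG bytes.

    The 0.4 fraction of the quantum range is an assumed threshold.
    """
    from wand.image import Image as WandImage

    buffer = io.BytesIO()
    pil_image.save(buffer, format="PNG")  # hand the pixels to Wand as a PNG blob
    with WandImage(blob=buffer.getvalue()) as img:
        img.deskew(fraction * img.quantum_range)
        return img.make_blob("png")
```

Passing the image between the libraries as a PNG blob keeps the sketch simple; a real script may prefer to avoid the extra encode and decode step.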

Usage: python3 process_images.py -i input_folder -o output_file.pdf
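For readers who want to build a similar tool, the command line above could be parsed roughly as follows. This is a sketch only: the long option names `--input` and `--output`, the helper names, and the list of accepted file extensions are assumptions, not taken from the script.

```python
import argparse
from pathlib import Path

# Assumed set of image extensions; the real script may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def parse_args(argv=None):
    """Parse the -i (input folder) and -o (output PDF) options."""
    parser = argparse.ArgumentParser(
        description="Binarize and deskew a folder of images and write a PDF."
    )
    parser.add_argument("-i", "--input", required=True, type=Path,
                        help="folder containing the source images")
    parser.add_argument("-o", "--output", required=True, type=Path,
                        help="path of the PDF to create")
    return parser.parse_args(argv)


def list_images(folder):
    """Return the image files in a folder, sorted by name to keep page order stable."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Sorting by file name means pages end up in the PDF in a predictable order, which matters when scans are numbered sequentially.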

The resulting PDF can then be processed with OCRmyPDF. OCRmyPDF also has its own deskewing option, but I've found that using it increases the file size considerably.
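The OCR step could be scripted along these lines. This is a sketch that assumes the `ocrmypdf` executable is on the PATH; the helper names are hypothetical. OCRmyPDF's `--deskew` flag is left off by default here, since the images are already deskewed before the PDF is built.

```python
import subprocess


def ocrmypdf_command(input_pdf, output_pdf, deskew=False):
    """Build the ocrmypdf argument list; --deskew is only added on request."""
    command = ["ocrmypdf"]
    if deskew:
        command.append("--deskew")
    command += [str(input_pdf), str(output_pdf)]
    return command


def run_ocr(input_pdf, output_pdf, deskew=False):
    """Run OCRmyPDF and raise CalledProcessError if it exits with an error."""
    subprocess.run(ocrmypdf_command(input_pdf, output_pdf, deskew), check=True)
```

For example, `run_ocr("output_file.pdf", "output_file_ocr.pdf")` would add a text layer to the PDF produced by process_images.py; the second file name is only an illustration.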

politikundbildung/process_images

Folders and files

Latest commit

History

Repository files navigation

process_images

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages