Takes a folder of images, binarizes and deskews them and converts them to a PDF for OCRing

politikundbildung/process_images

process_images

Deskewing uses the deskew function from Wand (the ImageMagick binding for Python), and binarization uses the ISauvola algorithm from DoxaPy.
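As a rough illustration of how these two libraries can be combined, the sketch below deskews each image with Wand, binarizes it with DoxaPy's ISauvola, and writes the pages to a PDF with Pillow. It is not the script in this repository. The function names, the list of image extensions, the deskew threshold, the `window` and `k` values, the order of the two steps, and the use of Pillow for PDF output are all assumptions made for the example.

```python
from pathlib import Path

# Extensions treated as images; this set is an assumption for the sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def list_images(folder):
    """Return the image files in `folder`, sorted by name so page order is stable."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def deskew_and_binarize(path, window=75, k=0.2):
    """Deskew one image with Wand, then binarize it with DoxaPy's ISauvola."""
    # Third-party imports live here so list_images() works without them.
    import io
    import numpy as np
    import doxapy
    from PIL import Image
    from wand.image import Image as WandImage

    # Deskew; the 40% threshold is an assumed, commonly used starting value.
    with WandImage(filename=str(path)) as img:
        img.deskew(0.4 * img.quantum_range)
        png_bytes = img.make_blob("png")

    # DoxaPy operates on 8-bit grayscale NumPy arrays.
    gray = np.array(Image.open(io.BytesIO(png_bytes)).convert("L"))
    binary = np.empty(gray.shape, gray.dtype)
    algo = doxapy.Binarization(doxapy.Binarization.Algorithms.ISAUVOLA)
    algo.initialize(gray)
    algo.to_binary(binary, {"window": window, "k": k})  # assumed parameters

    # Store the result as a 1-bit image.
    return Image.fromarray(binary).convert("1")


def images_to_pdf(folder, output_pdf):
    """Process every image in `folder` and write one PDF page per image."""
    pages = [deskew_and_binarize(p) for p in list_images(folder)]
    if not pages:
        raise ValueError(f"no images found in {folder}")
    pages[0].save(output_pdf, save_all=True, append_images=pages[1:])
```

Calling `images_to_pdf("input_folder", "output_file.pdf")` would then produce a single PDF, assuming Wand, DoxaPy, NumPy and Pillow are installed.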

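Usage: python3 <script> -i input_folder -o output_file.pdf, where <script> is the Python script in this repository, -i is the folder of input images, and -o is the PDF to write.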
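The -i and -o options can be handled with the standard argparse module. The following is a minimal sketch, not the repository's actual argument handling: the long option names `--input` and `--output`, the help strings, and the `parse_args` helper are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the input folder (-i) and the output PDF path (-o)."""
    parser = argparse.ArgumentParser(
        description="Binarize and deskew a folder of images and write them to a PDF."
    )
    # Long option names are hypothetical; only -i and -o come from the usage line.
    parser.add_argument("-i", "--input", required=True,
                        help="folder containing the input images")
    parser.add_argument("-o", "--output", required=True,
                        help="path of the PDF to write")
    return parser.parse_args(argv)


# Example: parse the same arguments as in the usage line.
args = parse_args(["-i", "input_folder", "-o", "output_file.pdf"])
print(args.input, args.output)
```

Marking both options as required makes the script fail early with a usage message when either one is missing.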

The resulting PDF can then be processed with OCRmyPDF. OCRmyPDF has its own deskewing option, but I've found that using it increases the file size considerably, which is why deskewing is done here beforehand.
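For scripting the OCR step, a small wrapper around the `ocrmypdf` command line might look like the sketch below. The helper names are hypothetical, and it assumes `ocrmypdf` is installed and on the PATH. The `deskew` switch defaults to off, matching the recommendation above.

```python
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, deskew=False):
    """Build the ocrmypdf command line; --deskew is only added on request."""
    cmd = ["ocrmypdf"]
    if deskew:
        cmd.append("--deskew")
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(input_pdf, output_pdf, deskew=False):
    """Run ocrmypdf and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_ocrmypdf_command(input_pdf, output_pdf, deskew), check=True)
```

For example, `run_ocr("output_file.pdf", "output_file_ocr.pdf")` would add a text layer to the PDF produced earlier without deskewing it a second time.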
