crontab/scheduled task friendly script to crawl a directory of screenshots and create another directory of their OCR outputs

labtec901/Auto-OCR-Screenshot-Directory

Auto-OCR-Screenshot-Directory

I have a huge directory of almost 20,000 screenshots I've taken over the years, and I sometimes want to find a screenshot without knowing exactly when I took it. This script uses Google Tesseract to run OCR on every screenshot in the directory structure and generates a .txt file for each one, which can then be searched in Windows Explorer.
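The crawl described above can be sketched roughly as follows. This is not the code in ocr_directory.py, only a minimal illustration under stated assumptions: the names `ocr_tree`, `tesseract_ocr`, and `IMAGE_SUFFIXES` are hypothetical, the output file is named by appending `.txt` to the image name, and Tesseract is invoked through its command-line interface with `stdout` as the output base.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

# Assumed set of screenshot extensions; adjust to match your own folder.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def tesseract_ocr(image_path: Path, tesseract_cmd: str = "tesseract") -> str:
    """Run the Tesseract CLI on one image and return the recognized text."""
    result = subprocess.run(
        [tesseract_cmd, str(image_path), "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ocr_tree(src_root, dst_root, ocr: Callable[[Path], str] = tesseract_ocr) -> List[Path]:
    """Mirror src_root into dst_root, writing one .txt file per image."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    written = []
    for image in sorted(src_root.rglob("*")):
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Keep the same relative path so the text tree matches the image tree.
        relative = image.relative_to(src_root)
        target = dst_root / relative.parent / (relative.name + ".txt")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(ocr(image), encoding="utf-8")
        written.append(target)
    return written
```

Passing the OCR step in as a function keeps the directory walk independent of how Tesseract is called, so it can be tried out with a stand-in function before Tesseract is installed.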

The Screenshot Directory

image

The OCR Text Directory

image

The script automatically skips files it has OCR'd before, so it is easy to run on a schedule and keep the OCR output up to date with your screenshots folder. I have a scheduled task set up to run this script every night.
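One simple way to get this skip behavior is to treat an existing .txt file as proof that the image was already processed. The sketch below assumes that approach; the README does not say how the script actually tracks processed files. `pending_images` is a hypothetical helper, and the `<image name>.txt` naming is likewise an assumption.

```python
from pathlib import Path
from typing import Iterator, Tuple


def pending_images(src_root, dst_root,
                   suffixes=(".png", ".jpg", ".jpeg")) -> Iterator[Tuple[Path, Path]]:
    """Yield (image, text_file) pairs for images that have no OCR output yet."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    for image in sorted(src_root.rglob("*")):
        if not image.is_file() or image.suffix.lower() not in suffixes:
            continue
        relative = image.relative_to(src_root)
        target = dst_root / relative.parent / (relative.name + ".txt")
        if target.exists():
            # Already OCR'd on an earlier run, so a nightly job can skip it.
            continue
        yield image, target
```

With a check like this, a nightly run only pays the OCR cost for screenshots added since the previous run.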

  • ocr_directory.py is the main file.
  • setup.py is a py2exe config file I wrote for generating an .exe to use with scheduled tasks/crontab.
  • Google Tesseract is required for this script to run. If you install it somewhere other than the default location, change the path to tesseract.exe defined in the script. I used the Windows installer provided by UB Mannheim.
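For the Tesseract path mentioned in the last item, a lookup along these lines is one option. It is a sketch, not the script's actual configuration: `find_tesseract` and `DEFAULT_TESSERACT` are hypothetical names, and the default path shown is an assumption about where the UB Mannheim installer places tesseract.exe, so check it against your own install.

```python
import shutil
from pathlib import Path
from typing import Optional

# Assumed default install location on Windows; verify on your machine.
DEFAULT_TESSERACT = Path(r"C:\Program Files\Tesseract-OCR\tesseract.exe")


def find_tesseract(custom_path: Optional[str] = None) -> Path:
    """Return the Tesseract executable to use, preferring an explicit path."""
    if custom_path is not None:
        candidate = Path(custom_path)
        if candidate.is_file():
            return candidate
        raise FileNotFoundError(f"No Tesseract executable at {candidate}")
    if DEFAULT_TESSERACT.is_file():
        return DEFAULT_TESSERACT
    # Fall back to whatever "tesseract" resolves to on PATH (e.g. on Linux).
    on_path = shutil.which("tesseract")
    if on_path:
        return Path(on_path)
    raise FileNotFoundError("Tesseract not found; install it or pass custom_path")
```

Failing early with a clear error is useful for a scheduled task, where a missing executable would otherwise only show up as an empty output folder.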
