
mnott/smartocr

SmartOCR

An intelligent bridge between Hazel and DEVONthink Pro for automated document OCR processing.

Overview

SmartOCR is an AppleScript that automates running Optical Character Recognition (OCR) on PDF documents. It works by creating a seamless workflow between Hazel (a file automation tool) and DEVONthink Pro (a document management system).

The script handles:

  • Detecting when files are ready for processing (not locked by other applications)
  • Launching DEVONthink Pro if not already running
  • Synchronizing the database
  • Applying OCR smart rules to the documents
  • Providing detailed logs and status updates
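Launching DEVONthink only when it is not already running can be done with AppleScript's built-in `running` check. The sketch below is a minimal illustration, not necessarily how SmartOCR does it: `ensureAppRunning` is a hypothetical name, and the pause after launching is assumed to correspond to the `launchDelay` setting described under Configuration. DEVONthink can be addressed by its application signature "DNtp".

```applescript
-- Hypothetical helper: start an application if it is not running yet,
-- then pause so it can finish opening (e.g. DEVONthink loading databases).
-- Returns true if it had to launch the app, false if it was already running.
on ensureAppRunning(appID, launchDelay)
	if application id appID is running then return false
	tell application id appID to launch
	delay launchDelay
	return true
end ensureAppRunning

-- Example for DEVONthink (application signature "DNtp"):
-- ensureAppRunning("DNtp", launchDelay of CONFIG)
```

Passing the application id as a parameter keeps the helper reusable and avoids resolving DEVONthink's terminology at compile time.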

Requirements

  • macOS
  • Hazel
  • DEVONthink Pro

Installation

  1. Download the SmartOCR.scpt file from this repository
  2. Place it in a location accessible to Hazel (e.g., ~/Library/Scripts/Folder Action Scripts)
  3. Configure Hazel and DEVONthink Pro as detailed below

Configuration

Step 1: Configure DEVONthink Pro

  1. Open DEVONthink Pro
  2. Create a Smart Rule for OCR processing:
    • Go to Tools → Smart Rules
    • Click the + button to create a new Smart Rule
    • Set up the rule as shown in the screenshot:
      • Name: "OCR Ablegen" (or your preferred name)
      • Search in: Select your target folder (e.g., "02 - Ablegen")
      • Set conditions to match your needs (e.g., Extension is PDF, Kind is PDF/PS)
      • Add an OCR action
      • Optionally add a Move action to relocate processed files

[Screenshot: DEVONthink Smart Rule configuration]

Step 2: Configure Hazel

  1. Open Hazel Preferences
  2. Create a new rule for the folder you want to monitor
  3. Set up the rule as shown in the screenshot:
    • Name: "OCR via DTP" (or your preferred name)
    • Set condition to match PDF files (Kind is PDF)
    • Add an action to Run AppleScript and select the SmartOCR.scpt file
    • No need to configure options unless you want to override default settings

[Screenshot: Hazel rule configuration]
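When a rule runs an external AppleScript, Hazel calls a handler named `hazelProcessFile` in that script and passes the matched file as an alias. The skeleton below shows that entry point only; `processPDF` is a hypothetical stand-in for SmartOCR's own processing steps, and the exact handler signature should be checked against the Hazel help for your version.

```applescript
-- Entry point Hazel calls for each file that matches the rule.
-- theFile is an alias to the matched file; inputAttributes is a list of
-- any attributes passed in from the Hazel action (often empty).
on hazelProcessFile(theFile, inputAttributes)
	set filePath to POSIX path of theFile
	processPDF(filePath)
end hazelProcessFile

-- Hypothetical placeholder for the real work (lock check, launching
-- DEVONthink, syncing, running the smart rules). Here it only logs.
on processPDF(filePath)
	log "SmartOCR: processing " & filePath
	return filePath
end processPDF
```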

Step 3: Customize Script Settings (Optional)

The script has several configurable parameters at the top:

-- databaseName:  name of your DEVONthink database
-- smartRules:    name(s) of your OCR smart rule(s)
-- sourceFolder:  POSIX path to the monitored folder
-- launchDelay:   seconds to wait after launching DEVONthink
-- syncDelay:     seconds to wait after database sync
-- maxRetries:    maximum retry attempts for locked files
-- retryInterval: seconds between retry attempts
property CONFIG : {databaseName:"Ablegen", smartRules:{"OCR Ablegen"}, ¬
    sourceFolder:"/Volumes/Daten/Cloud/02 - Ablegen", ¬
    launchDelay:5, syncDelay:3, maxRetries:25, retryInterval:5}

Modify these values to match your specific setup.
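The two retry settings together bound how long the script keeps waiting on a locked file. With the default values that is 25 × 5 = 125 seconds, so a file that stays locked for more than about two minutes would presumably be skipped. The helper below only makes that arithmetic explicit; `maxLockWait` is a hypothetical name, and whether SmartOCR waits exactly this long depends on how its retry loop is written.

```applescript
-- Hypothetical helper: upper bound (in seconds) on how long the lock
-- check waits before giving up, given a CONFIG-style record.
on maxLockWait(cfg)
	return (maxRetries of cfg) * (retryInterval of cfg)
end maxLockWait

-- Example: log maxLockWait(CONFIG)
```

If files are often held open for a long time (for example by a slow scanner or sync client), raising `maxRetries` is usually gentler than raising `retryInterval`, since it keeps the polling responsive.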

How It Works

  1. Hazel detects new or modified PDF files in the monitored folder
  2. Hazel triggers the SmartOCR script for each matching file
  3. The script:
    • Checks if the file is locked by another process; if it is, retries up to maxRetries times, waiting retryInterval seconds between attempts
    • Launches DEVONthink Pro if not already running
    • Updates the indexed folder in the DEVONthink database
    • Executes the specified smart rule(s) to perform OCR
  4. The OCR'd document is now searchable in DEVONthink
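One way to implement the lock check in step 3 is to ask `lsof` whether any process has the file open. This is a sketch under that assumption, not necessarily what SmartOCR does: `isFileLocked` and `waitUntilUnlocked` are hypothetical names, and the retry parameters mirror `maxRetries` and `retryInterval` from the configuration. Note that `lsof` only reports processes it is allowed to inspect, and it treats a missing file as "not locked".

```applescript
-- Hypothetical helper: true while some process has the file open.
-- lsof exits with a non-zero status when it finds no open handle (or the
-- file does not exist); do shell script turns that into an error.
on isFileLocked(filePath)
	try
		do shell script "lsof " & quoted form of filePath
		return true
	on error
		return false
	end try
end isFileLocked

-- Hypothetical helper: poll until the file is free.
-- Returns true as soon as nothing has the file open, or false if it is
-- still locked after maxRetries checks spaced retryInterval seconds apart.
on waitUntilUnlocked(filePath, maxRetries, retryInterval)
	repeat maxRetries times
		if not isFileLocked(filePath) then return true
		delay retryInterval
	end repeat
	return false
end waitUntilUnlocked

-- Example:
-- if not waitUntilUnlocked(filePath, maxRetries of CONFIG, retryInterval of CONFIG) then return
```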

Troubleshooting

If the script isn't working as expected:

  1. Check logs: Run the script manually in Script Editor to see detailed logs
  2. Verify paths: Ensure the folder paths in the configuration match your system
  3. Smart rule names: Confirm the smart rule name in DEVONthink exactly matches what's in the script
  4. File access: Make sure Hazel has permission to access the files and folders
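For the "Verify paths" step, a quick check that the configured folder exists can rule out a mistyped path before looking at Hazel or DEVONthink. `folderExists` is a hypothetical helper that shells out to `test -d`; run it in Script Editor and read the result in the log.

```applescript
-- Hypothetical helper: true if posixPath exists and is a directory
-- (symlinks to directories count as directories).
on folderExists(posixPath)
	try
		do shell script "test -d " & quoted form of posixPath
		return true
	on error
		return false
	end try
end folderExists

-- Example: log folderExists(sourceFolder of CONFIG)
```

A `false` result for a path under `/Volumes/` often just means the external drive is not mounted.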

Batch Processing

You can also process multiple files at once by running the script directly in Script Editor. It will prompt you for a folder to process.
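A batch run first needs the list of PDFs in the chosen folder. The sketch below assumes a non-recursive scan and matches the `.pdf` extension case-insensitively; `pdfFilesIn` is a hypothetical name and may differ from how SmartOCR collects files.

```applescript
-- Hypothetical helper: POSIX paths of PDF files directly inside folderPath
-- (subfolders are not searched). Returns an empty list if none are found.
on pdfFilesIn(folderPath)
	set found to do shell script "find " & quoted form of folderPath & " -maxdepth 1 -type f -iname '*.pdf'"
	if found is "" then return {}
	return paragraphs of found
end pdfFilesIn

-- Example (interactive):
-- set theFolder to choose folder with prompt "Select a folder to OCR:"
-- repeat with p in pdfFilesIn(POSIX path of theFolder)
-- 	log (contents of p)
-- end repeat
```

Each returned path can then be handed to the same per-file routine that Hazel triggers.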

License

This script is released under the WTFPL.
