Add watched directory functionality #466
Closed
This PR is a follow-up to issue #465.

This PR adds watched directory functionality to OCRmyPDF. It includes the following changes:

- `watcher.py`: uses Python's watchdog library to watch a folder for new files and immediately send them to OCRmyPDF. Results are stored in a separate output folder, and output files may optionally be organized by year and month. All of these options are configurable through environment variables.
- `Dockerfile`: now installs the new requirements (listed in `requirements/watcher.txt`) when building the Docker image, so the image can be launched directly with the watcher.
- docs: I've updated the existing watched folders section to reference the Docker image with the `watcher.py` script, including usage examples.

Here is how you test this:
After this, drop a PDF into `./tmp/incoming/`. Output should look something like:

Afterwards, verify in `./tmp/` that the output file was OCR'ed correctly.

What do you think?