Tags from consumer sub directories #50
Comments
That looks like a very neat idea! With this, we could have a TODO folder in the consumption directory and throw in documents we need to take care of. I should really make some guidelines on how to contribute and how the code is organised in general at some point. That's based on og paperless code, and quite a few things moved and had their responsibilities changed. master always reflects the latest release. Branch dev is for changes that will be in the next release. Use that. The feature-X branches are for experimental stuff.
|
Okay, cool. I'll come up with something. The remove empty directories thing is probably an edge case of my current setup where I consume files via an ocrmypdf docker container and "forward" them to paperless. In normal cases you will most likely want the folders to stick around anyways as you keep reusing them. |
Neat. What options do you use with ocrmypdf? Still considering what to support with paperless. Right now it uses --skip by default, so that OCR is only done when required. --redo and --force are configurable, as well as --pages and --output-type. |
I'm running with ... I did not check whether it is possible to make ocrmypdf fail on a tesseract timeout. As it currently is, you end up with a PDF that is not fully processed, which I find quite bad. |
Thank you. On a side note, tesseract uses twice as much cpu time when two languages are specified. |
Ouch. You know if that's linear? That would be crazy! |
Wow, good to know regarding the languages!! Regarding the main idea of this issue: That is a really neat idea. I might not be able to use it for semantic tags (I would have to set up different targets on the scanner, which might become confusing at its interface), but one could use it as a source tag (e.g. scanner or samba, which then write to different sub folders). |
See #23. |
The names of sub directories in the consumer directory will be added as tags for the document to be consumed.
To enable this, set:
PAPERLESS_CONSUMER_RECURSIVE=1
PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=1
Fixes #50
I made a small patch to allow me to set tags based on sub directories of the consumer directory (jayme-github/paperless#2) and I would like to know if you would accept that feature to your fork.
Looking at the feature-ocrmypdf branch you switched back to using INotify again, so my patch would also bring back the ability to run a recursive consumer.
Let me know what you think and I'll rebase against whatever branch makes sense.
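For readers who want a concrete picture of the idea, here is a minimal sketch of how sub directory names could be turned into tags. It is not the code from the linked patch; the function name `tags_from_subdirs` and the example paths are hypothetical, and it only assumes that the document path lies somewhere below the consumer directory.

```python
from pathlib import PurePosixPath
from typing import List


def tags_from_subdirs(consume_dir: PurePosixPath, document: PurePosixPath) -> List[str]:
    """Return the directory names between consume_dir and the document.

    Hypothetical helper for illustration only; raises ValueError if the
    document is not located below consume_dir.
    """
    relative = document.relative_to(consume_dir)
    # parts[:-1] drops the file name and keeps only the sub directory names.
    return list(relative.parts[:-1])


if __name__ == "__main__":
    consume_dir = PurePosixPath("/consume")
    print(tags_from_subdirs(consume_dir, PurePosixPath("/consume/todo/taxes/scan.pdf")))
    print(tags_from_subdirs(consume_dir, PurePosixPath("/consume/scan.pdf")))
```

With this approach a file dropped into a nested folder picks up one tag per folder level, while a file placed directly in the consumer directory gets no extra tags.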