This repository has been archived by the owner on Feb 16, 2023. It is now read-only.

Create tags from sub directories #69

Merged
merged 1 commit on Nov 30, 2020

Conversation

jayme-github
Contributor

@jayme-github jayme-github commented Nov 29, 2020

The names of sub directories in the consumer directory will be added as tags for the document to be consumed.
To enable this, set:
PAPERLESS_CONSUMER_RECURSIVE=1
PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=1

Fixes #50
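
As a rough illustration of the behaviour described above, the tag names can be derived from the path of a file relative to the consumer directory. This is a minimal sketch, not the code from this pull request: the function name `tags_from_subdirs` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List


def tags_from_subdirs(consume_dir: str, file_path: str) -> List[str]:
    """Return the sub directory names between consume_dir and the file.

    Hypothetical helper for illustration; raises ValueError if file_path
    is not located inside consume_dir.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(consume_dir))
    # Every path component except the last one (the file name) is a directory.
    return list(relative.parts[:-1])


# A file two levels deep would get two tags, a top-level file none.
nested = tags_from_subdirs("/consume", "/consume/taxes/2020/invoice.pdf")
top_level = tags_from_subdirs("/consume", "/consume/scan.pdf")
```

Under these assumptions, `nested` is `["taxes", "2020"]` and `top_level` is an empty list.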


While this basically works, I had a hard time testing it. Debugging the tests is difficult in general because of their async nature, and this change in particular always gives me table locking errors (for document_tags). That might make sense when tags are created, but I also tried not creating tags in document_consumer (just Tags.objects.get()), and that still seems to lock the table.

I'm not an expert on the Django ORM, so maybe you have an idea how to work around this, @jonaswinkler.

@jonaswinkler
Owner

Well, the test case clearly complains about 'Space Tag' not being found, so something might be amiss here.

@jayme-github
Contributor Author

> Well, the test case clearly complains about 'Space Tag' not being found, so something might be amiss here.

Exactly what I said, see: https://travis-ci.org/github/jonaswinkler/paperless-ng/jobs/746576607#L529

@jonaswinkler
Owner

OH. okay.

@jonaswinkler
Owner

Swap out TestCase base class for TransactionTestCase, which does not put each test case in a single transaction.
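
Why a test-wide transaction can get in the way of a consumer running on a separate connection can be shown without Django. The sketch below uses only the standard-library `sqlite3` module and is an analogy, not the project's test code: the function name `second_writer_error` is hypothetical, and the exact error text in the failing tests may differ from what plain SQLite reports here.

```python
import os
import sqlite3
import tempfile


def second_writer_error() -> str:
    """Return the error a second connection gets while the first one
    keeps a write transaction open (as a test-wide transaction would)."""
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "demo.sqlite3")

        setup = sqlite3.connect(db)
        setup.execute("CREATE TABLE document_tags (name TEXT)")
        setup.commit()
        setup.close()

        # isolation_level=None: transactions are started and ended by hand.
        test_conn = sqlite3.connect(db, isolation_level=None)
        # timeout=0: fail at once instead of waiting for the lock.
        consumer_conn = sqlite3.connect(db, timeout=0, isolation_level=None)

        # Stands in for the transaction wrapped around a whole test case.
        test_conn.execute("BEGIN IMMEDIATE")
        try:
            test_conn.execute("INSERT INTO document_tags VALUES ('Space Tag')")
            try:
                # Stands in for the consumer writing from another thread.
                consumer_conn.execute("INSERT INTO document_tags VALUES ('other')")
                return "no error"
            except sqlite3.OperationalError as exc:
                return str(exc)
        finally:
            test_conn.execute("ROLLBACK")
            test_conn.close()
            consumer_conn.close()
```

With `TransactionTestCase`, the test's own writes are committed instead of being held in one open transaction, so a second connection is not blocked in this way.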

@jayme-github
Contributor Author

> Swap out TestCase base class for TransactionTestCase, which does not put each test case in a single transaction.

Sweet, thanks!

@jonaswinkler
Owner

Any reason you switched to pyinotify? It is old and hasn't seen changes in 5 years. I just did some more testing and wanted to merge.

@jonaswinkler
Owner

And it works in all cases. The only thing you missed was the read_delay=1000, which is important for certain scenarios.

@jayme-github
Contributor Author

> Any reason you switched to pyinotify? It is old and hasn't seen changes in 5 years. I just did some more testing and wanted to merge.

I was unable to get the tests green with inotify_simple/inotifyrecursive: read_delay does not seem to work reliably with recursive watchers (no events arrive at all), and disabling read_delay breaks test_slow_write_and_move, for example. As this felt very fragile overall (the tests sometimes do succeed locally), I wanted to see if pyinotify would make a difference (as it's not ctypes but a C extension).

@jonaswinkler
Owner

jonaswinkler commented Nov 30, 2020

read_delay applies to folders as well: make a folder, instantly copy a file into it, and the file won't get picked up. Not ideal. I've got some ideas on how to make that better, but for now, it works.

And inotify_simple has been used in this project for a very long time, and people didn't complain :)
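
One conceivable mitigation for the folder race described above is to scan a newly detected directory once its watch is set up, so files copied in before the watch existed are still found. This is only a sketch under that assumption, not one of the ideas the maintainer refers to and not part of this pull request; `files_already_in` and `demo` are hypothetical names.

```python
import os
import tempfile
from typing import List


def files_already_in(new_dir: str) -> List[str]:
    """List files that exist below new_dir right now, in sorted order.

    Hypothetical helper: meant to be called once after a watch on a
    freshly created directory has been added.
    """
    found = []
    for root, _dirs, files in os.walk(new_dir):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)


def demo() -> List[str]:
    """Create a folder, copy a file in at once, then scan the folder."""
    with tempfile.TemporaryDirectory() as tmp:
        sub = os.path.join(tmp, "taxes")
        os.mkdir(sub)
        open(os.path.join(sub, "invoice.pdf"), "w").close()
        # Paths are returned relative to the temporary consumer directory.
        return [os.path.relpath(p, tmp) for p in files_already_in(sub)]
```

A real consumer would also have to avoid handling the same file twice if an inotify event for it arrives after the scan.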

@jayme-github
Contributor Author

> read_delay applies to folders as well: make a folder, instantly copy a file into it, and the file won't get picked up. Not ideal. I've got some ideas on how to make that better, but for now, it works.

Yeah. That's why I wanted to see if pyinotify would handle that any better (e.g. create the sub-watch immediately)

> And inotify_simple has been used in this project for a very long time, and people didn't complain :)

Sure, but neither with recursive mode nor with read_delay. ;-)

Anyway, I pushed the inotifyrecursive version back, plus a sleep in the test case after creating the directories. Maybe read_delay could be lowered quite a bit: since it seems to be mostly useful during moves, a shorter period should be fine as well.

The names of sub directories in the consumer directory will be added as
tags for the document to be consumed.
To enable this, set:
PAPERLESS_CONSUMER_RECURSIVE=1
PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=1

Fixes the-paperless-project#50
@jonaswinkler
Owner

Well, you see what I was trying to test in that test case. Some scanners like to do that: write files to file.~df, and move them to file.pdf when done. At some point I'll write a check that curates supported file extensions from registered parsers and checks against that, which would remove the need for read_delay.
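
The extension check mentioned above might look roughly like the following. This is a minimal sketch of the idea only: the set `SUPPORTED_EXTENSIONS` and the function `is_consumable` are hypothetical, and in the real project the extensions would be collected from the registered parsers rather than hard-coded.

```python
import os
from typing import Iterable

# Hypothetical stand-in for extensions gathered from registered parsers.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg"}


def is_consumable(path: str, supported: Iterable[str] = SUPPORTED_EXTENSIONS) -> bool:
    """Return True if the file extension is one a parser claims to handle."""
    extension = os.path.splitext(path)[1].lower()
    return extension in {ext.lower() for ext in supported}


# A scanner's temporary file is skipped, the renamed result is accepted.
temp_ok = is_consumable("scans/file.~df")
final_ok = is_consumable("scans/file.pdf")
```

With such a check, the temporary `file.~df` would be ignored however slowly it is written, and only the final move to `file.pdf` would trigger consumption.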

@jayme-github
Contributor Author

> Well, you see what I was trying to test in that test case. Some scanners like to do that: write files to file.~df, and move them to file.pdf when done. At some point I'll write a check that curates supported file extensions from registered parsers and checks against that, which would remove the need for read_delay.

Oh, okay. I did not recognize that as another scanner quirk. :) In that case the longer timeout does make sense, of course.

@jonaswinkler jonaswinkler merged commit c5dbd7a into jonaswinkler:dev Nov 30, 2020
mweimerskirch pushed a commit to mweimerskirch/paperless-ng that referenced this pull request Feb 17, 2022