Fix: Delay consumption after MODIFY inotify events #4626
Hello @frozenbrain, thank you very much for submitting this PR to us! This is what will happen next:
Please allow up to 7 days for an initial review. We're all very excited about new pull requests but we only do this as a hobby.
Thanks for the PR. Makes sense to me, but I'll leave this one for the backend folks.
P.S. I'd be fine with this going into beta.
Based on the test results, this is adding the file multiple times to the task queue, not just once.
I just realized that I ran my local tests in the wrong branch, sorry about that. Anyway, you're right @stumpylog: MODIFY events were adding the files to the consumption queue before the file was actually closed, which isn't much better (or arguably even worse). I've modified the PR so that files waiting to be consumed are now removed from the notified_files dict after MODIFY events. This should delay consumption until after the next CLOSE_WRITE event.
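For readers following along, the revised behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Paperless consumer code: the `PendingFiles` class, its `on_event` and `pop_ready` methods, and the explicit `now` timestamps are all hypothetical; only the `notified_files` name and the event names come from the discussion.

```python
from typing import Dict, List

# Events that (re)start the delay timer for a file, per the PR description.
DELAYING_EVENTS = {"CLOSE_WRITE", "MOVED_TO"}


class PendingFiles:
    """Hypothetical stand-in for the consumer's bookkeeping; not Paperless code."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        # path -> time of the last CLOSE_WRITE / MOVED_TO seen for that path
        self.notified_files: Dict[str, float] = {}

    def on_event(self, path: str, event: str, now: float) -> None:
        if event in DELAYING_EVENTS:
            self.notified_files[path] = now
        elif event == "MODIFY":
            # The file is being written again: forget it until the next CLOSE_WRITE.
            self.notified_files.pop(path, None)

    def pop_ready(self, now: float) -> List[str]:
        """Return and forget every file whose delay has fully elapsed."""
        ready = [p for p, t in self.notified_files.items() if now - t >= self.delay]
        for p in ready:
            del self.notified_files[p]
        return ready
```

With a 3 second delay, a CLOSE_WRITE at t=0 followed by a MODIFY at t=1 leaves nothing to consume at t=4; the file only becomes ready 3 seconds after a later CLOSE_WRITE.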
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##              dev    #4626   +/-   ##
=======================================
  Coverage   96.01%   96.01%
=======================================
  Files         371      371
  Lines       14153    14155    +2
  Branches     1110     1110
=======================================
+ Hits        13589    13591    +2
  Misses        560      560
  Partials        4        4

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Looks fine to me now. I've targeted this to beta instead.
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.
Proposed change
Fujitsu recently released a firmware update for their iX1600 scanner that allows scanning directly to a network folder without the need for a PC. Currently, Paperless delays consumption after CLOSE_WRITE and MOVED_TO inotify events. However, Fujitsu's implementation of network scanning seems to open and close the PDF file multiple times (with a longer open-write-close phase at the end), which causes Paperless to try to consume the file prematurely.
This PR proposes to add the MODIFY event to the list of delaying inotify events.
To better visualize the issue, I've attached two logs of inotifywait during a scan of the same (physical) document. I have PAPERLESS_CONSUMER_INOTIFY_DELAY set to 3 seconds in my Paperless installation. In before.txt you can see Paperless trying to read the file (L122) 3 seconds after the last CLOSE_WRITE event (L76) while the scanner is still writing data. In after.txt you can see Paperless correctly waiting until the file has been fully written.
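The failure mode in before.txt can be reproduced on paper with a small simulation. The sketch below is only an illustration: `premature_reads` is a hypothetical helper, the timeline is made up rather than taken from the attached logs, and it assumes the pre-PR rule that only CLOSE_WRITE and MOVED_TO restart the delay timer.

```python
from typing import List, Tuple

# Pre-PR rule (assumed): only these events restart the delay timer.
DELAYING_EVENTS = {"CLOSE_WRITE", "MOVED_TO"}


def premature_reads(events: List[Tuple[int, str]], delay: int) -> List[int]:
    """Return the times at which the timer would expire while writes still follow.

    `events` is a list of (timestamp_in_seconds, inotify_event_name) pairs
    for a single file.
    """
    events = sorted(events, key=lambda e: e[0])
    last_activity = events[-1][0] if events else 0
    reads = []
    for i, (t, name) in enumerate(events):
        if name not in DELAYING_EVENTS:
            continue
        deadline = t + delay
        # A later CLOSE_WRITE / MOVED_TO before the deadline restarts the timer.
        restarted = any(
            n in DELAYING_EVENTS and t2 <= deadline for t2, n in events[i + 1:]
        )
        if not restarted and last_activity > deadline:
            reads.append(deadline)
    return reads


# Made-up timeline: a short write-close, then a longer write phase.
timeline = [
    (0, "CREATE"), (0, "MODIFY"), (1, "CLOSE_WRITE"),
    (2, "MODIFY"), (3, "MODIFY"), (4, "MODIFY"),
    (5, "MODIFY"), (6, "MODIFY"), (7, "CLOSE_WRITE"),
]
print(premature_reads(timeline, delay=3))
```

In this made-up timeline the first CLOSE_WRITE at second 1 lets a 3 second timer expire at second 4, while MODIFY events continue until second 7, which mirrors the premature read described above.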