Feature: Dynamic document storage paths #916

mskg · 2022-05-10T15:31:27Z

Proposed change

What I like about Paperless is the flexibility to provide a file-based document structure, as this allows for easy consumption of the documents with file-based replication. But, the "one size fits all" solution with PAPERLESS_FILENAME_FORMAT is not enough.

The change is simple:

Added storage pathes, where each entry is a different PAPERLESS_FILENAME_FORMAT
Each document get's a storage path assigned via existing matching mechanisms (all, exact, regular, fuzzy, auto), but is adjustable by the user at any time
file_handling.py has been modified to use the per document filename format
The implementation is complete, also adding bulk-actions, filtering, proposals to the UI and admin interface. I hope I didn't miss anything

Usage

I use specific type of storage layouts for different purposes:

Everything related to warranty claims is a flat directory with file names like Warranty/2022-04 Correspondent Title.pdf
Insurance communication go into Insurance/Correspondent/..., whereas related receipts get sorted by year Insurance/Correspondent/Year/...

This also allows easy file-based consumption of related documents. The maintenance/assignment gets a no-brainer with auto learning enabled.

Compatibility

The change is non-intrusive and fully backward compatible.

Not using storage pathes or not having one assigned forces the system into using PAPERLESS_FILENAME_FORMAT.

Known limitations

If an existing storage path get's adjusted, this can only be fixed by running document_renamer again
There is a breaking change implemented regarding file naming. The implementation removes empty pathes and parts of the filename that would otherwise resolve to none. This is feature-flagged with PAPERLESS_FILENAME_REMOVE_NONE, defaulting to No and such being backward compatible with the existing behaviour.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Other (please explain)

Checklist:

I have read & agree with the contributing guidelines.
If applicable, I have tested my code for new features & regressions on both mobile & desktop devices, using the latest version of major browsers.
If applicable, I have checked that all tests pass, see documentation.
I have run all pre-commit hooks, see documentation.
I have made corresponding changes to the documentation as needed.
I have checked my modifications for any breaking changes.

mskg · 2022-05-10T15:42:17Z

I add the missing documentation part, once you're willing to integrate this.

shamoon · 2022-05-10T15:48:18Z

Thanks for the PR! We will take a look asap. In the meantime this should be targeted to dev and rebased on that branch.

mskg · 2022-05-10T15:49:43Z

Thanks for the PR! We will take a look asap. In the meantime this should be targeted to dev and rebased on that branch.

Sure. Please take a look and give an indication if you're willing to integrate. If yes, I retarget it - should not be a big thing.

shamoon · 2022-05-10T23:33:16Z

This was closed automatically because it was targeting that outdated branch, I will change it to dev

Edit: still needs a re-base

mskg · 2022-05-12T16:13:21Z

@shamoon I merged with dev

shamoon · 2022-05-12T16:24:19Z

@shamoon I merged with dev

Thank you, but I think it needs a re-base, theres still a lot of changes that arent yours.

As for the PR itself, I was waiting for others to chime in. Personally, I dont think this is a feature I would use. My preference is to just store docs in some simple way and let paperless handle things from there rather than complicate the file storage. I bet there are some that would, perhaps a vocal group, but IMHO if its not going to get very wide use I dont think I would be in favor of including it as-is since its so prominent, etc (in that case perhaps it could be made optional or something like that?).

Obviously its not like we have telemetry data or something to have a real clear answer for this, so again, I'd love to hear others' thoughts.

shamoon · 2022-05-19T19:03:29Z

The docs look good so far, thanks for writing those.

I don't think the document_renamer moves existing documents into new storage paths. Am I missing something there or is that unimplemented?

I think it does? E.g. if you edit a storage path with this feature it doesnt move immediately but then running document_renamer does move it.

mskg · 2022-05-19T19:05:09Z

I> I think it does? E.g. if you edit a storage path with this feature it doesnt move immediately but then running document_renamer does move it.

It does - as you described it. It always compares the calculated path with db values and acts if required. It only touches files with non-compliant pathes. The implementation is unchanged.

qcasey · 2022-05-19T19:14:20Z

if you edit a storage path with this feature it doesnt move immediately but then running document_renamer does move it.

Right, I see that (emphasis mine). However I don't see the documents move when they are uploaded before the Storage Path rule. For example:

Upload a document containing the words "test"
Create a Storage Path rule matching the word "test"
Run the document_renamer
Notice the document is not renamed

The same document is matched and renamed according to the Storage Path rule if I upload it after the rule is created.

mskg · 2022-05-19T19:19:51Z

if you edit a storage path with this feature it doesnt move immediately but then running document_renamer does move it.

Right, I see that (emphasis mine). However I don't see the documents move when they are uploaded before the Storage Path rule. For example:

Upload a document containing the words "test"

Create a Storage Path rule matching the word "test"

Run the document_renamer

Notice the document is not renamed

The same document is matched and renamed according to the Storage Path rule if I upload it after the rule is created.

This is only available for tags - the command is called document_retagger.

I would propose to not implement this, as this would require conflict resolution - priorities of rules. That’s not the case for tags - there can be many.

That’s probably also why it doesn’t exist for correspondents, and types, too.

qcasey · 2022-05-19T19:20:43Z

~~Also we should probably add how this feature handles precedence to the docs and UI.~~

~~A Document can have multiple Correspondents/Tags/Document Types but only one Storage Path. This feature appears to pick the oldest rule and apply that, regardless of more specific rules created later.~~

Nevermind, I see your comment above. That's news to me, I was under the misconception we could have multiple.

mskg · 2022-05-19T19:25:17Z

Nevermind, I see your comment above. That's news to me, I was under the misconception we could have multiple.

I saw many comments of people wanting to have many correspondents (sender, receivers). Definitely worth thinking about 😉

qcasey

Overall this is much more straightforward than I initially expected. Of course I'll defer to shamoon on the remaining UI stuff but I've tested this a bit and I personally don't have any more concerns.

src-ui/src/app/components/document-list/bulk-editor/bulk-editor.component.ts

shamoon · 2022-05-19T21:02:28Z

Ok great, Im pretty happy with these changes, I do think they were important, appreciate your patience. Also after spending some time with this PR I can see how it will probably be useful to others.

My only outstanding concern is about validating the input. On the one hand users can make their own bad decisions =) on the other hand we might want to help prevent people making unintended mistakes. The link to the docs / example in the hint is helpful but especially for variables I still think we should validate them, for example if someone typed {cerated_year}. I added a very crude test for this, basically it tries to format the string the way generate_filename does with dummy replacements just so that it can catch a KeyError and then the frontend can show a helpful error, see screenshot.

Really, thats it! Let me know if theres a better way to do the validation, ofc.

mskg · 2022-05-19T21:16:20Z

Really, thats it! Let me know if theres a better way to do the validation, ofc.

Syntax wise {bad variables} are the only real error that would result in the whole formula not being used. If we validate that - 90% of all errors.

Possible addition: Show him a formatted example while he types. Than we might also catch the unexpected result.

Really appreciate your work 👍

shamoon · 2022-05-19T21:37:16Z

Yes that would be cool or I even thought about some kind of visual drag-and-drop tool to build the path but that seems like another PR.

As long as you Python folks are OK with my validate_path implementation, then as they say. LGTM 😄

* Added devcontainer * Add feature storage pathes * Exclude tests and add versioning * Check escaping * Check escaping * Check quoting * Echo * Escape * Escape : * Double escape \ * Escaping * Remove if * Escape colon * Missing \ * Esacpe : * Escape all * test * Remove sed * Fix exclude * Remove SED command * Add LD_LIBRARY_PATH * Adjusted to v1.7 * Updated test-cases * Remove devcontainer * Removed internal build-file * Run pre-commit * Corrected flak8 error * Adjusted to v1.7 * Updated test-cases * Corrected flak8 error * Adjusted to new plural translations * Small adjustments due to code-review backend * Adjusted line-break * Removed PAPERLESS prefix from settings variables * Corrected style change due to search+replace * First documentation draft * Revert changes to Pipfile * Add sphinx-autobuild with keep-outdated * Revert merge error that results in wrong storage path is evaluated * Adjust styles of generated files ... * Adds additional testing to cover dynamic storage path functionality * Remove unnecessary condition * Add hint to edit storage path dialog * Correct spelling of pathes to paths * Minor documentation tweaks * Minor typo * improving wrapping of filter editor buttons with new storage path button * Update .gitignore * Fix select border radius in non input-groups * Better storage path edit hint * Add note to edit storage path dialog re document_renamer * Add note to bulk edit storage path re document_renamer * Rename FILTER_STORAGE_DIRECTORY to PATH * Fix broken filter rule parsing * Show default storage if unspecified * Remove note re storage path on bulk edit * Add basic validation of filename variables Co-authored-by: Markus Kling <markus@markus-kling.net> Co-authored-by: Trenton Holmes <holmes.trenton@gmail.com> Co-authored-by: Michael Shamoon <4887959+shamoon@users.noreply.github.com> Co-authored-by: Quinn Casey <quinn@quinncasey.com>

github-actions · 2023-04-17T04:03:18Z

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.

mskg requested review from a team as code owners May 10, 2022 15:31

mskg changed the title ~~Feature document storage~~ Feature: Dynamic document storage pathes May 10, 2022

qcasey deleted the branch paperless-ngx:dev May 10, 2022 23:11

qcasey closed this May 10, 2022

shamoon reopened this May 10, 2022

shamoon changed the base branch from v1.7.1 to dev May 10, 2022 23:33

Markus Kling added 17 commits May 12, 2022 16:26

Added devcontainer

fba1ce2

Add feature storage pathes

fc4ca04

Exclude tests and add versioning

ceff28b

Check escaping

cad8528

Check escaping

dac7ffa

Check quoting

1843d47

Echo

2904bdb

Escape

4497f85

Escape :

9ddec93

Double escape \

9edfa34

Escaping

78e3f2a

Remove if

e48f759

Escape colon

ecec24d

Missing \

156b367

Esacpe :

68ad326

Escape all

ff98f08

test

98b3301

Fix select border radius in non input-groups

d441b89

shamoon added 2 commits May 19, 2022 12:21

Better storage path edit hint

33e1507

Add note to edit storage path dialog re document_renamer

143c0ff

qcasey approved these changes May 19, 2022

View reviewed changes

Add note to bulk edit storage path re document_renamer

97bd3d4

mskg commented May 19, 2022

View reviewed changes

src-ui/src/app/components/document-list/bulk-editor/bulk-editor.component.ts Outdated Show resolved Hide resolved

shamoon added 5 commits May 19, 2022 12:43

Rename FILTER_STORAGE_DIRECTORY to PATH

a59df96

Fix broken filter rule parsing

06073da

Show default storage if unspecified

7fe06be

Remove note re storage path on bulk edit

8b8dbb4

Add basic validation of filename variables

28f7e1f

shamoon approved these changes May 19, 2022

View reviewed changes

shamoon merged commit 69ef26d into paperless-ngx:dev May 19, 2022

shamoon changed the title ~~Feature: Dynamic document storage pathes~~ Feature: Dynamic document storage paths May 19, 2022

shamoon added this to the Next Release milestone May 20, 2022

shamoon mentioned this pull request Oct 11, 2022

Fix: creating new storage path on document edit fails to update menu #1777

Merged

7 tasks

shamoon mentioned this pull request Feb 17, 2023

[BUG] Filesystem path not updating after changing storage path #2695

Closed

shamoon deleted the feature-document-storage branch February 17, 2023 17:24

github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: Dynamic document storage paths #916

Feature: Dynamic document storage paths #916

mskg commented May 10, 2022 •

edited

mskg commented May 10, 2022

shamoon commented May 10, 2022

mskg commented May 10, 2022

shamoon commented May 10, 2022 •

edited

mskg commented May 12, 2022

shamoon commented May 12, 2022

shamoon commented May 19, 2022

mskg commented May 19, 2022 •

edited

qcasey commented May 19, 2022

mskg commented May 19, 2022 •

edited

qcasey commented May 19, 2022 •

edited

mskg commented May 19, 2022

qcasey left a comment

shamoon commented May 19, 2022 •

edited

mskg commented May 19, 2022

shamoon commented May 19, 2022

github-actions bot commented Apr 17, 2023

Feature: Dynamic document storage paths #916

Feature: Dynamic document storage paths #916

Conversation

mskg commented May 10, 2022 • edited

Proposed change

Usage

Compatibility

Known limitations

Type of change

Checklist:

mskg commented May 10, 2022

shamoon commented May 10, 2022

mskg commented May 10, 2022

shamoon commented May 10, 2022 • edited

mskg commented May 12, 2022

shamoon commented May 12, 2022

shamoon commented May 19, 2022

mskg commented May 19, 2022 • edited

qcasey commented May 19, 2022

mskg commented May 19, 2022 • edited

qcasey commented May 19, 2022 • edited

mskg commented May 19, 2022

qcasey left a comment

Choose a reason for hiding this comment

shamoon commented May 19, 2022 • edited

mskg commented May 19, 2022

shamoon commented May 19, 2022

github-actions bot commented Apr 17, 2023

mskg commented May 10, 2022 •

edited

shamoon commented May 10, 2022 •

edited

mskg commented May 19, 2022 •

edited

mskg commented May 19, 2022 •

edited

qcasey commented May 19, 2022 •

edited

shamoon commented May 19, 2022 •

edited