organize: simplified default folder naming #842
papis.id.compute_an_id and DB infrastructure produce UIDs more robustly
The functions in paths.py no longer have to take lists of files as arguments. However, the database must now be aware of a document before get_document_folder is invoked. This means that running doc.save() without first running doc.set_folder() will fail because the info file path is unset. From what I can tell, all user interfaces call set_folder even for metadata-only entries. However, from_data is commonly used in the code. I've added a warning to the docstring, but ideally the folder argument should not be optional.
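To make the ordering constraint above concrete, here is a minimal sketch. `ToyDocument` is a hypothetical stand-in written for illustration only; it is not papis's document class, and the real `set_folder` and `save` may behave differently in detail. It only shows why saving before a folder is set has nowhere to write the info file.

```python
import os
import tempfile
from typing import Optional


class ToyDocument:
    """Hypothetical stand-in for a document object (not the papis API)."""

    def __init__(self, data: dict) -> None:
        self.data = dict(data)
        self._folder: Optional[str] = None  # unset until set_folder() is called

    def set_folder(self, folder: str) -> None:
        self._folder = folder

    def save(self) -> str:
        # Without a folder there is no info file path to write to.
        if self._folder is None:
            raise ValueError("no folder set; cannot determine the info file path")
        os.makedirs(self._folder, exist_ok=True)
        info_path = os.path.join(self._folder, "info.yaml")
        with open(info_path, "w", encoding="utf-8") as f:
            # Simple "key: value" lines, enough for this illustration.
            for key, value in sorted(self.data.items()):
                f.write(f"{key}: {value}\n")
        return info_path


doc = ToyDocument({"title": "Example"})

# Calling save() first fails, mirroring the problem described above.
try:
    doc.save()
    failed_without_folder = False
except ValueError:
    failed_without_folder = True

# Setting the folder first makes the save succeed.
with tempfile.TemporaryDirectory() as root:
    doc.set_folder(os.path.join(root, "example"))
    saved_ok = os.path.exists(doc.save())
```

Making the folder a required argument, as suggested above, would turn this runtime failure into an error at construction time.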
Thanks a lot for this PR, @pmiam! A quick question: does this by default lead to folder names that are just MD5 hashes? I'm not sure I'm a big fan of this, as it makes the folder names not very human-readable (in fact, I think even the current default isn't great, as it starts with the hash, and I think folders organised by e.g. author family name would be more intuitive). Of course, this can be changed in the settings and doesn't present a problem as long as one uses the |
Thank you for the warm welcome!
It does, although it also makes it so the My personal preference is for machine-managed, flat directory trees. Frankly, that makes the most sense to me as a default, and I think it encourages use of the Papis tui or another front end, but I see your point. Especially as I was writing this, I noticed that a lot of work has been done to make human readable file names very expressive and robust to collision. I've made sure to leave that unaffected. We could set a human friendly template as the default configuration easily. |
I think we can further simplify the changes to papis.commands.add, but overall this looks like a good direction!
I agree with @jghauser that we should try for a better default folder name: HASHHASHHASH-author-author was not the worst, but just using the MD5 hash looks very opaque. I think Zotero also stores files on disk in some storage/PIMHYJGK type folder and it's very hard to find things without the interface.
@pmiam @jghauser Maybe the default could be something based on time-added and not papis_id? (both keys should always be available) E.g. {doc[time-added]}-{doc[author]}?
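As a rough illustration of how a template like the one suggested above could expand, here is a sketch using plain Python `str.format`. The `folder_name` helper, the cleanup regex, and the sample metadata are assumptions made for this example; papis has its own formatter and path-cleaning logic, which this does not reproduce.

```python
import re


def folder_name(doc: dict, template: str = "{doc[time-added]}-{doc[author]}") -> str:
    """Expand the template and reduce the result to a filesystem-friendly slug."""
    # In str.format, {doc[time-added]} looks up the string key "time-added".
    raw = template.format(doc=doc)
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()


# Illustrative metadata; the timestamp layout is an assumption.
older = {"time-added": "2023-11-02-08:30:00", "author": "Hopper, Grace"}
newer = {"time-added": "2024-03-31-10:15:42", "author": "Turing, Alan"}

names = sorted(folder_name(d) for d in (newer, older))
```

Because the timestamp comes first and is zero-padded, a plain lexicographic sort of the folder names is also a chronological sort, which is the property discussed in the comments here.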
This is a very intriguing idea! It would avoid collisions and easily allow sorting by a very useful property (a bit of a shame that it would lead to oldest at the top by default, though I guess that cannot be avoided). What I personally use is |
No worries, just reverse it with
Hm, I think we have two issues here
We can probably make that another PR too. In this PR, it's sufficient to move |
Totally agree, I'm gonna respond in the issue that you opened! |
This writes the info.yaml as early as possible. papis_id is also available in the ref template. Note: computing papis_id so early means unique _deterministic_ hashes cannot be guaranteed! _Random_ UIDs are 100% guaranteed.
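The trade-off in the note above can be sketched as follows. Both helpers are hypothetical and are not how `papis.id.compute_an_id` is implemented; they only contrast an ID derived from metadata with a randomly generated one.

```python
import hashlib
import uuid


def deterministic_id(doc: dict) -> str:
    """Hash the metadata: identical metadata always yields the identical ID."""
    payload = repr(sorted(doc.items())).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def random_id() -> str:
    """Ignore the metadata entirely and draw a fresh random identifier."""
    return uuid.uuid4().hex


first = {"title": "On Computable Numbers", "author": "Turing, Alan"}
second = dict(first)  # a second entry with the same (still sparse) metadata

# Early on, two entries may carry the same metadata, so the hashes coincide.
same_hash = deterministic_id(first) == deterministic_id(second)

# Random IDs differ even when the metadata is identical.
different_random = random_id() != random_id()
```

A metadata hash computed before the entry is fully populated cannot tell such entries apart, whereas a random ID does not depend on the metadata at all.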
This reverts commit 1154713.
This reverts commit db6a2f9.
Co-authored-by: Alex Fikl <alexfikl@gmail.com>
Left a few small nitpicks around the logger messages mostly, but this should be good to go! ❤️
…log. Co-authored-by: Alex Fikl <alexfikl@gmail.com>
c0ced16 to 083c270
I'll get right on that, sorry. I've been working between celebrations today. Happy Easter! |
No worries, it's fixed and I'll merge as soon as the tests pass!
Happy Easter! |
get_document_hash_folder
compute_an_id
One test needed to be updated (02fa5a6) to accommodate these changes.
A side effect of these changes is that the path-creating functions, which used to be independent of the database, now need the database to know about the document before they're called. I think that sounds worse than it is, but let me know your thoughts. With the updated test, everything passes. Also, I manually tested the use-cache = False case and it seems to work. I also exposed a (possibly pre-existing) problem where, if doc.save() is called before doc.set_folder(), the save fails due to an unset info file path.
These changes might lead to the improved duplicate checking being discussed in issue #841.