Move metadata from info.json files to DB #148

Open
hkalexling opened this issue Jan 14, 2021 · 7 comments
Labels
enhancement (New feature or request), rfc (Request for Comments)


@hkalexling
Member

hkalexling commented Jan 14, 2021

Background

Currently, all metadata except tags is stored in the info.json file in each title folder. This includes reading progress, sorting options, custom cover images, and custom display names. The reason we use info.json is explained in #37 (comment).

The Issue

  1. Reading and writing the info.json files is much slower than using a proper DB
  2. Some data, such as tags and thumbnails, can only be stored in the DB, so it is lost when the library is renamed or moved (see "[Bug Report] Lost Tags after library root directory change", #146)
  3. Some users might want to keep their library folders unchanged, without info.json files being written into them

Proposed Solution

We can have two tables in DB:

==========
TITLES
----------
id
path
signature
==========
==========
TITLE_INFO
----------
id
tags
progress
... other info of the title
==========
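
As a rough sketch, the two tables could be created like this with SQLite from Python. The column types, the primary/foreign keys, and the idea of storing tags and progress as JSON text are all assumptions for illustration; the proposal only names the tables and columns.

```python
import sqlite3

# Hypothetical schema: the proposal lists only table and column names.
# Types, keys, and the JSON-as-text encoding are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS titles (
    id        TEXT PRIMARY KEY,
    path      TEXT NOT NULL,
    signature TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS title_info (
    id       TEXT PRIMARY KEY REFERENCES titles(id),
    tags     TEXT,   -- e.g. a JSON-encoded list of tags
    progress TEXT    -- e.g. a JSON-encoded map of entry to page
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both tables if they do not exist yet."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Keeping the path and signature in their own table means the per-title information never has to be rewritten when a title is moved.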

In the TITLES table, signature is a (mostly) unique value for a title. We can calculate it using the following procedure:

  1. Get all entries in the title (a list of cbz/cbr files)
  2. Get the file sizes of the entries as an array and sort it
  3. Join the array as a long string
  4. Calculate the CRC32 checksum of the string, and use that as the signature of the title

If anything in the title changes, the checksum would likely change as well.
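
The four steps above can be sketched in Python as follows. The function names are hypothetical, and the comma separator is an assumption added so that sizes like [1, 23] and [123] do not join to the same string; the proposal itself only says to join the array into one long string.

```python
import zlib
from pathlib import Path

# Archive extensions treated as entries (taken from step 1 above).
ARCHIVE_EXTS = {".cbz", ".cbr"}


def signature_from_sizes(sizes):
    """Steps 2 to 4: sort the sizes, join them, and take the CRC32."""
    joined = ",".join(str(size) for size in sorted(sizes))
    return zlib.crc32(joined.encode("utf-8"))


def title_signature(title_dir):
    """Step 1: collect the sizes of the cbz/cbr files directly in a title."""
    sizes = [
        p.stat().st_size
        for p in Path(title_dir).iterdir()
        if p.is_file() and p.suffix.lower() in ARCHIVE_EXTS
    ]
    return signature_from_sizes(sizes)
```

Because only sizes feed into the checksum, renaming the title folder or its entries leaves the signature unchanged, while adding, removing, or resizing an entry will usually change it.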

On library scan, if a title's path and signature match a row in the TITLES table, we assign the corresponding id to the title, and it can then retrieve its information from the TITLE_INFO table. If a title's signature matches the DB record, but the path doesn't (or the other way around), we still use the id, and we update the unmatched field to the correct value. In this way, even if a title is moved or renamed, we can still match it in the DB because its signature is still the same.
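
A minimal in-memory sketch of this matching rule is shown below. `TitleRow` and `resolve_title_id` are hypothetical names, the rows stand in for the TITLES table, and checking the signature before the path when only one of them matches is an assumption; the proposal does not say which takes priority.

```python
import uuid
from dataclasses import dataclass


@dataclass
class TitleRow:
    """One row of the TITLES table (held in memory for this sketch)."""
    id: str
    path: str
    signature: int


def resolve_title_id(rows, path, signature):
    """Return the id for a scanned title, updating or creating a row."""
    # Both fields match: nothing to update.
    for row in rows:
        if row.path == path and row.signature == signature:
            return row.id
    # Signature matches but the path does not: the title was moved or renamed.
    for row in rows:
        if row.signature == signature:
            row.path = path
            return row.id
    # Path matches but the signature does not: the contents changed.
    for row in rows:
        if row.path == path:
            row.signature = signature
            return row.id
    # No match at all: treat it as a new title.
    row = TitleRow(uuid.uuid4().hex, path, signature)
    rows.append(row)
    return row.id
```

If a title is both renamed and modified between two scans, neither field matches and it is treated as new, which is the case where its stored information can no longer be found.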

Conclusion

This issue serves as an RFC, so any comments and suggestions are welcome!

hkalexling added the enhancement and help wanted labels on Jan 14, 2021
hkalexling pinned this issue on Jan 14, 2021
@Leeingnyo
Member

Sounds good to me.

Proposed fault tolerance when matching titles:

  • directories
    • may be moved or renamed
    • if moved or renamed, their nested contents must not change, apart from being renamed
  • files: may be moved or renamed

There might be other requirements, but I think this tolerance is enough in practice, since people usually move or rename entire root titles. Above all, this prevents thumbnails from being regenerated repeatedly! 😄

By the way, are the calculated signatures cached automatically?

@hkalexling
Member Author

@Leeingnyo Thanks for the feedback! I took some time to implement this (not pushed yet), and I am leaning towards simply using the inode numbers as the signatures for both titles and entries. On most file systems, the inode number of a file/folder is preserved when the file is moved, renamed, or even edited.

Some operations that would cause the inode number to change:

  • Reboot/remount on some file systems
  • Replaced with a copied file
  • Moved to a different device

But since we also compare the file paths, we won't lose information unless one of the above changes and a file/folder rename both happen between two library scans.

The difference between using the inode number and the original plan mentioned above is that the inode number stays the same even when the file/folder content changes, but I think this is not an issue.
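
For illustration, reading an inode number and checking that it survives a rename looks like this in Python. `inode_signature` is a hypothetical helper name, and the check assumes a POSIX-style file system with the rename staying on the same device.

```python
import os
import tempfile


def inode_signature(path):
    """Use the inode number of a file or folder as its signature."""
    return os.stat(path).st_ino


with tempfile.TemporaryDirectory() as root:
    old_path = os.path.join(root, "Title A")
    new_path = os.path.join(root, "Title B")
    os.mkdir(old_path)

    before = inode_signature(old_path)
    os.rename(old_path, new_path)  # rename within the same file system
    after = inode_signature(new_path)

    survived_rename = before == after
    print(survived_rename)
```

Only a single `stat` call per file or folder is needed, which is why no content has to be read to obtain the signature.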

The inode number and filesize/modification date are metadata, and reading them is very fast, so I don't think we need to cache the signatures. I tested it a bit, and the scanning time does not appear to be much longer. But I am not sure how this would affect the scanning performance for network-mounted drives (see #118), so I would need to test this a bit before releasing the changes.

Again, feel free to let me know what you think!

@Leeingnyo
Member

Oh, I see! Then it has more generous fault tolerance. Great!
You mean that the signature of a title or entry is simply the inode number of its directory or file (read directly from that single node, with no nested computation), right? Not as written in the dev branch?

@hkalexling
Member Author

Oh, I should have made it clearer that for titles we do generate the signatures recursively; see https://github.com/hkalexling/Mango/blob/5779d225f6afece178aa5a8785f34045e84a4253/src/util/signature.cr#L10-L51
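
The linked file is not reproduced here. Purely as a hypothetical illustration of what a recursive title signature could look like, the sketch below combines the inode numbers of nested files and the signatures of nested folders into one CRC32. The function names and the way values are combined (sorted, comma-joined, then checksummed) are assumptions and may differ from the actual implementation.

```python
import os
import zlib


def file_signature(path):
    """Signature of an entry: its inode number."""
    return os.stat(path).st_ino


def dir_signature(path):
    """Signature of a title: CRC32 over the sorted signatures of its contents."""
    parts = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                parts.append(dir_signature(entry.path))  # recurse into nested titles
            elif entry.is_file(follow_symlinks=False):
                parts.append(file_signature(entry.path))
    joined = ",".join(str(part) for part in sorted(parts))
    return zlib.crc32(joined.encode("utf-8"))
```

In this sketch, names never enter the checksum, so moving or renaming the title or its entries keeps the signature stable, while adding or removing an entry changes the input to the checksum.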

@hkalexling
Member Author

Update:

With the new metadata and library caching features in v0.24.0, Mango can handle large libraries pretty well, so we no longer urgently need this feature. I am keeping this issue open so that we can revisit it someday.

hkalexling changed the title from "[Plan/RFC] Move metadata from info.json files to DB" to "Move metadata from info.json files to DB" on Mar 19, 2022
@afknst

afknst commented Apr 18, 2022

Please have a look at #295

@hkalexling
Member Author

Yeah, good point: the JSON files are less resilient than the DB. Let me see what we can do.
