
MMap based chunk cache and disk based caching #406

Merged

Conversation

felixbuenemann
Collaborator

This changes the chunk cache to use MMap for allocating cache chunks.

This significantly reduces plexdrive memory usage, since most allocations bypass the Go garbage collector.

If persistent chunk cache is enabled (--chunk-disk-cache), the cache is persisted to the file or device specified by --chunk-file (defaults to "chunks.dat" in the config directory).

If no persistent cache is used, plexdrive uses an anonymous mmap and will efficiently use the swap file as a disk cache if the size of the chunk cache exceeds the amount of free memory.

The chunk cache uses CRC32 to verify the integrity of cache chunks and protect against corruption, e.g. if plexdrive crashes. The checksum is verified lazily once a cached chunk is loaded.

Another change is that plexdrive now signals changed files to fuse, so it can enable kernel caching of open files. This means that files cached in memory by the kernel can be accessed with nearly no cpu usage by plexdrive.

Since it is possible for files on Google Drive to change, plexdrive now downloads them using the cached RevisionID of the drive object, which prevents chunks from different versions of a file from being mixed up. You always get the version that corresponds to the cached metadata in cache.bolt.

Since additional metadata is stored per file, the cache.bolt in the configuration directory should be deleted when upgrading to this new version of plexdrive.

The code for this was mostly written two years ago and has been used successfully in production since then. I've rebased the code onto the current master and did some quick testing to ensure nothing broke. I've kept the original commits, since the sheer number of changes would have made condensing them into more focused commits a major effort. So there are quite a few changes that are replaced by later commits in the feature branch.

Resolves #360

We didn't raise the index version when swapping the chunk fields, so
this swaps the fields by guessing the existing field order. This
workaround should be removed once the code is stable.
Also skip storing the pageToken if it hasn't changed.
This avoids encoding/decoding and unneeded copies, but it makes the
chunk cache file endianness dependent. To detect an endianness
difference, the PD magic bytes are now stored as a uint16, so they flip
to DP if the endianness has changed and the cache becomes invalid.
Note that resizing is currently not supported, since we lack information
to discover the last position of the journal.
Instead of creating a separate mmap region for each chunk this maps one
or multiple regions at the maximum allowed size and creates the chunks
as slices from these mappings.

This should in theory also work on 32-bit Linux systems with LFS
support, where a single mmap region is limited to about 2 GB, so
multiple mappings are used if the chunk file is bigger than 2 GB.
This isn't needed, since we're at EOF anyway, so nothing after the TOC
needs to be aligned, and the size itself doesn't have to be aligned.
This should ensure consistent downloads, since it was previously
possible that the cached MD5 sum of a recently changed object was used
as the chunk cache / request id for the download. By also using the
cached revision id, this should no longer be a problem.
This ensures that drive objects have stable node hashes, enables keeping
kernel buffer caches across file opens and invalidates the buffer cache
when changes are detected.

Also added code to ensure the root node is a folder.
Since both the persistent cache to the chunk-file and the non-persistent
cache use mmap, the naming was confusing.
@felixbuenemann felixbuenemann merged commit 420a892 into plexdrive:master Jan 30, 2022
@felixbuenemann
Collaborator Author

felixbuenemann commented Jan 31, 2022

There's more detailed info on the cache storage data structure in #389.

Successfully merging this pull request may close these issues.

[Feature Request] Disk Cache for Plexdrive v5